Re: [fossil-users] Fossil scalability
On Sat, 22 Dec, Stefan Bellon wrote:

> In order to verify whether the problem really is the timestamp, I'm now trying to convert just one of the four repositories using the method of sharing the workspace. Let's see what happens after 5000 or 9000 revisions.

This method was quick enough. I was able to convert all 32000 revisions in a day (or so). Much quicker than the other method.

> But perhaps it's not that I'm touching the files, but that I'm committing from an empty workspace as I open the checkout each time with fossil open --keep and then doing fossil addremove.

It looks like creating the .fslckout file is what takes the time when starting with an empty checkout for each revision. At present I'm rsync'ing the four working copies into one Fossil workspace per revision, cleaning the workspace completely apart from the .fslckout file, which I retain across revisions. This looks promising regarding long-term performance.

Greetings, Stefan

-- 
Stefan Bellon
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Fossil scalability
On Fri, 21 Dec 2012 15:02:50 +0100 Stefan Bellon sbel...@sbellon.de wrote:

> On Fri, 21 Dec, Joerg Sonnenberger wrote:
>
>> On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
>>
>>> In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.
>>
>> There is one issue with commits on large checkouts unrelated to repo-cksum and mtime-changes. The SQLite database in the checkout is essentially rewritten after each commit, which can be very slow with many files.
>
> I have set mtime-changes on and repo-cksum off, like Lluís suggested, but only noticed a very minor speed improvement.
>
> But now - with those settings still in place - I decided to take a closer look at what happens at commit time. Especially I paid attention to the point when the journal file appears, the point when the New_Version hash is output and the point when the journal file disappears/fossil terminates.
>
> I made the following observation: In the vast majority of commits, it takes around 15 to 18 seconds from fossil commit till the journal file appears. Then it takes between 4 and 10 seconds till the New_Version message is output and another 11 to 16 seconds till the journal file disappears and fossil terminates.
>
> However, there are a very few commits that work differently. They take the same time till the journal file appears, but then only a very few seconds till New_Version is displayed and again a very short time till fossil terminates.
>
> So I would assume that the time till the journal file appears is used for checksum/mtime calculation and file system performance. But the phases during journal lifetime seem to be the database transaction time.
>
> Two observations:
>
> 1) It looks like fossil always takes its 15 to 20 seconds (for this specific project at this specific state on this specific machine!) till the commit actually begins. Perhaps this can be improved (as Subversion is faster regarding this respect in the same scenario), but perhaps not because checking for modifications has to work differently, I don't know.
>
> 2) The database transaction time can vary wildly. Most of the cases it takes 15 to 25 seconds, in some cases however under 5 seconds. Looking at the specific commit data I was unable to detect any suspicious difference between the slow and the quick commits: It happens for just file modifications and for additions the same.

Perhaps you can use a RAM-based disk for the temporary directory. What does iostat tell you?

Another point (and another "perhaps") to keep in mind: the database configuration is biased towards small or lightly used repositories. A repository that is big, busy, or both may stress the database with the current configuration. Again, iostat and similar tools are your (and our) friends.

There are some pragmas that can be sent to SQLite: a bigger cache_size, journal_mode=memory, automatic_index=off, temp_store=2, a bigger wal_autocheckpoint (fossil default is 1, sqlite default is 1000), etc. Afaik, only foreign_keys=off is used.

> Greetings, Stefan
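To make the pragma suggestion concrete, here is a minimal sketch using Python's standard sqlite3 module. It opens a scratch in-memory database, not a Fossil repository, and the numeric values are placeholders chosen for illustration; nothing in this thread recommends specific numbers. Fossil opens its own SQLite connection, so settings applied from an outside process like this would not carry over to it.

```python
import sqlite3

# Illustrative values only (assumptions, not recommendations from the thread).
PRAGMAS = [
    ("cache_size", 20000),         # number of pages kept in the page cache
    ("temp_store", 2),             # 2 = keep temporary tables and indices in memory
    ("automatic_index", 0),        # 0 = OFF
    ("wal_autocheckpoint", 1000),  # pages between automatic WAL checkpoints
    ("journal_mode", "MEMORY"),    # keep the rollback journal in RAM
]

def apply_pragmas(conn):
    """Apply each pragma, then read it back and return the effective values."""
    effective = {}
    for name, value in PRAGMAS:
        conn.execute(f"PRAGMA {name} = {value}")
        effective[name] = conn.execute(f"PRAGMA {name}").fetchone()[0]
    return effective

conn = sqlite3.connect(":memory:")  # scratch database for demonstration
settings = apply_pragmas(conn)
for name, value in settings.items():
    print(name, "=", value)
conn.close()
```

Reading each pragma back is deliberate: SQLite may silently keep a different value than the one requested. Note also that journal_mode=memory trades safety for speed, since a crash in the middle of a transaction can leave the database corrupt.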
[fossil-users] Fossil scalability
Hi!

Previously I haven't used Fossil for very large repositories. But I like its concept and I am thinking about migrating our 15 years of history in four parallel Subversion repositories into one Fossil repository.

I wrote a script to replay the commits from Subversion (at the moment just trunk) into Fossil and I am wondering whether I will end up with some usable state or not.

In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.

I'm not sure whether the time is spent in the file system trying to find out which files (of the over 1 ones) to commit or in the real database operations of Fossil. However, I cannot imagine that it's the file system, because the same workspace is used daily to do Subversion commits, which work instantaneously.

So, is this some database inefficiency that can be solved via some regular optimize command? Or is this just the point where Fossil is not as scalable as other VCSs?

Just for reference, here's what Fossil's stats page of the current repository state (the conversion is still running) is saying:

  Repository Size:             151875584 bytes (151.9MB)
  Number Of Artifacts:         41671 (stored as 10572 full text and 31099 delta blobs)
  Uncompressed Artifact Size:  69607 bytes average, 31742157 bytes max, 2900554618 bytes (2.9GB) total
  Compression Ratio:           19:1
  Number Of Check-ins:         8150
  Number Of Files:             11667
  Number Of Wiki Pages:        0
  Number Of Tickets:           0
  Duration Of Project:         5649 days or approximately 15.86 years.
  Project ID:                  ad001d59eb3892b9dfad405a2bd5a752a04ef448
  Server ID:                   61c45d1915244bd9ff087951c9f5c6d59819d350
  Fossil Version:              1.24 2012-10-22 12:48:04 [8d758d3715] (gcc-4.3.3 20081214 for GNAT Pro 6.2.2 (20090612))
  SQLite Version:              2012-10-09 01:39:25 [01dc032b5b] (3.7.15)
  Database Stats:              148316 pages, 1024 bytes/page, 251 free pages, UTF-8, delete mode

I'm looking forward to hearing your experience with projects of that size.

Greetings, Stefan

-- 
Stefan Bellon
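As a quick sanity check on the stats page, the figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers copied from the report above; the variable names are mine.

```python
# Figures copied from the stats page quoted above.
pages = 148316
page_size = 1024
repo_size = 151875584
uncompressed_total = 2900554618
full_text, deltas, artifacts = 10572, 31099, 41671

db_bytes = pages * page_size              # size implied by the page count
ratio = uncompressed_total / repo_size    # uncompressed bytes per stored byte

print(db_bytes == repo_size)              # page count accounts for the whole file
print(round(ratio, 1))                    # about 19.1, reported as 19:1
print(full_text + deltas == artifacts)    # full-text and delta blobs add up
```

The page count times the page size matches the reported repository size exactly, and the full-text and delta counts sum to the artifact total, so the stats are internally consistent.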
Re: [fossil-users] Fossil scalability
On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:

> In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.

There is one issue with commits on large checkouts unrelated to repo-cksum and mtime-changes. The SQLite database in the checkout is essentially rewritten after each commit, which can be very slow with many files.

Joerg
Re: [fossil-users] Fossil scalability
On Fri, 21 Dec, Joerg Sonnenberger wrote:

> On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
>
>> In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.
>
> There is one issue with commits on large checkouts unrelated to repo-cksum and mtime-changes. The SQLite database in the checkout is essentially rewritten after each commit, which can be very slow with many files.

I have set mtime-changes on and repo-cksum off, like Lluís suggested, but only noticed a very minor speed improvement.

But now - with those settings still in place - I decided to take a closer look at what happens at commit time. Especially I paid attention to the point when the journal file appears, the point when the New_Version hash is output and the point when the journal file disappears/fossil terminates.

I made the following observation: In the vast majority of commits, it takes around 15 to 18 seconds from fossil commit till the journal file appears. Then it takes between 4 and 10 seconds till the New_Version message is output and another 11 to 16 seconds till the journal file disappears and fossil terminates.

However, there are a very few commits that work differently. They take the same time till the journal file appears, but then only a very few seconds till New_Version is displayed and again a very short time till fossil terminates.

So I would assume that the time till the journal file appears is used for checksum/mtime calculation and file system performance. But the phases during journal lifetime seem to be the database transaction time.

Two observations:

1) It looks like fossil always takes its 15 to 20 seconds (for this specific project at this specific state on this specific machine!) till the commit actually begins. Perhaps this can be improved (as Subversion is faster regarding this respect in the same scenario), but perhaps not because checking for modifications has to work differently, I don't know.

2) The database transaction time can vary wildly. Most of the cases it takes 15 to 25 seconds, in some cases however under 5 seconds. Looking at the specific commit data I was unable to detect any suspicious difference between the slow and the quick commits: It happens for just file modifications and for additions the same.

Greetings, Stefan

-- 
Stefan Bellon
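The phase timings above were taken by watching for the journal file by hand. A rough way to automate that observation is sketched below; it is not something used in this thread. The helper name watch_file and its parameters are hypothetical. SQLite names a rollback journal after its database file with a "-journal" suffix, but which database's journal was being watched here is not stated, so the path to pass in is an assumption the reader has to supply.

```python
import os
import time

def watch_file(path, poll=0.05, timeout=60.0):
    """Poll for a file and report when it appears and when it vanishes again.

    Returns (appeared, disappeared) in seconds since the call started.
    Either value is None if that event was not seen before the timeout.
    """
    start = time.monotonic()
    appeared = disappeared = None
    while time.monotonic() - start < timeout:
        exists = os.path.exists(path)
        now = time.monotonic() - start
        if appeared is None and exists:
            appeared = now        # journal created: transaction has begun
        elif appeared is not None and not exists:
            disappeared = now     # journal removed: transaction has finished
            break
        time.sleep(poll)
    return appeared, disappeared
```

One way to use it would be to start watch_file in a background thread just before launching the commit and compare the two timestamps with the moment the New_Version line is printed. Polling only gives a resolution of roughly the poll interval, which is adequate for phases that last several seconds.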
Re: [fossil-users] Fossil scalability
On Fri, Dec 21, 2012 at 8:25 AM, Joerg Sonnenberger jo...@britannica.bec.de wrote:

> On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
>
>> In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.
>
> There is one issue with commits on large checkouts unrelated to repo-cksum and mtime-changes. The SQLite database in the checkout is essentially rewritten after each commit, which can be very slow with many files.

Joerg means the _FOSSIL_ or .fslckout database that records the current state of the check-out. That database contains a record for each file in the checkout, and that record gets updated for every file on a commit. So if you have 10 files, that means 10 UPDATEs. But those should all happen within a single transaction, are very small records, and so the total update time should be under one second.

I take it you are seeing something different? Can you give me additional information about what you are seeing take place on the _FOSSIL_ database so that I can try to track down the problem?

> Joerg

-- 
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Fossil scalability
On Fri, Dec 21, 2012 at 09:33:26AM -0500, Richard Hipp wrote:

> On Fri, Dec 21, 2012 at 8:25 AM, Joerg Sonnenberger jo...@britannica.bec.de wrote:
>
>> On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
>>
>>> In total, the Subversion repositories hold over 45000 revisions. The first 5000 revisions were converted in a quite acceptable time. But then things started to slow down. At the moment (at revision 8150) one Fossil commit takes around half a minute.
>>
>> There is one issue with commits on large checkouts unrelated to repo-cksum and mtime-changes. The SQLite database in the checkout is essentially rewritten after each commit, which can be very slow with many files.
>
> Joerg means the _FOSSIL_ or .fslckout database that records the current state of the check-out. That database contains a record for each file in the checkout, and that record gets updated for every file on a commit.

Correct.

> So if you have 10 files, that means 10 UPDATEs. But those should all happen within a single transaction, are very small records, and so the total update time should be under one second.
>
> I take it you are seeing something different? Can you give me additional information about what you are seeing take place on the _FOSSIL_ database so that I can try to track down the problem?

Last time I looked at the details, it wasn't using UPDATEs, but removed all entries at the beginning and inserted them again. That creates a huge number of disk seeks (even within the transaction), as it ended up rewriting something like a 40MB table. It would definitely help a lot if only the *changed* entries were UPDATEd.

Joerg
Re: [fossil-users] Fossil scalability
Some time ago I did experiments with large numbers of commits and large amounts of data, and I thought the fossil performance was quite acceptable. I did see things slow down, but I don't recall it being as dramatic as what you are describing.

How does your replay script work? Are you overlapping the subversion repo with the fossil one and doing an svn update so that fossil only sees the files touched that actually changed? If all files get touched in the repo, that may slow things down unnecessarily.

My method for doing what you are doing was to use rsync with appropriate ignores in a flow something like this:

In svn repo:

  svn update
  rsync -av --delete [some excludes here] ./ /path/to/fossil/repo/

then in fossil repo:

  fsl addremove
  fsl commit

This flow ensured that only actual changes were seen by fossil, i.e. unchanged files remain untouched.

My results:

Note: This scenario is very unrealistic in that every file has every line changed hundreds of times, which creates thousands of tiny deltas that are time-consuming to transfer.

  Repository size                        704 MB
  Number of artifacts (objects stored)   234078
  Number of check-ins                    140119
  Number of files                        239
  Data size (files in check out area)    178 MB

  Step                                   Time (s)   Comment
  Clone                                  94.609     Higher than would like, happens only once and can always fall back to copy + sync (8 secs)
  Copy + sync                             7.690     Equivalent to clone
  Sync (no changes)                       1.630
  Update (no change, no sync)             0.004     Autosync turned off
  Open                                    7.060
  Checkin (move one line in one file)     5.440     Autosync turned off
  Sync (one change)                       2.500
  Revert (all files touched)              6.360
Re: [fossil-users] Fossil scalability
On Fri, 21 Dec, Matt Welland wrote:

> How does your replay script work? Are you overlapping the subversion repo with the fossil one and doing an svn update so that fossil only sees the files touched that actually changed?

That's what I'm trying at the moment in order to see whether that helps. But this only works with one repository and one working copy. For my real task I want to merge (on directory-level) four repositories. I'm not sure whether your rsync idea works in this case.

I went for the naive but safe way to do it like the following: Each Subversion repository has its own workspace. I update the one that needs to be updated according to chronological timeline. Then I create a new temporary directory and copy the content of the four workspaces there. This is the workspace as I want it to be stored in Fossil. Then I do the following:

  cd temporary_dir
  fossil open --keep ../master.fossil
  fossil user new $AUTHOR
  fossil addremove
  fossil commit --date-override $COMMITDATE --user $AUTHOR -m $MSG

Then I remove the temporary directory and continue to update the Subversion workspace with the next chronological commit in the timeline, copying the files together, ... and so on.

I used Python's shutil.copytree which is documented as cp -p and therefore should have preserved mode, ownership and timestamp. But I think I should give it another try with explicit cp -a.

I don't think I can use rsync because of the merge I want to achieve. Or is rsync even able to merge multiple directories on one side into one on the other?

But perhaps it's not that I'm touching the files, but that I'm committing from an empty workspace as I open the checkout each time with fossil open --keep and then doing fossil addremove.

> If all files get touched in the repo that may slow things down unnecessarily. My method for doing what you are doing was to use rsync with appropriate ignores in a flow something like this:
>
> In svn repo:
>
>   svn update
>   rsync -av --delete [some excludes here] ./ /path/to/fossil/repo/
>
> then in fossil repo:
>
>   fsl addremove
>   fsl commit

If I wanted to convert just one repository, then I could just use the same Subversion workspace as Fossil workspace and just do (meta-code):

  svn co .../trunk workdir
  fossil init repo.fossil
  cd workdir
  # parse svn log --xml
  fossil open ../repo.fossil
  for REV in range($REVISIONS):
      svn up -r$REV
      fossil addremove
      fossil commit ...

Or even easier, if I just wanted to convert one repository, then I could take the git-conversion route which gives me all the branches and tags as well and not just trunk (if I understand it correctly).

In order to verify whether the problem really is the timestamp, I'm now trying to convert just one of the four repositories using the method of sharing the workspace. Let's see what happens after 5000 or 9000 revisions.

Greetings, Stefan

-- 
Stefan Bellon
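For the four-workspace case, one possible shape of the copy step is sketched below in Python, as a stand-in for the rsync invocation rather than a replacement for it. The function name merge_into and the KEEP/SKIP sets are my own; the sketch assumes the source workspaces do not contain the same relative path twice (if they do, the later source wins), and it compares file contents instead of size and timestamp as rsync does by default. The target directory is kept between revisions, so the checkout database survives and unchanged files are never rewritten.

```python
import filecmp
import os
import shutil

KEEP = {".fslckout", "_FOSSIL_"}  # checkout databases that must survive
SKIP = {".svn"}                   # Subversion metadata directories, never copied

def merge_into(sources, target):
    """Make target the union of sources, rewriting only files that differ.

    Returns (written, removed) as sorted lists of paths relative to target.
    """
    wanted = set()
    written = []
    for src in sources:
        for root, dirs, files in os.walk(src):
            dirs[:] = [d for d in dirs if d not in SKIP]  # prune in place
            rel_root = os.path.relpath(root, src)
            for name in files:
                rel = os.path.normpath(os.path.join(rel_root, name))
                wanted.add(rel)
                src_file = os.path.join(root, name)
                dst = os.path.join(target, rel)
                if os.path.exists(dst) and filecmp.cmp(src_file, dst, shallow=False):
                    continue  # identical content: leave the file and its mtime alone
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src_file, dst)  # copy2 also copies the timestamps
                written.append(rel)
    removed = []
    for root, dirs, files in os.walk(target):
        for name in files:
            rel = os.path.normpath(
                os.path.join(os.path.relpath(root, target), name))
            if rel not in wanted and rel not in KEEP:
                os.remove(os.path.join(root, name))  # file vanished from all sources
                removed.append(rel)
    return sorted(written), sorted(removed)
```

After each call, fossil addremove and fossil commit would run in the target directory as in the flow above. Empty directories left behind in the target are not cleaned up, which should not matter to Fossil since it tracks files, not directories.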