Re: [fossil-users] Scalability (WAS: something else)
On Tue, Sep 2, 2014 at 11:26 PM, sky5w...@gmail.com wrote:

>> (2) Fossil's purpose is to be able to recreate historical versions of the project - exactly. It cannot do that if historical images have been deleted.
>
> I understand the purity intended, but continue to be frustrated by it. :) I merely seek an automated way within Fossil to manage garbage. Re-repoing to delete spam or 'add *.*' mistakes is quite painful.

Fossil's design is not only to make it painful, but impossible, to remove old data. Mistakes are painful - that's why we learn from them. To work against a given software's internal assumptions/laws of physics generally leads to pain and suffering.

From what i understand, git will quite happily let you remove whatever you want (and afterwards you might even be able to still access the other data, too).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf

___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
On Tue, Sep 2, 2014 at 10:27 AM, Stephan Beal sgb...@googlemail.com wrote: but by very large source control i envision something akin to git's octopus model, reaching fractally out into the universe Fossil uses the octopus model as well. I just don't know of any projects using Fossil that have more than 2 layers of developers. By 2 layers, I mean devs with push access to the main repository and devs who submit patches. As compared to, say, the Linux Kernel project, where several rings of repositories (and their respective maintainers) marshal changes inward. (That said, someone on this list was talking about a project that potentially could involve project devs pushing changes to a Fossil server in their respective offices, which in turn would push the changes to the main office. However, it looks like that person may have decided to pass over Fossil due to real time issues.) ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
On 9/2/2014 08:27, Stephan Beal wrote: On Tue, Sep 2, 2014 at 4:18 PM, sky5w...@gmail.com mailto:sky5w...@gmail.com wrote: Will Fossil ever seek to address very large source control? Fossil's main target is sqlite (it's a cyclic relationship), and in my humble (but quite fallible) opinion that project won't ever need very large source control. While I don't think moving to PostgreSQL is the right answer, there are some things about Fossil where it's making SQLite-based assumptions that break down in projects with large repos. SQLite is a relatively tight code base with little in the way of media files checked into the repo. The SQLite repo is currently 52 megs. The Fossil repo for the web app I work on is 284 megs, probably in very large part because I check in two copies of most every graphic used in the web app: a high-res multi-layered working copy, and a downsampled flattened and optimized PNG version for display. Every time one of these files changes, the whole file has to be copied back into the repo, because you can't diff a PSD or PNG file. (Well, *Fossil* can't.) Fossil currently wants to do a cryptographically strong checksum on every version of every graphic file I've ever created on every checkin. Consequently, a checkin takes several seconds here. There was a recent proposal that you should be able to turn that feature of Fossil off, which would help a lot in such cases. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
On Tue, Sep 02, 2014 at 02:45:13PM -0600, Warren Young wrote: Fossil currently wants to do a cryptographically strong checksum on every version of every graphic file I've ever created on every checkin. Consequently, a checkin takes several seconds here. There was a recent proposal that you should be able to turn that feature of Fossil off, which would help a lot in such cases. Huh? You can. It has been available for ages. Joerg ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
On 9/2/2014 14:47, Joerg Sonnenberger wrote: On Tue, Sep 02, 2014 at 02:45:13PM -0600, Warren Young wrote: Fossil currently wants to do a cryptographically strong checksum on every version of every graphic file I've ever created on every checkin. Consequently, a checkin takes several seconds here. There was a recent proposal that you should be able to turn that feature of Fossil off, which would help a lot in such cases. Huh? You can. It has been available for ages. This is the message I was thinking of: http://goo.gl/Q0wtIr I see that repo-cksum already exists, so I'm not sure what drh was talking about that's different. Maybe just a threshold, beyond which repo-cksum is always disabled? ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
While disabling checksums helps with speed (http://www.fossil-scm.org/index.html/help?cmd=settings), it does not help with redundant binary images in the repo. For that, you have to shun and rebuild. If you could flag a file as "Keep latest only", that would be less painful. I don't mind the artifact overhead of a changed binary file, but it hurts to have the data stored too.

On Tue, Sep 2, 2014 at 4:47 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote:

> On Tue, Sep 02, 2014 at 02:45:13PM -0600, Warren Young wrote:
>> Fossil currently wants to do a cryptographically strong checksum on every version of every graphic file I've ever created on every checkin. Consequently, a checkin takes several seconds here. There was a recent proposal that you should be able to turn that feature of Fossil off, which would help a lot in such cases.
>
> Huh? You can. It has been available for ages.
>
> Joerg
Re: [fossil-users] Scalability (WAS: something else)
On Tue, Sep 2, 2014 at 5:04 PM, Warren Young war...@etr-usa.com wrote:

> On 9/2/2014 14:47, Joerg Sonnenberger wrote:
>> On Tue, Sep 02, 2014 at 02:45:13PM -0600, Warren Young wrote:
>>> Fossil currently wants to do a cryptographically strong checksum on every version of every graphic file I've ever created on every checkin. Consequently, a checkin takes several seconds here. There was a recent proposal that you should be able to turn that feature of Fossil off, which would help a lot in such cases.
>>
>> Huh? You can. It has been available for ages.
>
> This is the message I was thinking of: http://goo.gl/Q0wtIr
>
> I see that repo-cksum already exists, so I'm not sure what drh was talking about that's different. Maybe just a threshold, beyond which repo-cksum is always disabled?

When I wrote that message and said it might be possible to implement, I had forgotten that I had already implemented it for Joerg, ages ago.

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Scalability (WAS: something else)
On Tue, Sep 2, 2014 at 5:07 PM, sky5w...@gmail.com wrote:

> While disabling checksums helps with speed (http://www.fossil-scm.org/index.html/help?cmd=settings), it does not help with redundant binary images in the repo. For that, you have to shun and rebuild. If you could flag a file as "Keep latest only", that would be less painful. I don't mind the artifact overhead of a changed binary file, but it hurts to have the data stored too.

(1) Fossil *does* store binary files as diffs from their predecessor, if they are sufficiently similar (that is, if the diff is smaller than the file itself). The problem is that with compressed images, changing a single pixel can potentially change most bytes of the file, making a diff pointless.

(2) Fossil's purpose is to be able to recreate historical versions of the project - exactly. It cannot do that if historical images have been deleted.

--
D. Richard Hipp
d...@sqlite.org
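The storage rule in point (1) can be sketched in Python. This is a toy illustration only: `naive_delta` and `storage_choice` are hypothetical names, and trimming the common prefix and suffix merely stands in for Fossil's real delta encoder, which is not reproduced here.

```python
def naive_delta(old: bytes, new: bytes) -> bytes:
    """Return the part of `new` that lies between the prefix and suffix
    it shares with `old`. A stand-in for a real delta encoder."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the prefix that was already matched.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return new[prefix:len(new) - suffix]


def storage_choice(old: bytes, new: bytes) -> str:
    """Keep the delta only when it is smaller than the full content."""
    delta = naive_delta(old, new)
    return "delta" if len(delta) < len(new) else "full"
```

For an uncompressed file with one changed byte the delta is tiny, so the delta wins. When nearly every byte differs, as can happen with a recompressed image, the delta is as large as the file and the full copy is stored.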
Re: [fossil-users] Scalability (WAS: something else)
On 9/2/2014 15:07, sky5w...@gmail.com wrote:

> If you could flag a file as "Keep latest only", that would be less painful.

That wouldn't work for me. I want the past versions of the image. [*] The branch I made of the web app three years ago won't run right with the current bitmaps. The new ones may be different sizes, have a different design esthetic, etc.

With repo-cksum on, Fossil has an O(N) complexity component. Without it, you only have the logarithmic time complexities due to the tree structures of the DB.

[*] Well, I suppose I could go through and weed out a few bad ideas, but that goes against Fossil's nature.
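A rough sketch of where the O(N) component comes from: a checksum over a whole checkout has to read every byte of every file, so its cost grows with the total size of the tree no matter how small the change is. The function name, the in-memory `files` mapping, and the use of MD5 via `hashlib` are assumptions made for illustration, not a description of Fossil's internals.

```python
import hashlib


def checkout_checksum(files):
    """Hash every file in a checkout.

    `files` maps a file name to its bytes (a hypothetical in-memory stand-in
    for a working tree). Returns (hex digest, number of content bytes hashed).
    """
    digest = hashlib.md5()
    total = 0
    for name in sorted(files):          # fixed order keeps the digest stable
        data = files[name]
        digest.update(name.encode("utf-8"))
        digest.update(data)
        total += len(data)              # work done is proportional to this
    return digest.hexdigest(), total
```

Committing a one-line change to a tree with hundreds of megabytes of images still hashes all of those megabytes under this scheme, which matches the several-second checkins described earlier in the thread.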
Re: [fossil-users] Scalability (WAS: something else)
On 9/2/2014 15:11, Richard Hipp wrote: (1) Fossil *does* store binary files as diffs from their predecessor, if they are sufficiently similar (that is, if the diff is smaller than the file itself). the problem is that with compressed images, changing a single pixel can potentially change most bytes of the file, making a diff pointless. You're right, it probably *is* storing diffs of my PSD files, but not the PNGs. Just as well: it's the PSDs that are the real pigs. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability (WAS: something else)
> (2) Fossil's purpose is to be able to recreate historical versions of the project - exactly. It cannot do that if historical images have been deleted.

I understand the purity intended, but continue to be frustrated by it. :) I merely seek an automated way within Fossil to manage garbage. Re-repoing to delete spam or 'add *.*' mistakes is quite painful.

On Tue, Sep 2, 2014 at 5:19 PM, Warren Young war...@etr-usa.com wrote:

> On 9/2/2014 15:07, sky5w...@gmail.com wrote:
>> If you could flag a file as "Keep latest only", that would be less painful.
>
> That wouldn't work for me. I want the past versions of the image. [*] The branch I made of the web app three years ago won't run right with the current bitmaps. The new ones may be different sizes, have a different design esthetic, etc.
>
> With repo-cksum on, Fossil has an O(N) complexity component. Without it, you only have the logarithmic time complexities due to the tree structures of the DB.
>
> [*] Well, I suppose I could go through and weed out a few bad ideas, but that goes against Fossil's nature.
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 4:33 PM, Rich Neswold rich.nesw...@gmail.com wrote:

> I don't have any question; I just thought I'd document my experiences.

Thanks for your feedback! IMO (possibly a minority opinion), Fossil has never aspired to host repos quite as large as those. i remember the pkgsrc repo being mentioned before (but thought it was bigger than 2.7GB), and IIRC the delta manifest format was introduced to help support huge repos like that one and the core TCL repo. Fossil's original purpose was to host sqlite, and it works wonders for projects at that scale.

i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:

  [stephan@host:~/cvs/fossil/fossil]$ f dbstat
  repository-size:   53739520 bytes (53.7MB)
  artifact-count:    24813 (stored as 5784 full text and 19029 delta blobs)
  artifact-sizes:    67440 average, 5153124 max, 1673191067 bytes (1.7GB) total
  compression-ratio: 31:1
  checkins:          6615
  files:             821 across all branches
  wikipages:         26 (294 changes)
  tickets:           1056 (3355 changes)
  events:            5
  tagchanges:        737
  project-age:       2394 days or approximately 6.55 years.
  project-id:        CE59BB9F186226D80E49D1FA2DB29F935CCA0333
  fossil-version:    2014-02-07 08:58:55 [90bd20308b] [1.28] (gcc-4.8.1)
  sqlite-version:    2014-01-27 15:02:07 [be1acb610f] (3.8.3)
  database-stats:    52480 pages, 1024 bytes/pg, 109 free pages, UTF-8, delete mode

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 11:17 AM, Stephan Beal sgb...@googlemail.com wrote: On Fri, Feb 7, 2014 at 4:33 PM, Rich Neswold rich.nesw...@gmail.comwrote: I don't have any question; I just thought I'd document my experiences. Thanks for your feedback! IMO (possibly a minority opinion), Fossil has never aspired to host repos quite as large as those. i remember the pkgsrc repo being mentioned before (but thought it was bigger than 2.7GB), and IIRC the delta manifest format was introduced to help support huge repos like that one and the core TCL repo. Fossil's original purpose was to host sqlite, and it works wonders for projects at that scale. I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, 7 Feb 2014 18:40:32 +0100 Stephan Beal sgb...@googlemail.com wrote: It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though. What about Monotone? Linus was looking at it, but it was too slow at that time. Sincerely, Gour -- Everyone is forced to act helplessly according to the qualities he has acquired from the modes of material nature; therefore no one can refrain from doing something, not even for a moment. http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810 ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, Feb 07, 2014 at 07:39:37PM +0100, Gour wrote: On Fri, 7 Feb 2014 18:40:32 +0100 Stephan Beal sgb...@googlemail.com wrote: It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though. What about Monotone? Linus was looking at it, but it was too slow at that time. It was a bug of monotone, that slowness. Fixed, for what I remember. But monotone works on sqlite, if the deal is sqlite. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, 7 Feb 2014 20:32:56 +0100 Lluís Batlle i Rossell vi...@viric.name wrote: It was a bug of monotone, that slowness. Fixed, for what I remember. Yeah, too bad. Otherwise we wouldn't see git. :-) But monotone works on sqlite, if the deal is sqlite. Right, but I see Monotone's influence in Fossil. Sincerely, Gour -- He who is satisfied with gain which comes of its own accord, who is free from duality and does not envy, who is steady in both success and failure, is never entangled, although performing actions. http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810 ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, Feb 07, 2014 at 05:17:23PM +0100, Stephan Beal wrote:

> i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:

Attached for pkgsrc and src.

Joerg

  repository-size:   2852068352 bytes (2.9GB)
  artifact-count:    1096185 (stored as 190518 full text and 905667 delta blobs)
  artifact-sizes:    23053 average, 5763035 max, 25270457712 bytes (25.3GB) total
  compression-ratio: 8:1
  checkins:          384960
  files:             129343 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        83
  project-age:       6016 days or approximately 16.47 years.
  project-id:        a93518a42fa8e06695943fd79049ad4fcf8b9d00
  fossil-version:    2013-02-16 00:04:35 [d2e07756d9] [1.25] (gcc-4.5.3)
  sqlite-version:    2013-02-13 14:04:28 [7e10a62d0e] (3.7.16)
  database-stats:    2785223 pages, 1024 bytes/pg, 11 free pages, UTF-8, wal mode

  repository-size:   2380333056 bytes (2.4GB)
  artifact-count:    1751692 (stored as 246938 full text and 1504754 delta blobs)
  artifact-sizes:    24080 average, 17336826 max, 42181390896 bytes (42.2GB) total
  compression-ratio: 17:1
  checkins:          278062
  files:             284615 across all branches
  wikipages:         0 (0 changes)
  tickets:           0 (0 changes)
  events:            0
  tagchanges:        0
  project-age:       7880 days or approximately 21.57 years.
  project-id:        f147779665278afdf4d91757d941046def2b6e5a
  fossil-version:    2013-02-16 00:04:35 [d2e07756d9] [1.25] (gcc-4.5.3)
  sqlite-version:    2013-02-13 14:04:28 [7e10a62d0e] (3.7.16)
  database-stats:    36321 pages, 65536 bytes/pg, 0 free pages, UTF-8, wal mode
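The figures in these reports can be cross-checked with a little arithmetic. In the sketch below the numbers are copied from the dbstat outputs in this thread (the Fossil core repo from the earlier message, then the two reports above, presumably pkgsrc followed by src). The reading that the reported ratio is the total artifact size divided by the repository size, rounded down, is an inference from these three reports and not something the thread states; `check_dbstat` is a hypothetical helper.

```python
def check_dbstat(pages, page_size, repo_bytes, artifact_bytes):
    """Return (pages * page_size matches repo size?, floor of size ratio)."""
    size_matches = pages * page_size == repo_bytes
    ratio = artifact_bytes // repo_bytes   # integer division, rounds down
    return size_matches, ratio


# (pages, bytes per page, repository-size, total artifact bytes)
reports = {
    "fossil core": (52480, 1024, 53739520, 1673191067),
    "first report above": (2785223, 1024, 2852068352, 25270457712),
    "second report above": (36321, 65536, 2380333056, 42181390896),
}

results = {name: check_dbstat(*values) for name, values in reports.items()}
for name, (size_matches, ratio) in results.items():
    print(f"{name}: size matches = {size_matches}, ratio = {ratio}:1")
```

In all three cases the page count times the page size equals the repository size exactly, and the floored ratios come out as 31, 8 and 17, the same values dbstat printed.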
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 6:15 PM, Ron Wilson ronw.m...@gmail.com wrote: I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries. When starting on libfossil i actually looked into that and decided against it primarily because so much of the heavy lifting (and a lot of the lighter work) in fossil is done by sqlite, and it would be a tremendous effort to port that SQL logic either to C code or another SQL dialect. The fossil core model supports arbitrary storage (not necessarily a db), but having sql-based storage greatly simplifies many parts of the functionality and fossil as an application (or library) is very tightly married to sqlite. Then of course: the primary author of sqlite is the one writing most of the SQL in fossil, which means that the SQL is very fine indeed :). It would be possible to do on top of another db, but i don't think anyone's going to volunteer to do it any time soon! It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though. It would likely require a complete re-implementation, not just rewriting most of the SQL. libfossil (as opposed to fossil) goes out of its way to abstract the sqlite3 API out of the client's view, and could reasonably be ported to work with another db with relatively little work, but the queries themselves are often very sqlite-specific. That's where most of the work would be. Anyway... -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. 
-- Bigby Wolf
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 10:17 AM, Stephan Beal sgb...@googlemail.com wrote: i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo: I have another attempt in progress. This time, I'm running it on a quad-core system with 12GB RAM. It's been close to 48 hours and it reports being only 81.5% completed. The fossil file is 10GB, the -shm file is 2.3GB and the -wal file is 30.2GB. When it's done, I'll report the dbstats. Thanks, -- Rich ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 9:11 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote:

> On Fri, Feb 07, 2014 at 05:17:23PM +0100, Stephan Beal wrote:
>> i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:
>
> Attached for pkgsrc and src.

Holy cow, that's a lot of checkins. Does 21.5 years make src the oldest-history fossil repo?

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf
Re: [fossil-users] Scalability limits
On 2/8/2014 5:19 AM, Stephan Beal wrote: It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though. It would likely require a complete re-implementation, not just rewriting most of the SQL. Wasn't Veracity (http://veracity-scm.com/) inspired by most of the concepts in Fossil? They also use Fossil as their DB back-end, and IIRC, they were planning to make/sell add-ons that allow using other SQL DB's like PostgreSQL, etc. as the repo back-end. Too bad it's on hold at the moment, though. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 11:40 AM, Stephan Beal sgb...@googlemail.com wrote: On Fri, Feb 7, 2014 at 6:15 PM, Ron Wilson ronw.m...@gmail.com wrote: I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries. When starting on libfossil i actually looked into that and decided against it primarily because so much of the heavy lifting (and a lot of the lighter work) in fossil is done by sqlite, and it would be a tremendous effort to port that SQL logic either to C code or another SQL dialect. The fossil core [...] One of the nice things about all the features that SQLite3 has been growing lately (CTEs, recursive queries, before that recursive triggers, foreign keys, ...) is that the more business logic that can be expressed declaratively and therefore pushed into SQL, the less complexity one has to have in C, Python, ... That makes it much easier to cope with future schema changes, or dataset changes that require re-planning queries (the RDBMS can do it!). And all those new features are great for expressing complex business logic (particularly CTEs). Sticking to a portable subset of SQL, on the other hand, makes it easier to scale up and down the device stack, dataset sizes, and across the network. Which makes improvements in the lowest common denominator very welcome! Now, if only PostgreSQL (and others) had a duck-type option to match (roughly) SQLite3's duck typing... That would bring the lowest common denominator up to a very useful level. Nico -- ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
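As a small illustration of pushing logic into SQL with a recursive CTE, the sketch below walks a parent/child graph entirely inside the query, using Python's bundled `sqlite3` module. The `link(parent, child)` table and the `ancestors` helper are hypothetical and deliberately simplified; they are not Fossil's schema.

```python
import sqlite3


def ancestors(db, checkin):
    """Return all ancestors of `checkin`, found by a recursive CTE."""
    rows = db.execute(
        """
        WITH RECURSIVE ancestor(id) AS (
            SELECT parent FROM link WHERE child = ?
            UNION
            SELECT link.parent
              FROM link JOIN ancestor ON link.child = ancestor.id
        )
        SELECT id FROM ancestor ORDER BY id
        """,
        (checkin,),
    ).fetchall()
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE link(parent INTEGER, child INTEGER)")
# A short line of history 1 -> 2 -> 3 -> 4 with a branch 2 -> 5.
db.executemany("INSERT INTO link VALUES (?, ?)",
               [(1, 2), (2, 3), (3, 4), (2, 5)])
```

The graph traversal that would otherwise be a loop in C or Python is expressed declaratively, so the host code only issues one query and reads the result.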
Re: [fossil-users] Scalability
On Sat, 17 Mar 2012 00:44:24 +0100 Jan Danielsson jan.m.daniels...@gmail.com wrote:

> At first I thought it was a problem with the server being overloaded, but when I finally got all of it downloaded to one of my servers, I tried to pull it to other systems from there, but I'm running into the 504 Gateway Time-out problem here as well (and this particular server has lots of time to spare).

Although not helpful to resolve the issue, I'm just curious which server is concerned? Recently I gave up on Cherokee (and embraced lighty) due to 504 errors (serving PHP), but probably it's not connected with your problem.

Sincerely,
Gour

--
In this endeavor there is no loss or diminution, and a little advancement on this path can protect one from the most dangerous type of fear.
http://atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810
Re: [fossil-users] Scalability
On Sat, Mar 17, 2012 at 12:44:24AM +0100, Jan Danielsson wrote:

> Joerg's NetBSD repository suddenly grew from ~2.5G to over 6GB. As it has grown, I've been having increasing problems pulling the latest changes. I started getting the "database is locked" errors and (more often) "fossil: server says: 504 Gateway Time-out". These problems occurred in a very non-deterministic manner. Sometimes immediately, other times after a few hours. It got to the point where I had to
>
>   while [ 1 ]; do fossil pull ; sleep 240 ; done
>
> to get it to the most recent check-in.

Guessing, I imagine some servers may have some default configuration regarding timeouts in cgi operations.
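A bounded variant of that retry loop can be sketched in Python. Unlike the shell loop quoted above, which runs until interrupted, this version stops at the first success or after a fixed number of attempts. `retry` is a hypothetical helper; the 240-second default simply mirrors the `sleep 240` in the quoted loop.

```python
import time


def retry(action, attempts=10, delay=240.0, sleep=time.sleep):
    """Call `action` until it returns True or `attempts` runs out.

    Returns the number of the attempt that succeeded, or None if none did.
    `sleep` is injectable so the function can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)                # wait before the next try
    return None


# Possible use (assumes `fossil pull` exits with status 0 on success):
#   import subprocess
#   retry(lambda: subprocess.run(["fossil", "pull"]).returncode == 0)
```

Capping the attempts keeps a persistently failing server from being polled forever, at the cost of having to rerun the script by hand if the cap is reached.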
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On 9/29/2011 2:12 PM, Joerg Sonnenberger wrote:

> What Operating System is that on? There might be a limit to the number of filesystem objects that can be cached and your tree just large enough to not fit into it. Another thing to try is forcing the _FOSSIL_ file into cache (e.g. cat _FOSSIL_ > /dev/null on Unix).

I tried this on my Ubuntu box and 'status' took only about two seconds to complete -- however, this was also after a commit or two. In any case, it was faster than doing a 'status' in Windows, even with Windows having the faster machine and hard drive.

-Jeff
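The cache-warming suggestion amounts to reading the file once and throwing the data away, so that later reads are more likely to be served from the operating system's page cache. A portable sketch in Python follows; `warm_cache` is a hypothetical helper, and whether the data actually stays cached depends on the OS and on available memory.

```python
def warm_cache(path, chunk_size=1 << 20):
    """Read `path` sequentially and discard the data.

    Returns the number of bytes read. The only purpose is the side effect
    of pulling the file through the OS file cache.
    """
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Reading in fixed-size chunks keeps memory use flat even for a multi-gigabyte checkout database.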
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Fri, Sep 30, 2011 at 10:25:28AM -0400, Jeff Slutter wrote:

> On 9/29/2011 2:12 PM, Joerg Sonnenberger wrote:
>> What Operating System is that on? There might be a limit to the number of filesystem objects that can be cached and your tree just large enough to not fit into it. Another thing to try is forcing the _FOSSIL_ file into cache (e.g. cat _FOSSIL_ > /dev/null on Unix).
>
> I tried this on my Ubuntu box and 'status' took only about two seconds to complete -- however, this was also after a commit or two. In any case, it was faster than doing a 'status' in Windows, even with Windows having the faster machine and hard drive.

NTFS has some issues dealing with many small files. That's one of the things killing SVN on Windows, so I am not extremely surprised.

Joerg
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
I downloaded the Windows 1.18 version (supposedly built with mingw) from the website and tested it, getting the same results as my previous post's timings ('delete' mode and 'wal' mode).

Using Sysinternals' Process Monitor it was clear that fossil was reading all the way through the repository several times during the process of commit (it was just streaming ReadFile calls at 1k increments, it would get towards the end and loop back around, occasionally go back and forth between two areas for a bit, then go back to reading through it all - repeat).

Also, I just want to clear up a minor point on my previous post: fossil.exe uses much more RAM than I mentioned, but still less than hg.

I also fired up my Ubuntu 11.04 64-bit laptop and downloaded 1.19 from the website. I ran the same tests and got nearly identical results for timings. Actually, slightly worse, because the laptop's hard drive is a 5400 RPM drive, I believe.

-Jeff
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, Sep 29, 2011 at 01:45:29PM +0800, mjbmik...@gmail.com wrote:

> It is the Windows version. I'm currently in the process of committing a new 0 byte file to an existing 2GB repo and Windows task manager says that the fossil process has read 3GB of data since I issued the commit command several minutes ago, and it's still running. First it read the entire physical directory structure, then the entire repo file. It then prompted for a commit comment, then started to read the entire repo again.
>
> 5GB so far. Looks like it has started to read the entire repo again.
>
> 8GB and counting

Maybe your filesystem or fossil/sqlite don't know about fseek? :) Do you see any fseek-kind-of syscall?

As for the directory structure, it depends on whether you have the mtime-changes setting enabled or not, too.
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
Just a thought - is there some virus-scanning software involved, that feels a need to scan every file opened?

-----Original Message-----
From: Lluís Batlle i Rossell virik...@gmail.com
Sender: fossil-users-boun...@lists.fossil-scm.org
Date: Thu, 29 Sep 2011 09:55:45
To: fossil-users@lists.fossil-scm.org
Reply-To: fossil-users@lists.fossil-scm.org
Subject: Re: [fossil-users] Scalability, a single file commit and lots of disk reads

On Thu, Sep 29, 2011 at 01:45:29PM +0800, mjbmik...@gmail.com wrote:

> It is the Windows version. I'm currently in the process of committing a new 0 byte file to an existing 2GB repo and Windows task manager says that the fossil process has read 3GB of data since I issued the commit command several minutes ago, and it's still running. First it read the entire physical directory structure, then the entire repo file. It then prompted for a commit comment, then started to read the entire repo again.
>
> 5GB so far. Looks like it has started to read the entire repo again.
>
> 8GB and counting

Maybe your filesystem or fossil/sqlite don't know about fseek? :) Do you see any fseek-kind-of syscall?

As for the directory structure, it depends on whether you have the mtime-changes setting enabled or not, too.
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, 29 Sep 2011 08:15:15 + ala...@snell-pym.org.uk wrote:
> Just a thought - is there some virus-scanning software involved, that feels a need to scan every file opened?

The OP got the same results on Ubuntu, which supposedly is not infested with antivirus software.
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, Sep 29, 2011 at 6:16 AM, Jeff Slutter j...@slutter.com wrote:
> Interesting... I failed to mention in my post that my version of fossil was from 'trunk' sometime this afternoon, built with MSVC 2008. I also made one minor change to fix handling for repos >2gig (MSVC build version only...patch was sent to drh).

This _might_ be relevant, might not:

fossil help set

    repo-cksum      Compute checksums over all files in each checkout as a double-check of correctness. Defaults to on. Disable on large repositories for a performance improvement.

    mtime-changes   Use file modification times (mtimes) to detect when files have been modified. (Default on.)

If mtime-changes is off then it does a longer check on each file. That shouldn't, i think, affect your 1-file commit UNLESS you do:

    fossil commit -m '...'

_without_ specifying any files (in which case fossil has to figure out what's changed, and with 78k files that's gonna take a while).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
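Editorial sketch of the trade-off behind the mtime-changes setting described above. This is not Fossil's code; it is a minimal Python illustration, and `FileRecord`, `snapshot` and `is_changed` are hypothetical names. The assumption is that a checkout remembers each file's size, mtime and content hash, and that trusting the mtime replaces a full read of the file with a single stat() call.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """What a checkout database might remember about one file (hypothetical)."""
    size: int
    mtime_ns: int
    sha1: str


def sha1_of(path):
    """Hash the file contents in 1 MiB chunks; this is the expensive path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path):
    """Record size, mtime and content hash as of checkout or commit time."""
    info = os.stat(path)
    return FileRecord(info.st_size, info.st_mtime_ns, sha1_of(path))


def is_changed(path, record, trust_mtime=True):
    """Return True if the file differs from the recorded state.

    With trust_mtime=True, an unchanged size and mtime ends the check after
    one stat() call. With trust_mtime=False, the file is always re-read and
    re-hashed, which is the "longer check" on every file.
    """
    info = os.stat(path)
    if info.st_size != record.size:
        return True
    if trust_mtime and info.st_mtime_ns == record.mtime_ns:
        return False
    return sha1_of(path) != record.sha1
```

On a tree of tens of thousands of files, the stat-only path touches metadata only, while the hashing path has to read every byte of every file, so the difference grows with the total size of the checkout.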
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
> To open the repository to a new checkout it took Fossil about 26 minutes. Roughly 13 minutes extracting the files into the directory, and then 13 minutes of ... doing something, before it came back. The equivalent command in Mercurial (hg update null to reset the checkout then the timed hg update) took 19.5 minutes

No low-hanging fruit here. After profiling fossil open, I noticed nothing extraordinary -- most time is spent in compression, decompression, and calculating checksums.

http://i.imgur.com/1odR8.png

After checkout ("13 minutes of ... doing something, before it came back"), Fossil goes through all the extracted files, computes an MD5 checksum and verifies that it matches the one in the manifest. As Stephan pointed out, you can disable this by turning off the repo-cksum setting.

Same thing with committing: Fossil checks that the content in the repo is correct, see http://www.fossil-scm.org/index.html/doc/trunk/www/selfcheck.wiki :

> Then just before transaction commit, fossil re-extracts the original content of all files that were written, computes the SHA1 checksum again, and verifies that the checksums match. If anything does not match up, an error message is printed and the transaction rolls back.

--
Dmitry Chestnykh
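Editorial sketch of the verify-before-commit idea quoted above from the self-check page. It is a toy model, not Fossil's implementation: the repository is a plain dict, `commit_with_selfcheck` and `VerifyError` are hypothetical names, and zlib stands in for whatever compression the real store uses.

```python
import hashlib
import zlib


class VerifyError(Exception):
    """Raised when re-extracted content does not match what was written."""


def commit_with_selfcheck(store, files):
    """Write files into `store`, re-extract and verify them, then commit.

    `store` is a dict standing in for the repository database (an
    assumption of this sketch): keys are SHA1 hex digests, values are
    compressed blobs. `files` maps file names to their bytes content.
    """
    pending = {}
    for content in files.values():
        key = hashlib.sha1(content).hexdigest()
        pending[key] = zlib.compress(content)

    # Self-check: decompress everything just written and hash it again.
    for key, blob in pending.items():
        if hashlib.sha1(zlib.decompress(blob)).hexdigest() != key:
            # Nothing has been copied into `store` yet, so raising here
            # leaves the store untouched, which plays the role of a rollback.
            raise VerifyError(f"checksum mismatch for {key}")

    store.update(pending)  # the "transaction commit"
    return sorted(pending)
```

The verification pass costs one extra decompress-and-hash over everything written in the transaction, which is the price paid for catching corruption before it becomes permanent.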
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, Sep 29, 2011 at 3:18 PM, Jeff Slutter j...@slutter.com wrote:
> Add was sub 1 second
> Commit took 59 seconds

A few weeks ago someone posted about horrible performance in his BSD Ports repo - many tens of thousands of files. Richard explained (though i cannot find the post at the moment) something about an O(N) factor which gets really slow when you have many, many, many files. IIRC, anyway. Maybe someone who organizes their mails better than i knows which post i'm talking about and can explain this in more detail.

> I repeated 3 more times (increasing the test?.txt counter) and it was 17-18 seconds each time. Perhaps some sort of OS level caching. Also, the final one I did a commit without specifying the file on the command line and it took 17 seconds.

Interesting - i would expect that last one to take much longer in your case. That shows how much i know about Fossil's internals ;).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
Some good news...

I came in to work, disabled repo-cksum on the copy of the repository at work, and tested again. Single file commit took 6 seconds. I made a number of changes to files (11 files total, a collection of edits, adds and removes) and did a fossil commit (without specifying files on the command line) and the commit took about 7 seconds.

There seems to be a minimum time of 6 seconds for my operations of status, changes, and commit, and it would make sense that they all have to do the same work at some point (that would be 'finding out what files have changed'). I don't expect any filesystem level caching going on this time.

So, what happened on my machine at home with the 59 second and 18 second commits versus the consistent 6-8 second commits at work? At home, my hard drives are full-disk encrypted via Truecrypt :) There is a behind-the-scenes slow-down for all reads and writes. Fossil is not at fault for that by any means.

In any case...

*** disable repo-cksum if you have a large repository ***

I don't know if that 6 seconds can be improved on, but I am definitely much happier than I was yesterday. Thanks everyone for help and insight.

Jeff
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, Sep 29, 2011 at 5:31 PM, Jeff Slutter j...@slutter.com wrote:
> I don't know if that 6 seconds can be improved on, but I am definitely much happier than I was yesterday.

We all love success stories! Keep 'em coming! :) And thanks for having the patience to try to get to the bottom of the problem, rather than dismissing this wonderful tool outright :).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On Thu, Sep 29, 2011 at 11:31:19AM -0400, Jeff Slutter wrote:
> There seems to be a minimum time of 6 seconds for my operations of status, changes, and commit, and it would make sense that they all have to do the same work at some point (that would be 'finding out what files have changed')

What Operating System is that on? There might be a limit to the number of filesystem objects that can be cached, and your tree may be just large enough not to fit into it. Another thing to try is forcing the _FOSSIL_ file into cache (e.g. cat _FOSSIL_ > /dev/null on Unix).

Joerg
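Editorial sketch: a portable equivalent of the cat-to-/dev/null trick suggested above, for platforms without a Unix shell. `warm_cache` is a hypothetical helper name. It simply reads the file front to back and discards the data, on the assumption that the operating system keeps recently read blocks in its page cache.

```python
def warm_cache(path, chunk_size=1 << 20):
    """Read `path` sequentially, discard the data, return bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling `warm_cache("_FOSSIL_")` from the checkout root before timing a command would show whether the remaining delay comes from cold reads of the checkout database or from somewhere else.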
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On 9/29/2011 2:12 PM, Joerg Sonnenberger wrote:
> What Operating System is that on? There might be a limit to the number of filesystem objects that can be cached and your tree just large enough to not fit into it. Another thing to try is forcing the _FOSSIL_ file into cache (e.g. cat _FOSSIL_ > /dev/null on Unix).

The delay shows up on all three machines I tested:

* Windows 7 64-bit, NTFS, 7200 RPM SATA drive with Truecrypt drive encryption
* Windows 7 64-bit, NTFS, 7200 RPM SATA drive without Truecrypt drive encryption
* Ubuntu 11.04 64-bit, ext4, 5400 RPM SATA drive

I will try forcing _FOSSIL_ into the cache on the Ubuntu box and report the results.
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
On 9/28/2011 11:40 AM, Mike Buckler wrote:
> my test repo at 40 MB is border line unusable even when run from a fast solid state disk.

I have a 1 GB repo that has some irritating lag on my netbook with a 5400 RPM drive. But I regularly use 4 repos between 30 and 90 MB and they're all quite snappy. So whatever's causing your problem, I'm pretty sure it's not the physical size of the database. Sorry I don't have more useful info, though.

--
Joshua Paine
LetterBlock: Web Applications Built With Joy
http://letterblock.com/
301-576-1920
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
Switching to wal mode hasn't made any difference. There is still a huge amount of disk read activity.
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
This is pretty relevant to my day today because I set out to load test Fossil to make sure it will be a good fit for our future projects. I took the existing source and asset folders of our recent project (so, not importing from Perforce, just taking the current 'head') and put them into Fossil.

For reference:

* 78,353 files
* 5,708 folders
* total file size: 16,371,476,646 bytes (15.2 GB)
* mostly text files (cpp, h, cs, html, xml, json, dae, etc.) with some binary

I will provide some times, and some comparisons to Mercurial. Not because I'm out to say Mercurial is better or worse; I need some DVCS to compare it to, and Mercurial is installed on my computer.

--
Windows 7 64-bit, quad-core, 7200 RPM drive

Initial 'add' of all the files took roughly 10 minutes. For comparison, Mercurial took less than a minute.

Initial 'commit' of all the files took roughly 50 minutes; Mercurial took about 38 minutes.

Repository size (.fossil file) after initial commit was 3,529,995,264 bytes (3.28 GB -- pretty good!). The size of my .hg folder after commit is 3.60GB...but really 3.82GB due to filesystem overhead of 81,482 files (6,085 folders). Fossil beats Mercurial for sure here.

To open the repository to a new checkout it took Fossil about 26 minutes: roughly 13 minutes extracting the files into the directory, and then 13 minutes of ... doing something, before it came back. The equivalent command in Mercurial (hg update null to reset the checkout, then the timed hg update) took 19.5 minutes.

The above timings and stats are important, but not that important, to me, because they are uncommon operations related to initial setup, initial clone, etc. What matters (to me) is the stats below:

* fossil ls takes about 36 seconds; most of that time is just the spew going to the console. If I pipe the results to a file it takes about 8 seconds.
* fossil extra takes about 8-12 seconds (with no extra files reported).
* fossil changes takes about 6-8 seconds (with no changes reported).
* fossil info is less than a second.
* fossil status is about 6-8 seconds (since it is practically doing info + changes).

Something to note: the very first time you do an ls/changes/extra in a fresh checkout, it takes much longer, about 50-75 seconds to complete. After that, you get the timings above.

For comparison to Mercurial, hg status takes 3 seconds -- I did not use any other commands.

Also, using fossil artifact and providing the artifact id of the current checkout (to get its manifest) only takes about 6 seconds -- taking that and parsing it might be faster than ls if you are writing tools/scripts around Fossil (as I currently am).

I should note that I tried with my Fossil repository in the default delete mode, then went through my process again (post repository creation) with wal mode, and the above timings did not change.

And now here is where things get bad. I created a single text file in the root of my checkout (created by doing dir * /s > out.txt) -- it was several megs in size, let's say about 12 megs.

    fossil add out.txt                 --- takes about 1-5 seconds (varies within that range)
    hg add out.txt                     --- takes about 3 seconds
    fossil commit -m testing out.txt   --- takes about 17-20 minutes
    hg commit -m testing               --- takes about 10 seconds

This has killed me. At first I tried fossil commit without providing the file to commit and it took about 22 minutes. I repeated the process, specifying the file, and it saved ~5 minutes, but it still took too long for me to ask my fellow employees to wait for.

Also to note: Fossil is very well behaved with its memory usage (only a couple megs of RAM) while Mercurial bounces around up to 60 or 100 megs. Fossil also seems to have a little better throughput when it comes to writing out files. With all of the time waiting, I can see Fossil is just doing a lot of file I/O (as expected).

Just some numbers for you guys. Speaking for myself, I can live with a lot of the above, but that commit time kills it :( I hope there are some possible optimizations available.

Thanks,
Jeff
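Editorial sketch of the manifest-parsing idea mentioned in the message above (reading the output of fossil artifact instead of calling fossil ls). It assumes the checkout's manifest is a baseline manifest whose file entries are "F" cards of the form `F <name> <hash> [permissions] [old-name]`, with spaces inside names escaped as `\s`. The helper names `defossilize` and `files_in_manifest` are hypothetical, and only a subset of the escapes is handled.

```python
# Subset of the backslash escapes used inside manifest tokens (assumption:
# \s = space, \n = newline, \t = tab, \r = carriage return, \\ = backslash).
_ESCAPES = {"s": " ", "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def defossilize(token):
    """Undo backslash escapes in one manifest token."""
    out = []
    chars = iter(token)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are passed through unchanged.
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def files_in_manifest(manifest_text):
    """Map file name -> artifact hash for every F card that carries a hash."""
    files = {}
    for line in manifest_text.split("\n"):
        parts = line.split(" ")
        if parts[0] == "F" and len(parts) >= 3:
            files[defossilize(parts[1])] = parts[2]
    return files
```

A script could capture the text printed by fossil artifact for the current checkout id and hand it to `files_in_manifest` to get the tracked file list in one pass, without asking Fossil to stat the working tree.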
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
Are you using the Windows version of fossil which Richard built with VC? I had a similar issue. It went away after I switched to a fossil version built with mingw and gcc.

You can clone the fossil repository as a test. With that repository it should work really snappily.

Cheers,
Hein

On 29.09.2011 05:01, Mike Buckler wrote:
> Switching to wal mode hasn't made any difference. There is still a huge amount of disk read activity.

--
Heinrich Huss
PSH Consulting GmbH
Re: [fossil-users] Scalability, a single file commit and lots of disk reads
Interesting... I failed to mention in my post that my version of fossil was from 'trunk' sometime this afternoon, built with MSVC 2008. I also made one minor change to fix handling for repos >2gig (MSVC build version only...patch was sent to drh).

Now I will have to build fossil.exe tomorrow using mingw and go through my timings again.

On 9/29/2011 12:11 AM, Heinrich Huss wrote:
> Are you using the Windows version of fossil which Richard built with VC? I had a similar issue. It went away after I switched to a fossil version built with mingw and gcc.
>
> You can clone the fossil repository as a test. With that repository it should work really snappily.
>
> Cheers,
> Hein
>
> On 29.09.2011 05:01, Mike Buckler wrote:
>> Switching to wal mode hasn't made any difference. There is still a huge amount of disk read activity.