[fossil-users] Scalability limits
Hello,

I first want to say what a terrific version control manager Fossil is! I took my first serious look at Fossil last week and have already converted a few of my personal projects away from 'git'. The built-in bug tracker and wiki are genius touches! Thank you, Fossil community, for your efforts.

I would like to mention, however, that Fossil hits a scalability wall at some point, making it unsuitable for large projects. I have been trying to pull the NetBSD source repository for a week and have had nothing but problems. As of this moment, I haven't succeeded.

I first tried cloning the repository, but it would exit with an error after ~2GB of data was transferred. I then downloaded the repository[2] from the NetBSD FTP site (10GB!). Doing a 'rebuild' starts out fine but, after 24 hours, I get to 60% complete and then it takes hours to advance another 0.1%. I tried to rebuild using various options (--wal and setting the pagesize), but it always ends up slowing down at the same place. The last time I tried it, the .fossil file was 10GB and the journal file reached 11GB!

I was able to download and rebuild the pkgsrc repository[3] in a reasonable time -- it's only 2.7GB. So there's some point between the two projects at which Fossil's rebuild algorithm becomes so expensive that the repository can't be cloned.

I don't have any question; I just thought I'd document my experiences.

--
Rich

[1] http://netbsd.sonnenberger.org/
[2] http://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/src.fossil
[3] http://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/pkgsrc.fossil
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
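For readers unfamiliar with the two rebuild options mentioned above: as far as I understand, they correspond to SQLite's page_size and journal_mode settings on the repository database. The following is only a minimal sketch of those two settings using Python's standard sqlite3 module on a throwaway database. The file name demo.db, the table t and the helper make_db are made up for illustration, and nothing here touches a Fossil repository or reproduces what 'rebuild' itself does.

```python
import os
import sqlite3
import tempfile


def make_db(path, page_size=65536, wal=True):
    """Create a scratch SQLite file with the given page size and journal mode."""
    con = sqlite3.connect(path)
    # page_size only takes effect if it is set before the first page is written.
    con.execute(f"PRAGMA page_size = {int(page_size)}")
    con.execute("CREATE TABLE t(x)")  # hypothetical table, forces the file to be created
    con.commit()
    if wal:
        # Replace the rollback journal with a write-ahead log
        # (SQLite then keeps "-wal" and "-shm" files next to the database).
        con.execute("PRAGMA journal_mode = WAL")
    return con


tmpdir = tempfile.mkdtemp()
con = make_db(os.path.join(tmpdir, "demo.db"))
page_size = con.execute("PRAGMA page_size").fetchone()[0]
journal_mode = con.execute("PRAGMA journal_mode").fetchone()[0]
print(page_size, journal_mode)
con.close()
```

The page size has to be chosen before the database is populated, which is presumably why it is offered as a rebuild option rather than something that can be flipped on an existing repository without rewriting it.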
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 4:33 PM, Rich Neswold rich.nesw...@gmail.com wrote:

> I don't have any question; I just thought I'd document my experiences.

Thanks for your feedback! IMO (possibly a minority opinion), Fossil has never aspired to host repos quite as large as those. i remember the pkgsrc repo being mentioned before (but thought it was bigger than 2.7GB), and IIRC the delta manifest format was introduced to help support huge repos like that one and the core TCL repo. Fossil's original purpose was to host sqlite, and it works wonders for projects at that scale.

i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:

[stephan@host:~/cvs/fossil/fossil]$ f dbstat
repository-size:   53739520 bytes (53.7MB)
artifact-count:    24813 (stored as 5784 full text and 19029 delta blobs)
artifact-sizes:    67440 average, 5153124 max, 1673191067 bytes (1.7GB) total
compression-ratio: 31:1
checkins:          6615
files:             821 across all branches
wikipages:         26 (294 changes)
tickets:           1056 (3355 changes)
events:            5
tagchanges:        737
project-age:       2394 days or approximately 6.55 years.
project-id:        CE59BB9F186226D80E49D1FA2DB29F935CCA0333
fossil-version:    2014-02-07 08:58:55 [90bd20308b] [1.28] (gcc-4.8.1)
sqlite-version:    2014-01-27 15:02:07 [be1acb610f] (3.8.3)
database-stats:    52480 pages, 1024 bytes/pg, 109 free pages, UTF-8, delete mode

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 11:17 AM, Stephan Beal sgb...@googlemail.com wrote:

> On Fri, Feb 7, 2014 at 4:33 PM, Rich Neswold rich.nesw...@gmail.com wrote:
>
>> I don't have any question; I just thought I'd document my experiences.
>
> Thanks for your feedback! IMO (possibly a minority opinion), Fossil has never aspired to host repos quite as large as those. i remember the pkgsrc repo being mentioned before (but thought it was bigger than 2.7GB), and IIRC the delta manifest format was introduced to help support huge repos like that one and the core TCL repo. Fossil's original purpose was to host sqlite, and it works wonders for projects at that scale.

I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries.
Re: [fossil-users] Scalability limits
On Fri, 7 Feb 2014 18:40:32 +0100 Stephan Beal sgb...@googlemail.com wrote:

> It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though.

What about Monotone? Linus was looking at it, but it was too slow at that time.

Sincerely,
Gour

--
Everyone is forced to act helplessly according to the qualities he has acquired from the modes of material nature; therefore no one can refrain from doing something, not even for a moment.

http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810
Re: [fossil-users] Scalability limits
On Fri, Feb 07, 2014 at 07:39:37PM +0100, Gour wrote:

> On Fri, 7 Feb 2014 18:40:32 +0100 Stephan Beal sgb...@googlemail.com wrote:
>
>> It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though.
>
> What about Monotone? Linus was looking at it, but it was too slow at that time.

It was a bug in monotone, that slowness. Fixed, from what I remember.

But monotone works on sqlite, if the deal is sqlite.
Re: [fossil-users] Scalability limits
On Fri, 7 Feb 2014 20:32:56 +0100 Lluís Batlle i Rossell vi...@viric.name wrote:

> It was a bug in monotone, that slowness. Fixed, from what I remember.

Yeah, too bad. Otherwise we wouldn't see git. :-)

> But monotone works on sqlite, if the deal is sqlite.

Right, but I see Monotone's influence in Fossil.

Sincerely,
Gour

--
He who is satisfied with gain which comes of its own accord, who is free from duality and does not envy, who is steady in both success and failure, is never entangled, although performing actions.

http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810
Re: [fossil-users] Scalability limits
On Fri, Feb 07, 2014 at 05:17:23PM +0100, Stephan Beal wrote:

> i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:

Attached for pkgsrc and src.

Joerg

repository-size:   2852068352 bytes (2.9GB)
artifact-count:    1096185 (stored as 190518 full text and 905667 delta blobs)
artifact-sizes:    23053 average, 5763035 max, 25270457712 bytes (25.3GB) total
compression-ratio: 8:1
checkins:          384960
files:             129343 across all branches
wikipages:         0 (0 changes)
tickets:           0 (0 changes)
events:            0
tagchanges:        83
project-age:       6016 days or approximately 16.47 years.
project-id:        a93518a42fa8e06695943fd79049ad4fcf8b9d00
fossil-version:    2013-02-16 00:04:35 [d2e07756d9] [1.25] (gcc-4.5.3)
sqlite-version:    2013-02-13 14:04:28 [7e10a62d0e] (3.7.16)
database-stats:    2785223 pages, 1024 bytes/pg, 11 free pages, UTF-8, wal mode

repository-size:   2380333056 bytes (2.4GB)
artifact-count:    1751692 (stored as 246938 full text and 1504754 delta blobs)
artifact-sizes:    24080 average, 17336826 max, 42181390896 bytes (42.2GB) total
compression-ratio: 17:1
checkins:          278062
files:             284615 across all branches
wikipages:         0 (0 changes)
tickets:           0 (0 changes)
events:            0
tagchanges:        0
project-age:       7880 days or approximately 21.57 years.
project-id:        f147779665278afdf4d91757d941046def2b6e5a
fossil-version:    2013-02-16 00:04:35 [d2e07756d9] [1.25] (gcc-4.5.3)
sqlite-version:    2013-02-13 14:04:28 [7e10a62d0e] (3.7.16)
database-stats:    36321 pages, 65536 bytes/pg, 0 free pages, UTF-8, wal mode
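The dbstat figures posted in this thread can be cross-checked with a little arithmetic. The sketch below is not Fossil code; it only recomputes two things from the posted numbers: the file size implied by the page count and page size, and the integer quotient of total artifact bytes over repository size, which appears to be what the compression-ratio line reports. The labels "pkgsrc" and "src" assume the two attached blocks follow the order in which the message names them, and the dictionary keys are my own naming.

```python
# Figures copied from the dbstat outputs in this thread.
# Assumption: the first attached block is pkgsrc and the second is src.
DBSTAT = {
    "fossil core": {"repo_bytes": 53_739_520, "artifact_bytes": 1_673_191_067,
                    "pages": 52_480, "page_bytes": 1024, "reported_ratio": 31},
    "pkgsrc": {"repo_bytes": 2_852_068_352, "artifact_bytes": 25_270_457_712,
               "pages": 2_785_223, "page_bytes": 1024, "reported_ratio": 8},
    "src": {"repo_bytes": 2_380_333_056, "artifact_bytes": 42_181_390_896,
            "pages": 36_321, "page_bytes": 65_536, "reported_ratio": 17},
}


def check(stats):
    """Return (file size implied by the page count, integer compression ratio)."""
    implied_size = stats["pages"] * stats["page_bytes"]
    ratio = stats["artifact_bytes"] // stats["repo_bytes"]
    return implied_size, ratio


results = {name: check(stats) for name, stats in DBSTAT.items()}
for name, (implied_size, ratio) in results.items():
    stats = DBSTAT[name]
    # Print whether each recomputed value matches the posted one.
    print(name, implied_size == stats["repo_bytes"], ratio == stats["reported_ratio"])
```

Both checks hold for all three repositories, so the posted ratios look like truncated rather than rounded quotients (25.3GB over 2.9GB is closer to 9 than to 8, yet the output says 8:1).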
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 6:15 PM, Ron Wilson ronw.m...@gmail.com wrote:

> I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries.

When starting on libfossil i actually looked into that and decided against it primarily because so much of the heavy lifting (and a lot of the lighter work) in fossil is done by sqlite, and it would be a tremendous effort to port that SQL logic either to C code or another SQL dialect. The fossil core model supports arbitrary storage (not necessarily a db), but having sql-based storage greatly simplifies many parts of the functionality and fossil as an application (or library) is very tightly married to sqlite.

Then of course: the primary author of sqlite is the one writing most of the SQL in fossil, which means that the SQL is very fine indeed :). It would be possible to do on top of another db, but i don't think anyone's going to volunteer to do it any time soon!

It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though.

It would likely require a complete re-implementation, not just rewriting most of the SQL.

libfossil (as opposed to fossil) goes out of its way to abstract the sqlite3 API out of the client's view, and could reasonably be ported to work with another db with relatively little work, but the queries themselves are often very sqlite-specific. That's where most of the work would be.

Anyway...

--
- stephan beal
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 10:17 AM, Stephan Beal sgb...@googlemail.com wrote:

> i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:

I have another attempt in progress. This time, I'm running it on a quad-core system with 12GB RAM. It's been close to 48 hours and it reports being only 81.5% completed. The fossil file is 10GB, the -shm file is 2.3GB and the -wal file is 30.2GB.

When it's done, I'll report the dbstats.

Thanks,

--
Rich
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 9:11 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote:

> On Fri, Feb 07, 2014 at 05:17:23PM +0100, Stephan Beal wrote:
>
>> i'd be interested in seeing the output of 'dbstat' on your repo, except that it could take some time for it to finish generating its output (so don't feel obligated to try it). Here's the info for the current fossil core repo:
>
> Attached for pkgsrc and src.

Holy cow, that's a lot of checkins. Does 21.5 years make src the oldest-history fossil repo?

--
- stephan beal
[fossil-users] http_ssl paren issue
I think maybe http_ssl needs an extra set of parentheses to deal with this:

./src/http_ssl.c: In function 'ssl_open':
./src/http_ssl.c:288: warning: cast to pointer from integer of different size

--
James Turner

Index: src/http_ssl.c
==================================================================
--- src/http_ssl.c
+++ src/http_ssl.c
@@ -283,11 +283,11 @@
     return 1;
   }
   BIO_get_ssl(iBio, &ssl);
 #if (SSLEAY_VERSION_NUMBER >= 0x00908070) && !defined(OPENSSL_NO_TLSEXT)
-  if( !SSL_set_tlsext_host_name(ssl, pUrlData->useProxy?pUrlData->hostname:pUrlData->name) ){
+  if( !SSL_set_tlsext_host_name(ssl, (pUrlData->useProxy?pUrlData->hostname:pUrlData->name)) ){
     fossil_warning("WARNING: failed to set server name indication (SNI), continuing without it.\n");
   }
 #endif
Re: [fossil-users] Scalability limits
On 2/8/2014 5:19 AM, Stephan Beal wrote:

> It would be really cool to see someone implement their own SCM based on fossil's core artifact model and their own db back-end, though.
>
> It would likely require a complete re-implementation, not just rewriting most of the SQL.

Wasn't Veracity (http://veracity-scm.com/) inspired by most of the concepts in Fossil? They also use SQLite as their DB back-end, and IIRC, they were planning to make/sell add-ons that allow using other SQL DBs like PostgreSQL, etc. as the repo back-end. Too bad it's on hold at the moment, though.
Re: [fossil-users] Scalability limits
On Fri, Feb 7, 2014 at 11:40 AM, Stephan Beal sgb...@googlemail.com wrote:

> On Fri, Feb 7, 2014 at 6:15 PM, Ron Wilson ronw.m...@gmail.com wrote:
>
>> I am guessing this is a limitation of SQLite, which is designed to be light. It would be interesting to see how Fossil would perform when plugged in to, for example, PostgreSQL, MariaSQL or other heavy duty SQL server. Of course, that could require rewriting a lot of SQL queries.
>
> When starting on libfossil i actually looked into that and decided against it primarily because so much of the heavy lifting (and a lot of the lighter work) in fossil is done by sqlite, and it would be a tremendous effort to port that SQL logic either to C code or another SQL dialect. The fossil core [...]

One of the nice things about all the features that SQLite3 has been growing lately (CTEs, recursive queries, before that recursive triggers, foreign keys, ...) is that the more business logic can be expressed declaratively and therefore pushed into SQL, the less complexity one has to have in C, Python, ... That makes it much easier to cope with future schema changes, or dataset changes that require re-planning queries (the RDBMS can do it!). And all those new features are great for expressing complex business logic (particularly CTEs).

Sticking to a portable subset of SQL, on the other hand, makes it easier to scale up and down the device stack, dataset sizes, and across the network. Which makes improvements in the lowest common denominator very welcome!

Now, if only PostgreSQL (and others) had a duck-type option to match (roughly) SQLite3's duck typing... That would bring the lowest common denominator up to a very useful level.

Nico
--
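To make the point about recursive queries concrete, here is a minimal sketch of logic that would otherwise be a loop in application code: finding every ancestor of a check-in with a single recursive CTE. The table link(parent, child), the check-in names and the helper ancestors are hypothetical and chosen only for illustration; this is not Fossil's actual schema or one of its queries. It uses Python's standard sqlite3 module and needs an SQLite build with WITH RECURSIVE support (3.8.3 or later).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE link(parent TEXT, child TEXT);  -- hypothetical schema
    INSERT INTO link VALUES ('c1','c2'), ('c2','c3'), ('c2','c4'), ('c4','c5');
""")

# The graph walk is expressed declaratively; SQLite plans and runs the iteration.
ANCESTORS = """
    WITH RECURSIVE ancestor(id) AS (
        SELECT parent FROM link WHERE child = ?
        UNION
        SELECT link.parent FROM link JOIN ancestor ON link.child = ancestor.id
    )
    SELECT id FROM ancestor ORDER BY id
"""


def ancestors(checkin):
    """Return the sorted ids of all direct and indirect parents of `checkin`."""
    return [row[0] for row in con.execute(ANCESTORS, (checkin,))]


print(ancestors("c5"))
```

Using UNION rather than UNION ALL drops duplicate rows, so the query still terminates if two paths lead to the same ancestor, as they would after a merge.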