Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
> So after about 8 hours and 3 restarts, your tarball finally downloaded… and
> inside, I found another 17 TB tarball!

GB, not TB.

___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
On Jan 27, 2018, at 12:20 AM, Martin Vahi wrote:
>
> https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz

So after about 8 hours and 3 restarts, your tarball finally downloaded… and inside, I found another 17 TB tarball!

That gives us an easy workaround: unpack the tarball into a subdirectory of your repository checkout and check that subdirectory in. Then the hundreds or thousands of files in that tarball will each be inserted into the DB separately, so you won't run out of memory.

I didn't actually try your test case because it's far from minimal. As I recall, it was something like a 200-line shell script. I wasn't going to take the time to audit all that code just to try your test.

Using clues from that tarball's contents, I found your Fossil repository, whose name I now forget, but while poking around in its Files section, I saw a lot of this sort of thing:

1. I saw not just other tarballs already checked in, but *compressed* tarballs (.tar.xz), which means that if just a single byte in one of the contents of that tarball was modified and checked back in, almost the entire contents following that change point would change, a terrible waste of space.

Fossil not only already has compression, it also has *delta* compression, meaning that if you'd left that tarball uncompressed, it wouldn't be much bigger inside Fossil, and new versions of that tarball would be stored with minimal size inflation.

As a rule, you should not check any compressed artifact into Fossil if there is any chance that it will ever be updated later. Doing so defeats the delta compression algorithm. (And if it's checked in just once, ever, you might want to be using Fossil's unversioned files feature.)

This rule affects many file types besides the ones you immediately think of. For example, I recall seeing at least one PDF. I didn't check, but chances are excellent that it was compressed, so that checking in an updated version will create an extra-large delta. Decompressing the PDF before checking it in will result in a net smaller Fossil repository if you ever check in a change to that PDF.

2. I also saw a Git checkout inside your Fossil repository. This means you're checking in two copies of all files at the tip of the repository branch you happened to have checked out of the Git repo when you checked that Git repo into Fossil. If you wanted the complete history of the remote Git repo, checking it in in Git fast-export format would have been more efficient.

Personally, whenever I feel the need to re-host someone else's Git repository inside my Fossil repository, I write a script that merges the tip of the remote Git repo into the Fossil subdirectory that hosts it. My repo therefore initially hosts only the tip of the remote Git branch I'm checking out, and on each update, I check in only the diffs since the last update. The vast majority of the remote project's history I delegate to the remote Git repo. If I felt the need to maintain a duplicate copy of the entire remote repository, I'd do it outside my Fossil repository.
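The point about compressed artifacts defeating delta compression is easy to demonstrate. The following sketch (an illustration, not Fossil's actual delta code) changes a single byte of a highly repetitive payload and shows that the xz-compressed streams diverge almost everywhere past the change point, while the uncompressed versions differ in just one byte:

```python
import lzma

# Two payloads that differ in exactly one byte, right at the front.
original = bytes(range(256)) * 4096      # 1 MiB of repetitive data
modified = b"\xff" + original[1:]        # flip only the first byte

xz_a = lzma.compress(original)
xz_b = lzma.compress(modified)

# Count byte positions where the compressed streams disagree.
diverged = sum(a != b for a, b in zip(xz_a, xz_b))

print("uncompressed difference: 1 byte")
print(f"compressed: {len(xz_a)} vs {len(xz_b)} bytes, "
      f"{diverged} positions differ")
```

A delta between the two uncompressed payloads is tiny; a delta between the two compressed streams is nearly as large as the streams themselves, which is why an updated .tar.xz stores so poorly.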
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
On Jan 27, 2018, at 3:09 AM, Stephan Beal wrote:
>
> Simply reading that file for insertion into the db requires 1x its size.

Obviously this *could* be changed, so that BLOBs stream into and out of the SQLite DB in chunks. That gets us back to motivation: why spend the effort on a use case that only breaks down when working with files whose size approaches the size of available VM?

If you want that to change, Martin, you'll have to justify your use case. Why do you want to do this, and why do you think Fossil is a sensible platform for supporting that use case? There are cases where you want a DVCS and cases where you want a distributed filesystem. This seems like one of the latter cases.

> Your system is very likely failing on that first allocation of 17GiB.

Yes, which means this is not a "bug." It just means you'll need something like 64 GB of VM and a 64-bit OS to work with this particular Fossil repository. If that forces your OS to do heavy paging, you can expect Fossil to be much slower than copying similarly-sized files around on a filesystem.

I can't do anything with your test case right now, Martin. The torrent tracker isn't responding at all, and the file download is currently proceeding at about 1.5 Mbit/sec, so it's going to take hours to get here. I may try it from another location before Monday, but don't hold your breath waiting on me.
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
On Sat, Jan 27, 2018 at 8:20 AM, Martin Vahi wrote:
>
> About 5GiB of it is the Fossil repository file and
> about 17GiB of it is a tar-file with about 140k files that
> the test tries to insert to the fossil repository.

i'm gonna go ahead and say it because everyone else is thinking it:

Short answer: Nope.

The longer answer: As Warren said before, Fossil is going to need some multiplier of that size in memory. Simply reading that file for insertion into the db requires 1x its size. The sqlite3 bind process is, i see now, already optimized as far as it can be to eliminate yet another in-memory copy of that blob:

http://fossil-scm.org/fossil/artifact/6d07632054b709a5?ln=350-351

Your system is very likely failing on that first allocation of 17GiB. If it's not, then it's going to fail further down the line when...

a) you use the 'zip' or 'tar' commands, which build their archives in memory. If it doesn't fail here then it will fail when...

b) you try to commit a change to that file. In that case, fossil needs 2-3x that amount of memory (in separate allocations) in order to be able to create and apply the delta: 34-51GiB of RAM _just for that one file_. That's excluding any other memory costs it has.

Fossil is intended for managing source code, not... whatever it is that you believe a 17GiB blob needs to be doing in a source control system (in a piece of hardware containing only a small fraction of that amount of memory, no less). Barring major architectural upheaval (one step of which would be reimplementing the delta generator and applicator to stream their i/o, rather than working in-memory), your use case simply is not realistic in fossil.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do."
-- Bigby Wolf
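The memory accounting above can be written out as simple arithmetic (the 2-3x multiplier is Stephan's figure; the real overhead depends on the delta implementation):

```python
GIB = 2 ** 30
blob_size = 17 * GIB            # the tarball from the test case

read_copy = blob_size           # 1x just to read the file for insertion
delta_low = 2 * blob_size       # lower bound to create and apply a delta
delta_high = 3 * blob_size      # upper bound

print(f"read into memory: {read_copy // GIB} GiB")
print(f"delta create/apply: {delta_low // GIB}-{delta_high // GIB} GiB")
```

That is 34-51 GiB for the single 17GiB file alone, before any other allocations Fossil makes.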
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
> Date: Mon, 22 Jan 2018 09:06:10 -0500
> From: Richard Hipp
> On 1/22/18, Martin Vahi wrote:
>>
>> citation--start---
>> Fossil internal error: out of memory
>> citation--end-
>>
>> It happened during the execution of the
>>
>> fossil ci
>
> Do you have a test case that we can use for debugging?
> ...

Now I do. A ~18.3GiB file resides at

https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz

SHA256: e671cbfc804b91d2295e00deae5f9ca4ab81b7c8a94ee7a3c7a2118ef952d2f9

The tar.xz can also be downloaded with BitTorrent. The torrent file resides at:

https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz.torrent

After unpacking the tar.xz, the tar-file is about 23GiB. About 5GiB of it is the Fossil repository file and about 17GiB of it is a tar-file with about 140k files that the test tries to insert into the Fossil repository.

The test script makes a copy of the 5GiB Fossil repository file, runs "fossil open" on it, which copies files from the repository file copy to the temporary sandbox folder, and then the test unpacks the 17GiB tar-file to the Fossil sandbox folder and runs "fossil add" on the new files. The overall HDD requirement is roughly ~18GiB + (3 * ~23GiB) = ~87GiB ≈ ~90GiB.

I'll probably delete the tar.xz and the torrent file from my home page after a few months, depending on how much I need the HDD space at my hosting account.

Thank You (all) for the help and for the comments.
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
[Default] On Mon, 22 Jan 2018 13:54:00 -0700, Warren Young wrote:
> Fossil makes some reasonable assumptions about its working
> environment: individual file sizes are a small fraction of
> available VM, the entire repository probably fits into VM,
> and if not, then it can at least get several artifacts into
> VM at once, etc.
>
> If you’re dealing with artifact sizes a large fraction of
> the size of your virtual memory size, then you’re probably
> asking for trouble with Fossil.

I agree. In other words, Martin could increase the size of the swap file and try the check-in again.

--
Regards,
Kees Nuyt
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
On Jan 22, 2018, at 6:44 AM, Martin Vahi wrote:
>
> Fossil had an opportunity to allocate
> at least 1GiB of RAM

There are cases where Fossil needs 2-3x the size of an object to operate. For example, my understanding of the way Fossil does the diff operation is that it loads the new and old files into memory, and then it creates an output buffer which needs to be large enough to hold the differences.

You speak of algorithms, and indeed, you might talk about creating a sliding-window version of the diff algorithm so that you only need something like 1.2N memory, where N is the size of the output buffer, the rest going to the input files' sliding windows, but then we get back to the need for motivating examples. With 1 GB of RAM and presumably some nonzero multiple of that for VM, the current Fossil diff algorithm only breaks down when you're checking in diffs on files hundreds of megs in size, which begs the question, "Seriously?"

> In my opinion the correct
> case might be that Fossil should be able to run even
> on the old Raspberry Pi 1 that has 512MiB RAM

It does. One of my public Fossil projects is based on the Pi, and the Pi B+ remains a major development and deployment target. It's not surprising that it works well there, since the largest file in this project's Fossil repository is 3.3 MB and the total repository size is 37 MB, so 512 MB of RAM and some amount of VM on top of that is plenty for this particular application.

I got all of that from my repository's /stat page. What does your repository's /stat page show?

> Fossil should just look,
> how much free RAM the computer has

Easier said than done, which is why the C Standard doesn't have a way to get that number. Not one of the answers to this similar question on Stack Overflow is entirely correct:

https://stackoverflow.com/questions/2513505/how-to-get-available-memory-c-g

When it takes multiple answers, each correct only within a fixed scope, to come up with a proper cross-platform solution, it's a good bet that you're chasing the wrong problem. (The accepted answer to that question is arguably even wrong. It's certainly unsuitable for Fossil's purposes.)

Even if Fossil were to mash up all of that advice into a solution that works everywhere, Windows, Linux, and the BSDs (including macOS, in this case) don't all agree on what "free RAM" means. The BSDs have the concept of "wired" memory, which doesn't exist on the other two. Windows has non-pageable RAM, which the other two don't, etc. Then you add in all the other random OSes Fossil runs on, and things get even more complicated.

> adjust its algorithm parameters accordingly.

Fossil makes some reasonable assumptions about its working environment: individual file sizes are a small fraction of available VM, the entire repository probably fits into VM, and if not, then it can at least get several artifacts into VM at once, etc.

If you’re dealing with artifact sizes a large fraction of the size of your virtual memory size, then you’re probably asking for trouble with Fossil. Fossil would have been designed differently if that were the common use case.
Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
On 1/22/18, Martin Vahi wrote:
>
> citation--start---
> Fossil internal error: out of memory
> citation--end-
>
> It happened during the execution of the
>
> fossil ci

Do you have a test case that we can use for debugging?

--
D. Richard Hipp
d...@sqlite.org
[fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in
citation--start---
Fossil internal error: out of memory
citation--end-

It happened during the execution of the

fossil ci

Given that the Fossil had an opportunity to allocate at least 1GiB of RAM without running out of RAM, the issue must have something to do with the algorithm. In my opinion the correct case might be that Fossil should be able to run even on the old Raspberry Pi 1 that has 512MiB RAM in total and Fossil should just look, how much free RAM the computer has and adjust its algorithm parameters accordingly.