Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-29 Thread Warren Young
> So after about 8 hours and 3 restarts, your tarball finally downloaded… and 
> inside, I found another 17 TB tarball!  

GB, not TB.



Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-29 Thread Warren Young
On Jan 27, 2018, at 12:20 AM, Martin Vahi  wrote:
> 
> https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz

So after about 8 hours and 3 restarts, your tarball finally downloaded… and 
inside, I found another 17 TB tarball!  

That gives us an easy workaround: unpack the tarball into a subdirectory of 
your repository checkout and check that subdirectory in.  Then the hundreds or 
thousands of files in that tarball will each be inserted into the DB 
separately, so you won’t run out of memory.
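
Concretely, something like this, where all of the names are placeholders 
rather than anything from your actual test case:

    cd my-checkout
    mkdir payload
    tar -C payload -xf /path/to/big.tar    # unpack into a subdirectory
    fossil add payload                     # adds each extracted file separately
    fossil commit -m "import payload contents as individual files"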

I didn’t actually try your test case because it’s far from minimal.  As I 
recall, it was something like a 200 line shell script.  I wasn’t going to take 
the time to audit all that code just to try your test.

Using clues from that tarball’s contents, I found your Fossil repository, whose 
name I now forget, but while poking around in its Files section, I saw a lot of 
this sort of thing:

1. I saw not just other tarballs already checked in, but *compressed* tarballs 
(.tar.xz). If even a single byte inside such a tarball is modified and the file 
is recompressed and checked back in, nearly the entire compressed stream after 
the change point differs, which is a terrible waste of space.

Fossil not only already has compression, but it also has *delta* compression, 
meaning that if you’d left that tarball uncompressed, it wouldn’t be much 
bigger inside Fossil, and also, new versions of that tarball would be stored 
with minimal size inflation.

As a rule, you should not check any compressed artifact into Fossil if there is 
any chance that it will ever be updated later.  Doing so defeats the delta 
compression algorithm.

(And if it’s checked in just once, ever, you might want to be using Fossil’s 
unversioned files feature.)
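
In sketch form — file names here are placeholders — the two options look like 
this:

    # Versioned and delta-friendly: check in the *uncompressed* tarball
    xz -d archive.tar.xz                   # leaves archive.tar behind
    fossil add archive.tar
    fossil commit -m "store uncompressed so future deltas stay small"

    # Write-once blob: keep it out of the versioned history entirely
    fossil uv add release-1.0.tar.xz
    fossil uv ls                           # confirm it landed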

This rule affects many file types besides the ones you immediately think of.  
For example, I recall seeing at least one PDF.  I didn’t check, but chances are 
excellent that it was compressed, so that checking in an updated version will 
create an extra-large delta.  Decompressing the PDF before checking it in will 
result in a net smaller Fossil repository if you ever check in a change to that 
PDF.

2. I also saw a Git checkout inside your Fossil repository.  This means you’re 
checking in two copies of every file at the tip of whichever branch you had 
checked out of the Git repo when you checked it into Fossil: once as working 
tree files and again inside the .git object store.  If you wanted the complete 
history of the remote Git repo, checking in a Git fast-export dump would have 
been more efficient.

Personally, whenever I feel the need to re-host someone else’s Git repository 
inside my Fossil repository, I write a script that merges the tip of the remote 
Git repo into the Fossil subdirectory that hosts it.  My repo therefore 
initially hosts only the tip of the remote Git branch I’m tracking, and on 
each update, I check in only the diffs since the last update.  The vast 
majority of the remote project’s history I delegate to the remote Git repo.
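
A stripped-down sketch of that kind of script, with placeholder paths and an 
rsync-based sync step standing in for whatever you’d actually use:

    #!/bin/sh
    set -e
    GITDIR=$HOME/mirrors/project    # local clone tracking the remote Git repo
    DEST=vendor/project             # subdirectory inside the Fossil checkout

    git -C "$GITDIR" pull --ff-only               # update to the remote tip
    rsync -a --delete --exclude=.git "$GITDIR"/ "$DEST"/
    fossil addremove                # pick up any adds and deletes
    fossil commit -m "vendor/project: sync to upstream tip"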

If I felt the need to maintain a duplicate copy of the entire remote 
repository, I’d do it outside my Fossil repository.


Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-27 Thread Warren Young
On Jan 27, 2018, at 3:09 AM, Stephan Beal  wrote:
> 
> Simply reading that file for insertion into the db requires 1x its size.

Obviously this *could* be changed, so that BLOBs stream into and out of the 
SQLite DB in chunks.  That gets us back to motivation: why spend the effort on 
a use case that only breaks down when individual file sizes approach the size 
of available VM?

If you want that to change, Martin, you’ll have to justify your use case.  Why 
do you want to do this, and why do you think Fossil is a sensible platform for 
supporting that use case?

There are cases where you want a DVCS and cases where you want a distributed 
filesystem.  This seems like one of the latter.

> Your system is very likely failing on that first allocation of 17GiB.

Yes, which means this is not a “bug.”  It just means you’ll need something like 
64 GB of VM and a 64-bit OS to work with this particular Fossil repository.

If that means you force your OS to do heavy paging while doing so, you can 
expect Fossil to be much slower than copying similarly-sized files around on a 
filesystem.

I can’t do anything with your test case right now, Martin.  The torrent tracker 
isn’t responding at all, and the file download is currently proceeding at about 
1.5 Mbit/sec, so it’s going to take hours to get here.  I may try it from 
another location before Monday, but don’t hold your breath waiting on me.


Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-27 Thread Stephan Beal
On Sat, Jan 27, 2018 at 8:20 AM, Martin Vahi  wrote:

> About  5GiB of it is the Fossil repository file and
> about 17GiB of it is a tar-file with about 140k files that
> the test tries to insert into the Fossil repository.
>

i'm gonna go ahead and say it because everyone else is thinking it:

Short answer:

Nope.

The longer answer:

As Warren said before, Fossil is going to need some multiplier of that size
in memory. Simply reading that file for insertion into the db requires 1x
its size. The sqlite3 bind process is, i see now, already optimized as far
as it can be to eliminate yet another in-memory copy of that blob:

http://fossil-scm.org/fossil/artifact/6d07632054b709a5?ln=350-351

Your system is very likely failing on that first allocation of 17GiB. If
it's not, then it's going to fail further down the line when...

a) you use the 'zip' or 'tar' commands, which build their archives in
memory. If it doesn't fail here then it will fail when...

b) you try to commit a change to that file. In that case, fossil needs 2-3x
that amount of memory (in separate allocations) in order to be able to
create and apply the delta: 34-51GiB of RAM _just for that one file_.
That's excluding any other memory costs it has.

Fossil is intended for managing source code, not... whatever it is that you
believe a 17GiB blob needs to be doing in a source control system (in a
piece of hardware containing only a small fraction of that amount of memory,
no less). Barring major architectural upheaval (one step of which would be
reimplementing the delta generator and applicator to stream their i/o,
rather than working in-memory), your use case simply is not realistic in
fossil.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-26 Thread Martin Vahi
> Date: Mon, 22 Jan 2018 09:06:10 -0500
> From: Richard Hipp 
> On 1/22/18, Martin Vahi  wrote:
>>
>> citation--start---
>> Fossil internal error: out of memory
>> citation--end-
>>
>> It happened during the execution of the
>>
>> fossil ci
>
> Do you have a test case that we can use for debugging?
>...

Now I do. A ~18.3GiB file resides at


https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz

SHA256:
e671cbfc804b91d2295e00deae5f9ca4ab81b7c8a94ee7a3c7a2118ef952d2f9

The tar.xz can also be downloaded with BitTorrent.
The torrent file resides at:


https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz.torrent

After unpacking the tar.xz, the tar-file is about 23GiB.
About  5GiB of it is the Fossil repository file and
about 17GiB of it is a tar-file with about 140k files that
the test tries to insert into the Fossil repository.
The test script makes a copy of the 5GiB Fossil repository
file and runs "fossil open" on it, which copies files
from the repository file copy to the temporary sandbox folder;
then the test unpacks the 17GiB tar-file into the
Fossil sandbox folder and runs "fossil add" on the
new files. The overall HDD requirement is roughly

~18GiB + (3 * ~23GiB) = ~87GiB, call it ~90GiB.
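
In outline, the test does the equivalent of the following; the
file names here are illustrative, not the exact ones in the tarball:

    cp repo.fossil repo-copy.fossil      # work on a scratch copy
    mkdir sandbox && cd sandbox
    fossil open ../repo-copy.fossil      # populate the sandbox
    tar -xf ../payload.tar               # unpack the ~17GiB, ~140k-file tar
    fossil add .                         # schedule the new files
    fossil ci -m "insert the new files"  # this is where the OOM happens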

I'll probably delete the tar.xz and the torrent file
from my home page after a few months, depending on
how much I need the HDD space at my hosting account.

Thank You (all) for the help and for the comments.




Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-22 Thread Kees Nuyt
On Mon, 22 Jan 2018 13:54:00 -0700, Warren Young wrote:

> Fossil makes some reasonable assumptions about its working
> environment: individual file sizes are a small fraction of
> available VM, the entire repository probably fits into VM,
> and if not, then it can at least get several artifacts into
> VM at once, etc.
> 
> If you’re dealing with artifact sizes a large fraction of
> the size of your virtual memory size, then you’re probably
> asking for trouble with Fossil.

I agree. In other words, Martin could increase the size
of the swap file, and try the checkin again.
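
On Linux, that can be as simple as the following; the size and path
are just an example, it needs root, and swap files need extra steps
on some filesystems:

    fallocate -l 64G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile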

-- 
Regards,
Kees Nuyt


Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-22 Thread Warren Young
On Jan 22, 2018, at 6:44 AM, Martin Vahi  wrote:
> 
> Fossil had an opportunity to allocate
> at least 1GiB of RAM

There are cases where Fossil needs 2-3x the size of an object to operate.  

For example, my understanding of the way Fossil does the diff operation is that 
it loads the new and old files into memory, and then it creates an output 
buffer which needs to be large enough to hold the differences.

You speak of algorithms, and indeed, one could imagine a sliding-window 
version of the diff algorithm that needs only something like 1.2N of memory, 
where N is the size of the output buffer, with the rest going to the input 
files’ sliding windows.  But then we get back to the need for motivating 
examples.

With 1 GB of RAM and presumably some nonzero multiple of that for VM, the 
current Fossil diff algorithm only breaks down when you’re checking in diffs on 
files hundreds of megs in size, which raises the question, “Seriously?”

> In my opinion, the correct
> behavior might be that Fossil should be able to run even
> on the old Raspberry Pi 1, which has 512MiB RAM

It does.  One of my public Fossil projects is based on the Pi, and the Pi B+ 
remains a major development and deployment target.

It’s not surprising that it works well there, since the largest file in this 
project’s Fossil repository is 3.3 MB and the total repository size is 37 MB, 
so 512 MB of RAM and some amount of VM on top of that is plenty for this 
particular application.

I got all of that from my repository’s /stat page.  What does your repository’s 
/stat page show?
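
If you’d rather stay on the command line, something like this shows the same 
numbers.  (I believe Fossils of this vintage also have a dbstat command that 
prints them directly; check yours.)

    fossil ui        # then browse to http://localhost:8080/stat
    fossil dbstat    # repository statistics, same data as the /stat page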

> Fossil should just look at
> how much free RAM the computer has

Easier said than done, which is why the C Standard doesn’t have a way to get 
that number.

None of the answers to this similar question on Stack Overflow is entirely 
correct:

   https://stackoverflow.com/questions/2513505/how-to-get-available-memory-c-g

When it takes combining multiple answers, each correct only within a limited 
scope, to come up with a proper cross-platform solution, it’s a good bet that 
you’re chasing the wrong problem.

(The accepted answer to that question is arguably even wrong.  It’s certainly 
unsuitable for Fossil’s purposes.)

Even if Fossil were to mash up all of that advice into a solution that works 
everywhere, Windows, Linux, and the BSDs (including macOS, in this case) don’t 
all agree on what “free RAM” means.  The BSDs have the concept of “wired” 
memory, which doesn’t exist on the other two.  Windows has non-pageable RAM, 
which the other two don’t, etc.

Then you add in all the other random OSes Fossil runs on, and things get even 
more complicated.

> adjust its algorithm parameters accordingly.

Fossil makes some reasonable assumptions about its working environment: 
individual file sizes are a small fraction of available VM, the entire 
repository probably fits into VM, and if not, then it can at least get several 
artifacts into VM at once, etc.

If you’re dealing with artifact sizes a large fraction of the size of your 
virtual memory size, then you’re probably asking for trouble with Fossil.  
Fossil would have been designed differently if that were the common use case.


Re: [fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-22 Thread Richard Hipp
On 1/22/18, Martin Vahi  wrote:
>
> citation--start---
> Fossil internal error: out of memory
> citation--end-
>
> It happened during the execution of the
>
> fossil ci

Do you have a test case that we can use for debugging?

-- 
D. Richard Hipp
d...@sqlite.org


[fossil-users] Bug Report: Fossil 2.3 runs out of Memory During Check-in

2018-01-22 Thread Martin Vahi

citation--start---
Fossil internal error: out of memory
citation--end-

It happened during the execution of the

fossil ci

Given that Fossil had an opportunity to allocate
at least 1GiB of RAM without running out of RAM,
the issue must have something to do with the
algorithm. In my opinion, the correct
behavior might be that Fossil should be able to run even
on the old Raspberry Pi 1, which has 512MiB RAM
in total, and Fossil should just look at
how much free RAM the computer has and
adjust its algorithm parameters accordingly.
