Bug#294929: rzip does not work for large files

2005-03-13 Thread Alec Berryman
Marc A. Lehmann  on 2005-02-13 08:12:48 +0100:

  The man page could be a little clearer.  If you would like to
  submit a diff, that would be great; otherwise, I'll take care of
  it as time permits.
 
 I'll put it on my todo list; don't wait for me, though, as I am
 pretty busy. But if a diff arrives and you haven't patched it yet,
 feel free.

Hi Marc,

I took a look at the man page today seeing how I would alter it, and
it seems I missed the place where it does talk about memory usage
during our initial conversation:

-0..9  Set the compression level from 0 to 9. The default is to use
   level 9, which is the slowest but gives the best compression
   rate.  The compression level is also strongly related to how
   much memory rzip uses, so if you are running rzip on a machine
   with limited amounts of memory then you will probably want to
   choose a level less than 9.

Do you feel like that should be expanded on?  I'm leaning towards 'no'
and beating myself over the head for not RTFM'ing closely enough :)

Alec




Bug#294929: rzip does not work for large files

2005-02-12 Thread Alec Berryman
root on 2005-02-12 14:25:47 +0100:

 Package: rzip
 Version: 2.0-2
 Severity: important
 
 
 Contrary to what the manpage claims, rzip does not work for large
 files, as it tries to mmap the whole file into memory:
 
 -rw-------  1 root root 842895360 Feb 12 12:45 backup.tar
 
 # strace rzip -9 backup.tar
 ...
 fstat64(3, {st_mode=S_IFREG|0600, st_size=842895360, ...}) = 0
 mmap2(NULL, 842895360, PROT_READ, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory)
 write(2, "Failed to map buffer in rzip_fd\n", 32Failed to map buffer in rzip_fd
 
 That is, rzip fails for most files that it is designed for.

I'm guessing that the problem is the amount of memory you have
available.  rzip will copy into memory up to 900MB of the file at a
time (see the man page, section COMPRESSION ALGORITHM); how much RAM
+ swap do you have available?




Bug#294929: rzip does not work for large files

2005-02-12 Thread pcg
On Sat, Feb 12, 2005 at 09:12:01AM -0500, Alec Berryman [EMAIL PROTECTED] 
wrote:
  That is, rzip fails for most files that it is designed for.
 
 I'm guessing that the problem is the amount of memory you have
 available.  rzip will copy into memory up to 900MB of the file at a
 time (see the man page, section COMPRESSION ALGORITHM); how much RAM

Hmm, nothing in the man page claims it's copying that much into memory, or
that it needs that much memory. It does refer to a 900MB history buffer,
so maybe that means it needs that much memory (in fact, strace does not
show that it tries to allocate that much memory; all it does is mmap the
file, but the same could be done by read()ing it).

If that is indeed the case, that explains the problem. It might help if
that gets documented more clearly in the manpage. I somehow doubt that it
needs 900MB of ram, though.

 + swap do you have available?

Well, way less :)

-- 
The choice of a
  -==- _GNU_
  ==-- _   generation Marc Lehmann
  ---==---(_)__  __   __  [EMAIL PROTECTED]
  --==---/ / _ \/ // /\ \/ /  http://schmorp.de/
  -=/_/_//_/\_,_/ /_/\_\  XX11-RIPE





Bug#294929: rzip does not work for large files

2005-02-12 Thread Alec Berryman
 Marc A. Lehmann  on 2005-02-12 19:23:55 +0100:

 Hmm, nothing in the man page claims it's copying that much into
 memory, or that it needs that much memory. It does refer to 900MB
 history buffer, 

For the history buffer to be effective, it must be held in memory as a
whole; it can't be loaded and discarded piecemeal.  For a similar
reason, rzip can't be used in a pipe.  The author does not believe
this can be easily coded around.

 so maybe that means it needs that much memory (in fact, strace does
 not show that it tries to allocate that much memory; all it does is
 mmap the file, but the same could be done by read()ing it).

As mentioned, the history buffer needs to be in memory; I don't see
the advantage in read()ing it instead of mmap()ing it, since the end
result is that up to 900MB of the file is in memory at the same time.
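
To make the mmap()/read() trade-off concrete, here is a minimal Python
sketch (not rzip's code) that maps a file one bounded window at a time
instead of mapping it whole. The function name iter_windows and the
window size are hypothetical. Note the cost discussed above: matches
that lie further back than one window can no longer be found, so this
limits peak memory only by shrinking the effective history.

```python
import mmap
import os


def iter_windows(path, window_size):
    """Yield (offset, data) pairs, mapping at most window_size bytes at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be a multiple of the allocation granularity,
    # so round the window down to a multiple of it (at least one unit).
    window_size = max(gran, (window_size // gran) * gran)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window_size, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                data = m[:]  # copy the window out before the map is closed
            yield offset, data
            offset += length
```

Each window is unmapped before the next one is mapped, so at most one
window's worth of the file is mapped at any time.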

 If that is indeed the case, that explains the problem. It might help
 if that gets documented more clearly in the manpage. I somehow doubt
 that it needs 900MB of ram, though.

It doesn't *really* need 900MB of RAM + swap, as I read the code; take
a look at rzip.c:581 if you're so inclined.  The minimum history
buffer size is 100MB, and each additional level of compression adds
another 100MB of memory.  In your original example, you used `rzip
-9`.  I am under the impression that this is how bzip2 works.
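
The arithmetic described above can be sketched in Python as follows.
This is one reading of the description, not a transcription of rzip.c:
the function names are hypothetical, level 0 is assumed to get the
100MB minimum, and 1MB is assumed to mean 2**20 bytes.

```python
MB = 1024 * 1024  # assumption: the source may use 10**6 instead


def history_buffer_bytes(level):
    """History buffer size for a compression level, per the description above."""
    if not 0 <= level <= 9:
        raise ValueError("compression level must be between 0 and 9")
    # Assumption: 100MB per level, with 100MB as the floor.
    return max(level, 1) * 100 * MB


def map_length(file_size, level):
    """Bytes that would be held in memory at once: the file, capped by the buffer."""
    return min(file_size, history_buffer_bytes(level))
```

Under these assumptions, map_length(842895360, 9) is the whole file,
which is consistent with the 842895360-byte mmap2() call in the
original report, while a lower level would cap the mapping at a
smaller size.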

The man page could be a little clearer.  If you would like to submit a
diff, that would be great; otherwise, I'll take care of it as time
permits.




Bug#294929: rzip does not work for large files

2005-02-12 Thread pcg
On Sat, Feb 12, 2005 at 07:10:55PM -0500, Alec Berryman [EMAIL PROTECTED] 
wrote:
  Marc A. Lehmann  on 2005-02-12 19:23:55 +0100:
 
  Hmm, nothing in the man page claims it's copying that much into
  memory, or that it needs that much memory. It does refer to 900MB
  history buffer, 
 
 For the history buffer to be effective,
   
BTW, the bug report should either (preferably) be closed or tagged wishlist.

  not show that it tries to allocate that much memory; all it does is
  mmap the file, but the same could be done by read()ing it).
 
 As mentioned, the history buffer needs to be in memory; I don't see
 the advantage in read()ing it instead of mmap()ing it, since the end
 result is that up to 900MB of the file is in memory at the same time.

Well, the advantage would be that it would run :)

Anyway, I looked at the source, and what it does is roughly this:

- hash by linearly reading through the file; if a possible match
  is found, look back and compare; if not, don't look back.
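
The scheme just described can be sketched in Python roughly like this.
It is an illustration only, not rzip's implementation: find_matches and
the block size are hypothetical, and a plain dict keyed on Python's
built-in hash() stands in for whatever hashing rzip actually uses.

```python
def find_matches(data, block=32):
    """Return (position, earlier_position, length) for repeats found via hashing."""
    seen = {}      # hash of a block -> earliest offset where it was seen
    matches = []
    pos = 0
    while pos + block <= len(data):
        chunk = data[pos:pos + block]
        key = hash(chunk)
        cand = seen.get(key)
        # Only look back into earlier data when the hash suggests a match.
        if cand is not None and data[cand:cand + block] == chunk:
            length = block
            # Extend the confirmed match as far as the bytes keep agreeing.
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            matches.append((pos, cand, length))
            pos += length
        else:
            seen.setdefault(key, pos)
            pos += 1
    return matches
```

The look-back step is the only place that touches old data, and it can
land anywhere in the history, which is why the earlier part of the file
has to stay accessible in some form.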

Anyway, it's clear that some work would be involved, so my only request
would be to document that more clearly. To me, there is a difference
between an algorithm that has an effective history buffer of 900MB and
an algorithm that needs 900MB of RAM :)

 It doesn't *really* need 900MB of RAM + swap, as I read the code; take
 a look at rzip.c:581 if you're so inclined.  The minimum history
 buffer size is 100MB, and each additional level of compression adds
 another 100MB of memory.  In your original example, you used `rzip
 -9`.  I am under the impression that this is how bzip2 works.

Oh, I assumed that would be the bzip2 compression level. Again, this could be
mentioned in the manpage.

 The man page could be a little clearer.  If you would like to submit a
 diff, that would be great; otherwise, I'll take care of it as time
 permits.

I'll put it on my todo list; don't wait for me, though, as I am pretty busy.
But if a diff arrives and you haven't patched it yet, feel free.

Thanks for the explanation and insights!


