Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Rich Freeman
On Wed, May 14, 2014 at 12:53 PM, Roy Bamford wrote: > What about not compressing files smaller than the filesystem block size > at all. In my case it's 4k. Any file gets allocated 4k on disc anyway, > so compression/decompression is just a waste of resources for files > <=4k. > > I'm not suggestin

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Roy Bamford
On 2014.05.12 10:35, Tom Wijsman wrote: > On Sun, 11 May 2014 19:46:50 +0200 > Michał Górny wrote: > > > Rationale: xz-utils is quite widespread nowadays and it is a part > > of @system set. It can achieve better compression ratio than bzip2, > > and faster decompression at the same time. > > So

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Ulrich Mueller
> On Wed, 14 May 2014, Andreas K Huettel wrote: > However, I'm not so happy with a "semi-random" compress/don't compress > decision for other files. Maybe some program expects a certain > filename to display a README? If there is a clear-cut decision, then > the code can be adapted, but if the p

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Andreas K. Huettel
On Tuesday, 13 May 2014, 15:42:11, Ulrich Mueller wrote: > > Compression for very small files was systematically studied by vapier > in bug 169260, which led to the current threshold of 128 bytes. Files > smaller than that "usually don't compress at all". > As long as this concerns manpages (

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread viv...@gmail.com
On 05/13/14 13:01, Andrew Savchenko wrote: > If we are trying to consider a majority of users (and thus to > select reasonable defaults), from a disk usage + decompression > overhead point of view it would be best to store compressed files > if they are at least one filesystem block smaller than or
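The "at least one filesystem block smaller" rule quoted above can be sketched as plain arithmetic. This is only an illustration, not anything Portage does: the function names and the 4096-byte default are assumptions, and filesystems with tail packing or inlining (ReiserFS, btrfs, as raised elsewhere in the thread) do not follow this model.

```python
def blocks_used(size_bytes: int, block_size: int = 4096) -> int:
    """Whole blocks needed for a file, assuming no tail packing or inlining."""
    # Ceiling division in pure integer arithmetic.
    return -(-size_bytes // block_size)


def worth_compressing(original_size: int, compressed_size: int,
                      block_size: int = 4096) -> bool:
    """True only if compression frees at least one whole block on disk."""
    return blocks_used(compressed_size, block_size) < blocks_used(original_size, block_size)


# A 1234-byte file compressed to 600 bytes still occupies one block: no gain.
print(worth_compressing(1234, 600))
# A 9000-byte file (3 blocks) compressed to 3000 bytes (1 block): real gain.
print(worth_compressing(9000, 3000))
```

Under this model any file at or below one block is never worth compressing, which is the point Roy Bamford makes for 4k blocks.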

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Kent Fredric
On 12 May 2014 21:35, Tom Wijsman wrote: > What about putting multiple doc / man / info files in a single .xz file > for each package? > How would one use them if they're installed as a single .xz file per package? Is there a trick that exists to allow this to even work for "man man" ? I'm gu

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Andrew Savchenko
On Tue, 13 May 2014 08:18:25 -0400 Rich Freeman wrote: > On Tue, May 13, 2014 at 7:01 AM, Andrew Savchenko wrote: > > > > If we are trying to consider all possible cases, some filesystems > > may benefit even from compression of very small files (e.g. from > > 140 to 100 bytes) due to packing of m

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Tom Wijsman
On Tue, 13 May 2014 06:08:52 +0400 Andrew Savchenko wrote: > 1. How are tools like man or info supposed to work with such a > bundle? They do not expect multiple man/info files in a > single xz bundle. Hmm, true; they would need to be adapted, which involves talking to upstream. Benchm

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Ulrich Mueller
> On Tue, 13 May 2014, Rich Freeman wrote: > Btrfs also supports file inlining, so every byte saved on small files > does actually help (I believe the data structure that stores the > inlined data doesn't have a fixed record size). Then again, btrfs > also supports lzo compression and I belie

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Rich Freeman
On Tue, May 13, 2014 at 7:01 AM, Andrew Savchenko wrote: > > If we are trying to consider all possible cases, some filesystems > may benefit even from compression of very small files (e.g. from > 140 to 100 bytes) due to packing of multiple small files in the > same inode. ReiserFS is a good examp

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Andrew Savchenko
On Tue, 13 May 2014 07:55:56 +0200 Ulrich Mueller wrote: > > On Tue, 13 May 2014, Andrew Savchenko wrote: > > > Please consider that by default du shows block size, not byte size. > > That means that if a file is actually 1234 bytes large, without -b it > > will still be accounted for 4096 bytes

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Ulrich Mueller
> On Tue, 13 May 2014, Andrew Savchenko wrote: > Please consider that by default du shows block size, not byte size. > That means that if a file is actually 1234 bytes large, without -b it > will still be accounted for 4096 bytes on a 4K-block filesystem. This raises another question, namely if f
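The difference between byte size and allocated size that du hides by default can be checked per file. A minimal sketch, assuming a Unix-like system where `st_blocks` counts 512-byte units (true on Linux, not guaranteed elsewhere); the helper name is made up for this example.

```python
import os
import tempfile


def apparent_and_allocated(path: str) -> tuple:
    """Return (byte size, bytes allocated on disk) for one file."""
    st = os.stat(path)
    # st_size is what `du -b` sums; st_blocks * 512 is what plain `du` sums
    # (assumption: 512-byte units, as on Linux).
    return st.st_size, st.st_blocks * 512


# Write a 1234-byte file and compare the two numbers.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1234)
    name = tmp.name

apparent, allocated = apparent_and_allocated(name)
print(apparent, allocated)
os.unlink(name)
```

The allocated figure depends on the filesystem, so it is worth comparing both numbers before and after recompression when quoting savings.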

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Andrew Savchenko
Hello, On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote: > On Sun, 11 May 2014 18:26:32 -0500 > Gordon Pettey wrote: > > > A lot of small files (e.g. AUTHORS, ChangeLog > > > > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and > > /usr/share/doc. A short script to decompre

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Andrew Savchenko
On Mon, 12 May 2014 11:35:00 +0200 Tom Wijsman wrote: > On Sun, 11 May 2014 19:46:50 +0200 > Michał Górny wrote: > > > Rationale: xz-utils is quite widespread nowadays and it is a > > part of @system set. It can achieve better compression ratio > > than bzip2, and faster decompression at the same

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Gordon Pettey
On Mon, May 12, 2014 at 5:47 AM, Alexander Tsoy wrote: > On Sun, 11 May 2014 18:26:32 -0500 > Gordon Pettey wrote: > > > A lot of small files (e.g. AUTHORS, ChangeLog > > > > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and > > /usr/share/doc. A short script to decompress those a

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Mon, 12 May 2014 14:17:11 +0200 Tom Wijsman wrote: > On Mon, 12 May 2014 14:47:36 +0400 > Alexander Tsoy wrote: > > > Here are my test results. xz options: "--lzma2=preset=6e,dict=4MiB". > > Larger dictionary size does not improve compression ratio, I get > > even worse results with just "-6e"

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote: > Here are my test results. xz options: "--lzma2=preset=6e,dict=4MiB". > Larger dictionary size does not improve compression ratio, I get > even worse results with just "-6e" or "-9e". man-bz2 is a full copy of > my /usr/share/man, man-xz is
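For anyone wanting to repeat the test without shelling out to xz, the quoted options map onto a filter chain in Python's standard `lzma` module. A rough sketch: the names `FILTERS` and `xz_compress` are invented here, and output sizes may differ slightly from the xz command line.

```python
import lzma

# Rough equivalent of: xz --lzma2=preset=6e,dict=4MiB
FILTERS = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 6 | lzma.PRESET_EXTREME,  # the "6e" preset
    "dict_size": 4 * 1024 * 1024,       # 4 MiB dictionary
}]


def xz_compress(data: bytes) -> bytes:
    """Compress into an .xz container using the filter chain above."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=FILTERS)


sample = b".TH EXAMPLE 1\n.SH NAME\nexample \\- a made-up man page\n" * 50
packed = xz_compress(sample)
print(len(sample), len(packed))
```

A smaller dictionary mainly reduces memory use during decompression; for man pages, which are far smaller than 4 MiB, it should not cost anything in ratio.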

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote: > On Sun, 11 May 2014 18:26:32 -0500 > Gordon Pettey wrote: > > > A lot of small files (e.g. AUTHORS, ChangeLog > > > > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and > > /usr/share/doc. A short script to decompress thos

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Sun, 11 May 2014 18:26:32 -0500 Gordon Pettey wrote: > A lot of small files (e.g. AUTHORS, ChangeLog > > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and > /usr/share/doc. A short script to decompress those and recompress with xz > -6e reduced that to 36M. Very strange o_O

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Mon, 12 May 2014 11:31:45 +0200 Marcin Mirosław wrote: > Imho there are no real advantages to changing the current compressor for man > files. It's insufficient to experiment on a single file to make such a claim, you may very well have found a file that works equally well with multiple compression algorit

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Sun, 11 May 2014 19:46:50 +0200 Michał Górny wrote: > Rationale: xz-utils is quite widespread nowadays and it is a part > of @system set. It can achieve better compression ratio than bzip2, > and faster decompression at the same time. Some thoughts: What about putting multiple doc / man / in

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Marcin Mirosław
On 11.05.2014 23:27, Pacho Ramos wrote: > On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote: >> Hello, developers. >> >> I'd like to raise the following item for discussion: making .xz >> the default compressor used by portage for documentation, man pages >> and info files. That is, t

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Samuli Suominen
On 11/05/14 20:46, Michał Górny wrote: > Hello, developers. > > I'd like to raise the following item for discussion: making .xz > the default compressor used by portage for documentation, man pages > and info files. That is, the equivalent of: > > PORTAGE_COMPRESS=xz > > in make.globals. > > I

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Gordon Pettey
A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A short script to decompress those and recompress with xz -6e reduced that to 36M. I don't have a comparison for individual file differences. I posted the short bash scr
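The bash script mentioned above is not reproduced in this listing. As a stand-in, here is a hypothetical Python sketch of the same kind of experiment: decompress every .bz2 under a directory, recompress with the equivalent of `xz -6e`, and report the byte totals. The function names are invented, the originals are left in place, and the totals are byte sizes, not allocated blocks.

```python
import bz2
import lzma
from pathlib import Path


def recompress_bz2_to_xz(bz2_path: Path) -> Path:
    """Write an .xz sibling next to a .bz2 file; the original is kept."""
    xz_path = bz2_path.with_suffix(".xz")  # foo.1.bz2 -> foo.1.xz
    data = bz2.decompress(bz2_path.read_bytes())
    xz_path.write_bytes(lzma.compress(data, preset=6 | lzma.PRESET_EXTREME))
    return xz_path


def recompress_tree(root: Path) -> tuple:
    """Return (total .bz2 bytes, total .xz bytes) for regular files under root."""
    before = after = 0
    for path in sorted(root.rglob("*.bz2")):
        if path.is_symlink() or not path.is_file():
            continue  # skip symlinked man pages and odd entries
        before += path.stat().st_size
        after += recompress_bz2_to_xz(path).stat().st_size
    return before, after
```

Running it on a copy of /usr/share/man rather than the live tree, e.g. `recompress_tree(Path("/tmp/man-copy"))` with a path of your choosing, keeps the experiment from disturbing installed files.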

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Pacho Ramos
On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote: > Hello, developers. > > I'd like to raise the following item for discussion: making .xz > the default compressor used by portage for documentation, man pages > and info files. That is, the equivalent of: > > PORTAGE_COMPRESS=xz > >

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Alexander Tsoy
On Sun, 11 May 2014 19:46:50 +0200 Michał Górny wrote: > Hello, developers. > > I'd like to raise the following item for discussion: making .xz > the default compressor used by portage for documentation, man pages > and info files. That is, the equivalent of: > > PORTAGE_COMPRESS=xz > > in ma