Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread viv...@gmail.com
On 05/13/14 13:01, Andrew Savchenko wrote: If we are trying to consider a majority of users (and thus to select reasonable defaults), from a disk usage + decompression overhead point of view it will be best to store compressed files if they are at least one filesystem block smaller than

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Andreas K. Huettel
On Tuesday, 13 May 2014, 15:42:11, Ulrich Mueller wrote: Compression for very small files was systematically studied by vapier in bug 169260, which led to the current threshold of 128 bytes. Files smaller than that usually don't compress at all. As long as this concerns manpages (where
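The point about very small files can be checked with a minimal sketch. It assumes Python's standard bz2 and lzma modules as stand-ins for the command-line tools; the helper names compressed_sizes and worth_compressing are hypothetical and are not part of portage. The 128-byte default is the threshold quoted above.

```python
import bz2
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the raw size and the bzip2/xz output sizes for data."""
    return {
        "raw": len(data),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
    }


def worth_compressing(data: bytes, threshold: int = 128) -> bool:
    """Hypothetical size check: skip anything below the threshold."""
    return len(data) >= threshold


if __name__ == "__main__":
    tiny = bytes((i * 97 + 13) % 251 for i in range(90))  # 90 distinct bytes
    print(compressed_sizes(tiny), worth_compressing(tiny))
```

Both container formats carry fixed headers and checksums, so for inputs this small the output tends to be larger than the input.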

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Ulrich Mueller
On Wed, 14 May 2014, Andreas K Huettel wrote: However, I'm not so happy with a semi-random compress/don't compress decision for other files. Maybe some program expects a certain filename to display a README? If there is a clear-cut decision, then the code can be adapted, but if the portage

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Roy Bamford
On 2014.05.12 10:35, Tom Wijsman wrote: On Sun, 11 May 2014 19:46:50 +0200 Michał Górny mgo...@gentoo.org wrote: Rationale: xz-utils is quite widespread nowadays and it is a part of @system set. It can achieve better compression ratio than bzip2, and faster decompression at the same

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-14 Thread Rich Freeman
On Wed, May 14, 2014 at 12:53 PM, Roy Bamford neddyseag...@gentoo.org wrote: What about not compressing files smaller than the filesystem block size at all? In my case it's 4k. Any file gets allocated 4k on disc anyway, so compression/decompression is just a waste of resources for files <=4k.
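The block-size argument reduces to simple arithmetic, sketched below. The function names are hypothetical, and the model assumes a filesystem that allocates whole blocks per file; it deliberately ignores inlining and tail packing (btrfs, ReiserFS), which other messages in this thread bring up.

```python
import math


def blocks_used(size: int, block_size: int = 4096) -> int:
    """Whole blocks occupied by a file of `size` bytes (no inlining assumed)."""
    return math.ceil(size / block_size)


def saves_a_block(raw_size: int, compressed_size: int,
                  block_size: int = 4096) -> bool:
    """True only if compression frees at least one whole block."""
    return blocks_used(compressed_size, block_size) < blocks_used(raw_size, block_size)
```

Under this model a 3000-byte file compressed to 1000 bytes still occupies one 4k block, so nothing is gained on disk, while a 9000-byte file compressed to 4000 bytes drops from three blocks to one.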

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Andrew Savchenko
On Tue, 13 May 2014 07:55:56 +0200 Ulrich Mueller wrote: On Tue, 13 May 2014, Andrew Savchenko wrote: Please consider that by default du shows block size, not byte size. That means that if a file is actually 1234 bytes large, without -b it will still be accounted for 4096 bytes on 4K-block
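The difference between byte size and allocated size can also be read straight from stat, as in this rough sketch. It assumes Linux, where st_blocks counts 512-byte units; apparent_and_allocated and demo are hypothetical helper names. The allocated figure depends on the filesystem, so no particular value is implied.

```python
import os
import tempfile


def apparent_and_allocated(path: str) -> tuple:
    """Return (byte size, allocated bytes) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512


def demo(size: int = 1234) -> tuple:
    """Write `size` bytes to a temporary file and report both figures."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
        tmp.flush()
        os.fsync(tmp.fileno())
        path = tmp.name
    try:
        return apparent_and_allocated(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The first number corresponds to what du -b reports with GNU du, the second to its default block-based accounting.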

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Rich Freeman
On Tue, May 13, 2014 at 7:01 AM, Andrew Savchenko birc...@gmail.com wrote: If we are trying to consider all possible cases, some filesystems may benefit even from compression of very small files (e.g. from 140 to 100 bytes) due to packing of multiple small files in the same inode. ReiserFS is

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Ulrich Mueller
On Tue, 13 May 2014, Rich Freeman wrote: Btrfs also supports file inlining, so every byte saved on small files does actually help (I believe the data structure that stores the inlined data doesn't have a fixed record size). Then again, btrfs also supports lzo compression and I believe this

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Tom Wijsman
On Tue, 13 May 2014 06:08:52 +0400 Andrew Savchenko birc...@gmail.com wrote: 1. How are tools like man or info supposed to work with such a bundle? They do not expect to find multiple man/info files in a single xz bundle. Hmm, true; they would need to be adapted, which involves talking to

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Andrew Savchenko
On Tue, 13 May 2014 08:18:25 -0400 Rich Freeman wrote: On Tue, May 13, 2014 at 7:01 AM, Andrew Savchenko birc...@gmail.com wrote: If we are trying to consider all possible cases, some filesystems may benefit even from compression of very small files (e.g. from 140 to 100 bytes) due to

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-13 Thread Kent Fredric
On 12 May 2014 21:35, Tom Wijsman tom...@gentoo.org wrote: What about putting multiple doc / man / info files in a single .xz file for each package? How would one use them if they're installed as a single .xz file per package? Is there a trick that exists to allow this to even work for man

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Marcin Mirosław
On 11.05.2014 23:27, Pacho Ramos wrote: On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote: Hello, developers. I'd like to raise the following item for discussion: making .xz the default compressor used by portage for documentation, man pages and info files. That is, the

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Sun, 11 May 2014 19:46:50 +0200 Michał Górny mgo...@gentoo.org wrote: Rationale: xz-utils is quite widespread nowadays and it is a part of @system set. It can achieve better compression ratio than bzip2, and faster decompression at the same time. Some thoughts: What about putting multiple

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Mon, 12 May 2014 11:31:45 +0200 Marcin Mirosław mar...@mejor.pl wrote: Imho there are no real advantages to changing the current compressor for man files. It's insufficient to experiment on a single file to make such a claim, you may very well have found a file that works equally well with multiple

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Sun, 11 May 2014 18:26:32 -0500 Gordon Pettey petteyg...@gmail.com wrote: A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A short script to decompress those and recompress with xz -6e reduced that to 36M.

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy alexan...@tsoy.me wrote: On Sun, 11 May 2014 18:26:32 -0500 Gordon Pettey petteyg...@gmail.com wrote: A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Tom Wijsman
On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy alexan...@tsoy.me wrote: Here are my test results. xz options: --lzma2=preset=6e,dict=4MiB. A larger dictionary size does not improve the compression ratio; I get even worse results with just -6e or -9e. man-bz2 is a full copy of my /usr/share/man,
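For anyone wanting to reproduce a comparison along these lines without shelling out to xz, the quoted filter chain (preset 6e with a 4 MiB dictionary) can be approximated with Python's lzma module. This is only a sketch under that assumption: xz_compress and bz2_compress are hypothetical helper names, and the numbers it yields are not the poster's results.

```python
import bz2
import lzma

# Approximation of --lzma2=preset=6e,dict=4MiB as a filter chain.
FILTERS = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 6 | lzma.PRESET_EXTREME,
    "dict_size": 4 * 1024 * 1024,
}]


def xz_compress(data: bytes) -> bytes:
    """Compress data into an .xz stream using the filter chain above."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=FILTERS)


def bz2_compress(data: bytes) -> bytes:
    """Compress data with bzip2 at its default (maximum) level."""
    return bz2.compress(data)
```

Feeding both helpers the decompressed contents of each file under /usr/share/man and summing the output lengths would give per-compressor totals for a given system.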

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Alexander Tsoy
On Mon, 12 May 2014 14:17:11 +0200 Tom Wijsman tom...@gentoo.org wrote: On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy alexan...@tsoy.me wrote: Here are my test results. xz options: --lzma2=preset=6e,dict=4MiB. A larger dictionary size does not improve the compression ratio; I get even worse

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Gordon Pettey
On Mon, May 12, 2014 at 5:47 AM, Alexander Tsoy alexan...@tsoy.me wrote: On Sun, 11 May 2014 18:26:32 -0500 Gordon Pettey petteyg...@gmail.com wrote: A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A short

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Andrew Savchenko
On Mon, 12 May 2014 11:35:00 +0200 Tom Wijsman wrote: On Sun, 11 May 2014 19:46:50 +0200 Michał Górny mgo...@gentoo.org wrote: Rationale: xz-utils is quite widespread nowadays and it is a part of @system set. It can achieve better compression ratio than bzip2, and faster decompression at

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Andrew Savchenko
Hello, On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote: On Sun, 11 May 2014 18:26:32 -0500 Gordon Pettey petteyg...@gmail.com wrote: A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A short script

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-12 Thread Ulrich Mueller
On Tue, 13 May 2014, Andrew Savchenko wrote: Please consider that by default du shows block size, not byte size. That means that if a file is actually 1234 bytes large, without -b it will still be accounted for 4096 bytes on a 4K-block filesystem. This raises another question, namely if files

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Alexander Tsoy
On Sun, 11 May 2014 19:46:50 +0200 Michał Górny mgo...@gentoo.org wrote: Hello, developers. I'd like to raise the following item for discussion: making .xz the default compressor used by portage for documentation, man pages and info files. That is, the equivalent of: PORTAGE_COMPRESS=xz

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Gordon Pettey
A lot of small files (e.g. AUTHORS, ChangeLog FWIW: On my system, I have 59M of bz2 files in /usr/share/man and /usr/share/doc. A short script to decompress those and recompress with xz -6e reduced that to 36M. I don't have a comparison for individual file differences. I posted the short bash

Re: [gentoo-dev] RFC: using .xz for doc/man/info compression

2014-05-11 Thread Samuli Suominen
On 11/05/14 20:46, Michał Górny wrote: Hello, developers. I'd like to raise the following item for discussion: making .xz the default compressor used by portage for documentation, man pages and info files. That is, the equivalent of: PORTAGE_COMPRESS=xz in make.globals. I like it,