Re: [gentoo-user] OT Best way to compress files with digits
Well, you could just save the generating algorithm. *scnr*

I think compressing pi is hardly possible, as the digits are distributed
pretty randomly. But why do you want to compress at all? You can't work on
compressed data, and there are enough sites on the Internet where you can
get your digits again. Pi is not supposed to change over the years :-)

Cheers
Ralf

On 31.10.2014 17:36, meino.cra...@gmx.de wrote:
> Hi,
>
> I have a lot of files with digits of PI. The digits are the characters
> 0-9. Currently they are ZIPped, which I think is not the best way to do
> that. I read of 7zip's PPMd, which compresses natural text quite
> well... but my files are not natural text (though they are also not
> binary data).
>
> With what practical method of compression is it possible to compress
> the files (file by file) as much as possible?
>
> Thank you very much in advance for any help!
>
> Best regards,
> mcc
Re: [gentoo-user] OT Best way to compress files with digits
Ralf ralf+gen...@ramses-pyramidenbau.de [14-10-31 16:48]:
> Well, you could just save the generating algorithm. *scnr*
>
> I think compressing pi is hardly possible, as the digits are distributed
> pretty randomly. But why do you want to compress at all? You can't work
> on compressed data, and there are enough sites on the Internet where you
> can get your digits again. Pi is not supposed to change over the
> years :-)
>
> Cheers
> Ralf
>
> On 31.10.2014 17:36, meino.cra...@gmx.de wrote:
>> Hi,
>>
>> I have a lot of files with digits of PI. The digits are the characters
>> 0-9. Currently they are ZIPped, which I think is not the best way to
>> do that. I read of 7zip's PPMd, which compresses natural text quite
>> well... but my files are not natural text (though they are also not
>> binary data).
>>
>> With what practical method of compression is it possible to compress
>> the files (file by file) as much as possible?
>>
>> Thank you very much in advance for any help!
>>
>> Best regards,
>> mcc

Hi Ralf,

I have a damn slow Internet connection, and searching through millions of
digits is not always possible. Besides, since I want to do more with those
digits, I would have to download them again and again. It's better to keep
a local copy of the 2014 version of PI on my hard disk for later
reference.

I am currently checking the compression tools I know of for the best
compression ratio. But I will definitely miss those I don't know... And
sometimes one can do magic with options and switches of such tools that I
also don't know of.

If someone has suggestions -- always appreciated! :)

Best regards,
mcc
Re: [gentoo-user] OT Best way to compress files with digits
On 10/31/2014 04:59:17 PM, meino.cra...@gmx.de wrote:
> If someone has suggestions -- always appreciated! :)

It's best to ask on the newsgroup comp.compression. There are top
international specialists there.

Helmut
Re: [gentoo-user] OT Best way to compress files with digits
On Fri, Oct 31, 2014 at 11:59 AM, meino.cra...@gmx.de wrote:
> I am currently checking the compression tools I know of for the best
> compression ratio. But I will definitely miss those I don't know...
> And sometimes one can do magic with options and switches of such tools
> that I also don't know of.

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc. You'll definitely benefit from compression, though:
your text files full of digits encode 3.3 bits of information in an 8-bit
ASCII character, and even if the order of the digits of pi can be treated
as purely random, just about any compression algorithm is going to get
pretty close to that 3.3 bits per digit figure.

--
Rich
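[Editor's note: a minimal sketch, not from the thread, of where the 3.3
bits/digit figure comes from: a uniformly random decimal digit carries
log2(10) bits of entropy, while ASCII storage spends 8 bits on it.]

```python
import math

# A uniformly random decimal digit carries log2(10) bits of information.
bits_per_digit = math.log2(10)
print(f"{bits_per_digit:.4f} bits/digit")  # 3.3219 bits/digit

# Stored as one 8-bit ASCII byte per digit, the best achievable
# compression ratio for "random" digits is therefore:
print(f"best ratio: {bits_per_digit / 8:.3f}")  # best ratio: 0.415
```

So an ideal coder would shrink such a file to about 41.5% of its
original size, which is the baseline the replies below measure against.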
Re: [gentoo-user] OT Best way to compress files with digits
Hello,

On Fri, 31 Oct 2014, Rich Freeman wrote:
> On Fri, Oct 31, 2014 at 11:59 AM, meino.cra...@gmx.de wrote:
>> I am currently checking the compression tools I know of for the best
>> compression ratio. But I will definitely miss those I don't know...
>> And sometimes one can do magic with options and switches of such
>> tools that I also don't know of.

With 100k pseudo-random digits from bash's $RANDOM % 10 and a linebreak
every 100 digits (in t.lst) I get this (each with the --best / -9 /
-m5 (rar) compression-level option):

$ du -b * | sort -rn
101000  t.lst
 61544  t.lzop
 50733  t.zoo
 49696  t.zip
 49609  t.lha
 49554  t.gz
 48907  t.Z
 44942  t.rar
 44661  t.rzip
 44638  t.7z
 44592  t.xz
 44572  t.bz2
 44546  t.lzma
 44543  t.lzip

What I find remarkable is that both gzip and good old compress (.Z) are
rather good ;) The above is probably a quite comprehensive list, and
except for .Z, .gz and .bz2, all files are named after the binaries used
to create them. I'd use bzip2/xz/lz, as there are e.g.
[blx]z(e)(grep|cat|less) but no 7zgrep, and I guess those can ease
access to the archives quite a bit.

> I can't imagine that any tool will do much better than something like
> lzo, gzip, xz, etc. You'll definitely benefit from compression though -
> your text files full of digits are encoding 3.3 bits of information in
> an 8-bit ascii character and even if the order of digits in pi can be
> treated as purely random just about any compression algorithm is going
> to get pretty close to that 3.3 bits per digit figure.

Good estimate:

$ calc '101000/(8/3.3)'
        41662.5

and I get from lzip

$ calc '44543*8/101000'
        ~3.528 (bits/digit)

to zip

$ calc '49696*8/101000'
        ~3.93 (bits/digit)

HTH,
-dnh

--
Q: Hobbies?
A: Hating music.        -- Marvin
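[Editor's note: a quick cross-check, not from the thread, re-deriving
David's bits-per-digit figures from the sizes in his table.]

```python
# t.lst is 100k digits plus 1000 newline bytes; David divides by the
# full 101000-byte file size, so these numbers match his calc output.
n_bytes = 101000

# Ideal size at ~3.3 bits per 8-bit byte:
print(round(n_bytes / (8 / 3.3), 1))  # 41662.5

# Achieved bits per byte for the best (lzip) and a weak (zip) archiver:
for name, size in [("lzip", 44543), ("zip", 49696)]:
    print(name, round(size * 8 / n_bytes, 3))  # lzip 3.528, zip 3.936
```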
Re: [gentoo-user] OT Best way to compress files with digits
On Fri, Oct 31, 2014 at 2:55 PM, David Haller gen...@dhaller.de wrote:
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>> I can't imagine that any tool will do much better than something like
>> lzo, gzip, xz, etc. You'll definitely benefit from compression though -
>> your text files full of digits are encoding 3.3 bits of information in
>> an 8-bit ascii character and even if the order of digits in pi can be
>> treated as purely random just about any compression algorithm is going
>> to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
>         41662.5
>
> and I get from lzip
>
> $ calc '44543*8/101000'
>         ~3.528 (bits/digit)
>
> to zip
>
> $ calc '49696*8/101000'
>         ~3.93 (bits/digit)

Actually, I'm surprised how far off this the various methods are. I was
expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible run of 96
digits into a 40-byte code (that is just a straight decimal-to-binary
conversion), then read a word at a time and translate it. This wastes
only 0.011 bits per digit.

--
Rich
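[Editor's note: a hypothetical sketch of Rich's scheme, not code from the
thread. It works because 10**96 < 2**320, so any run of 96 decimal digits
fits into exactly 320 bits = 40 bytes.]

```python
def pack_digits(digits: str) -> bytes:
    """Pack decimal digits, 96 at a time, into 40-byte binary words.

    A straight decimal-to-binary conversion: 10**96 < 2**320, so 96
    digits always fit in 40 bytes. Assumes len(digits) is a multiple
    of 96 for simplicity (a real tool would pad the final chunk).
    """
    out = bytearray()
    for i in range(0, len(digits), 96):
        out += int(digits[i:i + 96]).to_bytes(40, "big")
    return bytes(out)


def unpack_digits(packed: bytes) -> str:
    """Inverse of pack_digits: recover the original digit string."""
    chunks = []
    for i in range(0, len(packed), 40):
        n = int.from_bytes(packed[i:i + 40], "big")
        chunks.append(str(n).zfill(96))  # restore leading zeros
    return "".join(chunks)


# 40 bytes per 96 digits is 320/96 ~= 3.333 bits/digit, about 0.011
# bits/digit above the log2(10) ~= 3.322 entropy floor Rich mentions.
print(320 / 96)
```

This beats the general-purpose compressors in David's table (3.528
bits/digit at best) because it exploits the known alphabet instead of
having to model it.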