Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread Ralf
Well, you could just save the generating algorithm. *scnr*

I think compressing pi is hardly possible, as the digits are
distributed pretty randomly.
But why do you want to compress? You can't work on compressed data.
And there are enough sites on the internet where you can get your
digits again.

Pi is not supposed to change over the years :-)

Cheers
  Ralf

On 31.10.2014 17:36, meino.cra...@gmx.de wrote:
  Hi,

  I have a lot of files with digits of PI. The digits
  are the characters 0-9. Currently they are ZIPped,
  which I think is not the best way to do that.

  I read of 7zip's PPMd, which compresses natural text
  quite well...but my files are not natural text (though
  they are not binary data either).

  What practical way of compression would compress the
  files (file by file) as much as possible?

  Thank you very much in advance for any help!

  Best regards,
  mcc







Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread meino.cramer
Ralf ralf+gen...@ramses-pyramidenbau.de [14-10-31 16:48]:
 Well, you could just save the generating algorithm. *scnr*
 
 I think compressing pi is hardly possible, as the digits are
 distributed pretty randomly.
 But why do you want to compress? You can't work on compressed data.
 And there are enough sites on the internet where you can get your
 digits again.
 
 Pi is not supposed to change over the years :-)
 
 Cheers
   Ralf
 
 On 31.10.2014 17:36, meino.cra...@gmx.de wrote:
   Hi,
 
   I have a lot of files with digits of PI. The digits
   are the characters 0-9. Currently they are ZIPped,
   which I think is not the best way to do that.
 
   I read of 7zip's PPMd, which compresses natural text
   quite well...but my files are not natural text (though
   they are not binary data either).
 
   What practical way of compression would compress the
   files (file by file) as much as possible?
 
   Thank you very much in advance for any help!
 
   Best regards,
   mcc
 
 
 
 
 
Hi Ralf,

I have a damn slow Internet connection, and searching through
millions of digits online is not always possible. Besides that: if I
want to do more with those digits, I have to download them again and
again. It's better to keep a copy of the 2014 version of PI on my
local hard disk for later reference.

I am currently checking the compression tools I know of for the
best compression ratio. But I will definitely miss those I don't
know...
And sometimes one can do magic with options and switches of such
tools that I also don't know of.

If someone has suggestions...always appreciated! :)

Best regards,
mcc





Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread Helmut Jarausch

On 10/31/2014 04:59:17 PM, meino.cra...@gmx.de wrote:

If someone has suggestions...always appreciated! :)


It's best to ask on the newsgroup comp.compression.
There are top international specialists there.

Helmut




Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread Rich Freeman
On Fri, Oct 31, 2014 at 11:59 AM,  meino.cra...@gmx.de wrote:
 I am currently checking the compression tools I know of for the
 best compression ratio. But I will definitely miss those I don't
 know...
 And sometimes one can do magic with options and switches of such
 tools that I also don't know of.

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc.  You'll definitely benefit from compression though
- your text files full of digits are encoding only about 3.3 bits of
information (log2(10) ~ 3.32) in each 8-bit ASCII character, and even
if the order of digits in pi can be treated as purely random, just
about any compression algorithm is going to get pretty close to that
3.3 bits per digit figure.
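
A rough way to check that floor for a concrete file (just a sketch;
pi_digits.txt is a made-up file name here, and bc is assumed to be
installed):

# ideal size in bytes = number_of_digits * log2(10) / 8, newlines ignored
digits=$(tr -d '\n' < pi_digits.txt | wc -c)
echo "scale=1; $digits * 3.32 / 8" | bc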

--
Rich



Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread David Haller
Hello,

On Fri, 31 Oct 2014, Rich Freeman wrote:
On Fri, Oct 31, 2014 at 11:59 AM,  meino.cra...@gmx.de wrote:
 I am currently checking the compression tools I know of for the
 best compression ratio. But I will definitely miss those I don't
 know...
 And sometimes one can do magic with options and switches of such
 tools that I also don't know of.

With 100k pseudo-random digits from bash's $RANDOM % 10 and a
linebreak every 100 digits (in t.lst) I get this (each with --best /
-9 / -m5 (rar) compression-level option):

$ du -b * | sort -rn
101000  t.lst
61544   t.lzop
50733   t.zoo
49696   t.zip
49609   t.lha
49554   t.gz
48907   t.Z
44942   t.rar
44661   t.rzip
44638   t.7z
44592   t.xz
44572   t.bz2
44546   t.lzma
44543   t.lzip
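
(The test file and the gzip/bzip2/xz/lzip numbers above can be
reproduced roughly as follows; just a sketch that assumes bash and
those four compressors are installed, and that leaves out
zoo/lha/rar/zip and friends:)

# 100k pseudo-random digits, 100 per line, as described above
( for ((i = 1; i <= 100000; i++)); do
    printf '%d' $((RANDOM % 10))
    (( i % 100 == 0 )) && echo
  done ) > t.lst
# compress with each tool at its highest level, keeping t.lst around
for c in gzip bzip2 xz lzip; do "$c" -9 -c t.lst > "t.$c.out"; done
du -b t.lst t.*.out | sort -rn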

What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;) The above is probably a fairly comprehensive
list, and except for .Z, .gz and .bz2, all suffixes are named after
the binaries used to create them.

I'd use bzip2/xz/lz, as there are e.g. [blx]z(e)(grep|cat|less) but
no 7zgrep, and I guess those tools can ease access to the archives
quite a bit.

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc.  You'll definitely benefit from compression though
- your text files full of digits are encoding only about 3.3 bits of
information (log2(10) ~ 3.32) in each 8-bit ASCII character, and even
if the order of digits in pi can be treated as purely random, just
about any compression algorithm is going to get pretty close to that
3.3 bits per digit figure.

Good estimate:

$ calc '101000/(8/3.3)'
41662.5
and I get, from lzip:
$ calc 44543*8/101000
3.528...  (bits/digit)
to zip:
$ calc 49696*8/101000
~3.93   (bits/digit)

HTH,
-dnh

-- 
Q: Hobbies?
A: Hating music.   -- Marvin



Re: [gentoo-user] OT Best way to compress files with digits

2014-10-31 Thread Rich Freeman
On Fri, Oct 31, 2014 at 2:55 PM, David Haller gen...@dhaller.de wrote:

 On Fri, 31 Oct 2014, Rich Freeman wrote:

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc.  You'll definitely benefit from compression though
- your text files full of digits are encoding only about 3.3 bits of
information (log2(10) ~ 3.32) in each 8-bit ASCII character, and even
if the order of digits in pi can be treated as purely random, just
about any compression algorithm is going to get pretty close to that
3.3 bits per digit figure.

 Good estimate:

 $ calc '101000/(8/3.3)'
 41662.5
 and I get, from lzip:
 $ calc 44543*8/101000
 3.528...  (bits/digit)
 to zip:
 $ calc 49696*8/101000
 ~3.93   (bits/digit)

Actually, I'm surprised how far off of this the various methods are.
I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible set of 96
digits into a 40-byte code (that is just a straight decimal-to-binary
conversion; 10^96 just fits into 320 bits).  Then read one such
96-digit word at a time and translate it.  This will only waste about
0.011 bits per digit (320/96 ~ 3.333 versus log2(10) ~ 3.322).
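
Something along these lines would do the packing (only a sketch; it
assumes a plain digits file pi_digits.txt and that bc, fold and xxd
are available, and it simply skips a short trailing chunk):

# strip newlines, cut into 96-digit chunks, pack each chunk into 40 bytes
tr -d '\n' < pi_digits.txt | fold -w 96 | while read -r chunk; do
  [ ${#chunk} -eq 96 ] || continue              # ignore a short last chunk
  hex=$(echo "obase=16; $chunk" | bc | tr -d '\\\n')  # decimal -> hex; bc wraps long output
  printf '%80s' "$hex" | tr ' ' 0 | xxd -r -p   # left-pad to 40 bytes, emit raw binary
done > pi_packed.bin

Decoding is just the reverse: read 40 bytes at a time and print each
value as a zero-padded 96-digit decimal number.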

--
Rich