Re: [HACKERS] Compression and on-disk sorting

Tom Lane Wed, 17 May 2006 12:31:57 -0700

Greg Stark <[EMAIL PROTECTED]> writes:
> The ideal way to handle the situation you're describing would be to interleave
> the tuples so that you have all 1000 values of the first column, followed by
> all 1000 values of the second column and so on. Then you run a generic
> algorithm on this and it achieves very high compression rates since there are
> a lot of repeating patterns.


It's not obvious to me that that yields a form more compressible than
what we have now.  As long as the previous value is within the lookback
window, an LZ-style compressor will still be able to use it.  More
importantly, the layout you describe would be unable to take advantage
of any cross-column correlation, which in real data is likely to be a
useful property for compression.
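
One rough way to check this empirically is to compress the same tuples in both layouts and compare the sizes. The sketch below is only an illustration, not anything from the PostgreSQL source: it uses Python's zlib (DEFLATE, an LZ77-style compressor with a 32 KB window) as a stand-in for a generic compressor, and the three-column data set, in which column b is derived from column a, is a made-up assumption chosen to exhibit cross-column correlation. The helper names are hypothetical.

```python
import random
import struct
import zlib


def make_rows(n=1000, seed=0):
    """Build n synthetic 3-column tuples (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        a = rng.randrange(50)   # low-cardinality column
        b = a * 7 + 3           # fully determined by a: cross-column correlation
        c = i                   # monotonically increasing id
        rows.append((a, b, c))
    return rows


def row_major(rows):
    """Serialize tuple by tuple: a0 b0 c0 a1 b1 c1 ..."""
    return b"".join(struct.pack("<iii", *r) for r in rows)


def column_major(rows):
    """Serialize column by column: a0 a1 ... b0 b1 ... c0 c1 ..."""
    cols = zip(*rows)
    return b"".join(struct.pack("<%di" % len(rows), *col) for col in cols)


def compressed_size(data, level=6):
    """Size in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    rows = make_rows()
    rm = row_major(rows)
    cm = column_major(rows)
    print("row-major:   ", len(rm), "->", compressed_size(rm))
    print("column-major:", len(cm), "->", compressed_size(cm))
```

Which layout comes out smaller depends heavily on the data and on the compressor's window size, so the numbers from a synthetic set like this should be read as a probe of the argument rather than as evidence about real tables.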

                        regards, tom lane

