On 12/19/2013 10:44 AM, Heikki Linnakangas wrote:
On 12/19/2013 08:37 AM, Oleg Bartunov wrote:
> Before digging deep into the art of the comp/decomp world, I'd like to
> know if you are familiar with the results of the
> http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper, and
> with any newer research.

Yeah, I saw that paper.

> Do we agree on what we really want? Basically, there are three main
> features - size, compression speed, and decompression speed - and we
> should pick two :)

According to that Zhang et al paper you linked, the Vbyte method
actually performs the worst on all of those measures. The other
algorithms produce quite similar sizes, with PForDelta being the most
space-efficient, and PForDelta is also significantly faster to compress
and decompress than the rest.

Just by looking at those numbers, PForDelta looks like a clear winner.
However, it operates on much bigger batches than the other algorithms; I
haven't looked at it in detail, but Zhang et al used 128-integer
batches, and they say that 32 integers is the minimum batch size. If we
want to use it for the inline posting lists stored in entry tuples, it
would be quite wasteful when there are only a few item pointers on the
tuple.
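
To give an idea of why the patched schemes compress well: PForDelta
picks one bit width b for a whole batch and stores the few values that
don't fit in b bits separately, as exceptions. A toy cost model (my own
simplification, assuming a flat 32 bits per exception and ignoring any
header overhead) shows the trade-off:

```c
/* Toy PFor-style cost model, not the actual PForDelta format: size of a
 * batch is n values at b bits each, plus 32 bits for every value that
 * doesn't fit in b bits ("exception"). Return the minimum over b. */
int
pfor_batch_bits(const unsigned int *v, int n)
{
	int			best = -1;
	int			b;

	for (b = 1; b <= 32; b++)
	{
		int			exceptions = 0;
		int			total;
		int			i;

		for (i = 0; i < n; i++)
		{
			if (b < 32 && v[i] >= (1u << b))
				exceptions++;
		}
		total = n * b + 32 * exceptions;
		if (best < 0 || total < best)
			best = total;
	}
	return best;
}
```

With 32 unit deltas the whole batch costs only 32 bits, and a single
large outlier adds just one 32-bit exception instead of forcing every
slot to a wider bit width.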

Also, in the tests I've run, compression/decompression speed is not a
significant factor in overall performance, with either varbyte encoding
or the Simple9-like encoding I hacked together.
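
For reference, varbyte coding stores 7 data bits per byte, with the high
bit as a continuation flag; item pointers are stored as deltas, so most
values fit in a single byte. A minimal sketch (not the code from the
patch):

```c
/* Encode one integer, 7 bits per byte, high bit = "more bytes follow".
 * Returns the number of bytes written (at most 5 for a 32-bit value). */
int
varbyte_encode(unsigned int v, unsigned char *buf)
{
	int			n = 0;

	while (v >= 0x80)
	{
		buf[n++] = (unsigned char) (v & 0x7F) | 0x80;
		v >>= 7;
	}
	buf[n++] = (unsigned char) v;
	return n;
}

/* Decode one integer; returns the number of bytes consumed. */
int
varbyte_decode(const unsigned char *buf, unsigned int *v)
{
	int			n = 0;
	int			shift = 0;

	*v = 0;
	do
	{
		*v |= (unsigned int) (buf[n] & 0x7F) << shift;
		shift += 7;
	} while (buf[n++] & 0x80);
	return n;
}
```

One nice property for vacuum: a merged delta d1+d2 never needs more
bytes than d1 and d2 encoded separately, so deleting an item pointer can
never make a varbyte-encoded list longer.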

One disadvantage of Simple9 and other encodings that operate on batches
is that removing a value from the middle can increase the number of
bytes required for the remaining values. That's a problem during
vacuum: it's possible that after vacuuming away one item pointer, the
remaining items no longer fit on the page.
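
The effect is easy to demonstrate with a toy Simple9 word counter (a
simplified greedy packer of my own, not the encoding from my patch):
each 32-bit word holds 28 payload bits in one of nine (count x width)
layouts, and removing one item pointer merges two adjacent deltas, which
can force the list into less dense layouts:

```c
/* Simple9 layouts: 28x1-bit, 14x2-bit, ... 1x28-bit per 32-bit word. */
static const int s9_count[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
static const int s9_bits[9] = {1, 2, 3, 4, 5, 7, 9, 14, 28};

/* Count the 32-bit words a greedy Simple9 packer needs for n deltas.
 * Returns -1 if some delta doesn't fit in 28 bits. */
int
simple9_words(const unsigned int *deltas, int n)
{
	int			i = 0;
	int			words = 0;

	while (i < n)
	{
		int			s;

		for (s = 0; s < 9; s++)
		{
			int			take = s9_count[s] < n - i ? s9_count[s] : n - i;
			int			fits = 1;
			int			j;

			for (j = 0; j < take; j++)
			{
				if (deltas[i + j] >= (1u << s9_bits[s]))
				{
					fits = 0;
					break;
				}
			}
			if (fits)
			{
				i += take;		/* a short final word is padded with zeros */
				words++;
				break;
			}
		}
		if (s == 9)
			return -1;
	}
	return words;
}
```

28 consecutive items (unit deltas) pack into a single word; vacuum away
one of them and the 27 remaining deltas, one of which is now 2, need two
words.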

Sent via pgsql-hackers mailing list