Re: [HACKERS] Proposal: custom compression methods

Tomas Vondra Wed, 16 Dec 2015 04:18:40 -0800

Hi,

On 12/14/2015 12:51 PM, Simon Riggs wrote:

> On 13 December 2015 at 17:28, Alexander Korotkov
> <[email protected] <mailto:[email protected]>> wrote:
>
>> it would be nice to make compression methods pluggable.
>
> Agreed.
>
> My thinking is that this should be combined with work to make use of
> the compressed data, which is why Alvaro, Tomas, David have been
> working on Col Store API for about 18 months and work on that
> continues with more submissions for 9.6 due.

I'm not sure it makes sense to combine those two uses of compression, because there are various differences, some subtle, some less subtle. It's a bit difficult to discuss this without any column store background, but I'll try anyway.

The compression methods discussed in this thread, used to compress a single varlena value, are "general-purpose" in the sense that they operate on an opaque stream of bytes, without any additional context (e.g. about the structure of the data being compressed). So essentially the methods have an API like this:


  int   compress(char *src, int srclen, char *dst, int dstlen);
  int decompress(char *src, int srclen, char *dst, int dstlen);

And possibly some auxiliary methods like "estimate compressed length" and such.
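To make the general-purpose API concrete, here is a minimal sketch of a toy method with exactly the two signatures above, plus an "estimate compressed length" helper. The byte-level run-length scheme and the name estimate_compressed_len() are assumptions for illustration only; this is not pglz or any existing PostgreSQL method. Note that nothing in it needs to know what the bytes mean.

```c
#include <assert.h>

/*
 * Toy general-purpose method (hypothetical): byte-level run-length
 * encoding that treats the input as an opaque byte stream and emits
 * (count, byte) pairs with count in 1..255. Both functions return the
 * number of bytes written to dst, or -1 if dst is too small (or, for
 * decompress, if the input is malformed).
 */

/* Worst case: no two adjacent bytes are equal -> 2 output bytes per input byte. */
int estimate_compressed_len(int srclen)
{
    assert(srclen >= 0);
    return 2 * srclen;
}

int compress(char *src, int srclen, char *dst, int dstlen)
{
    int out = 0;
    int i = 0;

    while (i < srclen)
    {
        int run = 1;

        /* extend the run while the next byte matches, capped at 255 */
        while (i + run < srclen && run < 255 && src[i + run] == src[i])
            run++;

        if (out + 2 > dstlen)
            return -1;
        dst[out++] = (char) run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

int decompress(char *src, int srclen, char *dst, int dstlen)
{
    int out = 0;

    if (srclen % 2 != 0)
        return -1;              /* input must consist of whole pairs */

    for (int i = 0; i < srclen; i += 2)
    {
        int run = (unsigned char) src[i];

        if (run == 0 || out + run > dstlen)
            return -1;
        for (int j = 0; j < run; j++)
            dst[out++] = src[i + 1];
    }
    return out;
}
```

For example, compress() turns the 9 bytes "aaaabbbcd" into 8 bytes (four count/byte pairs), and decompress() restores the original.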

OTOH the compression methods we're messing with while working on the column store are quite different - they operate on columns (i.e. "arrays of Datums"). Also, column stores prefer "light-weight" compression methods like RLE or DICT (dictionary compression) because those methods allow execution on compressed data when done properly. That, for example, requires additional info about the data type in the column, so that the RLE groups match the data type length.
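A rough sketch of why the column-store variant needs type information: a hypothetical encoder that takes typlen detects runs on whole values. A byte-oriented encoder only finds runs of identical bytes (e.g. the zero high-order bytes inside small int4 values), which do not correspond to repeated values, so three consecutive 7s stay uncompressed. The second function shows "execution on compressed data" in its simplest form, counting matches with one comparison per run instead of one per row. The names and the layout (one stored value per run plus an array of run lengths) are assumptions, not the Col Store API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical column-level RLE: one stored value per run (vals) plus the
 * run lengths (counts). typlen is the width of one value in bytes, so runs
 * are detected on whole values, never on individual bytes.
 * vals and counts must have room for nvalues runs (the worst case).
 * Returns the number of runs.
 */
int rle_encode(const char *src, int nvalues, int typlen,
               char *vals, int *counts)
{
    int nruns = 0;

    assert(typlen > 0);
    for (int i = 0; i < nvalues; i++)
    {
        const char *cur = src + (size_t) i * typlen;

        if (nruns > 0 &&
            memcmp(vals + (size_t) (nruns - 1) * typlen, cur, (size_t) typlen) == 0)
            counts[nruns - 1]++;        /* same value: extend the current run */
        else
        {
            memcpy(vals + (size_t) nruns * typlen, cur, (size_t) typlen);
            counts[nruns++] = 1;        /* new value: start a new run */
        }
    }
    return nruns;
}

/*
 * Execution on compressed data: count rows equal to key without
 * decompressing, doing one comparison per run rather than one per row.
 */
long rle_count_equal(const char *vals, const int *counts, int nruns,
                     int typlen, const void *key)
{
    long n = 0;

    for (int r = 0; r < nruns; r++)
        if (memcmp(vals + (size_t) r * typlen, key, (size_t) typlen) == 0)
            n += counts[r];
    return n;
}
```

Comparing with memcmp means bitwise equality, which is fine for int4 but not for every data type; a real design would presumably need the type's equality operator as well, which is one more piece of type information the API has to carry.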

So the API of those methods looks quite different compared to the general-purpose methods. Not only will the compression/decompression methods have additional parameters with info about the data type, but there will also be methods used for iterating over values in the compressed data, etc.
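The "methods used for iterating over values in the compressed data" could look roughly like the following sketch, again assuming a hypothetical RLE layout of one stored value per run plus run lengths. The iterator needs typlen (the data type info) to step between stored values, and it hands out pointers into the compressed data instead of decompressing into a separate buffer. RleIter and its functions are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator state over RLE-compressed fixed-length values. */
typedef struct RleIter
{
    const char *vals;      /* nruns * typlen bytes, one value per run */
    const int  *counts;    /* length of each run */
    int         nruns;
    int         typlen;    /* data type length, needed to locate values */
    int         run;       /* index of the current run */
    int         pos;       /* rows already returned from the current run */
} RleIter;

void rle_iter_begin(RleIter *it, const char *vals, const int *counts,
                    int nruns, int typlen)
{
    assert(typlen > 0);
    it->vals = vals;
    it->counts = counts;
    it->nruns = nruns;
    it->typlen = typlen;
    it->run = 0;
    it->pos = 0;
}

/*
 * Points *value at the next value (in row order) and returns true,
 * or returns false once all runs are exhausted.
 */
bool rle_iter_next(RleIter *it, const char **value)
{
    /* move past exhausted runs (also tolerates zero-length runs) */
    while (it->run < it->nruns && it->pos >= it->counts[it->run])
    {
        it->run++;
        it->pos = 0;
    }
    if (it->run >= it->nruns)
        return false;

    *value = it->vals + (size_t) it->run * it->typlen;
    it->pos++;
    return true;
}
```

A run-aware executor could instead consume a whole run (value plus count) per call, which is where the benefit of executing on compressed data comes from; the per-row variant above just keeps the sketch short.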

Of course, it'd be nice to have the ability to add/remove even those light-weight methods, but I'm not sure it makes sense to squash them into the same catalog. I can imagine a catalog suitable for both APIs (essentially having two groups of columns, one for each type of compression algorithm), but I can't really imagine a compression method providing both interfaces at the same time.
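One way to picture a single catalog with "two groups of columns" is a C struct with one group of callbacks per API, plus a check that a method fills exactly one group. This is only a sketch under stated assumptions: the struct, the column-store callback signatures and the copy_bytes() pass-through method are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Group 1: general-purpose methods, opaque byte stream in and out. */
typedef int (*byte_codec_fn)(char *src, int srclen, char *dst, int dstlen);

/* Group 2: light-weight column methods, which also need the value width. */
typedef int (*column_compress_fn)(const char *values, int nvalues, int typlen,
                                  char *dst, int dstlen);
typedef int (*column_decompress_fn)(const char *src, int srclen, int typlen,
                                    char *values, int maxvalues);

/* Hypothetical catalog row: one group of "columns" per kind of method. */
typedef struct CompressionMethodEntry
{
    const char          *name;
    byte_codec_fn        compress;           /* group 1 */
    byte_codec_fn        decompress;         /* group 1 */
    column_compress_fn   column_compress;    /* group 2 */
    column_decompress_fn column_decompress;  /* group 2 */
} CompressionMethodEntry;

/* Pass-through "method" used only to populate an example entry. */
int copy_bytes(char *src, int srclen, char *dst, int dstlen)
{
    if (srclen < 0 || srclen > dstlen)
        return -1;
    memcpy(dst, src, (size_t) srclen);
    return srclen;
}

/* Valid only if exactly one group is completely filled and the other is empty. */
bool entry_is_valid(const CompressionMethodEntry *e)
{
    assert(e != NULL);

    bool general_full = e->compress != NULL && e->decompress != NULL;
    bool general_any = e->compress != NULL || e->decompress != NULL;
    bool column_full = e->column_compress != NULL && e->column_decompress != NULL;
    bool column_any = e->column_compress != NULL || e->column_decompress != NULL;

    return (general_full && !column_any) || (column_full && !general_any);
}
```

The validity check encodes the point above: the catalog can hold both kinds of rows, but no single method is expected to provide both interfaces.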

In any case, I don't think this is the main challenge the patch needs to solve at this point.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
