Hi,

On 12/14/2015 12:51 PM, Simon Riggs wrote:
> On 13 December 2015 at 17:28, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>
>> it would be nice to make compression methods pluggable.
>
> Agreed.
>
> My thinking is that this should be combined with work to make use of
> the compressed data, which is why Alvaro, Tomas, David have been
> working on Col Store API for about 18 months and work on that
> continues with more submissions for 9.6 due.

I'm not sure it makes sense to combine those two uses of compression, because there are various differences - some subtle, some less subtle. It's a bit difficult to discuss this without any column store background, but I'll try anyway.

The compression methods discussed in this thread, used to compress a single varlena value, are "general-purpose" in the sense that they operate on an opaque stream of bytes, without any additional context (e.g. about the structure of the data being compressed). So essentially the methods have an API like this:

  int   compress(char *src, int srclen, char *dst, int dstlen);
  int decompress(char *src, int srclen, char *dst, int dstlen);

And possibly some auxiliary methods like "estimate compressed length" and such.
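
To make the shape of that API concrete, here is a minimal sketch that uses a toy byte-level run-length encoder as a stand-in for a real general-purpose compressor. The names toy_compress/toy_decompress, the const qualifier on src, and the convention of returning -1 when dst is too small are assumptions made for illustration, not anything proposed in the patch. The point is only that both functions see nothing but bytes:

```c
#include <stddef.h>

/*
 * Toy byte-level RLE (hypothetical, for illustration only).
 * Output is a sequence of (count, byte) pairs, count in 1..255.
 * Returns the number of bytes written to dst, or -1 if dst is too small.
 */
int
toy_compress(const char *src, int srclen, char *dst, int dstlen)
{
    int in = 0;
    int out = 0;

    while (in < srclen)
    {
        char b = src[in];
        int  run = 1;

        /* extend the run while the next byte matches, up to 255 bytes */
        while (in + run < srclen && src[in + run] == b && run < 255)
            run++;

        if (out + 2 > dstlen)
            return -1;

        dst[out++] = (char) run;
        dst[out++] = b;
        in += run;
    }

    return out;
}

/*
 * Inverse of toy_compress. Returns the number of bytes written to dst,
 * or -1 if the input is malformed or dst is too small.
 */
int
toy_decompress(const char *src, int srclen, char *dst, int dstlen)
{
    int in;
    int out = 0;

    if (srclen % 2 != 0)
        return -1;

    for (in = 0; in < srclen; in += 2)
    {
        int run = (unsigned char) src[in];
        int i;

        if (out + run > dstlen)
            return -1;

        for (i = 0; i < run; i++)
            dst[out++] = src[in + 1];
    }

    return out;
}
```

Neither function knows whether the bytes are text, an array, or a JSON document, which is exactly what makes such methods easy to plug in for varlena values.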

OTOH the compression methods we're messing with while working on the column store are quite different - they operate on columns (i.e. "arrays of Datums"). Also, column stores prefer "light-weight" compression methods like RLE or DICT (dictionary compression) because those methods allow execution on compressed data when done properly. That requires, for example, additional info about the data type in the column, so that the RLE groups match the data type length.
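
As a rough illustration of "execution on compressed data", here is a sketch of RLE over whole fixed-length values rather than bytes, with an aggregate computed directly from the runs. It uses int64_t as a stand-in for a Datum of a fixed-length type; RleRun, rle_encode_int64 and rle_sum are hypothetical names, not part of the column store work:

```c
#include <stdint.h>

/* One run: a value of the column's data type and how often it repeats. */
typedef struct RleRun
{
    int64_t value;
    int32_t count;
} RleRun;

/*
 * Encode an array of values into runs. Because the encoder knows the
 * element type, runs always align with value boundaries.
 * Returns the number of runs, or -1 if more than maxruns are needed.
 */
int
rle_encode_int64(const int64_t *values, int nvalues,
                 RleRun *runs, int maxruns)
{
    int nruns = 0;
    int i;

    for (i = 0; i < nvalues; i++)
    {
        if (nruns > 0 && runs[nruns - 1].value == values[i])
        {
            runs[nruns - 1].count++;
            continue;
        }

        if (nruns == maxruns)
            return -1;

        runs[nruns].value = values[i];
        runs[nruns].count = 1;
        nruns++;
    }

    return nruns;
}

/* SUM() evaluated on the compressed form: one multiplication per run. */
int64_t
rle_sum(const RleRun *runs, int nruns)
{
    int64_t sum = 0;
    int     i;

    for (i = 0; i < nruns; i++)
        sum += runs[i].value * runs[i].count;

    return sum;
}
```

A byte-oriented RLE could not offer rle_sum this cheaply, because its runs would not line up with values of the column's type.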

So the API of those methods looks quite different, compared to the general-purpose methods. Not only will the compression/decompression methods have additional parameters with info about the data type, but there will also be methods for iterating over values in the compressed data etc.
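
To give an idea of what "iterating over values in the compressed data" might look like, here is a sketch for a dictionary-compressed column. All names (DictColumn, DictIterator, dict_iter_init, dict_iter_next) are made up for this example, and int64_t again stands in for a Datum; the real API may look nothing like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* A dictionary-compressed column: each row stores a small code. */
typedef struct DictColumn
{
    const int64_t *dict;    /* distinct values of the column */
    const uint8_t *codes;   /* one dictionary index per row */
    int            nrows;
} DictColumn;

/* Iterator state: which column, and the next row to return. */
typedef struct DictIterator
{
    const DictColumn *col;
    int               pos;
} DictIterator;

void
dict_iter_init(DictIterator *it, const DictColumn *col)
{
    it->col = col;
    it->pos = 0;
}

/*
 * Fetch the next value without materializing the whole column.
 * Returns false once all rows have been consumed.
 */
bool
dict_iter_next(DictIterator *it, int64_t *value)
{
    if (it->pos >= it->col->nrows)
        return false;

    *value = it->col->dict[it->col->codes[it->pos]];
    it->pos++;
    return true;
}
```

There is no natural counterpart to such an iterator in the compress()/decompress() API above, which only maps one byte buffer to another.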

Of course, it'd be nice to have the ability to add/remove even those light-weight methods, but I'm not sure it makes sense to squash them into the same catalog. I can imagine a catalog suitable for both APIs (essentially having two groups of columns, one for each type of compression algorithm), but I can't really imagine a compression method providing both interfaces at the same time.

In any case, I don't think this is the main challenge the patch needs to solve at this point.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

