On 12/14/2017 04:21 PM, Robert Haas wrote:
> On Wed, Dec 13, 2017 at 5:10 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>>> 2. If several data types can benefit from a similar approach, it has
>>> to be separately implemented for each one.
>>
>> I don't think the current solution improves that, though. If you
>> want to exploit internal features of individual data types, it
>> pretty much requires code customized to every such data type.
>>
>> For example you can't take the tsvector compression and just slap
>> it on tsquery, because it relies on knowledge of internal tsvector
>> structure. So you need separate implementations anyway.
> 
> I don't think that's necessarily true. Certainly, it's true that
> *if* tsvector compression depends on knowledge of internal tsvector 
> structure, *then* that you can't use the implementation for anything 
> else (this, by the way, means that there needs to be some way for a 
> compression method to reject being applied to a column of a data
> type it doesn't like).

I believe such dependency (on implementation details) is pretty much the
main benefit of datatype-aware compression methods. If you don't rely on
such assumption, then I'd say it's a general-purpose compression method.

> However, it seems possible to imagine compression algorithms that can
> work for a variety of data types, too. There might be a compression
> algorithm that is theoretically a general-purpose algorithm but has
> features which are particularly well-suited to, say, JSON or XML
> data, because it looks for word boundaries to decide on what strings
> to insert into the compression dictionary.
> 

Can you give an example of such algorithm? Because I haven't seen such
example, and I find arguments based on hypothetical compression methods
somewhat suspicious.

FWIW I'm not against considering such compression methods, but OTOH it
may not be such a great primary use case to drive the overall design.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to