Peter Geoghegan <p...@bowt.ie> wrote:

> Consumers of this new infrastructure probably won't be limited to the
> deduplication feature;

It'd also solve an open problem of the aggregate push-down patch [1];
see in particular the mention of pg_opclass in [2]. The partial
aggregate node below the final join must not put multiple values that
are opclass-equal but not byte-wise equal into the same group, because
some information needed by a WHERE or JOIN/ON condition may be lost this
way. The scale of the numeric type is the most obvious example.

> I would like to:
>
> * Get some buy-in on whether or not the precise distinctions I would
> like to make are correct for deduplication in particular, and as
> useful as possible for other cases that we may need to add later on.
>
> * Figure out the exact interface through which opclass/opfamily
> authors can represent that their notion of equality is compatible with
> deduplication/compression.

It's not entirely clear to me whether opclass or opfamily should carry
this information. opclass probably makes more sense for index-related
problems, and the aggregate push-down patch can live with that. I don't
see any particular reason to add a flag to opfamily. (The planner uses
both the pg_opclass and pg_opfamily catalogs.)

I think the fact that the aggregate push-down would benefit from this
enhancement should affect the choice of the new catalog attribute name,
i.e. it should not mention words as concrete as "deduplication" or
"compression".

> (I think that the use of nondeterministic collations should disable
> deduplication without explicit action from the operator class -- that
> should just be baked in.)

(I think the aggregate push-down needs to consider nondeterministic
collations too; I have missed that so far.)

[1] https://commitfest.postgresql.org/24/1247/

[2] https://www.postgresql.org/message-id/10529.1547561178%40localhost

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
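
PS: To illustrate the numeric scale example mentioned above, here is a minimal sketch in Python. It uses decimal.Decimal only as a stand-in for PostgreSQL's numeric type, and the helper names (scale, group_by_equality, group_by_image) are hypothetical; nothing here is taken from the patch itself. It shows that grouping by equality keeps a single representative per group, so the scale of the other members is no longer available to a condition evaluated later.

```python
from decimal import Decimal


def scale(value):
    """Number of digits after the decimal point (stand-in for numeric scale)."""
    return -value.as_tuple().exponent


def group_by_equality(values):
    """Group values that compare equal; the first value seen is the group key."""
    groups = {}
    for v in values:
        # Decimal("1.0") and Decimal("1.00") compare and hash equal,
        # so they share one key and the key keeps the first value's scale.
        groups.setdefault(v, []).append(v)
    return groups


def group_by_image(values):
    """Group only values whose textual image is identical."""
    groups = {}
    for v in values:
        groups.setdefault(str(v), []).append(v)
    return groups


values = [Decimal("1.0"), Decimal("1.00"), Decimal("2.5")]

# Representatives that survive each kind of grouping.
eq_keys = [str(k) for k in group_by_equality(values)]
img_keys = list(group_by_image(values))

# A later condition such as "scale(x) = 2" can only see the representatives.
eq_matches = [k for k in eq_keys if scale(Decimal(k)) == 2]
img_matches = [k for k in img_keys if scale(Decimal(k)) == 2]
```

With equality-based grouping, the value with scale 2 is absorbed into the group represented by "1.0", so eq_matches comes out empty, while grouping by image keeps "1.00" as its own group. That is the kind of information loss the partial aggregate below the final join has to avoid.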