Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Jan 29, 2018 at 3:32 AM, Antonin Houska <a...@cybertec.at> wrote: > > I think of a variant of this: implement an universal function that tests the > > binary values for equality (besides the actual arguments, caller would have > > to > > pass info like typlen, typalign, typbyval for each argument, and cache these > > for repeated calls) and set pg_proc(oprcode) to 0 wherever this function is > > sufficient. Thus the problematic cases like numeric, citext, etc. would be > > detected by their non-zero oprcode. > > I don't think that's a good option, because we don't want int4eq to > have to go through a code path that has branches to support varlenas > and cstrings. Andres is busy trying to speed up expression evaluation > by removing unnecessary code branches; adding new ones would be > undoing that valuable work.
I spent some more time thinking about this. What about adding a new strategy number for hash index operator classes, e.g. HTBinaryEqualStrategyNumber? For most types both HTEqualStrategyNumber and HTBinaryEqualStrategyNumber strategy would point to the same operator, but types like numeric would naturally have them different. Thus the pushed-down partial aggregation can only use the HTBinaryEqualStrategyNumber's operator to compare grouping expressions. In the initial version (until we have useful statistics for the binary values) we can avoid the aggregation push-down if the grouping expression output type has the two strategies implemented using different functions because, as you noted upthread, grouping based on binary equality can result in excessive number of groups. One open question is whether the binary equality operator needs a separate operator class or not. If an opclass cares only about the binary equality, its hash function(s) can be a lot simpler. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com