Re: [HACKERS] Hashable custom types
Paul Ramsey pram...@cleverelephant.ca writes: On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote: Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? See 35.14.6., System Dependencies on Operator Classes Coming back to this, I created an appropriate opclass... CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry) RETURNS boolean AS '$libdir/postgis-2.2', 'lwgeom_hash_eq' LANGUAGE 'c' IMMUTABLE STRICT; Why are you creating a separate equality operator, rather than just using the type's existing equality operator (I assume it's got one) in the hash opclass? It still says I lack the secret sauce... ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
On July 8, 2015 at 1:36:49 PM, Tom Lane (t...@sss.pgh.pa.us) wrote: Paul Ramsey pram...@cleverelephant.ca writes: UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have this problem? What I'm asking is why it would possibly be sensible to have different notions of equality for hash and btree. In every existing type that has both btree and hash opclasses, the equality members of those opclasses are *the same operator*. You don't really want UNION giving different answers depending on which implementation the planner happens to pick, do you? Well, that’s where things get pretty darn ugly. For reasons of practicality at the time, our equality btree operator ‘=‘ got tied to ‘bounding box equals’ way back in the mists of pre-history. (Basically the reasoning was “all our index work is about bounding boxes!”. I must admit, given my understanding of the zen of PostgreSQL now (and the features that PostgreSQL now has) that would not happen now.) So… yeah. Probably ‘=‘ should be something else, probably something quite expensive but logically defensible like a geometric equality test, but it’s not, and we have lots of third part stuff layered on top of us that expects existing semantics. If getting the objects to UNION means that the btree and hash ops have to be the same, then that just means I’m SOL and will revert to my hack of just casting the objects to bytea to force them to UNION in the recursive CTE. P.
Re: [HACKERS] Hashable custom types
Paul Ramsey pram...@cleverelephant.ca writes: UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. So⦠what does that mean for types that have both btree and hash equality operators? Donât all the built-ins also have this problem? What I'm asking is why it would possibly be sensible to have different notions of equality for hash and btree. In every existing type that has both btree and hash opclasses, the equality members of those opclasses are *the same operator*. You don't really want UNION giving different answers depending on which implementation the planner happens to pick, do you? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
It still says I lack the secret sauce... ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have this problem?
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote: Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? See 35.14.6., System Dependencies on Operator Classes Coming back to this, I created an appropriate opclass... CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry) RETURNS boolean AS '$libdir/postgis-2.2', 'lwgeom_hash_eq' LANGUAGE 'c' IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry) RETURNS integer AS '$libdir/postgis-2.2', 'lwgeom_hash' LANGUAGE 'c' IMMUTABLE STRICT; -- Availability: 0.9.0 CREATE OPERATOR == ( LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = geometry_hash_eq, COMMUTATOR = '==', RESTRICT = contsel, JOIN = contjoinsel ); CREATE OPERATOR CLASS hash_geometry_ops DEFAULT FOR TYPE geometry USING hash AS OPERATOR 1 == (geometry, geometry), FUNCTION 1 geometry_hash(geometry); I even tested that it as an index! create index hashidx on points using hash ( the_geom_webmercator); CREATE INDEX But when I run my recursive query... WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS ( (SELECT points.cartodb_id, points.cartodb_id as cluster_id, points.the_geom_webmercator as geom FROM points WHERE points.cartodb_id in (select cartodb_id from points)) UNION (SELECT pts.cartodb_id, n.cluster_id, pts.the_geom_webmercator as geom FROM points pts JOIN find_cluster n ON ST_DWithin(n.geom, pts.the_geom_webmercator, 2) WHERE n.cartodb_id pts.cartodb_id) ) SELECT * FROM find_cluster; It still says I lack the secret sauce... ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. What's the sauce? Thanks! P -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote: When trying to write a recursive CTE using the PostGIS geometry type, I was told this: ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. This leads to an interesting question, which is why does our implementation require this. I'm guessing it's a performance optimization. Quoth src/backend/executor/nodeRecursiveunion.c: /* * To implement UNION (without ALL), we need a hashtable that stores tuples * already seen. The hash key is computed from the grouping columns. */ As hashing can only approximately guarantee uniqueness (pigeonhole principle, blah, blah), is there some other similarly performant mechanism for tracking seen tuples that might work at least in cases where we don't have a hash function for the data type? Some kind of tree, perhaps, or does that require too many other things (total ordering, e.g.)? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
David Fetter da...@fetter.org writes: On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote: ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. This leads to an interesting question, which is why does our implementation require this. I'm guessing it's a performance optimization. Well, you clearly need to have a notion of equality for each column datatype, or else UNION doesn't mean anything. In general we consider that a datatype's notion of equality can be defined either by its default btree opclass (which supports sort-based query algorithms) or by its default hash opclass (which supports hash-based query algorithms). The plain UNION code supports either sorting or hashing, but we've not gotten around to supporting a sort-based approach to recursive UNION. I'm not convinced that it's worth doing ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
The plain UNION code supports either sorting or hashing, but we've not gotten around to supporting a sort-based approach to recursive UNION. I'm not convinced that it's worth doing ... regards, tom lane Without sorting, isnt the scope of a recursive UNION with custom datatypes pretty restrictive? As is, even the sorting shall be a bit restrictive due to the costs associated. I feel what David has suggested upthread should be good. Maybe an experimental patch with a workload that should give a load factor 1 for the hash table should prove some performance points. Even if thats not the case, we should really do something to improve the scope of usability of recursive UNION with custom types. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Hashable custom types
On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma atri.j...@gmail.com wrote: Without sorting, isnt the scope of a recursive UNION with custom datatypes pretty restrictive? All the default data types are hashable. It's not hard to add a hash operator class. In a clean slate design it would probably have been simpler to just make it a requirement that any data type provide a default hash operator (and probably a default btree comparator). Postgres provides a lot of degrees of freedom but it should probably be considered best practice to just provide both even if you don't envision one or the other being used directly by users for indexes. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
Greg Stark st...@mit.edu writes: On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma atri.j...@gmail.com wrote: Without sorting, isnt the scope of a recursive UNION with custom datatypes pretty restrictive? All the default data types are hashable. It's not hard to add a hash operator class. In a clean slate design it would probably have been simpler to just make it a requirement that any data type provide a default hash operator (and probably a default btree comparator). Postgres provides a lot of degrees of freedom but it should probably be considered best practice to just provide both even if you don't envision one or the other being used directly by users for indexes. A btree opclass requires that you invent some one-dimensional sort order for the datatype, which might be a difficult thing; so I think it's fully reasonable not to require datatypes to have btree support. Hashing doesn't require any semantic assumptions beyond having an equality rule, which is clearly *necessary* if you want to do stuff like UNION or DISTINCT. So from that standpoint it's perfectly reasonable for recursive UNION to require a hashable equality operator, whereas the other case of requiring a sortable operator would be a lot harder to defend. Having said that, I can also believe that there might be datatypes for which implementing a hash function would be a lot harder than implementing sorting; this could be true if your equality rule allows for a lot of different physical representations of equal values. But I'm not so excited about such cases that I want to do the work of figuring out a way to implement recursive UNION by sorting. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hashable custom types
When trying to write a recursive CTE using the PostGIS geometry type, I was told this: ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? Thanks, P -- Paul Ramsey http://cleverelephant.ca http://postgis.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote: Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? See 35.14.6., System Dependencies on Operator Classes -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers