Re: [HACKERS] Hashable custom types

2015-07-08 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca 
 wrote:
 Is it possible to make custom types hashable? There's no hook in the
 CREATE TYPE call for a hash function, but can one be hooked up
 somewhere else? In an operator?

 See 35.14.6., System Dependencies on Operator Classes

 Coming back to this, I created an appropriate opclass...

 CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
 RETURNS boolean
 AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
 LANGUAGE 'c' IMMUTABLE STRICT;

Why are you creating a separate equality operator, rather than just using
the type's existing equality operator (I assume it's got one) in the
hash opclass?

 It still says I lack the secret sauce...

 ERROR:  could not implement recursive UNION
 DETAIL:  All column datatypes must be hashable.

UNION will preferentially glom onto the btree equality operator, if memory
serves.  If that isn't also the hash equality operator, things won't work
pleasantly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2015-07-08 Thread Paul Ramsey
On July 8, 2015 at 1:36:49 PM, Tom Lane (t...@sss.pgh.pa.us) wrote:
Paul Ramsey pram...@cleverelephant.ca writes: 
 UNION will preferentially glom onto the btree equality operator, if memory  
 serves. If that isn't also the hash equality operator, things won't work  
 pleasantly.  

 So… what does that mean for types that have both btree and hash equality 
 operators? Don’t all the built-ins also have this problem?  

What I'm asking is why it would possibly be sensible to have different 
notions of equality for hash and btree. In every existing type that has 
both btree and hash opclasses, the equality members of those opclasses 
are *the same operator*. You don't really want UNION giving different 
answers depending on which implementation the planner happens to pick, 
do you? 
Well, that’s where things get pretty darn ugly. For reasons of practicality at 
the time, our equality btree operator ‘=‘ got tied to ‘bounding box equals’ way 
back in the mists of pre-history. (Basically the reasoning was “all our index 
work is about bounding boxes!”. I must admit, given my understanding of the zen 
of PostgreSQL now (and the features that PostgreSQL now has) that would not 
happen now.)

So… yeah. Probably ‘=‘ should be something else, probably something quite 
expensive but logically defensible  like a geometric equality test, but it’s 
not, and we have lots of third part stuff layered on top of us that expects 
existing semantics.

If getting the objects to UNION means that the btree and hash ops have to be 
the same, then that just means I’m SOL and will revert to my hack of just 
casting the objects to bytea to force them to UNION in the recursive CTE.

P.

Re: [HACKERS] Hashable custom types

2015-07-08 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 UNION will preferentially glom onto the btree equality operator, if memory  
 serves. If that isn't also the hash equality operator, things won't work  
 pleasantly.  

 So… what does that mean for types that have both btree and hash equality 
 operators? Don’t all the built-ins also have this problem? 

What I'm asking is why it would possibly be sensible to have different
notions of equality for hash and btree.  In every existing type that has
both btree and hash opclasses, the equality members of those opclasses
are *the same operator*.  You don't really want UNION giving different
answers depending on which implementation the planner happens to pick,
do you?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2015-07-08 Thread Paul Ramsey

 It still says I lack the secret sauce...  

 ERROR: could not implement recursive UNION  
 DETAIL: All column datatypes must be hashable.  

UNION will preferentially glom onto the btree equality operator, if memory  
serves. If that isn't also the hash equality operator, things won't work  
pleasantly.  

So… what does that mean for types that have both btree and hash equality 
operators? Don’t all the built-ins also have this problem? 

Re: [HACKERS] Hashable custom types

2015-07-08 Thread Paul Ramsey
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca 
 wrote:
 Is it possible to make custom types hashable? There's no hook in the
 CREATE TYPE call for a hash function, but can one be hooked up
 somewhere else? In an operator?

 See 35.14.6., System Dependencies on Operator Classes

Coming back to this, I created an appropriate opclass...

CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
RETURNS boolean
AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
LANGUAGE 'c' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry)
RETURNS integer
AS '$libdir/postgis-2.2', 'lwgeom_hash'
LANGUAGE 'c' IMMUTABLE STRICT;

-- Availability: 0.9.0
CREATE OPERATOR == (
LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = geometry_hash_eq,
COMMUTATOR = '==',
RESTRICT = contsel, JOIN = contjoinsel
);

CREATE OPERATOR CLASS hash_geometry_ops
  DEFAULT FOR TYPE geometry USING hash AS
  OPERATOR 1 == (geometry, geometry),
  FUNCTION 1 geometry_hash(geometry);

I even tested that it as an index!

   create index hashidx on points using hash ( the_geom_webmercator);
  CREATE INDEX

But when I run my recursive query...

WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS (

(SELECT
  points.cartodb_id, points.cartodb_id as cluster_id,
points.the_geom_webmercator as geom
FROM points
WHERE points.cartodb_id in (select cartodb_id from points))
UNION
(SELECT
  pts.cartodb_id, n.cluster_id, pts.the_geom_webmercator as geom
FROM points pts
JOIN find_cluster n
ON ST_DWithin(n.geom, pts.the_geom_webmercator, 2)
WHERE n.cartodb_id  pts.cartodb_id)
)
SELECT * FROM find_cluster;

It still says I lack the secret sauce...

ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.

What's the sauce?

Thanks!

P






 --
 Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2014-04-26 Thread David Fetter
On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote:
 When trying to write a recursive CTE using the PostGIS geometry type,
 I was told this:
 
 ERROR:  could not implement recursive UNION
 DETAIL:  All column datatypes must be hashable.

This leads to an interesting question, which is why does our
implementation require this.  I'm guessing it's a performance
optimization.

Quoth src/backend/executor/nodeRecursiveunion.c:

/*
 * To implement UNION (without ALL), we need a hashtable that stores tuples
 * already seen.  The hash key is computed from the grouping columns.
 */

As hashing can only approximately guarantee uniqueness (pigeonhole
principle, blah, blah), is there some other similarly performant
mechanism for tracking seen tuples that might work at least in cases
where we don't have a hash function for the data type?  Some kind of
tree, perhaps, or does that require too many other things (total
ordering, e.g.)?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote:
 ERROR:  could not implement recursive UNION
 DETAIL:  All column datatypes must be hashable.

 This leads to an interesting question, which is why does our
 implementation require this.  I'm guessing it's a performance
 optimization.

Well, you clearly need to have a notion of equality for each column
datatype, or else UNION doesn't mean anything.

In general we consider that a datatype's notion of equality can
be defined either by its default btree opclass (which supports
sort-based query algorithms) or by its default hash opclass
(which supports hash-based query algorithms).

The plain UNION code supports either sorting or hashing, but
we've not gotten around to supporting a sort-based approach
to recursive UNION.  I'm not convinced that it's worth doing ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Atri Sharma
The plain UNION code supports either sorting or hashing, but
 we've not gotten around to supporting a sort-based approach
 to recursive UNION.  I'm not convinced that it's worth doing ...

 regards, tom lane



Without sorting, isnt the scope of a recursive UNION with custom datatypes
pretty restrictive?

As is, even the sorting shall be a bit restrictive due to the costs
associated. I feel what David has suggested upthread should be good. Maybe
an experimental patch with a workload that should give a load factor 1 for
the hash table should prove some performance points.

Even if thats not the case, we should really do something to improve the
scope of usability of recursive UNION with custom types.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Greg Stark
On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma atri.j...@gmail.com wrote:
 Without sorting, isnt the scope of a recursive UNION with custom datatypes
 pretty restrictive?

All the default data types are hashable. It's not hard to add a hash
operator class. In a clean slate design it would probably have been
simpler to just make it a requirement that any data type provide a
default hash operator (and probably a default btree comparator).
Postgres provides a lot of degrees of freedom but it should probably
be considered best practice to just provide both even if you don't
envision one or the other being used directly by users for indexes.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma atri.j...@gmail.com wrote:
 Without sorting, isnt the scope of a recursive UNION with custom datatypes
 pretty restrictive?

 All the default data types are hashable. It's not hard to add a hash
 operator class. In a clean slate design it would probably have been
 simpler to just make it a requirement that any data type provide a
 default hash operator (and probably a default btree comparator).
 Postgres provides a lot of degrees of freedom but it should probably
 be considered best practice to just provide both even if you don't
 envision one or the other being used directly by users for indexes.

A btree opclass requires that you invent some one-dimensional sort order
for the datatype, which might be a difficult thing; so I think it's fully
reasonable not to require datatypes to have btree support.  Hashing
doesn't require any semantic assumptions beyond having an equality rule,
which is clearly *necessary* if you want to do stuff like UNION or
DISTINCT.  So from that standpoint it's perfectly reasonable for recursive
UNION to require a hashable equality operator, whereas the other case of
requiring a sortable operator would be a lot harder to defend.

Having said that, I can also believe that there might be datatypes for
which implementing a hash function would be a lot harder than implementing
sorting; this could be true if your equality rule allows for a lot of
different physical representations of equal values.  But I'm not so
excited about such cases that I want to do the work of figuring out a
way to implement recursive UNION by sorting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hashable custom types

2014-04-25 Thread Paul Ramsey
When trying to write a recursive CTE using the PostGIS geometry type,
I was told this:

ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.

Is it possible to make custom types hashable? There's no hook in the
CREATE TYPE call for a hash function, but can one be hooked up
somewhere else? In an operator?

Thanks,

P

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hashable custom types

2014-04-25 Thread Peter Geoghegan
On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote:
 Is it possible to make custom types hashable? There's no hook in the
 CREATE TYPE call for a hash function, but can one be hooked up
 somewhere else? In an operator?

See 35.14.6., System Dependencies on Operator Classes


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers