Re: Hash support for row types

2020-11-19 Thread Peter Eisentraut

On 2020-11-17 20:33, Tom Lane wrote:

Peter Eisentraut  writes:

I wrote a new patch to add a lot more tests around hash-based plans.
This is intended to apply separately from the other patch, and the other
patch would then "flip" some of the test cases.


OK, that addresses my concerns.


Thanks.  I have committed the tests and subsequently the feature patch.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Hash support for row types

2020-11-17 Thread Tom Lane
Peter Eisentraut  writes:
> I wrote a new patch to add a lot more tests around hash-based plans. 
> This is intended to apply separately from the other patch, and the other 
> patch would then "flip" some of the test cases.

OK, that addresses my concerns.

regards, tom lane




Re: Hash support for row types

2020-11-17 Thread Peter Eisentraut
I wrote a new patch to add a lot more tests around hash-based plans. 
This is intended to apply separately from the other patch, and the other 
patch would then "flip" some of the test cases.


On 2020-11-13 20:51, Tom Lane wrote:

* The new test in with.sql claims to be testing row hashing, but
it's not very apparent that any such thing actually happens.  Maybe
EXPLAIN the query, as well as execute it, to confirm that a
hash-based plan is used.


The recursive union requires hashing, but this is not visible in the 
plan.  You only get an error if there is no hashing support for a type. 
I have added a test for this.
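
For illustration, a query of roughly this shape (a sketch, not the
committed test, and the error text is approximate) fails because money
has no hash support, even though the plan itself would show no hash node:

-- sketch only; not the committed regression test
WITH RECURSIVE t(m) AS (
    SELECT '0'::money
  UNION
    SELECT m + '1'::money FROM t WHERE m < '3'::money
)
SELECT * FROM t;
ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.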


For the non-recursive union, I have added more tests that show this in 
the plans.



* Is it worth devising a test case in which hashing is not possible
because one of the columns isn't hashable?  I have mixed feelings
about this because the set of suitable column types may decrease
to empty over time, making it hard to maintain the test case.


I used the money type for now.  If someone adds hash support for that, 
we'll change it.  I don't think this will change too rapidly, though.
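
As a sketch (not necessarily the committed test), a plain UNION over a
money column stays on a sort-based plan even with enable_hashagg on,
because money has btree support but no hash support:

-- sketch; the constants are arbitrary
SET enable_hashagg TO on;
EXPLAIN (COSTS OFF)
SELECT '1'::money AS m
UNION
SELECT '2'::money;
-- expect a sort-based Unique plan here, not a HashAggregate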


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
From dface8b1a8a5714001b2f6cd7f949d2f78047306 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 17 Nov 2020 13:54:52 +0100
Subject: [PATCH] Add more tests for hashing and hash-based plans

- Test hashing of an array of a non-hashable element type.

- Test UNION [DISTINCT] with hash- and sort-based plans.  (Previously,
  only INTERSECT and EXCEPT were tested there.)

- Test UNION [DISTINCT] with a non-hashable column type.  This
  currently reverts to a sort-based plan even if enable_hashagg is on.

- Test UNION/INTERSECT/EXCEPT hash- and sort-based plans with arrays
  as column types.  Also test an array with a non-hashable element
  type.

- Test UNION/INTERSECT/EXCEPT similarly with row types as column
  types.  Currently, this uses only sort-based plans because there is
  no hashing support for row types.

- Add a test case that shows that recursive queries using UNION
  [DISTINCT] require hashable column types.

- Add a currently failing test that uses UNION DISTINCT in a
  cycle-detection use case using row types as column types.

Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
---
 src/test/regress/expected/hash_func.out |   7 +
 src/test/regress/expected/union.out | 357 +++-
 src/test/regress/expected/with.out  |  20 ++
 src/test/regress/sql/hash_func.sql  |   6 +
 src/test/regress/sql/union.sql  |  92 +-
 src/test/regress/sql/with.sql   |  18 ++
 6 files changed, 498 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/expected/hash_func.out b/src/test/regress/expected/hash_func.out
index e6e3410aaa..e7d615fde5 100644
--- a/src/test/regress/expected/hash_func.out
+++ b/src/test/regress/expected/hash_func.out
@@ -177,6 +177,13 @@ WHERE  hash_array(v)::bit(32) != hash_array_extended(v, 0)::bit(32)
 ---+--+---+---
 (0 rows)
 
+-- array hashing with non-hashable element type
+SELECT v as value, hash_array(v)::bit(32) as standard
+FROM   (VALUES ('{0}'::money[])) x(v);
+ERROR:  could not identify a hash function for type money
+SELECT v as value, hash_array_extended(v, 0)::bit(32) as extended0
+FROM   (VALUES ('{0}'::money[])) x(v);
+ERROR:  could not identify an extended hash function for type money
 SELECT v as value, hashbpchar(v)::bit(32) as standard,
hashbpcharextended(v, 0)::bit(32) as extended0,
hashbpcharextended(v, 1)::bit(32) as extended1
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92d80..22e1ff5c42 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -345,8 +345,28 @@ ERROR:  FOR NO KEY UPDATE is not allowed with UNION/INTERSECT/EXCEPT
 1 |2 |3
 (1 row)
 
--- exercise both hashed and sorted implementations of INTERSECT/EXCEPT
+-- exercise both hashed and sorted implementations of UNION/INTERSECT/EXCEPT
 set enable_hashagg to on;
+explain (costs off)
+select count(*) from
+  ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
+   QUERY PLAN   
+
+ Aggregate
+   ->  HashAggregate
+ Group Key: tenk1.unique1
+ ->  Append
+   ->  Index Only Scan using tenk1_unique1 on tenk1
+   ->  Seq Scan on tenk1 tenk1_1
+(6 rows)
+
+select count(*) from
+  ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
+ count 
+---
+ 1
+(1 row)
+
 explain (costs off)
 select count(*) from
   ( select unique1 from tenk1 intersect select fivethous from tenk1 ) ss;
@@ -389,6 +409,27 @@ select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
 

Re: Hash support for row types

2020-11-13 Thread Tom Lane
Peter Eisentraut  writes:
> Here is an updated patch with the type cache integration added.

> To your point, this now checks each field's hashability before 
> considering a row type as hashable.  It can still have run-time errors 
> for untyped record datums, but that's not something we can do anything 
> about.

This looks good code-wise.  A couple small niggles on the tests:

* The new test in with.sql claims to be testing row hashing, but
it's not very apparent that any such thing actually happens.  Maybe
EXPLAIN the query, as well as execute it, to confirm that a
hash-based plan is used.

* Is it worth devising a test case in which hashing is not possible
because one of the columns isn't hashable?  I have mixed feelings
about this because the set of suitable column types may decrease
to empty over time, making it hard to maintain the test case.

I marked it RFC.

regards, tom lane




Re: Hash support for row types

2020-10-23 Thread Peter Eisentraut

On 2020-10-20 17:10, Peter Eisentraut wrote:

On 2020-10-20 01:32, Andres Freund wrote:

How does this deal with row types with a field that doesn't have a hash
function? Erroring out at runtime could cause queries that used to
succeed, e.g. because all fields have btree ops, to fail, if we just have
a generic, unconditionally present hash opclass.  Is that an OK
"regression"?


Good point.  There is actually code in the type cache that is supposed
to handle that, so I'll need to adjust that.


Here is an updated patch with the type cache integration added.

To your point, this now checks each field's hashability before 
considering a row type as hashable.  It can still have run-time errors 
for untyped record datums, but that's not something we can do anything 
about.
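
As a sketch of the intended behavior (price_pair is a made-up example
type), a row type containing a money field is now reported as
non-hashable by the type cache, so such a query falls back to a
sort-based plan instead of erroring at run time:

-- price_pair is a hypothetical example type
CREATE TYPE price_pair AS (id int, amount money);
SET enable_hashagg TO on;
EXPLAIN (COSTS OFF)
SELECT '(1,1.00)'::price_pair AS p
UNION
SELECT '(2,2.00)'::price_pair;
-- expect a sort-based Unique plan; hashing would require a hash
-- function for the money field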


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eda6f7182cfe8ac7317d6874317ace24c7c7d5f6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 23 Oct 2020 09:33:41 +0200
Subject: [PATCH v2] Hash support for row types

Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as UNION DISTINCT, hash
joins, and hash partitioning.

Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
---
 doc/src/sgml/queries.sgml   |   9 -
 src/backend/utils/adt/rowtypes.c| 249 
 src/backend/utils/cache/lsyscache.c |   7 +-
 src/backend/utils/cache/typcache.c  |  78 +---
 src/include/catalog/pg_amop.dat |   5 +
 src/include/catalog/pg_amproc.dat   |   4 +
 src/include/catalog/pg_opclass.dat  |   2 +
 src/include/catalog/pg_operator.dat |   2 +-
 src/include/catalog/pg_opfamily.dat |   2 +
 src/include/catalog/pg_proc.dat |   7 +
 src/test/regress/expected/hash_func.out |  12 ++
 src/test/regress/expected/join.out  |   1 +
 src/test/regress/expected/with.out  |  38 
 src/test/regress/sql/hash_func.sql  |   9 +
 src/test/regress/sql/join.sql   |   1 +
 src/test/regress/sql/with.sql   |  10 +
 16 files changed, 402 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index f06afe2c3f..8e70e57b72 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2182,15 +2182,6 @@ Search Order
 

 
-   
-
- The queries shown in this and the following section involving
- ROW constructors in the target list only support
- UNION ALL (not plain UNION) in the
- current implementation.
-
-   
-

 
  Omit the ROW() syntax in the common case where only one
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 674cf0a55d..28bbce128f 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -19,6 +19,7 @@
 #include "access/detoast.h"
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
@@ -1766,3 +1767,251 @@ btrecordimagecmp(PG_FUNCTION_ARGS)
 {
PG_RETURN_INT32(record_image_cmp(fcinfo));
 }
+
+
+/*
+ * Row type hash functions
+ */
+
+Datum
+hash_record(PG_FUNCTION_ARGS)
+{
+   HeapTupleHeader record = PG_GETARG_HEAPTUPLEHEADER(0);
+   uint32  result = 0;
+   Oid tupType;
+   int32   tupTypmod;
+   TupleDesc   tupdesc;
+   HeapTupleData tuple;
+   int ncolumns;
+   RecordCompareData *my_extra;
+   Datum  *values;
+   bool   *nulls;
+
+   check_stack_depth();/* recurses for record-type columns */
+
+   /* Extract type info from tuple */
+   tupType = HeapTupleHeaderGetTypeId(record);
+   tupTypmod = HeapTupleHeaderGetTypMod(record);
+   tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
+   ncolumns = tupdesc->natts;
+
+   /* Build temporary HeapTuple control structure */
+   tuple.t_len = HeapTupleHeaderGetDatumLength(record);
+   ItemPointerSetInvalid(&(tuple.t_self));
+   tuple.t_tableOid = InvalidOid;
+   tuple.t_data = record;
+
+   /*
+* We arrange to look up the needed hashing info just once per series
+* of calls, assuming the record type doesn't change underneath us.
+*/
+   my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
+   if (my_extra == NULL ||
+   my_extra->ncolumns < ncolumns)
+   {
+   fcinfo->flinfo->fn_extra =
+   MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+  offsetof(RecordCompareData, columns) +

Re: Hash support for row types

2020-10-20 Thread Tom Lane
Robert Haas  writes:
> Do we need to worry about what happens if somebody modifies the
> opclass/opfamily definitions?

There's a lot of places that you can break by doing that.  I'm not
too concerned about it.

regards, tom lane




Re: Hash support for row types

2020-10-20 Thread Robert Haas
On Tue, Oct 20, 2020 at 11:10 AM Peter Eisentraut wrote:
> On 2020-10-20 01:32, Andres Freund wrote:
> > How does this deal with row types with a field that doesn't have a hash
> > function? Erroring out at runtime could cause queries that used to
> > succeed, e.g. because all fields have btree ops, to fail, if we just have
> > a generic, unconditionally present hash opclass.  Is that an OK
> > "regression"?
>
> Good point.  There is actually code in the type cache that is supposed
> to handle that, so I'll need to adjust that.

Do we need to worry about what happens if somebody modifies the
opclass/opfamily definitions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Hash support for row types

2020-10-20 Thread Peter Eisentraut

On 2020-10-20 01:32, Andres Freund wrote:

How does this deal with row types with a field that doesn't have a hash
function? Erroring out at runtime could cause queries that used to
succeed, e.g. because all fields have btree ops, to fail, if we just have
a generic, unconditionally present hash opclass.  Is that an OK
"regression"?


Good point.  There is actually code in the type cache that is supposed 
to handle that, so I'll need to adjust that.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Hash support for row types

2020-10-19 Thread Andres Freund
Hi,

On 2020-10-19 10:01:14 +0200, Peter Eisentraut wrote:
> In [0] it was discussed that hash support for row types/record would be
> handy.  So I implemented that.

> The implementation hashes each field and combines the hash values.  Most of
> the code structure can be borrowed from the record comparison
> functions/btree support.  To combine the hash values, I adapted the code
> from the array hashing functions.  (The hash_combine()/hash_combine64()
> functions also looked sensible, but they don't appear to work in a way that
> satisfies the hash_func regression test.  Could be documented better.)
> 
> The main motivation is to support UNION [DISTINCT] as discussed in [0], but
> this also enables other hash-related functionality such as hash joins (as
> one regression test accidentally revealed) and hash partitioning.

How does this deal with row types with a field that doesn't have a hash
function? Erroring out at runtime could cause queries that used to
succeed, e.g. because all fields have btree ops, to fail, if we just have
a generic, unconditionally present hash opclass.  Is that an OK
"regression"?

Greetings,

Andres Freund




Hash support for row types

2020-10-19 Thread Peter Eisentraut
In [0] it was discussed that hash support for row types/record would be 
handy.  So I implemented that.


The implementation hashes each field and combines the hash values.  Most 
of the code structure can be borrowed from the record comparison 
functions/btree support.  To combine the hash values, I adapted the code 
from the array hashing functions.  (The hash_combine()/hash_combine64() 
functions also looked sensible, but they don't appear to work in a way 
that satisfies the hash_func regression test.  Could be documented better.)


The main motivation is to support UNION [DISTINCT] as discussed in [0], 
but this also enables other hash-related functionality such as hash 
joins (as one regression test accidentally revealed) and hash partitioning.
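
To make the UNION [DISTINCT] case concrete, here is a sketch modeled on
the cycle-detection example in queries.sgml (the graph table and its
columns are illustrative, not part of the patch); with the patch applied,
the plain UNION works because the ROW(...) values and the record[] path
column become hashable:

-- graph(id, link) is a hypothetical table, as in the queries.sgml example
WITH RECURSIVE search_graph(id, link, path, cycle) AS (
    SELECT g.id, g.link, ARRAY[ROW(g.id, g.link)], false
    FROM graph g
  UNION    -- previously only UNION ALL was supported here
    SELECT g.id, g.link,
           path || ROW(g.id, g.link),
           ROW(g.id, g.link) = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;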



[0]: 
https://www.postgresql.org/message-id/flat/52beaf44-ccc3-0ba1-45c7-74aa251cd6ab%402ndquadrant.com#9559845e0ee2129c483b745b9843c571


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 3c9269d60abc5033804802713d9a10f72e77f3e0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 19 Oct 2020 09:44:15 +0200
Subject: [PATCH v1] Hash support for row types

Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as UNION DISTINCT, hash
joins, and hash partitioning.
---
 doc/src/sgml/queries.sgml   |   9 -
 src/backend/utils/adt/rowtypes.c| 249 
 src/include/catalog/pg_amop.dat |   5 +
 src/include/catalog/pg_amproc.dat   |   4 +
 src/include/catalog/pg_opclass.dat  |   2 +
 src/include/catalog/pg_operator.dat |   2 +-
 src/include/catalog/pg_opfamily.dat |   2 +
 src/include/catalog/pg_proc.dat |   7 +
 src/test/regress/expected/hash_func.out |  12 ++
 src/test/regress/expected/join.out  |   1 +
 src/test/regress/expected/with.out  |  38 
 src/test/regress/sql/hash_func.sql  |   9 +
 src/test/regress/sql/join.sql   |   1 +
 src/test/regress/sql/with.sql   |  10 +
 14 files changed, 341 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index f06afe2c3f..8e70e57b72 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2182,15 +2182,6 @@ Search Order
 

 
-   
-
- The queries shown in this and the following section involving
- ROW constructors in the target list only support
- UNION ALL (not plain UNION) in the
- current implementation.
-
-   
-

 
  Omit the ROW() syntax in the common case where only one
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 674cf0a55d..5c86259929 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -19,6 +19,7 @@
 #include "access/detoast.h"
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
@@ -1766,3 +1767,251 @@ btrecordimagecmp(PG_FUNCTION_ARGS)
 {
PG_RETURN_INT32(record_image_cmp(fcinfo));
 }
+
+
+/*
+ * Row type hash functions
+ */
+
+Datum
+record_hash(PG_FUNCTION_ARGS)
+{
+   HeapTupleHeader record = PG_GETARG_HEAPTUPLEHEADER(0);
+   uint32  result = 0;
+   Oid tupType;
+   int32   tupTypmod;
+   TupleDesc   tupdesc;
+   HeapTupleData tuple;
+   int ncolumns;
+   RecordCompareData *my_extra;
+   Datum  *values;
+   bool   *nulls;
+
+   check_stack_depth();/* recurses for record-type columns */
+
+   /* Extract type info from tuple */
+   tupType = HeapTupleHeaderGetTypeId(record);
+   tupTypmod = HeapTupleHeaderGetTypMod(record);
+   tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
+   ncolumns = tupdesc->natts;
+
+   /* Build temporary HeapTuple control structure */
+   tuple.t_len = HeapTupleHeaderGetDatumLength(record);
+   ItemPointerSetInvalid(&(tuple.t_self));
+   tuple.t_tableOid = InvalidOid;
+   tuple.t_data = record;
+
+   /*
+* We arrange to look up the needed hashing info just once per series
+* of calls, assuming the record type doesn't change underneath us.
+*/
+   my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
+   if (my_extra == NULL ||
+   my_extra->ncolumns < ncolumns)
+   {
+   fcinfo->flinfo->fn_extra =
+   MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+  offsetof(RecordCompareData, columns) +
+  ncolumns