26.08.2019 14:15, Antonin Houska wrote:
Peter Geoghegan <p...@bowt.ie> wrote:

Consumers of this new infrastructure probably won't be limited to the
deduplication feature;
It'd also solve an open problem of the aggregate push-down patch [1], in
particular see the mention of pg_opclass in [2]: the partial aggregate
node below the final join must not put multiple opclass-equal values
that are not byte-wise equal into the same group because some
information needed by WHERE or JOIN/ON condition may be lost this
way. The scale of the numeric type is the most obvious example.
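
To make the numeric example concrete, here is a minimal C sketch. ToyDecimal,
toy_opclass_equal and toy_bitwise_equal are hypothetical names invented for
illustration; this is not how PostgreSQL stores numeric. It only shows how two
values can be equal by value yet differ in representation, so that putting them
into one group would discard the scale of one of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy decimal: value = unscaled / 10^scale. */
typedef struct ToyDecimal
{
	int64_t		unscaled;	/* digits with the decimal point removed */
	int			scale;		/* number of digits after the decimal point */
} ToyDecimal;

/* Integer power of ten; overflow is ignored in this sketch. */
static int64_t
pow10_i64(int n)
{
	int64_t		r = 1;

	while (n-- > 0)
		r *= 10;
	return r;
}

/* "Opclass-style" equality: compares numeric values, ignoring scale. */
bool
toy_opclass_equal(ToyDecimal a, ToyDecimal b)
{
	return a.unscaled * pow10_i64(b.scale) == b.unscaled * pow10_i64(a.scale);
}

/* Representation equality: every stored field must match. */
bool
toy_bitwise_equal(ToyDecimal a, ToyDecimal b)
{
	return a.unscaled == b.unscaled && a.scale == b.scale;
}
```

With a = {10, 1} (1.0) and b = {100, 2} (1.00), toy_opclass_equal(a, b) is true
while toy_bitwise_equal(a, b) is false.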

I would like to:

* Get some buy-in on whether or not the precise distinctions I would
like to make are correct for deduplication in particular, and as
useful as possible for other cases that we may need to add later on.

* Figure out the exact interface through which opclass/opfamily
authors can represent that their notion of equality is compatible with
deduplication/compression.
It's not entirely clear to me whether opclass or opfamily should carry
this information. opclass probably makes more sense for index related
problems and the aggregate push-down patch can live with that. I don't
see any particular reason to add a flag to opfamily. (The planner uses
both the pg_opclass and pg_opfamily catalogs.)

I think the fact that the aggregate push-down would benefit from this
enhancement should affect the choice of the new catalog attribute name,
i.e. it should not mention words as concrete as "deduplication" or
"compression".


The patch implementing the new opclass option is attached.

It adds a new attribute, pg_opclass.opcisbitwise, which is set to true if the opclass's equality is the same as binary (bitwise) equality.
It is true by default. It is set to false for the numeric, float4, and float8 opclasses.
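
For the float case, here is a minimal C sketch of why equality is not bitwise,
assuming IEEE 754 doubles: positive and negative zero compare equal but differ
in the sign bit. The function names are hypothetical, and NaN handling (which
the PostgreSQL float opclasses treat specially) is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Value equality, roughly what "=" means for float8 (NaN handling omitted). */
bool
float8_value_equal(double a, double b)
{
	return a == b;
}

/* Bitwise equality: the two values have identical bytes. */
bool
float8_bitwise_equal(double a, double b)
{
	return memcmp(&a, &b, sizeof(double)) == 0;
}
```

Under that assumption, float8_value_equal(0.0, -0.0) is true while
float8_bitwise_equal(0.0, -0.0) is false, so float4 and float8 cannot be
marked bitwise.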

Do the anyarray opclasses need special treatment?

The new syntax for creating an opclass is "CREATE OPERATOR CLASS name NOT BITWISE ..."

Any ideas on better names?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 5916d188be1cfff798845720d6e955327aa8c693
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date:   Mon Sep 30 19:31:40 2019 +0300

    Opclass bitwise equality check

diff --git a/doc/src/sgml/ref/create_opclass.sgml b/doc/src/sgml/ref/create_opclass.sgml
index dd5252f..eb2e086 100644
--- a/doc/src/sgml/ref/create_opclass.sgml
+++ b/doc/src/sgml/ref/create_opclass.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAULT ] FOR TYPE <replaceable class="parameter">data_type</replaceable>
+CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAULT ] [ NOT BITWISE ] FOR TYPE <replaceable class="parameter">data_type</replaceable>
   USING <replaceable class="parameter">index_method</replaceable> [ FAMILY <replaceable class="parameter">family_name</replaceable> ] AS
   {  OPERATOR <replaceable class="parameter">strategy_number</replaceable> <replaceable class="parameter">operator_name</replaceable> [ ( <replaceable class="parameter">op_type</replaceable>, <replaceable class="parameter">op_type</replaceable> ) ] [ FOR SEARCH | FOR ORDER BY <replaceable class="parameter">sort_family_name</replaceable> ]
    | FUNCTION <replaceable class="parameter">support_number</replaceable> [ ( <replaceable class="parameter">op_type</replaceable> [ , <replaceable class="parameter">op_type</replaceable> ] ) ] <replaceable class="parameter">function_name</replaceable> ( <replaceable class="parameter">argument_type</replaceable> [, ...] )
@@ -106,6 +106,18 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>NOT BITWISE</literal></term>
+    <listitem>
+     <para>
+      If present, the operator class's equality is not the same as bitwise
+      equality. For example, two numerics can compare equal but have
+      different scales. Most operator classes implement bitwise-equal
+      comparison; any other behavior must be declared explicitly.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">data_type</replaceable></term>
     <listitem>
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index 6a1ccde..bb6a0a7 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -654,6 +654,7 @@ DefineOpClass(CreateOpClassStmt *stmt)
 	values[Anum_pg_opclass_opcintype - 1] = ObjectIdGetDatum(typeoid);
 	values[Anum_pg_opclass_opcdefault - 1] = BoolGetDatum(stmt->isDefault);
 	values[Anum_pg_opclass_opckeytype - 1] = ObjectIdGetDatum(storageoid);
+	values[Anum_pg_opclass_opcisbitwise - 1] = BoolGetDatum(!stmt->isNotBitwise);
 
 	tup = heap_form_tuple(rel->rd_att, values, nulls);
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3432bb9..c2cf06e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3785,6 +3785,7 @@ _copyCreateOpClassStmt(const CreateOpClassStmt *from)
 	COPY_NODE_FIELD(datatype);
 	COPY_NODE_FIELD(items);
 	COPY_SCALAR_FIELD(isDefault);
+	COPY_SCALAR_FIELD(isNotBitwise);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 18cb014..52e8f0b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1607,6 +1607,7 @@ _equalCreateOpClassStmt(const CreateOpClassStmt *a, const CreateOpClassStmt *b)
 	COMPARE_NODE_FIELD(datatype);
 	COMPARE_NODE_FIELD(items);
 	COMPARE_SCALAR_FIELD(isDefault);
+	COMPARE_SCALAR_FIELD(isNotBitwise);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3f67aaf..45a4f8a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,6 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
 
+%type <boolean>		opt_not_bitwise
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -616,7 +618,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
 	ASSERTION ASSIGNMENT ASYMMETRIC AT ATTACH ATTRIBUTE AUTHORIZATION
 
-	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
+	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT BITWISE
 	BOOLEAN_P BOTH BY
 
 	CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
@@ -5951,16 +5953,17 @@ opt_if_not_exists: IF_P NOT EXISTS              { $$ = true; }
  *****************************************************************************/
 
 CreateOpClassStmt:
-			CREATE OPERATOR CLASS any_name opt_default FOR TYPE_P Typename
+			CREATE OPERATOR CLASS any_name opt_default opt_not_bitwise FOR TYPE_P Typename
 			USING access_method opt_opfamily AS opclass_item_list
 				{
 					CreateOpClassStmt *n = makeNode(CreateOpClassStmt);
 					n->opclassname = $4;
 					n->isDefault = $5;
-					n->datatype = $8;
-					n->amname = $10;
-					n->opfamilyname = $11;
-					n->items = $13;
+					n->isNotBitwise = $6;
+					n->datatype = $9;
+					n->amname = $11;
+					n->opfamilyname = $12;
+					n->items = $14;
 					$$ = (Node *) n;
 				}
 		;
@@ -6023,6 +6026,10 @@ opt_default:	DEFAULT						{ $$ = true; }
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_not_bitwise: NOT BITWISE				{ $$ = true; }
+			| /*EMPTY*/						{ $$ = false; }
+		;
+
 opt_opfamily:	FAMILY any_name				{ $$ = $2; }
 			| /*EMPTY*/						{ $$ = NIL; }
 		;
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index c689b8f..84f867a 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	201909251
+#define CATALOG_VERSION_NO	201909301
 
 #endif
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index 2d57510..51fccae 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -44,14 +44,14 @@
 { opcmethod => 'hash', opcname => 'date_ops', opcfamily => 'hash/date_ops',
   opcintype => 'date' },
 { opcmethod => 'btree', opcname => 'float4_ops', opcfamily => 'btree/float_ops',
-  opcintype => 'float4' },
+  opcintype => 'float4', opcisbitwise => 'f' },
 { opcmethod => 'hash', opcname => 'float4_ops', opcfamily => 'hash/float_ops',
-  opcintype => 'float4' },
+  opcintype => 'float4', opcisbitwise => 'f' },
 { oid => '3123', oid_symbol => 'FLOAT8_BTREE_OPS_OID',
   opcmethod => 'btree', opcname => 'float8_ops', opcfamily => 'btree/float_ops',
-  opcintype => 'float8' },
+  opcintype => 'float8', opcisbitwise => 'f' },
 { opcmethod => 'hash', opcname => 'float8_ops', opcfamily => 'hash/float_ops',
-  opcintype => 'float8' },
+  opcintype => 'float8', opcisbitwise => 'f' },
 { opcmethod => 'btree', opcname => 'inet_ops', opcfamily => 'btree/network_ops',
   opcintype => 'inet' },
 { opcmethod => 'hash', opcname => 'inet_ops', opcfamily => 'hash/network_ops',
@@ -100,9 +100,11 @@
   opcintype => 'name' },
 { oid => '3125', oid_symbol => 'NUMERIC_BTREE_OPS_OID',
   opcmethod => 'btree', opcname => 'numeric_ops',
-  opcfamily => 'btree/numeric_ops', opcintype => 'numeric' },
+  opcfamily => 'btree/numeric_ops', opcintype => 'numeric',
+  opcisbitwise => 'f' },
 { opcmethod => 'hash', opcname => 'numeric_ops',
-  opcfamily => 'hash/numeric_ops', opcintype => 'numeric' },
+  opcfamily => 'hash/numeric_ops', opcintype => 'numeric',
+  opcisbitwise => 'f' },
 { oid => '1981', oid_symbol => 'OID_BTREE_OPS_OID',
   opcmethod => 'btree', opcname => 'oid_ops', opcfamily => 'btree/oid_ops',
   opcintype => 'oid' },
diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index 84853c1..374ac4f 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -73,6 +73,9 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId)
 
 	/* type of data in index, or InvalidOid */
 	Oid			opckeytype BKI_DEFAULT(0) BKI_LOOKUP(pg_type);
+
+	/* T if opclass equality also means "bitwise equality" */
+	bool		opcisbitwise BKI_DEFAULT(t);
 } FormData_pg_opclass;
 
 /* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d93a79a..cd390a2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2577,6 +2577,7 @@ typedef struct CreateOpClassStmt
 	TypeName   *datatype;		/* datatype of indexed column */
 	List	   *items;			/* List of CreateOpClassItem nodes */
 	bool		isDefault;		/* Should be marked as default for type? */
+	bool		isNotBitwise;		/* Is opclass equality NOT bitwise equality? */
 } CreateOpClassStmt;
 
 #define OPCLASS_ITEM_OPERATOR		1
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 00ace84..d6a8e8f 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -58,6 +58,7 @@ PG_KEYWORD("between", BETWEEN, COL_NAME_KEYWORD)
 PG_KEYWORD("bigint", BIGINT, COL_NAME_KEYWORD)
 PG_KEYWORD("binary", BINARY, TYPE_FUNC_NAME_KEYWORD)
 PG_KEYWORD("bit", BIT, COL_NAME_KEYWORD)
+PG_KEYWORD("bitwise", BITWISE, UNRESERVED_KEYWORD)
 PG_KEYWORD("boolean", BOOLEAN_P, COL_NAME_KEYWORD)
 PG_KEYWORD("both", BOTH, RESERVED_KEYWORD)
 PG_KEYWORD("by", BY, UNRESERVED_KEYWORD)
