Re: [HACKERS] lcr v5 - primary/candidate key in relcache

2013-09-12 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 Robert Haas wrote:
 Andres Freund and...@2ndquadrant.com wrote:
 0007 wal_decoding: Add information about a tables primary key to
  struct RelationData
 * Could be used in the matview refresh code

 I think you and Kevin should discuss whether this is actually the
 right way to do this.  ISTM that if logical replication and
 materialized views end up selecting different approaches to this
 problem, everybody loses.

 The patch we're discussion here adds a new struct RelationData field
 called 'rd_primary' (should possibly be renamed) which contains
 information about the best candidate key available for a table.

 From the header comments:
 /*
 * The 'best' primary or candidate key that has been found, only set
 * correctly if RelationGetIndexList has been called/rd_indexvalid  0.
 *
 * Indexes are chosen in the following order:
 * * Primary Key
 * * oid index
 * * the first (OID order) unique, immediate, non-partial and
 *  non-expression index over one or more NOT NULL'ed columns
 */
 Oid rd_primary;

 I thought we could use that in matview.c:refresh_by_match_merge() to
 select a more efficient diff if rd_primary has a valid index. In that
 case you only'd need to compare that index's fields which should result
 in an more efficient plan.

 Maybe it's also useful in other cases for you?

 If it's relevant at all, would you like to have a different priority
 list than the one above?

My first thought was that it was necessary to use all unique,
immediate, non-partial, non-expression indexes to avoid getting
errors on the UPDATE phase of the concurrent refresh for transient
duplicates; but then I remembered that I had to give up on that and
do it all with DELETE followed by INSERT, which eliminates that
risk.  As things now stand the *existence* of any unique,
non-partial, non-expression index (note that immediate is not
needed) is sufficient for correctness.  We could now even drop that,
I think, if we added a duplicate check at the end in the absence of
such an index.

The reason I left it comparing columns from *all* such indexes is
that it gives the optimizer the chance to pick the one that looks
fastest.  With the upcoming patch that can add some extra
equality comparisons in addition to the identical comparisons
the patch uses, so the mechanism you propose might be a worthwhile
optimization for some cases as long as it does a good job of
picking *the fastest* such index.  The above method of choosing an
index doesn't seem to necessarily ensure that.

Also, if you need to include the immediate test, it could not be
used for RMVC without fallback code if this mechanism didn't find
an appropriate index.  Of course, that would satisfy those who
would like to relax the requirement for a unique index on the MV to
be able to use RMVC.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - primary/candidate key in relcache

2013-09-05 Thread Andres Freund
Hi Kevin,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
 On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  0007 wal_decoding: Add information about a tables primary key to struct 
  RelationData
  * Could be used in the matview refresh code

 I think you and Kevin should discuss whether this is actually the
 right way to do this.  ISTM that if logical replication and
 materialized views end up selecting different approaches to this
 problem, everybody loses.

The patch we're discussion here adds a new struct RelationData field
called 'rd_primary' (should possibly be renamed) which contains
information about the best candidate key available for a table.

From the header comments:
/*
 * The 'best' primary or candidate key that has been found, only set
 * correctly if RelationGetIndexList has been called/rd_indexvalid  0.
 *
 * Indexes are chosen in the following order:
 * * Primary Key
 * * oid index
 * * the first (OID order) unique, immediate, non-partial and
 *   non-expression index over one or more NOT NULL'ed columns
 */
Oid rd_primary;

I thought we could use that in matview.c:refresh_by_match_merge() to
select a more efficient diff if rd_primary has a valid index. In that
case you only'd need to compare that index's fields which should result
in an more efficient plan.

Maybe it's also useful in other cases for you?

If it's relevant at all, would you like to have a different priority
list than the one above?

Regards,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From ee85b3bd8d8cc25fa547c004f6f6ea6bccef7c66 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/9] wal_decoding: Add information about a tables primary key
 to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather rd_primary value RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to lookup that index and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++---
 src/include/utils/rel.h| 12 +
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c588c29 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) 
-			index-indnatts == 1 
+		if (index-indnatts == 1 
 			index-indisunique  index-indimmediate 
 			index-indkey.values[0] == ObjectIdAttributeNumber 
 			indclass-values[0] == OID_BTREE_OPS_OID 
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index-indexrelid;
+
+		if (index-indisunique 
+			index-indimmediate 
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index-indisprimary)
+pkeyIndex = index-indexrelid;
+			else if (!OidIsValid(pkeyIndex)
+	 !OidIsValid(oidIndex)
+	 !OidIsValid(candidateIndex))
+			{
+int key;
+bool found = true;
+for (key = 0; key  index-indnatts; key++)
+{
+	int16 attno = index-indkey.values[key];
+	Form_pg_attribute attr;
+	/* internal column, like oid */
+	if (attno = 0)
+		continue;
+
+	attr = relation-rd_att-attrs[attno - 1];
+	if (!attr-attnotnull)
+	{
+		found = false;
+		break;
+	}
+}
+if (found)
+	candidateIndex = index-indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation-rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation-rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+