On 02/05/2014 12:42 PM, Alexander Korotkov wrote:
Attached patch is "light" version of fast scan. It does extra consistent
function calls only on startScanKey, no extra calls during scan of the
index.
It finds subset of rarest entries absence of which guarantee false
consistent function result.

Sounds good. Since the extra consistent calls are only performed at startup, it's unlikely to cause any noticeable performance regression, even when it's not helping.

I've run real-life tests based of postgresql.org logs (33318 queries). Here
is a table with summary time of running whole test case.

=# select method, sum(time) from test_result group by method order by
method;
        method        |       sum
---------------------+------------------
  fast-scan-11        | 126916.111999999
  fast-scan-light     |       137321.211
  heikki              |       466751.693
  heikki-true-ternary | 454113.623999997
  master              |       446976.288
(6 rows)

where 'heikki' is gin-ternary-logic binary-heap
preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is version
with my catalog changes promoting ternary consistent function to opclass.

Looks good.

We can see fast-scan-light gives almost same performance benefit on "&"
queries. However, I test only CPU effect, not IO effect.
Any thoughts?

This test doesn't handle lossy-pages correctly:

                /*
                 * Check if all items marked as scanEntryUseForSkip is not 
present in
                 * minItem. If so, we can correct advancePast.
                 */
                if (ginCompareItemPointers(&minItem, &minItemSkip) < 0)
                {
                        advancePast = minItemSkip;
                        advancePast.ip_posid--;
                        continue;
                }


If minItemSkip is a lossy page, we skip all exact items on the same page. That's wrong, here's a test case to demonstrate:

CREATE TABLE foo (ts tsvector);
INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 1000000) g;
CREATE INDEX i_foo ON foo USING gin (ts);

set work_mem='64'; -- to force lossy pages
-- should return 111111
select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');

After some fiddling (including fixing the above), I ended up with the attached version of your patch. I still left out the catalog changes, again not because I don't think we should have them, but to split this into smaller patches that can be reviewed separately. I extended the "both ways" shim function to work with more than one "maybe" input. Of course, that is O(n^2), where n is the number of "maybe" arguments, so that won't scale, but it's OK for testing purposes. And many if not most real world queries, too.

I had a very hard time understanding what it really means for an entry to be in the scanEntryUseForSkip array. What does it mean to "use" an entry for skipping?

So I refactored that logic, to hopefully make it more clear. In essence, the entries are divided into two groups, required and other items. If none of the entries in the required set are true, then we know that the overall qual cannot match. See the comments for a more detailed explanation. I'm not wedded to the term "required" here; one might think that *all* the entries in the required set must be TRUE for a match, while it's actually that at least one of them must be TRUE.

In keyGetItem, we fetch the minimum item among all the entries in the requiredEntries set. That's the next item we're going to return, regardless of any items in otherEntries set. But before calling the consistent function, we advance the other entries up to that point, so that we know whether to pass TRUE or FALSE to the consistent function. IOW, otherEntries only provide extra information for the consistent function to decide if we have a match or not - they don't affect which items we return to the caller.

I think this is pretty close to committable state now. I'm slightly disappointed that we can't do the skipping in more scenarios. For example, in "foo & bar", we can currently skip "bar" entry up to the next "foo", but we cannot skip "foo" up to the next "bar". But this is fairly simple, and since we sort the entries by estimated frequency, should already cover all the pathological "rare & frequent" type queries. So that's OK.

- Heikki
diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile
index aabc62f..db4f496 100644
--- a/src/backend/access/gin/Makefile
+++ b/src/backend/access/gin/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \
 	ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \
-	ginbulk.o ginfast.o ginpostinglist.o
+	ginbulk.o ginfast.o ginpostinglist.o ginlogic.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index a45d722..b6f521c 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -32,41 +32,6 @@ typedef struct pendingPosition
 	bool	   *hasMatchKey;
 } pendingPosition;
 
-
-/*
- * Convenience function for invoking a key's consistentFn
- */
-static bool
-callConsistentFn(GinState *ginstate, GinScanKey key)
-{
-	/*
-	 * If we're dealing with a dummy EVERYTHING key, we don't want to call the
-	 * consistentFn; just claim it matches.
-	 */
-	if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
-	{
-		key->recheckCurItem = false;
-		return true;
-	}
-
-	/*
-	 * Initialize recheckCurItem in case the consistentFn doesn't know it
-	 * should set it.  The safe assumption in that case is to force recheck.
-	 */
-	key->recheckCurItem = true;
-
-	return DatumGetBool(FunctionCall8Coll(&ginstate->consistentFn[key->attnum - 1],
-								 ginstate->supportCollation[key->attnum - 1],
-										  PointerGetDatum(key->entryRes),
-										  UInt16GetDatum(key->strategy),
-										  key->query,
-										  UInt32GetDatum(key->nuserentries),
-										  PointerGetDatum(key->extra_data),
-									   PointerGetDatum(&key->recheckCurItem),
-										  PointerGetDatum(key->queryValues),
-									 PointerGetDatum(key->queryCategories)));
-}
-
 /*
  * Goes to the next page if current offset is outside of bounds
  */
@@ -453,13 +418,95 @@ restartScanEntry:
 	freeGinBtreeStack(stackEntry);
 }
 
+/*
+ * Comparison function for scan entry indexes. Sorts by predictNumberResult,
+ * least frequent items first.
+ */
+static int
+entryIndexByFrequencyCmp(const void *a1, const void *a2, void *arg)
+{
+	const GinScanKey key = (const GinScanKey) arg;
+	int			i1 = *(const int *) a1;
+	int			i2 = *(const int *) a2;
+	uint32		n1 = key->scanEntry[i1]->predictNumberResult;
+	uint32		n2 = key->scanEntry[i2]->predictNumberResult;
+
+	if (n1 < n2)
+		return -1;
+	else if (n1 == n2)
+		return 0;
+	else
+		return 1;
+}
+
 static void
-startScanKey(GinState *ginstate, GinScanKey key)
+startScanKey(GinState *ginstate, GinScanOpaque so, GinScanKey key)
 {
+	MemoryContext oldCtx = CurrentMemoryContext;
+	int			i;
+	int		   *entryIndexes;
+
 	ItemPointerSetMin(&key->curItem);
 	key->curItemMatches = false;
 	key->recheckCurItem = false;
 	key->isFinished = false;
+
+	/*
+	 * Divide the entries into two groups, requiredEntries and otherEntries.
+	 * Entries in otherEntries alone are not enough for a match, without any
+	 * items from requiredEntries, but are needed by the consistent function
+	 * to decide if an item matches. When scanning, we can skip over items in
+	 * otherEntries that have no corresponding matches in requiredEntries.
+	 * That speeds up considerably queries like "frequent & rare", if the
+	 * frequent term can be put in otherEntries.
+	 *
+	 * There can be many legal ways to divide them entries into these two
+	 * groups. A conservative division is to just put everything in
+	 * requiredEntries group, but the more you can put in otherEntries, the
+	 * more you can skip during the scan. To maximize skipping, we try to put
+	 * as many frequent items as possible into otherEntries, and less frequent
+	 * ones into requiredEntries. To do that, sort the entries by frequency
+	 * (predictNumberResult), and put entries into requiredEntries in that
+	 * order, until the consistent function says that none of the remaining
+	 * entries can form a match, without any items from requiredEntries.
+	 */
+	key->requiredEntries = palloc(key->nentries * sizeof(GinScanEntry));
+	key->otherEntries = palloc(key->nentries * sizeof(GinScanEntry));
+	key->nrequired = key->nother = 0;
+
+	if (key->nentries > 1)
+	{
+		MemoryContextSwitchTo(so->tempCtx);
+
+		entryIndexes = (int *) palloc(sizeof(int) * key->nentries);
+		for (i = 0; i < key->nentries; i++)
+			entryIndexes[i] = i;
+
+		qsort_arg(entryIndexes, key->nentries, sizeof(int),
+				  entryIndexByFrequencyCmp, key);
+
+		for (i = 0; i < key->nentries; i++)
+			key->entryRes[i] = GIN_MAYBE;
+
+		for (i = 0; i < key->nentries; i++)
+		{
+			key->requiredEntries[key->nrequired++] = key->scanEntry[entryIndexes[i]];
+			key->entryRes[entryIndexes[i]] = GIN_FALSE;
+
+			if (key->triConsistentFn(key) == GIN_FALSE)
+				break;
+		}
+		for (i = i + 1; i < key->nentries; i++)
+			key->otherEntries[key->nother++] = key->scanEntry[entryIndexes[i]];
+
+		/* clean up after consistentFn calls (also frees entryIndexes) */
+		MemoryContextSwitchTo(oldCtx);
+		MemoryContextReset(so->tempCtx);
+	}
+	else
+	{
+		key->requiredEntries[key->nrequired++] = key->scanEntry[0];
+	}
 }
 
 static void
@@ -494,7 +541,7 @@ startScan(IndexScanDesc scan)
 	}
 
 	for (i = 0; i < so->nkeys; i++)
-		startScanKey(ginstate, so->keys + i);
+		startScanKey(ginstate, so, so->keys + i);
 }
 
 /*
@@ -812,10 +859,9 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 	ItemPointerData minItem;
 	ItemPointerData curPageLossy;
 	uint32		i;
-	uint32		lossyEntry;
 	bool		haveLossyEntry;
 	GinScanEntry entry;
-	bool		res;
+	GinLogicValue res;
 	MemoryContext oldCtx;
 	bool		allFinished;
 
@@ -823,7 +869,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 
 	/*
 	 * We might have already tested this item; if so, no need to repeat work.
-	 * (Note: the ">" case can happen, if minItem is exact but we previously
+	 * (Note: the ">" case can happen, if advancePast is exact but we previously
 	 * had to set curItem to a lossy-page pointer.)
 	 */
 	if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
@@ -839,9 +885,12 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 	 */
 	ItemPointerSetMax(&minItem);
 	allFinished = true;
-	for (i = 0; i < key->nentries; i++)
+	for (i = 0; i < key->nrequired; i++)
 	{
-		entry = key->scanEntry[i];
+		entry = key->requiredEntries[i];
+
+		if (entry->isFinished)
+			continue;
 
 		/*
 		 * Advance this stream if necessary.
@@ -850,18 +899,16 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 		 * ItemPointerSetMin, this ensures we fetch the first item for each
 		 * entry on the first call.
 		 */
-		while (entry->isFinished == FALSE &&
-			   ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
 			entryGetItem(ginstate, entry, advancePast);
+			if (entry->isFinished)
+				continue;
 		}
 
-		if (!entry->isFinished)
-		{
-			allFinished = FALSE;
-			if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
-				minItem = entry->curItem;
-		}
+		allFinished = false;
+		if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+			minItem = entry->curItem;
 	}
 
 	if (allFinished)
@@ -872,38 +919,96 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 	}
 
 	/*
-	 * OK, set key->curItem and perform consistentFn test.
+	 * Ok, we now know that there are no matches < minItem.
+	 *
+	 * If minItem is lossy, it means that there there were no exact items on
+	 * the page among requiredEntries, because lossy pointers sort after exact
+	 * items. However, there might be exact items for the same page among
+	 * otherEntries, so we mustn't advance past them.
 	 */
-	key->curItem = minItem;
+	if (ItemPointerIsLossyPage(&minItem))
+	{
+		if (GinItemPointerGetBlockNumber(&advancePast) <
+			GinItemPointerGetBlockNumber(&minItem))
+		{
+			advancePast.ip_blkid = minItem.ip_blkid;
+			advancePast.ip_posid = 0;
+		}
+	}
+	else
+	{
+		Assert(minItem.ip_posid > 0);
+		advancePast = minItem;
+		advancePast.ip_posid--;
+	}
+
+	/*
+	 * We might not have loaded all the entry streams for this TID yet. We
+	 * could call the consistent function, passing MAYBE for those entries, to
+	 * see if it can decide if this TID matches based on the information we
+	 * have. But if the consistent-function is expensive, and cannot in fact
+	 * decide with partial information, that could be a big loss. So, load all
+	 * the additional entries, before calling the consistent function.
+	 */
+	for (i = 0; i < key->nother; i++)
+	{
+		entry = key->otherEntries[i];
+
+		if (entry->isFinished)
+			continue;
+
+		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+		{
+			entryGetItem(ginstate, entry, advancePast);
+			if (entry->isFinished)
+				continue;
+		}
+
+		/*
+		 * Normally, none of the items in otherEntries can have a curItem
+		 * larger than minItem. But if minItem is a lossy page, then there
+		 * might be exact items on the same page among otherEntries.
+		 */
+		if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+		{
+			Assert(ItemPointerIsLossyPage(&minItem));
+			minItem = entry->curItem;
+		}
+	}
 
 	/*
+	 * Ok, we've advanced all the entries up to minItem now. Set key->curItem,
+	 * and perform consistentFn test.
+	 *
 	 * Lossy-page entries pose a problem, since we don't know the correct
 	 * entryRes state to pass to the consistentFn, and we also don't know what
 	 * its combining logic will be (could be AND, OR, or even NOT). If the
 	 * logic is OR then the consistentFn might succeed for all items in the
 	 * lossy page even when none of the other entries match.
 	 *
-	 * If we have a single lossy-page entry then we check to see if the
-	 * consistentFn will succeed with only that entry TRUE.  If so, we return
-	 * a lossy-page pointer to indicate that the whole heap page must be
+	 * Our strategy is to call the tri-state consistent function, with the
+	 * lossy-page entries set to MAYBE, and all the other entries FALSE. If it
+	 * returns FALSE, none of the lossy items alone are enough for a match, so
+	 * we don't need to return a lossy-page pointer. Otherwise, return a
+	 * lossy-page pointer to indicate that the whole heap page must be
 	 * checked.  (On subsequent calls, we'll do nothing until minItem is past
 	 * the page altogether, thus ensuring that we never return both regular
 	 * and lossy pointers for the same page.)
 	 *
-	 * This idea could be generalized to more than one lossy-page entry, but
-	 * ideally lossy-page entries should be infrequent so it would seldom be
-	 * the case that we have more than one at once.  So it doesn't seem worth
-	 * the extra complexity to optimize that case. If we do find more than
-	 * one, we just punt and return a lossy-page pointer always.
+	 * An exception is that it doesn't matter what we pass for lossy pointers
+	 * in "hidden" entries, because the consistentFn's result can't depend on
+	 * them. We could pass them as MAYBE as well, but if we're using the
+	 * "shim" implementation of a tri-state consistent function (see
+	 * ginlogic.c), it's better to pass as few MAYBEs as possible. So pass
+	 * them as TRUE.
 	 *
 	 * Note that only lossy-page entries pointing to the current item's page
 	 * should trigger this processing; we might have future lossy pages in the
 	 * entry array, but they aren't relevant yet.
 	 */
+	key->curItem = minItem;
 	ItemPointerSetLossyPage(&curPageLossy,
 							GinItemPointerGetBlockNumber(&key->curItem));
-
-	lossyEntry = 0;
 	haveLossyEntry = false;
 	for (i = 0; i < key->nentries; i++)
 	{
@@ -911,17 +1016,14 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 		if (entry->isFinished == FALSE &&
 			ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
 		{
-			if (haveLossyEntry)
-			{
-				/* Multiple lossy entries, punt */
-				key->curItem = curPageLossy;
-				key->curItemMatches = true;
-				key->recheckCurItem = true;
-				return;
-			}
-			lossyEntry = i;
+			if (i < key->nuserentries)
+				key->entryRes[i] = GIN_MAYBE;
+			else
+				key->entryRes[i] = GIN_TRUE;
 			haveLossyEntry = true;
 		}
+		else
+			key->entryRes[i] = GIN_FALSE;
 	}
 
 	/* prepare for calling consistentFn in temp context */
@@ -929,11 +1031,10 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 
 	if (haveLossyEntry)
 	{
-		/* Single lossy-page entry, so see if whole page matches */
-		memset(key->entryRes, FALSE, key->nentries);
-		key->entryRes[lossyEntry] = TRUE;
+		/* Have lossy-page entries, so see if whole page matches */
+		res = key->triConsistentFn(key);
 
-		if (callConsistentFn(ginstate, key))
+		if (res == GIN_TRUE || res == GIN_MAYBE)
 		{
 			/* Yes, so clean up ... */
 			MemoryContextSwitchTo(oldCtx);
@@ -950,41 +1051,69 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 	/*
 	 * At this point we know that we don't need to return a lossy whole-page
 	 * pointer, but we might have matches for individual exact item pointers,
-	 * possibly in combination with a lossy pointer.  Our strategy if there's
-	 * a lossy pointer is to try the consistentFn both ways and return a hit
-	 * if it accepts either one (forcing the hit to be marked lossy so it will
-	 * be rechecked).  An exception is that we don't need to try it both ways
-	 * if the lossy pointer is in a "hidden" entry, because the consistentFn's
-	 * result can't depend on that.
+	 * possibly in combination with a lossy pointer. Pass lossy pointers as
+	 * MAYBE to the ternary consistent function, to let it decide if this
+	 * tuple satisfies the overall key, even though we don't know whether the
+	 * lossy entries match.
 	 *
 	 * Prepare entryRes array to be passed to consistentFn.
 	 */
 	for (i = 0; i < key->nentries; i++)
 	{
 		entry = key->scanEntry[i];
-		if (entry->isFinished == FALSE &&
-			ginCompareItemPointers(&entry->curItem, &key->curItem) == 0)
-			key->entryRes[i] = TRUE;
+		if (entry->isFinished)
+			key->entryRes[i] = GIN_FALSE;
+#if 0
+		/*
+		 * This case can't currently happen, because we't loaded all the
+		 * entries for this item earlier.
+		 */
+		else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+			key->entryRes[i] = GIN_MAYBE;
+#endif
+		else if (ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+			key->entryRes[i] = GIN_MAYBE;
+		else if (ginCompareItemPointers(&entry->curItem, &minItem) == 0)
+			key->entryRes[i] = GIN_TRUE;
 		else
-			key->entryRes[i] = FALSE;
+			key->entryRes[i] = GIN_FALSE;
 	}
-	if (haveLossyEntry)
-		key->entryRes[lossyEntry] = TRUE;
 
-	res = callConsistentFn(ginstate, key);
+	res = key->triConsistentFn(key);
 
-	if (!res && haveLossyEntry && lossyEntry < key->nuserentries)
+	switch (res)
 	{
-		/* try the other way for the lossy item */
-		key->entryRes[lossyEntry] = FALSE;
+		case GIN_TRUE:
+			key->curItemMatches = true;
+			/* triConsistentFn set recheckCurItem */
+			break;
+
+		case GIN_FALSE:
+			key->curItemMatches = false;
+			break;
 
-		res = callConsistentFn(ginstate, key);
+		case GIN_MAYBE:
+			key->curItemMatches = true;
+			key->recheckCurItem = true;
+			break;
+
+		default:
+			/*
+			 * the 'default' case shouldn't happen, but if the consistent
+			 * function returns something bogus, this is the safe result
+			 */
+			key->curItemMatches = true;
+			key->recheckCurItem = true;
+			break;
 	}
 
-	key->curItemMatches = res;
-	/* If we matched a lossy entry, force recheckCurItem = true */
-	if (haveLossyEntry)
-		key->recheckCurItem = true;
+	/*
+	 * We have a tuple, and we know if it matches or not. If it's a
+	 * non-match, we could continue to find the next matching tuple, but
+	 * let's break out and give scanGetItem a chance to advance the other
+	 * keys. They might be able to skip past to a much higher TID, allowing
+	 * us to save work.
+	 */
 
 	/* clean up after consistentFn calls */
 	MemoryContextSwitchTo(oldCtx);
@@ -1080,7 +1209,7 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
 
 			/*
 			 * If this is the first key, remember this location as a
-			 * potential match.
+			 * potential match, and proceed to check the rest of the keys.
 			 *
 			 * Otherwise, check if this is the same item that we checked the
 			 * previous keys for (or a lossy pointer for the same page). If
@@ -1322,7 +1451,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 	{
 		GinScanKey	key = so->keys + i;
 
-		memset(key->entryRes, FALSE, key->nentries);
+		memset(key->entryRes, GIN_FALSE, key->nentries);
 	}
 	memset(pos->hasMatchKey, FALSE, so->nkeys);
 
@@ -1579,7 +1708,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 		{
 			GinScanKey	key = so->keys + i;
 
-			if (!callConsistentFn(&so->ginstate, key))
+			if (!key->boolConsistentFn(key))
 			{
 				match = false;
 				break;
diff --git a/src/backend/access/gin/ginlogic.c b/src/backend/access/gin/ginlogic.c
new file mode 100644
index 0000000..c46a28c
--- /dev/null
+++ b/src/backend/access/gin/ginlogic.c
@@ -0,0 +1,149 @@
+/*-------------------------------------------------------------------------
+ *
+ * ginlogic.c
+ *	  routines for performing binary- and ternary-logic consistent checks.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *			src/backend/access/gin/ginlogic.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "access/reloptions.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "storage/indexfsm.h"
+#include "storage/lmgr.h"
+
+/*
+ * A dummy consistent function for an EVERYTHING key. Just claim it matches.
+ */
+static bool
+trueConsistentFn(GinScanKey key)
+{
+	key->recheckCurItem = false;
+	return true;
+}
+static GinLogicValue
+trueTriConsistentFn(GinScanKey key)
+{
+	return GIN_MAYBE;
+}
+
+/*
+ * A function for calling a regular, binary logic, consistent function.
+ */
+static bool
+normalBoolConsistentFn(GinScanKey key)
+{
+	/*
+	 * Initialize recheckCurItem in case the consistentFn doesn't know it
+	 * should set it.  The safe assumption in that case is to force recheck.
+	 */
+	key->recheckCurItem = true;
+
+	return DatumGetBool(FunctionCall8Coll(key->consistentFmgrInfo,
+										  key->collation,
+										  PointerGetDatum(key->entryRes),
+										  UInt16GetDatum(key->strategy),
+										  key->query,
+										  UInt32GetDatum(key->nuserentries),
+										  PointerGetDatum(key->extra_data),
+									   PointerGetDatum(&key->recheckCurItem),
+										  PointerGetDatum(key->queryValues),
+									 PointerGetDatum(key->queryCategories)));
+}
+
+/*
+ * This function implements a tri-state consistency check, using a boolean
+ * consistent function provided by the opclass.
+ *
+ * Our strategy is to call consistentFn with MAYBE inputs replaced with every
+ * combination of TRUE/FALSE. If consistentFn returns the same value for every
+ * combination, that's the overall result. Otherwise, return MAYBE. Testing
+ * every combination is O(n^2), so this is only feasible for a small number of
+ * MAYBE inputs.
+ */
+static GinLogicValue
+shimTriConsistentFn(GinScanKey key)
+{
+#define	MAX_MAYBE_ENTRIES	8
+	int			nmaybe;
+	int			maybeEntries[MAX_MAYBE_ENTRIES];
+	int			i;
+	bool		boolResult;
+	bool		recheck;
+	GinLogicValue curResult;
+
+	nmaybe = 0;
+	for (i = 0; i < key->nentries; i++)
+	{
+		if (key->entryRes[i] == GIN_MAYBE)
+		{
+			if (nmaybe >= MAX_MAYBE_ENTRIES)
+				return GIN_MAYBE;
+			maybeEntries[nmaybe++] = i;
+		}
+	}
+
+	/*
+	 * If none of the inputs were MAYBE, so we can just call consistent
+	 * function as is.
+	 */
+	if (nmaybe == 0)
+		return normalBoolConsistentFn(key);
+
+	/* Try the consistent function with the maybe-inputs set both ways */
+	for (i = 0; i < nmaybe; i++)
+		key->entryRes[maybeEntries[i]] = GIN_FALSE;
+	curResult = normalBoolConsistentFn(key);
+
+	for (;;)
+	{
+		/* Twiddle the entries for next combination. */
+		for (i = 0; i < nmaybe; i++)
+		{
+			if (key->entryRes[maybeEntries[i]] == GIN_FALSE)
+			{
+				key->entryRes[maybeEntries[i]] = GIN_TRUE;
+				break;
+			}
+			else
+				key->entryRes[maybeEntries[i]] = GIN_FALSE;
+		}
+		if (i == nmaybe)
+			break;
+
+		boolResult = normalBoolConsistentFn(key);
+		recheck |= key->recheckCurItem;
+
+		if (curResult != boolResult)
+			return GIN_MAYBE;
+	}
+
+	return curResult;
+}
+
+void
+GinInitConsistentMethod(GinState *ginstate, GinScanKey key)
+{
+	if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
+	{
+		key->boolConsistentFn = trueConsistentFn;
+		key->triConsistentFn = trueTriConsistentFn;
+	}
+	else
+	{
+		key->consistentFmgrInfo = &ginstate->consistentFn[key->attnum - 1];
+		key->collation = ginstate->supportCollation[key->attnum - 1];
+		key->boolConsistentFn = normalBoolConsistentFn;
+		key->triConsistentFn = shimTriConsistentFn;
+	}
+}
diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
index b80b294..ed29d8d 100644
--- a/src/backend/access/gin/ginscan.c
+++ b/src/backend/access/gin/ginscan.c
@@ -164,6 +164,8 @@ ginFillScanKey(GinScanOpaque so, OffsetNumber attnum,
 	key->recheckCurItem = false;
 	key->isFinished = false;
 
+	GinInitConsistentMethod(ginstate, key);
+
 	for (i = 0; i < nQueryValues; i++)
 	{
 		Datum		queryKey;
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bb0ab31..a8fe448 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -17,6 +17,8 @@
 #include "storage/bufmgr.h"
 #include "utils/rbtree.h"
 
+typedef struct GinScanKeyData *GinScanKey;
+typedef struct GinScanEntryData *GinScanEntry;
 
 /*
  * Page opaque data in an inverted index page.
@@ -588,6 +590,19 @@ extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
 extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
 				 GinNullCategory *category);
 
+/* ginlogic.c */
+
+enum
+{
+	GIN_FALSE = 0,
+	GIN_TRUE = 1,
+	GIN_MAYBE = 2
+} GinLogicValueEnum;
+
+typedef char GinLogicValue;
+
+extern void GinInitConsistentMethod(GinState *ginstate, GinScanKey key);
+
 /* gininsert.c */
 extern Datum ginbuild(PG_FUNCTION_ARGS);
 extern Datum ginbuildempty(PG_FUNCTION_ARGS);
@@ -732,10 +747,6 @@ extern void ginVacuumPostingTreeLeaf(Relation rel, Buffer buf, GinVacuumState *g
  * nuserentries is the number that extractQueryFn returned (which is what
  * we report to consistentFn).	The "user" entries must come first.
  */
-typedef struct GinScanKeyData *GinScanKey;
-
-typedef struct GinScanEntryData *GinScanEntry;
-
 typedef struct GinScanKeyData
 {
 	/* Real number of entries in scanEntry[] (always > 0) */
@@ -746,8 +757,25 @@ typedef struct GinScanKeyData
 	/* array of GinScanEntry pointers, one per extracted search condition */
 	GinScanEntry *scanEntry;
 
+	/*
+	 * requiredEntries contains entries, such that at least one of the entries
+	 * in this set must match, for the overall qual to match.
+	 *
+	 * otherEntries contains additional entries that are needed by the
+	 * consistent function to decide if an item matches, but are not sufficient
+	 * to satisfy the qual without entries from requiredEntries.
+	 */
+	GinScanEntry *requiredEntries;
+	int			nrequired;
+	GinScanEntry *otherEntries;
+	int			nother;
+
 	/* array of check flags, reported to consistentFn */
 	bool	   *entryRes;
+	bool		(*boolConsistentFn) (GinScanKey key);
+	bool		(*triConsistentFn) (GinScanKey key);
+	FmgrInfo   *consistentFmgrInfo;
+	Oid			collation;
 
 	/* other data needed for calling consistentFn */
 	Datum		query;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to