Robert Haas <robertmh...@gmail.com> writes: > On Thu, Aug 30, 2012 at 4:15 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Yeah, the idea of replacing sum_grow with a boolean just occurred to me >> too. As is, I think the code is making some less-than-portable >> assumptions about what will happen if sum_grow overflows; which can >> definitely happen, seeing that gistpenalty and its callees intentionally >> return infinity in some cases. I'd rather it didn't attempt to add >> column penalties together, and I think there's a case to be made that >> not doing so is a back-patchable bug fix.
> Keep in mind that the worst case outcome is the index quality is worse > than it otherwise would have been, so it's not like > OMG-PostgreSQ-eats-your-data. Agreed, but we've seen plenty of complaining about bloated gist indexes, and this might be the cause. >> I'll take a shot at improving this some more. > Okey dokey. Attached is a revised version of gistchoose(); I've not yet transposed the changes into gistRelocateBuildBuffersOnSplit(). It looks a lot more readable to me now. Objections? regards, tom lane
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c index 7c6ce8495caac1e9d7c99afdb513689de93beea5..48d70d98e25363d8425020be7607f20348a0456a 100644 *** a/src/backend/access/gist/gistutil.c --- b/src/backend/access/gist/gistutil.c *************** gistgetadjusted(Relation r, IndexTuple o *** 363,475 **** } /* ! * Search a page for the entry with lowest penalty. * ! * The index may have multiple columns, and there's a penalty value for each column. ! * The penalty associated with a column which appears earlier in the index definition is ! * strictly more important than the penalty of column which appears later in the index ! * definition. */ OffsetNumber gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */ GISTSTATE *giststate) { OffsetNumber maxoff; OffsetNumber i; ! OffsetNumber which; ! float sum_grow, ! which_grow[INDEX_MAX_KEYS]; GISTENTRY entry, identry[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; ! maxoff = PageGetMaxOffsetNumber(p); ! *which_grow = -1.0; ! which = InvalidOffsetNumber; ! sum_grow = 1; gistDeCompressAtt(giststate, r, it, NULL, (OffsetNumber) 0, identry, isnull); ! Assert(maxoff >= FirstOffsetNumber); ! Assert(!GistPageIsLeaf(p)); /* ! * Loop over tuples on page. * ! * We'll exit early if we find an index key that can accommodate the new key with no ! * penalty on any column. sum_grow is used to track this condition. Normally, it is the ! * sum of the penalties we've seen for this column so far, which is not a very useful ! * quantity in general because the penalties for each column are only considered ! * independently, but all we really care about is whether or not it's greater than zero. ! * Since penalties can't be negative, the sum of the penalties will be greater than ! * zero if and only if at least one penalty was greater than zero. To make things just ! * a bit more complicated, we arbitrarily set sum_grow to 1.0 whenever we want to force ! * the at least one more iteration of this outer loop. Any non-zero value would serve ! * just as well. */ ! for (i = FirstOffsetNumber; i <= maxoff && sum_grow; i = OffsetNumberNext(i)) { - int j; IndexTuple itup = (IndexTuple) PageGetItem(p, PageGetItemId(p, i)); ! sum_grow = 0; ! /* Loop over indexed attribtues. */ for (j = 0; j < r->rd_att->natts; j++) { Datum datum; float usize; bool IsNull; datum = index_getattr(itup, j + 1, giststate->tupdesc, &IsNull); gistdentryinit(giststate, j, &entry, datum, r, p, i, FALSE, IsNull); usize = gistpenalty(giststate, j, &entry, IsNull, &identry[j], isnull[j]); ! if (which_grow[j] < 0 || usize < which_grow[j]) { /* ! * We get here in two cases. First, we may have just discovered that the ! * current tuple is the best one we've seen so far; that is, for the first ! * column for which the penalty is not equal to the best tuple seen so far, ! * this one has a lower penalty than the previously-seen one. But, when ! * a new best tuple is found, we must record the best penalty value for ! * all the remaining columns. We'll end up here for each remaining index ! * column in that case, too. */ ! which = i; ! which_grow[j] = usize; if (j < r->rd_att->natts - 1) ! which_grow[j + 1] = -1; ! sum_grow += which_grow[j]; } ! else if (which_grow[j] == usize) { /* ! * The current tuple is exactly as good for this column as the best tuple ! * seen so far. The next iteration of this loop will compare the next ! * column. */ - sum_grow += usize; } else { /* ! * The current tuple is worse for this column than the best tuple seen so ! * far. Skip the remaining columns and move on to the next tuple, if any. */ ! sum_grow = 1; break; } } - } ! if (which == InvalidOffsetNumber) ! which = FirstOffsetNumber; ! return which; } /* --- 363,482 ---- } /* ! * Search an upper index page for the entry with lowest penalty for insertion ! * of the new index key contained in "it". * ! * Returns the index of the page entry to insert into. */ OffsetNumber gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */ GISTSTATE *giststate) { + OffsetNumber result; OffsetNumber maxoff; OffsetNumber i; ! float best_penalty[INDEX_MAX_KEYS]; GISTENTRY entry, identry[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; ! Assert(!GistPageIsLeaf(p)); ! gistDeCompressAtt(giststate, r, it, NULL, (OffsetNumber) 0, identry, isnull); ! /* we'll return FirstOffsetNumber if page is empty (shouldn't happen) */ ! result = FirstOffsetNumber; /* ! * The index may have multiple columns, and there's a penalty value for ! * each column. The penalty associated with a column which appears ! * earlier in the index definition is strictly more important than the ! * penalty of a column which appears later in the index definition. * ! * best_penalty[j] is the best penalty we have seen so far for column j, ! * or -1 when we haven't yet examined column j. Array entries to the ! * right of the first -1 are undefined. */ ! best_penalty[0] = -1; ! ! /* ! * Loop over tuples on page. ! */ ! maxoff = PageGetMaxOffsetNumber(p); ! Assert(maxoff >= FirstOffsetNumber); ! ! for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) { IndexTuple itup = (IndexTuple) PageGetItem(p, PageGetItemId(p, i)); + bool zero_penalty; + int j; ! zero_penalty = true; ! /* Loop over indexed attributes within this tuple. */ for (j = 0; j < r->rd_att->natts; j++) { Datum datum; float usize; bool IsNull; + /* Compute penalty for this column. */ datum = index_getattr(itup, j + 1, giststate->tupdesc, &IsNull); gistdentryinit(giststate, j, &entry, datum, r, p, i, FALSE, IsNull); usize = gistpenalty(giststate, j, &entry, IsNull, &identry[j], isnull[j]); + if (usize > 0) + zero_penalty = false; ! if (best_penalty[j] < 0 || usize < best_penalty[j]) { /* ! * New best penalty for column. Tentatively select this tuple ! * as the target, and record the best penalty. Then reset the ! * next column's penalty to "unknown" (and indirectly, the ! * same for all the ones to its right). This will force us ! * to adopt this tuple's penalty values as the best for all ! * the remaining columns during subsequent loop iterations. */ ! result = i; ! best_penalty[j] = usize; ! if (j < r->rd_att->natts - 1) ! best_penalty[j + 1] = -1; } ! else if (best_penalty[j] == usize) { /* ! * The current tuple is exactly as good for this column as the ! * best tuple seen so far. The next iteration of this loop ! * will compare the next column. */ } else { /* ! * The current tuple is worse for this column than the best ! * tuple seen so far. Skip the remaining columns and move on ! * to the next tuple, if any. */ ! zero_penalty = false; /* be sure outer loop won't exit */ break; } } ! /* ! * If we find a tuple with zero penalty for all columns, there's no ! * need to examine remaining tuples; just break out of the loop and ! * return it. ! */ ! if (zero_penalty) ! break; ! } ! return result; } /*
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers