On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan <p...@heroku.com> wrote:
> My new understanding: The extra "included" columns are stored in the
> index, but do not affect its sort order at all. They are no more part
> of the key than, say, the heap TID that the key points to. They are
> just "payload".
Noticed a few issues following another pass:
* tuplesort.c should handle the CLUSTER case in the same way as the
btree case. No?
* Why have a RelationGetNumberOfAttributes(indexRel) call in
tuplesort_begin_index_btree() at all now?
* This critical section is unnecessary, because this happens during
+ if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+ * It's essential to truncate High key here.
+ * The purpose is not just to save more space
on this particular page,
+ * but to keep whole b-tree structure
consistent. Subsequent insertions
+ * assume that hikey is already truncated, and
so they should not
+ * worry about it, when copying the high key
into the parent page
+ * as a downlink.
+ * NOTE It is not crutial for reliability in present,
+ * but maybe it will be that in the future.
+ * NOTE this code will be changed by the
"btree compression" patch,
+ * which is in progress now.
+ keytup = index_reform_tuple(wstate->index, oitup,
+ /* delete "wrong" high key, insert keytup as
+ PageIndexTupleDelete(opage, P_HIKEY);
+ if (!_bt_pgaddtup(opage,
IndexTupleSize(keytup), keytup, P_HIKEY))
+ elog(ERROR, "failed to rewrite
compressed item in index \"%s\"",
Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
isn't useful here, because we have no buffer lock held, and nothing
must be WAL-logged.
* Think you forgot to update spghandler(). (You did not add a test for
just that one AM, either)
* I wonder why this restriction needs to exist:
+ elog(ERROR, "Expressions are not supported in
What does not supporting it buy us? Was it just that the pg_index
representation is more complicated, and you wanted to put it off?
An error like this should use ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED ..., btw.
* I would like to see index_reform_tuple() assert that the new,
truncated index tuple is definitely <= the original (I worry about the
1/3 page restriction issue). Maybe you should also change the name of
index_reform_tuple(), per David.
* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.
* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.
* Valgrind shows an error with an aggregate statement I tried:
2016-04-05 17:01:31.129 PDT 12310 LOG: statement: explain analyze
select count(*) from ab where b > 5 group by a, b;
==12310== Invalid read of size 4
==12310== at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310== by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310== by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310== by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310== by 0x658054: create_index_paths (indxpath.c:269)
==12310== by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310== by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310== by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310== by 0x64D93B: make_one_rel (allpaths.c:170)
==12310== by 0x66876C: query_planner (planmain.c:246)
==12310== by 0x669FBA: grouping_planner (planner.c:1666)
==12310== by 0x66D0C9: subquery_planner (planner.c:751)
==12310== by 0x66D3DA: standard_planner (planner.c:300)
==12310== by 0x66D714: planner (planner.c:170)
==12310== by 0x6FD692: pg_plan_query (postgres.c:798)
==12310== by 0x59082D: ExplainOneQuery (explain.c:350)
==12310== Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
==12310== at 0x4C2AB80: malloc (in
==12310== by 0x81B7FA: AllocSetAlloc (aset.c:853)
==12310== by 0x81D257: palloc (mcxt.c:907)
==12310== by 0x4B6F65: RelationGetIndexScan (genam.c:94)
==12310== by 0x4C135D: btbeginscan (nbtree.c:431)
==12310== by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
==12310== by 0x4B7C5A: index_beginscan (indexam.c:222)
==12310== by 0x4B73D1: systable_beginscan (genam.c:379)
==12310== by 0x7E8CF9: ScanPgRelation (relcache.c:341)
==12310== by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
==12310== by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
==12310== by 0x4A4D37: relation_open (heapam.c:1118)
Separately, I tried "make installcheck-tests TESTS=index_including"
from Postgres + Valgrind, with Valgrind's --track-origins option
enabled (as it was above). I recommend installing Valgrind, and making
sure that the patch shows no errors. I didn't actually find a Valgrind
issue from just using your regression tests (nor did I find an issue
from separately running the regression tests with
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: