On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan <p...@heroku.com> wrote:
> My new understanding: The extra "included" columns are stored in the
> index, but do not affect its sort order at all. They are no more part
> of the key than, say, the heap TID that the key points to. They are
> just "payload".

Noticed a few issues following another pass:

* tuplesort.c should handle the CLUSTER case in the same way as the
btree case. No?

* Why have a RelationGetNumberOfAttributes(indexRel) call in
tuplesort_begin_index_btree() at all now?

* This critical section is unnecessary, because this happens during
index builds:

+               if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+               {
+                       /*
+                        * It's essential to truncate High key here.
+                        * The purpose is not just to save more space
on this particular page,
+                        * but to keep whole b-tree structure
consistent. Subsequent insertions
+                        * assume that hikey is already truncated, and
so they should not
+                        * worry about it, when copying the high key
into the parent page
+                        * as a downlink.
+                        * NOTE It is not crutial for reliability in present,
+                        * but maybe it will be that in the future.
+                        * NOTE this code will be changed by the
"btree compression" patch,
+                        * which is in progress now.
+                        */
+                       keytup = index_reform_tuple(wstate->index, oitup,
                 indnatts, indnkeyatts);
+                       /*  delete "wrong" high key, insert keytup as
+                       START_CRIT_SECTION();
+                       PageIndexTupleDelete(opage, P_HIKEY);
+                       if (!_bt_pgaddtup(opage,
IndexTupleSize(keytup), keytup, P_HIKEY))
+                               elog(ERROR, "failed to rewrite
compressed item in index \"%s\"",
+                                       RelationGetRelationName(wstate->index));
+                       END_CRIT_SECTION();
+               }

Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
isn't useful here, because we have no buffer lock held, and nothing
must be WAL-logged.

* Think you forgot to update spghandler(). (You did not add a test for
just that one AM, either)

* I wonder why this restriction needs to exist:

+               else
+                       elog(ERROR, "Expressions are not supported in
included columns.");

What does not supporting it buy us? Was it just that the pg_index
representation is more complicated, and you wanted to put it off?

An error like this should use ereport(ERROR,

* I would like to see index_reform_tuple() assert that the new,
truncated index tuple is definitely <= the original (I worry about the
1/3 page restriction issue). Maybe you should also change the name of
index_reform_tuple(), per David.

* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.

* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.

* Valgrind shows an error with an aggregate statement I tried:

2016-04-05 17:01:31.129 PDT 12310 LOG:  statement: explain analyze
select count(*) from ab  where b > 5 group by a, b;
==12310== Invalid read of size 4
==12310==    at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310==    by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310==    by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310==    by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310==    by 0x658054: create_index_paths (indxpath.c:269)
==12310==    by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310==    by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310==    by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310==    by 0x64D93B: make_one_rel (allpaths.c:170)
==12310==    by 0x66876C: query_planner (planmain.c:246)
==12310==    by 0x669FBA: grouping_planner (planner.c:1666)
==12310==    by 0x66D0C9: subquery_planner (planner.c:751)
==12310==    by 0x66D3DA: standard_planner (planner.c:300)
==12310==    by 0x66D714: planner (planner.c:170)
==12310==    by 0x6FD692: pg_plan_query (postgres.c:798)
==12310==    by 0x59082D: ExplainOneQuery (explain.c:350)
==12310==  Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
==12310==    at 0x4C2AB80: malloc (in
==12310==    by 0x81B7FA: AllocSetAlloc (aset.c:853)
==12310==    by 0x81D257: palloc (mcxt.c:907)
==12310==    by 0x4B6F65: RelationGetIndexScan (genam.c:94)
==12310==    by 0x4C135D: btbeginscan (nbtree.c:431)
==12310==    by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
==12310==    by 0x4B7C5A: index_beginscan (indexam.c:222)
==12310==    by 0x4B73D1: systable_beginscan (genam.c:379)
==12310==    by 0x7E8CF9: ScanPgRelation (relcache.c:341)
==12310==    by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
==12310==    by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
==12310==    by 0x4A4D37: relation_open (heapam.c:1118)

Separately, I tried "make installcheck-tests TESTS=index_including"
from Postgres + Valgrind, with Valgrind's --track-origins option
enabled (as it was above). I recommend installing Valgrind, and making
sure that the patch shows no errors. I didn't actually find a Valgrind
issue from just using your regression tests (nor did I find an issue
from separately running the regression tests with

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to