Re: [HACKERS] [NOVICE] WHERE clause not used when index is used

Tom Lane Tue, 01 Mar 2016 12:04:38 -0800

I wrote:
> Petr Jelinek <[email protected]> writes:
>> I can only get the issue when the sort order of the individual keys does 
>> not correlate and the operator sorts according to the first column and 
>> there are duplicate values for the first column.


> Yeah, I think the combination of ASC and DESC columns may be necessary to
> break things.  It needs closer analysis.

Okay, I've now studied the patch more carefully and understood that it
wasn't quite so naive as to not consider multi-column cases at all.
It does work for simple scalar multi-column cases, because the test on
SK_BT_MATCHED is placed so that it only skips checking the first column's
condition, not the conditions on lower-order columns.  Rather, the problem
is failure to consider ROW() comparisons, which do not necessarily have
the same simple behavior as scalar comparisons.  They sort of accidentally
work, *if* the lower-order columns in the ROW() comparison match the
index's lower-order columns as to position and index direction.  Tobias'
example fails because the second column in the ROW() isn't sorted the
same, and you could also make it fail with a 3-or-more-column index in
which the ROW() comparison tested, say, just the first and third columns.

What I think we should do is move the SK_BT_MATCHED flag down to the
first sub-key of the ROW() comparison, which is the one that does behave
the same as in scalar cases.

The fact that it doesn't fail for the case where the lower-order columns
of the ROW() do match the index shows that this still leaves something on
the table: within a ROW() comparison, you could set SK_BT_MATCHED on more
than just the first sub-key, *if* the subsequent columns match the index.
This makes me think that the analysis should be getting done in or near
_bt_mark_scankey_required, not in the rather random place it is now,
because _bt_mark_scankey_required already understands about sub-keys
matching the index or not.


However ... there is an even larger problem, which is that the patch
thinks it can use skey->sk_flags for what's effectively transient state
storage --- not merely transient, but scan-direction-dependent.  This
means that a cursor that reads forwards for awhile and then turns around
and reads backwards will get the wrong answer, even without anything as
complicated as multiple key columns.  It's a bit harder to exhibit such a
bug than you might guess, because of btree's habit of prefetching an index
page at a time; you have to make the scan cross an index page boundary
before you turn around and back it up.  Bearing that in mind, I was soon
able to devise a failure case in the regression database:

regression=# set enable_seqscan TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# begin;
BEGIN
regression=# declare c cursor for select unique1,unique2 from tenk1 where 
unique1 > 9500;
DECLARE CURSOR
regression=# fetch 20 from c;
 unique1 | unique2 
---------+---------
    9501 |    5916
    9502 |    1812
    9503 |    1144
    9504 |    9202
    9505 |    4300
    9506 |    5551
    9507 |     847
    9508 |    4093
    9509 |    9418
    9510 |    1862
    9511 |     848
    9512 |    4823
    9513 |    1125
    9514 |    9276
    9515 |    1160
    9516 |    5177
    9517 |    3600
    9518 |    9677
    9519 |    5518
    9520 |    1429
(20 rows)

regression=# fetch backward 30 from c;
 unique1 | unique2 
---------+---------
    9519 |    5518
    9518 |    9677
    9517 |    3600
    9516 |    5177
    9515 |    1160
    9514 |    9276
    9513 |    1125
    9512 |    4823
    9511 |     848
    9510 |    1862
    9509 |    9418
    9508 |    4093
    9507 |     847
    9506 |    5551
    9505 |    4300
    9504 |    9202
    9503 |    1144
    9502 |    1812
    9501 |    5916
    9500 |    3676
    9499 |     927
    9498 |    2807
    9497 |    6558
    9496 |    1857
    9495 |    1974
    9494 |     560
    9493 |    3531
    9492 |    9875
    9491 |    7237
    9490 |    8928
(30 rows)

Ooops.

I believe the way to fix this would be to stop regarding SK_BT_MATCHED
as state, and instead treat it as a scankey property identified during
_bt_preprocess_keys, analogously to SK_BT_REQFWD/SK_BT_REQBKWD --- and,
like those, you'd need two flags not one since the properties will be
determined independently of knowing which direction you'll be going in.

In any event, I am now of the opinion that this patch needs to be reverted
outright and returned to the authors for redesign.  There are too many
things wrong with it and too little reason to think that we have to have
it in 9.5.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [NOVICE] WHERE clause not used when index is used

Reply via email to