Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index

2023-05-03 Thread Jeff Janes
On Wed, May 3, 2023 at 2:00 PM Dirschel, Steve <
steve.dirsc...@thomsonreuters.com> wrote:

> Thanks for the reply Jeff.  Yes-  more of an academic question.  Regarding
> this part:
>
>
>
>Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY
> ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
>
>Filter: (deleted_millis <= 0)
>
>Buffers: shared hit=24
>
>
>
> For this usage, the =ANY is applied as an "in-index filter".  It only
> descends the index once, to where workflow_id=1070, and then scans forward
> applying the =ANY to each index-tuple until it exhausts the =1070
> condition.  As long as all the =1070 entries fit into just a few buffers,
> the count of buffers accessed by doing this is fewer than doing the
> re-descents.  (Stepping from tuple to tuple in the same index page doesn't
> count as a new access.  While a re-descent releases and reacquires the
> buffer)
>
>
>
> There are 2,981,425 rows where workflow_id = 1070.  Does that change your
> theory of using an “in-index filter” for that plan?
>

Yes.  There is no way that that many index tuples will fit in just 24 index
leaf pages, so I think it must be using the re-descending method for both
plans.  Then I don't know why there is a difference in the number of
buffer accesses.  But the difference seems trivial, so I wouldn't put much
effort into investigating it.



> When you say there was a bit of speculation on the “boundard condition” vs
> “in-index filter” is the speculation on if Postgres has 2 different ways of
> processing a =ANY filter or is the speculation that one is being used by
> one plan and the other is being used by the other plan?
>

The speculation was that this applied to your query.  But going back to
some of my original tests, I see that I remembered some of the details
wrong on the broader topic as well.  When it voluntarily doesn't use the
=ANY as a boundary condition, that shows up in the plan as having the
condition evicted from "Index Cond" line and instead show up in a "Filter"
line, at least in my one test case (which means it is no longer an in-index
filter, as it jumps to the table and verifies visibility before applying
the filter).  So the thing that the plans really don't distinguish is
between when it just chooses not to use the extra index column for cost
reasons, from when it thinks it is unable to use it for correctness/safety
reasons.

Cheers,

Jeff


Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index

2023-05-03 Thread Tom Lane
"Dirschel, Steve"  writes:
> There are 2,981,425 rows where workflow_id = 1070.  Does that change your 
> theory of using an “in-index filter” for that plan?  When you say there was a 
> bit of speculation on the “boundard condition” vs “in-index filter” is the 
> speculation on if Postgres has 2 different ways of processing a =ANY filter 
> or is the speculation that one is being used by one plan and the other is 
> being used by the other plan?

I don't believe the intelligence Jeff is postulating actually exists.
I see only one code path for this in nbtree.c, and what it's doing is
what he called the "boundary condition" implementation.  That is, it runs
one index search for "workflow_id = 1070 AND status = 'NOT_STARTED'",
then one for "workflow_id = 1070 AND status = 'PAUSED'", etc,
re-descending the index tree at the start of each of those four scans.

I'm not inclined to ascribe any great significance to the varying numbers
of buffer hits you observed.  I think that's likely explained by chance
layout of the two indexes' contents, so that some of these searches cross
different numbers of index pages even though they visit the same number of
index entries overall.  In particular, it doesn't look like the partial
index is buying anything for this specific test case.  The index's
constraint on "status" matters not at all, because in neither index will
we ever visit any regions of the index where other values of "status"
appear (save the one following entry where we detect that the status value
no longer matches in each scan; but it doesn't really matter what that
entry is).  The constraint on "deleted_millis" could help, but your second
EXPLAIN result shows that it didn't actually eliminate any rows:

>> Index Scan using test_workflow_execution_initial_ui_tabs on 
>> workflow_execution_test  (cost=0.56..15820.19 rows=4335 width=1309) (actual 
>> time=0.049..0.106 rows=56 loops=1)
>>Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY 
>> ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
>>Filter: (deleted_millis <= 0)

(note the lack of any "Rows Removed by Filter" line).  We can therefore
conclude that the index regions satisfying the workflow_id+status
conditions had no entries with deleted_millis <= 0 either.  So these
two test cases visited exactly the same number of index entries, and
any differences in "buffers hit" had to be chance layout effects, or
possibly the result of different index entry sizes.  How large is
that "result" column in reality?

regards, tom lane




RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index

2023-05-03 Thread Dirschel, Steve
Thanks for the reply Jeff.  Yes-  more of an academic question.  Regarding this 
part:

   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY 
('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Filter: (deleted_millis <= 0)
   Buffers: shared hit=24

For this usage, the =ANY is applied as an "in-index filter".  It only descends 
the index once, to where workflow_id=1070, and then scans forward applying the 
=ANY to each index-tuple until it exhausts the =1070 condition.  As long as all 
the =1070 entries fit into just a few buffers, the count of buffers accessed by 
doing this is fewer than doing the re-descents.  (Stepping from tuple to tuple 
in the same index page doesn't count as a new access.  While a re-descent 
releases and reacquires the buffer)

There are 2,981,425 rows where workflow_id = 1070.  Does that change your 
theory of using an “in-index filter” for that plan?  When you say there was a 
bit of speculation on the “boundard condition” vs “in-index filter” is the 
speculation on if Postgres has 2 different ways of processing a =ANY filter or 
is the speculation that one is being used by one plan and the other is being 
used by the other plan?

Thanks again for your reply.  It is helpful.
Steve