Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
On Wed, May 3, 2023 at 2:00 PM Dirschel, Steve < steve.dirsc...@thomsonreuters.com> wrote: > Thanks for the reply Jeff. Yes- more of an academic question. Regarding > this part: > > > >Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY > ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[]))) > >Filter: (deleted_millis <= 0) > >Buffers: shared hit=24 > > > > For this usage, the =ANY is applied as an "in-index filter". It only > descends the index once, to where workflow_id=1070, and then scans forward > applying the =ANY to each index-tuple until it exhausts the =1070 > condition. As long as all the =1070 entries fit into just a few buffers, > the count of buffers accessed by doing this is fewer than doing the > re-descents. (Stepping from tuple to tuple in the same index page doesn't > count as a new access. While a re-descent releases and reacquires the > buffer) > > > > There are 2,981,425 rows where workflow_id = 1070. Does that change your > theory of using an “in-index filter” for that plan? > Yes. There is no way that that many index tuples will fit in just 24 index leaf pages, so I think it must be using the re-descending method for both plans. Then I don't know why there is a difference in the number of buffer accesses. But the difference seems trivial, so I wouldn't put much effort into investigating it. > When you say there was a bit of speculation on the “boundard condition” vs > “in-index filter” is the speculation on if Postgres has 2 different ways of > processing a =ANY filter or is the speculation that one is being used by > one plan and the other is being used by the other plan? > The speculation was that this applied to your query. But going back to some of my original tests, I see that I remembered some of the details wrong on the broader topic as well. When it voluntarily doesn't use the =ANY as a boundary condition, that shows up in the plan as having the condition evicted from "Index Cond" line and instead show up in a "Filter" line, at least in my one test case (which means it is no longer an in-index filter, as it jumps to the table and verifies visibility before applying the filter). So the thing that the plans really don't distinguish is between when it just chooses not to use the extra index column for cost reasons, from when it thinks it is unable to use it for correctness/safety reasons. Cheers, Jeff
Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
"Dirschel, Steve" writes: > There are 2,981,425 rows where workflow_id = 1070. Does that change your > theory of using an “in-index filter” for that plan? When you say there was a > bit of speculation on the “boundard condition” vs “in-index filter” is the > speculation on if Postgres has 2 different ways of processing a =ANY filter > or is the speculation that one is being used by one plan and the other is > being used by the other plan? I don't believe the intelligence Jeff is postulating actually exists. I see only one code path for this in nbtree.c, and what it's doing is what he called the "boundary condition" implementation. That is, it runs one index search for "workflow_id = 1070 AND status = 'NOT_STARTED'", then one for "workflow_id = 1070 AND status = 'PAUSED'", etc, re-descending the index tree at the start of each of those four scans. I'm not inclined to ascribe any great significance to the varying numbers of buffer hits you observed. I think that's likely explained by chance layout of the two indexes' contents, so that some of these searches cross different numbers of index pages even though they visit the same number of index entries overall. In particular, it doesn't look like the partial index is buying anything for this specific test case. The index's constraint on "status" matters not at all, because in neither index will we ever visit any regions of the index where other values of "status" appear (save the one following entry where we detect that the status value no longer matches in each scan; but it doesn't really matter what that entry is). The constraint on "deleted_millis" could help, but your second EXPLAIN result shows that it didn't actually eliminate any rows: >> Index Scan using test_workflow_execution_initial_ui_tabs on >> workflow_execution_test (cost=0.56..15820.19 rows=4335 width=1309) (actual >> time=0.049..0.106 rows=56 loops=1) >>Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY >> ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[]))) >>Filter: (deleted_millis <= 0) (note the lack of any "Rows Removed by Filter" line). We can therefore conclude that the index regions satisfying the workflow_id+status conditions had no entries with deleted_millis <= 0 either. So these two test cases visited exactly the same number of index entries, and any differences in "buffers hit" had to be chance layout effects, or possibly the result of different index entry sizes. How large is that "result" column in reality? regards, tom lane
RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
Thanks for the reply Jeff. Yes- more of an academic question. Regarding this part: Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[]))) Filter: (deleted_millis <= 0) Buffers: shared hit=24 For this usage, the =ANY is applied as an "in-index filter". It only descends the index once, to where workflow_id=1070, and then scans forward applying the =ANY to each index-tuple until it exhausts the =1070 condition. As long as all the =1070 entries fit into just a few buffers, the count of buffers accessed by doing this is fewer than doing the re-descents. (Stepping from tuple to tuple in the same index page doesn't count as a new access. While a re-descent releases and reacquires the buffer) There are 2,981,425 rows where workflow_id = 1070. Does that change your theory of using an “in-index filter” for that plan? When you say there was a bit of speculation on the “boundard condition” vs “in-index filter” is the speculation on if Postgres has 2 different ways of processing a =ANY filter or is the speculation that one is being used by one plan and the other is being used by the other plan? Thanks again for your reply. It is helpful. Steve