Table definition:

workflow_db=> \d workflow_execution_test
                           Table "public.workflow_execution_test"
              Column              |           Type           | Collation | Nullable | Default
----------------------------------+--------------------------+-----------+----------+---------
 execution_id                     | bigint                   |           |          |
 state_machine_id                 | bigint                   |           |          |
 workflow_id                      | bigint                   |           |          |
 started_datetime                 | timestamp with time zone |           |          |
 completed_datetime               | timestamp with time zone |           |          |
 status                           | character varying(50)    |           |          |
 execution_context_s3_arn         | character varying(200)   |           |          |
 ol_version                       | integer                  |           |          |
 created_datetime                 | timestamp with time zone |           |          |
 updated_datetime                 | timestamp with time zone |           |          |
 deleted_millis                   | bigint                   |           |          |
 acquisition_channel_id           | bigint                   |           |          |
 correlation_id                   | character varying(36)    |           |          |
 result                           | character varying(50)    |           |          |
 state_machine_execution_arn      | character varying(200)   |           |          |
 created_by_id                    | bigint                   |           |          |
 updated_by_id                    | bigint                   |           |          |
 acquired_gcs_s3_object           | text                     |           |          |
 sqs_trigger_id                   | bigint                   |           |          |
 trigger_message                  | text                     |           |          |
 acquired_gcs_s3_object_uuid      | character varying(36)    |           |          |
 api_trigger_id                   | bigint                   |           |          |
 scheduled_trigger_id             | bigint                   |           |          |
 notification_trigger_workflow_id | bigint                   |           |          |
 acquired_object_name             | text                     |           |          |
 subscription_guid                | uuid                     |           |          |
 processing_class_code            | character varying(50)    |           |          |
 root_execution_guid              | uuid                     |           |          |
Indexes:
    "test_workflow_execution_active_pending_ordered_by_created_date_" btree (workflow_id, status, deleted_millis, created_datetime) WHERE (status::text = ANY (ARRAY['ACTION_NEEDED'::character varying, 'NOT_STARTED'::character varying, 'PAUSED'::character varying, 'PENDING'::character varying, 'RUNNING'::character varying]::text[])) AND deleted_millis <= 0
    "test_workflow_execution_initial_ui_tabs" btree (workflow_id, status, result, completed_datetime DESC NULLS LAST)

I created and populated this table with 22 million rows, and only afterwards created the two indexes, so both indexes are packed tightly (no bloat from incremental inserts).
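For reference, the two indexes could be recreated with DDL roughly like the following (reconstructed from the \d output above; the partial index's IN list is equivalent to the ANY(ARRAY[...]) predicate psql displays):

```sql
-- Partial index: only stores rows in an "active" status that are not soft-deleted
CREATE INDEX test_workflow_execution_active_pending_ordered_by_created_date_
    ON workflow_execution_test (workflow_id, status, deleted_millis, created_datetime)
    WHERE status IN ('ACTION_NEEDED', 'NOT_STARTED', 'PAUSED', 'PENDING', 'RUNNING')
      AND deleted_millis <= 0;

-- Full (non-partial) index covering the initial UI-tabs sort order
CREATE INDEX test_workflow_execution_initial_ui_tabs
    ON workflow_execution_test (workflow_id, status, result, completed_datetime DESC NULLS LAST);
```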

As-is, the optimizer chooses the partial index. The plan below shows it did 33 logical reads (shared buffer hits). This index should be a very good fit for this query: its three leading columns match the three criteria in the WHERE clause, and the partial predicate means the index only stores rows that already satisfy the status and deleted_millis filters.

explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;
                                                                                
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_workflow_execution_active_pending_ordered_by_created_date_ on workflow_execution_test  (cost=0.43..15824.82 rows=4335 width=1309) (actual time=0.040..0.095 rows=56 loops=1)
   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Buffers: shared hit=33
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.321 ms
 Execution Time: 0.117 ms

If I hint the query (via the pg_hint_plan extension) to use the other index, it does less work: 24 logical reads vs. 33 with the partial index.

/*+ IndexScan(workflow_execution_test test_workflow_execution_initial_ui_tabs) 
*/
explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;
                                                                                
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_workflow_execution_initial_ui_tabs on workflow_execution_test  (cost=0.56..15820.19 rows=4335 width=1309) (actual time=0.049..0.106 rows=56 loops=1)
   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Filter: (deleted_millis <= 0)
   Buffers: shared hit=24
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.373 ms
 Execution Time: 0.129 ms

It would make sense to me if the query did the same amount of work with either index, but I don't understand why the partial index does more work. Could anyone explain this behavior?
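One way to dig into the extra reads would be to compare the physical size and tree depth of the two indexes. A diagnostic sketch (pgstatindex requires the pgstattuple contrib extension; the index names are as shown above):

```sql
-- Compare on-disk size of the two indexes
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('test_workflow_execution_active_pending_ordered_by_created_date_',
                  'test_workflow_execution_initial_ui_tabs');

-- B-tree depth and page counts (requires: CREATE EXTENSION pgstattuple;)
SELECT tree_level, internal_pages, leaf_pages
FROM pgstatindex('test_workflow_execution_active_pending_ordered_by_created_date_');
```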

One other thing to note: when using the partial index the cost is 0.43..15824.82, and with the other index it is 0.56..15820.19. So the startup cost (which I believe is the cost to return the first row) is slightly lower for the partial index, but the total cost (which I believe covers returning all rows) is higher. Since there is no LIMIT clause, why wouldn't the optimizer pick the plan with the lowest total cost for all rows, which in this case would be the non-partial index?

Thanks