Hello! I tried asking this over on the general listserv before realizing
pgsql-performance is probably better suited.

Hi all,

I recently started at a new firm and have been trying to help to grok
certain planner behavior. A strip-down example of the sort of join we do in
the database looks like this, wherein we join two tables that have about 1
million rows:

-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
    ci.conversation_uuid,
    ci.item_uuid,
    ci.tenant_id,
    it.item_subject,
    it.item_body_start
FROM
    conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

-- The necessary DDL that creates these tables and indexes is attached.
I've commented out some extra stuff that isn't directly related to the
above query.

Depending on config, we get different results in terms of performance
(EXPLAIN output attached):

PLAN A (default config, effective cache size just shy of 15GB): 3.829
seconds. A nested loop is used to probe the hash index
`conversation_item_item_hash_index` for each row of item_text. Although the
cost of probing once is low, a fair amount of time passes because the
operation is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254
seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially
and conversation_item is hashed before results are combined with a hash
join.

PLAN C: (random_page_cost = 8.0, instead of default 4, effective cache same
as before): 2.959 (~23% speedup, sometimes 38%). Same overall plan as PLAN
B, some differences in buffers and I/O. I'll note we had to get to 8.0
before we saw a change to planner behavior; 5.0, 6.0, and 7.0 were too low
to make a difference.

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge


Questions:

   1. In Plan A, what factors (read: which GUC settings) are causing the
   planner to select a substantially slower plan despite having recent stats
   about number of rows?
   2. Is there a substantial difference between the on-the-fly hash done in
   Plan B and Plan C compared to the hash-index used in Plan A? Can I assume
   they are essentially the same? Perhaps there are there differences in how
   they're applied?
   3. Is it common to see values for random_page_cost set as high as 8.0?
   We would of course need to investigate whether we see a net positive or net
   negative impact on other queries, to adopt this as a general setting, but
   is it a proposal we should actually consider?
   4. Maybe we are barking up the wrong tree with the previous questions.
   Are there other configuration parameters we should consider first to
   improve performance in situations like the one illustrated?
   5. Are there other problems with our schema, query, or plans shown here?
   Other approaches (or tools/analyses) we should consider?

Attached also are the results of

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

- Patrick
Hash Join  (cost=53933.69..255802.87 rows=1395542 width=216) (actual 
time=422.466..2868.948 rows=1394633 loops=1)"
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, 
it.item_body_start"
  Inner Unique: true"
  Hash Cond: (it.item_uuid = ci.item_uuid)"
  Buffers: shared hit=12283 read=95974 dirtied=77, temp read=44113 
written=44113"
  I/O Timings: shared/local read=462.195, temp read=86.413 write=423.363"
  WAL: records=1 bytes=58"
  ->  Seq Scan on item_text it  (cost=0.00..110641.18 rows=1401518 width=196) 
(actual time=0.012..854.842 rows=1394633 loops=1)"
        Output: it.item_subject, it.item_body_start, it.item_uuid"
        Buffers: shared hit=652 read=95974 dirtied=77"
        I/O Timings: shared/local read=462.195"
        WAL: records=1 bytes=58"
  ->  Hash  (cost=25586.42..25586.42 rows=1395542 width=36) (actual 
time=422.386..422.387 rows=1394633 loops=1)"
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id"
        Buckets: 131072  Batches: 16  Memory Usage: 6799kB"
        Buffers: shared hit=11631, temp written=8932"
        I/O Timings: temp write=78.785"
        ->  Seq Scan on conversation_item ci  (cost=0.00..25586.42 rows=1395542 
width=36) (actual time=0.004..133.304 rows=1394633 loops=1)"
              Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id"
              Buffers: shared hit=11631"
Settings: effective_cache_size = '16053152kB', jit = 'off', random_page_cost = 
'8'"
Query Identifier: 5005100605804837348"
Planning:"
  Buffers: shared hit=398 read=7"
  I/O Timings: shared/local read=1.400"
Planning Time: 2.379 ms"
Execution Time: 2956.264 ms"

Generated with:
        EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, 
SUMMARY)
        SELECT
                ci.conversation_uuid,
                ci.item_uuid,
                ci.tenant_id,
                it.item_subject,
                it.item_body_start
        FROM
                conversation_item AS ci
                INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
Hash Join  (cost=53126.96..251632.49 rows=1374665 width=216) (actual 
time=475.472..3162.922 rows=1374687 loops=1)"
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, 
it.item_body_start"
  Inner Unique: true"
  Hash Cond: (it.item_uuid = ci.item_uuid)"
  Buffers: shared hit=1376 read=105087 written=8220, temp read=43290 
written=43290"
  I/O Timings: shared/local read=319.560 write=38.895, temp read=92.529 
write=420.118"
  ->  Seq Scan on item_text it  (cost=0.00..108786.21 rows=1378021 width=196) 
(actual time=0.012..730.051 rows=1374687 loops=1)"
        Output: it.item_subject, it.item_body_start, it.item_uuid"
        Buffers: shared hit=1238 read=93768 written=5951"
        I/O Timings: shared/local read=289.139 write=29.014"
  ->  Hash  (cost=25203.65..25203.65 rows=1374665 width=36) (actual 
time=475.372..475.373 rows=1374687 loops=1)"
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id"
        Buckets: 131072  Batches: 16  Memory Usage: 6718kB"
        Buffers: shared hit=138 read=11319 written=2269, temp written=8804"
        I/O Timings: shared/local read=30.421 write=9.881, temp write=80.814"
        ->  Seq Scan on conversation_item ci  (cost=0.00..25203.65 rows=1374665 
width=36) (actual time=0.008..184.562 rows=1374687 loops=1)"
              Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id"
              Buffers: shared hit=138 read=11319 written=2269"
              I/O Timings: shared/local read=30.421 write=9.881"
Settings: effective_cache_size = '16053152kB', enable_indexscan = 'off', jit = 
'off'"
Query Identifier: 5005100605804837348"
Planning:"
  Buffers: shared hit=379 read=12"
  I/O Timings: shared/local read=0.810"
Planning Time: 1.724 ms"
Execution Time: 3254.486 ms"

Generated with:
        EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, 
SUMMARY)
        SELECT
                ci.conversation_uuid,
                ci.item_uuid,
                ci.tenant_id,
                it.item_subject,
                it.item_body_start
        FROM
                conversation_item AS ci
                INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
Nested Loop  (cost=0.00..194705.88 rows=1340602 width=213) (actual 
time=0.018..3739.905 rows=1339642 loops=1)"
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, 
it.item_body_start"
  Inner Unique: true"
  Buffers: shared hit=2681522 read=90343 dirtied=203"
  I/O Timings: shared/local read=233.152"
  WAL: records=203 fpi=203 bytes=428058"
  ->  Seq Scan on item_text it  (cost=0.00..105892.07 rows=1350507 width=193) 
(actual time=0.011..628.038 rows=1339642 loops=1)"
        Output: it.item_subject, it.item_body_start, it.item_uuid"
        Buffers: shared hit=2044 read=90343 dirtied=193"
        I/O Timings: shared/local read=233.152"
        WAL: records=193 fpi=193 bytes=380966"
  ->  Index Scan using conversation_item_item_hash_index on conversation_item 
ci  (cost=0.00..0.07 rows=1 width=36) (actual time=0.002..0.002 rows=1 
loops=1339642)"
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id"
        Index Cond: (ci.item_uuid = it.item_uuid)"
        Rows Removed by Index Recheck: 0"
        Buffers: shared hit=2679478 dirtied=10"
        WAL: records=10 fpi=10 bytes=47092"
Settings: effective_cache_size = '16053152kB', jit = 'off'"
Query Identifier: 5005100605804837348"
Planning:"
  Buffers: shared hit=12 read=4"
  I/O Timings: shared/local read=0.015"
Planning Time: 0.236 ms"
Execution Time: 3829.002 ms"

Generated with:
        EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, 
SUMMARY)
        SELECT
                ci.conversation_uuid,
                ci.item_uuid,
                ci.tenant_id,
                it.item_subject,
                it.item_body_start
        FROM
                conversation_item AS ci
                INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

Attachment: create_statements.postgre.sql
Description: application/sql

name,current_setting,source
archive_library,rds_archive,configuration file
archive_mode,on,configuration file
archive_timeout,5min,configuration file
autovacuum_analyze_scale_factor,0.05,configuration file
autovacuum_max_workers,3,configuration file
autovacuum_naptime,15s,configuration file
autovacuum_vacuum_cost_limit,600,configuration file
autovacuum_vacuum_scale_factor,0.1,configuration file
autovacuum_work_mem,1003322kB,configuration file
checkpoint_completion_target,0.9,configuration file
compute_query_id,auto,configuration file
default_toast_compression,lz4,configuration file
effective_cache_size,16053152kB,configuration file
fsync,on,configuration file
full_page_writes,on,configuration file
hot_standby,off,configuration file
hot_standby_feedback,on,configuration file
huge_pages,on,configuration file
idle_in_transaction_session_timeout,1d,configuration file
ignore_invalid_pages,off,configuration file
jit,off,configuration file
listen_addresses,*,command line
lo_compat_privileges,off,configuration file
log_autovacuum_min_duration,10s,configuration file
log_checkpoints,on,configuration file
log_destination,stderr,configuration file
log_file_mode,644,configuration file
log_line_prefix,%t:%r:%u@%d:[%p]:,configuration file
log_min_duration_statement,4s,configuration file
log_rotation_age,1h,configuration file
log_temp_files,0,configuration file
log_truncate_on_rotation,off,configuration file
logging_collector,on,configuration file
maintenance_work_mem,526332kB,configuration file
max_connections,3449,configuration file
max_locks_per_transaction,64,configuration file
max_parallel_workers,8,configuration file
max_prepared_transactions,0,configuration file
max_replication_slots,20,configuration file
max_stack_depth,6MB,configuration file
max_standby_archive_delay,5min,configuration file
max_standby_streaming_delay,5min,configuration file
max_wal_senders,20,configuration file
max_wal_size,2GB,configuration file
max_worker_processes,16,configuration file
min_wal_size,192MB,configuration file
pg_stat_statements.track,all,configuration file
rds.force_autovacuum_logging_level,warning,configuration file
rds.internal_databases,"rdsadmin,template0",configuration file
rds.logical_replication,off,configuration file
rds.rds_superuser_reserved_connections,2,configuration file
recovery_init_sync_method,syncfs,configuration file
recovery_prefetch,off,configuration file
remove_temp_files_after_crash,off,configuration file
shared_buffers,8026576kB,configuration file
ssl,on,configuration file
ssl_ca_file,/rdsdbdata/rds-metadata/ca-cert.pem,configuration file
ssl_cert_file,/rdsdbdata/rds-metadata/server-cert.pem,configuration file
ssl_key_file,/rdsdbdata/rds-metadata/server-key.pem,configuration file
superuser_reserved_connections,3,configuration file
synchronous_commit,on,configuration file
track_activity_query_size,4kB,configuration file
track_functions,pl,configuration file
track_io_timing,on,configuration file
track_wal_io_timing,on,configuration file
unix_socket_group,rdsdb,configuration file
update_process_title,on,configuration file
vacuum_cost_page_miss,5,configuration file
vacuum_failsafe_age,1200000000,configuration file
vacuum_multixact_failsafe_age,1200000000,configuration file
wal_compression,zstd,configuration file
wal_keep_size,2GB,configuration file
wal_level,replica,configuration file
wal_receiver_create_temp_slot,off,configuration file
wal_receiver_timeout,30s,configuration file
wal_sender_timeout,30s,configuration file

Reply via email to