Re: [PERFORM] Index Scan Costs versus Sort
Following up with some additional information. The machine has 1GB physical RAM. When I run the query (with sort and seqscan enabled), top reports (numbers are fairly consistent):

Mem:  1,032,972k total, 1,019,516k used, 13,412k free, 17,132k buffers
Swap: 2,032,140k total, 17,592k used, 2,014,548k free, 742,636k cached

The postmaster process is using 34.7% of RAM - 359m virt, 349m res, 319m shr. No other process is using more than 2% of the memory. From vmstat:

r  b  swpd   free   buff   cache
1  0  17592  13568  17056  743676

vmstat also shows no swapping going on.

Note that I have part of the database, for just Colorado, on my Windows XP laptop (table size for the completechain table in this case is 1GB versus 18GB for the whole US) for development purposes. I see the same behavior on it, which is a Dell D6100 laptop with 1GB, running 8.1, and a default postgresql.conf file with three changes (shared_buffers set to 7000, work_mem set to 8192, and effective_cache_size 2500).

Out of curiosity, how much longer would an index scan be expected to take versus a seq scan? I was under the impression it would be about a factor of 4, or is that not usually the case?

Thanks for the help,

Charlie
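For what it's worth, the factor-of-4 assumption corresponds to the random_page_cost setting. A minimal sketch of checking and overriding the relevant planner parameters per session (standard GUC names; the value 2 is just illustrative):

SHOW random_page_cost;          -- defaults to 4
SHOW effective_cache_size;
-- try a lower value for this session only, then re-run EXPLAIN ANALYZE
SET random_page_cost = 2;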
Re: [PERFORM] Index Scan Costs versus Sort
Hi Tom,

From pg_stats:

schema = "tiger"; tablename = "completechain"; attname = "tlid"; null_frac = 0; avg_width = 4; n_distinct = -1; most_common_vals = ; most_common_freqs = ; correlation = 0.155914;

Note that I have default_statistics_target set to 100. Here are the first few values from histogram_bounds:

"{102450,2202250,4571797,6365754,8444936,10541593,12485818,14545727,16745594,18421868,20300549,22498643,24114709,26301001,28280632,30370123,32253657,33943046,35898115,37499478,39469054,41868498,43992143,45907830,47826340,49843926,52051798,54409298,56447416,

The tlid column is a US Census Bureau ID assigned to each chain in the US - where a chain is a road segment, river segment, railroad segment, etc. The data is loaded on a state-by-state basis, and then a county-by-county basis. There is no overall ordering to TLIDs, although perhaps there is some local ordering at the county level (but from a quick look at the data I don't see any, and the correlation factor indicates there isn't any if I am interpreting it correctly).

Any other info that would be helpful to see?

Charlie

Tom Lane wrote:
Charlie Savage <[EMAIL PROTECTED]> writes:
1. Postgresql estimates the index scan will be 50 times more costly than the seq scan (112870376 vs 2229858) yet in fact it only takes 3 times longer to execute (2312426 ms vs. 768403 ms). My understanding is that postgresql assumes, via the random_page_cost parameter, that an index scan will take 4 times longer than a sequential scan. So why is the analyzer estimating it is 50 times slower?

The other factors that are likely to affect this are index correlation and effective cache size. It's fairly remarkable that a full-table index scan only takes 3 times longer than a seqscan; you must have both a high correlation and a reasonably large cache. You showed us your effective_cache_size setting, but what does the pg_stats entry for completechain.tlid contain? Can you quantify what the physical ordering of tlid values is likely to be?

regards, tom lane
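For anyone following along, the figures above come straight from the pg_stats view and can be pulled with a query along these lines (standard view and column names):

SELECT null_frac, avg_width, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'tiger'
  AND tablename  = 'completechain'
  AND attname    = 'tlid';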
[PERFORM] Index Scan Costs versus Sort
This is related to my post the other day about sort performance. Part of my problem seems to be that postgresql is greatly overestimating the cost of index scans. As a result, it prefers query plans that involve seq scans and sorts over query plans that use index scans.

Here is an example query:

SELECT tlid, count(tlid)
FROM completechain
GROUP BY tlid;

Approach #1 - seq scan with sort:

"GroupAggregate  (cost=10177594.61..11141577.89 rows=48199164 width=4) (actual time=7439085.877..8429628.234 rows=47599910 loops=1)"
"  ->  Sort  (cost=10177594.61..10298092.52 rows=48199164 width=4) (actual time=7439085.835..8082452.721 rows=48199165 loops=1)"
"        Sort Key: tlid"
"        ->  Seq Scan on completechain  (cost=0.00..2229858.64 rows=48199164 width=4) (actual time=10.788..768403.874 rows=48199165 loops=1)"
"Total runtime: 8596987.505 ms"

Approach #2 - index scan (done by setting enable_seqscan to false and enable_sort to false; the exact statements are spelled out after the config excerpt below):

"GroupAggregate  (cost=0.00..113713861.43 rows=48199164 width=4) (actual time=53.211..2652227.201 rows=47599910 loops=1)"
"  ->  Index Scan using idx_completechain_tlid on completechain  (cost=0.00..112870376.06 rows=48199164 width=4) (actual time=53.168..2312426.321 rows=48199165 loops=1)"
"Total runtime: 2795420.933 ms"

Approach #1 is estimated to be 10 times less costly, yet takes 3 times longer to execute.

My questions:

1. Postgresql estimates the index scan will be 50 times more costly than the seq scan (112870376 vs 2229858) yet in fact it only takes 3 times longer to execute (2312426 ms vs. 768403 ms). My understanding is that postgresql assumes, via the random_page_cost parameter, that an index scan will take 4 times longer than a sequential scan. So why is the analyzer estimating it is 50 times slower?

2. In approach #1, the planner thinks the sort will take roughly 4 times longer [(10,298,092 - 2,229,858) / 2,229,858] than the sequential scan. Yet it really takes almost ten times longer. It seems as if the planner is greatly underestimating the sort cost.

Due to these two apparent miscalculations, postgresql is choosing the wrong query plan to execute this query. I've attached my postgresql.conf file below just in case this is due to some misconfiguration on my part.

Some setup notes:

* All tables are vacuumed and analyzed
* Running Postgresql 8.1 on Suse 10
* Low end hardware - Dell Dimension 3000, 1GB RAM, 1 built-in 80 GB IDE drive, 1 SATA Seagate 400GB drive. The IDE drive has the OS and the WAL files, the SATA drive the database. From hdparm the max IO for the IDE drive is about 50Mb/s and the SATA drive is about 65Mb/s.

---

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

shared_buffers = 40000          # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
#shared_buffers = 1000          # min 16 or max_connections*2, 8KB each

temp_buffers = 5000
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 16384                # in KB
#work_mem = 1024                # min 64, size in KB
maintenance_work_mem = 262144   # in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 600000
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200           # 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0     # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5      # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5      # 0-1000 buffers max written/round

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = on                      # turns forced synchronization on or off
#wal_sync_method = fsync        # the default is the first option
                                # supported by the operating system:
                                #   open_datasync
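For reference, approach #2 above was produced with session-level planner toggles, roughly:

SET enable_seqscan = off;
SET enable_sort = off;
EXPLAIN ANALYZE
SELECT tlid, count(tlid) FROM completechain GROUP BY tlid;
RESET enable_seqscan;   -- restore the defaults afterwards
RESET enable_sort;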
Re: [PERFORM] Sort performance on large tables
Hi Simon,

Thanks for the response.

> PostgreSQL can do HashAggregates as well as GroupAggregates, just like Oracle. HashAggs avoid the sort phase, so would improve performance considerably. The difference in performance you are getting is because of the different plan used. Did you specifically do anything to Oracle to help it get that plan, or was it a pure out-of-the-box install (or maybe even a "set this up for Data Warehousing" install)?

It was an out-of-the-box plan with the standard database install option (it wasn't a Data Warehousing install).

> Can you let us know how high you have to set work_mem before an EXPLAIN (not EXPLAIN ANALYZE) chooses the HashAgg plan?

The planner picked a HashAggregate only when I set work_mem to 2097151 - which I gather is the maximum allowed value, according to a message returned from the server.

> Please be aware that publishing Oracle performance results is against the terms of their licence and we seek to be both fair and legitimate, especially within this public discussion forum.

Sorry, I didn't realize - I'll be more vague next time.

Charlie
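In other words, the plan only switched with something along these lines (2097151 KB being the per-session ceiling mentioned above):

SET work_mem = 2097151;         -- maximum allowed value, in KB
EXPLAIN
SELECT tlid, min(ogc_fid) FROM completechain GROUP BY tlid;
-- the top node should now read HashAggregate rather than GroupAggregate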
Re: [PERFORM] Sort performance on large tables
Very interesting technique. It doesn't actually do quite what I want, since it returns all rows that do not have duplicates rather than a complete list of unique tlid values, but I could massage it to do what I want. Anyway, the timing:

"Seq Scan on completechain t1  (cost=0.00..218139733.60 rows=24099582 width=4) (actual time=25.890..3404650.452 rows=47000655 loops=1)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Index Scan using idx_completechain_tlid on completechain t2  (cost=0.00..4.48 rows=1 width=0) (actual time=0.059..0.059 rows=0 loops=48199165)"
"          Index Cond: ($0 = tlid)"
"          Filter: ($1 <> ogc_fid)"
"Total runtime: 3551423.162 ms"

So a 60% reduction in time. Thanks again for the tip.

Charlie

Marc Morin wrote:
I have run into this type of query problem as well. I solved it in my application by the following type of query (see the sketch after the quoted message below). Assumes of course that you have an index on tlid.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charlie Savage
Sent: Tuesday, November 08, 2005 2:05 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Sort performance on large tables

Hi everyone,

I have a question about the performance of sort.

Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1 RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate 400GB drive. The IDE drive has the OS and the WAL files, the SATA drive the database. From hdparm the max IO for the IDE drive is about 50Mb/s and the SATA drive is about 65Mb/s. Thus a very low-end machine - but it is used just for development (i.e., it is not a production machine) and the only thing it does is run a PostgreSQL database.

I have a staging table called completechain that holds US TIGER data (i.e., streets and addresses for the US). The table is approximately 18GB. It's big because there is a lot of data, but also because the table is not normalized (it comes that way).

I want to extract data out of the file, with the most important values being stored in a column called tlid. The tlid field is an integer, and the values are 98% unique. There is a second column called ogc_fid which is unique (it is a serial field). I need to extract out unique TLIDs (it doesn't matter which duplicate I get rid of). To do this I am running this query:

SELECT tlid, min(ogc_fid)
FROM completechain
GROUP BY tlid;

The results from explain analyze are:

"GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704 width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
"  ->  Sort  (cost=10400373.80..10520553.06 rows=48071704 width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
"        Sort Key: tlid"
"        ->  Seq Scan on completechain  (cost=0.00..2228584.04 rows=48071704 width=8) (actual time=27.514..773245.046 rows=48199165 loops=1)"
"Total runtime: 8486057.185 ms"

Doing a similar query produces the same results:

SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;

Note it takes over 10 times longer to do the sort than the full sequential scan.

Should I expect results like this? I realize that the computer is quite low-end and is very IO bound for this query, but I'm still surprised that the sort operation takes so long.

Out of curiosity, I set up an Oracle database on the same machine with the same data and ran the same query. Oracle was over an order of magnitude faster. Looking at its query plan, it avoided the sort by using "HASH GROUP BY." Does such a construct exist in PostgreSQL (I see only hash joins)?
Also as an experiment I forced Oracle to do a sort by running this query:

SELECT tlid, min(ogc_fid)
FROM completechain
GROUP BY tlid
ORDER BY tlid;

Even with this, it was more than an order of magnitude faster than Postgresql. Which makes me think I have somehow misconfigured postgresql (see the relevant parts of postgresql.conf below).

Any idea/help appreciated.

Thanks,

Charlie

---

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

shared_buffers = 40000          # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
#shared_buffers = 1000          # min 16 or max_connections*2, 8KB each

temp_buffers = 5000
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).

work_mem = 16384                # in KB
#work_mem = 1024                # min 64, size in KB
maintenance_work_mem = 262144
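Marc's query itself isn't reproduced above, but judging from the plan Charlie posted (a seq scan with a NOT (subplan) filter probing idx_completechain_tlid on tlid and comparing ogc_fid), it was presumably an anti-join of roughly this form - a guess at the shape, not his exact SQL:

-- keep a row only if no other row shares its tlid
SELECT t1.tlid, t1.ogc_fid
FROM completechain t1
WHERE NOT EXISTS (
    SELECT 1
    FROM completechain t2
    WHERE t2.tlid = t1.tlid
      AND t2.ogc_fid <> t1.ogc_fid
);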
Re: [PERFORM] Sort performance on large tables
It's an int4.

Charlie

Tom Lane wrote:
Charlie Savage <[EMAIL PROTECTED]> writes:
Thus the time decreased from 8486 seconds to 5279 seconds - which is a nice improvement. However, that still leaves postgresql about 9 times slower.

BTW, what data type are you sorting, exactly? If it's a string type, what is your LC_COLLATE setting?

regards, tom lane
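For completeness, both answers can be checked directly from a psql session (standard catalog columns and GUC; the tiger schema qualification is an assumption carried over from the pg_stats output in the other thread):

-- column type: returns "integer" for an int4 column
SELECT atttypid::regtype
FROM pg_attribute
WHERE attrelid = 'tiger.completechain'::regclass
  AND attname = 'tlid';

-- collation used for string sorts
SHOW lc_collate;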
Re: [PERFORM] Sort performance on large tables
Thanks everyone for the feedback.

I tried increasing work_mem:

set work_mem to 300000;
select tlid, min(ogc_fid) from completechain group by tlid;

The results are:

"GroupAggregate  (cost=9041602.80..10003036.88 rows=48071704 width=8) (actual time=4371749.523..5106162.256 rows=47599910 loops=1)"
"  ->  Sort  (cost=9041602.80..9161782.06 rows=48071704 width=8) (actual time=4371690.894..4758660.433 rows=48199165 loops=1)"
"        Sort Key: tlid"
"        ->  Seq Scan on completechain  (cost=0.00..2228584.04 rows=48071704 width=8) (actual time=49.518..805234.970 rows=48199165 loops=1)"
"Total runtime: 5279988.127 ms"

Thus the time decreased from 8486 seconds to 5279 seconds - which is a nice improvement. However, that still leaves postgresql about 9 times slower. I tried increasing work_mem up to 500000, but at that point the machine started using its swap partition and performance degraded back to the original values.

Charlie

Richard Huxton wrote:
> Charlie Savage wrote:
>> Hi everyone,
>>
>> I have a question about the performance of sort.
>
>> Note it takes over 10 times longer to do the sort than the full
>> sequential scan.
>>
>> Should I expect results like this? I realize that the computer is
>> quite low-end and is very IO bound for this query, but I'm still
>> surprised that the sort operation takes so long.
>
> The sort will be spilling to disk, which will grind your I/O to a halt.
>
>> work_mem = 16384    # in KB
>
> Try upping this. You should be able to issue "set work_mem = 100000"
> before running your query IIRC. That should let PG do its sorting in
> larger chunks.
>
> Also, if your most common access pattern is ordered via tlid look into
> clustering the table on that.
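Regarding Richard's clustering suggestion: on 8.1 that would look roughly like the following, reusing the existing idx_completechain_tlid index from the earlier plans. Note that CLUSTER rewrites the whole table and takes an exclusive lock, so on a table this size it is a one-off operation:

CLUSTER idx_completechain_tlid ON completechain;   -- 8.1 syntax: CLUSTER indexname ON tablename
ANALYZE completechain;                              -- refresh stats; correlation on tlid should now be close to 1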
[PERFORM] Sort performance on large tables
Hi everyone,

I have a question about the performance of sort.

Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1 RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate 400GB drive. The IDE drive has the OS and the WAL files, the SATA drive the database. From hdparm the max IO for the IDE drive is about 50Mb/s and the SATA drive is about 65Mb/s. Thus a very low-end machine - but it is used just for development (i.e., it is not a production machine) and the only thing it does is run a PostgreSQL database.

I have a staging table called completechain that holds US TIGER data (i.e., streets and addresses for the US). The table is approximately 18GB. It's big because there is a lot of data, but also because the table is not normalized (it comes that way).

I want to extract data out of the file, with the most important values being stored in a column called tlid. The tlid field is an integer, and the values are 98% unique. There is a second column called ogc_fid which is unique (it is a serial field). I need to extract out unique TLIDs (it doesn't matter which duplicate I get rid of). To do this I am running this query:

SELECT tlid, min(ogc_fid)
FROM completechain
GROUP BY tlid;

The results from explain analyze are:

"GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704 width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
"  ->  Sort  (cost=10400373.80..10520553.06 rows=48071704 width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
"        Sort Key: tlid"
"        ->  Seq Scan on completechain  (cost=0.00..2228584.04 rows=48071704 width=8) (actual time=27.514..773245.046 rows=48199165 loops=1)"
"Total runtime: 8486057.185 ms"

Doing a similar query produces the same results (a note on the DISTINCT ON form follows the config excerpt below):

SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;

Note it takes over 10 times longer to do the sort than the full sequential scan.

Should I expect results like this? I realize that the computer is quite low-end and is very IO bound for this query, but I'm still surprised that the sort operation takes so long.

Out of curiosity, I set up an Oracle database on the same machine with the same data and ran the same query. Oracle was over an order of magnitude faster. Looking at its query plan, it avoided the sort by using "HASH GROUP BY." Does such a construct exist in PostgreSQL (I see only hash joins)?

Also as an experiment I forced Oracle to do a sort by running this query:

SELECT tlid, min(ogc_fid)
FROM completechain
GROUP BY tlid
ORDER BY tlid;

Even with this, it was more than an order of magnitude faster than Postgresql. Which makes me think I have somehow misconfigured postgresql (see the relevant parts of postgresql.conf below).

Any idea/help appreciated.

Thanks,

Charlie

---

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

shared_buffers = 40000          # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
#shared_buffers = 1000          # min 16 or max_connections*2, 8KB each

temp_buffers = 5000
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 16384                # in KB
#work_mem = 1024                # min 64, size in KB
maintenance_work_mem = 262144   # in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 600000
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200           # 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0     # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5      # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5      # 0-1000 buffers max written/round

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
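On the "similar query" above: for DISTINCT ON to deterministically keep the lowest ogc_fid per tlid (matching the GROUP BY/min() query), it needs a matching ORDER BY, along these lines:

SELECT DISTINCT ON (tlid) tlid, ogc_fid
FROM completechain
ORDER BY tlid, ogc_fid;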