Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Hi, After few test, the difference is explained by the effective_cache_size parameter. with effective_cache_size=1000 (default) the planner chooses the following plan postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN Aggregate (cost=421893.64..421893.65 rows=1 width=0) - Unique (cost=385193.48..395679.24 rows=2097152 width=8) - Sort (cost=385193.48..390436.36 rows=2097152 width=8) Sort Key: test.val - Seq Scan on test (cost=0.00..31252.52 rows=2097152 width=8) (5 rows) with effective_cache_size=15000 the planner chooses the following plan postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN -- Aggregate (cost=101720.39..101720.40 rows=1 width=0) - Unique (cost=0.00..75505.99 rows=2097152 width=8) - Index Scan using testval on test (cost=0.00..70263.11 rows=2097152 width=8) (3 rows) I test some other values for effective_cache_size. The switch from seq to index scan happens between 9900 and 1 for effective_cache_size. I have my sql server on a OpenBSD 3.8 box with 1 Gb of RAM with nothing else running on it. I setup the cachepercent to 25. I expect to have 25% of 1 Gb of RAM (256 Mb) as file cache. effective_cache_size=15000 means 15000 x 8K of OS cache = 120,000 Kb which is lower than my 256 MB of disk cache. I recall the result of my precedent test. #rows 2097152 IndexScan 1363396,581s SeqScan 98758,445s Ratio 13,805 So the planner when effective_cache_size=15000 chooses a plan that is 13 times slower than the seqscan one. I did not understand where the problem comes from. Any help welcome. Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
What same result? You only ran it up to 2K rows, not 2M. In any Sorry, I do this over and over until xxx.000 rows but I do not write in the mail. I do it again. initdb, create table, insert, vacuum full analyze, explain analyze at each stage. And there was no problem. So I make a copy of the offending data directory, and try again. And I got IndexScan only. I will get an headheak ;-) Too big to be send by mail: http://rilk.com/pg81.html Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
THIS MAY SEEM SILLY but vacuum is mispelled below and presumably there was never any ANALYZE done. postgres=# vaccum full verbose analyze; I do have done the vacUUm full verbose analyze;. But I copy/paste the wrong line. Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Pailloncy Jean-Gerard [EMAIL PROTECTED] writes: Why the stupid indexscan plan on the whole table ? Pray tell, what are you using for the planner cost parameters? The only way I can come close to duplicating your numbers is by setting random_page_cost to somewhere around 0.01 ... I did not change the costs. grep cost postgresql.conf # note: increasing max_connections costs ~400 bytes of shared memory per # note: increasing max_prepared_transactions costs ~600 bytes of shared memory #vacuum_cost_delay = 0 # 0-1000 milliseconds #vacuum_cost_page_hit = 1 # 0-1 credits #vacuum_cost_page_miss = 10 # 0-1 credits #vacuum_cost_page_dirty = 20# 0-1 credits #vacuum_cost_limit = 200# 0-1 credits #random_page_cost = 4 # units are one sequential page fetch # cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for # vacuum_cost_delay #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # vacuum_cost_limit Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
I redo the test, with a freshly installed data directory. Same result. Note: This is the full log. I just suppress the mistake I do like sl for ls. Jean-Gérard Pailloncy Last login: Thu Nov 24 12:52:32 2005 from 192.168.0.1 OpenBSD 3.8 (WDT) #2: Tue Nov 8 00:52:38 CET 2005 Welcome to OpenBSD: The proactively secure Unix-like operating system. Please use the sendbug(1) utility to report bugs in the system. Before reporting a bug, please try to reproduce it with the latest version of the code. With bug reports, please try to ensure that enough information to reproduce the problem is enclosed, and if a known fix for it exists, include that as well. Terminal type? [xterm-color] # cd /mnt2/pg/install/bin/ # mkdir /mnt2/pg/data # chown -R _pgsql:_pgsql /mnt2/pg/data # su _pgsql $ ls clusterdbdroplang pg_configpg_resetxlog reindexdb createdb dropuser pg_controldata pg_restore vacuumdb createlang ecpg pg_ctl postgres createuser initdb pg_dump postmaster dropdb ipcclean pg_dumpall psql $ ./initdb -D /mnt2/pg/data The files belonging to this database system will be owned by user _pgsql. This user must also own the server process. The database cluster will be initialized with locale C. fixing permissions on existing directory /mnt2/pg/data ... ok creating directory /mnt2/pg/data/global ... ok creating directory /mnt2/pg/data/pg_xlog ... ok creating directory /mnt2/pg/data/pg_xlog/archive_status ... ok creating directory /mnt2/pg/data/pg_clog ... ok creating directory /mnt2/pg/data/pg_subtrans ... ok creating directory /mnt2/pg/data/pg_twophase ... ok creating directory /mnt2/pg/data/pg_multixact/members ... ok creating directory /mnt2/pg/data/pg_multixact/offsets ... ok creating directory /mnt2/pg/data/base ... ok creating directory /mnt2/pg/data/base/1 ... ok creating directory /mnt2/pg/data/pg_tblspc ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 1000 creating configuration files ... ok creating template1 database in /mnt2/pg/data/base/1 ... ok initializing pg_authid ... ok enabling unlimited row size for system tables ... ok initializing dependencies ... ok creating system views ... ok loading pg_description ... ok creating conversions ... ok setting privileges on built-in objects ... ok creating information schema ... ok vacuuming database template1 ... ok copying template1 to template0 ... ok copying template1 to postgres ... ok WARNING: enabling trust authentication for local connections You can change this by editing pg_hba.conf or using the -A option the next time you run initdb. Success. You can now start the database server using: ./postmaster -D /mnt2/pg/data or ./pg_ctl -D /mnt2/pg/data -l logfile start $ ./pg_ctl -D /mnt2/pg/data -l /mnt2/pg/data/logfile start postmaster starting $ ./psql postgres Welcome to psql 8.1.0, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit postgres=# create table test (id serial, val integer); NOTICE: CREATE TABLE will create implicit sequence test_id_seq for serial column test.id CREATE TABLE postgres=# create unique index testid on test (id); CREATE INDEX postgres=# create index testval on test (val); CREATE INDEX postgres=# insert into test (val) values (round(random() *1024*1024*1024)); INSERT 0 1 postgres=# vacuum full analyze; VACUUM postgres=# select count(1) from test; count --- 1 (1 row) postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN -- Aggregate (cost=1.04..1.05 rows=1 width=0) - Unique (cost=1.02..1.03 rows=1 width=8) - Sort (cost=1.02..1.02 rows=1 width=8) Sort Key: test.val - Seq Scan on test (cost=0.00..1.01 rows=1 width=8) (5 rows) postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 1 postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 2 postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 4 postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 8 postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 16 postgres=# insert into test (val) select round(random() *1024*1024*1024) from test; INSERT 0 32 postgres=# vacuum full analyze; VACUUM postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN --- Aggregate (cost=4.68..4.69
[PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Hi, PostgreSQL 8.1 fresh install on a freshly installed OpenBSD 3.8 box. postgres=# CREATE DATABASE test; CREATE DATABASE postgres=# create table test (id serial, val integer); NOTICE: CREATE TABLE will create implicit sequence test_id_seq for serial column test.id CREATE TABLE postgres=# create unique index testid on test (id); CREATE INDEX postgres=# create index testval on test (val); CREATE INDEX postgres=# insert into test (val) values (round(random() *1024*1024*1024)); INSERT 0 1 [...] insert many random values postgres=# vaccum full verbose analyze; postgres=# select count(1) from test; count - 2097152 (1 row) postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN -- Aggregate (cost=66328.72..66328.73 rows=1 width=0) - Unique (cost=0.00..40114.32 rows=2097152 width=8) - Index Scan using testval on test (cost=0.00..34871.44 rows=2097152 width=8) (3 rows) postgres=# set enable_indexscan=off; postgres=# explain analyze select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN Aggregate (cost=280438.64..280438.65 rows=1 width=0) (actual time=39604.107..39604.108 rows=1 loops=1) - Unique (cost=243738.48..254224.24 rows=2097152 width=8) (actual time=30281.004..37746.488 rows=2095104 loops=1) - Sort (cost=243738.48..248981.36 rows=2097152 width=8) (actual time=30280.999..33744.197 rows=2097152 loops=1) Sort Key: test.val - Seq Scan on test (cost=0.00..23537.52 rows=2097152 width=8) (actual time=11.550..3262.433 rows=2097152 loops=1) Total runtime: 39624.094 ms (6 rows) postgres=# set enable_indexscan=on; postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val1000) as foo; QUERY PLAN --- Aggregate (cost=4739.58..4739.59 rows=1 width=0) (actual time=4686.472..4686.473 rows=1 loops=1) - Unique (cost=4380.56..4483.14 rows=20515 width=8) (actual time=4609.046..4669.289 rows=19237 loops=1) - Sort (cost=4380.56..4431.85 rows=20515 width=8) (actual time=4609.041..4627.976 rows=19255 loops=1) Sort Key: test.val - Bitmap Heap Scan on test (cost=88.80..2911.24 rows=20515 width=8) (actual time=130.954..4559.244 rows=19255 loops=1) Recheck Cond: (val 1000) - Bitmap Index Scan on testval (cost=0.00..88.80 rows=20515 width=0) (actual time=120.041..120.041 rows=19255 loops=1) Index Cond: (val 1000) Total runtime: 4690.513 ms (9 rows) postgres=# explain select count(*) from (select distinct on (val) * from test where val1) as foo; QUERY PLAN - Aggregate (cost=16350.20..16350.21 rows=1 width=0) - Unique (cost=0.00..13748.23 rows=208158 width=8) - Index Scan using testval on test (cost=0.00..13227.83 rows=208158 width=8) Index Cond: (val 1) (4 rows) postgres=# set enable_indexscan=off; postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val1) as foo; QUERY PLAN Aggregate (cost=28081.27..28081.28 rows=1 width=0) (actual time=6444.650..6444.651 rows=1 loops=1) - Unique (cost=24438.50..25479.29 rows=208158 width=8) (actual time=5669.118..6277.206 rows=194142 loops=1) - Sort (cost=24438.50..24958.89 rows=208158 width=8) (actual time=5669.112..5852.351 rows=194342 loops=1) Sort Key: test.val - Bitmap Heap Scan on test (cost=882.55..6050.53 rows=208158 width=8) (actual time=1341.114..4989.840 rows=194342 loops=1) Recheck Cond: (val 1) - Bitmap Index Scan on testval (cost=0.00..882.55 rows=208158 width=0) (actual time=1339.707..1339.707 rows=194342 loops=1) Index Cond: (val 1) Total runtime: 6487.114 ms (9 rows) postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val75000) as foo;
Re: [HACKERS] [PERFORM] A Better External Sort?
Your main example seems to focus on a large table where a key column has constrained values. This case is interesting in proportion to the number of possible values. If I have billions of rows, each having one of only two values, I can think of a trivial and very fast method of returning the table sorted by that key: make two sequential passes, returning the first value on the first pass and the second value on the second pass. This will be faster than the method you propose. 1= No that was not my main example. It was the simplest example used to frame the later more complicated examples. Please don't get hung up on it. 2= You are incorrect. Since IO is the most expensive operation we can do, any method that makes two passes through the data at top scanning speed will take at least 2x as long as any method that only takes one such pass. You do not get the point. As the time you get the sorted references to the tuples, you need to fetch the tuples themself, check their visbility, etc. and returns them to the client. So, if there is only 2 values in the column of big table that is larger than available RAM, two seq scans of the table without any sorting is the fastest solution. Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Error in VACUUM FULL VERBOSE ANALYZE (not enough memory)
The classic output from top (during all other index vacuum): PIDUID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 20461503140 13M 75M sleep semwai 5:27 2.05% postgres When backend hits the tsearch2 index, SIZE/RES grows until it reachs 1GB, where I got the error. PIDUID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 20461503-50 765M 824M sleep biowai 4:26 33.20% postgres Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])