Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-12-06 Thread Pailloncy Jean-Gerard

Hi,

After few test, the difference is explained by the  
effective_cache_size parameter.


with effective_cache_size=1000 (default)
the planner chooses the following plan
postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

   QUERY PLAN
 


Aggregate  (cost=421893.64..421893.65 rows=1 width=0)
   -  Unique  (cost=385193.48..395679.24 rows=2097152 width=8)
 -  Sort  (cost=385193.48..390436.36 rows=2097152 width=8)
   Sort Key: test.val
   -  Seq Scan on test  (cost=0.00..31252.52  
rows=2097152 width=8)

(5 rows)


with effective_cache_size=15000
the planner chooses the following plan
postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

QUERY PLAN
 
--

Aggregate  (cost=101720.39..101720.40 rows=1 width=0)
   -  Unique  (cost=0.00..75505.99 rows=2097152 width=8)
 -  Index Scan using testval on test  (cost=0.00..70263.11  
rows=2097152 width=8)

(3 rows)

I test some other values for effective_cache_size.
The switch from seq to index scan happens between 9900 and 1 for  
effective_cache_size.


I have my sql server on a OpenBSD 3.8 box with 1 Gb of RAM with  
nothing else running on it.
I setup the cachepercent to 25. I expect to have 25% of 1 Gb of RAM  
(256 Mb) as file cache.
effective_cache_size=15000 means 15000 x 8K of OS cache = 120,000 Kb  
which is lower than my 256 MB of disk cache.


I recall the result of my precedent test.
#rows 2097152
IndexScan 1363396,581s
SeqScan 98758,445s
Ratio  13,805
So the planner when effective_cache_size=15000 chooses a plan that is  
13 times slower than the seqscan one.


I did not understand where the problem comes from.
Any help welcome.

Cordialement,
Jean-Gérard Pailloncy



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-25 Thread Pailloncy Jean-Gerard

What same result?  You only ran it up to 2K rows, not 2M.  In any
Sorry, I do this over and over until xxx.000 rows but I do not write  
in the mail.


I do it again. initdb, create table, insert, vacuum full analyze,  
explain analyze at each stage.

And there was no problem.

So I make a copy of the offending data directory, and try again. And  
I got IndexScan only.

I will get an headheak ;-)

Too big to be send by mail: http://rilk.com/pg81.html

Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard
THIS MAY SEEM SILLY but vacuum is mispelled below and presumably  
there was never any ANALYZE done.




postgres=# vaccum full verbose analyze;

I do have done the vacUUm full verbose analyze;.
But I copy/paste the wrong line.

Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard

Pailloncy Jean-Gerard [EMAIL PROTECTED] writes:

Why the stupid indexscan plan on the whole table ?


Pray tell, what are you using for the planner cost parameters?
The only way I can come close to duplicating your numbers is
by setting random_page_cost to somewhere around 0.01 ...



I did not change the costs.

 grep cost postgresql.conf
# note: increasing max_connections costs ~400 bytes of shared memory per
# note: increasing max_prepared_transactions costs ~600 bytes of  
shared memory

#vacuum_cost_delay = 0  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1   # 0-1 credits
#vacuum_cost_page_miss = 10 # 0-1 credits
#vacuum_cost_page_dirty = 20# 0-1 credits
#vacuum_cost_limit = 200# 0-1 credits
#random_page_cost = 4   # units are one sequential  
page fetch

# cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)
#autovacuum_vacuum_cost_delay = -1  # default vacuum cost delay for
# vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1  # default vacuum cost limit for
# vacuum_cost_limit


Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard

I redo the test, with a freshly installed data directory. Same result.

Note: This is the full log. I just suppress the mistake I do like  
sl for ls.


Jean-Gérard Pailloncy


Last login: Thu Nov 24 12:52:32 2005 from 192.168.0.1
OpenBSD 3.8 (WDT) #2: Tue Nov  8 00:52:38 CET 2005

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code.  With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

Terminal type? [xterm-color]
# cd /mnt2/pg/install/bin/
# mkdir /mnt2/pg/data
# chown -R _pgsql:_pgsql /mnt2/pg/data
# su _pgsql
$ ls
clusterdbdroplang pg_configpg_resetxlog  
reindexdb
createdb dropuser pg_controldata   pg_restore
vacuumdb

createlang   ecpg pg_ctl   postgres
createuser   initdb   pg_dump  postmaster
dropdb   ipcclean pg_dumpall   psql
$ ./initdb -D /mnt2/pg/data
The files belonging to this database system will be owned by user  
_pgsql.

This user must also own the server process.

The database cluster will be initialized with locale C.

fixing permissions on existing directory /mnt2/pg/data ... ok
creating directory /mnt2/pg/data/global ... ok
creating directory /mnt2/pg/data/pg_xlog ... ok
creating directory /mnt2/pg/data/pg_xlog/archive_status ... ok
creating directory /mnt2/pg/data/pg_clog ... ok
creating directory /mnt2/pg/data/pg_subtrans ... ok
creating directory /mnt2/pg/data/pg_twophase ... ok
creating directory /mnt2/pg/data/pg_multixact/members ... ok
creating directory /mnt2/pg/data/pg_multixact/offsets ... ok
creating directory /mnt2/pg/data/base ... ok
creating directory /mnt2/pg/data/base/1 ... ok
creating directory /mnt2/pg/data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /mnt2/pg/data/base/1 ... ok
initializing pg_authid ... ok
enabling unlimited row size for system tables ... ok
initializing dependencies ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling trust authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

./postmaster -D /mnt2/pg/data
or
./pg_ctl -D /mnt2/pg/data -l logfile start

$ ./pg_ctl -D /mnt2/pg/data -l /mnt2/pg/data/logfile start
postmaster starting
$ ./psql postgres
Welcome to psql 8.1.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

postgres=#  create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence test_id_seq for  
serial column test.id

CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random() 
*1024*1024*1024));

INSERT 0 1
postgres=# vacuum full analyze;
VACUUM
postgres=# select count(1) from test;
count
---
 1
(1 row)

postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

  QUERY PLAN
--
Aggregate  (cost=1.04..1.05 rows=1 width=0)
   -  Unique  (cost=1.02..1.03 rows=1 width=8)
 -  Sort  (cost=1.02..1.02 rows=1 width=8)
   Sort Key: test.val
   -  Seq Scan on test  (cost=0.00..1.01 rows=1 width=8)
(5 rows)

postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 1
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 2
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 4
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 8
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 16
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 32
postgres=# vacuum full analyze;
VACUUM
postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

  QUERY PLAN
---
Aggregate  (cost=4.68..4.69 

[PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-23 Thread Pailloncy Jean-Gerard

Hi,

PostgreSQL 8.1 fresh install on a freshly installed OpenBSD 3.8 box.

postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence test_id_seq for  
serial column test.id

CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random() 
*1024*1024*1024));

INSERT 0 1

[...] insert many random values

postgres=# vaccum full verbose analyze;
postgres=# select count(1) from test;
  count
-
2097152
(1 row)

postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

QUERY PLAN
 
--

Aggregate  (cost=66328.72..66328.73 rows=1 width=0)
   -  Unique  (cost=0.00..40114.32 rows=2097152 width=8)
 -  Index Scan using testval on test  (cost=0.00..34871.44  
rows=2097152 width=8)

(3 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on  
(val) * from test) as foo;

 QUERY PLAN
 

Aggregate  (cost=280438.64..280438.65 rows=1 width=0) (actual  
time=39604.107..39604.108 rows=1 loops=1)
   -  Unique  (cost=243738.48..254224.24 rows=2097152 width=8)  
(actual time=30281.004..37746.488 rows=2095104 loops=1)
 -  Sort  (cost=243738.48..248981.36 rows=2097152 width=8)  
(actual time=30280.999..33744.197 rows=2097152 loops=1)

   Sort Key: test.val
   -  Seq Scan on test  (cost=0.00..23537.52  
rows=2097152 width=8) (actual time=11.550..3262.433 rows=2097152  
loops=1)

Total runtime: 39624.094 ms
(6 rows)

postgres=# set enable_indexscan=on;
postgres=# explain analyze select count(*) from (select distinct on  
(val) * from test where val1000) as foo;
   
QUERY PLAN
 
---
Aggregate  (cost=4739.58..4739.59 rows=1 width=0) (actual  
time=4686.472..4686.473 rows=1 loops=1)
   -  Unique  (cost=4380.56..4483.14 rows=20515 width=8) (actual  
time=4609.046..4669.289 rows=19237 loops=1)
 -  Sort  (cost=4380.56..4431.85 rows=20515 width=8)  
(actual time=4609.041..4627.976 rows=19255 loops=1)

   Sort Key: test.val
   -  Bitmap Heap Scan on test  (cost=88.80..2911.24  
rows=20515 width=8) (actual time=130.954..4559.244 rows=19255 loops=1)

 Recheck Cond: (val  1000)
 -  Bitmap Index Scan on testval   
(cost=0.00..88.80 rows=20515 width=0) (actual time=120.041..120.041  
rows=19255 loops=1)

   Index Cond: (val  1000)
Total runtime: 4690.513 ms
(9 rows)

postgres=# explain select count(*) from (select distinct on (val) *  
from test where val1) as foo;

   QUERY PLAN
 
-

Aggregate  (cost=16350.20..16350.21 rows=1 width=0)
   -  Unique  (cost=0.00..13748.23 rows=208158 width=8)
 -  Index Scan using testval on test  (cost=0.00..13227.83  
rows=208158 width=8)

   Index Cond: (val  1)
(4 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on  
(val) * from test where val1) as foo;
  
QUERY PLAN
 
 

Aggregate  (cost=28081.27..28081.28 rows=1 width=0) (actual  
time=6444.650..6444.651 rows=1 loops=1)
   -  Unique  (cost=24438.50..25479.29 rows=208158 width=8) (actual  
time=5669.118..6277.206 rows=194142 loops=1)
 -  Sort  (cost=24438.50..24958.89 rows=208158 width=8)  
(actual time=5669.112..5852.351 rows=194342 loops=1)

   Sort Key: test.val
   -  Bitmap Heap Scan on test  (cost=882.55..6050.53  
rows=208158 width=8) (actual time=1341.114..4989.840 rows=194342  
loops=1)

 Recheck Cond: (val  1)
 -  Bitmap Index Scan on testval   
(cost=0.00..882.55 rows=208158 width=0) (actual  
time=1339.707..1339.707 rows=194342 loops=1)

   Index Cond: (val  1)
Total runtime: 6487.114 ms
(9 rows)

postgres=# explain analyze select count(*) from (select distinct on  
(val) * from test where val75000) as foo;
   

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Pailloncy Jean-Gerard
Your main example seems to focus on a large table where a key  
column has

constrained values.  This case is interesting in proportion to the
number of possible values.  If I have billions of rows, each  
having one

of only two values, I can think of a trivial and very fast method of
returning the table sorted by that key: make two sequential passes,
returning the first value on the first pass and the second value  
on the

second pass.  This will be faster than the method you propose.


1= No that was not my main example.  It was the simplest example  
used to
frame the later more complicated examples.  Please don't get hung  
up on it.


2= You are incorrect.  Since IO is the most expensive operation we  
can do,
any method that makes two passes through the data at top scanning  
speed
will take at least 2x as long as any method that only takes one  
such pass.

You do not get the point.
As the time you get the sorted references to the tuples, you need to  
fetch the tuples themself, check their visbility, etc. and returns  
them to the client.


So,
if there is only 2 values in the column of big table that is larger  
than available RAM,

two seq scans of the table without any sorting
is the fastest solution.

Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Error in VACUUM FULL VERBOSE ANALYZE (not enough memory)

2004-12-17 Thread Pailloncy Jean-Gerard
The classic output from top (during all other index vacuum):
  PIDUID   PRI NICE  SIZE   RES STATE WAIT TIMECPU COMMAND
20461503140   13M   75M sleep semwai   5:27  2.05% postgres
When backend hits the tsearch2 index, SIZE/RES grows until it reachs 
1GB, where I got the error.
  PIDUID   PRI NICE  SIZE   RES STATE WAIT TIMECPU COMMAND
20461503-50  765M  824M sleep biowai   4:26 33.20% postgres

Cordialement,
Jean-Gérard Pailloncy
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])