Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:

Tatsuo Ishii [EMAIL PROTECTED] writes:
>> 18% in s_lock is definitely bad :-(.  Were you able to determine which
>> LWLock(s) are accounting for the contention?

> Sorry for the delay. Finally I got the oprofile data. It's
> huge (34MB). If you are interested, I can put it somewhere. Please let me
> know.

I finally got a chance to look at this, and it seems clear that all the
traffic is on the BufMappingLock.  This is essentially the same problem
we were discussing with respect to Gavin Hamill's report of poor
performance on an 8-way IBM PPC64 box (see hackers archives around
2006-04-21).  If your database is fully cached in shared buffers, then
you can do a whole lot of buffer accesses per unit time, and even though
all the BufMappingLock acquisitions are in shared-LWLock mode, the
LWLock's spinlock ends up being heavily contended on an SMP box.

It's likely that CVS HEAD would show somewhat better performance because
of the btree change to cache local copies of index metapages (which
eliminates a fair fraction of buffer accesses, at least in Gavin's test
case).   Getting much further than that seems to require partitioning
the buffer mapping table.  The last discussion stalled on my concerns
about unpredictable shared memory usage, but I have some ideas on that
which I'll post separately.  In the meantime, thanks for sending along
the oprofile data!

regards, tom lane
 

I ran pgbench and fired up a DTrace script using the lwlock probes we've 
added, and it looks like BufMappingLock is the most contended lock, but  
CheckpointStartLocks are held for longer duration!


Lock Id                    Mode          Count
ControlFileLock   Exclusive   1
SubtransControlLock   Exclusive   1
   BgWriterCommLock   Exclusive   6
  FreeSpaceLock   Exclusive   6
   FirstLockMgrLock   Exclusive  48
BufFreelistLock   Exclusive  74
 BufMappingLock   Exclusive  74
CLogControlLock   Exclusive 184
 XidGenLock   Exclusive 184
CheckpointStartLock  Shared 185
   WALWriteLock   Exclusive 185
  ProcArrayLock   Exclusive 368
CLogControlLock  Shared 552
SubtransControlLock  Shared1273
  WALInsertLock   Exclusive1476
 XidGenLock  Shared1842
  ProcArrayLock  Shared3160
 SInvalLock  Shared3684
 BufMappingLock  Shared   14578

Lock Id   Combined Time (ns)
ControlFileLock 7915
   BgWriterCommLock43438
  FreeSpaceLock   39
BufFreelistLock   448530
   FirstLockMgrLock  2879957
CLogControlLock  4237750
SubtransControlLock  6378042
 XidGenLock  9500422
  WALInsertLock 16372040
 SInvalLock 23284554
  ProcArrayLock 32188638
 BufMappingLock113128512
   WALWriteLock142391501
CheckpointStartLock   4171106665


Regards,
-Robert

 




[PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
I have been testing the performance of PostgreSQL using the simple tool found at http://benchw.sourceforge.net; however, I have found that all the queries it runs execute with sequential scans. The website where the code runs has examples of the execution plans using indexes.

When I disable sequential scans, query 0 and query 1 run faster ( http://benchw.sourceforge.net/benchw_results_postgres_history.html ) by using the indexes as suggested by the website.

I have tried increasing effective_cache_size and reducing random_page_cost to try to force the optimiser to use the index, but it always uses the sequential scan.

What is the best way to force the use of indexes in these queries? I am currently testing with version 8.1.4.
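
For illustration, the kind of session-level experiment described above looks roughly like this (the values are examples only, not my actual settings):

-- Per-session planner settings (8.1); effective_cache_size is in 8kB pages.
SET effective_cache_size = 100000;   -- roughly 800MB of expected OS caching
SET random_page_cost = 2;            -- default is 4
EXPLAIN ANALYZE
SELECT d0.dmth, count(f.fval)
FROM dim0 AS d0, fact0 AS f
WHERE d0.d0key = f.d0key
  AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
GROUP BY d0.dmth;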


Regards


Robin Smith







Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
More information from the query:-

explain analyze
SELECT
d0.dmth,
count(f.fval )
FROM
dim0 AS d0,
fact0 AS f
WHERE   d0.d0key = f.d0key
AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
GROUP BY
d0.dmth
;

                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=336998.83..336998.84 rows=1 width=8) (actual time=33823.124..33823.134 rows=12 loops=1)
   ->  Hash Join  (cost=214.83..335343.83 rows=331000 width=8) (actual time=61.065..33605.343 rows=336000 loops=1)
         Hash Cond: (outer.d0key = inner.d0key)
         ->  Seq Scan on fact0 f  (cost=0.00..281819.00 rows=1000 width=8) (actual time=12.766..28945.036 rows=1000 loops=1)
         ->  Hash  (cost=214.00..214.00 rows=331 width=8) (actual time=31.120..31.120 rows=336 loops=1)
               ->  Seq Scan on dim0 d0  (cost=0.00..214.00 rows=331 width=8) (actual time=26.362..30.895 rows=336 loops=1)
                     Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
 Total runtime: 33823.220 ms
(8 rows)


benchw=# \d fact0
          Table "public.fact0"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 d0key  | integer                | not null
 d1key  | integer                | not null
 d2key  | integer                | not null
 fval   | integer                | not null
 ffill  | character varying(100) | not null
Indexes:
    "fact0_d0key" btree (d0key)
    "fact0_d1key" btree (d1key)
    "fact0_d2key" btree (d2key)

benchw=# \d dim0
      Table "public.dim0"
 Column |  Type   | Modifiers
--------+---------+-----------
 d0key  | integer | not null
 ddate  | date    | not null
 dyr    | integer | not null
 dmth   | integer | not null
 dday   | integer | not null
Indexes:
    "dim0_d0key" UNIQUE, btree (d0key)

The example on the web site has the following execution plan:-

                                     QUERY PLAN
------------------------------------------------------------------------------------
 HashAggregate  (cost=286953.94..286953.94 rows=1 width=8)
   ->  Nested Loop  (cost=0.00..285268.93 rows=337002 width=8)
         ->  Seq Scan on dim0 d0  (cost=0.00..219.00 rows=337 width=8)
               Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
         ->  Index Scan using fact0_d0key on fact0 f  (cost=0.00..833.07 rows=1022 width=8)
               Index Cond: (outer.d0key = f.d0key)

It uses the index on the join condition.

When I disable the sequential scan with:-

SET enable_seqscan TO off;

The execution plan looks like:-

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=648831.52..648831.53 rows=1 width=8) (actual time=19155.060..19155.071 rows=12 loops=1)
   ->  Nested Loop  (cost=7.51..647176.52 rows=331000 width=8) (actual time=97.878..18943.155 rows=336000 loops=1)
         ->  Index Scan using dim0_d0key on dim0 d0  (cost=0.00..248.00 rows=331 width=8) (actual time=40.467..55.780 rows=336 loops=1)
               Filter: ((ddate >= '2010-01-01'::date) AND (ddate <= '2010-12-28'::date))
         ->  Bitmap Heap Scan on fact0 f  (cost=7.51..1941.94 rows=1002 width=8) (actual time=0.991..55.391 rows=1000 loops=336)
               Recheck Cond: (outer.d0key = f.d0key)
               ->  Bitmap Index Scan on fact0_d0key  (cost=0.00..7.51 rows=1002 width=0) (actual time=0.583..0.583 rows=1000 loops=336)
                     Index Cond: (outer.d0key = f.d0key)
 Total runtime: 19155.176 ms
(9 rows)

The query now takes 19 seconds, down from 34 seconds, although the
execution plan doesn't match the example from the website.

Regards

Robin
-Original Message-
From: Peter Eisentraut [mailto:[EMAIL PROTECTED] 
Sent: 21 July 2006 12:46
To: pgsql-performance@postgresql.org
Cc: Smith,R,Robin,XJE4JA C
Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


[EMAIL PROTECTED] wrote:
 What is the best way to force the use of indexes in these queries?

Well, the brute-force method is to use SET enable_seqscan TO off, but if
you want to get to the bottom of this, you should look at or post the 
EXPLAIN ANALYZE output of the offending queries.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
The tables have all been analysed.

I set the work_mem to 50 and it still doesn't use the index :-(

Regards

Robin

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: 21 July 2006 12:54
To: Smith,R,Robin,XJE4JA C
Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


[EMAIL PROTECTED] wrote:
 I have been testing the performance of PostgreSQL using the simple 
 tool found at http://benchw.sourceforge.net however I have found that 
 all the queries it run execute with sequential scans. The website 
 where the code runs has examples of the execution plan using indexes.
 
 When I disable the sequential plan query 0 and query 1 run faster ( 
 http://benchw.sourceforge.net/benchw_results_postgres_history.html ) 
 by using the indexes as suggested by the website.
 
 I have tried increasing the effective_cache_size and reducing the 
 random_page_cost to try and force the optimiser to use the index but 
 it always uses the sequential scan.
 
 What is the best way to force the use of indexes in these queries? 
 Currently testing with version 8.1.4.

Well, you don't want to be forcing it if possible. Ideally, PG should be
able to figure out what to use itself.

In the case of query0 and query1 as shown on your web-page I'd expect a 
sequential scan of dim0 then access via the index on fact0. Reasons why 
this might not be happening include:
1. Inaccurate stats - ANALYSE your tables
2. Insufficient memory for sorting etc - issue SET work_mem=XXX before 
the query and try increased values.
3. Other parameters are out-of-whack. For example, effective_cache_size 
doesn't change how much cache PG uses, it tells PG how much the O.S. 
will cache. You might find http://www.powerpostgresql.com/PerfList is a 
good quick introduction.


So - ANALYSE your tables
http://www.postgresql.org/docs/8.1/static/sql-analyze.html

Then post EXPLAIN ANALYSE for the queries and we'll see what they're
doing.
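
For example, a minimal sequence along those lines (the work_mem figure is purely illustrative):

ANALYZE dim0;
ANALYZE fact0;
SET work_mem = 65536;   -- 8.1 takes kB, i.e. 64MB here; the value is just an example
-- ...then re-run EXPLAIN ANALYZE on query 0 and query 1 in the same session.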
-- 
   Richard Huxton
   Archonet Ltd



Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread Peter Eisentraut
[EMAIL PROTECTED] wrote:
 What is the best way to force the use of indexes in these queries?

Well, the brute-force method is to use SET enable_seqscan TO off, but if 
you want to get to the bottom of this, you should look at or post the 
EXPLAIN ANALYZE output of the offending queries.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



[PERFORM] postgres benchmarks

2006-07-21 Thread Petronenko D.S.

Hello,

Does anybody use OSDB benchmarks for Postgres?
If not, which kind of benchmarks are used for Postgres?

Thanks,
Denis.



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Tom Lane
Robert Lor [EMAIL PROTECTED] writes:
 I ran pgbench and fired up a DTrace script using the lwlock probes we've 
 added, and it looks like BufMappingLock is the most contended lock, but  
 CheckpointStartLocks are held for longer duration!

Those numbers look a bit suspicious --- I'd expect to see some of the
LWLocks being taken in both shared and exclusive modes, but you don't
show any such cases.  You sure your script is counting correctly?
Also, it'd be interesting to count time spent holding shared lock
separately from time spent holding exclusive.

regards, tom lane



Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Jim C. Nasby
On Fri, Jul 21, 2006 at 12:56:56AM -0700, Robert Lor wrote:
 I ran pgbench and fired up a DTrace script using the lwlock probes we've 
 added, and it looks like BufMappingLock is the most contended lock, but  
 CheckpointStartLocks are held for longer duration!
 
Not terribly surprising given that that lock can generate a substantial
amount of IO (though looking at the numbers, you might want to make
bgwriter more aggressive). Also, that's a shared lock, so it won't have
nearly the impact that BufMappingLock does.
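
(As a point of reference, the knobs involved would be the 8.1 bgwriter settings; a quick way to see what they are currently set to, purely as an illustration:)

SHOW bgwriter_delay;          -- time between bgwriter rounds, default 200ms
SHOW bgwriter_lru_percent;
SHOW bgwriter_lru_maxpages;
SHOW bgwriter_all_percent;
SHOW bgwriter_all_maxpages;
-- More aggressive values would be set in postgresql.conf.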

 Lock Id                    Mode          Count
 ControlFileLock   Exclusive   1
 SubtransControlLock   Exclusive   1
BgWriterCommLock   Exclusive   6
   FreeSpaceLock   Exclusive   6
FirstLockMgrLock   Exclusive  48
 BufFreelistLock   Exclusive  74
  BufMappingLock   Exclusive  74
 CLogControlLock   Exclusive 184
  XidGenLock   Exclusive 184
 CheckpointStartLock  Shared 185
WALWriteLock   Exclusive 185
   ProcArrayLock   Exclusive 368
 CLogControlLock  Shared 552
 SubtransControlLock  Shared1273
   WALInsertLock   Exclusive1476
  XidGenLock  Shared1842
   ProcArrayLock  Shared3160
  SInvalLock  Shared3684
  BufMappingLock  Shared   14578
 
 Lock Id   Combined Time (ns)
 ControlFileLock 7915
BgWriterCommLock43438
   FreeSpaceLock   39
 BufFreelistLock   448530
FirstLockMgrLock  2879957
 CLogControlLock  4237750
 SubtransControlLock  6378042
  XidGenLock  9500422
   WALInsertLock 16372040
  SInvalLock 23284554
   ProcArrayLock 32188638
  BufMappingLock113128512
WALWriteLock142391501
 CheckpointStartLock   4171106665
 
 
 Regards,
 -Robert
 
  
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Sven Geisler
Hi,

Tom Lane wrote:
 Robert Lor [EMAIL PROTECTED] writes:
 I ran pgbench and fired up a DTrace script using the lwlock probes we've 
 added, and it looks like BufMappingLock is the most contended lock, but  
 CheckpointStartLocks are held for longer duration!
 
 Those numbers look a bit suspicious --- I'd expect to see some of the
 LWLocks being taken in both shared and exclusive modes, but you don't
 show any such cases.  You sure your script is counting correctly?
 Also, it'd be interesting to count time spent holding shared lock
 separately from time spent holding exclusive.

Is there a test case which shows the contention for these fully cached
tables? It would be nice to have measurable numbers like context
switches and queries per second.

Sven.



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:

 Those numbers look a bit suspicious --- I'd expect to see some of the
 LWLocks being taken in both shared and exclusive modes, but you don't
 show any such cases. You sure your script is counting correctly?

I'll double check to make sure no stupid mistakes were made!

 Also, it'd be interesting to count time spent holding shared lock
 separately from time spent holding exclusive.

Will provide that data later today.

Regards,
-Robert




Re: [PERFORM] BUG #2543: Performance delay acrros the same day

2006-07-21 Thread Bruno Wolff III
On Fri, Jul 21, 2006 at 07:41:02 +,
  Alaa El Gohary [EMAIL PROTECTED] wrote:
 
 The following bug has been logged online:

The report below isn't a bug, it's a performance question and should have
been sent to [EMAIL PROTECTED]. I am redirecting replies there.

 A query on the postgresql DB takes about 5 seconds and then it starts to
 take more time till it reaches about 60 seconds by the end of the same day.
 I tried vacuum but nothing changed; the only thing that works is to dump the
 DB, drop it, and create a new one from the dump taken.
 I need to know if there is any way to restore the performance back without
 the need for drop and create, because I can't do this across the day.

You most likely aren't vacuuming often enough and/or don't have your FSM
setting high enough.
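
A rough way to check (sketch only): run a database-wide VACUUM VERBOSE and compare the page-slot totals it reports at the end against the FSM settings:

VACUUM VERBOSE;        -- the last few lines report free-space-map usage vs. what is needed
SHOW max_fsm_pages;
SHOW max_fsm_relations;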



[PERFORM] Bad Planner Statistics for Uneven distribution.

2006-07-21 Thread Kevin McArthur



I discussed this with a few members of #postgresql on freenode this morning.
I'll keep it brief; [note: i have cleaned out columns not relevant]

I have two tables, brands and models_brands. The first has about 300 records,
the latter about 350,000 records. The number of distinct brands in the
models_brands table is < 10.



=# \d models_brands
 Table "public.models_brands"
 Column |  Type   | Modifiers
--------+---------+-----------
 model  | integer | not null
 brand  | integer | not null
Indexes:
    "models_brands_brand" btree (brand)
Foreign-key constraints:
    "models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
    "models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE

a=# \d brands;
                              Table "public.brands"
   Column   |          Type          |                         Modifiers
------------+------------------------+------------------------------------------------------------
 brand_id   | integer                | not null default nextval('brands_brand_id_seq'::regclass)
 brand_name | character varying(255) | not null
Indexes:
    "brands_pkey" PRIMARY KEY, btree (brand_id)

Now the plans/problems..

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand from models_brands;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
   ->  Sort  (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
         Sort Key: brand
         ->  Seq Scan on models_brands  (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
 Total runtime: 6223.666 ms
(5 rows)

=# set enable_seqscan to off;
SET
=# explain analyze select distinct brand from models_brands;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
   ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
 Total runtime: 2584.871 ms
(3 rows)


Picks the wrong plan here. Should pick the index 
with seqscanning enabled.


More (as a different wording/query)... (as 
suggested by others on irc)


=# set enable_seqscan to on;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on brands  (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on models_brands  (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
           Filter: (brand = $0)
 Total runtime: 62567.626 ms

a=# set enable_seqscan to off;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on brands  (cost=1.00..10715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
           Index Cond: (brand = $0)
 Total runtime: 3.790 ms


A LIMIT 1 in the subquery for the EXISTS was also tried, with similar
results.

More...

Seqscan still off..


=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
   ->  Merge Join  (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
         Merge Cond: ("outer".brand_id = "inner".brand)
         ->  Index Scan using brands_pkey on brands  (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
         ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
 Total runtime: 4898.697 ms

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Unique

[PERFORM] Partitioned tables in queries

2006-07-21 Thread Kevin Keith
I have a case where I am partitioning tables based on a date range in 
version 8.1.4. For example:


table_with_millions_of_records
interaction_id  char(16) primary key
start_date   timestamp (without timezone) - indexed
.. other columns

child_1   start_date >= 2006-07-21 00:00:00
child_2   start_date >= 2006-07-20 00:00:00 and start_date < 2006-07-21 00:00:00

...
child_5   start_date >= 2006-07-17 00:00:00 and start_date < 2006-07-18 00:00:00


with rules on the parent and child tables that redirect the data to the 
appropriate child table based on the start_date.


Because this table is going to grow very large (very quickly), and will 
need to be purged daily, I created partitions, or child tables to hold 
data for each day. I have done the same thing in Oracle in the past, and 
the PostgreSQL solution works great. The archival process is very simple 
- drop the expired child table. I am having one problem.


If I run a query on the full table (there are 5 child tables with data 
for the last 5 days), and my where clause contains data for the current 
day only:

where start_date >= date_trunc('day', now())
all 5 child tables are scanned when I look at the output from explain 
analyze.


My question is - can I force the planner to only scan the relevant child 
table - when the key related to the partitioned data is part of the 
where clause?


Thanks,

Kevin


...



Re: [PERFORM] Partitioned tables in queries

2006-07-21 Thread Steve Atkins


On Jul 21, 2006, at 12:17 PM, Kevin Keith wrote:

I have a case where I am partitioning tables based on a date range  
in version 8.1.4. For example:


table_with_millions_of_records
interaction_id  char(16) primary key
start_date   timestamp (without timezone) - indexed
.. other columns

child_1   start_date >= 2006-07-21 00:00:00
child_2   start_date >= 2006-07-20 00:00:00 and start_date < 2006-07-21 00:00:00

...
child_5   start_date >= 2006-07-17 00:00:00 and start_date < 2006-07-18 00:00:00


with rules on the parent and child tables that redirect the data to  
the appropriate child table based on the start_date.


Because this table is going to grow very large (very quickly), and  
will need to be purged daily, I created partitions, or child tables  
to hold data for each day. I have done the same thing in Oracle in  
the past, and the PostgreSQL solution works great. The archival  
process is very simple - drop the expired child table. I am having  
one problem.


If I run a query on the full table (there are 5 child tables with  
data for the last 5 days), and my where clause contains data for  
the current day only:

where start_date >= date_trunc('day', now())
all 5 child tables are scanned when I look at the output from  
explain analyze.


My question is - can I force the planner to only scan the relevant  
child table - when the key related to the partitioned data is part  
of the where clause?


Yes. You'll need non-overlapping check constraints in each child  
table and to set constraint_exclusion to on in postgresql.conf.


See http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html  
for the gory details.
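
A bare-bones sketch of one daily child, using the names from the example above (columns and dates abbreviated; adjust to the real schema):

CREATE TABLE child_2 (
    CHECK (start_date >= '2006-07-20 00:00:00'
       AND start_date <  '2006-07-21 00:00:00')
) INHERITS (table_with_millions_of_records);

SET constraint_exclusion = on;   -- off by default in 8.1

EXPLAIN SELECT count(*)
FROM table_with_millions_of_records
WHERE start_date >= '2006-07-21 00:00:00';
-- With the constraints in place, children whose CHECKs can't overlap the
-- WHERE clause should be excluded from the plan.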


Cheers,
  Steve



Re: [PERFORM] Partitioned tables in queries

2006-07-21 Thread Kevin Keith

My post might have been a little premature - and I apologize for that.

I have figured out what was causing the problem:
1. Constraint exclusion was disabled. I re-enabled it.
2. I found that using the now() function and an arbitrary interval will
produce a different execution plan than using a specific date. For example:

   assuming the current time is 16:00:
   a) where start_date > now() - interval '4 hours' scans all child tables.
   b) where start_date > '2006-07-21 12:00:00' only scans the child 
table with today's data.


So am I to assume that the value in the query must be a constant, and 
cannot be a result of a built-in function in order for 
constraint_exclusion to work correctly?
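
For concreteness, the two forms I compared look like this (count(*) stands in for the real query; times as in the 16:00 example above):

-- (a) scans all five children: now() is not folded to a constant at plan time
EXPLAIN SELECT count(*)
FROM table_with_millions_of_records
WHERE start_date > now() - interval '4 hours';

-- (b) scans only today's child: the planner sees a literal it can compare
-- against the CHECK constraints
EXPLAIN SELECT count(*)
FROM table_with_millions_of_records
WHERE start_date > '2006-07-21 12:00:00';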


Thanks,

Kevin


Kevin Keith wrote:
I have a case where I am partitioning tables based on a date range in 
version 8.1.4. For example:


table_with_millions_of_records
interaction_id  char(16) primary key
start_date   timestamp (without timezone) - indexed
.. other columns

child_1   start_date >= 2006-07-21 00:00:00
child_2   start_date >= 2006-07-20 00:00:00 and start_date < 2006-07-21 00:00:00

...
child_5   start_date >= 2006-07-17 00:00:00 and start_date < 2006-07-18 00:00:00


with rules on the parent and child tables that redirect the data to 
the appropriate child table based on the start_date.


Because this table is going to grow very large (very quickly), and 
will need to be purged daily, I created partitions, or child tables to 
hold data for each day. I have done the same thing in Oracle in the 
past, and the PostgreSQL solution works great. The archival process is 
very simple - drop the expired child table. I am having one problem.


If I run a query on the full table (there are 5 child tables with data 
for the last 5 days), and my where clause contains data for the 
current day only:

where start_date >= date_trunc('day', now())
all 5 child tables are scanned when I look at the output from explain 
analyze.


My question is - can I force the planner to only scan the relevant 
child table - when the key related to the partitioned data is part of 
the where clause?


Thanks,

Kevin


...






Re: [PERFORM] Bad Planner Statistics for Uneven distribution.

2006-07-21 Thread Tom Lane
Kevin McArthur [EMAIL PROTECTED] writes:
 ->  Seq Scan on models_brands  (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
 ...
 ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)

 Picks the wrong plan here. Should pick the index with seqscanning enabled.

It's really not possible for a full-table indexscan to be faster than a
seqscan, and not very credible for it even to be approximately as fast.
I suspect your second query here is the beneficiary of the first query
having fetched all the pages into cache.  In general, if you want to
optimize for a mostly-cached database, you need to reduce
random_page_cost below its default value ...
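
As a sketch of that last suggestion (the value is only an example; the default is 4):

SET random_page_cost = 2;    -- session-level; reflects cheaper random reads when data is cached
EXPLAIN ANALYZE SELECT DISTINCT brand FROM models_brands;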

regards, tom lane



Re: [PERFORM] Bad Planner Statistics for Uneven distribution.

2006-07-21 Thread Guillaume Smet

Tom,

On 7/21/06, Tom Lane [EMAIL PROTECTED] wrote:

It's really not possible for a full-table indexscan to be faster than a
seqscan, and not very credible for it even to be approximately as fast.
I suspect your second query here is the beneficiary of the first query
having fetched all the pages into cache.  In general, if you want to
optimize for a mostly-cached database, you need to reduce
random_page_cost below its default value ...


We discussed this case on IRC and the problem was not the first set of
queries but the second one:
select brand_id from brands where exists (select 1 from models_brands
where brand = brands.brand_id);

Isn't there any way to make PostgreSQL have a better estimation here:
->  Index Scan using models_brands_brand on models_brands
(cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008
rows=0 loops=303)
      Index Cond: (brand = $0)

I suppose the planner chooses the seqscan instead of the index scan
because it estimates 92372 result rows.
Raising the statistics target (ALTER TABLE ... SET STATISTICS) didn't
change anything.
IIRC, there were already a few threads about the same sort of
estimation problem, and there wasn't any solution to it. Do you have
any hints/ideas?

--
Guillaume



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:

 Also, it'd be interesting to count time spent holding shared lock
 separately from time spent holding exclusive.

Tom,

Here is the breakdown between exclusive and shared LWLocks. Do the 
numbers look reasonable to you?


Regards,
-Robert

bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres`
** LWLock Count: Exclusive **
Lock Id                    Mode          Count
ControlFileLock   Exclusive   1
  FreeSpaceLock   Exclusive   9
 XidGenLock   Exclusive 202
CLogControlLock   Exclusive 203
   WALWriteLock   Exclusive 203
   BgWriterCommLock   Exclusive 222
BufFreelistLock   Exclusive 305
 BufMappingLock   Exclusive 305
  ProcArrayLock   Exclusive 405
   FirstLockMgrLock   Exclusive 670
  WALInsertLock   Exclusive1616

** LWLock Count: Shared **
Lock Id                    Mode          Count
CheckpointStartLock  Shared 202
CLogControlLock  Shared 450
SubtransControlLock  Shared 776
 XidGenLock  Shared2020
  ProcArrayLock  Shared3778
 SInvalLock  Shared4040
 BufMappingLock  Shared   40838

** LWLock Time: Exclusive **
Lock Id   Combined Time (ns)
ControlFileLock 8301
  FreeSpaceLock80590
CLogControlLock  1603557
   BgWriterCommLock  1607122
BufFreelistLock  1997406
 XidGenLock  2312442
 BufMappingLock  3161683
   FirstLockMgrLock  5392575
  ProcArrayLock  6034396
  WALInsertLock 12277693
   WALWriteLock324869744

** LWLock Time: Shared **
Lock Id   Combined Time (ns)
CLogControlLock  3183788
SubtransControlLock  6956229
 XidGenLock 12012576
 SInvalLock 35567976
  ProcArrayLock 45400779
 BufMappingLock300669441
CheckpointStartLock   4056134243


real0m24.718s
user0m0.382s
sys 0m0.181s


