[PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
Here's the query:

select photos.*
from photos
inner join event_participations on
  event_participations.user_id = photos.creator_id and
  event_participations.attend = true
inner join event_instances on
  event_instances.id = event_participations.event_instance_id
where (
  (event_instances.venue_id = 1290) and
  (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
  (photos.taken_at < (event_instances.time + interval '25200 seconds'))
)
order by taken_at desc
limit 20

It occasionally takes four minutes to run:


                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10997.65 rows=20 width=116) (actual time=262614.474..262614.474 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
         Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
         ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
               ->  Index Scan Backward using photos_taken_at on photos  (cost=0.00..40924.34 rows=544171 width=116) (actual time=14.997..1357.724 rows=544171 loops=1)
               ->  Index Scan using event_participations_user_id_index on event_participations  (cost=0.00..2.95 rows=60 width=8) (actual time=0.007..0.159 rows=26 loops=544171)
                     Index Cond: (event_participations.user_id = photos.creator_id)
                     Filter: event_participations.attend
         ->  Index Scan using event_instances_pkey on event_instances  (cost=0.00..0.29 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=14013998)
               Index Cond: (event_instances.id = event_participations.event_instance_id)
               Filter: (event_instances.venue_id = 1290)
 Total runtime: 262614.585 ms

With enable_nestloop set to false, it takes about 1 second to run.

Database is freshly analyzed and vacuumed. Default statistics target
is 100. I have tried increasing the stats on
event_participations.user_id, event_participations.event_instance_id
and photos.taken_at to 1000, but no improvement.
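(For reference, the statistics bump and the nested-loop comparison were done
roughly like this; a sketch only, with SET LOCAL keeping the diagnostic change
scoped to a single transaction:

alter table event_participations alter column user_id set statistics 1000;
alter table event_participations alter column event_instance_id set statistics 1000;
alter table photos alter column taken_at set statistics 1000;
analyze event_participations;
analyze photos;

begin;
set local enable_nestloop = off;   -- diagnostic only, not a production setting
explain analyze
select photos.* from photos ...;   -- the query above
rollback;
)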

This is PostgreSQL 8.3.3.

A.



[PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Hi everybody, I'm creating my database on Postgres and after some days
of hard work I've arrived at good performance and awful performance
with the same configuration.
I have a complex query that performs very well with mergejoin on and
nestloop off.
If I activate nestloop, Postgres tries to use it and the query execution
becomes inconclusive: after 3 hours there is still no answer, so I kill the query.
That's OK but, with this configuration, very simple little queries like
select column from table where primarykey=value become incredibly slow.
The only solution I have found at the moment is to set mergejoin to off
before running this kind of query.
That is an awful solution, because it means I have to change the
software (a big, old application) in the (many) places in which
this kind of query is used (and it is the same problem to set mergejoin
off for the whole system and activate it only on the connections that have
to run the hard query).
Do you have any suggestion to speed up both the complex and the simple
queries?
I've tried a lot of combinations of the planner method configuration
settings, but the only one that really accelerates the hard query is
mergejoin on and nestloop off; other settings seem to be useless.
Thanks in advance.




Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Dave Dutcher
 -----Original Message-----
 From: Alexander Staubo
 
 ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
       Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
       ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)


Do you have any of the other enable_* options set to false?  What do you
have random_page_cost set to?  I ask because I'm surprised to see postgres
choose to loop when it knows it will have to loop 11 million times.
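(Both are easy to check and to flip per session for testing; illustrative only,
and 2.0 is just an example value, not a recommendation:

SHOW random_page_cost;
SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
SET random_page_cost = 2.0;
-- re-run EXPLAIN ANALYZE on the query above, then:
RESET random_page_cost;
)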

Dave





Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
On Tue, Jun 16, 2009 at 3:56 PM, Dave Dutcher <d...@tridecap.com> wrote:
 -----Original Message-----
 From: Alexander Staubo

    ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
          Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
          ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)


 Do you have any of the other enable_* options set to false?

No.

 What do you
 have random_page_cost set to?  I ask because I'm surprised to see postgres
 choose to loop when it knows it will have to loop 11 million times.

The default, ie. 4.0.

A.



Re: [PERFORM] performance with query

2009-06-16 Thread Joshua Tolley
On Tue, Jun 16, 2009 at 03:37:42PM +0200, Alberto Dalmaso wrote:
 Hi everybody, I'm creating my database on Postgres and after some days
 of hard work I've arrived at good performance and awful performance
 with the same configuration.
 I have a complex query that performs very well with mergejoin on and
 nestloop off.
 If I activate nestloop, Postgres tries to use it and the query execution
 becomes inconclusive: after 3 hours there is still no answer, so I kill the query.
 That's OK but, with this configuration, very simple little queries like
 select column from table where primarykey=value become incredibly slow.
 The only solution I have found at the moment is to set mergejoin to off
 before running this kind of query.
 That is an awful solution, because it means I have to change the
 software (a big, old application) in the (many) places in which
 this kind of query is used (and it is the same problem to set mergejoin
 off for the whole system and activate it only on the connections that have
 to run the hard query).
 Do you have any suggestion to speed up both the complex and the simple
 queries?
 I've tried a lot of combinations of the planner method configuration
 settings, but the only one that really accelerates the hard query is
 mergejoin on and nestloop off; other settings seem to be useless.
 Thanks in advance.

It would be helpful if you posted EXPLAIN ANALYZE results for both queries.
This will require you to run each query to completion; if that's not possible
for the 3 hour query, at least run EXPLAIN and post those results.

- Josh / eggyknap




Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
 I have a complex query that performs very well with mergejoin on and
 nestloop off.
 If I activate nestloop, Postgres tries to use it and the query
 execution becomes inconclusive: after 3 hours there is still no answer,
 so I kill the query.
 That's OK but, with this configuration, very simple little queries
 like select column from table where primarykey=value become
 incredibly slow.
 The only solution I have found at the moment is to set mergejoin to off
 before running this kind of query.
 
We'll need  a lot more information to be able to provide useful
advice.
 
What version of PostgreSQL?
 
What OS?
 
What does the hardware look like?  (CPUs, drives, memory, etc.)
 
Do you have autovacuum running?  What other regular maintenance to you
do?
 
What does your postgresql.conf file look like?  (If you can strip out
all comments and show the rest, that would be great.)
 
With that as background, if you can show us the schema for the
table(s) involved and the text of a query, along with the EXPLAIN
ANALYZE output (or just EXPLAIN, if the query runs too long to get the
EXPLAIN ANALYZE results) that would allow us to see where things are
going wrong.  Please show this information without setting any of the
optimizer options off; but then, as a diagnostic step, *also* show
EXPLAIN ANALYZE results when you set options to a configuration that
runs faster.
 
-Kevin



Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Tom Lane
Alexander Staubo a...@bengler.no writes:
 Here's the query:
 select photos.*
 from photos
 inner join event_participations on
   event_participations.user_id = photos.creator_id and
   event_participations.attend = true
 inner join event_instances on
   event_instances.id = event_participations.event_instance_id
 where (
   (event_instances.venue_id = 1290) and
   (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
   (photos.taken_at < (event_instances.time + interval '25200 seconds'))
 )
 order by taken_at desc
 limit 20

 It occasionally takes four minutes to run:

Actually the easiest way to fix that is to get rid of the LIMIT.
(Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
magnifies the risks from any estimation error, and you've got a lot
of that here ...
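For illustration, a cursor-based version of the query from the start of the
thread might look like this (the cursor name is just an example):

begin;
declare recent_photos cursor for
  select photos.*
  from photos
  inner join event_participations on
    event_participations.user_id = photos.creator_id and
    event_participations.attend = true
  inner join event_instances on
    event_instances.id = event_participations.event_instance_id
  where (
    (event_instances.venue_id = 1290) and
    (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
    (photos.taken_at < (event_instances.time + interval '25200 seconds'))
  )
  order by taken_at desc;          -- note: no LIMIT
fetch 20 from recent_photos;
close recent_photos;
commit;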

regards, tom lane



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
 What version of PostgreSQL?
8.3 that comes with opensuse 11.1
  
 What OS?
Linux, opensuse 11.1 64 bit
  
 What does the hardware look like?  (CPUs, drives, memory, etc.)
2 * opteron dual core 8 GB RAM, 70 GB SCSI U320 RAID 1
  
 Do you have autovacuum running?  What other regular maintenance to you
 do?
YES, autovacuum and analyze are running, the only other activity is the
wal backup
  
 What does your postgresql.conf file look like?  (If you can strip out
 all comments and show the rest, that would be great.)

I'll post only the values I've changed:

shared_buffers = 1536MB 
temp_buffers = 5MB  
max_prepared_transactions = 30  
  
work_mem = 50MB                 # I have a lot of work in ORDER BY
maintenance_work_mem =50MB  
max_stack_depth = 6MB   

max_fsm_pages = 16  
max_fsm_relations = 5000

wal_buffers = 3072kB  

enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = off
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = off
enable_seqscan = off
enable_sort = off
enable_tidscan = on


effective_cache_size = 3600MB

geqo = off
default_statistics_target = 100

  
 With that as background, if you can show us the schema for the
 table(s) involved and the text of a query, along with the EXPLAIN
 ANALYZE output (or just EXPLAIN, if the query runs too long to get the
 EXPLAIN ANALYZE results) that would allow us to wee where things are
 going wrong.  Please show this information without setting any of the
 optimizer options off; but then, as a diagnostic step, *also* show
 EXPLAIN ANALYZE results when you set options to a configuration that
 runs faster.
  
 -Kevin

The problem is that in the simple query it uses a mergejoin instead of a
nested loop (obvious, given the parameters I set), but in this situation it
becomes very, very slow (15 sec vs 5 ms when I set mergejoin to off).

That is the explain of the complex query, which runs with more than
acceptable performance:

Merge Right Join  (cost=508603077.17..508603195.59 rows=1 width=227)
  Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)
  ->  GroupAggregate  (cost=0.00..105.51 rows=1031 width=11)
        ->  Index Scan using pk_ve_edil_rendite on ve_edil_rendite  (cost=0.00..86.84 rows=1157 width=11)
  ->  Materialize  (cost=508603077.17..508603077.18 rows=1 width=195)
        ->  Nested Loop  (cost=506932259.90..508603077.17 rows=1 width=195)
              ->  Merge Join  (cost=406932259.90..408603074.89 rows=1 width=188)
                    Merge Cond: (domande.id_domanda = c_elaout_7.id_domanda)
                    ->  Merge Join  (cost=406932259.90..408188339.97 rows=1 width=240)
                          Merge Cond: (c_elaout_5.id_domanda = domande.id_domanda)
                          ->  Merge Join  (cost=3895.15..1259628.81 rows=138561 width=41)
                                Merge Cond: (edil_veneto.id_domanda = c_elaout_5.id_domanda)
                                ->  Merge Join  (cost=1123.18..372710.75 rows=98122 width=29)
                                      Merge Cond: (edil_veneto.id_domanda = c_elaout_6.id_domanda)
                                      ->  Index Scan using IDX_pk_Edil_Veneto on edil_veneto  (cost=0.00..11825.14 rows=232649 width=17)
                                      ->  Index Scan using IDX_3_c_elaout on c_elaout c_elaout_6  (cost=0.00..359914.34 rows=98122 width=12)
                                            Index Cond: ((c_elaout_6.node)::text = 'contributo_sociale'::text)
                                ->  Index Scan using IDX_3_c_elaout on c_elaout c_elaout_5  (cost=0.00..887091.20 rows=245306 width=12)
                                      Index Cond: ((c_elaout_5.node)::text = 'contributo'::text)
                          ->  Materialize  (cost=406928364.74..406928364.75 rows=1 width=199)
                                ->  Nested Loop  (cost=402583154.89..406928364.74 rows=1 width=199)
                                      Join Filter: ((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)
                                      ->  Merge Join  (cost=202583154.89..206928031.60 rows=1 width=198)
                                            Merge Cond: (domande.id_domanda = c_elaout_4.id_domanda)
                                            ->  Merge Join  (cost=202583154.89..206425374.54 rows=1 width=186)
                                                  Merge Cond: (domande.id_domanda = c_elain_3.id_domanda)
                                                  ->  Merge Join  (cost=201328203.80..205170407.27 rows=41 width=138)
                                                        Merge Cond: (domande.id_domanda = c_elain_7.id_domanda)
                                                        ->  Merge Join  (cost=201328203.80..204498966.35 rows=93 width=126)
                                                              Merge Cond: (domande.id_domanda =

Re: [PERFORM] performance with query

2009-06-16 Thread Matthew Wakeling

On Tue, 16 Jun 2009, Alberto Dalmaso wrote:

What does your postgresql.conf file look like?



enable_hashjoin = off
enable_nestloop = off
enable_seqscan = off
enable_sort = off


Why are these switched off?


And that is the explain of the too-slow simple query:

Merge Join  (cost=0.00..1032305.52 rows=4 width=12)
  Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)
  ->  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39 rows=1 width=4)
        Index Cond: (id_domanda = 4165757)
  ->  Index Scan using IDX_2_c_elaout on c_elaout  (cost=0.00..1030283.89 rows=805279 width=12)
        Filter: ((c_elaout.node)::text = 'Invalido'::text)

this costs 15 sec


With mergejoin set to off:

Nested Loop  (cost=1.00..10022.97 rows=4 width=12)
  ->  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39 rows=1 width=4)
        Index Cond: (id_domanda = 4165757)
  ->  Index Scan using IDX_2_c_elaout on c_elaout  (cost=0.00..14.54 rows=4 width=12)
        Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)
        Filter: ((c_elaout.node)::text = 'Invalido'::text)

this costs 15 msec!!!


Well duh. What you're effectively doing is telling Postgres to NEVER use a 
nested loop. Then you're getting upset because it isn't using a nested 
loop. When you tell it to NEVER use anything (switching all join 
algorithms off), it ignores you and chooses the right plan anyway.


Matthew

--
You can configure Windows, but don't ask me how.   -- Bill Gates



Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
  What version of PostgreSQL?
 8.3 that comes with opensuse 11.1
 
Could you show us the result of SELECT version(); ?
 
 max_prepared_transactions = 30
 
Unless you're using distributed transactions or need a lot of locks,
that's just going to waste some RAM.  Zero is fine for most people.
 
 maintenance_work_mem =50MB
 
That's a little small -- this only comes into play for maintenance
tasks like index builds.  Not directly part of your reported problem,
but maybe something to bump to the 1GB range.
 
 max_fsm_pages = 16
 max_fsm_relations = 5000
 
Have you done any VACUUM VERBOSE lately and captured the output?  If
so, what do the last few lines say?  (That's a lot of relations for
the number of pages; just curious how it maps to actual.)
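(Something like the following, run from psql; on 8.3 a database-wide run ends
with a summary along the lines of "free space map contains ... pages in ...
relations" plus a count of page slots used versus needed, which is the part of
interest here:

VACUUM VERBOSE;
)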
 
 enable_hashjoin = off
 enable_nestloop = off
 enable_seqscan = off
 enable_sort = off
 
That's probably a bad idea.  If particular queries aren't performing
well, you can always set these temporarily on a particular connection.
Even then, turning these off is rarely a good idea except for
diagnostic purposes.  I *strongly* recommend you put all of these back
to the defaults of 'on' and start from there, turning off selected
items as needed to get EXPLAIN ANALYZE output to demonstrate the
better plans you've found for particular queries.
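For example, something along these lines on just the connection that runs the
problem query, as a diagnostic (illustrative only):

BEGIN;
SET LOCAL enable_nestloop = off;   -- or whichever setting the diagnosis points to
-- run the problem query / EXPLAIN ANALYZE here
COMMIT;
-- every other connection keeps the default planner settings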
 
 effective_cache_size = 3600MB
 
That seems a little on the low side for an 8GB machine, unless you
have other things on there using a lot of RAM.  Do you?
 
If you could set the optimizer options back on and get new plans where
you show specifically which options (if any) where turned off for the
run, that would be good.  Also, please attach the plans to the email
instead of pasting -- the word wrap makes them hard to read.  Finally,
if you could do \d on the tables involved in the query, it would help.
I'll hold off looking at these in hopes that you can do the above.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
On Tuesday 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
 On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
  What does your postgresql.conf file look like?
 
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
 
 Why are these switched off?
 
because of the need to pump up the performance of the complex query. If
I set them to on, then it tries to use a nested loop even in the complex
query, and that query never arrives at a response and, of course,
I need a response from it!!!
So my problem is to find a configuration that keeps performance good for
both kinds of query, but I'm not able to find one.
Changing the RAM-related parameters can save about 10% of the time on the
complex query, while I see no changes on the simple one...




Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Alexander Staubo
On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
 Actually the easiest way to fix that is to get rid of the LIMIT.
 (Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
 magnifies the risks from any estimation error, and you've got a lot
 of that here ...

There's no cursor support in ActiveRecord, the ORM library we use, and
I'm not going to write it. Anyway, I would prefer not to gloss over
the underlying problem with something that requires a TODO next to
it. What can be done to fix the underlying problem? Nothing?

A.



Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 On Tuesday 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
 On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
 
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
 
 Why are these switched off?
 
 because of the need to pump up the performance of the complex query.
 
These really are meant primarily for diagnostic purposes.  As a last
resort, you could set them off right before running a problem query,
and set them back on again afterward; but you will be much better off
if you can cure the underlying problem.  The best chance of that is to
show us the plan you get with all turned on.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
 Could you show us the result of SELECT version(); ?
of course I can 
PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE
Linux) 4.3.2 [gcc-4_3-branch revision 141291]
  
 Have you done any VACUUM VERBOSE lately and captured the output?  If
 so, what do the last few lines say?  (That's a lot of relations for
 the number of pages; just curious how it maps to actual.)
It needs a lot of time (20 GB database); when I have the answer
I'll post it.
  
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
  
 That's probably a bad idea.  If particular queries aren't performing
 well, you can always set these temporarily on a particular connection.
 Even then, turning these off is rarely a good idea except for
 diagnostic purposes.  I *strongly* recommend you put all of these back
 to the defaults of 'on' and start from there, turning off selected
 items as needed to get EXPLAIN ANALYZE output to demonstrate the
 better plans you've found for particular queries.

OK, that would be the reverse of what I'm doing now (set them to on
globally and to off only on the appropriate connections, instead of setting
them to off globally and to on only on some appropriate connections).
But the question is: do you think it is impossible to find a
configuration that works fine for both kinds of query? The
application has to run against an Oracle db as well... I don't want to have
to write different source code for the two databases...

  
  effective_cache_size = 3600MB
  
 That seems a little on the low side for an 8GB machine, unless you
 have other things on there using a lot of RAM.  Do you?
yes, there are two instances of Postgres running on the same server (the
databases have to stay completely separated).
  
 If you could set the optimizer options back on and get new plans where
 you show specifically which options (if any) where turned off for the
 run, that would be good.  Also, please attach the plans to the email
 instead of pasting -- the word wrap makes them hard to read.  Finally,
 if you could do \d on the tables involved in the query, it would help.
 I'll hold off looking at these in hopes that you can do the above.
  
 -Kevin
I attach the explain of the long query after setting all the enable_*
options to on. In this condition the query will never finish...
QUERY PLAN
Nested Loop Left Join  (cost=283253.73..417552.90 rows=1 width=227)
  Join Filter: (ve_edil_rendite.id_domanda = domande.id_domanda)
  ->  Nested Loop  (cost=283222.38..417485.46 rows=1 width=195)
        ->  Nested Loop  (cost=283222.38..417470.91 rows=1 width=247)
              ->  Nested Loop  (cost=283222.38..417456.36 rows=1 width=235)
                    Join Filter: (edil_veneto.id_tp_superficie = ve_edil_tp_superfici.id_tp_superficie)
                    ->  Nested Loop  (cost=283222.38..417455.29 rows=1 width=228)
                          ->  Nested Loop  (cost=283222.38..417440.75 rows=1 width=216)
                                Join Filter: ((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)
                                ->  Nested Loop  (cost=283222.38..417107.61 rows=1 width=215)
                                      ->  Nested Loop  (cost=283222.38..417093.06 rows=1 width=203)
                                            ->  Nested Loop  (cost=283222.38..417084.75 rows=1 width=186)
                                                  ->  Nested Loop  (cost=283222.38..417070.20 rows=1 width=174)
                                                        ->  Nested Loop  (cost=283222.38..417057.25 rows=1 width=162)
                                                              ->  Nested Loop  (cost=283222.38..417044.33 rows=1 width=150)
                                                                    ->  Nested Loop  (cost=283222.38..417031.40 rows=1 width=138)
                                                                          ->  Nested Loop  (cost=283222.38..417016.85 rows=1 width=126)
                                                                                ->  Nested Loop  (cost=283222.38..416343.12 rows=44 width=114)
                                                                                      ->  Merge Join  (cost=283222.38..384803.94 rows=2431 width=102)
                                                                                            Merge Cond: (componenti.id_dichiarazione = domande.id_dichiarazione)
                                                                                            ->  GroupAggregate  (cost=0.00..94032.39 rows=601009 width=12)
                                                                                                  ->  Index Scan using IDX_1_componenti on componenti  (cost=0.00..76403.45 rows=2023265 width=12)
                                                                                            ->  Sort  (cost=283222.38..283223.41 rows=412 width=102)
 

Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote:
 
 do you think it is impossible to find a
 configuration that works fine for both kinds of query?
 
No.  We probably just need a little more information.
 
 The application have to run even versus oracle db... i wont have to
 write a different source for the two database...
 
I understand completely.
 
 I attach the explain of the long query after setting all the
 enable_* options to on. In this condition the query will never finish...
 
We're getting close.  Can you share the table structure and the actual
query you are running?  It's a lot easier (for me, anyway) to put this
puzzle together with all the pieces in hand.
 
Also, if you can set off some of the optimizer options and get a fast
plan, please show us an EXPLAIN ANALYZE for that, with information on
which settings were turned off.  That will help show where bad
estimates may be causing a problem, or possibly give a hint of table
or index bloat problems.
 
I think we're getting close to critical mass for seeing the
solution
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
On Tuesday 16/06/2009 at 11.31 -0400, Tom Lane wrote:
 Alberto Dalmaso dalm...@clesius.it writes:
  On Tuesday 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
  
  Why are these switched off?
  
  because of the need to pump up the performance of the complex query.
 
 That is *not* the way to improve performance of a query.  Turning off
 specific enable_ parameters can be helpful while investigating planner
 behavior, but it is never recommended as a production solution.  You
 have already found out why.
 
   regards, tom lane
OK, but the problem is that my very long query performs quite well when
it uses a merge join, but it cannot come to an end if it uses any other
kind of join.
If I put all the parameters to on, as both of you tell me, in the
explain I'll see that the db uses a nested loop.
If I set nested loops to off, it will use a hash join.
How can I write the query so that the planner will use a mergejoin (which
is the only option that lets the query give me the awaited answer)
without changing the settings every time on the connection?




Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
 I attach the explain of the long query after setting all the
 enable_* options to on. In this condition the query will never finish...
 
I notice that you have many joins in there.  If the query can't be
simplified, you probably need to boost the join_collapse_limit and
from_collapse_limit quite a bit.  If planning time goes through the
roof in that case, you may need to enable geqo -- this is what it's
intended to help.  If you try geqo, you may need to tune it; I'm not
familiar with the knobs for tuning that, so maybe someone else will
jump in if you get to that point.
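For instance, something like this on the session that runs the big query (the
values are illustrative; the collapse limits just need to exceed the number of
tables being joined):

SET from_collapse_limit = 20;
SET join_collapse_limit = 20;
-- and, only if planning time then becomes the problem:
SET geqo = on;
SET geqo_threshold = 12;
SET geqo_effort = 5;               -- 1 to 10, default 5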
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Tom Lane
Alberto Dalmaso dalm...@clesius.it writes:
 OK, but the problem is that my very long query performs quite well when
 it uses a merge join, but it cannot come to an end if it uses any other
 kind of join.
 If I put all the parameters to on, as both of you tell me, in the
 explain I'll see that the db uses a nested loop.
 If I set nested loops to off, it will use a hash join.
 How can I write the query so that the planner will use a mergejoin (which
 is the only option that lets the query give me the awaited answer)
 without changing the settings every time on the connection?

You have the wrong mindset completely.  Instead of thinking how can I
force the planner to do it my way, you need to be thinking why is the
planner guessing wrong about which is the best way to do it?  And how
can I improve its guess?

There's not really enough information in what you've posted so far to
let people help you with that question, but one thing that strikes me
from the EXPLAIN is that you have a remarkably large number of joins.
Perhaps increasing from_collapse_limit and/or join_collapse_limit
(to more than the number of tables in the query) would help.

regards, tom lane



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Unfortunately the query needs that level of complexity, as the information I
have to show is spread across different tables.
I tried geqo on at the beginning, but only with the default
parameters.
Tomorrow (my working day here in Italy finished some minutes ago, so
I will wait for the end of the explain analyze and then go home ;-P )
I'll try to increase, as you suggest, join_collapse_limit and
from_collapse_limit.
If someone can give me some information on how to configure geqo, I'll
try it again.
In the meantime, tonight I'll leave the vacuum verbose running for me.




Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Even though the query ends in approximately 200 sec, the explain analyze is
still working and more than 1000 sec have gone by...
I'll leave it running tonight.
Have a nice evening, and thanks for the help.




[PERFORM] Speeding up a query.

2009-06-16 Thread Hartman, Matthew
Good afternoon.

I have developed an application to efficiently schedule chemotherapy
patients at our hospital. The application takes into account several
resource constraints (available chairs, available nurses, nurse coverage
assignment to chairs) as well as the chair time and nursing time
required for a regimen.

The algorithm for packing appointments respects each constraint and
typically schedules a day of treatments (30-60) within 9-10 seconds on
my workstation, down from 27 seconds initially. I would like to get it
below 5 seconds if possible.

I think what's slowing it down is simply the number of rows and joins.
The algorithm creates a scheduling matrix with one row per 5 minute
timeslot, per unit, per nurse assigned to the unit. That translates to
3,280 rows for the days I have designed in development (each day can
change). 

To determine the available slots, the algorithm finds the earliest slot
that has an available chair and a count of the required concurrent
intervals afterwards. So a 60 minute regimen requires 12 concurrent
rows. This is accomplished by joining the table on itself. A second
query is ran for the same range, but with respect to the nurse time and
an available nurse. Finally, those two are joined against each other.
Effectively, it is:

Select *
From   (
Select *
From matrix m1, matrix m2
Where m1.x = m2.x
) chair,
(
Select *
From matrix m1, matrix m2
Where m1.x = m2.x
) nurse
Where chair.id = nurse.id

With matrix having 3,280 rows. Ugh.
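Spelled out a little more concretely (table and column names here are only
illustrative, assuming one row per unit per 5-minute slot), the chair-side
lookup is roughly:

SELECT m1.unit_id, m1.slot_time AS start_time
FROM matrix m1
JOIN matrix m2
  ON m2.unit_id = m1.unit_id
 AND m2.slot_time >= m1.slot_time
 AND m2.slot_time <  m1.slot_time + interval '60 minutes'
WHERE m2.chair_available
GROUP BY m1.unit_id, m1.slot_time
HAVING count(*) = 12;              -- twelve 5-minute slots = one free hour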

I have tried various indexes and clustering approaches with little
success. Any ideas?

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294 




Re: [PERFORM] Speeding up a query.

2009-06-16 Thread Anthony Presley
On the DB side of things, you will want to make sure that you're caching
as much as possible - putting a front-end like memcached could help.  I
assume you have indexes on the appropriate tables?  What does the
EXPLAIN ANALYZE on that query look like?

Not necessarily a postgres solution, but I'd think this type of
problem would work really, really well inside of, say, a mixed integer
or integer programming solver ... something like glpk or cbc.  You'd need to
reformulate the problem, but we've built applications using these tools
which can crunch through multiple billions of combinations in under 1 or
2 seconds.

(Of course, you still need to store the results, and feed the input,
using a database of some kind).


--
Anthony Presley

On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote:
 Good afternoon.
 
 I have developed an application to efficiently schedule chemotherapy
 patients at our hospital. The application takes into account several
 resource constraints (available chairs, available nurses, nurse coverage
 assignment to chairs) as well as the chair time and nursing time
 required for a regimen.
 
 The algorithm for packing appointments respects each constraint and
 typically schedules a day of treatments (30-60) within 9-10 seconds on
 my workstation, down from 27 seconds initially. I would like to get it
 below 5 seconds if possible.
 
 I think what's slowing it down is simply the number of rows and joins.
 The algorithm creates a scheduling matrix with one row per 5 minute
 timeslot, per unit, per nurse assigned to the unit. That translates to
 3,280 rows for the days I have designed in development (each day can
 change). 
 
 To determine the available slots, the algorithm finds the earliest slot
 that has an available chair and a count of the required concurrent
 intervals afterwards. So a 60 minute regimen requires 12 concurrent
 rows. This is accomplished by joining the table on itself. A second
 query is run for the same range, but with respect to the nurse time and
 an available nurse. Finally, those two are joined against each other.
 Effectively, it is:
 
 Select *
 From   (
   Select *
   From matrix m1, matrix m2
   Where m1.x = m2.x
   ) chair,
   (
   Select *
   From matrix m1, matrix m2
   Where m1.x = m2.x
   ) nurse
 Where chair.id = nurse.id
 
 With matrix having 3,280 rows. Ugh.
 
 I have tried various indexes and clustering approaches with little
 success. Any ideas?
 
 Thanks,
 
 Matthew Hartman
 Programmer/Analyst
 Information Management, ICP
 Kingston General Hospital
 (613) 549- x4294 
 
 




[PERFORM] Performance discrepancy

2009-06-16 Thread Mark Steben





[PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):

musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);
                                    QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on pcap_store  (cost=4008.22..348521303.54 rows=106532 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=4008.22..6765.98 rows=205475 width=4)
           ->  Seq Scan on pcap_store_log  (cost=0.00..3099.75 rows=205475 width=4)
(5 rows)

musecurity=# \d muapp.pcap_store
                                 Table muapp.pcap_store
      Column       |          Type          |                        Modifiers
-------------------+------------------------+----------------------------------------------------------
 pcap_storeid      | integer                | not null default nextval('muapp.pcap_store_pcap_storeid_seq'::regclass)
 filename          | character varying(255) |
 test_run_dutid    | integer                | default 0
 userid            | integer                | not null default 0
 analysis_recordid | bigint                 |
 io_xml            | character varying(255) |
Indexes:
    pcap_store_pkey PRIMARY KEY, btree (pcap_storeid)
Foreign-key constraints:
    pcap_store_analysis_recordid_fkey FOREIGN KEY (analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE CASCADE
    pcap_store_test_run_dutid_fkey FOREIGN KEY (test_run_dutid) REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE
    pcap_store_userid_fkey FOREIGN KEY (userid) REFERENCES mucore.user(recordid) ON DELETE CASCADE

As you can see, the sequential scan on pcap_store is killing me, even though
there appears to be a perfectly good index.  Is there a better way to
construct this query?

Thanks,
Aaron

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Alvaro Herrera
Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):
 
 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

What PG version is this?


-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
 Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

Doh, just realized I didn't reply back to list.   It's version 8.3.3.

Also, pcap_storeid is unique in pcap_store_log


-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner <synfina...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
 Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

Speaking as one who has dealt with this frustration more than once,
you can typically get better performance with something like:

DELETE FROM muapp.pcap_store AS x
FROM muapp.pcap_store a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

This is emphatically lame, but there you have it.  It's first of all
lame that we can't do a better job optimizing NOT-IN, at least when
the expression within the subselect is known to be not-null, and it's
secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
without a self-JOIN.

/rant
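For completeness, the other rewrite people usually reach for is NOT EXISTS (a
sketch; as far as I know it only becomes a true anti-join in 8.4, while on 8.3
it is still planned as a correlated subplan, so the LEFT JOIN form above is
probably the better bet there):

DELETE FROM muapp.pcap_store s
WHERE NOT EXISTS (
    SELECT 1
    FROM muapp.pcap_store_log l
    WHERE l.pcap_storeid = s.pcap_storeid
);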

...Robert



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas <robertmh...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner <synfina...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
 Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

 Speaking as one who has dealt with this frustration more than once,
 you can typically get better performance with something like:

 DELETE FROM muapp.pcap_store AS x
 FROM muapp.pcap_store a
 LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
 WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

That's a syntax error on 8.3.3... I don't see anywhere in the docs
where the delete command allows for multiple FROM statements.  Perhaps
you meant:

 DELETE FROM muapp.pcap_store AS x
USING muapp.pcap_store AS a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
b.pcap_storeid IS NULL;

Is that right?

 This is emphatically lame, but there you have it.  It's first of all
 lame that we can't do a better job optimizing NOT-IN, at least when
 the expression within the subselect is known to be not-null, and it's
 secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
 without a self-JOIN.

Wow, glad I asked... I never would have figured that out.

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner <synfina...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas <robertmh...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner <synfina...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
 Aaron Turner wrote:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

 Speaking as one who has dealt with this frustration more than once,
 you can typically get better performance with something like:

 DELETE FROM muapp.pcap_store AS x
 FROM muapp.pcap_store a
 LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
 WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

 That's a syntax error on 8.3.3... I don't see anywhere in the docs
 where the delete command allows for multiple FROM statements.  Perhaps
 you meant:

  DELETE FROM muapp.pcap_store AS x
        USING muapp.pcap_store AS a
        LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
 b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
 b.pcap_storeid IS NULL;

 Is that right?

Woops, yes, I think that's it.

(but I don't guarantee that it won't blow up your entire universe, so
test it carefully first)

...Robert



Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas <robertmh...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner <synfina...@gmail.com> wrote:

  DELETE FROM muapp.pcap_store AS x
        USING muapp.pcap_store AS a
        LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
 b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
 b.pcap_storeid IS NULL;

 Is that right?

 Woops, yes, I think that's it.

 (but I don't guarantee that it won't blow up your entire universe, so
 test it carefully first)

Yeah, doing that now... taking a bit longer than I expected (took
~5min on rather slow hardware - everything is on a pair of 10K RAID1
drives), but the result seems correct.

                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=19229.08..29478.99 rows=106492 width=6)
   Hash Cond: (x.pcap_storeid = a.pcap_storeid)
   ->  Seq Scan on pcap_store x  (cost=0.00..5617.84 rows=212984 width=10)
   ->  Hash  (cost=17533.93..17533.93 rows=106492 width=4)
         ->  Hash Left Join  (cost=6371.19..17533.93 rows=106492 width=4)
               Hash Cond: (a.pcap_storeid = b.pcap_storeid)
               Filter: (b.pcap_storeid IS NULL)
               ->  Seq Scan on pcap_store a  (cost=0.00..5617.84 rows=212984 width=4)
               ->  Hash  (cost=3099.75..3099.75 rows=205475 width=4)
                     ->  Seq Scan on pcap_store_log b  (cost=0.00..3099.75 rows=205475 width=4)

I know the costs are just relative, but I assumed
cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
hardware.  Honestly, I'm not complaining; 5 minutes is acceptable for this
query (it's a one-time thing), just surprised is all.

Thanks for the help!

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin




[PERFORM] Index Scan taking long time

2009-06-16 Thread Bryce Ewing

Hi,

I have been trying to fix a performance issue that we have which I have 
tracked down to index scans being done on a particular table (or set of 
tables):


The following query:
explain analyze select *
FROM inbound.event_20090526 e
LEFT OUTER JOIN inbound.internal_host i ON (e.mta_host_id = i.id)
LEFT OUTER JOIN inbound.internal_host iaa ON (e.aamta_host_id = iaa.id)
LEFT OUTER JOIN inbound.event_status es ON (e.event_status_id = es.id)
LEFT OUTER JOIN inbound.threat t ON (e.threat_id = t.id), inbound.domain 
d, inbound.event_type et

WHERE e.domain_id = d.id
AND e.event_type_id = et.id
AND d.name IN (
   'testdomain.com'
);


Does this:

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..10887.03 rows=8 width=2021) (actual time=50.352..14378.603 rows=3453 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10882.23 rows=8 width=1985) (actual time=50.346..14372.820 rows=3453 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..10877.43 rows=8 width=1949) (actual time=50.336..14358.101 rows=3453 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..10872.63 rows=8 width=1801) (actual time=50.321..14344.603 rows=3453 loops=1)
                     ->  Nested Loop  (cost=0.00..10867.83 rows=8 width=1764) (actual time=50.315..14336.979 rows=3453 loops=1)
                           ->  Nested Loop  (cost=0.00..10863.03 rows=8 width=1728) (actual time=50.288..14308.368 rows=3453 loops=1)
                                 ->  Index Scan using domain_name_idx on domain d  (cost=0.00..6.63 rows=1 width=452) (actual time=0.049..0.080 rows=1 loops=1)
                                       Index Cond: ((name)::text = 'testdomain.com'::text)
                                 ->  Index Scan using event_20090526_domain_idx on event_20090526 e  (cost=0.00..10694.13 rows=3606 width=1276) (actual time=50.233..14305.211 rows=3453 loops=1)
                                       Index Cond: (e.domain_id = d.id)
                           ->  Index Scan using event_type_pkey on event_type et  (cost=0.00..0.56 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=3453)
                                 Index Cond: (et.id = e.event_type_id)
                     ->  Index Scan using threat_pkey on threat t  (cost=0.00..0.56 rows=1 width=37) (actual time=0.000..0.000 rows=0 loops=3453)
                           Index Cond: (e.threat_id = t.id)
               ->  Index Scan using event_status_pkey on event_status es  (cost=0.00..0.56 rows=1 width=148) (actual time=0.002..0.002 rows=1 loops=3453)
                     Index Cond: (e.event_status_id = es.id)
         ->  Index Scan using internal_host_pkey on internal_host iaa  (cost=0.00..0.56 rows=1 width=36) (actual time=0.002..0.003 rows=1 loops=3453)
               Index Cond: (e.aamta_host_id = iaa.id)
   ->  Index Scan using internal_host_pkey on internal_host i  (cost=0.00..0.56 rows=1 width=36) (actual time=0.000..0.000 rows=0 loops=3453)
         Index Cond: (e.mta_host_id = i.id)
 Total runtime: 14380.000 ms

If the same query is done straight away again we get:

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..10887.03 rows=8 width=2021) (actual time=0.165..67.388 rows=3453 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10882.23 rows=8 width=1985) (actual time=0.162..61.973 rows=3453 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..10877.43 rows=8 width=1949) (actual time=0.156..49.756 rows=3453 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..10872.63 rows=8 width=1801) (actual time=0.148..37.522 rows=3453 loops=1)
                     ->  Nested Loop  (cost=0.00..10867.83 rows=8 width=1764) (actual time=0.146..31.920 rows=3453 loops=1)
                           ->  Nested Loop  (cost=0.00..10863.03 rows=8 width=1728) (actual time=0.129..10.325 rows=3453 loops=1)
                                 ->  Index Scan using domain_name_idx on domain d  (cost=0.00..6.63 rows=1 width=452) (actual time=0.099..0.139 rows=1 loops=1)
                                       Index Cond: ((name)::text = 'rhe.com.au'::text)
                                 ->  Index Scan using event_20090526_domain_idx on event_20090526 e  (cost=0.00..10694.13 rows=3606 width=1276) (actual time=0.027..7.510 rows=3453 loops=1)

Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Tom Lane
Aaron Turner synfina...@gmail.com writes:
 I know the costs are just relative, but I assumed
 cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
 hardware.

Very likely the bulk of the time is spent in the DELETE work proper,
not in the query to find the rows to be deleted.  In particular I wonder
if you have an unindexed foreign key referencing this table ...
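If that turns out to be the case, the usual fix is simply to index the
referencing column; the table and index names below are hypothetical, since the
real referencing table has to be found in the schema first:

-- hypothetical child table; without this index, each deleted pcap_store row
-- forces a scan of the referencing table to satisfy the ON DELETE CASCADE check
CREATE INDEX some_child_pcap_storeid_idx
    ON muapp.some_child_table (pcap_storeid);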

regards, tom lane



Re: [PERFORM] Yet another slow nested loop

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 11:16 AM, Alexander Staubo <a...@bengler.no> wrote:
 On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
 Actually the easiest way to fix that is to get rid of the LIMIT.
 (Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
 magnifies the risks from any estimation error, and you've got a lot
 of that here ...

 There's no cursor support in ActiveRecord, the ORM library we use, and
 I'm not going to write it. Anyway, I would prefer not to gloss over
 the underlying problem with something that requires a TODO next to
 it. What can be done to fix the underlying problem? Nothing?

Basically, we need a system that can accurately estimate multi-column
selectivity, or else some kind of planner hints.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00119.php

(with apologies for linking to my own posts, but you can go back and
read the whole thread if you're interested)

...Robert



Re: [PERFORM] Index Scan taking long time

2009-06-16 Thread Scott Marlowe
On Tue, Jun 16, 2009 at 9:30 PM, Bryce Ewing <br...@smx.co.nz> wrote:
 Hi,

 I have been trying to fix a performance issue that we have which I have
 tracked down to index scans being done on a particular table (or set of
 tables):

 The following query:
 explain analyze select *
 FROM inbound.event_20090526 e
 LEFT OUTER JOIN inbound.internal_host i ON (e.mta_host_id = i.id)
 LEFT OUTER JOIN inbound.internal_host iaa ON (e.aamta_host_id = iaa.id)
 LEFT OUTER JOIN inbound.event_status es ON (e.event_status_id = es.id)
 LEFT OUTER JOIN inbound.threat t ON (e.threat_id = t.id), inbound.domain d,
 inbound.event_type et
 WHERE e.domain_id = d.id
 AND e.event_type_id = et.id
 AND d.name IN (
   'testdomain.com'
 );

Without looking at the explain just yet, it seems to me that you are
constraining the order of joins to insist that the left joins be done
first, then the regular joins second, because of your mix of explicit
and implicit join syntax.  The query planner is constrained to run
explicit joins first, then implicit if I remember correctly.  So,
making it all explicit might help.  Might not.  But it's a thought



[PERFORM] 8.4 COPY performance regression on Solaris

2009-06-16 Thread Alan Li
Hi,

It seems that a COPY of 8M rows to a table on 8.4rc1 takes 30% longer than
it does on 8.3.7 on Solaris.

Here are the steps I've taken to reproduce this problem on two different
solaris boxes (Solaris 10 11/06 s10x_u3wos_10 X86 and Solaris 10 8/07
s10x_u4wos_12b X86).  I've tried this on a Linux box, and I do not see the
problem there.

1. Run the following in psql client to generate a 8M row data file.

copy (select generate_series(1,800), ('1 second'::interval *
generate_series(1,800) + '2007-01-01'::timestamp)) to
'/export/home/alan/work/pgsql/dump.out' with csv;

2. Build 8.3.7 and 8.4rc1 with the following config.

./configure --prefix=`pwd`/../pgsql CC=/opt/SUNWspro/bin/cc CFLAGS=-xO3
-xarch=native \
-xspace -W0,-Lt -W2,-Rcond_elim -Xa  -xildoff -xc99=none -xCC
--without-readline --with-includes=/opt/csw/include
--with-libraries=/opt/csw/lib

3. Run the following on each.

pg_ctl stop -D data -m fast
rm -rf data
initdb -D data
cat postgresql.conf  data/postgresql.conf
pg_ctl start -l cq.log -D data -w
psql -f ddl.sql postgres
time psql -c "copy t from '/export/home/alan/work/pgsql/dump.out' with csv" postgres

Here are the numbers from several runs I've done.

8.3.7 - Solaris 10 11/06 s10x_u3wos_10 X86
real0m43.971s
user0m0.002s
sys 0m0.003s
real0m44.042s
user0m0.002s
sys 0m0.003s
real0m44.828s
user0m0.002s
sys 0m0.004s
real0m43.921s
user0m0.002s
sys 0m0.003s

8.4rc1 - Solaris 10 11/06 s10x_u3wos_10 X86
real1m0.041s
user0m0.002s
sys 0m0.003s
real1m0.258s
user0m0.002s
sys 0m0.004s
real1m0.173s
user0m0.002s
sys 0m0.003s
real1m0.402s
user0m0.002s
sys 0m0.003s
real1m0.767s
user0m0.002s
sys 0m0.003s

8.3.7 - Solaris 10 8/07 s10x_u4wos_12b X86
real0m36.242s
user0m0.002s
sys 0m0.004s
real0m37.206s
user0m0.002s
sys 0m0.004s
real0m38.431s
user0m0.002s
sys 0m0.004s
real0m38.885s
user0m0.002s
sys 0m0.004s
real0m38.177s
user0m0.002s
sys 0m0.004s
real0m38.332s
user0m0.002s
sys 0m0.004s
real0m38.401s
user0m0.002s
sys 0m0.004s
real0m36.817s
user0m0.002s
sys 0m0.004s
real0m39.505s
user0m0.002s
sys 0m0.004s
real0m38.871s
user0m0.002s
sys 0m0.004s
real0m38.939s
user0m0.002s
sys 0m0.004s
real0m38.823s
user0m0.002s
sys 0m0.004s
real0m37.955s
user0m0.002s
sys 0m0.004s
real0m39.196s
user0m0.002s
sys 0m0.004s

8.4rc1 - Solaris 10 8/07 s10x_u4wos_12b X86
real0m50.603s
user0m0.002s
sys 0m0.004s
real0m49.945s
user0m0.002s
sys 0m0.004s
real0m50.547s
user0m0.002s
sys 0m0.004s
real0m50.061s
user0m0.002s
sys 0m0.004s
real0m48.151s
user0m0.002s
sys 0m0.004s
real0m50.133s
user0m0.002s
sys 0m0.004s
real0m50.583s
user0m0.002s
sys 0m0.004s

Has anyone else seen this problem?

Thanks, Alan
drop table t;
create table t (i int, d timestamp);



