[PERFORM] Yet another slow nested loop
Here's the query:

    select photos.* from photos
    inner join event_participations
      on event_participations.user_id = photos.creator_id
      and event_participations.attend = true
    inner join event_instances
      on event_instances.id = event_participations.event_instance_id
    where (
      (event_instances.venue_id = 1290)
      and (photos.taken_at > (event_instances.time + interval '-3600 seconds'))
      and (photos.taken_at < (event_instances.time + interval '25200 seconds'))
    )
    order by taken_at desc limit 20

It occasionally takes four minutes to run:

                                   QUERY PLAN
    Limit  (cost=0.00..10997.65 rows=20 width=116) (actual time=262614.474..262614.474 rows=0 loops=1)
      ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
            Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
            ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
                  ->  Index Scan Backward using photos_taken_at on photos  (cost=0.00..40924.34 rows=544171 width=116) (actual time=14.997..1357.724 rows=544171 loops=1)
                  ->  Index Scan using event_participations_user_id_index on event_participations  (cost=0.00..2.95 rows=60 width=8) (actual time=0.007..0.159 rows=26 loops=544171)
                        Index Cond: (event_participations.user_id = photos.creator_id)
                        Filter: event_participations.attend
            ->  Index Scan using event_instances_pkey on event_instances  (cost=0.00..0.29 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=14013998)
                  Index Cond: (event_instances.id = event_participations.event_instance_id)
                  Filter: (event_instances.venue_id = 1290)
    Total runtime: 262614.585 ms

With enable_nestloop set to false, it takes about 1 second to run. The database is freshly analyzed and vacuumed. The default statistics target is 100. I have tried increasing the stats on event_participations.user_id, event_participations.event_instance_id and photos.taken_at to 1000, but saw no improvement.
This is PostgreSQL 8.3.3.

A.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
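The per-column statistics change the poster describes having tried would look roughly like this (just illustrating the commands; the table and column names come from the query above):

```sql
-- Raise the per-column statistics targets, then re-analyze so the
-- new targets actually take effect in pg_statistic.
ALTER TABLE event_participations ALTER COLUMN user_id SET STATISTICS 1000;
ALTER TABLE event_participations ALTER COLUMN event_instance_id SET STATISTICS 1000;
ALTER TABLE photos ALTER COLUMN taken_at SET STATISTICS 1000;

ANALYZE event_participations;
ANALYZE photos;
```

As the poster notes, this did not improve the plan here, which suggests the misestimate comes from the join and interval-range selectivity rather than from per-column histograms.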
[PERFORM] performance with query
Hi everybody, I'm setting up my database on Postgres, and after some days of hard work I've ended up with both good performance and awful performance under the same configuration. I have a complex query that performs very well with mergejoin on and nestloop off. If I enable nestloop, Postgres tries to use it and the query execution becomes inconclusive: after 3 hours there was still no answer, so I killed the query. That would be acceptable, but with this configuration very simple little queries like "select column from table where primarykey = value" become incredibly slow. The only solution I've found so far is to set mergejoin to off before running such a query. That is an awful solution, because it means changing the (big, old) software in the many places where this kind of query is used (and the same problem applies to turning mergejoin off system-wide and enabling it only on the connections that run the hard query). Do you have any suggestions to accelerate both the complex and the simple queries? I've tried a lot of combinations of the Planner Method Configuration settings, but the only combination that really accelerates the hard query is mergejoin on and nestloop off; other settings seem to be useless. Thanks in advance.
Re: [PERFORM] Yet another slow nested loop
-----Original Message-----
From: Alexander Staubo

>   ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
>         Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
>         ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)

Do you have any of the other enable_* options set to false? What do you have random_page_cost set to? I ask because I'm surprised to see Postgres choose to loop when it knows it will have to loop 11 million times.

Dave
Re: [PERFORM] Yet another slow nested loop
On Tue, Jun 16, 2009 at 3:56 PM, Dave Dutcher <d...@tridecap.com> wrote:
> Do you have any of the other enable_* options set to false?

No.

> What do you have random_page_cost set to? I ask because I'm surprised to
> see postgres choose to loop when it knows it will have to loop 11 million
> times.

The default, i.e. 4.0.

A.
Re: [PERFORM] performance with query
On Tue, Jun 16, 2009 at 03:37:42PM +0200, Alberto Dalmaso wrote:
> I have a complex query that performs very well with mergejoin on and
> nestloop off. If I enable nestloop, Postgres tries to use it and the query
> execution becomes inconclusive: after 3 hours there is still no answer, so
> I kill the query. But with this configuration, very simple little queries
> like "select column from table where primarykey = value" become
> incredibly slow. [...]

It would be helpful if you posted EXPLAIN ANALYZE results for both queries. This will require you to run each query to completion; if that's not possible for the 3-hour query, at least run EXPLAIN and post those results.

- Josh / eggyknap
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> wrote:
> I have a complex query that performs very well with mergejoin on and
> nestloop off. [...] The only solution I've found so far is to set
> mergejoin to off before running such a query.

We'll need a lot more information to be able to provide useful advice. What version of PostgreSQL? What OS? What does the hardware look like? (CPUs, drives, memory, etc.) Do you have autovacuum running? What other regular maintenance do you do? What does your postgresql.conf file look like? (If you can strip out all comments and show the rest, that would be great.)

With that as background, if you can show us the schema for the table(s) involved and the text of a query, along with the EXPLAIN ANALYZE output (or just EXPLAIN, if the query runs too long to get the EXPLAIN ANALYZE results), that would allow us to see where things are going wrong. Please show this information without setting any of the optimizer options off; but then, as a diagnostic step, *also* show EXPLAIN ANALYZE results when you set options to a configuration that runs faster.

-Kevin
Re: [PERFORM] Yet another slow nested loop
Alexander Staubo <a...@bengler.no> writes:
> Here's the query: select photos.* from photos inner join
> event_participations [...] order by taken_at desc limit 20
> It occasionally takes four minutes to run:

Actually the easiest way to fix that is to get rid of the LIMIT. (Maybe use a cursor instead, and fetch only twenty rows.) LIMIT magnifies the risks from any estimation error, and you've got a lot of that here ...

regards, tom lane
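Tom's cursor suggestion can be sketched like this (a minimal illustration using the query from the original post; the cursor name is made up). With a cursor, the planner no longer gambles on a plan that is only cheap if the first twenty rows come quickly:

```sql
BEGIN;

-- Declare a cursor over the full result; fetch only the first 20 rows.
DECLARE recent_photos CURSOR FOR
  select photos.* from photos
  inner join event_participations
    on event_participations.user_id = photos.creator_id
    and event_participations.attend = true
  inner join event_instances
    on event_instances.id = event_participations.event_instance_id
  where event_instances.venue_id = 1290
    and photos.taken_at > (event_instances.time + interval '-3600 seconds')
    and photos.taken_at < (event_instances.time + interval '25200 seconds')
  order by taken_at desc;

FETCH 20 FROM recent_photos;

CLOSE recent_photos;
COMMIT;
```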
Re: [PERFORM] performance with query
> What version of PostgreSQL?

8.3, the one that comes with openSUSE 11.1.

> What OS?

Linux, openSUSE 11.1, 64-bit.

> What does the hardware look like? (CPUs, drives, memory, etc.)

2 * Opteron dual core, 8 GB RAM, 70 GB SCSI U320 RAID 1.

> Do you have autovacuum running? What other regular maintenance do you do?

Yes, autovacuum and analyze are running; the only other activity is the WAL backup.

> What does your postgresql.conf file look like? (If you can strip out all
> comments and show the rest, that would be great.)

I'll post only the values I've changed:

    shared_buffers = 1536MB
    temp_buffers = 5MB
    max_prepared_transactions = 30
    work_mem = 50MB   # I do a lot of work in ORDER BY
    maintenance_work_mem = 50MB
    max_stack_depth = 6MB
    max_fsm_pages = 16
    max_fsm_relations = 5000
    wal_buffers = 3072kB
    enable_bitmapscan = on
    enable_hashagg = on
    enable_hashjoin = off
    enable_indexscan = on
    enable_mergejoin = on
    enable_nestloop = off
    enable_seqscan = off
    enable_sort = off
    enable_tidscan = on
    effective_cache_size = 3600MB
    geqo = off
    default_statistics_target = 100

> With that as background, if you can show us the schema for the table(s)
> involved and the text of a query, along with the EXPLAIN ANALYZE output
> [...] that would allow us to see where things are going wrong.
>
> -Kevin

The problem is that the simple query uses a mergejoin instead of a nested loop (obvious, given the parameters I set), and in this situation it becomes very, very slow (15 sec vs. 5 ms when I set mergejoin to off).
That is the explain of the complex query, which works with more than acceptable performance:

    Merge Right Join  (cost=508603077.17..508603195.59 rows=1 width=227)
      Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)
      ->  GroupAggregate  (cost=0.00..105.51 rows=1031 width=11)
            ->  Index Scan using pk_ve_edil_rendite on ve_edil_rendite  (cost=0.00..86.84 rows=1157 width=11)
      ->  Materialize  (cost=508603077.17..508603077.18 rows=1 width=195)
            ->  Nested Loop  (cost=506932259.90..508603077.17 rows=1 width=195)
                  ->  Merge Join  (cost=406932259.90..408603074.89 rows=1 width=188)
                        Merge Cond: (domande.id_domanda = c_elaout_7.id_domanda)
                        ->  Merge Join  (cost=406932259.90..408188339.97 rows=1 width=240)
                              Merge Cond: (c_elaout_5.id_domanda = domande.id_domanda)
                              ->  Merge Join  (cost=3895.15..1259628.81 rows=138561 width=41)
                                    Merge Cond: (edil_veneto.id_domanda = c_elaout_5.id_domanda)
                                    ->  Merge Join  (cost=1123.18..372710.75 rows=98122 width=29)
                                          Merge Cond: (edil_veneto.id_domanda = c_elaout_6.id_domanda)
                                          ->  Index Scan using IDX_pk_Edil_Veneto on edil_veneto  (cost=0.00..11825.14 rows=232649 width=17)
                                          ->  Index Scan using IDX_3_c_elaout on c_elaout c_elaout_6  (cost=0.00..359914.34 rows=98122 width=12)
                                                Index Cond: ((c_elaout_6.node)::text = 'contributo_sociale'::text)
                                    ->  Index Scan using IDX_3_c_elaout on c_elaout c_elaout_5  (cost=0.00..887091.20 rows=245306 width=12)
                                          Index Cond: ((c_elaout_5.node)::text = 'contributo'::text)
                              ->  Materialize  (cost=406928364.74..406928364.75 rows=1 width=199)
                                    ->  Nested Loop  (cost=402583154.89..406928364.74 rows=1 width=199)
                                          Join Filter: ((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)
                                          ->  Merge Join  (cost=202583154.89..206928031.60 rows=1 width=198)
                                                Merge Cond: (domande.id_domanda = c_elaout_4.id_domanda)
                                                ->  Merge Join  (cost=202583154.89..206425374.54 rows=1 width=186)
                                                      Merge Cond: (domande.id_domanda = c_elain_3.id_domanda)
                                                      ->  Merge Join  (cost=201328203.80..205170407.27 rows=41 width=138)
                                                            Merge Cond: (domande.id_domanda = c_elain_7.id_domanda)
                                                            ->  Merge Join  (cost=201328203.80..204498966.35 rows=93 width=126)
                                                                  Merge Cond: (domande.id_domanda =
Re: [PERFORM] performance with query
On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
>> What does your postgresql.conf file look like?
> enable_hashjoin = off
> enable_nestloop = off
> enable_seqscan = off
> enable_sort = off

Why are these switched off?

> and that is the explain of the too-slow simple query:
>
> Merge Join  (cost=0.00..1032305.52 rows=4 width=12)
>   Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)
>   ->  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39 rows=1 width=4)
>         Index Cond: (id_domanda = 4165757)
>   ->  Index Scan using IDX_2_c_elaout on c_elaout  (cost=0.00..1030283.89 rows=805279 width=12)
>         Filter: ((c_elaout.node)::text = 'Invalido'::text)
>
> this costs 15 sec; with mergejoin off:
>
> Nested Loop  (cost=1.00..10022.97 rows=4 width=12)
>   ->  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39 rows=1 width=4)
>         Index Cond: (id_domanda = 4165757)
>   ->  Index Scan using IDX_2_c_elaout on c_elaout  (cost=0.00..14.54 rows=4 width=12)
>         Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)
>         Filter: ((c_elaout.node)::text = 'Invalido'::text)
>
> this costs 15 msec!!!

Well duh. What you're effectively doing is telling Postgres to NEVER use a nested loop. Then you're getting upset because it isn't using a nested loop. When you tell it to NEVER use anything (switching all join algorithms off), it ignores you and chooses the right plan anyway.

Matthew

--
You can configure Windows, but don't ask me how. -- Bill Gates
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> wrote:
>> What version of PostgreSQL?
> 8.3 that comes with openSUSE 11.1

Could you show us the result of SELECT version(); ?

> max_prepared_transactions = 30

Unless you're using distributed transactions or need a lot of locks, that's just going to waste some RAM. Zero is fine for most people.

> maintenance_work_mem = 50MB

That's a little small -- this only comes into play for maintenance tasks like index builds. Not directly part of your reported problem, but maybe something to bump to the 1GB range.

> max_fsm_pages = 16
> max_fsm_relations = 5000

Have you done any VACUUM VERBOSE lately and captured the output? If so, what do the last few lines say? (That's a lot of relations for the number of pages; just curious how it maps to actual.)

> enable_hashjoin = off
> enable_nestloop = off
> enable_seqscan = off
> enable_sort = off

That's probably a bad idea. If particular queries aren't performing well, you can always set these temporarily on a particular connection. Even then, turning these off is rarely a good idea except for diagnostic purposes. I *strongly* recommend you put all of these back to the defaults of 'on' and start from there, turning off selected items as needed to get EXPLAIN ANALYZE output to demonstrate the better plans you've found for particular queries.

> effective_cache_size = 3600MB

That seems a little on the low side for an 8GB machine, unless you have other things on there using a lot of RAM. Do you?

If you could set the optimizer options back on and get new plans where you show specifically which options (if any) were turned off for the run, that would be good. Also, please attach the plans to the email instead of pasting -- the word wrap makes them hard to read. Finally, if you could do \d on the tables involved in the query, it would help. I'll hold off looking at these in hopes that you can do the above.

-Kevin
Re: [PERFORM] performance with query
Il giorno mar, 16/06/2009 alle 15.58 +0100, Matthew Wakeling ha scritto:
> On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
>> enable_hashjoin = off
>> enable_nestloop = off
>> enable_seqscan = off
>> enable_sort = off
>
> Why are these switched off?

Because of the need to pump up the performance of the complex query. If I set them to on, it tries to use a nested loop even in the complex query, and that query never arrives at a response -- and of course I need a response from it! So my problem is to find a configuration that preserves performance for both kinds of query, but I'm not able to find one. Changing the RAM parameters saves about 10% of the time on the complex query, while I see no change on the simple one...
Re: [PERFORM] Yet another slow nested loop
On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Actually the easiest way to fix that is to get rid of the LIMIT.
> (Maybe use a cursor instead, and fetch only twenty rows.) LIMIT
> magnifies the risks from any estimation error, and you've got a lot
> of that here ...

There's no cursor support in ActiveRecord, the ORM library we use, and I'm not going to write it. Anyway, I would prefer not to gloss over the underlying problem with something that requires a TODO next to it. What can be done to fix the underlying problem? Nothing?

A.
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> wrote:
> Il giorno mar, 16/06/2009 alle 15.58 +0100, Matthew Wakeling ha scritto:
>> On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
>>> enable_hashjoin = off
>>> enable_nestloop = off
>>> enable_seqscan = off
>>> enable_sort = off
>> Why are these switched off?
> because of the need to pump up the performance of the complex query

These really are meant primarily for diagnostic purposes. As a last resort, you could set them off right before running a problem query, and set them back on again afterward; but you will be much better off if you can cure the underlying problem. The best chance of that is to show us the plan you get with all turned on.

-Kevin
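Kevin's "set them off right before running a problem query" approach can be sketched as follows (a minimal illustration; using SET LOCAL inside a transaction keeps the change from leaking into the rest of the session):

```sql
BEGIN;

-- Applies only until COMMIT/ROLLBACK; the session default is untouched.
SET LOCAL enable_nestloop = off;

-- ... run the one problematic complex query here ...

COMMIT;
```

This way the simple primary-key lookups elsewhere in the application still run with all planner methods enabled.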
Re: [PERFORM] performance with query
> Could you show us the result of SELECT version(); ?

Of course:
PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]

> Have you done any VACUUM VERBOSE lately and captured the output? If so,
> what do the last few lines say? (That's a lot of relations for the number
> of pages; just curious how it maps to actual.)

It needs a lot of time (20 GB database); when I have the answer I'll post it.

> [enable_hashjoin/nestloop/seqscan/sort = off] That's probably a bad idea.
> [...] I *strongly* recommend you put all of these back to the defaults of
> 'on' and start from there, turning off selected items as needed [...]

OK, that would be the reverse of what I'm doing now (set them to on, and set them to off only on the appropriate connections, instead of off globally and on for the connections that have to run the hard query). But the question is: do you think it is impossible to find a configuration that works fine for both kinds of query? The application also has to run against an Oracle DB... I don't want to write different source code for the two databases...

> effective_cache_size = 3600MB
> That seems a little on the low side for an 8GB machine, unless you have
> other things on there using a lot of RAM. Do you?

Yes, there are two instances of Postgres running on the same server (the databases have to stay completely separated).
> Finally, if you could do \d on the tables involved in the query, it would
> help. I'll hold off looking at these in hopes that you can do the above.
>
> -Kevin

I attach the explanation of the long query after setting all the enable options to on. In this condition the query will never finish...

                                   QUERY PLAN
    Nested Loop Left Join  (cost=283253.73..417552.90 rows=1 width=227)
      Join Filter: (ve_edil_rendite.id_domanda = domande.id_domanda)
      ->  Nested Loop  (cost=283222.38..417485.46 rows=1 width=195)
            ->  Nested Loop  (cost=283222.38..417470.91 rows=1 width=247)
                  ->  Nested Loop  (cost=283222.38..417456.36 rows=1 width=235)
                        Join Filter: (edil_veneto.id_tp_superficie = ve_edil_tp_superfici.id_tp_superficie)
                        ->  Nested Loop  (cost=283222.38..417455.29 rows=1 width=228)
                              ->  Nested Loop  (cost=283222.38..417440.75 rows=1 width=216)
                                    Join Filter: ((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)
                                    ->  Nested Loop  (cost=283222.38..417107.61 rows=1 width=215)
                                          ->  Nested Loop  (cost=283222.38..417093.06 rows=1 width=203)
                                                ->  Nested Loop  (cost=283222.38..417084.75 rows=1 width=186)
                                                      ->  Nested Loop  (cost=283222.38..417070.20 rows=1 width=174)
                                                            ->  Nested Loop  (cost=283222.38..417057.25 rows=1 width=162)
                                                                  ->  Nested Loop  (cost=283222.38..417044.33 rows=1 width=150)
                                                                        ->  Nested Loop  (cost=283222.38..417031.40 rows=1 width=138)
                                                                              ->  Nested Loop  (cost=283222.38..417016.85 rows=1 width=126)
                                                                                    ->  Nested Loop  (cost=283222.38..416343.12 rows=44 width=114)
                                                                                          ->  Merge Join  (cost=283222.38..384803.94 rows=2431 width=102)
                                                                                                Merge Cond: (componenti.id_dichiarazione = domande.id_dichiarazione)
                                                                                                ->  GroupAggregate  (cost=0.00..94032.39 rows=601009 width=12)
                                                                                                      ->  Index Scan using IDX_1_componenti on componenti  (cost=0.00..76403.45 rows=2023265 width=12)
                                                                                                ->  Sort  (cost=283222.38..283223.41 rows=412 width=102)
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> wrote:
> do you think it is impossible to find a configuration that works fine
> for both kinds of query?

No. We probably just need a little more information.

> The application also has to run against an Oracle DB... I don't want to
> write different source code for the two databases...

I understand completely.

> I attach the explanation of the long query after setting all the enable
> options to on. In this condition the query will never finish...

We're getting close. Can you share the table structure and the actual query you are running? It's a lot easier (for me, anyway) to put this puzzle together with all the pieces in hand. Also, if you can turn off some of the optimizer options and get a fast plan, please show us an EXPLAIN ANALYZE for that, with information on which settings were turned off. That will help show where bad estimates may be causing a problem, or possibly give a hint of table or index bloat problems. I think we're getting close to critical mass for seeing the solution.

-Kevin
Re: [PERFORM] performance with query
Il giorno mar, 16/06/2009 alle 11.31 -0400, Tom Lane ha scritto:
> Alberto Dalmaso <dalm...@clesius.it> writes:
>> because of the need to pump up the performance of the complex query.
>
> That is *not* the way to improve performance of a query. Turning off
> specific enable_ parameters can be helpful while investigating planner
> behavior, but it is never recommended as a production solution. You have
> already found out why.
>
> regards, tom lane

OK, but the problem is that my very long query performs quite well when it uses a merge join, but it cannot come to an end if it uses any other kind of join. If I set all the parameters to on, as both of you tell me, the explain shows that the DB uses a nested loop. If I set nested loop to off, it uses a hash join. How can I write the query so that the planner uses a mergejoin (the only option that lets the query give me the awaited answer) without changing the settings on the connection every time?
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> wrote:
> I attach the explanation of the long query after setting all the enable
> options to on. In this condition the query will never finish...

I notice that you have many joins in there. If the query can't be simplified, you probably need to boost join_collapse_limit and from_collapse_limit quite a bit. If planning time goes through the roof in that case, you may need to enable geqo -- this is what it's intended to help. If you try geqo, you may need to tune it; I'm not familiar with the knobs for tuning that, so maybe someone else will jump in if you get to that point.

-Kevin
Re: [PERFORM] performance with query
Alberto Dalmaso <dalm...@clesius.it> writes:
> OK, but the problem is that my very long query performs quite well when
> it uses a merge join, but it cannot come to an end if it uses any other
> kind of join. [...] How can I write the query so that the planner uses a
> mergejoin [...] without changing the settings on the connection every time?

You have the wrong mindset completely. Instead of thinking "how can I force the planner to do it my way", you need to be thinking "why is the planner guessing wrong about which is the best way to do it, and how can I improve its guess?"

There's not really enough information in what you've posted so far to let people help you with that question, but one thing that strikes me from the EXPLAIN is that you have a remarkably large number of joins. Perhaps increasing from_collapse_limit and/or join_collapse_limit (to more than the number of tables in the query) would help.

regards, tom lane
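The collapse-limit suggestion from Kevin and Tom can be sketched like this (the value 40 is an arbitrary example; the point is to pick something larger than the number of relations actually joined in the query):

```sql
-- Let the planner consider join orderings across the whole query
-- instead of planning fixed sub-groups of (by default) 8 relations.
SET from_collapse_limit = 40;
SET join_collapse_limit = 40;

-- If planning time then explodes, fall back to the genetic optimizer:
SET geqo = on;
SET geqo_threshold = 12;  -- use GEQO once a query joins this many relations
```

These can be set per connection (or with SET LOCAL inside a transaction) so the rest of the application is unaffected.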
Re: [PERFORM] performance with query
Unfortunately the query needs that level of complexity, as the information I have to show is spread across different tables. I did try geqo on at the beginning, but only with the default parameters. Tomorrow (my working day here in Italy finished some minutes ago, so I'll wait for the end of the EXPLAIN ANALYZE and then go home ;-P ) I'll try to increase join_collapse_limit and from_collapse_limit, as you suggest. If someone can give me some information on how to configure geqo, I'll try it again. In the meantime, tonight I'll leave VACUUM VERBOSE working for me.
Re: [PERFORM] performance with query
Even though the query ends in approximately 200 sec, the EXPLAIN ANALYZE is still working and more than 1000 sec have gone by... I'll leave it working tonight. Have a nice evening, and thanks for the help.
[PERFORM] Speeding up a query.
Good afternoon.

I have developed an application to efficiently schedule chemotherapy patients at our hospital. The application takes into account several resource constraints (available chairs, available nurses, nurse coverage assignment to chairs) as well as the chair time and nursing time required for a regimen. The algorithm for packing in appointments respects each constraint and typically schedules a day of treatments (30-60) within 9-10 seconds on my workstation, down from 27 seconds initially. I would like to get it below 5 seconds if possible.

I think what's slowing it down is simply the number of rows and joins. The algorithm creates a scheduling matrix with one row per 5-minute timeslot, per unit, per nurse assigned to the unit. That translates to 3,280 rows for the days I have designed in development (each day can change).

To determine the available slots, the algorithm finds the earliest slot that has an available chair and a count of the required concurrent intervals afterwards. So a 60-minute regimen requires 12 concurrent rows. This is accomplished by joining the table on itself. A second query is run for the same range, but with respect to the nurse time and an available nurse. Finally, those two are joined against each other. Effectively, it is:

    Select *
    From (
        Select * From matrix m1, matrix m2 Where m1.x = m2.x
    ) chair, (
        Select * From matrix m1, matrix m2 Where m1.x = m2.x
    ) nurse
    Where chair.id = nurse.id

With matrix having 3,280 rows. Ugh.

I have tried various indexes and clustering approaches with little success. Any ideas?

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294
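The "earliest slot with N consecutive free intervals" self-join described above can be sketched roughly like this. The column names (slot_time, unit_id, chair_free) are invented, since the actual matrix schema isn't shown; a ranged self-join plus HAVING count(*) stands in for the per-interval join:

```sql
-- Find the earliest starting slot that has 12 consecutive free
-- 5-minute intervals (a 60-minute regimen). Column names are hypothetical.
SELECT m1.slot_time AS start_slot
FROM matrix m1
JOIN matrix m2
  ON m2.unit_id = m1.unit_id
 AND m2.slot_time >= m1.slot_time
 AND m2.slot_time <  m1.slot_time + interval '60 minutes'
WHERE m1.chair_free
  AND m2.chair_free
GROUP BY m1.slot_time
HAVING count(*) = 12        -- every interval in the window is free
ORDER BY m1.slot_time
LIMIT 1;                    -- earliest such slot
```

The nurse-availability query would follow the same shape, and the two can then be joined on the candidate start slot.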
Re: [PERFORM] Speeding up a query.
On the DB side of things, you will want to make sure that your caching as much as possible - putting a front-end like memcached could help. I assume you have indexes on the appropriate tables? What does the EXPLAIN ANALYZE on that query look like? Not necessarily a postgres solution, but I'd think this type of solution would work really, really well inside of say a a mixed integer or integer solver ... something like glpk or cbc. You'd need to reformulate the problem, but we've built applications using these tools which can crunch through multiple billions of combinations in under 1 or 2 seconds. (Of course, you still need to store the results, and feed the input, using a database of some kind). -- Anthony Presley On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote: Good afternoon. I have developed an application to efficiently schedule chemotherapy patients at our hospital. The application takes into account several resource constraints (available chairs, available nurses, nurse coverage assignment to chairs) as well as the chair time and nursing time required for a regimen. The algorithm for packing appointments in respects each constraint and typically schedules a day of treatments (30-60) within 9-10 seconds on my workstation, down from 27 seconds initially. I would like to get it below 5 seconds if possible. I think what's slowing is down is simply the number of rows and joins. The algorithm creates a scheduling matrix with one row per 5 minute timeslot, per unit, per nurse assigned to the unit. That translates to 3,280 rows for the days I have designed in development (each day can change). To determine the available slots, the algorithm finds the earliest slot that has an available chair and a count of the required concurrent intervals afterwards. So a 60 minute regimen requires 12 concurrent rows. This is accomplished by joining the table on itself. A second query is ran for the same range, but with respect to the nurse time and an available nurse. 
Finally, those two are joined against each other. Effectively, it is:

Select *
From (Select * From matrix m1, matrix m2 Where m1.x = m2.x) chair,
     (Select * From matrix m1, matrix m2 Where m1.x = m2.x) nurse
Where chair.id = nurse.id

With matrix having 3,280 rows. Ugh. I have tried various indexes and clustering approaches with little success. Any ideas?

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
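The self-join Matthew describes could be sketched roughly like this. This is only an illustration of the shape of the query, not his actual code: the column names (slot_start, chair_id, chair_free) are assumptions, since the original post doesn't show the table definition.

```sql
-- Sketch only: find the earliest start slot that begins a run of 12
-- consecutive free 5-minute rows (a 60-minute regimen) for some chair.
-- Column names are hypothetical.
SELECT m1.chair_id, m1.slot_start
FROM   matrix m1
JOIN   matrix m2
  ON   m2.chair_id = m1.chair_id
 AND   m2.slot_start >= m1.slot_start
 AND   m2.slot_start <  m1.slot_start + interval '60 minutes'
 AND   m2.chair_free
GROUP  BY m1.chair_id, m1.slot_start
HAVING COUNT(*) = 12          -- 60 min / 5 min = 12 consecutive rows
ORDER  BY m1.slot_start
LIMIT  1;
```

With only 3,280 rows, the quadratic self-join (and the join of the chair result against the nurse result) is likely where the time goes, which is why keeping the matrix in application memory, or handing the whole packing problem to a solver as Anthony suggests, may beat any indexing strategy.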
[PERFORM] Performance discrepancy
[PERFORM] High cost of ... where ... not in (select ...)
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):

musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

QUERY PLAN
 Seq Scan on pcap_store  (cost=4008.22..348521303.54 rows=106532 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=4008.22..6765.98 rows=205475 width=4)
           ->  Seq Scan on pcap_store_log  (cost=0.00..3099.75 rows=205475 width=4)
(5 rows)

musecurity=# \d muapp.pcap_store
                 Table muapp.pcap_store
      Column       |          Type          | Modifiers
-------------------+------------------------+--------------------------------------------------------------------------
 pcap_storeid      | integer                | not null default nextval('muapp.pcap_store_pcap_storeid_seq'::regclass)
 filename          | character varying(255) |
 test_run_dutid    | integer                | default 0
 userid            | integer                | not null default 0
 analysis_recordid | bigint                 |
 io_xml            | character varying(255) |
Indexes:
    pcap_store_pkey PRIMARY KEY, btree (pcap_storeid)
Foreign-key constraints:
    pcap_store_analysis_recordid_fkey FOREIGN KEY (analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE CASCADE
    pcap_store_test_run_dutid_fkey FOREIGN KEY (test_run_dutid) REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE
    pcap_store_userid_fkey FOREIGN KEY (userid) REFERENCES mucore.user(recordid) ON DELETE CASCADE

As you see, the sequence scan on pcap_store is killing me, even though there appears to be a perfectly good index. Is there a better way to construct this query?

Thanks,
Aaron

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
Re: [PERFORM] High cost of ... where ... not in (select ...)
Aaron Turner wrote:

I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

What PG version is this?

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:

Aaron Turner wrote: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

What PG version is this?

Doh, just realized I didn't reply back to the list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log.

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner <synfina...@gmail.com> wrote: On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote: Aaron Turner wrote: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? Doh, just realized I didn't reply back to the list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log.

Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like:

DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN. /rant

...Robert
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas <robertmh...@gmail.com> wrote: Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

That's a syntax error on 8.3.3... I don't see anywhere in the docs where the delete command allows for multiple FROM statements. Perhaps you meant:

DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL;

Is that right?

This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN.

Wow, glad I asked... I never would have figured that out.

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
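For anyone wanting to sanity-check the rewrite before running the DELETE, the same anti-join shape can be tried as a SELECT against small scratch tables first. A sketch (the temp table names here are made up, not the real schema):

```sql
-- Scratch-table sketch: both queries should return ids 6..10, i.e. the
-- rows with no match in the log table. The NOT NULL on the log column
-- matters: if the subselect column contained NULLs, NOT IN would return
-- no rows at all, while the anti-join would still find the unmatched ones.
CREATE TEMP TABLE store (id int PRIMARY KEY);
CREATE TEMP TABLE store_log (id int NOT NULL);
INSERT INTO store     SELECT generate_series(1, 10);
INSERT INTO store_log SELECT generate_series(1, 5);

SELECT id FROM store
WHERE id NOT IN (SELECT id FROM store_log);

SELECT s.id
FROM store s
LEFT JOIN store_log l ON s.id = l.id
WHERE l.id IS NULL;
```

Since pcap_storeid is stated to be unique (and implicitly non-null) in pcap_store_log here, the two forms select the same victim rows.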
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner <synfina...@gmail.com> wrote: That's a syntax error on 8.3.3... Perhaps you meant: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right?

Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first)

...Robert
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas <robertmh...@gmail.com> wrote: On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner <synfina...@gmail.com> wrote: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first)

Yeah, doing that now... taking a bit longer than I expected (took ~5min on rather slow hardware - everything is on a pair of 10K RAID1 drives), but the result seems correct.

QUERY PLAN
 Hash Join  (cost=19229.08..29478.99 rows=106492 width=6)
   Hash Cond: (x.pcap_storeid = a.pcap_storeid)
   ->  Seq Scan on pcap_store x  (cost=0.00..5617.84 rows=212984 width=10)
   ->  Hash  (cost=17533.93..17533.93 rows=106492 width=4)
         ->  Hash Left Join  (cost=6371.19..17533.93 rows=106492 width=4)
               Hash Cond: (a.pcap_storeid = b.pcap_storeid)
               Filter: (b.pcap_storeid IS NULL)
               ->  Seq Scan on pcap_store a  (cost=0.00..5617.84 rows=212984 width=4)
               ->  Hash  (cost=3099.75..3099.75 rows=205475 width=4)
                     ->  Seq Scan on pcap_store_log b  (cost=0.00..3099.75 rows=205475 width=4)

I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Honestly, not complaining, 5 minutes is acceptable for this query (it's a one-time thing), just surprised is all. Thanks for the help!

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
[PERFORM] Index Scan taking long time
Hi, I have been trying to fix a performance issue that we have which I have tracked down to index scans being done on a particular table (or set of tables). The following query:

explain analyze select * FROM inbound.event_20090526 e LEFT OUTER JOIN inbound.internal_host i ON (e.mta_host_id = i.id) LEFT OUTER JOIN inbound.internal_host iaa ON (e.aamta_host_id = iaa.id) LEFT OUTER JOIN inbound.event_status es ON (e.event_status_id = es.id) LEFT OUTER JOIN inbound.threat t ON (e.threat_id = t.id), inbound.domain d, inbound.event_type et WHERE e.domain_id = d.id AND e.event_type_id = et.id AND d.name IN ( 'testdomain.com' );

Does this:

QUERY PLAN
 Nested Loop Left Join  (cost=0.00..10887.03 rows=8 width=2021) (actual time=50.352..14378.603 rows=3453 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10882.23 rows=8 width=1985) (actual time=50.346..14372.820 rows=3453 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..10877.43 rows=8 width=1949) (actual time=50.336..14358.101 rows=3453 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..10872.63 rows=8 width=1801) (actual time=50.321..14344.603 rows=3453 loops=1)
                     ->  Nested Loop  (cost=0.00..10867.83 rows=8 width=1764) (actual time=50.315..14336.979 rows=3453 loops=1)
                           ->  Nested Loop  (cost=0.00..10863.03 rows=8 width=1728) (actual time=50.288..14308.368 rows=3453 loops=1)
                                 ->  Index Scan using domain_name_idx on domain d  (cost=0.00..6.63 rows=1 width=452) (actual time=0.049..0.080 rows=1 loops=1)
                                       Index Cond: ((name)::text = 'testdomain.com'::text)
                                 ->  Index Scan using event_20090526_domain_idx on event_20090526 e  (cost=0.00..10694.13 rows=3606 width=1276) (actual time=50.233..14305.211 rows=3453 loops=1)
                                       Index Cond: (e.domain_id = d.id)
                           ->  Index Scan using event_type_pkey on event_type et  (cost=0.00..0.56 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=3453)
                                 Index Cond: (et.id = e.event_type_id)
                     ->  Index Scan using threat_pkey on threat t  (cost=0.00..0.56 rows=1 width=37) (actual time=0.000..0.000 rows=0 loops=3453)
                           Index Cond: (e.threat_id = t.id)
               ->  Index Scan using event_status_pkey on event_status es  (cost=0.00..0.56 rows=1 width=148) (actual time=0.002..0.002 rows=1 loops=3453)
                     Index Cond: (e.event_status_id = es.id)
         ->  Index Scan using internal_host_pkey on internal_host iaa  (cost=0.00..0.56 rows=1 width=36) (actual time=0.002..0.003 rows=1 loops=3453)
               Index Cond: (e.aamta_host_id = iaa.id)
   ->  Index Scan using internal_host_pkey on internal_host i  (cost=0.00..0.56 rows=1 width=36) (actual time=0.000..0.000 rows=0 loops=3453)
         Index Cond: (e.mta_host_id = i.id)
 Total runtime: 14380.000 ms

If the same query is done straight away again we get:

QUERY PLAN
 Nested Loop Left Join  (cost=0.00..10887.03 rows=8 width=2021) (actual time=0.165..67.388 rows=3453 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10882.23 rows=8 width=1985) (actual time=0.162..61.973 rows=3453 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..10877.43 rows=8 width=1949) (actual time=0.156..49.756 rows=3453 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..10872.63 rows=8 width=1801) (actual time=0.148..37.522 rows=3453 loops=1)
                     ->  Nested Loop  (cost=0.00..10867.83 rows=8 width=1764) (actual time=0.146..31.920 rows=3453 loops=1)
                           ->  Nested Loop  (cost=0.00..10863.03 rows=8 width=1728) (actual time=0.129..10.325 rows=3453 loops=1)
                                 ->  Index Scan using domain_name_idx on domain d  (cost=0.00..6.63 rows=1 width=452) (actual time=0.099..0.139 rows=1 loops=1)
                                       Index Cond: ((name)::text = 'rhe.com.au'::text)
                                 ->  Index Scan using event_20090526_domain_idx on event_20090526 e  (cost=0.00..10694.13 rows=3606 width=1276) (actual time=0.027..7.510 rows=3453 loops=1)
Re: [PERFORM] High cost of ... where ... not in (select ...)
Aaron Turner <synfina...@gmail.com> writes:

I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware.

Very likely the bulk of the time is spent in the DELETE work proper, not in the query to find the rows to be deleted. In particular I wonder if you have an unindexed foreign key referencing this table ...

regards, tom lane
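Tom's point can be acted on mechanically: for every table that has a foreign key referencing pcap_store, make sure the referencing column is indexed, because each deleted row forces a lookup in every referencing table to enforce (or cascade) the constraint. A sketch only, since the thread never names the referencing tables, so the child table and index names below are hypothetical:

```sql
-- Hypothetical child table that references pcap_store(pcap_storeid).
-- Without this index, each of the ~100k deleted rows can cost a seq
-- scan of child_table during the FK check, which would easily account
-- for the 5 minutes.
CREATE INDEX child_table_pcap_storeid_idx
    ON child_table (pcap_storeid);
```

The costs shown by plain EXPLAIN cover only the scan that finds the victim rows, not the per-row trigger work, which is why the plan looked far cheaper than the observed runtime.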
Re: [PERFORM] Yet another slow nested loop
On Tue, Jun 16, 2009 at 11:16 AM, Alexander Staubo <a...@bengler.no> wrote: On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: Actually the easiest way to fix that is to get rid of the LIMIT. (Maybe use a cursor instead, and fetch only twenty rows.) LIMIT magnifies the risks from any estimation error, and you've got a lot of that here ... There's no cursor support in ActiveRecord, the ORM library we use, and I'm not going to write it. Anyway, I would prefer not to gloss over the underlying problem with something that requires a TODO next to it. What can be done to fix the underlying problem? Nothing?

Basically, we need a system that can accurately estimate multi-column selectivity, or else some kind of planner hints.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00119.php

(with apologies for linking to my own posts, but you can go back and read the whole thread if you're interested)

...Robert
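For reference, Tom's cursor suggestion spelled out against the original query might look like the sketch below. Note the archive stripped the comparison operators from the posted query, so the > and < in the taken_at conditions are assumptions reconstructed from the '-3600 seconds' / '25200 seconds' intervals:

```sql
-- Sketch: without the LIMIT, the planner stops betting that 20 matches
-- will turn up early in a backward scan of photos_taken_at, yet the
-- cursor still lets the application read only the first 20 rows.
BEGIN;
DECLARE recent_photos CURSOR FOR
    SELECT photos.*
    FROM photos
    JOIN event_participations
      ON event_participations.user_id = photos.creator_id
     AND event_participations.attend = true
    JOIN event_instances
      ON event_instances.id = event_participations.event_instance_id
    WHERE event_instances.venue_id = 1290
      AND photos.taken_at > event_instances.time - interval '3600 seconds'
      AND photos.taken_at < event_instances.time + interval '25200 seconds'
    ORDER BY photos.taken_at DESC;
FETCH 20 FROM recent_photos;
CLOSE recent_photos;
COMMIT;
```

This works around the LIMIT-magnified estimation error rather than fixing it, which is exactly Alexander's objection: it is the TODO-flavored fix, not the underlying one.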
Re: [PERFORM] Index Scan taking long time
On Tue, Jun 16, 2009 at 9:30 PM, Bryce Ewing <br...@smx.co.nz> wrote: Hi, I have been trying to fix a performance issue that we have which I have tracked down to index scans being done on a particular table (or set of tables). The following query: explain analyze select * FROM inbound.event_20090526 e LEFT OUTER JOIN inbound.internal_host i ON (e.mta_host_id = i.id) LEFT OUTER JOIN inbound.internal_host iaa ON (e.aamta_host_id = iaa.id) LEFT OUTER JOIN inbound.event_status es ON (e.event_status_id = es.id) LEFT OUTER JOIN inbound.threat t ON (e.threat_id = t.id), inbound.domain d, inbound.event_type et WHERE e.domain_id = d.id AND e.event_type_id = et.id AND d.name IN ( 'testdomain.com' );

Without looking at the explain just yet, it seems to me that you are constraining the order of joins to insist that the left joins be done first, then the regular joins second, because of your mix of explicit and implicit join syntax. The query planner is constrained to run explicit joins first, then implicit, if I remember correctly. So, making it all explicit might help. Might not. But it's a thought.
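Scott's all-explicit rewrite would look something like the sketch below. Treat it as an experiment rather than a sure fix: he hedges the claim himself, and whether the mixed syntax actually constrains the planner depends on the join_collapse_limit and from_collapse_limit settings in effect.

```sql
-- Sketch of the all-explicit form: the two implicit comma joins on
-- domain and event_type become INNER JOINs, written ahead of the
-- LEFT JOINs so the planner is free to start from the selective
-- d.name condition.
SELECT *
FROM inbound.event_20090526 e
JOIN inbound.domain d      ON e.domain_id = d.id
JOIN inbound.event_type et ON e.event_type_id = et.id
LEFT OUTER JOIN inbound.internal_host i   ON e.mta_host_id = i.id
LEFT OUTER JOIN inbound.internal_host iaa ON e.aamta_host_id = iaa.id
LEFT OUTER JOIN inbound.event_status es   ON e.event_status_id = es.id
LEFT OUTER JOIN inbound.threat t          ON e.threat_id = t.id
WHERE d.name IN ('testdomain.com');
```

Given that the second EXPLAIN ANALYZE upthread ran in 67 ms with the same plan, the 14-second first run looks more like cold cache than join order, so comparing the rewrite on a cold cache would be the telling test.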
[PERFORM] 8.4 COPY performance regression on Solaris
Hi, It seems that a COPY of 8M rows to a table on 8.4rc1 takes 30% longer than it does on 8.3.7 on Solaris. Here are the steps I've taken to reproduce this problem on two different Solaris boxes (Solaris 10 11/06 s10x_u3wos_10 X86 and Solaris 10 8/07 s10x_u4wos_12b X86). I've tried this on a Linux box, and I do not see the problem there.

1. Run the following in the psql client to generate an 8M row data file.

copy (select generate_series(1,800), ('1 second'::interval * generate_series(1,800) + '2007-01-01'::timestamp)) to '/export/home/alan/work/pgsql/dump.out' with csv;

2. Build 8.3.7 and 8.4rc1 with the following config.

./configure --prefix=`pwd`/../pgsql CC=/opt/SUNWspro/bin/cc CFLAGS="-xO3 -xarch=native -xspace -W0,-Lt -W2,-Rcond_elim -Xa -xildoff -xc99=none -xCC" --without-readline --with-includes=/opt/csw/include --with-libraries=/opt/csw/lib

3. Run the following on each.

pg_ctl stop -D data -m fast
rm -rf data
initdb -D data
cat postgresql.conf >> data/postgresql.conf
pg_ctl start -l cq.log -D data -w
psql -f ddl.sql postgres
time psql -c "copy t from '/export/home/alan/work/pgsql/dump.out' with csv" postgres

Here are the numbers from several runs I've done.
8.3.7 - Solaris 10 11/06 s10x_u3wos_10 X86

real 0m43.971s  user 0m0.002s  sys 0m0.003s
real 0m44.042s  user 0m0.002s  sys 0m0.003s
real 0m44.828s  user 0m0.002s  sys 0m0.004s
real 0m43.921s  user 0m0.002s  sys 0m0.003s

8.4rc1 - Solaris 10 11/06 s10x_u3wos_10 X86

real 1m0.041s  user 0m0.002s  sys 0m0.003s
real 1m0.258s  user 0m0.002s  sys 0m0.004s
real 1m0.173s  user 0m0.002s  sys 0m0.003s
real 1m0.402s  user 0m0.002s  sys 0m0.003s
real 1m0.767s  user 0m0.002s  sys 0m0.003s

8.3.7 - Solaris 10 8/07 s10x_u4wos_12b X86

real 0m36.242s  user 0m0.002s  sys 0m0.004s
real 0m37.206s  user 0m0.002s  sys 0m0.004s
real 0m38.431s  user 0m0.002s  sys 0m0.004s
real 0m38.885s  user 0m0.002s  sys 0m0.004s
real 0m38.177s  user 0m0.002s  sys 0m0.004s
real 0m38.332s  user 0m0.002s  sys 0m0.004s
real 0m38.401s  user 0m0.002s  sys 0m0.004s
real 0m36.817s  user 0m0.002s  sys 0m0.004s
real 0m39.505s  user 0m0.002s  sys 0m0.004s
real 0m38.871s  user 0m0.002s  sys 0m0.004s
real 0m38.939s  user 0m0.002s  sys 0m0.004s
real 0m38.823s  user 0m0.002s  sys 0m0.004s
real 0m37.955s  user 0m0.002s  sys 0m0.004s
real 0m39.196s  user 0m0.002s  sys 0m0.004s

8.4rc1 - Solaris 10 8/07 s10x_u4wos_12b X86

real 0m50.603s  user 0m0.002s  sys 0m0.004s
real 0m49.945s  user 0m0.002s  sys 0m0.004s
real 0m50.547s  user 0m0.002s  sys 0m0.004s
real 0m50.061s  user 0m0.002s  sys 0m0.004s
real 0m48.151s  user 0m0.002s  sys 0m0.004s
real 0m50.133s  user 0m0.002s  sys 0m0.004s
real 0m50.583s  user 0m0.002s  sys 0m0.004s

Has anyone else seen this problem?

Thanks,
Alan

ddl.sql:
drop table t;
create table t (i int, d timestamp);