Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, Jan 10, 2006 at 10:46:53AM -0500, Tom Lane wrote:
> Not with that data, but maybe if you increased the statistics target for the
> column to 100 or so, you'd catch enough values to get reasonable results.

Sorry, I'm no expert with PostgreSQL; could you tell me how to increase the statistics target?

In another email you said you applied a patch to CVS. Please let me know if you have anything for me to test, and I'll gladly test it immediately (I've a sandbox, so it's ok even if it corrupts the db ;).

Thanks!

---(end of broadcast)---
TIP 6: explain analyze is your friend
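[Editor's sketch of the answer: the per-column statistics target is raised with ALTER TABLE, after which ANALYZE re-collects the larger sample. The table name `client_info` is an assumption for illustration; the thread only names the `kernel_version` column.]

```sql
-- Raise the per-column statistics target to 100 (the value Tom suggested;
-- the default in 8.x is 10), then re-ANALYZE so the planner sees the
-- larger sample. Table name is hypothetical.
ALTER TABLE client_info ALTER COLUMN kernel_version SET STATISTICS 100;
ANALYZE client_info;
```

The target controls how many most-common values and histogram buckets ANALYZE stores for that column in pg_statistic.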
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 12:56:55AM -0700, Michael Fuhr wrote:
> WHERE ...
>   AND doy = EXTRACT(doy FROM now() - '24 hour'::interval)
>   AND doy = EXTRACT(doy FROM now())

To work on 1 Jan this should be more like

WHERE ...
  AND (doy = EXTRACT(doy FROM now() - '24 hour'::interval)
       OR doy = EXTRACT(doy FROM now()))

In any case the point is to add conditions to the WHERE clause that will use an index on the table for which you're currently getting a sequential scan.

-- Michael Fuhr
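[Editor's note: why the OR is needed on 1 Jan can be checked outside the database. This plain-Python sketch mirrors what EXTRACT(doy FROM ...) returns:]

```python
from datetime import datetime, timedelta

def doy(dt):
    """Day of year, as EXTRACT(doy FROM ...) would report it."""
    return dt.timetuple().tm_yday

# On 1 Jan, "24 hours ago" falls in the previous year, so its day-of-year
# is 365 (or 366), not 0 -- ANDing both equalities can never match then,
# while the OR form matches rows from either day.
now = datetime(2006, 1, 1, 12, 0)
yesterday = now - timedelta(hours=24)
print(doy(now), doy(yesterday))  # 1 365
```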
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, 2006-01-10 at 22:40 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > I meant use the same sampling approach as I was proposing for ANALYZE, but
> > do this at plan time for the query. That way we can apply the function
> > directly to the sampled rows and estimate selectivity.
> I think this is so unlikely to be a win as to not even be worth spending any
> time discussing. The extra planning time across all queries will vastly
> outweigh the occasional improvement in plan choice for some queries.

Extra planning time would be bad, so clearly we wouldn't do this when we already have relevant ANALYZE statistics. I would suggest we do this only when all of these are true:

- when accessing more than one table, so the selectivity could affect a join result
- when we have either no ANALYZE statistics, or the ANALYZE statistics are not relevant to estimating selectivity, e.g. LIKE
- when access against the single table in question cannot find an index to use from other RestrictInfo predicates

I imagined that this would also be controlled by a GUC, dynamic_sampling, which would be set to zero by default and would give a measure of the sample size to use. (Or just a bool enable_sampling = off (default).)

This is mentioned now because the plan under consideration in this thread would be improved by this action. It also isn't a huge amount of code to get it to work.

Best Regards, Simon Riggs
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Wed, Jan 11, 2006 at 09:07:45AM +, Simon Riggs wrote:
> I would suggest we do this only when all of these are true:
> - when accessing more than one table, so the selectivity could affect a join result

FWIW my problem only happens if I join: on the main table where the kernel_version string is stored (without joins), everything is always blazing fast. So this requirement certainly sounds fine to me.
Re: [PERFORM] 500x speed-down: Wrong statistics!
Tom Lane wrote:
> Alessandro Baretta [EMAIL PROTECTED] writes:
> > I have no clue as to how or why the statistics were wrong yesterday--as I
> > vacuum-analyzed continuously out of lack of any better idea--and I was
> > stupid enough to re-timestamp everything before selecting from pg_stats.
> Too bad. I would be interested to find out how, if the stats were
> up-to-date, the thing was still getting the row estimate so wrong. If you
> manage to get the database back into its prior state please do send along
> the pg_stats info.

I have some more information on this issue, which clears PostgreSQL's planner of all suspects. I am observing severe corruption of the bookkeeping fields managed by the xdbs rule/trigger complex. I am unable to pinpoint the cause right now, but the effect is that after running a few hours' test on the end-user application (which never interacts directly with xdbs_* fields, and thus cannot possibly mangle them), most tuples (the older ones, apparently) get their timestamps set to NULL.

Before vacuum-analyzing the table, yesterday's statistics were in effect, and the planner used the appropriate indexes. Now, after vacuum-analyzing the table, the pg_stats row for the xdbs_modified field no longer exists (!), and the planner has reverted to the Nested Loop Seq Scan join strategy. Hence, all the vacuum-analyzing I was doing while complaining about the planner was actually collecting completely screwed statistics, and this is why the ALTER TABLE ... SET STATISTICS 1000 did not help at all!

Ok. I plead guilty and ask for the clemency of the court. I'll pay my debt to society with a long term of pl/pgsql code debugging...

Alex
Re: [ADMIN] [PERFORM] Assimilation of these versus and hardware threads
> People: All of these recent threads about fastest hardware and who's better
> than who has inspired me to create a new website: http://www.dbtuning.org

Well, time to plug my web site, too, I guess: http://www.powerpostgresql.com

I've got a configuration primer up there, and the 8.0 Annotated .Conf file will be coming this week. That web site runs on Framewerk, a PostgreSQL-based CMS developed by our own Gavin Roy.

--Josh
Josh Berkus, Aglio Database Solutions, San Francisco
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 00:56:55 -0700), Michael Fuhr [EMAIL PROTECTED] confessed:

> On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote:
> > The query is now correct, but still is slow because of lack of index
> > usage. I don't know how to structure the query correctly to use the index.
> Have you tried adding restrictions on doy in the WHERE clause? Something
> like this, I think:

I cannot. That's what I thought I would get from the join. The query shown will always have two days involved, and only grows from there. The data is graphed at http://www.logicalchaos.org/weather/index.html, and I'm looking at adding historical data to the graphs.

Oops, never mind. You hit the nail on the head:

weather=# SELECT *, unmunge_time( time_group ) AS time,
weather-#        EXTRACT( doy FROM unmunge_time( time_group ) )
weather-# FROM minute.windspeed
weather-# JOIN doy_agg ON ( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-#   AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval )
weather-#               AND EXTRACT( doy FROM now() )
weather-# ORDER BY time_group;

QUERY PLAN
Sort (cost=21914.09..21914.10 rows=1 width=48) (actual time=76.595..76.662 rows=286 loops=1)
  Sort Key: windspeed.time_group
  -> Hash Join (cost=21648.19..21914.08 rows=1 width=48) (actual time=64.656..75.562 rows=286 loops=1)
       Hash Cond: (date_part('doy'::text, unmunge_time("outer".time_group)) = "inner".doy)
       -> Bitmap Heap Scan on windspeed (cost=2.27..267.40 rows=74 width=28) (actual time=0.585..1.111 rows=286 loops=1)
            Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
            -> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.27 rows=74 width=0) (actual time=0.566..0.566 rows=287 loops=1)
                 Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
       -> Hash (cost=21645.92..21645.92 rows=3 width=20) (actual time=63.849..63.849 rows=2 loops=1)
            -> HashAggregate (cost=21645.84..21645.89 rows=3 width=20) (actual time=63.832..63.834 rows=2 loops=1)
                 -> Bitmap Heap Scan on readings (cost=59.21..21596.85 rows=6532 width=20) (actual time=15.174..53.249 rows=7613 loops=1)
                      Recheck Cond: ((date_part('doy'::text, "when") >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, "when") <= date_part('doy'::text, now())))
                      -> Bitmap Index Scan on readings_doy_index (cost=0.00..59.21 rows=6532 width=0) (actual time=12.509..12.509 rows=10530 loops=1)
                           Index Cond: ((date_part('doy'::text, "when") >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, "when") <= date_part('doy'::text, now())))
Total runtime: 77.177 ms

What I had thought is that PG would (could?) be smart enough to realize that one query was restricted, and apply that restriction to the other based on the join. I know it works in other cases (using indexes on both tables using the join)...

> Something else occurred to me: do you (or will you) have more than one year
> of data? If so then matching on doy could be problematic unless you also
> check for the year, or unless you want to match more than one year.

Yes and yes. I'm doing both aggregate by day of the year for all data, and aggregate by day of year within each year. The examples are:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed
-----+------------------+---------------
  11 | 6.14058239764748 |            69
(1 row)

weather=# select * from doy_day_agg where extract( doy from day ) = extract( doy from now() );
         day         |  avg_windspeed   | max_windspeed
---------------------+------------------+---------------
 2004-01-11 00:00:00 | 5.03991313397539 |            17
 2006-01-11 00:00:00 |  18.532050716667 |            69
 2005-01-11 00:00:00 |  3.6106763448041 |            13

Thanks for your help Michael.

Cheers, Rob
-- 07:07:30 up 3 days, 23:34, 9 users, load average: 2.29, 2.44, 2.43 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 07:26:59 -0700), Robert Creager [EMAIL PROTECTED] confessed:

> weather=# SELECT *, unmunge_time( time_group ) AS time,
> weather-#        EXTRACT( doy FROM unmunge_time( time_group ) )
> weather-# FROM minute.windspeed
> weather-# JOIN doy_agg ON ( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
> weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
> weather-#   AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval )
> weather-#               AND EXTRACT( doy FROM now() )
> weather-# ORDER BY time_group;

The more I think about it, the more I believe PG is missing an opportunity. The query is adequately constrained without the BETWEEN clause. Why doesn't PG see that? I realize I'm a hack and my db organization shows that...

The query is wrong as stated, as it won't work when the interval crosses a year boundary, but it's a stop gap for now.

Cheers, Rob
-- 07:58:30 up 4 days, 25 min, 9 users, load average: 2.13, 2.15, 2.22 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006
[PERFORM] Postgres8.0 planner chooses WRONG plan
Hi,

I am having a problem optimizing this query: the Postgres optimizer uses a plan which involves a seq-scan on a table. When I disable seq-scans it uses an index-scan, and obviously the query is much faster. All tables are vacuumed and analyzed daily, as per the docs. Why can't postgres use the index scan?

Postgres Version: 8.0.2
Platform: Fedora

Here is the explain analyze output. Let me know if any more information is needed. Can we make postgres use an index scan for this query?

Thanks!
Pallav.
---
explain analyze select * from provisioning.alerts where countystate = 'FL' and countyno = '099' and status = 'ACTIVE';

QUERY PLAN
Nested Loop (cost=3.45..15842.17 rows=1 width=125) (actual time=913.491..18992.009 rows=110 loops=1)
  -> Nested Loop (cost=3.45..15838.88 rows=1 width=86) (actual time=913.127..18958.482 rows=110 loops=1)
       -> Hash Join (cost=3.45..15835.05 rows=1 width=82) (actual time=913.093..18954.951 rows=110 loops=1)
            Hash Cond: ("outer".fkserviceinstancestatusid = "inner".serviceinstancestatusid)
            -> Hash Join (cost=2.38..15833.96 rows=2 width=74) (actual time=175.139..18952.830 rows=358 loops=1)
                 Hash Cond: ("outer".fkserviceofferingid = "inner".serviceofferingid)
                 -> Seq Scan on serviceinstance si (cost=0.00..15831.52 rows=7 width=78) (actual time=174.430..18948.210 rows=358 loops=1)
                      Filter: (((subplan) = 'FL'::text) AND ((subplan) = '099'::text))
                      SubPlan
                        -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.090..0.093 rows=1 loops=3923)
                        -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=265617)
                 -> Hash (cost=2.38..2.38 rows=3 width=4) (actual time=0.444..0.444 rows=0 loops=1)
                      -> Hash Join (cost=1.08..2.38 rows=3 width=4) (actual time=0.312..0.428 rows=1 loops=1)
                           Hash Cond: ("outer".fkserviceid = "inner".serviceid)
                           -> Seq Scan on serviceoffering so (cost=0.00..1.18 rows=18 width=8) (actual time=0.005..0.068 rows=18 loops=1)
                           -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=1)
                                -> Seq Scan on service s (cost=0.00..1.07 rows=1 width=4) (actual time=0.014..0.019 rows=1 loops=1)
                                     Filter: (servicename = 'alert'::text)
            -> Hash (cost=1.06..1.06 rows=1 width=16) (actual time=0.044..0.044 rows=0 loops=1)
                 -> Seq Scan on serviceinstancestatus sis (cost=0.00..1.06 rows=1 width=16) (actual time=0.017..0.024 rows=1 loops=1)
                      Filter: (status = 'ACTIVE'::text)
       -> Index Scan using pk_account_accountid on account a (cost=0.00..3.82 rows=1 width=8) (actual time=0.012..0.016 rows=1 loops=110)
            Index Cond: ("outer".fkaccountid = a.accountid)
  -> Index Scan using pk_contact_contactid on contact c (cost=0.00..3.24 rows=1 width=47) (actual time=0.014..0.018 rows=1 loops=110)
       Index Cond: ("outer".fkcontactid = c.contactid)
  SubPlan
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.072..0.075 rows=1 loops=110)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.079..0.082 rows=1 loops=110)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.086..0.089 rows=1 loops=110)
Total runtime: 18992.694 ms
(30 rows)

Time: 18996.203 ms
--

As you can see, the

  -> Seq Scan on serviceinstance si (cost=0.00..15831.52 rows=7 width=78) (actual time=174.430..18948.210 rows=358 loops=1)

was taking too long. The same query, when I disable the seq-scan, uses an index-scan and is much faster now:

set enable_seqscan=false;
SET
Time: 0.508 ms

explain analyze select * from provisioning.alerts where countystate = 'FL' and countyno = '099' and status = 'ACTIVE';

QUERY PLAN
Nested Loop (cost=9.10..16676.10 rows=1 width=125) (actual time=24.792..3898.939 rows=110 loops=1)
  -> Nested Loop
Re: [PERFORM] Index isn't used during a join.
Robert Creager [EMAIL PROTECTED] writes:
> What I had thought is that PG would (could?) be smart enough to realize
> that one query was restricted, and apply that restriction to the other
> based on the join. I know it works in other cases (using indexes on both
> tables using the join)...

The planner understands about transitivity of equality, ie given a = b and b = c it can infer a = c. It doesn't do any such thing for inequalities though, nor does it deduce f(a) = f(b) for arbitrary functions f. The addition Michael suggested requires much more understanding of the properties of the functions in your query than I think would be reasonable to put into the planner.

regards, tom lane
Re: [PERFORM] Slow query with joins
Bendik Rognlien Johansen [EMAIL PROTECTED] writes:
> Has anyone got any tips for speeding up this query? It currently takes
> hours to start.

Are the rowcount estimates close to reality? The plan doesn't look unreasonable to me if they are. It might help to increase work_mem to ensure that the hash tables don't spill to disk.

> Indexes: people_original_is_null btree (original) WHERE original IS NULL

This index seems poorly designed: the actual index entries are dead weight since all of them are necessarily NULL. You might as well make the index carry something that you frequently test in conjunction with "original IS NULL". For instance, if this particular query is a common case, you could replace this index with

CREATE INDEX people_deleted_original_is_null ON people(deleted) WHERE original IS NULL;

This index is still perfectly usable for queries that only say "original IS NULL", but it can also filter out rows with the wrong value of deleted. Now, if there are hardly any rows with deleted = true, maybe this won't help much for your problem. But in any case you ought to consider whether you can make the index entries do something useful.

regards, tom lane
Re: [PERFORM] Postgres8.0 planner chooses WRONG plan
Pallav Kalva [EMAIL PROTECTED] writes:
> I am having problem optimizing this query,

Get rid of the un-optimizable function inside the view. You've converted something that should be a join into an unreasonably large number of function calls.

> -> Seq Scan on serviceinstance si (cost=0.00..15831.52 rows=7 width=78) (actual time=174.430..18948.210 rows=358 loops=1)
>      Filter: (((subplan) = 'FL'::text) AND ((subplan) = '099'::text))
>      SubPlan
>        -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.090..0.093 rows=1 loops=3923)
>        -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=265617)

The bulk of the cost here is in the second subplan (0.061 * 265617 = 16202.637 msec total runtime), and there's not a darn thing Postgres can do to improve this because the work is all down inside a black-box function. In fact the planner does not even know that the function call is expensive, else it would have preferred a plan that requires fewer evaluations of the function. The alternative plan you show is *not* faster because it's an indexscan; it's faster because get_parametervalue is evaluated fewer times.

The useless sub-SELECTs atop the function calls are adding their own little increment of wasted time, too. I'm not sure how bad that is relative to the function calls, but it's certainly not helping.

regards, tom lane
Re: [PERFORM] Postgres8.0 planner chooses WRONG plan
Hi Tom,

Thanks for your input. The view was first written without using the function, but it's ugly and big with all the joins, and it's much slower that way. Below is the view without the function and its explain analyze output; as you can see, it takes almost 2 min to run the query with this view. Is there any way to optimize or make changes to this view?

Thanks!
Pallav.

View Definition
---
create or replace view provisioning.alertserviceinstanceold as
SELECT services.serviceinstanceid, a.accountid, c.firstname, c.lastname, c.email,
       services.countyno, services.countystate, services.listingtype AS listingtypename,
       services.status, services.affiliate, services.affiliatesub, services.domain
  FROM provisioning.account a
  JOIN common.contact c ON a.fkcontactid = c.contactid
  JOIN ( SELECT p1.serviceinstanceid, p1.accountid, p1.countyno, p2.countystate,
                p3.listingtype, p1.status, p1.affiliate, p1.affiliatesub, p1.domain
           FROM ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain,
                         si.fkaccountid AS accountid, p.value AS countyno, sis.status
                    FROM provisioning.service s
                    JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid
                    JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid
                    JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid
                    JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid
                    JOIN common.parameter p ON sip.fkparameterid = p.parameterid
                   WHERE s.servicename = 'alert'::text AND p.name = 'countyNo'::text) p1
           JOIN ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain,
                         si.fkaccountid AS accountid, p.value AS countystate, sis.status
                    FROM provisioning.service s
                    JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid
                    JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid
                    JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid
                    JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid
                    JOIN common.parameter p ON sip.fkparameterid = p.parameterid
                   WHERE s.servicename = 'alert'::text AND p.name = 'countyState'::text) p2
             ON p1.accountid = p2.accountid AND p1.serviceinstanceid = p2.serviceinstanceid
           JOIN ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain,
                         si.fkaccountid AS accountid, p.value AS listingtype, sis.status
                    FROM provisioning.service s
                    JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid
                    JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid
                    JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid
                    JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid
                    JOIN common.parameter p ON sip.fkparameterid = p.parameterid
                   WHERE s.servicename = 'alert'::text AND p.name = 'listingType'::text) p3
             ON p2.accountid = p3.accountid AND p2.serviceinstanceid = p3.serviceinstanceid) services
    ON a.accountid = services.accountid
 ORDER BY services.serviceinstanceid;

Explain Analyze
--
explain analyze select * from provisioning.alertserviceinstanceold where countystate = 'FL' and countyno = '099' and status = 'ACTIVE';

QUERY PLAN
Subquery Scan alertserviceinstanceold (cost=31954.24..31954.25 rows=1 width=328) (actual time=113485.801..113487.024 rows=110 loops=1)
  -> Sort (cost=31954.24..31954.24 rows=1 width=152) (actual time=113485.787..113486.123 rows=110 loops=1)
       Sort Key: si.serviceinstanceid
       -> Hash Join (cost=20636.38..31954.23 rows=1 width=152) (actual time=109721.688..113485.311 rows=110 loops=1)
            Hash Cond: ("outer".accountid = "inner".fkaccountid)
            -> Hash Join (cost=6595.89..16770.25 rows=228696 width=47) (actual time=1742.592..4828.396 rows=229855 loops=1)
                 Hash Cond: ("outer".contactid = "inner".fkcontactid)
                 -> Seq Scan on contact c (cost=0.00..4456.96 rows=228696 width=47) (actual time=0.006..1106.459 rows=229868 loops=1)
                 -> Hash (cost=6024.11..6024.11 rows=228711 width=8) (actual time=1742.373..1742.373 rows=0 loops=1)
                      -> Seq Scan on account a (cost=0.00..6024.11 rows=228711 width=8) (actual time=0.010..990.597 rows=229855 loops=1)
            -> Hash (cost=14040.49..14040.49 rows=1 width=117) (actual time=107911.397..107911.397 rows=0
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 08:02:37AM -0700, Robert Creager wrote:
> The query is wrong as stated, as it won't work when the interval crosses a
> year boundary, but it's a stop gap for now.

Yeah, I realized that shortly after I posted the original and posted a correction.

http://archives.postgresql.org/pgsql-performance/2006-01/msg00104.php

-- Michael Fuhr
Re: [PERFORM] Improving Inner Join Performance
Did you originally post some problem queries? The settings look OK, though 1G of memory isn't very much nowadays.

On Mon, Jan 09, 2006 at 09:56:52AM +0200, Andy wrote:
> shared_buffers = 10240
> effective_cache_size = 64000
> RAM on server: 1Gb.
> Andy.
>
> ----- Original Message -----
> From: Frank Wiles [EMAIL PROTECTED]
> To: Andy [EMAIL PROTECTED]
> Sent: Friday, January 06, 2006 7:12 PM
> Subject: Re: [PERFORM] Improving Inner Join Performance
>
> > On Fri, 6 Jan 2006 09:59:30 +0200 Andy [EMAIL PROTECTED] wrote:
> > > Yes I have indexes on all join fields. The tables have around 30
> > > columns each and around 100k rows. The database is vacuumed every hour.
> > What are your settings for:
> >   shared_buffers
> >   effective_cache_size
> > And how much RAM do you have in the server?
> > - Frank Wiles [EMAIL PROTECTED] http://www.wiles.org

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote:
> cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while
> preempt runs WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference

One thing you could do is change the LIKE to:

WHERE position(' PREEMPT ' in kernel_version) != 0

And then create a functional index on that:

CREATE INDEX indexname ON tablename ( position(' PREEMPT ' in kernel_version) );

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
Re: [PERFORM] Postgres8.0 planner chooses WRONG plan
On Wed, Jan 11, 2006 at 11:44:58AM -0500, Pallav Kalva wrote: Some view you've got there... you might want to break that apart into multiple views that are a bit easier to manage. service_instance_with_status is a likely candidate, for example. View Definition --- create or replace view provisioning.alertserviceinstanceold as SELECT services.serviceinstanceid, a.accountid, c.firstname, c.lastname, c.email, services.countyno, services.countystate, services.listingtype AS listingtypename, services.status, services.affiliate, services.affiliatesub, services.domain FROM provisioning.account a JOIN common.contact c ON a.fkcontactid = c.contactid JOIN ( SELECT p1.serviceinstanceid, p1.accountid, p1.countyno, p2.countystate, p3.listingtype, p1.status, p1.affiliate, p1.affiliatesub, p1.domain FROM ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain, si.fkaccountid AS accountid, p.value AS countyno, sis.status FROM provisioning.service s JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid JOIN common.parameter p ON sip.fkparameterid = p.parameterid WHERE s.servicename = 'alert'::text AND p.name = 'countyNo'::text) p1 JOIN ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain, si.fkaccountid AS accountid, p.value AS countystate, sis.status FROM provisioning.service s JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid JOIN 
common.parameter p ON sip.fkparameterid = p.parameterid WHERE s.servicename = 'alert'::text AND p.name = 'countyState'::text) p2 ON p1.accountid = p2.accountid AND p1.serviceinstanceid = p2.serviceinstanceid JOIN ( SELECT si.serviceinstanceid, si.affiliate, si.affiliatesub, si.domain, si.fkaccountid AS accountid, p.value AS listingtype, sis.status FROM provisioning.service s JOIN provisioning.serviceoffering so ON s.serviceid = so.fkserviceid JOIN provisioning.serviceinstance si ON so.serviceofferingid = si.fkserviceofferingid JOIN provisioning.serviceinstancestatus sis ON si.fkserviceinstancestatusid = sis.serviceinstancestatusid JOIN provisioning.serviceinstanceparameter sip ON si.serviceinstanceid = sip.fkserviceinstanceid JOIN common.parameter p ON sip.fkparameterid = p.parameterid WHERE s.servicename = 'alert'::text AND p.name = 'listingType'::text) p3 ON p2.accountid = p3.accountid AND p2.serviceinstanceid = p3.serviceinstanceid) services ON a.accountid = services.accountid ORDER BY services.serviceinstanceid; Explain Analyze -- explain analyze select * from provisioning.alertserviceinstanceold where countystate = 'FL' and countyno = '099' and status = 'ACTIVE' ; QUERY PLAN -- Subquery Scan alertserviceinstanceold (cost=31954.24..31954.25 rows=1 width=328) (actual time=113485.801..113487.024 rows=110 loops=1) - Sort (cost=31954.24..31954.24 rows=1 width=152) (actual time=113485.787..113486.123 rows=110 loops=1) Sort Key: si.serviceinstanceid - Hash Join (cost=20636.38..31954.23 rows=1 width=152) (actual time=109721.688..113485.311 rows=110 loops=1) Hash Cond: (outer.accountid = inner.fkaccountid) - Hash Join (cost=6595.89..16770.25 rows=228696 width=47) (actual time=1742.592..4828.396 rows=229855 loops=1) Hash Cond: (outer.contactid = inner.fkcontactid) - Seq Scan on contact c (cost=0.00..4456.96 rows=228696 width=47) (actual time=0.006..1106.459 rows=229868 loops=1) - Hash (cost=6024.11..6024.11 rows=228711 width=8) (actual time=1742.373..1742.373 rows=0 
loops=1) - Seq Scan on account a (cost=0.00..6024.11 rows=228711 width=8) (actual time=0.010..990.597 rows=229855 loops=1) - Hash (cost=14040.49..14040.49 rows=1 width=117) (actual time=107911.397..107911.397 rows=0
Re: [PERFORM] Slow query with joins
Yes, the rowcount estimates are real; however, it has been a long time since the last VACUUM FULL (there is never a good time). I have clustered the tables, reindexed, analyzed and vacuumed, and the plan now looks like this:

no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name, ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address, ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite, ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon, ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra, ad.deleted AS ad_deleted, co.id AS co_id, co.type AS co_type, co.value AS co_value, co.description AS co_description, co.priority AS co_priority, co.visible AS co_visible, co.searchable AS co_searchable, co.deleted AS co_deleted
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;

QUERY PLAN
Sort (cost=182866.49..182943.12 rows=30655 width=587)
  Sort Key: r.id
  -> Nested Loop Left Join (cost=0.00..170552.10 rows=30655 width=587)
       -> Nested Loop Left Join (cost=0.00..75054.96 rows=26325 width=160)
            -> Index Scan using people_deleted_original_is_null on people r (cost=0.00..1045.47 rows=23861 width=27)
                 Filter: ((NOT deleted) AND (original IS NULL))
            -> Index Scan using addresses_record_idx on addresses ad (cost=0.00..3.05 rows=4 width=137)
                 Index Cond: ("outer".id = ad.record)
       -> Index Scan using contacts_record_idx on contacts co (cost=0.00..3.32 rows=24 width=431)
            Index Cond: ("outer".id = co.record)
(10 rows)

Looks faster, but still very slow. I added limit 1000 and it has been running for about 25 minutes now with no output. top shows:

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
29994 postgres 18  0 95768  78m  68m R 17.0  7.7  0:53.27 postmaster

which is unusual; I usually get 99.9 %cpu for just about any query, which leads me to believe this is disk related.

postgresql.conf:
shared_buffers = 8192
work_mem = 8192
maintenance_work_mem = 524288

Hardware: 2x2.8GHz CPU, 1GB RAM

Could this be an issue related to lack of VACUUM FULL? The tables get a lot of updates. Thank you very much so far!

On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:
> Bendik Rognlien Johansen [EMAIL PROTECTED] writes:
> > Has anyone got any tips for speeding up this query? It currently takes
> > hours to start.
> Are the rowcount estimates close to reality? The plan doesn't look
> unreasonable to me if they are. It might help to increase work_mem to
> ensure that the hash tables don't spill to disk.
> > Indexes: people_original_is_null btree (original) WHERE original IS NULL
> This index seems poorly designed: the actual index entries are dead weight
> since all of them are necessarily NULL. You might as well make the index
> carry something that you frequently test in conjunction with "original IS
> NULL". For instance, if this particular query is a common case, you could
> replace this index with
> CREATE INDEX people_deleted_original_is_null ON people(deleted) WHERE original IS NULL;
> This index is still perfectly usable for queries that only say "original IS
> NULL", but it can also filter out rows with the wrong value of deleted.
> Now, if there are hardly any rows with deleted = true, maybe this won't
> help much for your problem. But in any case you ought to consider whether
> you can make the index entries do something useful.
> regards, tom lane
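[Editor's sketch, following Tom's work_mem suggestion upthread: work_mem can be raised for a single session rather than globally, which is a common way to give one big reporting query more sort/hash memory. In 8.x the value is in kilobytes, so the 8192 shown in this postgresql.conf is only 8MB; 65536 below is an illustrative 64MB, not a recommendation.]

```sql
SET work_mem = 65536;   -- 64MB for this session only; pick to fit available RAM
-- ... run the big join here ...
RESET work_mem;         -- revert to the postgresql.conf value
```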
Re: [PERFORM] Slow query with joins
I'd try figuring out if the join is the culprit or the sort is (by dropping the ORDER BY). work_mem is probably forcing the sort to spill to disk, and if your drives are rather busy... You might also get a win if you re-order the joins to people, contacts, addresses, if you know it will have the same result. In this case LIMIT won't have any real effect, because you have to go all the way through with the ORDER BY anyway.

On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote: [quoted message snipped; see above]

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Wed, Jan 11, 2006 at 12:40:32PM -0600, Jim C. Nasby wrote: On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote: cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while preempt runs WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference One thing you could do is change the like to: WHERE position(' PREEMPT ' in kernel_version) != 0 That alone fixed it, with this I don't even need the index (yet). Thanks a lot. And then create a functional index on that: CREATE INDEX indexname ON tablename ( position(' PREEMPT ' in kernel_version) ); The index only helps the above query with = 0 and not the one with != 0, but it seems not needed in practice. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: On Wed, Jan 11, 2006 at 12:40:32PM -0600, Jim C. Nasby wrote: On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote: cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while preempt runs WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference One thing you could do is change the like to: WHERE position(' PREEMPT ' in kernel_version) != 0 That alone fixed it, with this I don't even need the index (yet). Thanks a lot. The fix is online already w/o index: http://klive.cpushare.com/?branch=all&scheduler=preemptive Of course I'm still fully available to test any fix for the previous LIKE query if there's interest. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: CREATE INDEX indexname ON tablename ( position(' PREEMPT ' in kernel_version) ); The index only helps the above query with = 0 and not the one with != 0, but it seems not needed in practice. Hrm. If you need indexing then, you'll probably have to do 2 indexes with a WHERE clause... CREATE INDEX ... WHERE position(...) = 0; CREATE INDEX ... WHERE position(...) != 0; I suspect this is because of a lack of stats for functional indexes. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] NOT LIKE much faster than LIKE?
Jim C. Nasby [EMAIL PROTECTED] writes: On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: The index only helps the above query with = 0 and not the one with != 0, but it seems not needed in practice. I suspect this is because of a lack of stats for functional indexes. No, it's because != isn't an indexable operator. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
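Since "!=" is not indexable, the expression index can still serve the test if the predicate is rewritten with an indexable operator. A minimal sketch, assuming a hypothetical table name (the thread never names the table):

```sql
-- Sketch only: "klive" is a made-up table name standing in for the
-- thread's table. position() returns 0 when the substring is absent
-- and >= 1 otherwise, so "!= 0" can be rewritten as the
-- btree-indexable "> 0".
CREATE INDEX kernel_preempt_idx
    ON klive ( position(' PREEMPT ' in kernel_version) );

SELECT count(*)
  FROM klive
 WHERE position(' PREEMPT ' in kernel_version) > 0;  -- can use the index
```

Whether the planner actually chooses the index still depends on the estimated selectivity, which is the problem discussed elsewhere in this thread.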
Re: [PERFORM] Slow query with joins
The sort is definitively the culprit. When I removed it the query was instant. I tried setting work_mem = 131072 but it did not seem to help. I really don't understand this :-( Any other ideas? Thanks!

On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote: I'd try figuring out if the join is the culprit or the sort is (by dropping the ORDER BY). work_mem is probably forcing the sort to spill to disk, and if your drives are rather busy... You might also get a win if you re-order the joins to people, contacts, addresses, if you know it will have the same result. In this case LIMIT won't have any real effect, because you have to go all the way through with the ORDER BY anyway. [remainder of quoted exchange snipped; see above]

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
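The work_mem experiment above can also be run per-session rather than by editing postgresql.conf. A hedged sketch (the value is an example; trace_sort is a developer option, present in 8.1, that logs whether a sort ran in memory or spilled to disk):

```sql
-- Session-local experiment; nothing here touches postgresql.conf.
SHOW work_mem;              -- current setting, in KB
SET work_mem = 131072;      -- 128MB, for this session only
SET trace_sort = on;        -- log internal vs. external (on-disk) sorts
-- ...then re-run the slow query and check the server log...
RESET work_mem;
```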
[PERFORM] Showing Column Statistics Number
Hi, I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS n, is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. Thanks, Dave
Re: [PERFORM] Showing Column Statistics Number
On Wed, Jan 11, 2006 at 04:05:18PM -0600, Dave Dutcher wrote: I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with Alter table alter column set statistics n, is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. pg_attribute.attstattarget http://www.postgresql.org/docs/8.1/interactive/catalog-pg-attribute.html -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
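The catalog column Michael points to can be read directly; a small sketch ('mytable' is a placeholder name):

```sql
-- Per-column statistics targets for one table.
SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'mytable'::regclass
   AND attnum > 0            -- skip system columns
   AND NOT attisdropped;
-- attstattarget = -1 means "use default_statistics_target";
-- any other value was set explicitly via ALTER TABLE ... SET STATISTICS.
```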
[PERFORM] Extremely irregular query performance
Hi, I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) with 4Gb of RAM. I have recently noticed that the performance of some more complex queries is extremely variable and irregular. For example, I currently have a query that returns a small number of rows (5) by joining a dozen tables. Below are the running times obtained by repeatedly launching this query in psql (nothing else was running on the server at that time):

Time: 424.848 ms
Time: 1615.143 ms
Time: 15036.475 ms
Time: 83471.683 ms
Time: 163.224 ms
Time: 2454.939 ms
Time: 188.093 ms
Time: 158.071 ms
Time: 192.431 ms
Time: 195.076 ms
Time: 635.739 ms
Time: 164549.902 ms

As you can see, the performance is most of the time pretty good (less than 1 second), but every fourth or fifth time I launch the query the server seems to go into orbit. For the longer running times, I can see from 'top' that the server process uses almost 100% of a CPU. This is rather worrisome, as I cannot be confident of the overall performance of my application with so much variance in query response times. I suspect a configuration problem related to the cache mechanism (shared_buffers? effective_cache_size?), but to be honest I do not know where to start to diagnose it. I also noticed that the query plan can vary when the same query is launched two times in a row (with no other changes to the DB in between). Is there a random aspect to the query optimizer that could explain some of the observed variance in performance? Any help would be greatly appreciated. Thanks in advance, J-P ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] indexes on primary and foreign keys
I do a load of SQL joins using primary and foreign keys. What I would like to know is if PostgreSQL creates indexes on these columns automatically (in addition to using them to maintain referential integrity), or do I have to create an index manually on these columns as indicated below?

CREATE TABLE cities (
  city_id integer primary key,
  city_name varchar(50)
);
CREATE INDEX city_id_index ON cities(city_id);

Thanks for any insight. Burak
Re: [PERFORM] Extremely irregular query performance
Jean-Philippe Côté [EMAIL PROTECTED] writes: I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) with 4Gb of RAM. I have recently noticed that the performance of some more complex queries is extremely variable and irregular. For example, I currently have a query that returns a small number of rows (5) by joining a dozen of tables. A dozen tables? You're exceeding the geqo_threshold and getting a plan that has some randomness in it. You could either increase geqo_threshold if you can stand the extra planning time, or try increasing geqo_effort to get it to search a little harder and hopefully find a passable plan more often. See http://www.postgresql.org/docs/8.1/static/geqo.html http://www.postgresql.org/docs/8.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO I'm kinda surprised that you don't get better results with the default settings. We could tinker some more with the defaults, if you can provide evidence about better values ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
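The two settings Tom mentions can be tried per-session; a sketch with example values only (in 8.1 the defaults are geqo_threshold = 12 and geqo_effort = 5):

```sql
SHOW geqo_threshold;       -- 12 by default in 8.1
SET geqo_threshold = 14;   -- plan joins of up to 13 FROM items exhaustively
-- or, keeping GEQO, make it search harder:
SET geqo_effort = 8;       -- range 1-10, default 5
```

Raising geqo_threshold trades longer but deterministic planning for the randomness of the genetic search.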
Re: [PERFORM] indexes on primary and foreign keys
Burak Seydioglu [EMAIL PROTECTED] writes: I do a load of sql joins using primary and foreign keys. What i would like to know if PostgreSQL creates indexes on these columns automatically (in addition to using them to maintain referential integrity) or do I have to create an index manually on these columns as indicated below? Indexes are only automatically created where needed to enforce a UNIQUE constraint. That includes primary keys, but not foreign keys. Note that you only really need an index on the referencing (non-unique) side of a foreign key if you are worried about performance of DELETEs or key changes on the referenced table. If you seldom or never do that, you might want to dispense with the index. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
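Tom's distinction can be sketched concretely; "orders" here is a made-up referencing table, not something from the thread:

```sql
CREATE TABLE orders (
    order_id integer PRIMARY KEY,        -- unique index created automatically
    city_id  integer REFERENCES cities   -- NO index is created for this
);

-- Worth adding only if DELETEs/key updates on cities (or joins on
-- orders.city_id) need to be fast:
CREATE INDEX orders_city_id_idx ON orders (city_id);
```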
Re: [PERFORM] Extremely irregular query performance
On Wed, 2006-01-11 at 16:37, Jean-Philippe Côté wrote: Hi, I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) with 4Gb of RAM. I have recently noticed that the performance of some more complex queries is extremely variable and irregular. For example, I currently have a query that returns a small number of rows (5) by joining a dozen of tables. Below are the running times obtained by repeatedly lauching this query in psql (nothing else was running on the server at that time): Time: 424.848 ms Time: 1615.143 ms Time: 15036.475 ms Time: 83471.683 ms Time: 163.224 ms Time: 2454.939 ms Time: 188.093 ms Time: 158.071 ms Time: 192.431 ms Time: 195.076 ms Time: 635.739 ms Time: 164549.902 ms As you can see, the performance is most of the time pretty good (less than 1 second), but every fourth of fifth time I launch the query the server seems to go into orbit. For the longer running times, I can see from 'top' that the server process uses almost 100% of a CPU. As mentioned earlier, it could be you're exceeding the GEQO threshold. It could also be that you are doing just enough else at the time, and have your shared buffers or sort mem high enough that you're initiating a swap storm. Mind posting all the parts of your postgresql.conf file you've changed from the default? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Extremely irregular query performance
Thanks a lot for this info, I was indeed exceeding the genetic optimizer's threshold. Now that it is turned off, I get a very stable response time of 435ms (more or less 5ms) for the same query. It is about three times slower than the best I got with the genetic optimizer on, but the overall average is much lower. I'll also try to play with the geqo parameters and see if things improve. Thanks again, J-P

-Original Message- From: Tom Lane, Sent: January 11, 2006 6:03 PM, Subject: Re: [PERFORM] Extremely irregular query performance [quoted message snipped; see above] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Extremely irregular query performance
If this is a query that will be executed more than once, you can also avoid incurring the planning overhead multiple times by using PREPARE. -- Mark Lewis On Wed, 2006-01-11 at 18:50 -0500, Jean-Philippe Côté wrote: Thanks a lot for this info, I was indeed exceeding the genetic optimizer's threshold. Now that it is turned off, I get a very stable response time of 435ms (more or less 5ms) for the same query. It is about three times slower than the best I got with the genetic optimizer on, but the overall average is much lower. I'll also try to play with the geqo parameters and see if things improve. Thanks again, J-P ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
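A minimal sketch of the PREPARE approach (the statement and parameter are placeholders, not the twelve-table join from this thread):

```sql
-- Planning happens once, at PREPARE time; each EXECUTE reuses the plan.
PREPARE big_join(integer) AS
    SELECT * FROM people WHERE id = $1;

EXECUTE big_join(42);
EXECUTE big_join(43);   -- no replanning, so no GEQO variance either

DEALLOCATE big_join;
```

The trade-off is that the plan is built without knowledge of the actual parameter values, so it may be worse than a freshly planned query for skewed data.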
Re: [PERFORM] indexes on primary and foreign keys
How about the performance effect on SELECT statements joining multiple tables (LEFT JOINs)? I have been reading all day and here is an excerpt from one article that is located at http://pgsql.designmagick.com/tutorial.php?id=19&pid=28

[quote] The best reason to use an index is for joining multiple tables together in a single query. When two tables are joined, a record that exists in both tables needs to be used to link them together. If possible, the column in both tables should be indexed. [/quote]

Regarding similar posts, I tried to search the archives but for some reason the search utility is not functioning. http://search.postgresql.org/archives.search?cs=utf-8&fm=on&st=20&dt=back&q=index

Thank you very much for your help. Burak

On 1/11/06, Michael Fuhr [EMAIL PROTECTED] wrote: On Wed, Jan 11, 2006 at 02:38:42PM -0800, Burak Seydioglu wrote: I do a load of sql joins using primary and foreign keys. What i would like to know if PostgreSQL creates indexes on these columns automatically (in addition to using them to maintain referential integrity) or do I have to create an index manually on these columns as indicated below?
CREATE TABLE cities (
  city_id integer primary key,
  city_name varchar(50)
);
CREATE INDEX city_id_index ON cities(city_id);

PostgreSQL automatically creates indexes on primary keys. If you run the above CREATE TABLE statement in psql you should see a message to that effect:

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "cities_pkey" for table "cities"

If you look at the table definition you should see the primary key's index:

test=> \d cities
        Table "public.cities"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 city_id   | integer               | not null
 city_name | character varying(50) |
Indexes:
    "cities_pkey" PRIMARY KEY, btree (city_id)

So you don't need to create another index on cities.city_id. However, PostgreSQL doesn't automatically create an index on the referring column of a foreign key constraint, so if you have another table like

CREATE TABLE districts (
  district_id   integer PRIMARY KEY,
  district_name varchar(50),
  city_id       integer REFERENCES cities
);

then you won't automatically get an index on districts.city_id. It's generally a good idea to create one; failure to do so can cause deletes and updates on the referred-to table (cities) to be slow because referential integrity checks would have to do sequential scans on the referring table (districts). Indeed, performance problems for exactly this reason occasionally come up in the mailing lists.

--
Michael Fuhr
[PERFORM] Please Help: PostgreSQL performance Optimization
Hi, I'm working on a project whose implementation deals with PostgreSQL. A brief description of our application is given below. I'm running version 8.0 on a dedicated server with 1Gb of RAM. My database isn't complex, it contains just 2 simple tables.

CREATE TABLE cookies (
  domain varchar(50) NOT NULL,
  path varchar(50) NOT NULL,
  name varchar(50) NOT NULL,
  principalid varchar(50) NOT NULL,
  host text NOT NULL,
  value text NOT NULL,
  secure bool NOT NULL,
  timestamp timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP + TIME '04:00:00',
  PRIMARY KEY (domain, path, name, principalid)
)

CREATE TABLE liberty (
  principalid varchar(50) NOT NULL,
  requestid varchar(50) NOT NULL,
  spassertionurl text NOT NULL,
  libertyversion varchar(50) NOT NULL,
  relaystate varchar(50) NOT NULL,
  PRIMARY KEY (principalid)
)

I'm developing an application that uses libpqxx to execute SQL queries on the database and has to execute 500 requests at the same time.

UPDATE cookies SET host='ping.icap-elios.com', value='54E5B5491F27C0177083795F2E09162D', secure=FALSE, timestamp=CURRENT_TIMESTAMP+INTERVAL '14400 SECOND' WHERE domain='ping.icap-elios.com' AND path='/tfs' AND principalid='192.168.8.219' AND name='jsessionid'

SELECT path, upper(name) AS name, value FROM cookies WHERE timestamp < CURRENT_TIMESTAMP AND principalid='192.168.8.219' AND secure=FALSE AND (domain='ping.icap-elios.com' OR domain='.icap-elios.com')

I have to note that the performance is extremely variable and irregular. I can also see that the server process uses almost 100% of a CPU. I'm using the default configuration file, and I am asking if I have to change some parameters to get good performance. Any help would be greatly appreciated. Thanks,
[PERFORM] Stable function being evaluated more than once in a single query
Hi, I've got a set-returning function, defined as STABLE, that I reference twice within a single query, yet it appears to be evaluated via two separate function scans. I created a simple query that calls the function below and joins the results to itself (Note: in case you wonder why I'd do such a query, it's not my actual query, which is much more complex. I just created this simple query to try to test out the 'stable' behavior).

select proname, provolatile from pg_proc where proname = 'get_tran_filesize';
      proname      | provolatile
-------------------+-------------
 get_tran_filesize | s
(1 row)

explain analyze select * from get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}');

                                  QUERY PLAN
------------------------------------------------------------------------------
 Function Scan on get_tran_filesize  (cost=0.00..12.50 rows=1000 width=40) (actual time=49.522..49.524 rows=3 loops=1)
 Total runtime: 49.550 ms
(2 rows)

explain analyze select * from get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}') gt, get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}') gt2 where gt.tran_id = gt2.tran_id;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=124.66..204.66 rows=5000 width=80) (actual time=83.027..83.040 rows=3 loops=1)
   Merge Cond: ("outer".tran_id = "inner".tran_id)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=40) (actual time=40.250..40.251 rows=3 loops=1)
         Sort Key: gt.tran_id
         ->  Function Scan on get_tran_filesize gt  (cost=0.00..12.50 rows=1000 width=40) (actual time=40.237..40.237 rows=3 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=40) (actual time=42.765..42.767 rows=3 loops=1)
         Sort Key: gt2.tran_id
         ->  Function Scan on get_tran_filesize gt2  (cost=0.00..12.50 rows=1000 width=40) (actual time=42.748..42.751 rows=3 loops=1)
 Total runtime: 83.112 ms
(9 rows)

If I do get this working, then my question is, if I reference this function within a single query, but within separate subqueries within the query, will it be re-evaluated each time, or just once.
Basically, I'm not clear on the definition of "surrounding query" in the following excerpt from the PostgreSQL documentation: "A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all calls within a single surrounding query." Thanks, Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
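One hedged workaround for forcing a single evaluation, using the names from the thread but a temporary table of my own invention:

```sql
-- Evaluate the set-returning function once, materialize the result,
-- then self-join the materialized copy instead of calling it twice.
CREATE TEMP TABLE tran_sizes AS
    SELECT * FROM get_tran_filesize('2005-12-11 00:00:00-08',
                                    '2006-01-11 15:58:33-08',
                                    '{228226,228222,228210}');

SELECT *
  FROM tran_sizes gt, tran_sizes gt2
 WHERE gt.tran_id = gt2.tran_id;
```

Note that STABLE only promises consistent results within one query; it does not by itself guarantee the planner will collapse two calls into one scan.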
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I think its OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could certainly use a less aggressive curve for that. You got a specific proposal? Non-left-anchored LIKE is most likely going to be used with unstructured, variable length data - else we might use SUBSTRING instead. My proposal would be to assume that LIKE is acting on human language text data. I considered this a while back, but wrote it off in favour of dynamic sampling - but it's worth discussing this to see whether we can improve on things without that. Here's one of the links I reviewed previously: http://www.ling.lu.se/persons/Joost/Texts/studling.pdf Sigurd et al [2004] This shows word frequency distribution peaks at 3 letter/2 phoneme words, then tails off exponentially after that. Clearly when the search string is longer than 3 characters the selectivity must tail off exponentially also, since we couldn't find words shorter than the search string itself. The search string might be a phrase, but it seems reasonable to assume that phrases also drop off in frequency according to length. It is difficult to decide what to do at len=2 or len=3, and I would be open to various thoughts, but would default to keeping like_selectivity as it is now. Sigurd et al show that word length tails off at 0.7^Len beyond Len=3, so selectivity FIXED_CHAR_SEL should not be more than 0.7, but I see no evidence for it being as low as 0.2 (from the published results). For simplicity, where Len > 3, I would make the tail off occur with factor 0.5, rather than 0.2. We could see a few more changes from those results, but curbing the aggressive tail off would be a simple and easy act.
Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
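The gap between the current factor and the proposed one is easy to eyeball. A rough illustration only, since the real like_selectivity computation involves more than a single per-character multiplier:

```sql
-- Compare per-character selectivity decay: 0.2 (the current
-- FIXED_CHAR_SEL discussed above) versus the proposed 0.5,
-- for pattern lengths 4..8.
SELECT len,
       round((0.2::float8 ^ len)::numeric, 6) AS factor_0_2,
       round((0.5::float8 ^ len)::numeric, 6) AS factor_0_5
  FROM generate_series(4, 8) AS len;
```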
Re: [PERFORM] indexes on primary and foreign keys
At 07:21 06/01/12, Michael Fuhr wrote: On Wed, Jan 11, 2006 at 02:38:42PM -0800, Burak Seydioglu wrote: I do a load of sql joins using primary and foreign keys. What i would like to know if PostgreSQL creates indexes on these columns automatically (in addition to using them to maintain referential integrity) or do I have to create an index manually on these columns as indicated below? CREATE TABLE cities ( city_id integer primary key, city_name varchar(50) ); CREATE INDEX city_id_index ON cities(city_id); PostgreSQL automatically creates indexes on primary keys. If you run the above CREATE TABLE statement in psql you should see a message to that effect: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Best regards, KC. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] indexes on primary and foreign keys
On Jan 12, 2006, at 9:36 , K C Lau wrote: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Set[1] your log_min_messages to WARNING or higher[2]. [1](http://www.postgresql.org/docs/current/interactive/sql-set.html) [2](http://www.postgresql.org/docs/current/interactive/runtime-config- logging.html#RUNTIME-CONFIG-LOGGING-WHEN) Michael Glaesemann grzm myrealbox com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] indexes on primary and foreign keys
On Thu, Jan 12, 2006 at 10:26:58AM +0900, Michael Glaesemann wrote: On Jan 12, 2006, at 9:36 , K C Lau wrote: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Set[1] your log_min_messages to WARNING or higher[2]. Or client_min_messages, depending on where you don't want to see the notice. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Extremely irregular query performance
Jean-Philippe Côté [EMAIL PROTECTED] writes:
> Thanks a lot for this info, I was indeed exceeding the genetic
> optimizer's threshold. Now that it is turned off, I get a very stable
> response time of 435ms (more or less 5ms) for the same query. It is
> about three times slower than the best I got with the genetic
> optimizer on, but the overall average is much lower.

Hmm. It would be interesting to use EXPLAIN ANALYZE to confirm that the plan found this way is the same as the best plan found by GEQO, and the extra couple hundred msec is the price you pay for the exhaustive plan search. If GEQO is managing to find a plan better than the regular planner then we need to look into why ...

regards, tom lane
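For context on the trade-off Tom describes: beyond geqo_threshold relations, the planner gives up exhaustive join-order search because the search space explodes, and the genetic optimizer trades plan quality for bounded planning time. A rough illustration of the growth (counting left-deep join orders only; bushy plan trees make the space even larger):

```python
# Why GEQO exists: the number of possible join orders grows factorially
# with the number of joined relations, so exhaustive search quickly
# becomes infeasible. This counts only left-deep orderings (n!).

from math import factorial

for n in (4, 8, 12):
    print(f"{n} relations: {factorial(n):,} left-deep join orders")
```

Twelve relations already admit nearly half a billion left-deep orderings, which is why an exhaustive search (GEQO off, as in Jean-Philippe's test) can cost a couple hundred extra milliseconds of planning time.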
Re: [PERFORM] indexes on primary and foreign keys
At 09:26 06/01/12, you wrote:
> On Jan 12, 2006, at 9:36 , K C Lau wrote:
> > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities
> > Is there a way to suppress this notice when I create tables in a script?
>
> Set[1] your log_min_messages to WARNING or higher[2].
>
> [1](http://www.postgresql.org/docs/current/interactive/sql-set.html)
> [2](http://www.postgresql.org/docs/current/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN)
>
> Michael Glaesemann grzm myrealbox com

Thanks. The side effect is that it would suppress other notices which might be useful. I was looking for a way to suppress the notice within the CREATE TABLE statement but could not. I noticed that when I specify a constraint name for the primary key, it creates an implicit index with the constraint name. So maybe if the optional constraint name is specified by the user, then the notice can be suppressed. Indeed the manual already says that the index will be automatically created.

BTW, there's an extra space in link[2] above which I have removed.

Best regards, KC.
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 10:33:03 -0500), Tom Lane [EMAIL PROTECTED] confessed:
> The planner understands about transitivity of equality, ie given a = b
> and b = c it can infer a = c. It doesn't do any such thing for
> inequalities though, nor does it deduce f(a) = f(b) for arbitrary
> functions f. The addition Michael suggested requires much more
> understanding of the properties of the functions in your query than I
> think would be reasonable to put into the planner.

OK. I think I've reached a point where I need to re-organize how the data is stored, maybe ridding myself of the schema and switching entirely to views. At that point, I could likely rid myself of the function (unmunge_time) I'm using, and work with times and doy fields.

Thanks, Rob

--
21:17:00 up 4 days, 13:43, 9 users, load average: 2.02, 2.18, 2.23
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006
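A toy illustration of the distinction Tom draws (my own sketch, not planner code): equalities can be merged into equivalence classes with a union-find structure, which is what makes "a = b and b = c, therefore a = c" cheap to deduce, whereas no comparable structure exists for inequalities or for deriving f(a) = f(b) for arbitrary functions:

```python
# Sketch of equality transitivity via equivalence classes (union-find).
# Equalities merge classes; membership in the same class implies
# equality. Nothing analogous works for < / > or arbitrary f(a) = f(b).

class EquivClasses:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's class (path-halving)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def add_equality(self, a, b):
        """Record a = b by merging the two classes."""
        self.parent[self.find(a)] = self.find(b)

    def implies_equal(self, a, b):
        return self.find(a) == self.find(b)

ec = EquivClasses()
ec.add_equality("a", "b")
ec.add_equality("b", "c")
print(ec.implies_equal("a", "c"))  # deduced transitively
```

The merge operation is only sound because equality is symmetric and transitive; inequalities and function applications lack those properties, which is why teaching the planner about them would require per-function knowledge.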
Re: [PERFORM] Stable function being evaluated more than once in a single query
Mark Liberman [EMAIL PROTECTED] writes:
> I've got a set-returning function, defined as STABLE, that I reference
> twice within a single query, yet it appears to be evaluated via two
> separate function scans.

There is no guarantee, express or implied, that this won't be the case. (Seems like we just discussed this a couple days ago...)

regards, tom lane
Re: [PERFORM] indexes on primary and foreign keys
K C Lau [EMAIL PROTECTED] writes:
> Thanks. The side effect is that it would suppress other notices which
> might be useful.

There's been some discussion of subdividing the present notice category into two subclasses, roughly defined as "only novices wouldn't know this" and "maybe this is interesting". What's missing at this point is a concrete proposal as to which existing NOTICE messages should go into each category. If you feel like tackling the project, go for it...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq