Re: [PERFORM] Linux mis-reporting memory
"Tom Lane" <[EMAIL PROTECTED]> writes: > Decibel! <[EMAIL PROTECTED]> writes: >> I'm finding this rather interesting report from top on a Debian box... > >> Mem: 32945280k total, 32871832k used,73448k free, 247432k buffers >> Swap: 1951888k total,42308k used, 1909580k free, 30294300k cached > >> So how is it that linux thinks that 30G is cached? > > Why would you think that a number reported by the operating system has > something to do with Postgres' shared memory? I think his question is how can the kernel be using 30G for kernel buffers if it only has 32G total and 8G of that is taken up by Postgres's shared buffers. It seems to imply Linux is paging out sysV shared memory. In fact some of Heikki's tests here showed that Linux would do precisely that. If your working set really is smaller than shared buffers then that's not so bad. Those buffers really would be completely idle anyways. But if your working set is larger than shared buffers and you're just not thrashing it hard enough to keep it in RAM then it's really bad. The buffer Linux will choose to page out are precisely those that Postgres will likely choose shortly as victim buffers, forcing Linux to page them back in just so Postgres can overwrite them. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote:
> >> Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
> >> Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached
>
> It seems to imply Linux is paging out sysV shared memory. In fact some of
> Heikki's tests here showed that Linux would do precisely that.

But then why is it not reporting that in the "Swap: used" section? It only
reports 42308k used swap.

I have a box where I just executed a select count(*) three times against a
table which is ~5.5 GB on disk, and the count executed in under 4 seconds,
which I take to mean it is all cached (shared memory is set to 12GB - I use
the box for testing for now, otherwise I would set it far lower because I
have bad experience with setting it to more than 1/3 of the available
memory). Top reported at the end of the process:

Mem:  16510724k total, 16425252k used,    85472k free,    10144k buffers
Swap:  7815580k total,   157804k used,  7657776k free, 15980664k cached

I also watched it during the selects, but it was not significantly
different. So my only conclusion is that the reported "cached" value either
includes the shared memory or is simply wrong... or I just don't get how
Linux handles memory.

Cheers,
Csaba.
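As a rough illustration, this kind of cache check can be reproduced from psql on the server itself (the table name below is only a hypothetical stand-in):

  -- repeat the count a few times and compare timings; a fully cached table
  -- should answer much faster on the second and third run
  \timing
  SELECT count(*) FROM big_table;   -- hypothetical ~5.5 GB table
  SELECT count(*) FROM big_table;
  SELECT count(*) FROM big_table;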
Re: [PERFORM] Linux mis-reporting memory
Hi,

On Friday 21 September 2007 01:04:01, Decibel! wrote:
> I'm finding this rather interesting report from top on a Debian box...

I've read from people in other free software development groups that the
memory usage figures reported by top/ps are not really useful or
trustworthy after all. A more precise tool seems to be exmap:

  http://www.berthels.co.uk/exmap/

Hope this helps,
--
dim
Re: [PERFORM] Linux mis-reporting memory
"Csaba Nagy" <[EMAIL PROTECTED]> writes: > On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote: >> >> Mem: 32945280k total, 32871832k used,73448k free, 247432k buffers >> >> Swap: 1951888k total,42308k used, 1909580k free, 30294300k cached >> > >> It seems to imply Linux is paging out sysV shared memory. In fact some of >> Heikki's tests here showed that Linux would do precisely that. > > But then why is it not reporting that in the "Swap: used" section ? It > only reports 42308k used swap. Hm, good point. The other possibility is that Postgres just hasn't even touched a large part of its shared buffers. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
> The other possibility is that Postgres just hasn't even touched a large
> part of its shared buffers.

But then how do you explain the example I gave, with a 5.5GB table
seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
almost 100% of memory as cached and no swap "used"? In this case you can't
say Postgres didn't touch its shared buffers - or does a sequential scan
not use the shared buffers?

Cheers,
Csaba.
[PERFORM] Searching for the cause of a bad plan
Hi all,

Postgres version: 8.2.4

Tables:

table_a(a bigint, b bigint, primary key(a, b));
table_b1(b bigint primary key, more columns...);
table_b2(b bigint primary key references table_b1(b), more columns...);

table_b1:
 - ~27M rows;
 - ~25 more columns;
 - width=309 (as reported by explain select *);

table_a:
 - ~400M rows;
 - column "b" should reference table_b1, but it does not, for performance
   reasons (it is an insert-only table);
 - column "a" distinct values: 1148;
 - has (a, b) as primary key;
 - has no additional columns;

table_b2:
 - ~40K rows;
 - ~70 more columns;
 - width=1788 (as reported by explain select *);

Statistics for the involved columns of each table are attached in files (to
preserve the spacing). They were taken after analyzing the relevant table
(except for table_b2, where I added the "fiddled" statistics first and then
remembered to analyze fresh, resulting in the "non_fiddled" version, which
gives the same result as the fiddled one).

The problem query is:

prepare test_001(bigint) as
SELECT tb.*
FROM table_a ta
JOIN table_b2 tb ON ta.b=tb.b
WHERE ta.a = $1
ORDER BY ta.a, ta.b
limit 10;

Explain gives Plan 1 (see attached plans.txt).

If I set enable_hashjoin=off and enable_mergejoin=off, I get Plan 2 (again,
see plans.txt). The difference is a 30x improvement in the second case...
(I actually forgot to account for cache effects at first, but I later reran
the queries multiple times and the timings are proportional.)

Additionally, if I replace table_b2 with table_b1 in the query, I get Plan 3
(with reasonable execution time) with both enable_hashjoin and
enable_mergejoin on. So there is something which makes table_b2 different
from table_b1 for planning purposes, but I could not identify what that
is... they have differences in statistics, but fiddling with the stats gave
me no difference in the plan.

Looking at Plan 2, it looks like the "limit" step is estimating its cost
wrongly. I guessed that it does that because it thinks the "b" values
selected from table_a for a given "a" span a larger range than the "b"
values in table_b2, because the "b" values in table_b2 are a (relatively
small) subset of the "b" values in table_a. But this is not the case: the
query only gets "a" values for which all the "b" values in table_a will be
found in table_b2. Of course the planner has no way to know this, but I
also don't think that is the issue, as I tried copying the histogram
statistics in pg_statistic for column "b" from the entry for table_b1
(which contains the whole span of "b" values) to the entry for table_b2,
with no change in the plan.

Just for the record, this query is just a part of a more complex one which
joins in bigger tables, resulting in even worse performance, but I tracked
the problem down to the planner refusing the nested loop.

Is there anything I could do to convince the planner to use the nested loop
plan here?

Thanks,
Csaba.
Attached statistics for table_a (columns "a" and "b"):

 attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
---------+-----------+-----------+------------+------------------+-------------------+------------------+-------------
 a       |         0 |         8 |       1148 | {31826743,31855101,31855343,31854918,31856328,31861573,31855122,31855130,31855189,31856426} | {0.005,0.0047,0.0043,0.004,0.004,0.004,0.0037,0.0037,0.0037,0.0037} | {31734956,31854364,31854732,31855162,31855660,31857144,31858109,31858965,31859762,31860576,31861566} | 0.999608
 b       |         0 |         8 |         -1 |                  |                   | {63977,36878147,42247866,42548692,42812320,46992026,51444368,55977972,56607708,59496742,68530614} | 0.602959

(The statistics for the other tables were attached as well, but are truncated here.)
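For reference, the comparison described above can be reproduced in a single psql session roughly as follows (a sketch; the parameter value is the one used later in the thread, and the planner settings must be changed before PREPARE, since the plan is fixed at prepare time):

  PREPARE test_001(bigint) AS
    SELECT tb.* FROM table_a ta JOIN table_b2 tb ON ta.b = tb.b
    WHERE ta.a = $1 ORDER BY ta.a, ta.b LIMIT 10;
  EXPLAIN ANALYZE EXECUTE test_001(31855344);   -- Plan 1 (hash join chosen)

  DEALLOCATE test_001;
  SET enable_hashjoin = off;
  SET enable_mergejoin = off;
  PREPARE test_001(bigint) AS
    SELECT tb.* FROM table_a ta JOIN table_b2 tb ON ta.b = tb.b
    WHERE ta.a = $1 ORDER BY ta.a, ta.b LIMIT 10;
  EXPLAIN ANALYZE EXECUTE test_001(31855344);   -- Plan 2 (nested loop forced)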
Re: [PERFORM] Linux mis-reporting memory
Csaba Nagy wrote:
> On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
>> The other possibility is that Postgres just hasn't even touched a large
>> part of its shared buffers.
>
> But then how do you explain the example I gave, with a 5.5GB table
> seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
> almost 100% memory as cached and no SWAP "used"? In this case you can't
> say postgres didn't touch its shared buffers - or a sequential scan
> won't use the shared buffers?

Which version of Postgres is this? In 8.3, a scan like that really won't
suck it all into the shared buffer cache. For seq scans on tables larger
than shared_buffers/4, it switches to the bulk read strategy, using only a
few buffers, and choosing the starting point with the scan synchronization
facility.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
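As a quick sanity check of whether a given table crosses that shared_buffers/4 threshold, something like the following can be run (a sketch; the table name is hypothetical):

  SHOW shared_buffers;                                   -- compare this / 4 ...
  SELECT pg_size_pretty(pg_relation_size('big_table'));  -- ... against the table size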
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 11:59 +0100, Simon Riggs wrote:
> Please re-run everything on clean tables without frigging the stats. We
> need to be able to trust what is happening is normal.

I did: the stats fiddling happened after getting the plans from a fresh
analyze, and I ran the query again with a fresh analyze just before sending
the mail, and the plan was the same. In fact I spent almost 2 days playing
with the query which triggers this behavior, until I tracked it down to
this join. Thing is that we have many queries which rely on this join, so
it is fairly important that we understand what happens there.

> Plan2 sees that b1 is wider, which will require more heap blocks to be
> retrieved. It also sees b1 is less correlated than b2, so again will
> require more database blocks to retrieve. Try increasing
> effective_cache_size.

effective_cache_size is set to ~2.7G; the box has 4G of memory. I increased
it now to 3.5G but it makes no difference. I increased it further to 4G, no
difference again.

> Can you post plans with/without LIMIT and with/without cursor, for both
> b1 and b2?

The limit is unfortunately an absolutely needed part of the query, it makes
no sense to try without. If it were acceptable to do it without the limit,
then it is entirely possible that the plan I get now would indeed be
better... but it is not acceptable.

Thanks,
Csaba.
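For reference, the kind of change being tested above can also be tried per session, roughly like this (a sketch; on these versions the value is given in 8kB pages unless a unit is specified):

  SET effective_cache_size = 458752;   -- roughly 3.5GB expressed in 8kB pages
  SHOW effective_cache_size;           -- confirm the active value, then re-run EXPLAIN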
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 12:03 +0200, Csaba Nagy wrote:
> prepare test_001(bigint) as
> SELECT tb.*
> FROM table_a ta
> JOIN table_b2 tb ON ta.b=tb.b
> WHERE ta.a = $1
> ORDER BY ta.a, ta.b
> limit 10;

Please re-run everything on clean tables without frigging the stats. We
need to be able to trust that what is happening is normal.

Plan 2 sees that b1 is wider, which will require more heap blocks to be
retrieved. It also sees b1 is less correlated than b2, so again will
require more database blocks to retrieve. Try increasing
effective_cache_size.

Can you post plans with/without LIMIT and with/without cursor, for both b1
and b2?

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 12:08 +0200, Csaba Nagy wrote:
> But then how do you explain the example I gave, with a 5.5GB table
> seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
> almost 100% memory as cached and no SWAP "used"?

Well, 6.5GB of shared_buffers could be swapped out and need not be swapped
back in to perform those 3 queries.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
>> Can you post plans with/without LIMIT and with/without cursor, for both
>> b1 and b2?
>
> The limit is unfortunately an absolutely needed part of the query

Understood, but not why I asked...

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 11:34 +0100, Heikki Linnakangas wrote:
> Which version of Postgres is this? In 8.3, a scan like that really won't
> suck it all into the shared buffer cache. For seq scans on tables larger
> than shared_buffers/4, it switches to the bulk read strategy, using only
> a few buffers, and choosing the starting point with the scan
> synchronization facility.

This was on 8.1.9, installed via apt-get on Debian 4.1.1-21. In any case
I'm pretty sure Linux swaps shared buffers, as I always got worse
performance with shared buffers set to more than about 1/3 of the memory.
But in that case the output of top is misleading.

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote:
> On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
>>> Can you post plans with/without LIMIT and with/without cursor, for both
>>> b1 and b2?
>>
>> The limit is unfortunately an absolutely needed part of the query
>
> Understood, but not why I asked...

Well, the same query without limit goes:

dbdop=# explain execute test_001(31855344);

                                    QUERY PLAN
------------------------------------------------------------------------------------------
 Sort  (cost=322831.85..322831.94 rows=36 width=1804)
   Sort Key: ta.a, ta.b
   ->  Hash Join  (cost=3365.60..322830.92 rows=36 width=1804)
         Hash Cond: (ta.b = tb.b)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..314541.78 rows=389648 width=16)
               Index Cond: (a = $1)
         ->  Hash  (cost=524.71..524.71 rows=41671 width=1788)
               ->  Seq Scan on table_b2 tb  (cost=0.00..524.71 rows=41671 width=1788)

I'm not sure what you mean by "without cursor" - maybe not using prepare?
Well, we set up the JDBC driver to always prepare the queries, as this
gives us much better worst-case plans than letting Postgres see the
parameter values, especially in queries with limit. So I simulate that when
explaining the behavior we see. All our limit queries are for interactive
display, so the worst case is of much higher importance for us than the
mean execution time... unfortunately Postgres has a tendency to take the
best mean-performance path rather than avoid the worst case, and it is not
easy to convince it otherwise.

Cheers,
Csaba.
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
Dennis, Thanks for your reply. No, the OLD server are no longer available (decommissioned) - the new servers are definitely better h\w. I do not have any queries to EXPLAIN ANALYZE as they are built by the application and I am not allowed to enable logging on for that server - so where do I go from here??? I am pretty much trying to make changes in the postgresql.conf file but don't have a CLUE as to what starting numbers I should be looking at to change(???) Here is the EXPLAIN ANALYZE for the ONE (1) query I do have...it takes 4 - 5 hours to run a SELECT with the 'EXPLAIN ANALYZE': QUERY PLAN - --- Limit (cost=100013612.76..299939413.70 rows=1 width=8) (actual time=10084289.859..10084289.861 rows=1 loops=1) -> Subquery Scan people_consent (cost=100013612.76..624068438343.99 rows=3121 width=8) (actual time=10084289.853..10084289.853 rows=1 loops=1) -> Append (cost=100013612.76..624068438312.78 rows=3121 width=815) (actual time=10084289.849..10084289.849 rows=1 loops=1) -> Nested Loop (cost=100013612.76..100013621.50 rows=2 width=815) (actual time=10084289.846..10084289.846 rows=1 loops=1) -> Unique (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.817..10084289.817 rows=1 loops=1) -> Sort (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.814..10084289.814 rows=1 loops=1) Sort Key: temp_consent.id -> Unique (cost=100013612.71..100013612.73 rows=2 width=36) (actual time=10084245.195..10084277.468 rows=7292 loops=1) -> Sort (cost=100013612.71..100013612.72 rows=2 width=36) (actual time=10084245.191..10084254.425 rows=7292 loops=1) Sort Key: id, daterecorded, answer -> Append (cost=100013515.80..100013612.70 rows=2 width=36) (actual time=10083991.226..10084228.613 rows=7292 loops=1) -> HashAggregate (cost=100013515.80..100013515.82 rows=1 width=36) (actual time=10083991.223..10083998.046 rows=3666 loops=1) -> Nested Loop (cost=10060.61..100013515.80 rows=1 width=36) (actual time=388.263..10083961.330 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013511.43 rows=1 width=36) (actual time=388.237..10083897.268 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013507.59 rows=1 width=24) (actual time=388.209..10083833.870 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013504.56 rows=1 width=24) (actual time=388.173..10083731.122 rows=3702 loops=1) Join Filter: ("inner".question_answer_id = "outer".id) -> Nested Loop (cost=60.61..86.33 rows=1 width=28) (actual time=13.978..114.768 rows=7430 loops=1) -> Index Scan using answers_answer_un on answers a (cost=0.00..5.01 rows=1 width=28) (actual time=0.084..0.088 rows=1 loops=1) Index Cond: ((answer)::text = 'Yes'::text) -> Bitmap Heap Scan on questions_answers qa (cost=60.61..81.23 rows=7 width=16) (actual time=13.881..87.112 rows=7430 loops=1) Recheck Cond: ((qa.answer_id = "outer".id) AND (((qa.question_tag)::text = 'consentTransfer'::text) OR ((qa.question_tag)::text = 'share WithEval'::text))) -> BitmapAnd (cost=60.61..60.61 rows=7 width=0) (actual time=13.198..13.198 rows=0 loops=1) -> Bitmap Index Scan on qs_as_answer_id (cost=0.00..5.27 rows=649 width=0) (actual time=9.689..9.689 rows=57804 loops=1)
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
In response to smiley2211 <[EMAIL PROTECTED]>: > > Dennis, > > Thanks for your reply. > > No, the OLD server are no longer available (decommissioned) - the new > servers are definitely better h\w. Says who? I've heard that one before, and I've seen it be false. Some wonk replaced a 1Ghz system with 1G of RAM and a high-end SCSI RAID 10 with a new 3ghz server with 4G of ram and a cheapo SATA-based RAID 5, but doesn't know he was better off with the older system? That may not apply to you, or it might. We don't know because you didn't give us details. > I do not have any queries to EXPLAIN ANALYZE as they are built by the > application and I am not allowed to enable logging on for that server - so > where do I go from here??? Update your resume. If you're expected to performance tune this system, but you're not allowed to enable logging and you can't get a look at the queries, you're going to looking for new employment soon, because you've been asked to do the impossible. > I am pretty much trying to make changes in the postgresql.conf file but > don't have a CLUE as to what starting numbers I should be looking at to > change(???) > > Here is the EXPLAIN ANALYZE for the ONE (1) query I do have...it takes 4 - 5 > hours to run a SELECT with the 'EXPLAIN ANALYZE': It's very difficult (if not impossible) to make sense of this output without the query itself. It would also be nice if your mail program didn't mangle the output, as it would save folks having to reconstruct it. > > > > QUERY PLAN > > > - > --- > Limit (cost=100013612.76..299939413.70 rows=1 width=8) (actual > time=10084289.859..10084289.861 rows=1 loops=1) >-> Subquery Scan people_consent (cost=100013612.76..624068438343.99 > rows=3121 width=8) (actual time=10084289.853..10084289.853 rows=1 loops=1) > -> Append (cost=100013612.76..624068438312.78 rows=3121 > width=815) (actual time=10084289.849..10084289.849 rows=1 loops=1) >-> Nested Loop (cost=100013612.76..100013621.50 rows=2 > width=815) (actual time=10084289.846..10084289.846 rows=1 loops=1) > -> Unique (cost=100013612.76..100013612.77 rows=2 > width=8) (actual time=10084289.817..10084289.817 rows=1 loops=1) >-> Sort (cost=100013612.76..100013612.77 rows=2 > width=8) (actual time=10084289.814..10084289.814 rows=1 loops=1) > Sort Key: temp_consent.id > -> Unique > (cost=100013612.71..100013612.73 rows=2 width=36) (actual > time=10084245.195..10084277.468 rows=7292 loops=1) >-> Sort > (cost=100013612.71..100013612.72 rows=2 width=36) (actual > time=10084245.191..10084254.425 rows=7292 loops=1) > Sort Key: id, daterecorded, > answer > -> Append > (cost=100013515.80..100013612.70 rows=2 width=36) (actual > time=10083991.226..10084228.613 rows=7292 loops=1) >-> HashAggregate > (cost=100013515.80..100013515.82 rows=1 width=36) (actual > time=10083991.223..10083998.046 rows=3666 loops=1) > -> Nested Loop > (cost=10060.61..100013515.80 rows=1 width=36) (actual > time=388.263..10083961.330 rows=3702 loops=1) >-> Nested > Loop (cost=10060.61..100013511.43 rows=1 width=36) (actual > time=388.237..10083897.268 rows=3702 loops=1) > -> > Nested Loop (cost=10060.61..100013507.59 rows=1 width=24) (actual > time=388.209..10083833.870 rows=3702 loops=1) > > -> Nested Loop (cost=10060.61..100013504.56 rows=1 width=24) (actual > time=388.173..10083731.122 rows=3702 loops=1) > > > Join Filter: ("inner".question_answer_id = "outer".id) > > > -> Nested Loop (cost=60.61..86.33 rows=1 width=28) (actual > time=13.978..114.768 rows=7430 loops=1) > > > -> Index Scan using answers_answer_un on answers a 
(cost=0.00..5.01 rows=1 > width=28) (actual time=0.084..0.088 rows=1 loops=1) >
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
I suffered the same fate when upgrading some time back. The single biggest issue for me was that the default 8.X setup changed what had been fast query plans using indexes to slow plans using sequential scans. Changing the random_page_cost in postgresql.conf from 4.0 to 2.0 (which indicates to Postgres that reading index pages isn't such a big deal, encouraging index use) solved most of these issues for me. Jeff -Original Message- From: smiley2211 [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 8:14 AM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!! Dennis, Thanks for your reply. No, the OLD server are no longer available (decommissioned) - the new servers are definitely better h\w. I do not have any queries to EXPLAIN ANALYZE as they are built by the application and I am not allowed to enable logging on for that server - so where do I go from here??? I am pretty much trying to make changes in the postgresql.conf file but don't have a CLUE as to what starting numbers I should be looking at to change(???) Here is the EXPLAIN ANALYZE for the ONE (1) query I do have...it takes 4 - 5 hours to run a SELECT with the 'EXPLAIN ANALYZE': QUERY PLAN - --- Limit (cost=100013612.76..299939413.70 rows=1 width=8) (actual time=10084289.859..10084289.861 rows=1 loops=1) -> Subquery Scan people_consent (cost=100013612.76..624068438343.99 rows=3121 width=8) (actual time=10084289.853..10084289.853 rows=1 loops=1) -> Append (cost=100013612.76..624068438312.78 rows=3121 width=815) (actual time=10084289.849..10084289.849 rows=1 loops=1) -> Nested Loop (cost=100013612.76..100013621.50 rows=2 width=815) (actual time=10084289.846..10084289.846 rows=1 loops=1) -> Unique (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.817..10084289.817 rows=1 loops=1) -> Sort (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.814..10084289.814 rows=1 loops=1) Sort Key: temp_consent.id -> Unique (cost=100013612.71..100013612.73 rows=2 width=36) (actual time=10084245.195..10084277.468 rows=7292 loops=1) -> Sort (cost=100013612.71..100013612.72 rows=2 width=36) (actual time=10084245.191..10084254.425 rows=7292 loops=1) Sort Key: id, daterecorded, answer -> Append (cost=100013515.80..100013612.70 rows=2 width=36) (actual time=10083991.226..10084228.613 rows=7292 loops=1) -> HashAggregate (cost=100013515.80..100013515.82 rows=1 width=36) (actual time=10083991.223..10083998.046 rows=3666 loops=1) -> Nested Loop (cost=10060.61..100013515.80 rows=1 width=36) (actual time=388.263..10083961.330 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013511.43 rows=1 width=36) (actual time=388.237..10083897.268 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013507.59 rows=1 width=24) (actual time=388.209..10083833.870 rows=3702 loops=1) -> Nested Loop (cost=10060.61..100013504.56 rows=1 width=24) -> (actual time=388.173..10083731.122 rows=3702 loops=1) Join Filter: ("inner".question_answer_id = "outer".id) -> Nested Loop (cost=60.61..86.33 rows=1 width=28) (actual time=13.978..114.768 rows=7430 loops=1) -> Index Scan using answers_answer_un on answers a (cost=0.00..5.01 -> rows=1 width=28) (actual time=0.084..0.088 rows=1 loops=1) Index Cond: ((answer)::text = 'Yes'::text) -> Bitmap Heap Scan on questions_answers qa (cost=60.61..81.23 rows=7 width=16) (actual time=13.881..87.112 rows=7430 loops=1) Recheck Cond: ((qa.answer_id = "outer".id) AND (((qa.question_tag)::text = 'consentTransfer'::text) OR ((qa.question_tag)::text = 
'share WithEval'::text))) -> BitmapAnd (cost=60.61..60.61 rows=7 width=0) (actual time=13.198..13.198 rows=0 loops=1) -> Bitmap Index Scan on qs_as_answer_id (cost=0.00..5.27 rows=649 -> width=0) (actual time=9.689..9.689 rows=57804 loops=1) Index Cond: (qa.answer_id = "outer".id) -> BitmapOr (cost=55.08..55.08 rows=6596 width=0) (actual time=2.563..2.563 rows=0 loops=1) -> Bitmap Index Scan on qs_as_qtag (cost=0.00..27.54 rows=3298 -> width=0) (actual time=1.923..1.923 r
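For reference, a minimal sketch of the random_page_cost change described above; it can be tried per session before editing postgresql.conf:

  SET random_page_cost = 2.0;   -- default is 4.0; lower values favour index scans
  SHOW random_page_cost;        -- confirm the active value, then re-test the slow queries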
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 14:12 +0200, Csaba Nagy wrote:
> Well, the same query without limit goes:

OK, thanks.

> I'm not sure what you mean by "without cursor" - maybe not using prepare?

Sorry, misread that.

===

I think I understand now: the cost of the LIMIT is being applied, but in
slightly the wrong way. The cost of the Nested Loop node is reduced by the
fraction LIMIT/(number of expected rows), which is only an approximation of
what we're doing. In Plan 2 this leads to the wildly wrong estimate that
each row costs 49,851 cost units to retrieve, which is about 50x wrong. In
Plan 3 that approximation leads to a more reasonable cost, so this works in
Plan 3 but doesn't in Plan 2.

What we should do is push down the effect of the LIMIT so that the cost of
the Index Scan on ta reflects the fact that it returns only 10 rows. It
correctly expects 388638 rows that match the value requested, but it is not
retrieving all of them. The executor handles the query efficiently but the
cost model doesn't reflect what the executor actually does, and so we pick
the wrong plan. Pushing down the LIMIT would only be possible when LIMIT
has a constant value at plan time, but that seems like most of the time to
my eyes.

The plan estimates should look like this for Plan 2 (changes marked **):

 Limit  (cost=0.00.. rows=10 width=1804)
   ->  Nested Loop  (cost=0.00..X rows=10 width=1804)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..**11.96** rows=**10** width=16)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788)
               Index Cond: (ta.b = tb.b)

Incidentally, the way out of this is to improve the stats by setting the
stats target to 1000 on column a of ta. That will allow the optimizer to
have a better estimate of the tail of the distribution of a, which should
then be more sensibly reflected in the cost of the Index Scan. That doesn't
solve the actual problem, but should help in your case.
Plans copied below for better clarity:

Plan 2:

db> explain analyze execute test_001(31855344);

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

Plan 3:

db> explain analyze execute test_001(31855344);

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..853.14 rows=10 width=325) (actual time=20.117..28.104 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..2024323.48 rows=23728 width=325) (actual time=20.116..28.101 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..327561.01 rows=388684 width=16) (actual time=0.023..0.027 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b1 on table_b1 tb  (cost=0.00..4.35 rows=1 width=309) (actual time=14.032..14.034 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 28.200 ms

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
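For reference, a sketch of the statistics-target change suggested above (the later posts in this thread confirm that lower targets work as well):

  ALTER TABLE table_a ALTER COLUMN a SET STATISTICS 1000;
  ANALYZE table_a;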
Re: [PERFORM] Searching for the cause of a bad plan
[snip]

OK, I was not able to follow your explanation, it's too deep for me into
what the planner does...

> Incidentally, the way out of this is to improve the stats by setting the
> stats target to 1000 on column a of ta. That will allow the optimizer to
> have a better estimate of the tail of the distribution of a, which
> should then be more sensibly reflected in the cost of the Index Scan.
> That doesn't solve the actual problem, but should help in your case.

OK, I can confirm that. I set the statistics target for column "a" on
table_a to 1000, analyzed, and got the plan below. The only downside is
that analyze became quite expensive on table_a: it took 15 minutes and
touched half of the pages... I will experiment with lower settings, maybe
it will work with less than 1000 too.

db> explain analyze execute test_001(31855344);

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4499.10 rows=10 width=1804) (actual time=103.566..120.363 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..344630.97 rows=766 width=1804) (actual time=103.563..120.359 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..67097.97 rows=78772 width=16) (actual time=71.965..77.284 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.51 rows=1 width=1788) (actual time=21.526..21.528 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 120.584 ms

Thanks,
Csaba.
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
> From: smiley2211
> Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
>
> -> Seq Scan on encounters_questions_answers eqa
>    (cost=1.00..17608.66 rows=464766 width=8) (actual
>    time=0.003..735.934 rows=464766 loops=7430)

It looks like enable_seqscan is set to false. For some reason that might
have worked on 7.4, but I would try turning it back on for 8.1. Sequential
scans aren't always bad; sometimes they are faster than index scans. I
would first try running the system with all the enable_* settings on.

If you can't turn on logging, it's going to be very hard to track down the
problem. The easiest way to track down a problem normally is to set
log_min_duration_statement to something like 2000ms. Then Postgres will log
all slow queries, and you can run EXPLAIN ANALYZE on the slow queries to
find the problem.

I think Carlos had a good idea when he asked about the encoding on the new
server vs the old. Does your application use the LIKE keyword to compare
text fields? If so, you might need to create indexes which use the
text_pattern_ops operator classes. With unicode, Postgres cannot use an
index scan for a query like SELECT * FROM foo WHERE name LIKE 'Bob%' unless
there is an index like CREATE INDEX name_index ON foo (name
text_pattern_ops). However, if you are not using LIKE queries, then this is
not your problem.

More on operator classes:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Dave
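A rough sketch of the suggestions above (the foo table and name column are just the illustrative names from the paragraph above):

  SET enable_seqscan = on;                -- make sure planner options are back at their defaults
  SET log_min_duration_statement = 2000;  -- log statements slower than 2s (superuser / postgresql.conf)

  -- allow index scans for LIKE 'prefix%' under a non-C locale
  CREATE INDEX name_index ON foo (name text_pattern_ops);
  EXPLAIN SELECT * FROM foo WHERE name LIKE 'Bob%';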
[PERFORM] Query planner unaware of possibly best plan
Hi,

I think the query planner is unaware of the possibly best plan in some
situations. See the test case below:

CREATE TABLE tparent (
  id  integer NOT NULL,
  ord integer NOT NULL,
  CONSTRAINT par_pkey_id  PRIMARY KEY (id),
  CONSTRAINT par_uniq_ord UNIQUE (ord)
);

CREATE TABLE tchild (
  par_id integer NOT NULL,
  ord    integer NOT NULL,
  CONSTRAINT chi_pkey_parid_ord PRIMARY KEY (par_id, ord),
  CONSTRAINT chi_fkey FOREIGN KEY (par_id) REFERENCES tparent(id)
);

INSERT INTO tparent VALUES (1, 3);
INSERT INTO tparent VALUES (2, 1);
INSERT INTO tparent VALUES (3, 4);
INSERT INTO tparent VALUES (4, 5);
INSERT INTO tparent VALUES (5, 2);

INSERT INTO tchild VALUES (1, 2);
INSERT INTO tchild VALUES (1, 1);
INSERT INTO tchild VALUES (2, 1);
INSERT INTO tchild VALUES (2, 3);
INSERT INTO tchild VALUES (2, 2);
INSERT INTO tchild VALUES (3, 1);
INSERT INTO tchild VALUES (3, 2);
INSERT INTO tchild VALUES (4, 1);
INSERT INTO tchild VALUES (5, 2);
INSERT INTO tchild VALUES (5, 1);

ANALYZE tparent;
ANALYZE tchild;

SET enable_seqscan    TO false;
SET enable_bitmapscan TO false;
SET enable_hashjoin   TO false;
SET enable_mergejoin  TO false;
SET enable_sort       TO false;

EXPLAIN ANALYZE
SELECT *
FROM tparent
JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.ord BETWEEN 1 AND 4
ORDER BY tparent.ord, tchild.ord;

 Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual time=0.440..0.456 rows=9 loops=1)
   Sort Key: tparent.ord, tchild.ord
   ->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual time=0.179..0.270 rows=9 loops=1)
         ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 rows=4 width=8) (actual time=0.089..0.098 rows=4 loops=1)
               Index Cond: ((ord >= 1) AND (ord <= 4))
         ->  Index Scan using chi_pkey_parid_ord on tchild  (cost=0.00..9.93 rows=2 width=8) (actual time=0.023..0.028 rows=2 loops=4)
               Index Cond: (tchild.par_id = "outer".id)

Even though I forced the nested loop plan using both indexes (which returns
the rows in the correct order), there is a needless sort step on the top,
consuming half of the time even on such small tables. Now it's clear why
the planner did not choose this plan and why I had to force it: because it
isn't the best if the sort is still there.

The first time I posted this
( http://archives.postgresql.org/pgsql-general/2007-05/msg01306.php )
and read Tom's answer I was convinced that this is rarely a problem, but
now I don't think so, since I have run into it for the third time.

Can that sort step somehow be eliminated if the NestLoop's outer table is
being scanned via a unique index? If not, how can I rewrite my
indexes/query in such a way that it's still safe (the rows always come in
the order I want), but I don't have to wait for that needless sort?

I'm using PostgreSQL 8.1.8.

Thanks in advance,
Denes Daniel
Re: [PERFORM] Searching for the cause of a bad plan
> OK, I can confirm that. I set the statistics target for column "a" on
> table_a to 1000, analyzed, and got the plan below. The only downside is
> that analyze became quite expensive on table_a, it took 15 minutes and
> touched half of the pages... I will experiment with lower settings,
> maybe it will work with less than 1000 too.

So, just to finish this up: setting the statistics target to 100 worked
too, and it has an acceptable impact on analyze. My original (more
complicated) query is working fine now, with visible effects on server
load... Thanks Simon for your help!

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
Csaba Nagy <[EMAIL PROTECTED]> writes:
> Looking at Plan 2, it looks like the "limit" step is estimating its cost
> wrongly.

The reason you get a bad plan is that this rowcount estimate is so far off:

>   ->  Index Scan using pk_table_a on table_a ta
>       (cost=0.00..324786.18 rows=388532 width=16) (actual time=454.389..460.138 rows=2 loops=1)
>       Index Cond: (a = $1)

Raising the stats target helped no doubt because it didn't overestimate the
number of rows so much...

			regards, tom lane
Re: [PERFORM] query io stats and finding a slow query
>>> On Thu, Sep 20, 2007 at 4:36 PM, in message <[EMAIL PROTECTED]>,
"Kamen Stanev" <[EMAIL PROTECTED]> wrote:
>
> Is there a way to find which query is doing large io operations and/or
> which is using cached data and which is reading from disk.

A big part of your cache is normally in the OS, which makes that tough.

> please share your experience on how do you decide which
> queries to optimize and how to reorganize your database?

We base this on two things -- query metrics from our application framework
and user complaints about performance.

> Is there any tools that you use to profile your database.

Many people set log_min_duration_statement to get a look at long-running
queries. When you identify a problem query, running it with EXPLAIN ANALYZE
in front will show you the plan with estimated versus actual counts, costs,
and time. This does actually execute the query (unlike EXPLAIN without
ANALYZE).

-Kevin
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 16:26 +0200, Csaba Nagy wrote: > [snip] > > Ok, I was not able to follow your explanation, it's too deep for me into > what the planner does... I'm thinking that this case is too narrow to do too much with, when I think about how we might do what I proposed. OTOH this isn't the first bad plan we've had because we used the index for ordering. There might be some common link that we can improve upon. > > Incidentally, the way out of this is to improve the stats by setting > > stats target = 1000 on column a of ta. That will allow the optimizer to > > have a better estimate of the tail of the distribution of a, which > > should then be more sensibly reflected in the cost of the Index Scan. > > That doesn't solve the actual problem, but should help in your case. > > OK, I can confirm that. I set the statistics target for column "a" on > table_a to 1000, analyzed, and got the plan below. The only downside is > that analyze became quite expensive on table_a, it took 15 minutes and > touched half of the pages... I will experiment with lower settings, > maybe it will work with less than 1000 too. Well, we know there are ways of optimizing ANALYZE. ISTM we should be able to auto-select stats target based upon the shape of the frequency distribution of the column values. We'd need to make some calculations about the index cost model, but its probably worth it for the future. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Low CPU Usage
I'm doing several tests. Right now I did a VACUUM FULL ANALYZE in both servers. In the old one vacuum runs for about 354 seconds and in the new one 59 seconds. Then I have ran EXPLAIN ANALYZE SELECT * FROM fact_ven_renta fvr, dim_producto_std_producto dpp WHERE fvr.producto_std_producto_sk = dpp.producto_sk I have found that the plans aren't exactly the same. This is the plan for the old server: Hash Join (cost=449.55..8879.24 rows=136316 width=904) (actual time=50.734..1632.491 rows=136316 loops=1) Hash Cond: (fvr.producto_std_producto_sk = dpp.producto_sk) -> Seq Scan on fact_ven_renta fvr (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.029..452.716 rows=136316 loops=1) -> Hash (cost=403.69..403.69 rows=3669 width=676) (actual time=50.582..50.582 rows=3669 loops=1) -> Seq Scan on dim_producto_std_producto dpp (cost=0.00..403.69 rows=3669 width=676) (actual time=0.023..19.776 rows=3669 loops=1) Total runtime: 2022.293 ms And this is the plan for the new server: Hash Join (cost=412.86..9524.13 rows=136316 width=905) (actual time=9.421..506.376 rows=136316 loops=1) Hash Cond: ("outer".producto_std_producto_sk = "inner".producto_sk) -> Seq Scan on fact_ven_renta fvr (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.006..107.318 rows=136316 loops=1) -> Hash (cost=403.69..403.69 rows=3669 width=677) (actual time=9.385..9.385 rows=3669 loops=1) -> Seq Scan on dim_producto_std_producto dpp (cost=0.00..403.69 rows=3669 width=677) (actual time=0.003..3.157 rows=3669 loops=1) Total runtime: 553.619 ms I see an "outer" join in the plan for the new server. This is weird!!! There are the same databases in both servers. The old one runs this query for about 37 seconds and for the new one for about 301 seconds. Why are plans different? May the backup recovery process have had an error in the new server when restoring? I appreciate some help. Regards Agustin - Mensaje original De: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Para: pgsql-performance@postgresql.org Enviado: miércoles 19 de septiembre de 2007, 14:38:13 Asunto: [PERFORM] Low CPU Usage Hi all. Recently I have installed a brand new server with a Pentium IV 3.2 GHz, SATA Disk, 2GB of Ram in Debian 4.0r1 with PostgreSQL 8.2.4 (previously a 8.1.9). I have other similar server with an IDE disk, Red Hat EL 4 and PostgreSQL 8.2.3 I have almost the same postgresql.conf in both servers, but in the new one (I have more work_mem than the other one) things go really slow. I began to monitor i/o disk and it's really ok, I have test disk with hdparm and it's 5 times faster than the IDE one. Running the same queries in both servers in the new one it envolves almost 4 minutes instead of 18 seconds in the old one. Both databases are the same, I have vacuum them and I don't know how to manage this issue. The only weird thing is than in the older server running the query it uses 30% of CPU instead of 3 o 5 % of the new one!!! What's is happening with this server? I upgrade from 8.1.9 to 8.2.4 trying to solve this issue but I can't find a solution. Any ideas? Regards Agustin El Mundial de Rugby 2007 Las últimas noticias en Yahoo! Deportes: http://ar.sports.yahoo.com/mundialderugby Los referentes más importantes en compra/ venta de autos se juntaron: Demotores y Yahoo! Ahora comprar o vender tu auto es más fácil. Vistá ar.autos.yahoo.com/
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
> Csaba Nagy <[EMAIL PROTECTED]> writes:
>> Looking at Plan 2, it looks like the "limit" step is estimating its cost
>> wrongly.
>
> The reason you get a bad plan is that this rowcount estimate is so far
> off:

That's true, but it's not relevant, since the query would still be fast
even if that estimate was exactly right. With LIMIT 10, it wouldn't matter
how many rows were there as long as there were more than 10. The true
execution cost is limited, the cost model is not.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Searching for the cause of a bad plan
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
>> The reason you get a bad plan is that this rowcount estimate is so far
>> off:
>
> That's true, but it's not relevant,

Yes it is --- the reason it wants to use a hashjoin instead of a nestloop
is exactly that it thinks the loop would iterate too many times. (Ten is
already too many in this case --- if it had estimated five rows out of the
join, it'd have gone with the nestloop, since the cost estimate difference
at the top level is less than 2x.)

			regards, tom lane
Re: [PERFORM] Query planner unaware of possibly best plan
On Fri, 2007-09-21 at 17:36 +0200, Denes Daniel wrote:
> Even though I forced the nested loop plan using both indexes (which
> returns the rows in the correct order), there is a needless sort step on
> the top, consuming half of the time even on such small tables.
> Now it's clear why the planner did not choose this plan and why I had to
> force it: because it isn't the best if the sort is still there.

Ordering by parent, child is fairly common, but the variation you've got
here isn't that common. You'd need to make a case considering all the
alternatives; nobody will agree without a balanced case that includes what
is best for everyone.

Your EXPLAIN looks edited. Have you also edited the sort costs? They look
slightly higher than we might expect. Please provide the full normal
EXPLAIN output.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Low CPU Usage
>>> On Fri, Sep 21, 2007 at 12:30 PM, in message <[EMAIL PROTECTED]>,
<[EMAIL PROTECTED]> wrote:

> This is the plan for the old server:
> Hash Join  (cost=449.55..8879.24 rows=136316 width=904) (actual
> time=50.734..1632.491 rows=136316 loops=1)
. . .
> Total runtime: 2022.293 ms

> And this is the plan for the new server:
> Hash Join  (cost=412.86..9524.13 rows=136316 width=905) (actual
> time=9.421..506.376 rows=136316 loops=1)
. . .
> Total runtime: 553.619 ms

> I see an "outer" join in the plan for the new server. This is weird!!!
> There are the same databases in both servers.

That's just a matter of labeling the tables with role rather than alias.
The plans look the same to me.

> The old one runs this query for about 37 seconds and for the new one for
> about 301 seconds.

That's not what it looks like based on the EXPLAIN ANALYZE output. It looks
like run time dropped from two seconds to half a second. It seems as though
you either have a network delay delivering the results, or your application
is slow to read them. Exactly how are you arriving at those timings you're
reporting to us?

-Kevin
Re: [PERFORM] Low CPU Usage
I forgot to tell this plan was with Postgres 8.1.9 in the new server with postgres 8.2.4 in the new server the plan is the same as with te old one (the little difference in rows retrieved is that the database is yesterday snapshot). This is the plan for the new server with postgres 8.2.4: Hash Join (cost=449.55..8846.67 rows=135786 width=904) (actual time=10.823..467.746 rows=135786 loops=1) Hash Cond: (fvr.producto_std_producto_sk = dpp.producto_sk) -> Seq Scan on fact_ven_renta fvr (cost=0.00..6020.86 rows=135786 width=228) (actual time=0.007..81.268 rows=135786 loops=1) -> Hash (cost=403.69..403.69 rows=3669 width=676) (actual time=10.733..10.733 rows=3669 loops=1) -> Seq Scan on dim_producto_std_producto dpp (cost=0.00..403.69 rows=3669 width=676) (actual time=0.004..2.995 rows=3669 loops=1) Total runtime: 513.747 ms This query is running for about 200 seconds, doing dstat I don't see anything weird (regards to low cpu usage 2% or 3%) and normal i/o. In the old server I have 30% of cpu usage an high i/o and run faster!!! This is really weird. - Mensaje original De: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Para: pgsql-performance@postgresql.org Enviado: viernes 21 de septiembre de 2007, 14:30:45 Asunto: Re: [PERFORM] Low CPU Usage I'm doing several tests. Right now I did a VACUUM FULL ANALYZE in both servers. In the old one vacuum runs for about 354 seconds and in the new one 59 seconds. Then I have ran EXPLAIN ANALYZE SELECT * FROM fact_ven_renta fvr, dim_producto_std_producto dpp WHERE fvr.producto_std_producto_sk = dpp.producto_sk I have found that the plans aren't exactly the same. This is the plan for the old server: Hash Join (cost=449.55..8879.24 rows=136316 width=904) (actual time=50.734..1632.491 rows=136316 loops=1) Hash Cond: (fvr.producto_std_producto_sk = dpp.producto_sk) -> Seq Scan on fact_ven_renta fvr (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.029..452.716 rows=136316 loops=1) -> Hash (cost=403.69..403.69 rows=3669 width=676) (actual time=50.582..50.582 rows=3669 loops=1) -> Seq Scan on dim_producto_std_producto dpp (cost=0.00..403.69 rows=3669 width=676) (actual time=0.023..19.776 rows=3669 loops=1) Total runtime: 2022.293 ms And this is the plan for the new server: Hash Join (cost=412.86..9524.13 rows=136316 width=905) (actual time=9.421..506.376 rows=136316 loops=1) Hash Cond: ("outer".producto_std_producto_sk = "inner".producto_sk) -> Seq Scan on fact_ven_renta fvr (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.006..107.318 rows=136316 loops=1) -> Hash (cost=403.69..403.69 rows=3669 width=677) (actual time=9.385..9.385 rows=3669 loops=1) -> Seq Scan on dim_producto_std_producto dpp (cost=0.00..403.69 rows=3669 width=677) (actual time=0.003..3.157 rows=3669 loops=1) Total runtime: 553.619 ms I see an "outer" join in the plan for the new server. This is weird!!! There are the same databases in both servers. The old one runs this query for about 37 seconds and for the new one for about 301 seconds. Why are plans different? May the backup recovery process have had an error in the new server when restoring? I appreciate some help. Regards Agustin - Mensaje original De: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Para: pgsql-performance@postgresql.org Enviado: miércoles 19 de septiembre de 2007, 14:38:13 Asunto: [PERFORM] Low CPU Usage Hi all. Recently I have installed a brand new server with a Pentium IV 3.2 GHz, SATA Disk, 2GB of Ram in Debian 4.0r1 with PostgreSQL 8.2.4 (previously a 8.1.9). 
I have other similar server with an IDE disk, Red Hat EL 4 and PostgreSQL 8.2.3 I have almost the same postgresql.conf in both servers, but in the new one (I have more work_mem than the other one) things go really slow. I began to monitor i/o disk and it's really ok, I have test disk with hdparm and it's 5 times faster than the IDE one. Running the same queries in both servers in the new one it envolves almost 4 minutes instead of 18 seconds in the old one. Both databases are the same, I have vacuum them and I don't know how to manage this issue. The only weird thing is than in the older server running the query it uses 30% of CPU instead of 3 o 5 % of the new one!!! What's is happening with this server? I upgrade from 8.1.9 to 8.2.4 trying to solve this issue but I can't find a solution. Any ideas? Regards Agustin El Mundial de Rugby 2007 Las últimas noticias en Yahoo! Deportes: http://ar.sports.yahoo.com/mundialderugby Los referentes más importantes en compra/venta de autos se juntaron: Demotores y Yahoo!. Ahora comprar o vender tu auto es más fácil. Visitá http://ar.autos.yahoo.com/ Los referentes más importantes en compra/ venta de autos se juntaron: Demotores y Yahoo! Ahora comprar o vender tu auto es más fácil. Vistá ar.autos.yahoo.com/
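One way to separate execution time from the cost of shipping the wide result set to the client (a sketch, based on the query used above):

  -- if this is fast but SELECT * is slow, the time is going into returning
  -- ~136k wide rows to the client, not into executing the join itself
  SELECT count(*)
  FROM fact_ven_renta fvr, dim_producto_std_producto dpp
  WHERE fvr.producto_std_producto_sk = dpp.producto_sk;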
Re: [PERFORM] Low CPU Usage
> That's not what it looks like based on the EXPLAIN ANALYZE output.
> It looks like run time dropped from two seconds to half a second.
> It seems as though you either have a network delay delivering the
> results, or your application is slow to read them.
> Exactly how are you arriving at those timings you're reporting to us?

I noticed this in a daily process I run which normally takes 45 minutes and
with the new server takes 1:40.

Some days ago I began to do some tests with no success, then I opened
pgAdmin with this simple query to read 2 big tables and then compare disk
access:

SELECT *
FROM fact_ven_renta fvr, dim_producto_std_producto dpp
WHERE fvr.producto_std_producto_sk = dpp.producto_sk

fact_ven_renta has 136316 rows
dim_producto_std_producto has 3669 rows

I have tried all possible combinations of running pgAdmin and the query on
each server (the old one and the new one), without any difference, and I
only retrieve the first 100 records (I didn't count the network time in any
case). But the weird thing is that running the query on the new server
there is a lot of disk access and CPU usage; and with other applications on
the same server there is a lot of disk access too.
Re: [PERFORM] Low CPU Usage
In response to [EMAIL PROTECTED]:

> > That's not what it looks like based on the EXPLAIN ANALYZE output.
> > It looks like run time dropped from two seconds to half a second.
> > It seems as though you either have a network delay delivering the
> > results, or your application is slow to read them.
> > Exactly how are you arriving at those timings you're reporting to us?
>
> I noticed this in a daily process I run which normally takes 45 minutes
> and with the new server takes 1:40.
>
> Some days ago I began to do some tests with no success, then I opened
> pgAdmin with this simple query to read 2 big tables and then compare
> disk access:
>
> SELECT *
> FROM fact_ven_renta fvr, dim_producto_std_producto dpp
> WHERE fvr.producto_std_producto_sk = dpp.producto_sk
>
> fact_ven_renta has 136316 rows
> dim_producto_std_producto has 3669 rows

Run the tests from psql on the same server. As Kevin pointed out, the
_server_ is faster, but it appears as if the connection between pgAdmin and
this new server is slower somehow.

Are you sure of your speed/duplex settings on the network side? That's the
most common cause of this kind of thing in my experience. Try doing a raw
FTP transfer between the client and server and see if you get the speed you
should.

> I have tried all possible combinations of running pgAdmin and the query
> on each server (the old one and the new one), without any difference, and
> I only retrieve the first 100 records (I didn't count the network time in
> any case). But the weird thing is that running the query on the new
> server there is a lot of disk access and CPU usage; and with other
> applications on the same server there is a lot of disk access too.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 13:53 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
>>> The reason you get a bad plan is that this rowcount estimate is so far
>>> off:
>>
>> That's true, but it's not relevant,
>
> Yes it is --- the reason it wants to use a hashjoin instead of a
> nestloop is exactly that it thinks the loop would iterate too many
> times. (Ten is already too many in this case --- if it had estimated
> five rows out of the join, it'd have gone with the nestloop, since
> the cost estimate difference at the top level is less than 2x.)

That's not my perspective. If the LIMIT had been applied accurately to the
cost then the hashjoin would never even have been close to the nested join
in the first place. It's just chance that the frequency distribution is
favourable to us and thus amenable to using the hint of improving
stats_target. The improved knowledge of the distribution just hides the
fact that the cost model is still wrong: a cost of 45000 per row shows
this.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Low CPU Usage
>> > That's not what it looks like based on the EXPLAIN ANALYZE output. >> > It looks like run time dropped from two seconds to half a second. >> >> > It seems as though you either have a network delay delivering the results, >> > or your application is slow to read them. >> >> > Exactly how are you arriving at those timings you're reporting to us? >> >> I have noticed this in a daly process I run which involves normally 45 >> minutes and with the new server takes 1:40. >> >> Some days ago I began to do some tests with no success, then I opened >> PgAdmin with this simply query to read 2 big tables and then compare disk >> access. >> SELECT * >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp >> WHERE >> fvr.producto_std_producto_sk = dpp.producto_sk >> >> fact_ven_renta has 136316 rows >> dim_producto_std_producto has 3669 rows >Run the tests from psql on the same server. As Kevin pointed out, the >_server_ is faster, but it appears as if the connection between PGadmin and >this new server is slower somehow. It runs quickly!!! But I don't know how to compare, because it looks like it retrieves fields on demand; when I press Ctrl+End (go to the last record) it uses a lot of CPU and disk, but it still runs quickly. Correct me if I am wrong, but running PgAdmin on the same server there are no network delays! And while the server is processing the query there isn't any network traffic, because it is still working on the result. > Are you sure of your speed/duplex settings on the network side? That's > the most common cause of this kind of thing in my experience. Try doing > a raw FTP transfer between the client and server and see if you get the > speed you should. This isn't a dedicated database server; the client application and the server are running on the same machine!!! I have stopped the client application too, with the same results. Anyway I will do some network tests to find a solution.
Re: [PERFORM] Low CPU Usage
In response to [EMAIL PROTECTED]: > >> > That's not what it looks like based on the EXPLAIN ANALYZE output. > >> > It looks like run time dropped from two seconds to half a second. > >> > >> > It seems as though you either have a network delay delivering the > >> > results, > >> > or your application is slow to read them. > >> > >> > Exactly how are you arriving at those timings you're reporting to us? > >> > >> I have noticed this in a daly process I run which involves normally 45 > >> minutes and with the new server takes 1:40. > >> > >> Some days ago I began to do some tests with no success, then I opened > >> PgAdmin with this simply query to read 2 big tables and then compare disk > >> access. > >> SELECT * > >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp > >> WHERE > >> fvr.producto_std_producto_sk = dpp.producto_sk > >> > >> fact_ven_renta has 136316 rows > >> dim_producto_std_producto has 3669 rows > > >Run the tests from psql on the same server. As Kevin pointed out, the > >_server_ is faster, but it appears as if the connection between PGadmin and > >this new server is slower somehow. > > It runs quickly!!! But I don't know how to compare because looks like it > retrieve fields by demand, when I put ctrl+end (go to the last record) it use > a lot of CPU and disk, run quickly anyway. That's pretty odd. If you use \timing in psql, you can get execution time for each query, if it helps you track things down. > Correct me if am I wrong but, executing PgAdmin in the same server there > aren't networks delays! Not network, no. But the results of your explains seem to show that the query is executing much faster on the new system than the old, so the problem still becomes, "what is happening after the query completes that is so slow?" It's just that networking is ruled out. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Linux mis-reporting memory
On Thu, 20 Sep 2007, Decibel! wrote: I'm finding this rather interesting report from top on a Debian box... how is it that linux thinks that 30G is cached? top on Linux gives weird results when faced with situations where there's shared memory involved. I look at /proc/meminfo and run ipcs when I want a better idea what's going on. As good of an article on this topic as I've found is http://gentoo-wiki.com/FAQ_Linux_Memory_Management which recommends using free to clarify how big the disk cache really is. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query planner unaware of possibly best plan
Simon Riggs <[EMAIL PROTECTED]> wrote: > Ordering by parent, child is fairly common but the variation you've got > here isn't that common. You'd need to make a case considering all the > alternatives; nobody will agree without a balanced case that includes > what is best for everyone. > > Your EXPLAIN looks edited. Have you also edited the sort costs? They > look slightly higher than we might expect. Please provide the full > normal EXPLAIN output. > > -- > Simon Riggs > 2ndQuadrant http://www.2ndQuadrant.com I've just inserted some newlines, so it's easier to read than when my email client wraps the lines automatically. I did not touch the information itself. But here is the normal output of EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM tparent
JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.ord BETWEEN 1 AND 4
ORDER BY tparent.ord, tchild.ord;

                                   QUERY PLAN
---------------------------------------------------------------------------------------------
 Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual time=0.302..0.319 rows=9 loops=1)
   Sort Key: tparent.ord, tchild.ord
   ->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual time=0.181..0.267 rows=9 loops=1)
         ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 rows=4 width=8) (actual time=0.100..0.109 rows=4 loops=1)
               Index Cond: ((ord >= 1) AND (ord <= 4))
         ->  Index Scan using chi_pkey_parid_ord on tchild  (cost=0.00..9.93 rows=2 width=8) (actual time=0.020..0.026 rows=2 loops=4)
               Index Cond: (tchild.par_id = "outer".id)
 Total runtime: 0.412 ms
(8 rows)

The costs may be different because I've tuned the query planner's parameters. > Ordering by parent, child is fairly common but the variation you've got > here isn't that common. How else can you order by parent, child other than by first ordering by a unique key of parent, then something in child? (Except for child.parent_id, child.something, because that has all the information in child and can rely on a single multicolumn index.) Denes Daniel ---(end of broadcast)--- TIP 6: explain analyze is your friend
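For reference, a hypothetical reconstruction of the test tables, inferred only from the index names and row widths in the plan above (the actual DDL is not shown in this thread):

  CREATE TABLE tparent (
      id  integer PRIMARY KEY,
      ord integer NOT NULL
  );
  CREATE UNIQUE INDEX par_uniq_ord ON tparent (ord);

  CREATE TABLE tchild (
      par_id integer NOT NULL REFERENCES tparent (id),
      ord    integer NOT NULL,
      CONSTRAINT chi_pkey_parid_ord PRIMARY KEY (par_id, ord)
  );

With this layout, scanning par_uniq_ord returns the parents in ord order and the child primary key returns each parent's children in ord order, which is exactly the ordering the nested loop above already produces before the final Sort node.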
Re: [PERFORM] Low CPU Usage
> > >> > That's not what it looks like based on the EXPLAIN ANALYZE output. > > >> > It looks like run time dropped from two seconds to half a second. > > >> > > >> > It seems as though you either have a network delay delivering the > > >> > results, > > >> > or your application is slow to read them. > > >> > > >> > Exactly how are you arriving at those timings you're reporting to us? > > >> > > >> I have noticed this in a daly process I run which involves normally 45 > > >> minutes and with the new server takes 1:40. > > >> > > >> Some days ago I began to do some tests with no success, then I opened > > >> PgAdmin with this simply query to read 2 big tables and then compare > > >> disk access. > > >> SELECT * > > >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp > > >> WHERE > > >> fvr.producto_std_producto_sk = dpp.producto_sk > > >> > > >> fact_ven_renta has 136316 rows > > >> dim_producto_std_producto has 3669 rows >> > > >Run the tests from psql on the same server. As Kevin pointed out, the > > >_server_ is faster, but it appears as if the connection between PGadmin > > >and this new server is slower somehow. > > > > It runs quickly!!! But I don't know how to compare because looks like it > > retrieve fields by demand, when I put ctrl+end (go to the last record) it > > use a lot of CPU and disk, run quickly anyway. > That's pretty odd. If you use \timing in psql, you can get execution > time for each query, if it helps you track things down. Yes, on the new server running with \timing it takes 5.6 seconds and on the old server it takes 25 seconds. > > Correct me if am I wrong but, executing PgAdmin in the same server there > > aren't networks delays! > Not network, no. But the results of your explains seem to show that the > query is executing much faster on the new system than the old, so the > problem still becomes, "what is happening after the query completes that > is so slow?" It's just that networking is ruled out. It is connected at full 100Mb and it transfers many things quickly to clients. It is running Apache and JBoss, the transfer rate is good, and I did scp to copy many files and it is as quick as on the old server. I have no idea how to continue researching this problem. Now I'm going to do some network tests. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023
Re: [PERFORM] Query planner unaware of possibly best plan
Denes Daniel wrote: > I've just inserted some newlines, so it's better to read than when my > email-client wraps the lines automatically. Did not touch the information > itself. But here is the normal output of EXPLAIN ANALYZE: The best thing to do is paste them in a text file and send it as an attachment. > QUERY PLAN > --- > > Sort (cost=10132.10..10140.10 rows=8 width=16) (actual > time=0.302..0.319 rows=9 loops=1) >Sort Key: tparent.ord, tchild.ord >-> Nested Loop (cost=0.00..84.10 rows=8 width=16) (actual > time=0.181..0.267 rows=9 loops=1) > -> Index Scan using par_uniq_ord on tparent (cost=0.00..20.40 > rows=4 width=8) (actual time=0.100..0.109 rows=4 loops=1) >Index Cond: ((ord >= 1) AND (ord <= 4)) > -> Index Scan using chi_pkey_parid_ord on tchild > (cost=0.00..9.93 rows=2 width=8) (actual time=0.020..0.026 rows=2 > loops=4) >Index Cond: (tchild.par_id = "outer".id) > Total runtime: 0.412 ms > (8 rows) > > The costs may be different because I've tuned the query planner's > parameters. Why did you set enable_sort=off? It's not like sorting 9 rows is going to take any noticeable amount of time anyway. -- Alvaro Herrerahttp://www.advogato.org/person/alvherre "No hay hombre que no aspire a la plenitud, es decir, la suma de experiencias de que un hombre es capaz" ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query planner unaware of possibly best plan
In reply to Alvaro Herrera: > The best thing to do is paste them in a text file and send it as an > attachment. Okay, it's attached. > Why did you set enable_sort=off? It's not like sorting 9 rows is going > to take any noticeable amount of time anyway. Of course it's no problem for 9 rows, but this is only a test case. In production there will be much more data. I just wanted to show that the planner doesn't even consider a plan without a sort step, using purely index scans. Denes Daniel

EXPLAIN ANALYZE
SELECT * FROM tparent
JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.ord BETWEEN 1 AND 4
ORDER BY tparent.ord, tchild.ord;

                                   QUERY PLAN
---------------------------------------------------------------------------------------------
 Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual time=0.302..0.319 rows=9 loops=1)
   Sort Key: tparent.ord, tchild.ord
   ->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual time=0.181..0.267 rows=9 loops=1)
         ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 rows=4 width=8) (actual time=0.100..0.109 rows=4 loops=1)
               Index Cond: ((ord >= 1) AND (ord <= 4))
         ->  Index Scan using chi_pkey_parid_ord on tchild  (cost=0.00..9.93 rows=2 width=8) (actual time=0.020..0.026 rows=2 loops=4)
               Index Cond: (tchild.par_id = "outer".id)
 Total runtime: 0.412 ms
(8 rows)

---(end of broadcast)--- TIP 6: explain analyze is your friend
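A sketch of the experiment implied here, assuming the other planner settings are at their defaults: enable_sort only discourages sorts rather than forbidding them, so if the Sort node survives even with it turned off, the planner never generated a sort-free path for this query at all.

  SET enable_sort = off;

  EXPLAIN ANALYZE
  SELECT *
  FROM tparent
  JOIN tchild ON tchild.par_id = tparent.id
  WHERE tparent.ord BETWEEN 1 AND 4
  ORDER BY tparent.ord, tchild.ord;

  RESET enable_sort;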
Re: [PERFORM] Low CPU Usage
>From: [EMAIL PROTECTED] >Subject: Re: [PERFORM] Low CPU Usage > >I have no idea how to continue researching this problem. Now I'm going to do some networks tests. I would go back to the slow program and try to capture the slow queries in the log file. Once you have some queries which are running slowly, you can run EXPLAIN ANALYZE on them to see what the bottleneck is. It seems like you've found that pgAdmin is slow sending across the network, but we don't know if that has anything to do with your original problems. Just my 2 cents. Dave ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
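A minimal sketch of the logging Dave describes; the 500 ms threshold is only illustrative. To catch statements issued by the daily process itself, the setting has to be visible to that session, e.g. cluster-wide in postgresql.conf (log_min_duration_statement = 500, then reload), or per-session as below when reproducing the problem by hand:

  -- As a superuser, log any statement slower than 500 ms in this session:
  SET log_min_duration_statement = 500;

  -- Rerun the suspect work, pull the logged statements from the server log,
  -- and EXPLAIN ANALYZE them, for example:
  EXPLAIN ANALYZE
  SELECT *
  FROM fact_ven_renta fvr
  JOIN dim_producto_std_producto dpp
    ON fvr.producto_std_producto_sk = dpp.producto_sk;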
Re: [PERFORM] Low CPU Usage
[EMAIL PROTECTED] wrote: > That's pretty odd. If you use \timing in psql, you can get execution > time for each query, if it helps you track things down. Yes, in the new server running with \timing it consumes 5.6 seconds and in the old server it consumes 25 seconds. > > Correct me if am I wrong but, executing PgAdmin in the same server there aren't networks delays! > Not network, no. But the results of your explains seem to show that the > query is executing much faster on the new system than the old, so the > problem still becomes, "what is happening after the query completes that > is so slow?" It's just that networking is ruled out. Is connected to full 100Mb, it transfers many things quick to clients. Is running Apache adn JBoss, transfer rate is good, I did scp to copy many archives and is as quick as the old server. I have no idea how to continue researching this problem. Now I'm going to do some networks tests. See if this can be of some help to you: I experienced some problems with networks with win98 and winXP stations; the application was running with good performance most of the time, but then suddenly the performance slowed down. We noticed that the problem was the time to connect to the server, which was very slow. The problem also occurred when the internet link was down. Well, I don't know why, but when we excluded the win98 stations from the network the problem disappeared. I think it was some DNS problem (but I'm not sure), because one time when we cleared the nameserver clauses in /etc/resolv.conf the performance returned to normal. But we reinstalled the win98 machines with winXP too, so I don't know what happened exactly. The server OS was Mandriva Linux running postgres (8.0, I guess) and samba. Workstations connect via ODBC (whether giving the server's IP or its name to connect, the problem persisted). -- Luiz K. Matsumura Plan IT Tecnologia Informática Ltda. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Low CPU Usage
Luiz K. Matsumura wrote: Is connected to full 100Mb, it transfers many things quick to clients. Is running Apache adn JBoss, transfer rate is good, I did scp to copy many archives and is as quick as the old server. I have no idea how to continue researching this problem. Now I'm going to do some networks tests. Any chance this is your desktop machine, and you're also using it for audio? Microsoft built in a feature (!) that reduces network speed by 90% when music is playing: http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Low CPU Usage
>>> On Fri, Sep 21, 2007 at 3:40 PM, in message <[EMAIL PROTECTED]>, "Luiz K. Matsumura" <[EMAIL PROTECTED]> wrote: > but in suddenly the performance slow down. We noticed that the problem > was with the time to connect with the server, that was very slow. > I think that was some DNS problem (but not sure), because one time we > cleared nameserver clauses in the /etc/resolv.conf the performance > return to the normal. You may have it there. In some versions of Java, on Windows, connection times are horribly slow unless the machine's IP address has a reverse DNS entry. Perhaps the new machine lacks such an entry, or there's a different version of Java in use? -Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Low CPU Usage
It's a Debian 4.0r1 server without sound (ALSA is disabled), and I'm running the queries locally. It's happening when running the query locally with PgAdmin and via JDBC locally too. Yes, I have win98 and XP machines on my network; I will unplug them from the net and test again. On Monday I'll give you my answer. The last thing I did was disable IPv6, with the same results. Thank you very much for your help. - Original message From: Craig James <[EMAIL PROTECTED]> To: Luiz K. Matsumura <[EMAIL PROTECTED]> CC: [EMAIL PROTECTED]; pgsql-performance@postgresql.org Sent: Friday, September 21, 2007, 18:15:40 Subject: Re: [PERFORM] Low CPU Usage Luiz K. Matsumura wrote: >> Is connected to full 100Mb, it transfers many things quick to clients. >> Is running Apache adn JBoss, transfer rate is good, I did scp to copy >> many archives and is as quick as the old server. >> >> I have no idea how to continue researching this problem. Now I'm going >> to do some networks tests. Any chance this is your desktop machine, and you're also using it for audio? Microsoft built in a feature (!) that reduces network speed by 90% when music is playing: http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss Craig
Re: [PERFORM] Searching for the cause of a bad plan
Simon Riggs <[EMAIL PROTECTED]> writes: > That's not my perspective. If the LIMIT had been applied accurately to > the cost then the hashjoin would never even have been close to the > nested join in the first place. [ shrug... ] Your perspective is mistaken. There is nothing wrong with the way the LIMIT estimation is being done. The plan in question was

 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

and there are two fairly serious estimation errors here, neither related at all to the LIMIT: * five-orders-of-magnitude overestimate of the number of table_a rows that will match the condition on a; * enormous underestimate of the number of join rows --- it's apparently thinking only 0.0001 of the table_a rows will have a join partner, whereas at least for this case they all do. Had the latter estimate been right, the cost of pulling results this way would indeed have been something like 50K units per joined row, because of the large number of inner index probes per successful join. It might be interesting to look into why those estimates are so far off; the stats Csaba displayed don't seem to have any obvious oddity that would justify such bizarre results. But the LIMIT has got nothing to do with this. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
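One way to dig into those estimates, sketched with the table and column names from the plan (the statistics target value and the literal 12345 are only illustrative stand-ins for the real parameter):

  -- What does the planner currently believe about column a?
  SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'table_a' AND attname = 'a';

  -- Gather more detailed statistics for that column and refresh them:
  ALTER TABLE table_a ALTER COLUMN a SET STATISTICS 1000;
  ANALYZE table_a;

  -- Re-check the rowcount estimate for the value that misbehaved:
  EXPLAIN SELECT * FROM table_a WHERE a = 12345;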
Re: [PERFORM] Query planner unaware of possibly best plan
Simon Riggs <[EMAIL PROTECTED]> wrote: > On Fri, 2007-09-21 at 21:20 +0200, Dániel Dénes wrote: > > > The costs may be different because I've tuned the query planner's > > parameters. > > OK, understood. > > > > Ordering by parent, child is fairly common but the variation you've > > > got here isn't that common. > > How else can you order by parent, child other than first ordering by > > a unique key of parent, then something in child? (Except for > > child.parent_id, child.something because this has all the > > information in child and can rely on a single multicolumn index.) > > Why "except"? Whats wrong with ordering that way? > > Make the case. **I** want it is not sufficient... > > -- > Simon Riggs > 2ndQuadrant http://www.2ndQuadrant.com In reply to Simon Riggs <[EMAIL PROTECTED]>: > > How else can you order by parent, child other than first ordering by > > a unique key of parent, then something in child? (Except for > > child.parent_id, child.something because this has all the > > information in child and can rely on a single multicolumn index.) > > Why "except"? Whats wrong with ordering that way? Well, nothing, but what if I have to order by some other unique key? Of course I could do that by redundantly storing the parent's data in child and then creating a multicolumn index, but... Just to be clear: when I found this, I was trying to make a slightly different query. It was like: SELECT * FROM tparent JOIN tchild ON tchild.par_id = tparent.id WHERE tparent.uniqcol1 = 123 ORDER BY tparent.uniqcol2, tchild.ord; where there was a unique index on (tparent.uniqcol1, tparent.uniqcol2) and the columns are marked NOT NULL. I expected a plan like doing an index scan on parent.uniqcol2 where uniqcol1 = 123, and (using a nestloop and child's pkey) joining in the children in the correct order (without a sort). But I got something else, so I tried everything to get what I wanted -- just to see from the costs why the planner chose something else. After some time I found out that there is no such plan, so no matter what I do it will sort... So that's how I got here. But since the original problem isn't that clean & simple, I thought I'd make a test case that's easy to follow and illustrates the problem: that the planner doesn't even consider my plan. If it did, I think that'd be the one that gets executed. But tell me if I'm wrong somewhere. > Make the case. **I** want it is not sufficient... Sorry, I can't understand that... I'm far from perfect in English. Please clarify so I can do what you ask me to. Denes Daniel ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
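The redundant-column workaround alluded to above, sketched with hypothetical names (par_uniqcol2 and the index name are inventions for illustration; whether the planner then avoids the sort still depends on the rest of the query):

  -- Copy the parent's ordering column down into the child...
  ALTER TABLE tchild ADD COLUMN par_uniqcol2 integer;

  UPDATE tchild
  SET par_uniqcol2 = tparent.uniqcol2
  FROM tparent
  WHERE tparent.id = tchild.par_id;

  -- ...so one multicolumn index on the child can supply the desired order:
  CREATE INDEX chi_par_uniqcol2_ord ON tchild (par_uniqcol2, ord);

  SELECT *
  FROM tparent p
  JOIN tchild c ON c.par_id = p.id
  WHERE p.uniqcol1 = 123
  ORDER BY c.par_uniqcol2, c.ord;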
Re: [PERFORM] Query planner unaware of possibly best plan
Denes Daniel <[EMAIL PROTECTED]> writes: > Simon Riggs <[EMAIL PROTECTED]> wrote: >> Make the case. **I** want it is not sufficient... > Sorry, I can't understand that... I'm far from perfect in english. The point here is that you've repeated the same example N times without actually making a case that it's interesting to support. We have to think about the intellectual complexity that would be added to the planner to support this case, and the cycles that would be expended on every query (and wasted, for most queries) on trying to detect whether the case applies. If it were simple and cheap to do, these arguments wouldn't hold much weight, but it doesn't look to me like either is the case. Another problem is that it's not clear there's much to be gained. Avoiding the sort step is only interesting if the query produces so many rows that a sort would be expensive ... but if that's the case, it seems unlikely that a nestloop indexscan plan would be the best choice anyway. So basically this looks like a lot of work for a narrow and questionable gain. If you want it to happen you need to convince people that it's easier and more useful than it looks. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Low CPU Usage
On Thu, 20 Sep 2007, [EMAIL PROTECTED] wrote: Which other test can I do to find out if this is a hardware, kernel or postgres issue? The little test hdparm does is not exactly a robust hard drive benchmark. If you want to rule out hard drive transfer speed issues, take a look at the tests suggested at http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm and see how your results compare to the single SATA disk example I give there. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate