Re: [PERFORM] Limit + group + join
Tobias wrote: Splendid :-) Unfortunately we will not be upgrading for some monthes still, but anyway I'm happy. This provides yet another good argument for upgrading sooner. I'm also happy to see such a perfect match: - A problem that can be reduced from beeing complex and production-specific, to simple and easily reproducible. - Enthusiastic people testing it and pinpointing even more precisely what conditions will cause the condition - Programmers actually fixing the issue - Testers verifying that it was fixed Long live postgresql! :-) In the last three or so years since I've been really active with postgresql, I've found two or three issues/bugs which I was able to reproduce and reduce to a test case. In all instances the fix was in cvs literally within minutes. Merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Limit + group + join
Mark Kirkwood [EMAIL PROTECTED] writes: joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5; [ fails to pick an available index-scan-backward plan ] I looked into this and found that indeed the desirable join plan was getting generated, but it wasn't picked because query_planner didn't have an accurate idea of how much of the join needed to be scanned to satisfy the GROUP BY step. I've committed some changes that hopefully will let 8.1 be smarter about GROUP BY ... LIMIT queries. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Limit + group + join
Tom Lane wrote: I looked into this and found that indeed the desirable join plan was getting generated, but it wasn't picked because query_planner didn't have an accurate idea of how much of the join needed to be scanned to satisfy the GROUP BY step. I've committed some changes that hopefully will let 8.1 be smarter about GROUP BY ... LIMIT queries. Very nice :-) joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5; QUERY PLAN -- Limit (cost=0.00..15.23 rows=5 width=4) - Group (cost=0.00..243730.00 rows=8 width=4) - Nested Loop (cost=0.00..243530.00 rows=8 width=4) - Index Scan Backward using c_pkey on c (cost=0.00..1450.00 rows=8 width=4) - Index Scan using b_on_c on b (cost=0.00..3.01 rows=1 width=4) Index Cond: (b.c_id = outer.id) (6 rows) This is 8.1devel from today. regards Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Limit + group + join
[Tom Lane] I looked into this and (...) I've committed some changes that hopefully will let 8.1 be smarter about GROUP BY ... LIMIT queries. [Mark Kirkwood] Very nice :-) (...) This is 8.1devel from today. Splendid :-) Unfortunately we will not be upgrading for some monthes still, but anyway I'm happy. This provides yet another good argument for upgrading sooner. I'm also happy to see such a perfect match: - A problem that can be reduced from beeing complex and production-specific, to simple and easily reproducible. - Enthusiastic people testing it and pinpointing even more precisely what conditions will cause the condition - Programmers actually fixing the issue - Testers verifying that it was fixed Long live postgresql! :-) -- Notice of Confidentiality: This email is sent unencrypted over the network, and may be stored on several email servers; it can be read by third parties as easy as a postcard. Do not rely on email for confidential information. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
Mark Kirkwood [EMAIL PROTECTED] writes: What is interesting is why this plan is being rejected... Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 fuzzy plan cost patch, which we fixed recently: http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results. As far as the desc point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Limit + group + join
Tom Lane wrote: Mark Kirkwood [EMAIL PROTECTED] writes: What is interesting is why this plan is being rejected... Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 fuzzy plan cost patch, which we fixed recently: http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php Right on - 8.0.3 (I might look at how CVS tip handles this, could be interesting). But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results. As far as the desc point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. That doesn't explain why the nested loop is being kicked tho', or have I missed something obvious? - it's been known to happen :-)... Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
Mark Kirkwood [EMAIL PROTECTED] writes: That doesn't explain why the nested loop is being kicked tho', No, but I think the fuzzy-cost bug does. There are two different issues here. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Limit + group + join
Interestingly enough, 7.4.8 and 8.1devel-2005-08-23 all behave the same as 8.0.3 for me (tables freshly ANALYZEd): joinlimit=# SELECT version(); version - PostgreSQL 7.4.8 on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 3.4.2 [FreeBSD] 20040728 (1 row) joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5; QUERY PLAN --- Limit (cost=10591.36..10591.39 rows=5 width=4) - Group (cost=10591.36..10992.02 rows=80131 width=4) - Sort (cost=10591.36..10791.69 rows=80131 width=4) Sort Key: c.id - Merge Join (cost=0.00..4064.66 rows=80131 width=4) Merge Cond: (outer.id = inner.c_id) - Index Scan using c_pkey on c (cost=0.00..1411.31 rows=80131 width=4) - Index Scan using b_on_c on b (cost=0.00..1451.72 rows=80172 width=4) (8 rows) joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id LIMIT 5; QUERY PLAN - Limit (cost=0.00..0.27 rows=5 width=4) - Group (cost=0.00..4264.99 rows=80131 width=4) - Merge Join (cost=0.00..4064.66 rows=80131 width=4) Merge Cond: (outer.id = inner.c_id) - Index Scan using c_pkey on c (cost=0.00..1411.31 rows=80131 width=4) - Index Scan using b_on_c on b (cost=0.00..1451.72 rows=80172 width=4) (6 rows) joinlimit=# SELECT version(); version PostgreSQL 8.1devel on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 3.4.2 [FreeBSD] 20040728 (1 row) joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5; QUERY PLAN --- Limit (cost=10654.53..10654.55 rows=5 width=4) - Group (cost=10654.53..11054.53 rows=8 width=4) - Sort (cost=10654.53..10854.53 rows=8 width=4) Sort Key: c.id - Merge Join (cost=0.00..4139.44 rows=8 width=4) Merge Cond: (outer.id = inner.c_id) - Index Scan using c_pkey on c (cost=0.00..1450.00 rows=8 width=4) - Index Scan using b_on_c on b (cost=0.00..1490.00 rows=8 width=4) (8 rows) joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id LIMIT 5; QUERY PLAN - Limit (cost=0.00..0.27 rows=5 width=4) - Group (cost=0.00..4339.44 rows=8 width=4) - Merge Join (cost=0.00..4139.44 rows=8 width=4) Merge Cond: (outer.id = inner.c_id) - Index Scan using c_pkey on c (cost=0.00..1450.00 rows=8 width=4) - Index Scan using b_on_c on b (cost=0.00..1490.00 rows=8 width=4) (6 rows) The non default server params of relevance are: shared_buffers = 12000 effective_cache_size = 10 work_mem/sort_mem = 20480 I did wonder if the highish sort_mem might be a factor, but no, with it set to 1024 I get the same behaviour (just higher sort cost estimates). Cheers Mark Tom Lane wrote: Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 fuzzy plan cost patch, which we fixed recently: http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results. As far as the desc point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
Tom Lane [EMAIL PROTECTED] writes: As far as the desc point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. Of course in this case assuming id is an integer column you can just sort by -id instead. -- greg ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Limit + group + join
[Jeffrey W. Baker - Thu at 06:56:59PM -0700] explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5; Where's b in this join clause? join b on c_id=c.id It just a funny way of writing: select c.id from c,b where c_id=c.id group by c.id order by c.id desc limit 5; It looks like a cartesian product to me. No. The query will return exactly the same as the simplest query: select c.id from c order by c.id desc limit 5; As said, this is a gross oversimplification of the production envorinment. In the production environment, I really need to use both join, group and limit. I tested a bit with subqueries, it was not a good solution (selecting really a lot of rows and aggregates from many of the tables). The next idea is to hack it up by manually finding out where the limit will cut, and place a restriction in the where-part of the query. -- Notice of Confidentiality: This information may be confidential, and blah-blah-blah - so please keep your eyes closed. Please delete and destroy this email. Failure to comply will cause my lawyer to yawn. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
On Thu, 2005-08-25 at 18:56 -0700, Jeffrey W. Baker wrote: On Fri, 2005-08-26 at 02:27 +0200, Tobias Brox wrote: Consider this setup - which is a gross simplification of parts of our production system ;-) create table c (id integer primary key); create table b (id integer primary key, c_id integer); create index b_on_c on b(c_id) insert into c (select ... lots of IDs ...); insert into b (select id, id from c); /* keep it simple :-) */ Now, I'm just interessted in some few rows. All those gives good plans: explain select c.id from c order by c.id limit 1; explain select c.id from c group by c.id order by c.id limit 1; explain select c.id from c join b on c_id=c.id order by c.id limit 1; ... BUT ... combining join, group and limit makes havoc: explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5; Where's b in this join clause? It looks like a cartesian product to me. Nevermind. I read c_id as c.id. -jwb ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly