Re: [PERFORM] Limit + group + join

2005-08-29 Thread Merlin Moncure
Tobias wrote:
 Splendid :-) Unfortunately we will not be upgrading for some monthes
 still,
 but anyway I'm happy.  This provides yet another good argument for
 upgrading
 sooner.  I'm also happy to see such a perfect match:
 
  - A problem that can be reduced from beeing complex and
production-specific, to simple and easily reproducible.
 
  - Enthusiastic people testing it and pinpointing even more precisely
what
conditions will cause the condition
 
  - Programmers actually fixing the issue
 
  - Testers verifying that it was fixed
 
 Long live postgresql! :-)

In the last three or so years since I've been really active with
postgresql, I've found two or three issues/bugs which I was able to
reproduce and reduce to a test case.  In all instances the fix was in
cvs literally within minutes.

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
 c.id ORDER BY c.id DESC LIMIT 5;
 [ fails to pick an available index-scan-backward plan ]

I looked into this and found that indeed the desirable join plan was
getting generated, but it wasn't picked because query_planner didn't
have an accurate idea of how much of the join needed to be scanned to
satisfy the GROUP BY step.  I've committed some changes that hopefully
will let 8.1 be smarter about GROUP BY ... LIMIT queries.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Mark Kirkwood

Tom Lane wrote:


I looked into this and found that indeed the desirable join plan was
getting generated, but it wasn't picked because query_planner didn't
have an accurate idea of how much of the join needed to be scanned to
satisfy the GROUP BY step.  I've committed some changes that hopefully
will let 8.1 be smarter about GROUP BY ... LIMIT queries.



Very nice :-)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
QUERY PLAN 


--
 Limit  (cost=0.00..15.23 rows=5 width=4)
   -  Group  (cost=0.00..243730.00 rows=8 width=4)
 -  Nested Loop  (cost=0.00..243530.00 rows=8 width=4)
   -  Index Scan Backward using c_pkey on c 
(cost=0.00..1450.00 rows=8 width=4)
   -  Index Scan using b_on_c on b  (cost=0.00..3.01 
rows=1 width=4)

 Index Cond: (b.c_id = outer.id)
(6 rows)

This is 8.1devel from today.

regards

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Tobias Brox
[Tom Lane]
 I looked into this and (...) I've committed some changes that hopefully will
 let 8.1 be smarter about GROUP BY ... LIMIT queries.

[Mark Kirkwood]
 Very nice :-)
(...)
 This is 8.1devel from today.

Splendid :-) Unfortunately we will not be upgrading for some monthes still,
but anyway I'm happy.  This provides yet another good argument for upgrading
sooner.  I'm also happy to see such a perfect match:

 - A problem that can be reduced from beeing complex and
   production-specific, to simple and easily reproducible.
   
 - Enthusiastic people testing it and pinpointing even more precisely what
   conditions will cause the condition
   
 - Programmers actually fixing the issue
 
 - Testers verifying that it was fixed
 
Long live postgresql! :-) 

-- 
Notice of Confidentiality: This email is sent unencrypted over the network,
and may be stored on several email servers; it can be read by third parties
as easy as a postcard.  Do not rely on email for confidential information.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 What is interesting is why this plan is being rejected...

Which PG version are you using exactly?  That mistake looks like an
artifact of the 8.0 fuzzy plan cost patch, which we fixed recently:
http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php

But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
cost issue explains his results.

As far as the desc point goes, the problem is that mergejoins aren't
capable of dealing with backward sort order, so a merge plan isn't
considered for that case (or at least, it would have to have a sort
after it, which pretty much defeats the point for a fast-start plan).
I have some ideas about fixing this but it won't happen before 8.2.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


What is interesting is why this plan is being rejected...



Which PG version are you using exactly?  That mistake looks like an
artifact of the 8.0 fuzzy plan cost patch, which we fixed recently:
http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php



Right on - 8.0.3 (I might look at how CVS tip handles this, could be 
interesting).



But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
cost issue explains his results.

As far as the desc point goes, the problem is that mergejoins aren't
capable of dealing with backward sort order, so a merge plan isn't
considered for that case (or at least, it would have to have a sort
after it, which pretty much defeats the point for a fast-start plan).
I have some ideas about fixing this but it won't happen before 8.2.


That doesn't explain why the nested loop is being kicked tho', or have I 
missed something obvious? - it's been known to happen :-)...


Cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 That doesn't explain why the nested loop is being kicked tho',

No, but I think the fuzzy-cost bug does.  There are two different issues
here.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Mark Kirkwood
Interestingly enough, 7.4.8 and 8.1devel-2005-08-23 all behave the same 
as 8.0.3 for me (tables freshly ANALYZEd):


joinlimit=# SELECT version();
 version 


-
 PostgreSQL 7.4.8 on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 
3.4.2 [FreeBSD] 20040728

(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
  QUERY PLAN 


---
 Limit  (cost=10591.36..10591.39 rows=5 width=4)
   -  Group  (cost=10591.36..10992.02 rows=80131 width=4)
 -  Sort  (cost=10591.36..10791.69 rows=80131 width=4)
   Sort Key: c.id
   -  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
 Merge Cond: (outer.id = inner.c_id)
 -  Index Scan using c_pkey on c 
(cost=0.00..1411.31 rows=80131 width=4)
 -  Index Scan using b_on_c on b 
(cost=0.00..1451.72 rows=80172 width=4)

(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id  LIMIT 5;
   QUERY PLAN 


-
 Limit  (cost=0.00..0.27 rows=5 width=4)
   -  Group  (cost=0.00..4264.99 rows=80131 width=4)
 -  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
   Merge Cond: (outer.id = inner.c_id)
   -  Index Scan using c_pkey on c  (cost=0.00..1411.31 
rows=80131 width=4)
   -  Index Scan using b_on_c on b  (cost=0.00..1451.72 
rows=80172 width=4)

(6 rows)


joinlimit=# SELECT version();
  version 



 PostgreSQL 8.1devel on i386-unknown-freebsd5.4, compiled by GCC gcc 
(GCC) 3.4.2 [FreeBSD] 20040728

(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
  QUERY PLAN 


---
 Limit  (cost=10654.53..10654.55 rows=5 width=4)
   -  Group  (cost=10654.53..11054.53 rows=8 width=4)
 -  Sort  (cost=10654.53..10854.53 rows=8 width=4)
   Sort Key: c.id
   -  Merge Join  (cost=0.00..4139.44 rows=8 width=4)
 Merge Cond: (outer.id = inner.c_id)
 -  Index Scan using c_pkey on c 
(cost=0.00..1450.00 rows=8 width=4)
 -  Index Scan using b_on_c on b 
(cost=0.00..1490.00 rows=8 width=4)

(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id LIMIT 5;
   QUERY PLAN 


-
 Limit  (cost=0.00..0.27 rows=5 width=4)
   -  Group  (cost=0.00..4339.44 rows=8 width=4)
 -  Merge Join  (cost=0.00..4139.44 rows=8 width=4)
   Merge Cond: (outer.id = inner.c_id)
   -  Index Scan using c_pkey on c  (cost=0.00..1450.00 
rows=8 width=4)
   -  Index Scan using b_on_c on b  (cost=0.00..1490.00 
rows=8 width=4)

(6 rows)

The non default server params of relevance are:

shared_buffers = 12000
effective_cache_size = 10
work_mem/sort_mem = 20480

I did wonder if the highish sort_mem might be a factor, but no, with it 
 set to 1024 I get the same behaviour (just higher sort cost estimates).


Cheers

Mark

Tom Lane wrote:



Which PG version are you using exactly?  That mistake looks like an
artifact of the 8.0 fuzzy plan cost patch, which we fixed recently:
http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php

But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
cost issue explains his results.

As far as the desc point goes, the problem is that mergejoins aren't
capable of dealing with backward sort order, so a merge plan isn't
considered for that case (or at least, it would have to have a sort
after it, which pretty much defeats the point for a fast-start plan).
I have some ideas about fixing this but it won't happen before 8.2.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 As far as the desc point goes, the problem is that mergejoins aren't
 capable of dealing with backward sort order, so a merge plan isn't
 considered for that case (or at least, it would have to have a sort
 after it, which pretty much defeats the point for a fast-start plan).
 I have some ideas about fixing this but it won't happen before 8.2.

Of course in this case assuming id is an integer column you can just sort by
-id instead.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Limit + group + join

2005-08-25 Thread Tobias Brox
[Jeffrey W. Baker - Thu at 06:56:59PM -0700]
  explain select c.id from c join b on c_id=c.id  group by c.id order by c.id
  desc limit 5;
 
 Where's b in this join clause?

join b on c_id=c.id

It just a funny way of writing:

select c.id from c,b where c_id=c.id  group by c.id order by c.id desc limit 5;

 It looks like a cartesian product to me.

No.  The query will return exactly the same as the simplest query:

  select c.id from c order by c.id  desc limit 5;   

As said, this is a gross oversimplification of the production envorinment.
In the production environment, I really need to use both join, group and
limit.  I tested a bit with subqueries, it was not a good solution
(selecting really a lot of rows and aggregates from many of the tables).

The next idea is to hack it up by manually finding out where the limit
will cut, and place a restriction in the where-part of the query.

-- 
Notice of Confidentiality: This information may be confidential, and
blah-blah-blah - so please keep your eyes closed.  Please delete and destroy
this email.  Failure to comply will cause my lawyer to yawn.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-25 Thread Jeffrey W. Baker
On Thu, 2005-08-25 at 18:56 -0700, Jeffrey W. Baker wrote:
 On Fri, 2005-08-26 at 02:27 +0200, Tobias Brox wrote:
  Consider this setup - which is a gross simplification of parts of our
  production system ;-)
  
create table c (id integer primary key);
create table b (id integer primary key, c_id integer);
create index b_on_c on b(c_id)
  
insert into c (select ... lots of IDs ...);
insert into b (select id, id from c); /* keep it simple :-) */

  Now, I'm just interessted in some few rows.  
  
  All those gives good plans:
  
  explain select c.id from c order by c.id limit 1;
  explain select c.id from c group by c.id order by c.id limit 1;
  explain select c.id from c join b on c_id=c.id order by c.id limit 1;
  
  ... BUT ... combining join, group and limit makes havoc:
  
  explain select c.id from c join b on c_id=c.id  group by c.id order by c.id
  desc limit 5;
 
 Where's b in this join clause?  It looks like a cartesian product to me.

Nevermind.  I read c_id as c.id.

-jwb


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly