Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Jim C. Nasby
On Wed, Apr 06, 2005 at 06:35:10PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote:
  Can anyone suggest a more general rule?  Do we need for example to
  consider whether the relation membership is the same in two clauses
  that might be opposite sides of a range restriction?  It seems like
  
  a.x  b.y AND a.x  b.z
 
  In a case like this, you could actually look at the  data in b and see
  what the average range size is.
 
 Not with the current statistics --- you'd need some kind of cross-column
 statistics involving both y and z.  (That is, I doubt it would be
 helpful to estimate the average range width by taking the difference of
 independently-calculated mean values of y and z ...)  But yeah, in
 principle it would be possible to make a non-default estimate.

Actually, it might be possible to take a SWAG at it using the histogram
and correlation stats.

You know... since getting universally useful cross-platform stats seems
to be pretty pie-in-the-sky, would it be possible to generate more
complex stats on the fly from a sampling of a table? If you're looking
at a fairly sizeable table ISTM it would be worth sampling the rows on
10 or 20 random pages to see what you get. In this case, you'd want to
know the average difference between two fields. Other queries might want
something different.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)

2005-04-06 Thread Jim C. Nasby
On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote:
 Can anyone suggest a more general rule?  Do we need for example to
 consider whether the relation membership is the same in two clauses
 that might be opposite sides of a range restriction?  It seems like
 
   a.x  b.y AND a.x  b.z

In a case like this, you could actually look at the  data in b and see
what the average range size is. If you wanted to get really fancy, the
optimizer could decide how best to access a based on each row of b.

 probably can be treated as a range restriction on a.x for this purpose,
 but I'm much less sure that the same is true of
 
   a.x  b.y AND a.x  c.z

Well, this could end up being much trickier, since who knows how b and c
are related. Though thinking about it, although I threw out the
row-by-row analysis idea to be glib, that would actually work in this
case; you could take a look at what b and c look like each time 'through
the loop'.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)

2005-04-06 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote:
 Can anyone suggest a more general rule?  Do we need for example to
 consider whether the relation membership is the same in two clauses
 that might be opposite sides of a range restriction?  It seems like
 
 a.x  b.y AND a.x  b.z

 In a case like this, you could actually look at the  data in b and see
 what the average range size is.

Not with the current statistics --- you'd need some kind of cross-column
statistics involving both y and z.  (That is, I doubt it would be
helpful to estimate the average range width by taking the difference of
independently-calculated mean values of y and z ...)  But yeah, in
principle it would be possible to make a non-default estimate.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)

2005-04-06 Thread Tom Lane
John A Meinel [EMAIL PROTECTED] writes:
 Actually, I think he was saying do a nested loop, and for each item in
 the nested loop, re-evaluate if an index or a sequential scan is more
 efficient.

 I don't think postgres re-plans once it has started, though you could
 test this in a plpgsql function.

It doesn't, and in any case that's a microscopic view of the issue.
The entire shape of the plan might change depending on what we think
the selectivity is --- much more than could be handled by switching
scan types at the bottom level.

Also, I anticipate that bitmap-driven index scans will change things
considerably here.  The range of usefulness of pure seqscans will
drop drastically...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq