Re: [PERFORM] plan problem

2004-04-09 Tom Lane
Ken Geis <[EMAIL PROTECTED]> writes:
> Does anyone think that the planner issue has merit to address?  Can
> someone help me figure out what code I would look at?

The planner doesn't currently attempt to "drill down" into a sub-select-
in-FROM to find statistics about the variables emitted by the sub-select.
So it's just falling back to a default estimate of the number of
distinct values coming out of the sub-select.

The "drilling down" part is not hard; the difficulty comes from trying
to figure out whether and how the stats from the underlying column would
need to be adjusted for the behavior of the sub-select itself.  As an
example, the result of (SELECT DISTINCT foo FROM bar) would usually have
much different stats from the raw bar.foo column.  In your example, the
LIMIT clause potentially affects the stats by reducing the number of
distinct values.
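
To make the adjustment problem concrete, here is a small sketch on a made-up table. The names bar and foo come from the example above; the row counts and the use of generate_series are assumptions for illustration only. It compares the row count and distinct count of the raw column with what comes out of a DISTINCT and a LIMIT sub-select:

```sql
BEGIN;

-- Hypothetical data: 100000 rows holding 1000 distinct values of foo.
CREATE TEMP TABLE bar AS
SELECT g % 1000 AS foo
FROM generate_series(1, 100000) AS g;

ANALYZE bar;

-- What ANALYZE recorded for the raw column.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'bar' AND attname = 'foo';

-- Raw column: 100000 rows, 1000 distinct values.
SELECT count(*) AS row_count, count(DISTINCT foo) AS distinct_values
FROM bar;

-- After DISTINCT: 1000 rows, each value exactly once.
SELECT count(*) AS row_count, count(DISTINCT foo) AS distinct_values
FROM (SELECT DISTINCT foo FROM bar) AS ss;

-- After LIMIT 3: 3 rows, so no more than 3 distinct values.
SELECT count(*) AS row_count, count(DISTINCT foo) AS distinct_values
FROM (SELECT foo FROM bar ORDER BY random() LIMIT 3) AS ss;

ROLLBACK;
```

The stored n_distinct describes only the first of those three results, which is why it cannot simply be reused for the other two.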

Now in most situations where the sub-select wouldn't change the stats,
there's no issue anyway because the planner will flatten the sub-select
into the main query.  So we really have to figure out the adjustment
part before we can think about doing much here.
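
Whether a given sub-select gets flattened can be checked with EXPLAIN. A sketch, again on a made-up table bar (the data and the constant 42 are assumptions); exact plan shapes vary by PostgreSQL version:

```sql
BEGIN;

-- Hypothetical data, as in Tom's bar.foo example.
CREATE TEMP TABLE bar AS
SELECT g % 1000 AS foo
FROM generate_series(1, 100000) AS g;

ANALYZE bar;

-- A plain sub-select is normally flattened: the plan is expected to be a
-- scan of bar with the filter applied directly, so the statistics for
-- bar.foo apply as they are.
EXPLAIN
SELECT * FROM (SELECT foo FROM bar) AS ss WHERE ss.foo = 42;

-- With ORDER BY ... LIMIT the sub-select is not flattened: the plan is
-- expected to keep a Subquery Scan node, with the filter on ss.foo
-- applied above the Limit.
EXPLAIN
SELECT * FROM (SELECT foo FROM bar ORDER BY random() LIMIT 3) AS ss
WHERE ss.foo = 42;

ROLLBACK;
```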

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] plan problem

2004-04-07 Richard Huxton
On Wednesday 07 April 2004 10:03, Ken Geis wrote:
> Richard Huxton wrote:
> > On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> >>I am trying to find an efficient way to draw a random sample from a
> >>complex query.  I also want it to be easy to use within my application.
> >>
> >>So I've defined a view that encapsulates the query.  The id in the
> >>"driving" table is exposed, and I run a query like:
> >>
> >>select * from stats_record_view
> >>  where id in (select id from driver_stats
> >>               order by random()
> >>               limit 3);
> >
> > How about a join?
> >
> > SELECT s.*
> > FROM
> > stats_record_view s
> > JOIN
> > (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
> > ON s.id = r.id;
>
> Yes, I tried this too after I sent the first mail, and this was somewhat
> better.  I ended up adding a random column to the driving table, putting
> an index on it, and exposing that column in the view.  Now I can say
>
> SELECT * FROM stats_record_view WHERE random < 0.093;
>
> For my application, it's OK if the same sample is picked time after time
> and it may change if data is added.

Fair enough - that'll certainly do it.

> > Also worth checking the various list archives - this has come up in the
> > past, but some time ago.
>
> There are some messages in the archives about how to get a random
> sample.  I know how to do that, and that's not why I posted my message.
>   Are you saying that the planner behavior I spoke of is in the
> archives?  I wouldn't know what to search on to find that thread.  Does
> anyone think that the planner issue has merit to address?  Can someone
> help me figure out what code I would look at?

I was assuming that the people in those threads, after getting a random subset,
would have run into the same problem you are seeing. If not, it's probably worth
looking at, in which case an EXPLAIN ANALYZE of your original query would be
good.
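
For instance, something along these lines would show where the estimated and actual row counts diverge. The temporary table and view are hypothetical stand-ins so the statements run as written; against the real schema, skip the setup and run only the two EXPLAIN ANALYZE statements. The second one is the join form from earlier in the thread, for comparison:

```sql
BEGIN;

-- Hypothetical stand-ins; the real view is a complex query.
CREATE TEMP TABLE driver_stats AS
SELECT g AS id FROM generate_series(1, 10000) AS g;

CREATE TEMP VIEW stats_record_view AS
SELECT id FROM driver_stats;

ANALYZE driver_stats;

-- Original IN form: compare the estimated rows= with the actual rows=
-- on the nodes fed by the sub-select.
EXPLAIN ANALYZE
SELECT *
FROM stats_record_view
WHERE id IN (SELECT id
             FROM driver_stats
             ORDER BY random()
             LIMIT 3);

-- Join form, for comparison.
EXPLAIN ANALYZE
SELECT s.*
FROM stats_record_view s
JOIN (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
  ON s.id = r.id;

ROLLBACK;
```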

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] plan problem

2004-04-07 Ken Geis
Richard Huxton wrote:
> On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> > I am trying to find an efficient way to draw a random sample from a
> > complex query.  I also want it to be easy to use within my application.
> >
> > So I've defined a view that encapsulates the query.  The id in the
> > "driving" table is exposed, and I run a query like:
> >
> > select * from stats_record_view
> >   where id in (select id from driver_stats
> >                order by random()
> >                limit 3);
>
> How about a join?
>
> SELECT s.*
> FROM
> stats_record_view s
> JOIN
> (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
> ON s.id = r.id;

Yes, I tried this too after I sent the first mail, and this was somewhat
better.  I ended up adding a random column to the driving table, putting
an index on it, and exposing that column in the view.  Now I can say

SELECT * FROM stats_record_view WHERE random < 0.093;

For my application, it's OK if the same sample is picked time after time 
and it may change if data is added.
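
A minimal sketch of that arrangement, with a stand-in table and view, since the real definitions are not shown in this thread. The column type, the index name, and the row count are assumptions:

```sql
BEGIN;

-- Hypothetical stand-in for the driving table.
CREATE TEMP TABLE driver_stats AS
SELECT g AS id FROM generate_series(1, 10000) AS g;

-- Add the sampling column and fill it once. random() is evaluated per
-- row, so each row gets its own value in [0, 1).
ALTER TABLE driver_stats ADD COLUMN random double precision;
UPDATE driver_stats SET random = random();

-- Index the column so a range condition on it can use an index scan.
CREATE INDEX driver_stats_random_idx ON driver_stats (random);
ANALYZE driver_stats;

-- Simplified stand-in for the view; the real one wraps a complex query.
CREATE TEMP VIEW stats_record_view AS
SELECT d.id, d.random
FROM driver_stats AS d;

-- Picks roughly 9.3% of the rows on average, and the same rows every
-- time until the column is refilled or new rows are added.
SELECT * FROM stats_record_view WHERE random < 0.093;

ROLLBACK;
```

Because the condition is a plain comparison against a stored, indexed column, the planner can estimate it from ordinary column statistics instead of guessing at the output of a sub-select.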

...
> Also worth checking the various list archives - this has come up in the past,
> but some time ago.

There are some messages in the archives about how to get a random
sample.  I know how to do that, and that's not why I posted my message.
Are you saying that the planner behavior I spoke of is in the
archives?  I wouldn't know what to search on to find that thread.  Does 
anyone think that the planner issue has merit to address?  Can someone 
help me figure out what code I would look at?

Ken Geis





Re: [PERFORM] plan problem

2004-04-07 Richard Huxton
On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> I am trying to find an efficient way to draw a random sample from a
> complex query.  I also want it to be easy to use within my application.
>
> So I've defined a view that encapsulates the query.  The id in the
> "driving" table is exposed, and I run a query like:
>
> select * from stats_record_view
>   where id in (select id from driver_stats
>                order by random()
>                limit 3);

How about a join?

SELECT s.*
FROM
stats_record_view s
JOIN
(SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
ON s.id = r.id;

Or what about a cursor, moving forward (or back?) a random number of rows
before each fetch? That's probably not going to be very random, though.
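
Roughly like this, as a sketch only. The table is a hypothetical stand-in, the cursor name is made up, and the skip counts 17 and 42 stand for random numbers the application would pick before each fetch:

```sql
BEGIN;

-- Hypothetical stand-in for the driving table.
CREATE TEMP TABLE driver_stats AS
SELECT g AS id FROM generate_series(1, 10000) AS g;

-- SCROLL allows moving backward as well as forward.
DECLARE sample_cur SCROLL CURSOR FOR
SELECT id FROM driver_stats;

-- Skip a random number of rows, then take the next one.
MOVE FORWARD 17 IN sample_cur;
FETCH NEXT FROM sample_cur;

-- Repeat for each further row in the sample.
MOVE FORWARD 42 IN sample_cur;
FETCH NEXT FROM sample_cur;

CLOSE sample_cur;
ROLLBACK;
```

The rows always come back in scan order, and the gap between picks is bounded by the largest skip, which is why the result is not a uniform sample.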

Also worth checking the various list archives - this has come up in the past, 
but some time ago.

-- 
  Richard Huxton
  Archonet Ltd
