Re: [PERFORM] plan problem
Ken Geis <[EMAIL PROTECTED]> writes:
> Does anyone think that the planner issue has merit to address? Can
> someone help me figure out what code I would look at?

The planner doesn't currently attempt to "drill down" into a
sub-select-in-FROM to find statistics about the variables emitted by
the sub-select. So it's just falling back to a default estimate of the
number of distinct values coming out of the sub-select.

The "drilling down" part is not hard; the difficulty comes from trying
to figure out whether and how the stats from the underlying column would
need to be adjusted for the behavior of the sub-select itself. As an
example, the result of (SELECT DISTINCT foo FROM bar) would usually have
much different stats from the raw bar.foo column. In your example, the
LIMIT clause potentially affects the stats by reducing the number of
distinct values.

Now in most situations where the sub-select wouldn't change the stats,
there's no issue anyway because the planner will flatten the sub-select
into the main query. So we really have to figure out the adjustment part
before we can think about doing much here.

regards, tom lane
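The effect described above (DISTINCT and LIMIT changing the statistics of a sub-select's output relative to the raw column) can be seen with a small experiment. This is only a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for PostgreSQL, the table bar and column foo come from the example in the message, and the data (1000 rows, 10 distinct values) and the helper distinct_counts are made up for illustration. It measures the actual output of each sub-select; it does not show what any planner estimates.

```python
import sqlite3

def distinct_counts(conn):
    """Return {variant: (row_count, distinct_count)} for three sub-selects in FROM."""
    subselects = {
        "raw": "SELECT foo FROM bar",
        "distinct": "SELECT DISTINCT foo FROM bar",
        "limit": "SELECT foo FROM bar ORDER BY random() LIMIT 3",
    }
    result = {}
    for name, sub in subselects.items():
        # Wrap each variant as a sub-select-in-FROM and measure its output.
        row = conn.execute(
            f"SELECT count(*), count(DISTINCT foo) FROM ({sub}) AS ss"
        ).fetchone()
        result[name] = (row[0], row[1])
    return result

# Hypothetical data: 1000 rows, values 0..9, so 100 rows per distinct value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar VALUES (?)", [(i % 10,) for i in range(1000)])

counts = distinct_counts(conn)
for name, (rows, ndistinct) in counts.items():
    print(f"{name:8s} rows={rows:5d} distinct={ndistinct}")
```

The raw column has 100 rows per distinct value, the DISTINCT output has exactly one, and the LIMIT 3 output can never carry more than three distinct values, so statistics taken from bar.foo would need a different adjustment in each case.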
Re: [PERFORM] plan problem
On Wednesday 07 April 2004 10:03, Ken Geis wrote:
> Richard Huxton wrote:
> > On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> >>I am trying to find an efficient way to draw a random sample from a
> >>complex query. I also want it to be easy to use within my application.
> >>
> >>So I've defined a view that encapsulates the query. The id in the
> >>"driving" table is exposed, and I run a query like:
> >>
> >>select * from stats_record_view
> >> where id in (select id from driver_stats
> >>order by random()
> >>limit 3);
> >
> > How about a join?
> >
> > SELECT s.*
> > FROM
> > stats_record_view s
> > JOIN
> > (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
> > ON s.id = r.id;
>
> Yes, I tried this too after I sent the first mail, and this was somewhat
> better. I ended up adding a random column to the driving table, putting
> an index on it, and exposing that column in the view. Now I can say
>
> SELECT * FROM stats_record_view WHERE random < 0.093;
>
> For my application, it's OK if the same sample is picked time after time
> and it may change if data is added.

Fair enough - that'll certainly do it.

> > Also worth checking the various list archives - this has come up in the
> > past, but some time ago.
>
> There are some messages in the archives about how to get a random
> sample. I know how to do that, and that's not why I posted my message.
> Are you saying that the planner behavior I spoke of is in the
> archives? I wouldn't know what to search on to find that thread. Does
> anyone think that the planner issue has merit to address? Can someone
> help me figure out what code I would look at?

I was assuming after getting a random subset they'd see the same problem you
are. If not, probably worth looking at. In which case, an EXPLAIN ANALYZE of
your original query would be good.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] plan problem
Richard Huxton wrote:
> On Tuesday 06 April 2004 21:25, Ken Geis wrote:
>>I am trying to find an efficient way to draw a random sample from a
>>complex query. I also want it to be easy to use within my application.
>>
>>So I've defined a view that encapsulates the query. The id in the
>>"driving" table is exposed, and I run a query like:
>>
>>select * from stats_record_view
>> where id in (select id from driver_stats
>>order by random()
>>limit 3);
>
> How about a join?
>
> SELECT s.*
> FROM
> stats_record_view s
> JOIN
> (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
> ON s.id = r.id;

Yes, I tried this too after I sent the first mail, and this was somewhat
better. I ended up adding a random column to the driving table, putting
an index on it, and exposing that column in the view. Now I can say

SELECT * FROM stats_record_view WHERE random < 0.093;

For my application, it's OK if the same sample is picked time after time
and it may change if data is added.

> ...
> Also worth checking the various list archives - this has come up in the
> past, but some time ago.

There are some messages in the archives about how to get a random
sample. I know how to do that, and that's not why I posted my message.
Are you saying that the planner behavior I spoke of is in the
archives? I wouldn't know what to search on to find that thread. Does
anyone think that the planner issue has merit to address? Can someone
help me figure out what code I would look at?

Ken Geis
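The random-column workaround described in this message can be sketched as follows. This is a minimal illustration, not the actual schema: it runs against SQLite through Python's sqlite3 module instead of PostgreSQL, the view is a trivial stand-in for the real "complex query", the index name driver_stats_random_idx and the helper sample are hypothetical, and the random values are generated in Python with a fixed seed so the run is repeatable. Only the names driver_stats, stats_record_view, id, random, and the 0.093 threshold come from the thread.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")

# Driving table with a stored random value per row (assumed layout).
conn.execute("CREATE TABLE driver_stats (id INTEGER PRIMARY KEY, random REAL)")
rng = random.Random(42)  # fixed seed, for a repeatable illustration only
conn.executemany(
    "INSERT INTO driver_stats (id, random) VALUES (?, ?)",
    [(i, rng.random()) for i in range(1, 1001)],
)

# Index the random column so a range predicate on it can use the index.
conn.execute("CREATE INDEX driver_stats_random_idx ON driver_stats (random)")

# Stand-in for the real view: it only has to expose the random column.
conn.execute(
    "CREATE VIEW stats_record_view AS SELECT id, random FROM driver_stats"
)

def sample(conn, threshold):
    """Return the ids whose stored random value is below the threshold."""
    cur = conn.execute(
        "SELECT id FROM stats_record_view WHERE random < ? ORDER BY id",
        (threshold,),
    )
    return [row[0] for row in cur]

first = sample(conn, 0.093)
second = sample(conn, 0.093)
print(len(first), "rows sampled; same sample on repeat:", first == second)
```

Because the random values are stored, the same rows come back on every run until data is added or the column is refilled, which matches the trade-off accepted in the message. The threshold sets the expected fraction of rows returned, not an exact count.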
Re: [PERFORM] plan problem
On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> I am trying to find an efficient way to draw a random sample from a
> complex query. I also want it to be easy to use within my application.
>
> So I've defined a view that encapsulates the query. The id in the
> "driving" table is exposed, and I run a query like:
>
> select * from stats_record_view
> where id in (select id from driver_stats
> order by random()
> limit 3);

How about a join?

SELECT s.*
FROM
stats_record_view s
JOIN
(SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r
ON s.id = r.id;

Or, what about a cursor and fetch forward (or back?) a random number of rows
before each fetch. That's probably not going to be so random though.

Also worth checking the various list archives - this has come up in the past,
but some time ago.

--
Richard Huxton
Archonet Ltd