On Tuesday 06 April 2004 21:25, Ken Geis wrote:
I am trying to find an efficient way to draw a random sample from a complex query. I also want it to be easy to use within my application.
So I've defined a view that encapsulates the query. The id in the "driving" table is exposed, and I run a query like:
select * from stats_record_view where id in (select id from driver_stats order by random() limit 30000);
How about a join?
SELECT s.* FROM stats_record_view s JOIN (SELECT id FROM driver_stats ORDER BY random() LIMIT 30000) AS r ON s.id = r.id;
Yes, I tried this too after I sent the first mail, and this was somewhat better. I ended up adding a random column to the driving table, putting an index on it, and exposing that column in the view. Now I can say
SELECT * FROM stats_record_view WHERE random < 0.093;
For my application, it's OK if the same sample is picked time after time and it may change if data is added.
...
Also worth checking the various list archives - this has come up in the past, but some time ago.
There are some messages in the archives about how to get a random sample. I know how to do that, and that's not why I posted my message. Are you saying that the planner behavior I spoke of is in the archives? I wouldn't know what to search on to find that thread. Does anyone think that the planner issue has merit to address? Can someone help me figure out what code I would look at?
Ken Geis
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings