Richard Huxton wrote:
On Tuesday 06 April 2004 21:25, Ken Geis wrote:

I am trying to find an efficient way to draw a random sample from a
complex query.  I also want it to be easy to use within my application.

So I've defined a view that encapsulates the query.  The id in the
"driving" table is exposed, and I run a query like:

select * from stats_record_view
 where id in (select id from driver_stats
               order by random()
               limit 30000);


How about a join?

SELECT s.*
FROM
stats_record_view s
JOIN
(SELECT id FROM driver_stats ORDER BY random() LIMIT 30000) AS r
ON s.id = r.id;

Yes, I tried this too after I sent the first mail, and this was somewhat better. I ended up adding a random column to the driving table, putting an index on it, and exposing that column in the view. Now I can say


SELECT * FROM stats_record_view WHERE random < 0.093;

For my application, it's OK if the same sample is picked time after time and it may change if data is added.

...
Also worth checking the various list archives - this has come up in the past, but some time ago.

There are some messages in the archives about how to get a random sample. I know how to do that, and that's not why I posted my message. Are you saying that the planner behavior I spoke of is in the archives? I wouldn't know what to search on to find that thread. Does anyone think that the planner issue has merit to address? Can someone help me figure out what code I would look at?



Ken Geis




---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to