Re: [PERFORM] Selecting random rows efficiently

2003-09-03 Thread scott.marlowe
Can you just create an extra serial column and make sure that one is 
always in order with no holes in it (i.e. maintained by a nightly process, etc.)?

If so, then something like this truly flies:

select * from accounts where aid = (select cast(floor(random()*100000)+1 as int));

My times on it on a 100,000 row table are < 1 millisecond.

Note that you have to have a hole-free sequence AND know how many rows 
there are, but if you can meet those needs, this is screamingly fast.
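For illustration only, here is a minimal sketch of that trick outside PostgreSQL, with Python's sqlite3 standing in for the database (the table name, row count, and column names are made up): with a gap-free serial column, one random row costs a single primary-key lookup.

```python
import random
import sqlite3

# Build a toy table with a dense, hole-free integer key 1..N.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (aid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 100001)])

# The row count must be known (or maintained) for this trick to work.
n_rows = 100000
rand_id = random.randint(1, n_rows)  # same idea as floor(random()*N)+1
row = conn.execute("SELECT * FROM accounts WHERE aid = ?",
                   (rand_id,)).fetchone()
print(row)
```

The lookup is O(log N) via the primary-key index regardless of table size, which is why the timing above stays in the millisecond range.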

On Sat, 30 Aug 2003, Russell Garrett wrote:

 Considering that we'd have to index the random field too, it'd be neater in
 the long term to re-number the primary key. Although, being a primary key,
 that's foreign-keyed from absolutely everywhere, so that'd probably take an
 amusingly long time.
 
 ...and no we're not from Micronesia, we're from ever so slightly less exotic
 London. Though Micronesia might be nice...
 
 Russ (also from last.fm but without the fancy address)
 
 [EMAIL PROTECTED] wrote:
  On Sat, 2003-08-30 at 09:01, Rod Taylor wrote:
  i was hoping there was some trickery with sequences that would
  allow me to easily pick a random valid sequence number..?
 
  I would suggest renumbering the data.
 
  ALTER SEQUENCE ... RESTART WITH 1;
  UPDATE table SET pkey = DEFAULT;
 
  Of course, PostgreSQL may have trouble with that update due to
  evaluation of the unique constraint immediately -- so drop the
  primary key first, and add it back after.
 
  And if there are child tables, they'd all have to be updated, too.
 
 
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 




Re: [PERFORM] Selecting random rows efficiently

2003-08-30 Thread Jeff
On Sat, 30 Aug 2003, Richard Jones wrote:

 Hi,
 i have a table of around 3 million rows from which i regularly (twice a second
 at the moment) need to select a random row from

 currently i'm doing order by rand() limit 1 - but i suspect this is
 responsible for the large load on my db server - i guess that PG is doing far
 too much work just to pick one row.


If you have an int id (aka serial) column then it is simple - just pick a
random number between 1 and currval('id_seq')...

or offset rand() limit 1 perhaps?

since you want random there is no need to bother with an ORDER BY, and that'll
save a sort.
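As a sketch of that OFFSET idea (again using sqlite3 as a stand-in, with made-up table and column names): skip a random number of rows and take the next one. There is no sort, but note the server still steps over `offset` rows, so this is O(offset) per query, not O(1).

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)",
                 [(f"row{i}",) for i in range(1000)])

# Pick a random offset in [0, n_rows) and grab the single row there.
n_rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
offset = random.randrange(n_rows)
row = conn.execute("SELECT * FROM t LIMIT 1 OFFSET ?", (offset,)).fetchone()
print(row)
```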

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] Selecting random rows efficiently

2003-08-30 Thread Richard Jones
On Saturday 30 August 2003 1:08 pm, you wrote:
 On Sat, 30 Aug 2003, Richard Jones wrote:
  Hi,
  i have a table of around 3 million rows from which i regularly (twice a
  second at the moment) need to select a random row from
 
  currently i'm doing order by rand() limit 1 - but i suspect this is
  responsible for the large load on my db server - i guess that PG is doing
  far too much work just to pick one row.

 If you have an int id (aka serial) column then it is simple - just pick a
 random number between 1 and currval('id_seq')...

 or offset rand() limit 1 perhaps?

 since you want random there is no need to bother with an ORDER BY, and that'll
 save a sort.


Yes, the pkey is a SERIAL but the problem is that the sequence is rather 
sparse

for example, it goes something like 1 - 5000  then 10-10 and then 
200-upwards

this is due to chunks being deleted, etc.

if i pick a random number for the key it will not be a random enough 
distribution, because the sequence is sparse. sometimes it will pick a key 
that doesn't exist.

i'm currently reading all the keys into an array and selecting randoms from 
there - but this is no good long-term as i need to refresh the array of keys 
to take into account newly added rows to the table (daily)
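The key-array approach described above can be sketched like this (sqlite3 standing in for PostgreSQL; the refresh interval, table name, and key values are made up for illustration): cache the valid primary keys in memory and reload the cache periodically so newly added rows become eligible.

```python
import random
import sqlite3
import time

REFRESH_SECONDS = 86400  # hypothetical refresh interval (daily)

class KeyCache:
    """Cache all valid primary keys; refresh when the cache goes stale."""
    def __init__(self, conn):
        self.conn = conn
        self.keys = []
        self.loaded_at = 0.0

    def random_key(self):
        if not self.keys or time.time() - self.loaded_at > REFRESH_SECONDS:
            self.keys = [r[0] for r in self.conn.execute("SELECT id FROM t")]
            self.loaded_at = time.time()
        return random.choice(self.keys)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
# Deliberately sparse keys, like the sequence described above.
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in (1, 7, 9, 42)])

cache = KeyCache(conn)
row = conn.execute("SELECT * FROM t WHERE id = ?",
                   (cache.random_key(),)).fetchone()
print(row)
```

Every key drawn this way hits an existing row, at the cost of holding all keys in memory and tolerating up to one refresh interval of staleness.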

i was hoping there was some trickery with sequences that would allow me to 
easily pick a random valid sequence number..?

Thanks,
Rich.

 --
 Jeff Trout [EMAIL PROTECTED]
 http://www.jefftrout.com/
 http://www.stuarthamm.net/







Re: [PERFORM] Selecting random rows efficiently

2003-08-30 Thread Rod Taylor
 i was hoping there was some trickery with sequences that would allow me to 
 easily pick a random valid sequence number..?

I would suggest renumbering the data.

ALTER SEQUENCE ... RESTART WITH 1;
UPDATE table SET pkey = DEFAULT;

Of course, PostgreSQL may have trouble with that update due to
evaluation of the unique constraint immediately -- so drop the primary
key first, and add it back after.
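A minimal sketch of that renumbering idea, with Python's sqlite3 as a stand-in (table name and key values are made up; in PostgreSQL itself you would use the ALTER SEQUENCE / UPDATE shown above): rewrite the sparse keys as a dense 1..N range so a random pick in [1, N] always hits a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pkey INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(k, f"row{k}") for k in (3, 17, 40, 99)])  # sparse keys

# Renumber in key order.  Writing negative temporary keys first avoids
# unique-constraint collisions mid-update -- the same concern the
# "drop the primary key first" advice above is working around.
old_keys = [r[0] for r in conn.execute("SELECT pkey FROM t ORDER BY pkey")]
for new, old in enumerate(old_keys, start=1):
    conn.execute("UPDATE t SET pkey = ? WHERE pkey = ?", (-new, old))
conn.execute("UPDATE t SET pkey = -pkey")

keys = [r[0] for r in conn.execute("SELECT pkey FROM t ORDER BY pkey")]
print(keys)  # dense 1..N
```

The obvious downside, as noted in the thread, is that every foreign key referencing this table has to be updated in the same transaction.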




Re: [PERFORM] Selecting random rows efficiently

2003-08-30 Thread Ron Johnson
On Sat, 2003-08-30 at 08:09, Richard Jones wrote:
 Hi,
 i have a table of around 3 million rows from which i regularly (twice a second 
 at the moment) need to select a random row from
 
 currently i'm doing order by rand() limit 1 - but i suspect this is 
 responsible for the large load on my db server - i guess that PG is doing far 
 too much work just to pick one row.

What datatype is the selected by key?

Also, where is rand() defined?  Is that a UDF?

Could it be that there is a type mismatch?

 one way i can think of is to read in all the primary keys from my table, and 
 select one of the keys at random then directly fetch that row. 
 
 are there any other ways to do this? i need to keep the load down :)
 
 Thanks,
 Richard 

Are you really in Micronesia?

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

The greatest dangers to liberty lurk in insidious encroachment 
by men of zeal, well-meaning, but without understanding.
Justice Louis Brandeis, dissenting, Olmstead v US (1928)




Re: [PERFORM] Selecting random rows efficiently

2003-08-30 Thread Tom Lane
I said:
 3. Your query now looks like
   SELECT * FROM table WHERE random_id >= random()
   ORDER BY random_id LIMIT 1;

Correction: the above won't give quite the right query because random()
is marked as a volatile function.  You can hide the random() call inside
a user-defined function that you (misleadingly) mark stable, or you can
just stick it into a sub-select:

regression=# explain select * from foo WHERE random_id >= (select random())
regression-# ORDER BY random_id LIMIT 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=0.01..0.15 rows=1 width=8)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Index Scan using fooi on foo  (cost=0.00..45.50 rows=334 width=8)
         Index Cond: (random_id >= $0)
(5 rows)

This technique is probably safer against future planner changes,
however:

regression=# create function oneshot_random() returns float8 as
regression-# 'select random()' language sql stable;
CREATE FUNCTION
regression=# explain select * from foo WHERE random_id >= oneshot_random()
regression-# ORDER BY random_id LIMIT 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=0.00..0.14 rows=1 width=8)
   ->  Index Scan using fooi on foo  (cost=0.00..46.33 rows=334 width=8)
         Index Cond: (random_id >= oneshot_random())
(3 rows)

The point here is that an indexscan boundary condition has to use stable
or immutable functions.  By marking oneshot_random() stable, you
essentially say that it's okay to evaluate it only once per query,
rather than once at each row.
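For illustration, the whole indexed-random_id technique can be sketched with Python's sqlite3 standing in for PostgreSQL (table and index names are made up): assign each row a random float once, index that column, and a random pick becomes a single index probe with no sort and no gap problem.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, random_id REAL)")
conn.executemany("INSERT INTO foo (random_id) VALUES (?)",
                 [(random.random(),) for _ in range(1000)])
conn.execute("CREATE INDEX fooi ON foo (random_id)")

# Evaluate the random threshold once, like the (SELECT random()) subquery.
r = random.random()
row = conn.execute("SELECT * FROM foo WHERE random_id >= ? "
                   "ORDER BY random_id LIMIT 1", (r,)).fetchone()
if row is None:
    # r landed above the largest random_id: wrap around to the smallest.
    row = conn.execute("SELECT * FROM foo "
                       "ORDER BY random_id LIMIT 1").fetchone()
print(row)
```

One caveat worth noting: a row sitting just after a large gap in random_id values is picked more often than its neighbors, so the selection is only approximately uniform; periodically re-randomizing the column mitigates this.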

regards, tom lane
