I have 16-25 simultaneous processes selecting records (to process) 
from a DB table of about 3.3 million rows, quite standard-
SELECT a,b WHERE x, y and z ORDER BY rand() LIMIT 1

Works fine up to about 8-10 processes, then gets exponentially slower 
until queries take over a minute to return at around 20 processes. Note that 
not all processes hit the DB at the same time, but they all tend to 
grind to a halt together by the time I am running 20 of them.

So this is what each process does-

1. SELECT a,b where x,y,z and DONE=no.

2. UPDATE a field to apply a 'lock' of sorts on that row, so another process 
doesn't also do it.

Do yada yada with the data (can take a few seconds to a few minutes).

3. UPDATE again, this time with a 'done' statement

Run loop again.
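In SQL terms, the loop looks roughly like this (the table, id, and lock-column names are placeholders I've made up, not the real schema):

```sql
-- Step 1: pick one unfinished row at random (placeholder names throughout)
SELECT id, a, b
FROM jobs
WHERE x AND y AND z AND done = 'no'
ORDER BY RAND()
LIMIT 1;

-- Step 2: apply the ad-hoc 'lock' so other processes skip this row
UPDATE jobs SET locked = 1 WHERE id = 42;

-- ... yada yada with the data (seconds to minutes) ...

-- Step 3: mark the row finished
UPDATE jobs SET done = 'yes', locked = 0 WHERE id = 42;
```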

Now, with every SELECT I am running ORDER BY RAND() - this is to greatly 
minimise the chance of another process grabbing the same row in the window 
between steps 1 and 2.

However, I also suspect this is the most taxing part for the DB server - as 
I understand it, ORDER BY RAND() has to generate a random value for every 
matching row and sort them all, just to return one.

Is there a better way?

Can I grab the data and somehow 'lock' that row in one command, so another 
process can't read it (it gets the next row instead), negating the need for 
ORDER BY RAND()?
Then, at the end of the process, unlock it and mark it 'done' all in one 
go?
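To illustrate what I'm imagining - a rough sketch assuming the table is InnoDB (MyISAM only does table-level locks); all names are placeholders:

```sql
-- Sketch only, assuming InnoDB; table/column names are placeholders.
START TRANSACTION;

-- Row-lock one unclaimed row; another process running the same
-- statement blocks until we commit, instead of reading the row.
SELECT id, a, b
FROM jobs
WHERE done = 'no' AND locked = 0
LIMIT 1
FOR UPDATE;

-- Claim it and commit quickly, so the slow work happens
-- outside the transaction.
UPDATE jobs SET locked = 1 WHERE id = 42;
COMMIT;

-- ... do the work ...

UPDATE jobs SET done = 'yes', locked = 0 WHERE id = 42;
```

Though as I understand it, 5.1 has nothing like SKIP LOCKED, so the other processes would queue behind the row lock rather than skip to the next row - the claiming transaction would have to stay very short.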

I'm not joking when I say that this process is capable of bringing a Core 2 
Duo with 4GB of RAM to a crawl... I am running MySQL 5.1.30.

Michael

NZ PHP Users Group: http://groups.google.com/group/nzphpug
