Leam,

I spent a good part of last night thinking about this problem and I
came to the exact same conclusion you did: it's a needless drain on
resources. I think I'm going to take your advice and simply archive
the user input into the database but process it immediately when the
user submits it. That's a much quicker and leaner solution from a
resource standpoint.
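
Roughly what I'm picturing (untested sketch; the table, column, and
function names are just placeholders):

    <?php
    // Archive the raw input, then process it right away in the same
    // request instead of waiting for a cron sweep.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $stmt = $pdo->prepare(
        'INSERT INTO submissions (payload, submitted_at) VALUES (?, NOW())'
    );
    $stmt->execute(array($userInput));
    $id = $pdo->lastInsertId();

    process_submission($id, $userInput);  // hypothetical worker function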

Thanks for the input.

Anthony

On Mon, Feb 15, 2010 at 5:23 AM, Leam Hall <l...@reuel.net> wrote:
>
> Well, from an admin viewpoint, I'd recommend a lot more thinking before doing 
> this. The "script every minute" idea causes all sorts of issues on the server 
> as well as the database. One hokey query, or one large dataset, and you can 
> cause a lot of problems for the entire machine.
>
> What I don't understand is why you need to have a cron job to deal with the 
> user data. Why not have your processing script called when the user submits? 
> This keeps your script from having to go through the entire database to find 
> uncommitted changes. If you're going to have a gazillion-row database, aren't 
> you going to be spending a lot of time on queries just to find the ones that 
> aren't committed?
>
> One other possibility would be to have a second, separate database that 
> stores the user input. Have a script that grabs a small number of rows, does 
> its thing, deletes the rows it just worked on, sleeps for a couple of seconds, 
> and then calls itself. That way your primary database isn't getting hit so 
> hard, your secondary database is the one that has to do disaster recovery, 
> you can split the machines up if load gets too much, and your SA team won't 
> stuff you in a trashcan when your query trashes the system.
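>
> Very roughly, something like this (untested; all the names are made
> up, and I'd just loop rather than have the script re-exec itself):
>
>     <?php
>     // Worker sketch: pull a small batch from the staging database,
>     // process it, delete it, nap for a moment, repeat.
>     $staging = new PDO('mysql:host=staging;dbname=queue', 'user', 'pass');
>
>     while (true) {
>         $rows = $staging->query(
>             'SELECT id, payload FROM input_queue ORDER BY id LIMIT 10'
>         )->fetchAll(PDO::FETCH_ASSOC);
>
>         foreach ($rows as $row) {
>             process_row($row['payload']);  // the real work goes here
>             $del = $staging->prepare('DELETE FROM input_queue WHERE id = ?');
>             $del->execute(array($row['id']));
>         }
>
>         sleep(2);  // give the box a breather between batches
>     }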
>
> Leam
>
> Anthony Papillion wrote:
>>
>> Hello Everyone,
>>
>> I'm designing a system that will work on a schedule. Users will submit data 
>> for processing into the database and then, every minute, a PHP script will 
>> pass through the db looking for unprocessed rows (marked pending) and 
>> process them.
>>
>> The problem is, I may eventually have a few million records to process at a 
>> time. Each record could take anywhere from a few seconds to a few minutes, 
>> depending on the operations required. My concern is making sure that the 
>> script, on the next scheduled pass, doesn't grab the records currently being 
>> processed and start processing them again.
>>
>> Right now, I'm thinking of accomplishing this by updating a 'status' field 
>> in the database. So unprocessed records would have a status of 'pending', 
>> records being processed would have a status of 'processing', and completely 
>> processed records would have a status of 'complete'.
>>
>> For some reason, I see this as ugly, but that's the only way I can think of 
>> making sure that records aren't processed twice. So when I select records 
>> to process, I'm ONLY selecting ones with the status of 'pending', which 
>> means they are new and unprocessed.
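>>
>> In code, I'm picturing something like this (untested sketch; the
>> table and column names are invented). Tagging a batch in a single
>> UPDATE first should keep two passes from grabbing the same rows:
>>
>>     <?php
>>     $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
>>
>>     // Claim a batch atomically: flip 'pending' rows to 'processing'
>>     // and stamp them with this run's id in one statement.
>>     $batch = uniqid('run_', true);
>>     $claim = $pdo->prepare(
>>         "UPDATE records SET status = 'processing', batch_id = ?
>>          WHERE status = 'pending' LIMIT 500"
>>     );
>>     $claim->execute(array($batch));
>>
>>     // Select back only the rows this run just claimed.
>>     $sel = $pdo->prepare('SELECT * FROM records WHERE batch_id = ?');
>>     $sel->execute(array($batch));
>>
>>     foreach ($sel->fetchAll(PDO::FETCH_ASSOC) as $row) {
>>         // ... do the real work on $row ...
>>         $done = $pdo->prepare(
>>             "UPDATE records SET status = 'complete' WHERE id = ?"
>>         );
>>         $done->execute(array($row['id']));
>>     }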
>>
>> Is there a better, more elegant way of doing this, or is this pretty much it?
>>
>> Thanks!
>> Anthony Papillion
>>
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation
