Leam, I spent a good part of last night thinking about this problem and I came to the exact same conclusion you did: it's a needless drain on resources. I think I'm going to take your advice and simply archive the user input into the database but process it immediately when the user submits it. That's a much quicker and leaner solution from a resource standpoint.
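For anyone following along, a minimal sketch of that archive-then-process-on-submit approach might look like the following. The table and function names (submissions, process_record, handle_submit) are hypothetical, and PDO with in-memory SQLite stands in for a real database:

```php
<?php
// Sketch of "archive the input, then process it immediately on submit".
// Names here (submissions, process_record) are illustrative only.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE submissions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)");

function process_record($payload) {
    // Placeholder for the real per-record work.
    return strtoupper($payload);
}

function handle_submit(PDO $db, $payload) {
    // Archive the input first, so it survives even if processing fails.
    $db->prepare("INSERT INTO submissions (payload) VALUES (?)")
       ->execute([$payload]);
    $id = $db->lastInsertId();

    // Then process it right away, in the same request.
    $result = process_record($payload);

    // Mark it complete only after the work succeeds.
    $db->prepare("UPDATE submissions SET status = 'complete' WHERE id = ?")
       ->execute([$id]);
    return $result;
}

echo handle_submit($db, 'hello'), "\n"; // prints HELLO
```

The archive row doubles as an audit trail: anything still marked 'pending' after a crash is a submission whose processing never finished.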
Thanks for the input.

Anthony

On Mon, Feb 15, 2010 at 5:23 AM, Leam Hall <l...@reuel.net> wrote:
>
> Well, from an admin viewpoint, I'd recommend a lot more thinking before doing
> this. The "script every minute" idea causes all sorts of issues on the server
> as well as the database. One hokey query, or one large dataset, and you can
> cause a lot of problems for the entire machine.
>
> What I don't understand is why you need to have a cron job to deal with the
> user data. Why not have your processing script called when the user submits?
> This keeps your script from having to go through the entire database to find
> uncommitted changes. If you're going to have a gazillion-row database, aren't
> you going to be spending a lot of time on queries just to find those not
> committed?
>
> One other possibility would be to have a second, separate database that
> stores the user input. Have a script that grabs a small number of rows, does
> its thing, deletes the rows it just worked on, sleeps for a couple of seconds,
> and then calls itself. That way your primary database isn't getting hit so
> hard, your secondary database is the one that has to do disaster recovery,
> you can split the machines up if load gets too much, and your SA team won't
> stuff you in a trashcan when your query trashes the system.
>
> Leam
>
> Anthony Papillion wrote:
>>
>> Hello Everyone,
>>
>> I'm designing a system that will work on a schedule. Users will submit data
>> for processing into the database and then, every minute, a PHP script will
>> pass through the db looking for unprocessed rows (marked pending) and
>> process them.
>>
>> The problem is, I may eventually have a few million records to process at a
>> time. Each record could take anywhere from a few seconds to a few minutes to
>> perform the required operations on. My concern is making sure that the
>> script, on the next scheduled pass, doesn't grab the records currently being
>> processed and start processing them again.
>>
>> Right now, I'm thinking of accomplishing this by updating a 'status' field
>> in the database. So unprocessed records would have a status of 'pending',
>> records being processed would have a status of 'processing', and completely
>> processed records will have a status of 'complete'.
>>
>> For some reason, I see this as ugly, but that's the only way I can think of
>> making sure that records aren't processed twice. So when I select records
>> to process, I'm ONLY selecting ones with the status of 'pending', which
>> means they are new and unprocessed.
>>
>> Is there a better, more elegant way of doing this, or is this pretty much it?
>>
>> Thanks!
>> Anthony Papillion
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> New York PHP Users Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/Show-Participation
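For the archives: the status-field scheme in the original question can be made safe against double processing by claiming rows with a single atomic UPDATE instead of a SELECT followed by an UPDATE. A sketch of that idea, with hypothetical table and column names (records, claimed_by) and PDO over in-memory SQLite standing in for the real database:

```php
<?php
// Atomic claim: flip a batch of 'pending' rows to 'processing' in one
// UPDATE, tagged with a per-run token. Two overlapping cron runs can
// never claim the same row, because each row is updated exactly once.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    claimed_by TEXT
)");
$db->exec("INSERT INTO records (id) VALUES (1), (2), (3)");

function claim_batch(PDO $db, $token, $limit = 100) {
    // One atomic statement claims up to $limit pending rows.
    $stmt = $db->prepare(
        "UPDATE records SET status = 'processing', claimed_by = :t
         WHERE id IN (SELECT id FROM records
                      WHERE status = 'pending' LIMIT :n)"
    );
    $stmt->bindValue(':t', $token);
    $stmt->bindValue(':n', $limit, PDO::PARAM_INT);
    $stmt->execute();

    // Fetch only the rows this run actually claimed.
    $q = $db->prepare("SELECT id FROM records WHERE claimed_by = :t");
    $q->execute([':t' => $token]);
    return $q->fetchAll(PDO::FETCH_COLUMN);
}

$mine = claim_batch($db, 'run-1', 2);
// $mine holds the ids this run owns; a concurrent run skips them.
```

An index on the status column keeps the pending-row lookup cheap even as the table grows, which addresses the "gazillion-row" query concern.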
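Leam's small-batch worker idea — grab a few rows from a separate intake table, process them, delete them, sleep, repeat — can be sketched as follows. The table and column names (intake, payload) are hypothetical, and PDO over in-memory SQLite stands in for the secondary database he describes:

```php
<?php
// Small-batch drain of a separate intake table, per Leam's suggestion.
// Names here are illustrative only.

$queue = new PDO('sqlite::memory:');
$queue->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$queue->exec("CREATE TABLE intake (id INTEGER PRIMARY KEY, payload TEXT)");

function drain_batch(PDO $queue, $batchSize = 50) {
    $stmt = $queue->prepare(
        "SELECT id, payload FROM intake ORDER BY id LIMIT :n"
    );
    $stmt->bindValue(':n', $batchSize, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        // ... real per-row processing would happen here ...

        // Delete only after the work succeeds, so a crash mid-batch
        // leaves unprocessed rows in place for the next pass.
        $queue->prepare("DELETE FROM intake WHERE id = ?")
              ->execute([$row['id']]);
    }
    return count($rows);  // 0 means the intake table is drained
}

// The worker loop itself: small batches keep each pass cheap, and the
// sleep keeps the secondary database from being hammered.
// while (true) {
//     if (drain_batch($queue) === 0) sleep(2);
// }
```

Because worked rows are deleted rather than flagged, the table stays small and there is no need to scan for uncommitted rows at all.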