I'd recommend making use of existing message queuing systems such as ActiveMQ, which already provide the infrastructure for distributing messages to multiple clients, redelivering them after a failed attempt, and so on. We've had pretty good luck with this for StatusNet, where we run a lot of per-message processing through background queues to keep the frontend responsive.
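For a feel of what that looks like from PHP, here's a minimal sketch assuming a local ActiveMQ broker on its default STOMP port and the PECL stomp extension; the queue name and message shape are made up for illustration:

    <?php
    // Producer side: push a job onto a queue. The queue name and
    // message fields here are invented for the example.
    $stomp = new Stomp('tcp://localhost:61613');
    $stomp->send('/queue/wiki.jobs', json_encode(array(
        'wiki'   => 'examplewiki',
        'type'   => 'refreshLinks',
        'params' => array('page_id' => 12345),
    )));

    // Consumer side: a background worker reads jobs and acks them.
    // With client acknowledgement, the broker redelivers any frame
    // that is never ack'd -- the redelivery behaviour mentioned above.
    $stomp->subscribe('/queue/wiki.jobs', array('ack' => 'client'));
    while ($frame = $stomp->readFrame()) {
        $job = json_decode($frame->body, true);
        // ... process the job for $job['wiki'] here ...
        $stomp->ack($frame);
    }

readFrame() returns false once its read timeout passes with nothing queued, so a real worker would loop around that rather than exit.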
One difficulty is that we don't have a good system for handling data for multiple sites in one process, so it may need an intermediate process to spawn children for the actual processing. I think Tim did a little experimental work with GearMan; did that pan out?

-- brion

On Nov 15, 2010 4:27 AM, "Daniel Friesen" <[email protected]> wrote:
> There was a thought about the job queue that popped into my mind today.
>
> From what I understand, for a wiki farm to use runJobs.php instead of
> the in-request queue (which is less desirable on high-traffic sites),
> the farm has to run runJobs.php periodically for each and every wiki
> on the farm.
> So, for example, if a wiki farm hosts 10,000 wikis and wants to ensure
> that each queue is run at least hourly to keep the data on every wiki
> reasonably up to date, it essentially needs to call runJobs.php 10,000
> times an hour (i.e. once per wiki), regardless of whether a wiki has
> jobs or not. The alternative is to poll each database beforehand,
> which is itself 10,000 database calls an hour on top of the runJobs
> executions, and still isn't desirable.
>
> What do people think of having another source class for the job queue,
> like we have for file storage, text storage, etc.?
>
> The idea is that wiki farms could implement a new job queue source
> that draws jobs from a single shared database table with the same
> structure as the normal job queue, plus a farm-specific wiki id column.
> A wiki farm could then set up a cron job (or, to dispatch queue runs
> even more effectively, a daemon) which, instead of making 10,000
> runJobs calls outright, fetches a random job row from the shared
> table, looks at the wiki id in that row, and executes runJobs (perhaps
> with limit=1000) for that wiki. It would keep picking random rows from
> the shared table and dispatching further runJobs executions, keeping
> the job queues moving for every wiki on the farm without wasting
> runJobs calls on the pile of wikis that have no jobs to run.
>
> Any comments?
>
> --
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
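As a rough sketch of the dispatcher Daniel describes — "farm_job" and "sj_wiki" are invented names for the shared table and its wiki id column, and this assumes runJobs.php can be pointed at a wiki via the standard --wiki maintenance option:

    <?php
    // Hypothetical dispatcher loop for a farm-wide shared job table.
    $db = new PDO('mysql:host=localhost;dbname=farm', 'user', 'pass');

    while (true) {
        // Grab one random pending job row; only its wiki id matters here.
        $row = $db->query('SELECT sj_wiki FROM farm_job ORDER BY RAND() LIMIT 1')
                  ->fetch(PDO::FETCH_ASSOC);

        if ($row === false) {
            sleep(30); // no jobs anywhere on the farm; back off
            continue;
        }

        // Run a bounded batch for that wiki. runJobs.php takes
        // --maxjobs, which is what the "limit=1000" above maps to.
        $wiki = escapeshellarg($row['sj_wiki']);
        passthru("php maintenance/runJobs.php --wiki=$wiki --maxjobs 1000");
    }

ORDER BY RAND() is just shorthand; on a table of any size you'd want a cheaper random pick, and probably a claim step so two dispatchers don't race on the same wiki.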
