I'd recommend making use of existing message queuing systems such as
ActiveMQ, which already provide infrastructure for distributing messages to
multiple clients, redelivering after a failed attempt, etc. We've had pretty
good luck with this for StatusNet, where we run a lot of processing on
individual messages through background queues to keep the frontend
responsive.
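
For instance, with the PECL Stomp extension talking to an ActiveMQ broker,
a consumer gets redelivery more or less for free via client acks. A minimal
sketch (the broker URL, queue name, and the process_message() callback are
placeholders, not real config):

<?php
$stomp = new Stomp( 'tcp://localhost:61613' );
// 'client' ack mode: the broker redelivers anything we never ack,
// e.g. if the worker dies partway through a job.
$stomp->subscribe( '/queue/jobs', array( 'ack' => 'client' ) );
while ( true ) {
    $frame = $stomp->readFrame();
    if ( $frame === false ) {
        continue; // read timed out; just poll again
    }
    if ( process_message( $frame->body ) ) {
        $stomp->ack( $frame ); // success; broker can discard it
    }
    // On failure we skip the ack and the broker will redeliver.
}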

One difficulty is that we don't have a good system for handling data for
multiple sites in one process, so it may need an intermediate dispatcher
process to spawn children for the actual processing.

I think Tim did a little experimental work with Gearman; did that pan out?

-- brion
On Nov 15, 2010 4:27 AM, "Daniel Friesen" <[email protected]> wrote:
> There was a thought about the job queue that popped into my mind today.
>
> From what I understand, for a Wiki Farm, in order to use runJobs.php
> instead of the in-request queue (which on high-traffic sites is less
> desirable), the Wiki Farm has to run runJobs.php periodically for each
> and every wiki on the farm.
> So, for example: if a Wiki Farm is hosting 10,000 wikis, and the host
> wants to ensure that the queue is run at least hourly to keep the data
> on each wiki reasonably up to date, the farm essentially needs to call
> runJobs.php 10,000 times an hour (i.e. once for each individual wiki),
> regardless of whether a wiki has any jobs or not. Either that, or poll
> each database beforehand, which is itself 10,000 database calls an hour
> on top of the actual runJobs executions, and still isn't all that
> desirable.
>
>
> What do people think of having another source class for the job queue,
> like we have for file storage, text storage, etc.?
>
> The idea being that Wiki Farms would have the ability to implement a new
> job queue source which derives its jobs from a single shared database
> table with the same structure as the normal job queue, but with a
> farm-specific wiki id column inside the table as well.
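>
> Concretely, the shared table could look like the normal job table with
> one extra column, something along these lines (sharedjob and job_wiki
> are just placeholder names for this sketch):
>
> CREATE TABLE sharedjob (
>   job_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
>   job_wiki varbinary(64) NOT NULL, -- farm-specific wiki id
>   job_cmd varbinary(60) NOT NULL default '',
>   job_namespace int NOT NULL,
>   job_title varchar(255) binary NOT NULL,
>   job_params blob NOT NULL
> );
>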
> Using this method, a Wiki Farm would be able to set up a cron job (or
> perhaps a daemon, to be even more effective at dispatching job queue
> runs) which, instead of making 10,000 calls to runJobs outright, would
> fetch a random job row from the shared job queue table, look at the
> wiki id inside the row, and execute a runJobs script (perhaps with a
> limit of 1,000 jobs) for that wiki to dispatch the queue and run some
> of its jobs. It would of course continue looking at random jobs from
> the shared table and dispatching more runJobs executions, serving to
> keep the job queues running for all wikis on the farm, but without
> making wasteful runJobs calls for a pile of wikis which have no jobs
> to run.
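>
> A rough sketch of what that dispatcher loop might look like (the
> sharedjob table and job_wiki column are the hypothetical ones from
> above; the invocation assumes runJobs.php's usual --wiki and --maxjobs
> options):
>
> <?php
> // Hypothetical farm-wide dispatcher: pick a random pending job from
> // the shared table and hand that wiki's queue to a child runJobs.php.
> $db = new mysqli( 'localhost', 'farm', 'secret', 'farmdb' );
> while ( true ) {
>     // ORDER BY RAND() is fine for a sketch; a real dispatcher would
>     // want something cheaper on a big table.
>     $res = $db->query(
>         "SELECT job_wiki FROM sharedjob ORDER BY RAND() LIMIT 1"
>     );
>     $row = $res->fetch_assoc();
>     if ( !$row ) {
>         sleep( 10 ); // no jobs anywhere on the farm; back off
>         continue;
>     }
>     // Spawn a child process to run up to 1,000 jobs for that wiki.
>     $wiki = escapeshellarg( $row['job_wiki'] );
>     system( "php maintenance/runJobs.php --wiki=$wiki --maxjobs 1000" );
> }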
>
> Any comments?
>
> --
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]