On Thu, Apr 21, 2016 at 4:59 PM, Erik Bernhardson <
[email protected]> wrote:

> On Apr 20, 2016 10:45 PM, "Brion Vibber" <[email protected]> wrote:
> > Note that we could fire off a job queue background task to do the actual
> > removal... But is it also safe to do that on a read-only request?
> >
>
> https://www.mediawiki.org/wiki/Requests_for_comment/Master_%26_slave_datacenter_strategy_for_MediaWiki
> > seems to indicate job queueing will be safe, but would like to confirm
> > that. :)
> >
>
> I think this is the preferred method. My understanding is that the jobs
> will get shipped to the primary DC job queue.
>

*nod* looks like per spec that should work with few surprises.


>
> > Similarly in https://gerrit.wikimedia.org/r/#/c/284269/ we may wish to
> > trigger missing transcodes to run on demand, similarly. The actual re
> > encoding happens in a background job, but we have to fire it off, and we
> > have to record that we fired it off so we don't duplicate it...
> [snip]
> >
> The job queue can do deduplication, although you would have to check if
> that is active while the job is running and not only while queued. Might
> help?
>

Part of the trick is that we want to let the user know the job has been
queued; and if the job errors out, we want the user to know that, too.

Currently this means we have to update a row in the 'transcode' table (the
TimedMediaHandler-specific table tracking transcoded derivative files) when
we fire off the job, then update its state again when the job actually
runs.

If that's split into two queues, one lightweight and one heavyweight, then
this might make sense:

* N web requests hit something using File:Foobar.webm, which has a missing
transcode
* they each try to queue up a job to the lightweight queue that says "start
queueing this to actually transcode!"
* when the job queue runner on the lightweight queue sees the first such
job, it records the status update to the database and queues up a
heavyweight job to run the actual transcoding. The N-1 remaining jobs duped
on the same title/params either get removed, or never get stored in the
first place; I forget exactly how the deduplication works. :)
* ... time passes, during which further web requests don't yet see the
updated database table state, and keep queueing in the lightweight queue.
* lightweight queue runners see some of those jobs, but they have the
updated master database state and know they don't need to act.
* database replication of the updated state hits the remote DC
* ... time passes, during which further web requests see the updated
database table state and don't bother queueing the lightweight job
* eventually, the heavyweight job runs, updating the transcode state when
it starts and again when it completes.
* eventually, the database replicates the transcode state completion to the
remote DC.
* web requests start seeing the completed state, and their output includes
the updated transcode information.
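The steps above could be sketched roughly like this (again, every name is invented for illustration; the real dedup mechanism and job classes live in MediaWiki core, and a dict stands in for keyed job deduplication):

```python
master_db = {}          # (title, params) -> transcode state on the master
replica_db = {}         # lagged copy seen by web requests in the remote DC
lightweight_queue = {}  # keyed dict: duplicate jobs collapse on the key
heavyweight_queue = []

def web_request(title, params):
    """N web requests may all see the transcode as missing in the
    lagged replica, and each tries to enqueue the lightweight job."""
    key = (title, params)
    if replica_db.get(key) is None:
        lightweight_queue[key] = {"title": title, "params": params}

def run_lightweight():
    """Lightweight runner: sees master state, records 'queued', and
    hands off to the heavyweight queue exactly once."""
    key, job = lightweight_queue.popitem()
    if master_db.get(key) is not None:
        return  # duplicate that arrived after the state update: no-op
    master_db[key] = "queued"
    heavyweight_queue.append(job)

def run_heavyweight(encode):
    """Heavyweight runner: the slow transcode, updating state at
    start and end."""
    job = heavyweight_queue.pop(0)
    key = (job["title"], job["params"])
    master_db[key] = "running"
    encode(job)
    master_db[key] = "done"

def replicate():
    """Replication eventually copies master state to the remote DC."""
    replica_db.update(master_db)
```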

It all feels a bit complex, and I wonder if we could build some common
classes to help with this transaction model. I'm pretty sure we can be
making more use of background jobs outside of TimedMediaHandler's slow
video format conversions. :D
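Purely as a speculative sketch of what such a common class's interface might look like (nothing like this exists in core; all names are hypothetical):

```python
class TrackedJob:
    """Pairs a status-row update with enqueueing a job, so callers get
    the 'record queued -> run -> record done/error' pattern without
    hand-rolling it in each extension. Hypothetical interface only."""

    def __init__(self, status_store, queue):
        self.status = status_store   # dict-like: key -> state
        self.queue = queue           # list-like job queue

    def enqueue(self, key, payload):
        """Record 'queued' and push the job, deduping on the key."""
        if self.status.get(key) in ("queued", "running"):
            return False
        self.status[key] = "queued"
        self.queue.append((key, payload))
        return True

    def run_next(self, work):
        """Run one job, tracking running/done/error states."""
        key, payload = self.queue.pop(0)
        self.status[key] = "running"
        try:
            work(payload)
            self.status[key] = "done"
        except Exception:
            self.status[key] = "error"
```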

-- brion
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
