https://bugzilla.wikimedia.org/show_bug.cgi?id=45877

Krinkle <krinklem...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ResourceLoader caches       |ResourceLoader: Bad cache
                   |missing module for many     |stuck due to race condition
                   |minutes after it's          |with scap between load.php
                   |available                   |and index.php server

--- Comment #5 from Krinkle <krinklem...@gmail.com> ---
(In reply to Krinkle from comment #3)
> This bug (bug 45877) is about the race condition where one server is already
> embedding request URLs in the page (or mw.loader.load calls) while the
> module in question is not yet available on the Apache server that the
> request will be made to.
> 
> e.g.
> en.wikipedia.org --> srv123 @r12 --> outputs html mw.loader.load('foo')
> -> bits.wikimedia.org --> srv214 @r11 --> mw.loader.state('foo', 'missing');
> 
> Where that exact url (with that timestamp) will get cached for 30 days.
> 
> It won't stay broken for the full 30 days, though, because the startup
> manifest that en.wikipedia.org request contexts work with (served from bits)
> is rebuilt every 5 minutes.
> 
> I'm not sure what the proper solution is for this problem. Perhaps syncing
> to bits first, though that might bring the opposite problem, which is likely
> less visible, but might be equally problematic.

Rephrasing bug summary to more accurately reflect this.

The different relevant scenarios:


1) index.php server first, load.php server second

Whenever a change of any kind is deployed (a new module, or a change to an
existing module), the code may arrive on one server first (e.g. the server
serving the HTML, which contains the module load queue). The client then makes
a subsequent request to load.php for those modules, and that request is handled
by a server that doesn't yet have the code.

In that case, the response will be module.state=missing, which is fine and
degrades gracefully. This is not a broken state, simply the old state prior to
this particular deployment. Once the deployment is finished, the next time a
client requests the startup module from load.php, the module will be listed
there with a higher timestamp, so the module=missing response won't be served
again.
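
A minimal sketch of why scenario 1 heals itself, written as a toy Python model
rather than actual ResourceLoader code. Every name in it (load_url, respond,
fetch, the version number 12, and the use of 0 for a module the manifest does
not list yet) is an assumption made for illustration. The only point it is
meant to show is that the cache is keyed by the exact URL, and that the URL
changes once the startup manifest carries the newer timestamp.

```python
# Toy model of scenario 1. All names are hypothetical; none are MediaWiki APIs.

def load_url(module, manifest):
    """Build the load.php URL a client would request, versioned by the manifest it saw."""
    version = manifest.get(module, 0)  # 0: module not listed yet (assumption of this sketch)
    return f"/load.php?modules={module}&version={version}"

def respond(module, server_modules):
    """What a load.php server returns, given the modules it currently has."""
    if module in server_modules:
        return f"{module}@{server_modules[module]}"
    return f"{module}=missing"

cache = {}  # stands in for the frontend cache, keyed by exact URL

def fetch(module, manifest, server_modules):
    """Serve from cache when the URL is known, otherwise ask the backend and store it."""
    url = load_url(module, manifest)
    if url not in cache:
        cache[url] = respond(module, server_modules)
    return cache[url]

old = {}            # server not yet synced: 'foo' does not exist there
new = {"foo": 12}   # server already synced

# Mid-deploy: the page queues 'foo', but startup and load.php are still old.
during = fetch("foo", manifest=old, server_modules=old)

# Deploy finished: the rebuilt startup manifest lists 'foo' with a newer version,
# so the client asks for a different URL and bypasses the cached response.
after = fetch("foo", manifest=new, server_modules=new)

print(during)  # foo=missing
print(after)   # foo@12
```

The stale response is still in the cache afterwards, but nothing requests its
URL any more.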

2) index.php server and load.php/startup first, load.php/module second

The HTML server gets the change first, ensuring the module is in the load queue
(if not already), and sometime before or after this, the server handling the
startup module gets it too. The client then requests the module with the newer
version number in the URL, but the request is handled by a server that hasn't
been synced yet. That server responds with an old version of the module (or
module=missing if it's a new module).

This situation doesn't resolve itself within 5 minutes because the bad response
is stuck in the frontend Varnish cache under the correct (new) URL.

Touching startup.js won't help either.

This can only be resolved by changing the module again (touching any of the
files included in the relevant module), syncing that, and hoping you don't hit
the same race condition (it's rare, but I'd say it happens about 1 in 20 times).
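
A rough illustration of scenario 2, again as a toy Python model and not real
ResourceLoader or Varnish code. The FrontendCache class, its fetch method, and
the version numbers 11, 12 and 13 are all hypothetical. It is only meant to show
that once a stale backend has answered for the new URL, finishing the sync
changes nothing, while bumping the module version produces a new URL and
therefore a fresh response.

```python
# Toy model of scenario 2. All names are hypothetical; none are MediaWiki APIs.

class FrontendCache:
    """Caches load.php responses by exact URL, like a frontend HTTP cache."""

    def __init__(self):
        self.store = {}

    def fetch(self, module, url_version, backend_modules):
        url = f"/load.php?modules={module}&version={url_version}"
        if url not in self.store:
            # Cache miss: whichever backend handles the request decides the body.
            version = backend_modules.get(module)
            body = f"{module}=missing" if version is None else f"{module}@{version}"
            self.store[url] = body
        return self.store[url]

cache = FrontendCache()
stale = {"foo": 11}    # load.php server not yet synced
synced = {"foo": 12}   # load.php server after the sync

# Startup already advertises version 12, but a stale server answers first.
poisoned = cache.fetch("foo", 12, stale)
# The sync completes, yet the URL is unchanged, so the cache keeps the old body.
still_bad = cache.fetch("foo", 12, synced)
# Touching the module bumps the version to 13: new URL, fresh response.
fixed = cache.fetch("foo", 13, {"foo": 13})

print(poisoned, still_bad, fixed)  # foo@11 foo@11 foo@13
```

In this model the bump to 13 is subject to the same race: if a stale server
answered the first request for the version-13 URL, that URL would be stuck too.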
