hashar added a comment.

T111441 is most probably a beast too large to get rid of in a short time, and I don't even know who could be diverted to work on it.

I would at least clear out the HHVM bytecode caches to be safe. A server that failed had a 202 MB file:

We got a "Failed to run getConfiguration.php" error on mw1215.eqiad.wmnet:

$ ls -hl /var/cache/hhvm/
total 2.5G
-rw-r--r-- 1 www-data www-data 202M Sep 16 12:28 cli.hhbc.sq3
-rw-r--r-- 1 www-data www-data 2.3G Sep 16 04:10 fcgi.hhbc.sq3

That is from an api.php call. If the file size limit theory holds, cli.hhbc.sq3 is only 202 MB, so I am not sure why it fails.

The table above is for today and shows that a lot of servers are around that size, if not bigger, so I highly suspect this will trigger the ulimit again.
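To spot hosts likely to hit the limit again, something along these lines could report oversized caches. This is a hypothetical helper, not an existing script; the `/var/cache/hhvm` path, the `.hhbc.sq3` pattern, and the 2048 MB threshold are assumptions taken from the listing above:

```shell
# Hypothetical check: list any HHVM bytecode caches under a directory
# that exceed a size limit given in megabytes.
check_hhbc() {
    dir="$1"
    limit_mb="$2"
    # GNU find's -size +NM matches files strictly larger than N MiB
    find "$dir" -name '*.hhbc.sq3' -size +"${limit_mb}M" 2>/dev/null
}

# e.g.: check_hhbc /var/cache/hhvm 2048
```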

Ideally, it would be nice to figure out why the bytecode cache has to be written to at all. My assumption is that it should be compiled once on each deploy, so that mwscript would not have to touch it.
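One direction for that would be HHVM's repo-authoritative mode, where the bytecode repo is prebuilt at deploy time and then loaded read-only, so nothing writes to it at runtime. The fragment below is purely illustrative and not the cluster's actual configuration:

```ini
; Illustrative sketch only -- not the production config.
; With repo-authoritative mode enabled, HHVM serves requests from a
; prebuilt bytecode repo and does not write to the cache at runtime.
hhvm.repo.authoritative = true
hhvm.repo.central.path = /var/cache/hhvm/fcgi.hhbc.sq3
```

The trade-off is that the repo would have to be rebuilt on every deploy, which changes the deployment pipeline rather than just the runtime behaviour.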

When proceeding with the deployment, I highly recommend doing the jobrunners one at a time and watching logstash while doing so. Note that /var/log/mediawiki/jobrunner.log is currently only readable by root (due to T146040).
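A one-at-a-time pass over the jobrunners could be sketched as below. The host names, the `jobrunner` service name, and the `RUN` override are hypothetical; the real deploy tooling may well differ:

```shell
# Hypothetical rolling-restart helper, one host at a time.
# RUN defaults to ssh; override it (e.g. RUN=echo) for a dry run.
roll_jobrunners() {
    run="${RUN:-ssh}"
    for host in "$@"; do
        "$run" "$host" "sudo service jobrunner restart"
        # Check logstash for errors here before moving to the next host.
    done
}

# e.g.: roll_jobrunners mw1299.eqiad.wmnet mw1300.eqiad.wmnet
```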


