https://bugzilla.wikimedia.org/show_bug.cgi?id=72366

--- Comment #19 from Antoine "hashar" Musso (WMF) <[email protected]> ---
When the change "make scap restart HHVM" (
https://gerrit.wikimedia.org/r/#/c/163078/ ) was deployed on the beta
cluster, Bryan Davis immediately noticed the Apache timeout issue. Quoting him:


> In beta I'm seeing errors that line up with hhvm restarts:
> 
> [proxy_fcgi:error] [pid 10116] [client 10.68.16.12:13867] AH01079:
> failed to make connection to backend: 127.0.0.1
> 
> In beta we don't use pybal so there is no way at all to depool/repool around
> a restart. In production I don't think that we have any automatable means for
> depooling/repooling at this time. How worried should I be about a 503 storm
> if we put this into production? Without depooling it is going to happen.

And I followed up with:

> A few random ideas:
>
> More proxy: Assuming an HHVM restart is reasonably fast, could we set a
> timeout in Apache? That would mean adding mod_proxy in between Apache and
> HHVM fcgi, which would come with a bunch of potential failures of its own :/
>
> Add in pybal some RPC to remotely depool/repool a node
>
> Does HHVM support a graceful stop? i.e. stop accepting new connections (that
> would make pybal depool the node) but keep processing ongoing requests?


It seems we need to make HHVM smarter, a la "apache graceful-stop".
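
To make "graceful stop" concrete, here is a minimal sketch of the behaviour I
have in mind: stop accepting new connections, but let requests already in
flight finish. It is a toy TCP server in Python, not HHVM code; the class
name GracefulServer, its methods and the 0.2 s fake request time are all made
up for illustration.

```python
import socket
import threading
import time


class GracefulServer:
    """Toy TCP server illustrating a graceful stop (hypothetical, not HHVM)."""

    def __init__(self, host="127.0.0.1", port=0):
        # port=0 lets the OS pick a free port; see self.address.
        self._listener = socket.create_server((host, port))
        # Short timeout so the accept loop can notice a stop request.
        self._listener.settimeout(0.1)
        self.address = self._listener.getsockname()
        self._stopping = threading.Event()
        self._workers = []
        self._lock = threading.Lock()
        self._accept_thread = threading.Thread(target=self._accept_loop)

    def start(self):
        self._accept_thread.start()

    def in_flight(self):
        """Number of requests currently being processed."""
        with self._lock:
            return sum(1 for w in self._workers if w.is_alive())

    def _accept_loop(self):
        while not self._stopping.is_set():
            try:
                conn, _ = self._listener.accept()
            except socket.timeout:
                continue
            worker = threading.Thread(target=self._handle, args=(conn,))
            with self._lock:
                self._workers.append(worker)
            worker.start()
        # From here on, new connection attempts are refused.
        self._listener.close()

    def _handle(self, conn):
        with conn:
            conn.recv(1024)        # read the "request"
            time.sleep(0.2)        # simulate a slow request
            conn.sendall(b"done\n")

    def graceful_stop(self):
        """Stop accepting new connections, then wait for ongoing requests."""
        self._stopping.set()
        self._accept_thread.join()
        with self._lock:
            workers = list(self._workers)
        for worker in workers:
            worker.join()
```

The point of the sketch is the ordering in graceful_stop(): the listening
socket goes away first, so a load balancer health check would start failing,
and only then do we wait for the requests that were already accepted.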

On the beta cluster, load balancing is handled by the backend Varnish
instances, which have the web servers as their backends. There might be a way
to ask the Varnishes to stop sending new connections to a web server that is
about to be restarted.
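
Whatever the mechanism ends up being (Varnish, pybal or something else), the
restart sequence would look roughly like the sketch below. The depool,
restart, is_healthy and repool callables are placeholders for whatever we end
up with; none of them exist today, and the function name is made up as well.

```python
import time


def restart_depooled(backend, depool, restart, is_healthy, repool,
                     attempts=10, delay=0.5):
    """Depool a backend, restart it, and repool it once it is healthy.

    All four callables are hypothetical hooks taking the backend name.
    Returns True if the backend was repooled, False if it never became
    healthy (in which case it is deliberately left depooled).
    """
    depool(backend)
    restart(backend)
    for _ in range(attempts):
        if is_healthy(backend):
            repool(backend)
            return True
        time.sleep(delay)
    return False
```

Leaving the node depooled when it never comes back healthy is a design
choice: it avoids sending traffic to a server that would only answer with
503s.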


Finally, the patch was made to clear the HHVM JIT and PCRE caches. Both might
need a feature request to be filed with HHVM upstream so that the caches can
be reclaimed automatically or flushed on demand from time to time.
