On 23/11/2017 21:57, Raphael Mazelier wrote:

A short follow-up on the situation and on what seems to mitigate the problem and make this platform work. After a lot of testing and A/B testing, the solution for us was to use more, smaller instances. We basically doubled the number of servers (VMs), but on the other hand divided by two (or more) the RAM and the memory allocated to Varnish. We also reverted to using malloc with little RAM (4G), on 12G VM(s). We also scheduled a task to flush the cache (restarting Varnish). This is completely counter-intuitive, because nuking some entries seems to work better than keeping a big cache with no nuking. In my understanding it means that our hot content remains in the cache, and nuking objects is OK. It may also mean that the TTLs on our objects are completely wrong.
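For the record, a minimal sketch of the kind of startup flags this corresponds to (the listen address, paths and the cron schedule below are illustrative; only the 4G malloc size is ours):

    # varnishd with a 4G malloc storage backend, on a 12G VM
    varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,4G

    # illustrative cron entry: restart varnish nightly to flush the cache
    0 4 * * * systemctl restart varnish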

Anyway, it seems to be working. Thanks a lot to the people who helped us (and I'm sure we can find a way to give back).


Another follow-up for posterity :)

I think we have finally succeeded in restoring nominal service on our application. The main problem on the Varnish side was the use of a two-stage caching pattern for non-cacheable requests. We had completely misunderstood the hit-for-pass concept, resulting in many requests being kept on the waiting list at the second stage, especially at peak. Since these requests cannot be cached, it seems that piping them at level 1 is more than enough. To be fair, we also fixed some little things in our application code too :)
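For anyone hitting the same wall, a minimal VCL sketch of the idea, assuming the non-cacheable traffic can be recognized in vcl_recv (the /api/ prefix is just an illustration, not our actual rule):

    sub vcl_recv {
        # Known non-cacheable requests: pipe them at the first tier
        # instead of letting them go through cache lookup, where a
        # misunderstood hit-for-pass setup left them queued on the
        # waiting list under peak load.
        if (req.url ~ "^/api/") {
            return (pipe);
        }
    }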

Happy Holidays.

--
Raphael Mazelier
