On 23/11/2017 21:57, Raphael Mazelier wrote:
A short follow-up on the situation and on what seems to mitigate the
problem and make this platform work.
After a lot of testing, including A/B testing, the solution for us was
to use more, smaller instances.
We basically doubled the number of servers (VMs), but on the other hand
divided the RAM, and the memory allocated to Varnish, by two or more.
We also reverted to malloc storage with little RAM (4G per Varnish, on
12G VMs), and set up a scheduled task to flush the cache by restarting
Varnish.
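For reference, this is roughly the shape of the setup now (the port,
VCL path, and service name below are assumptions, adapt them to your
own environment):

    # varnishd with a 4G malloc storage on a 12G VM
    varnishd -a :80 -f /etc/varnish/default.vcl -s malloc,4G

    # nightly cache flush via a restart (crontab entry, assuming
    # varnish is managed by systemd)
    0 4 * * * systemctl restart varnish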
This is completely counter-intuitive: nuking some entries turns out to
work better than running a big cache with no nuking at all.
My understanding is that our hot content remains in the cache, so
nuking the rest of the objects is fine. It may also mean that the TTLs
on our objects are completely wrong.
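One way to sanity-check that theory is to watch the LRU nuke counter;
if it climbs steadily while the hit rate stays stable, the hot set is
indeed surviving the nuking:

    varnishstat -1 -f MAIN.n_lru_nuked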
Anyway, it seems to be working. Many thanks to the people who helped us
(and I'm sure we can find a way to give back).
Another follow-up for posterity :)
I think we have finally succeeded in restoring nominal service on our
application. The main problem on the Varnish side was the use of a
two-stage caching pattern for non-cacheable requests. We had completely
misunderstood the hit-for-pass concept, resulting in many requests
being held in the waiting list at both stages, especially at peak.
Since these requests cannot be cached, piping them at the first stage
is more than enough (see the sketch below). To be fair, we also fixed
some little things in our application code :)
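For illustration, a minimal VCL sketch of the idea, assuming VCL 4.0
and a session cookie as the marker for non-cacheable traffic (both are
assumptions, not our exact rules):

    vcl 4.0;

    sub vcl_recv {
        # Requests carrying a session cookie are never cacheable
        # here, so hand them straight to the backend instead of
        # letting them create hit-for-pass objects and pile up on
        # the waiting list.
        if (req.http.Cookie ~ "sessionid") {
            return (pipe);
        }
    }

Note that pipe turns the connection into a plain TCP relay, so further
requests on the same connection also bypass Varnish; return (pass) is
the gentler alternative if you still want per-request handling.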
Happy Holidays.
--
Raphael Mazelier