Hi,

Let's look at the usual suspects first: can we get the output of "ps aux | grep varnish" and a pastebin of "varnishstat -1"?
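A "varnishstat -1" dump prints one counter per line (name, value, per-second rate or ".", description), so the queuing Raphael suspects and the hit ratio can be checked from the same dump. Below is a minimal Python sketch that reads a saved dump. It assumes that plain-text layout and the counter names as we recall them from varnish-counters(7) for 4.x/5.x (MAIN.thread_queue_len, MAIN.sess_queued, MAIN.sess_dropped, MAIN.threads_limited, MAIN.cache_hit, MAIN.cache_miss). Verify them against the installed version. The script name check_queue.py is hypothetical.

```python
"""Flag worker-queueing symptoms in a saved `varnishstat -1` dump.

Assumes the plain-text layout `<name> <value> <rate or .> <description>`.
Counter names are as recalled from varnish-counters(7) for 4.x/5.x;
verify them against the installed version.
"""
from __future__ import annotations

import sys

# Counters that tend to move when requests wait for a worker thread.
QUEUE_COUNTERS = (
    "MAIN.thread_queue_len",  # gauge: sessions currently queued for a thread
    "MAIN.sess_queued",       # total sessions that had to queue for a thread
    "MAIN.sess_dropped",      # total sessions dropped because the queue was full
    "MAIN.threads_limited",   # times a pool wanted more threads but hit its max
)


def parse_varnishstat(text: str) -> dict[str, int]:
    """Return {counter_name: value} for every line with an integer value."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def hit_ratio(counters: dict[str, int]) -> float | None:
    """cache_hit / (cache_hit + cache_miss), or None before any lookups."""
    hits = counters.get("MAIN.cache_hit", 0)
    misses = counters.get("MAIN.cache_miss", 0)
    total = hits + misses
    return hits / total if total else None


def report(counters: dict[str, int]) -> list[str]:
    """One line per queueing counter that is present and non-zero."""
    return [f"{name} = {counters[name]}" for name in QUEUE_COUNTERS
            if counters.get(name)]


def main(argv: list[str]) -> None:
    # Hypothetical usage: varnishstat -1 > stats.txt && python3 check_queue.py stats.txt
    if len(argv) != 2:
        print("usage: check_queue.py <varnishstat -1 dump>")
        return
    with open(argv[1]) as fh:
        stats = parse_varnishstat(fh.read())
    for line in report(stats) or ["no queueing counters are non-zero"]:
        print(line)
    ratio = hit_ratio(stats)
    if ratio is not None:
        print(f"hit ratio: {ratio:.2f}")


if __name__ == "__main__":
    main(sys.argv)
```

A non-zero MAIN.sess_queued, or a MAIN.thread_queue_len that grows during a bad peak, would suggest requests are waiting for worker threads. Non-zero MAIN.sess_dropped or MAIN.threads_limited would suggest the thread pool limit was actually reached. The hit ratio here uses one common definition (hits over hits plus misses), which may differ slightly from how the 0.85 figure was measured.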
Are you using any vmods? Also, "man varnishncsa" will help you craft a format line that includes the response time (I'm on mobile right now, so I can't check it myself).

Cheers,

--
Guillaume Quintard

On Nov 14, 2017 23:25, "Raphael Mazelier" <[email protected]> wrote:

> Hello list,
>
> First of all, despite my mail subject, I really appreciate varnish.
> We use it a lot at work (hundreds of instances) with success, and
> unfortunately some pain these days.
>
> TL;DR: upgrading from varnish 2 to varnish 4 and 5 on one of our
> infrastructures brought us some serious trouble and instability on this
> platform.
> And we are a bit desperate/frustrated.
>
>
> Long story.
>
> A bit of context:
>
> This is a very complex platform serving an IPTV service with some traffic
> (8k req/s at peak, even more when it works well).
> It is composed of a two-stage reverse proxy cache (3 x 2 varnish for stage
> 1, 2 varnish for stage 2, so 8 in total) and a lot of different backends
> (PHP applications, Node.js apps, remote backends *sigh*, and even a pipe one).
> This is a big historical spaghetti app. We plan to rebuild it from scratch in
> 2018.
> The first-stage varnish servers are split into two pools handling different
> client topologies.
>
> A lot of the logic is in varnish/VCL itself: lots of URL rewrites, lots of
> header manipulation, backend selection, and even ESI processing...
> The VCL of the stage 1 varnish servers is almost 3000 lines long.
>
> But for now we have to live/deal with it.
>
> History of the problem:
>
> At the beginning all varnish servers were on 2.x. Things worked almost well.
> This summer we needed to upgrade varnish to handle very long
> headers (a product requirement).
> So after a short battle porting our VCL to VCL 4.0 we started using varnish 4.
> Shortly after, things began to go very bad.
>
> The first issue we hit was memory exhaustion on both stages, and the
> OOM killer...
> We tested a lot of things, and in the battle we upgraded to varnish 5.
> We fixed it by resizing the pool, and we now use the file storage backend
> (it was memory before).
> Memory is now stable (we have a large pool, 32G, and strangely we never
> have objects being nuked, which is good or bad, it depends).
> We have also fixed a lot of things in our VCL.
>
> The problem we are fighting now is only on the stage 1 varnish servers, and
> specifically on one pool (the busiest one).
> When everything goes well the average CPU usage is 30%, memory stabilizes
> around 12G, and the cache hit ratio is around 0.85.
> The problem happens randomly (not every day) but during our peaks. The CPU
> usage increases fast to reach 350% (4 cores) and load > 3/
> When the problem is here varnish still delivers requests (we didn't see
> dropped or rejected connections) but our application begins to lose users,
> including a big chunk of business. I suspect this is because timeouts are very
> aggressive on the client side and varnish answers slowly.
>
> - First question: how can I see the response time of requests on the varnish
> server? (varnishncsa something?)
>
> I also suspect some kind of request queuing; also, stracing varnish when it
> happens shows a lot of futex waits?!
> The frustrating part is that restarting varnish fixes the problem immediately,
> and the CPU usage remains normal afterwards, even if the traffic peak is not over.
> So there is clearly something stacking up in varnish which causes our problem.
>
> - Second question: how can I see the number of stacked connections, long
> connections and so on?
>
> At this stage we accept all kinds of help/hints for debugging (and
> given the business impact we can consider paying for professional
> support).
>
> PS: I always have the option to scale out, popping a lot of new varnish
> instances, but this seems very frustrating...
>
> Best,
>
> --
> Raphael Mazelier
>
>
> _______________________________________________
> varnish-misc mailing list
> [email protected]
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
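For the first question, varnishncsa's -F option takes a custom format string. In the varnishncsa(1) pages we know, %D is the time taken to serve the request in microseconds and %U is the URL path without the query string. Check "man varnishncsa" for your version. Assuming a log captured with something like varnishncsa -F '%D %U' > times.log, the minimal sketch below reports nearest-rank percentiles and the URLs with the worst observed times. The script name ncsa_times.py and the two-field format are assumptions for illustration.

```python
"""Summarise response times from varnishncsa output.

Assumes each line was produced with a format like  -F '%D %U', i.e.
"<microseconds> <url path>"; lines that do not match are skipped.
"""
from __future__ import annotations

import math
import sys
from collections import defaultdict


def parse_line(line: str) -> tuple[int, str] | None:
    """Split '<usec> <url>' into (usec, url); None for malformed lines."""
    parts = line.split(None, 1)
    if len(parts) != 2 or not parts[0].isdigit():
        return None
    return int(parts[0]), parts[1].strip()


def percentile(sorted_values: list[int], pct: float) -> int:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarise(lines, top: int = 5) -> dict:
    """Percentiles in milliseconds plus the URLs with the worst single time."""
    times: list[int] = []
    per_url: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        usec, url = parsed
        times.append(usec)
        per_url[url].append(usec)
    if not times:
        return {"count": 0}
    times.sort()
    # Rank URLs by their worst observed response time, slowest first.
    slowest = sorted(per_url, key=lambda u: max(per_url[u]), reverse=True)[:top]
    return {
        "count": len(times),
        "p50_ms": percentile(times, 50) / 1000,
        "p99_ms": percentile(times, 99) / 1000,
        "max_ms": times[-1] / 1000,
        "slowest_urls": slowest,
    }


def main(argv: list[str]) -> None:
    # Hypothetical usage: varnishncsa -F '%D %U' > times.log, then
    #                     python3 ncsa_times.py times.log
    if len(argv) != 2:
        print("usage: ncsa_times.py <varnishncsa log>")
        return
    with open(argv[1]) as fh:
        print(summarise(fh))


if __name__ == "__main__":
    main(sys.argv)
```

Comparing the p99 from a bad peak with one from a normal peak should show whether clients really are being answered slowly. Ranking URLs by their worst time, rather than their average, is a deliberate choice so that rare stalls are not hidden.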
