I cannot cut the LB out due to network design. There is no log on the LB also. Problematic requests are not large (js/css files), I suspect that this is a miscommunication issue between Varnish and Apache but I am not able to detect the reason yet.
On Mon, Apr 3, 2017 at 10:32 AM, Guillaume Quintard < [email protected]> wrote: > Large requests/responses are dropped by the LB, maybe? > > -- > Guillaume Quintard > > On Mon, Apr 3, 2017 at 9:12 AM, Andrei <[email protected]> wrote: > >> So the Varnish backend requests go through a load balancer before >> reaching Apache? What about those logs? What if you cut that LB out, and >> just use directors to LB in Varnish directly? >> >> On Sun, Apr 2, 2017 at 12:39 PM, Hazar Güney <[email protected]> >> wrote: >> >>> Btw, I need to also note that traffic is routed to Varnish from load >>> balancer: >>> >>> LB -> Varnish -> LB -> Backend pool >>> >>> Time does not matter.. It occurs during both peak and regular hours. Even >>> during peak hours we do not reach the "local ports" limit. Unfortunately >>> there is no any clue in the logs. There is no evidence that Apache >>> restarts on the backend pool during occurence of the issue. >>> >>> On Sat, Apr 1, 2017 at 9:44 PM, Andrei <[email protected]> wrote: >>> >>>> If it's during peak hours are you sure there aren't any rate limits >>>> being reached? Perhaps net.ipv4.ip_local_port_range might need a bump? >>>> Are Apache or syslog logging anything around those times? No silly periodic >>>> (Apache) graceful restarts? Just a few thoughts :) >>>> >>>> On Sat, Apr 1, 2017 at 2:14 PM, Hazar Güney <[email protected]> >>>> wrote: >>>> >>>>> We see this error a few times in a day on a highly busy production >>>>> environment. Unfortunately there is too much traffic on the server to keep >>>>> tcpdump/ngrep running and we cannot re-produce it on test environment :( >>>>> >>>>> I have started tcpdump on a test environment of another implementation >>>>> and will let you as soon as the issue gets triggerred again. >>>>> >>>>> On Fri, Mar 31, 2017 at 4:17 PM, Andrei <[email protected]> wrote: >>>>> >>>>>> Can you provide a tcpdump/ngrep of the requests between >>>>>> Client/Varnish/Apache along with the varnishlog entry to see if that >>>>>> uncovers anything? >>>>>> >>>>>> On Fri, Mar 31, 2017 at 7:25 AM, Hazar Güney <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Any idea? >>>>>>> >>>>>>> On Thu, Mar 30, 2017 at 3:41 PM, Hazar Güney <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> It did not work either: >>>>>>>> >>>>>>>> * << BeReq >> 127418176 >>>>>>>> - Begin bereq 127418175 fetch >>>>>>>> - Timestamp Start: 1490877149.450124 0.000000 0.000000 >>>>>>>> - BereqMethod GET >>>>>>>> - BereqURL XXXX >>>>>>>> - BereqProtocol HTTP/1.1 >>>>>>>> - BereqHeader Accept: text/css,*/*;q=0.1 >>>>>>>> - BereqHeader User-Agent: Mozilla/5.0 (iPhone; CPU iPhone >>>>>>>> OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) >>>>>>>> Version/10.0 Mobile/14C92 Safari/602.1 >>>>>>>> - BereqHeader Accept-Language: tr-tr >>>>>>>> - BereqHeader Referer: XXXX >>>>>>>> - BereqHeader Host: XXXX >>>>>>>> - BereqHeader RIP: XXXX >>>>>>>> - BereqHeader X-Forwarded-For: XXXX >>>>>>>> - BereqHeader Accept-Encoding: gzip >>>>>>>> - BereqHeader X-Varnish: 127418176 >>>>>>>> - VCL_call BACKEND_FETCH >>>>>>>> - BereqHeader connection: Close >>>>>>>> - VCL_return fetch >>>>>>>> - BackendOpen 25 reload_2017-03-30T14:53:46.st2 >>>>>>>> 10.35.78.11 80 172.17.0.2 59152 >>>>>>>> - BackendStart 10.35.78.11 80 >>>>>>>> - Timestamp Bereq: 1490877149.450594 0.000470 0.000470 >>>>>>>> - FetchError http first read error: EOF >>>>>>>> - BackendClose 25 reload_2017-03-30T14:53:46.st2 >>>>>>>> - Timestamp Beresp: 1490877149.451184 0.001060 0.000590 >>>>>>>> - Timestamp Error: 1490877149.451189 0.001065 0.000005 >>>>>>>> - BerespProtocol HTTP/1.1 >>>>>>>> - BerespStatus 503 >>>>>>>> - BerespReason Service Unavailable >>>>>>>> - BerespReason Backend fetch failed >>>>>>>> - BerespHeader Date: Thu, 30 Mar 2017 12:32:29 GMT >>>>>>>> - BerespHeader Server: Varnish >>>>>>>> - VCL_call BACKEND_ERROR >>>>>>>> - BereqHeader X-Varnish-Backend-5xx: 1 >>>>>>>> - VCL_return retry >>>>>>>> - Timestamp Retry: 1490877149.451205 0.001081 0.000016 >>>>>>>> - Link bereq 127298071 retry >>>>>>>> - End >>>>>>>> >>>>>>>> On Thu, Mar 30, 2017 at 2:34 PM, Guillaume Quintard < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> It does, I'm suspecting that the connection reuse is creating some >>>>>>>>> issues, probably because Apache is doing some non-standard stuff >>>>>>>>> (protip: >>>>>>>>> always blame Apache). >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Guillaume Quintard >>>>>>>>> >>>>>>>>> On Thu, Mar 30, 2017 at 1:17 PM, Hazar Güney <[email protected] >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> "Connection: close" supersedes keep-alive behavior, is that >>>>>>>>>> correct? >>>>>>>>>> >>>>>>>>>> On Thu, Mar 30, 2017 at 2:08 PM, Guillaume Quintard < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Can you try something: add 'set bereq.http.connection = "Close"; >>>>>>>>>>> ' at the beginning of vcl_backend_fetch and see if that helps? >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Guillaume Quintard >>>>>>>>>>> >>>>>>>>>>> On Thu, Mar 30, 2017 at 1:04 PM, Hazar Güney < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> MaxKeepAliveRequests 20 >>>>>>>>>>>> KeepAliveTimeout 2 >>>>>>>>>>>> >>>>>>>>>>>> Version is "4.1.3 revision 5e3b6d2". We have also seen "straight >>>>>>>>>>>> insufficient bytes" error with POST requests to a specific php >>>>>>>>>>>> script >>>>>>>>>>>> hosted by another backend and fixed it by using "pipe" instead of >>>>>>>>>>>> "pass" >>>>>>>>>>>> but this specific backend gives "http first read error: EOF" >>>>>>>>>>>> error. Another >>>>>>>>>>>> example from today: >>>>>>>>>>>> >>>>>>>>>>>> * << BeReq >> 126635444 >>>>>>>>>>>> - Begin bereq 126635443 fetch >>>>>>>>>>>> - Timestamp Start: 1490870598.921499 0.000000 0.000000 >>>>>>>>>>>> - BereqMethod GET >>>>>>>>>>>> - BereqURL XXXX >>>>>>>>>>>> - BereqProtocol HTTP/1.1 >>>>>>>>>>>> - BereqHeader Host: XXXX >>>>>>>>>>>> - BereqHeader User-Agent: Mozilla/5.0 (Windows NT 10.0; >>>>>>>>>>>> Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) >>>>>>>>>>>> Chrome/56.0.2924.87 >>>>>>>>>>>> Safari/537.36 >>>>>>>>>>>> - BereqHeader Accept: image/webp,image/*,*/*;q=0.8 >>>>>>>>>>>> - BereqHeader Referer: XXXX >>>>>>>>>>>> - BereqHeader Accept-Language: >>>>>>>>>>>> tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4 >>>>>>>>>>>> - BereqHeader RIP: XXXX >>>>>>>>>>>> - BereqHeader X-Forwarded-For: XXXX >>>>>>>>>>>> - BereqHeader Accept-Encoding: gzip >>>>>>>>>>>> - BereqHeader X-Varnish: 126635444 >>>>>>>>>>>> - VCL_call BACKEND_FETCH >>>>>>>>>>>> - VCL_return fetch >>>>>>>>>>>> - BackendOpen 35 reload_2017-03-20T11:32:44.st2 >>>>>>>>>>>> 10.35.78.11 80 172.17.0.2 48896 >>>>>>>>>>>> - BackendStart 10.35.78.11 80 >>>>>>>>>>>> - Timestamp Bereq: 1490870598.922050 0.000552 0.000552 >>>>>>>>>>>> *- FetchError http first read error: EOF* >>>>>>>>>>>> - BackendClose 35 reload_2017-03-20T11:32:44.st2 >>>>>>>>>>>> - Timestamp Beresp: 1490870598.922622 0.001124 0.000572 >>>>>>>>>>>> - Timestamp Error: 1490870598.922627 0.001129 0.000005 >>>>>>>>>>>> - BerespProtocol HTTP/1.1 >>>>>>>>>>>> - BerespStatus 503 >>>>>>>>>>>> - BerespReason Service Unavailable >>>>>>>>>>>> - BerespReason Backend fetch failed >>>>>>>>>>>> - BerespHeader Date: Thu, 30 Mar 2017 10:43:18 GMT >>>>>>>>>>>> - BerespHeader Server: Varnish >>>>>>>>>>>> - VCL_call BACKEND_ERROR >>>>>>>>>>>> - BereqHeader X-Varnish-Backend-5xx: 1 >>>>>>>>>>>> - VCL_return retry >>>>>>>>>>>> - Timestamp Retry: 1490870598.922657 0.001159 0.000030 >>>>>>>>>>>> - Link bereq 126832283 retry >>>>>>>>>>>> - End >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Mar 29, 2017 at 12:03 PM, Mattias Geniar < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> > Backend is Apache. >>>>>>>>>>>>> >>>>>>>>>>>>> In older Varnish versions, you could sometimes see a similar >>>>>>>>>>>>> error; >>>>>>>>>>>>> >>>>>>>>>>>>> > 11 FetchError c straight insufficient bytes >>>>>>>>>>>>> >>>>>>>>>>>>> The error message you’re seeing might be related, as it >>>>>>>>>>>>> mentions the EOF. >>>>>>>>>>>>> >>>>>>>>>>>>> This happens when the backend sends a Content-Length header >>>>>>>>>>>>> that doesn’t match the _actual_ content length it’s sending. In >>>>>>>>>>>>> Apache, >>>>>>>>>>>>> this was commonly caused by a mod_deflate misconfiguration. >>>>>>>>>>>>> >>>>>>>>>>>>> For testing, could you try disabling Gzip either in your >>>>>>>>>>>>> backend or strip the Accept-Encoding header in Varnish to force a >>>>>>>>>>>>> plain >>>>>>>>>>>>> text response? >>>>>>>>>>>>> >>>>>>>>>>>>> Mattias >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> varnish-misc mailing list >>>>>>> [email protected] >>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >
_______________________________________________ varnish-misc mailing list [email protected] https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
