Bump. Any idea what might be happening here? Otherwise can someone suggest to me how I can enforce a *real* timeout on a per-request basis?
I'd like to keep the return(restart) part of my vcl, but not if it means that some request will take 2x longer than what is specific in .first_byte_timeout. On Fri, Jan 10, 2014 at 4:39 PM, Stephen Wood <[email protected]> wrote: > I am experiencing some strange behavior. Periodically requests will take > 2x the time specified in first_byte_timeout before they actually are killed > and the server returns a 503. This is quite annoying and it means that our > users receive a 504 gateway timeout instead of our error message. > > My gut tells me that it has something to do with the way return(restart) > works. The response code returned from the backend is a 503. > > Let me describe my vcl config briefly. The backend definitions look like > this: > > backend test1 { > .host = "10.0.0.1"; > .port = "80"; > .connect_timeout = 1s; > .first_byte_timeout = 65s; > .between_bytes_timeout = 1s; > .saintmode_threshold = 5; > .probe = { > .request = > "GET /status HTTP/1.1" > "Host: test.com" > "Connection: close"; > .interval = 60s; > .timeout = 10s; > .window = 5; > .threshold = 3; > } > } > > (similar for backend test2) > > I'm using the round-robin directory. Under vcl_fetch I have this: > > sub vcl_fetch { > if (beresp.status == 503) { > set beresp.saintmode = 60s; > if (req.request != "POST") { > return(restart); > } else { > error 500 "Failed"; > } > } > > set beresp.grace = 10s; > } > > Lastly, here's me catching the varnish server in the act of essentially > making two requests and not timing out. > > 169 BackendXID b 1547011740 > 169 TxRequest b GET > 169 TxURL b test? > 169 TxProtocol b HTTP/1.1 > 169 TxHeader b User-Agent: curl/7.30.0 > 169 TxHeader b Host: xxx.xxx.xxx > 169 TxHeader b Accept: */* > 169 TxHeader b X-Varnish: 1547011740 > 169 TxHeader b Accept-Encoding: gzip > 169 BackendClose b test1 > > 107 BackendXID b 1547011740 > 107 TxRequest b GET > 107 TxURL b /test? > 107 TxProtocol b HTTP/1.1 > 107 TxHeader b User-Agent: curl/7.30.0 > 107 TxHeader b Host: xxx.xxx.xxx > 107 TxHeader b Accept: */* > 107 TxHeader b X-Varnish: 1547011740 > 107 TxHeader b Accept-Encoding: gzip > 107 BackendClose b test2 > > > 47 SessionOpen c 162.208.42.130 59288 :80 > 47 ReqStart c 162.208.42.130 59288 1547011740 > 47 RxRequest c GET > 47 RxURL c /test? > 47 RxProtocol c HTTP/1.1 > 47 RxHeader c User-Agent: curl/7.30.0 > 47 RxHeader c Accept: */* > 47 RxHeader c Connection: keep-alive > 47 VCL_call c recv lookup > 47 VCL_call c hash > 47 Hash c > 47 Hash c /test? > 47 Hash c > 47 VCL_return c hash > 47 VCL_call c miss fetch > 47 Backend c 169 main_cluster test1 > 47 FetchError c http first read error: -1 11 (Resource temporarily > unavailable) > 47 Backend c 107 main_cluster test2 > 47 FetchError c http first read error: -1 11 (Resource temporarily > unavailable) > 47 VCL_call c error deliver > 47 VCL_call c deliver deliver > 47 TxProtocol c HTTP/1.1 > 47 TxStatus c 503 > 47 TxResponse c Service Unavailable > 47 TxHeader c Server: Varnish > 47 TxHeader c Content-Type: application/json; charset=utf-8 > 47 TxHeader c Content-Length: 65 > 47 TxHeader c Accept-Ranges: bytes > 47 TxHeader c Date: Fri, 10 Jan 2014 23:24:52 GMT > 47 TxHeader c X-Varnish: 1547011740 > 47 TxHeader c Age: 130 > 47 TxHeader c Via: 1.1 varnish > 47 TxHeader c Connection: close > 47 Length c 65 > 47 ReqEnd c 1547011740 1389396162.194909096 1389396292.196509361 > 0.000040293 130.000771999 0.000828266 > > What it looks to me like what's happening is the first server is returning > a bad status code after 60 seconds (the first_byte_timeout is 65), which > satisfies the requirement of first_byte_timeout. The request triggers a > return(restart) and it's tried again on another host, which takes another > 60 or so seconds. Finally the request just dies by only after it's been > hanging around for twice the time allotted in first_byte_timeout > > Does anyone know what I'm doing wrong? I do want to trigger a > return(restart) for spurious status codes, but I want first_byte_timeout to > serve as the time I'm willing to let a request go through. Is there a > better parameter I can use than this one? > > It looks very similar to this > bug<https://www.varnish-cache.org/trac/ticket/1156> but > it looks like that's been fixed. I'm using varnish 3.0.5. > > Any help is greatly appreciated. > -- > Stephen Wood > www.heystephenwood.com > -- Stephen Wood www.heystephenwood.com
_______________________________________________ varnish-misc mailing list [email protected] https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
