This tcpdump output illustrates an issue we seem to have with default Linux TCP timeouts and the default timeout_req of 2 seconds:

16:47:44.542049 IP client.49550 > varnish.80: Flags [S], seq 29295818, win 4380, options [mss 1460,sackOK,eol], length 0
16:47:44.542080 IP varnish.80 > client.49550: Flags [S.], seq 3652568857, ack 29295819, win 29200, options [mss 1460,nop,nop,sackOK], length 0
16:47:44.542250 IP client.49550 > varnish.80: Flags [.], ack 1, win 4380, length 0
16:47:46.080501 IP client.49550 > varnish.80: Flags [P.], seq 1:1453, ack 1, win 4380, length 1452
16:47:46.080528 IP varnish.80 > client.49550: Flags [.], ack 1453, win 31944, length 0
16:47:48.082783 IP varnish.80 > client.49550: Flags [F.], seq 1, ack 1453, win 31944, length 0
16:47:48.083070 IP client.49550 > varnish.80: Flags [.], ack 2, win 4380, length 0
16:47:48.350763 IP client.49550 > varnish.80: Flags [P.], seq 1453:2905, ack 2, win 4380, length 1452
16:47:48.350792 IP varnish.80 > client.49550: Flags [R], seq 3652568859, win 0, length 0

The packet at 16:47:46.080501 contains the first part of a request, up to the start of a very long cookie line. At 16:47:48 varnish closes the connection after reaching the timeout_req of 2s, and the client acks immediately. My understanding is that the varnish->client ack 1453 got lost and the client did not get around to retransmitting seq 1:1453 before we timed out.

The most helpful online reference on recommended initial TCP retransmission timeouts I have found so far is http://tools.ietf.org/html/rfc6298#ref-PA00

In summary, an initial retransmission timeout (RTO) of 1s is now recommended, but the former 3s RTO remains valid.

So, for any client following the former 3s recommendation, we currently don't even tolerate a single packet retransmission after the 3-way handshake is complete. For clients following the new 1s recommended RTO, timing is also really tight: it seems unlikely that we tolerate retransmission of two packets.

Based on this, I'd suggest raising the default timeout_req to 7 seconds to allow for two retransmissions at RTO=3.
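
To make the timing argument easier to check, here is a rough Python sketch of when retransmissions would fire. It is a simplified model, not a description of what the Linux stack or varnish actually do: it assumes the clock starts when the lost segment is first sent, that the timeout is measured from that same instant, and that round-trip time and RTO clamping can be ignored. The function names and the `backoff` parameter are made up for illustration. The default `backoff=2.0` follows the "RTO <- RTO * 2" rule of RFC 6298 section 5.5; `backoff=1.0` models a timer that does not back off.

```python
def retransmit_offsets(initial_rto, count, backoff=2.0):
    """Seconds after the original send at which each retransmission fires."""
    if initial_rto <= 0 or backoff <= 0:
        raise ValueError("initial_rto and backoff must be positive")
    offsets = []
    elapsed = 0.0
    rto = float(initial_rto)
    for _ in range(count):
        elapsed += rto          # the timer expires, the segment is resent
        offsets.append(elapsed)
        rto *= backoff          # back off the timer for the next attempt
    return offsets


def retransmissions_within(timeout, initial_rto, backoff=2.0):
    """Count retransmissions sent strictly before `timeout` seconds elapse."""
    if initial_rto <= 0 or backoff <= 0:
        raise ValueError("initial_rto and backoff must be positive")
    sent = 0
    elapsed = 0.0
    rto = float(initial_rto)
    while elapsed + rto < timeout:
        elapsed += rto
        sent += 1
        rto *= backoff
    return sent


if __name__ == "__main__":
    for timeout in (2, 7):
        for rto in (1, 3):
            n = retransmissions_within(timeout, rto)
            print(f"timeout={timeout}s RTO={rto}s: {n} retransmission(s) fit")
```

In this model the current 2s default leaves room for no retransmission at RTO=3 and one at RTO=1. A 7s budget fits two retransmissions at RTO=3 only if the timer does not back off (3s and 6s); with doubling, the second one would fall at 9s.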

A longer default seems particularly relevant given the growing popularity of mobile clients.

The risk is increased resource usage for malicious requests. To address it, I'd suggest documenting that lowering timeout_req is an option for mitigating certain DoS (slowloris) attacks.

Nils
