Hi everyone,

I'm trying to track down a bug that is troubling our production systems, and so far I'm stumped.

This is on Debian Linux. I've tried kernels 2.4.27 and 2.6.7 with squid 2.5STABLE[157]; all of them have this problem.
Squid is configured as a reverse accelerator, compiled with --enable-x-accelerator-vary, and our webservers add X-Accelerator-Vary: Accept-Encoding to their responses.
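The accelerator side is the standard squid 2.5 httpd_accel setup, roughly the following (the backend hostname and ports are placeholders here, not our real values):

    http_port 80
    httpd_accel_host backend.example.com
    httpd_accel_port 80
    httpd_accel_single_host on
    httpd_accel_uses_host_header off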


A small percentage of incoming requests (about 0.02%) to our reverse-accelerator farm take a very long time to complete. From the few clues I've been able to glean, I suspect there is a problem with squid refreshing an object while another client is in the process of retrieving that same object.

The clues:
A wget running in a loop retrieving the main page of our site occasionally takes just under 15 minutes to complete the retrieval; normally it takes 0.02 seconds.
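The loop is nothing more than this (the URL is a stand-in for our real main page):

    while true; do
        time wget -q -O /dev/null http://www.example.com/
        sleep 1
    done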


When I look at the access.log for that retrieval and work back to the time the request was placed, I often find that some client out on the internet had issued a request with a no-cache header, resulting in a TCP_CLIENT_REFRESH_MISS for the main page.
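I find those requests by grepping the log, e.g. (assuming the default Debian log location):

    grep TCP_CLIENT_REFRESH_MISS /var/log/squid/access.log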

With wget --server-response I see that the Age header of the slow-to-retrieve page always shows a low number of seconds, so the object had been refreshed just prior to the request.
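For example:

    wget --server-response -O /dev/null http://www.example.com/ 2>&1 | grep 'Age:'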

The Age plus the time to retrieve the object equals the read_timeout in squid.conf. I changed read_timeout to 9 minutes on one server and the slow wget runs started taking 8+ minutes instead of 14+.
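That is, in squid.conf:

    # default is 15 minutes; after this change the hangs
    # shrank from 14+ minutes to 8+ minutes
    read_timeout 9 minutes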

The object itself is transferred quickly, but the connection stays open until some timer in squid (apparently read_timeout) elapses; only then does squid close the connection.
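You can watch the connection sit there after the transfer completes with something like:

    netstat -tn | grep ESTABLISHED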

This problem did not exist on the same hardware running Solaris x86.

Any ideas as to where I should be looking? There are a few places in the code that are ifdef'd on _SQUID_LINUX_, but nothing there looks applicable to the problem.
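(I found those spots with a grep over the squid source tree, along the lines of:

    grep -rn _SQUID_LINUX_ src/
)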

I am having no luck reproducing this on a test system.

--
Robert Borkowski
