I'm trying to debug an intermittent slowness problem with Squid 3.4.4. Unfortunately I haven't been able to figure out how to reproduce it; it just occurs every so often on a production server. I haven't yet tried Squid 3.4.5, but nothing in the change log suggests this problem has been addressed.

I've got an example web fetch from this morning:

The network traffic shows:
09:01:54.489515 client -> server TCP SYN
09:01:54.489541 client <- server TCP SYN/ACK
09:01:54.489555 client -> server TCP ACK
09:01:54.490059 client -> server HTTP GET request
09:01:54.490074 client <- server TCP ACK
09:02:09.492576 client -> server TCP FIN (client times out, tears down)
09:02:09.532222 client <- server TCP ACK
09:02:35.371911 client <- server TCP FIN (server tears down connection)
The client is port 58469, the server is 3128.
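
The delays in the trace above can be computed directly from the timestamps. A small sketch (timestamps copied verbatim from the capture; assumes all packets fall on the same day):

```python
# Compute the delays visible in the packet trace above.
from datetime import datetime

def t(s):
    return datetime.strptime(s, '%H:%M:%S.%f')

get_sent   = t('09:01:54.490059')  # client sends HTTP GET
client_fin = t('09:02:09.492576')  # client gives up and sends FIN
server_fin = t('09:02:35.371911')  # server finally tears down

print((client_fin - get_sent).total_seconds())  # ~15 s: the client's timeout
print((server_fin - get_sent).total_seconds())  # ~40.9 s until Squid's FIN
```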

As you can see, Squid never replies to the GET request. In this case the request didn't even require Squid to contact an upstream server: the authentication credentials were invalid, so it should have produced a 407 immediately.

Examining the Squid logs (http://persephone.nexusuk.org/~steve/cache.log.trimmed), I can see that Squid didn't accept the connection until 09:02:35.370.

What seems to be happening is that helperStatefulHandleRead is being called and taking several seconds to complete; if this happens frequently enough, incoming connections queue up and are significantly delayed. See the log below:

2014/05/08 09:01:53.489 kid1| comm.cc(167) comm_read: comm_read, queueing read for local=[::] remote=[::] FD 66 flags=1; asynCall 0x7f9c174c4c00*1
2014/05/08 09:01:53.489 kid1| ModEpoll.cc(139) SetSelect: FD 66, type=1, handler=1, client_data=0x7f9bf4425328, timeout=0
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(53) fireNext: leaving helperHandleRead(local=[::] remote=[::] FD 66 flags=1, data=0x7f9c0a1dad78, size=120, buf=0x7f9c09f4d8a0)
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(51) fireNext: entering helperStatefulHandleRead(local=[::] remote=[::] FD 53 flags=1, data=0x7f9c08ae5928, size=281, buf=0x7f9bfa3e11d0)
2014/05/08 09:01:53.489 kid1| AsyncCall.cc(30) make: make call helperStatefulHandleRead [call44614375]
2014/05/08 09:01:58.329 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall helperDispatchWriteDone constructed, this=0x7f9c1a39bd40 [call44614625]
2014/05/08 09:01:58.329 kid1| Write.cc(29) Write: local=[::] remote=[::] FD 68 flags=1: sz 188: asynCall 0x7f9c1a39bd40*1
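
In case it's useful to anyone reproducing this analysis, here's the sort of sketch I've been using to find these stalls: it scans cache.log for long gaps between consecutive debug lines, which is how the 09:01:53.489 -> 09:01:58.329 stall shows up. It assumes the timestamp format shown in the log ("YYYY/MM/DD HH:MM:SS.mmm"); the 1-second threshold is arbitrary.

```python
# Find multi-second gaps between consecutive cache.log entries.
import re
from datetime import datetime

TS = re.compile(r'^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})')

def find_stalls(lines, threshold=1.0):
    """Yield (gap_seconds, line) for each line that follows a long gap."""
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip continuation lines with no timestamp
        ts = datetime.strptime(m.group(1), '%Y/%m/%d %H:%M:%S.%f')
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap >= threshold:
                yield gap, line
        prev = ts

# Two lines from the excerpt above, showing the ~4.8 s stall:
log = [
    "2014/05/08 09:01:53.489 kid1| AsyncCall.cc(30) make: make call helperStatefulHandleRead [call44614375]",
    "2014/05/08 09:01:58.329 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall helperDispatchWriteDone constructed",
]
for gap, line in find_stalls(log):
    print("%.3fs stall before: %s" % (gap, line[:60]))
```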

FDs 66 and 68 are connections to external ACL helpers, but the network traffic shows that the external ACLs are answering immediately, so as far as I can tell this delay isn't caused by the helpers themselves. (FD 66 received a query at 09:01:53.480992 and replied at 09:01:53.484808; FD 68 received a query at 09:01:58.334608 and replied at 09:01:58.334661).

I *think* FD 53 might be a connection to the ICAP service.

I have noticed that squid often seems to use a lot of CPU time when this problem is occurring.

Unfortunately I don't know where to go with the debugging now. The current level of debug logging produces a lot of data but isn't detailed enough for me to work out what's going on, and since I can't reproduce the problem in a test environment, my only option is to leave debug logging turned on on a production server.
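
To keep the log volume down, I'm considering restricting the verbose logging to just the suspect subsystems with debug_options. The section numbers below are my reading of doc/debug-sections.txt in the Squid source, so please correct me if I've got them wrong:

```
# squid.conf - verbose logging only for the suspect subsystems
#   84: helper process maintenance (stateful helpers)
#   82: external ACL
#   93: ICAP client
#    5: socket functions
debug_options ALL,1 84,9 82,9 93,9 5,5
```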

Any suggestions / help from people more familiar with the Squid internals would certainly be helpful.

--
 - Steve Hill
   Technical Director
   Opendium Limited     http://www.opendium.com

Direct contacts:
   Instant messenger: xmpp:st...@opendium.com
   Email:            st...@opendium.com
   Phone:            sip:st...@opendium.com

Sales / enquiries contacts:
   Email:            sa...@opendium.com
   Phone:            +44-844-9791439 / sip:sa...@opendium.com

Support contacts:
   Email:            supp...@opendium.com
   Phone:            +44-844-4844916 / sip:supp...@opendium.com
