Hi all (again ;) if I talk too much, tell me and I will stop),

I am still investigating this problem. It seems that varnish really is keeping ESTABLISHED connections to the backends open for a very, very long time:

    cache1b# netstat -apnt | grep ESTABLISHED | awk '{print $5}' | cut -f 1 -d ':' | sort | uniq -c | sort -g
          6 client1
          8 client2
         10 client3
        > total: 24 open connections
         43 backend1
         50 backend2
         74 backend3
        > total: 167 open connections !!!

The strange thing in this situation is that, on the BACKEND side, the number of ESTABLISHED connections is quite low:

    for i in be1b be2b be3b ; do ssh $i netstat -apnt | grep :30000 | grep ESTABLISHED ; done | wc -l
    20

Maybe the problem is in the BACKEND REUSE code? Maybe it is in the PROBE code?

Or maybe there is no real problem on the varnish side. I have another idea about this, which comes from the fact that the backends are behind an ipvs load-balancer (yes, our config is quite complex...). This ipvs load-balancer is in NAT mode, so there is a NAT (and therefore a connection tracking list) somewhere between varnish and the backends. Maybe the connection between varnish and its backend uses HTTP keepalive, so the TCP channel is not closed at the end of a request, and maybe it is only closed some time AFTER the NAT connection tracking timeout has expired. In that case, varnish never receives the TCP connection closing packet, and thus keeps the connection open until ... it fills up its connection stack.

There are so many scenarios that I don't think I will be able to test all of them before my client (the user of this big cluster) kicks varnish off ;) but I will try them in order to find a solution.

To be continued...

Regards,
B.
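
P.S. In case it helps anyone compare the varnish side with the backend side, here is a minimal Python sketch of the same per-peer count that the netstat pipeline above produces. It assumes the Linux `netstat -ant` column layout (foreign address in the fifth column, state in the sixth). The function name `count_established` and the addresses in `SAMPLE` are made up for illustration; they are not taken from the real cluster.

```python
from collections import Counter


def count_established(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per remote host.

    Assumes Linux `netstat -ant` style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, short lines, non-TCP sockets and other TCP states.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The foreign address is "host:port"; split on the last colon only.
        host = fields[4].rsplit(":", 1)[0]
        counts[host] += 1
    return counts


# Hypothetical sample input, not real output from the cluster.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.1:80             192.0.2.10:51000        ESTABLISHED 123/varnishd
tcp        0      0 10.0.0.1:41000          10.0.1.5:30000          ESTABLISHED 123/varnishd
tcp        0      0 10.0.0.1:41001          10.0.1.5:30000          ESTABLISHED 123/varnishd
tcp        0      0 10.0.0.1:41002          10.0.1.5:30000          TIME_WAIT   -
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      123/varnishd
"""

if __name__ == "__main__":
    for host, n in sorted(count_established(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{n:7d} {host}")
```

Running the same function on the output collected from varnish and from each backend would show which peers account for the gap between the two counts.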