lighttpd failing to accept new connections ( connection reset )

2008-08-28 Thread Steven Hartland

We're using lighttpd here for a new project and we're having issues
whereby it simply stops processing after 1-2 days.

Having looked at it in some detail this morning it seems that
the kernel is resetting the connection without notifying the
lighttpd process there is a new connection attempt. I assume
that the listen queue is full but why kevent is not notifying
lighttpd that there are outstanding events is beyond me.


The following is a truss of the process which is currently in
this state:-
kevent(6,0x0,0,{},11096,{1.0})   = 0 (0x0)
gettimeofday({1219920575.149428},0x0)= 0 (0x0)
kevent(6,0x0,0,{},11096,{1.0})   = 0 (0x0)
gettimeofday({1219920576.150443},0x0)= 0 (0x0)

ktrace of the operation as well:-
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)
28363 lighttpd GIO   fd 6 wrote 0 bytes
  
28363 lighttpd GIO   fd 6 read 0 bytes
  
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)
28363 lighttpd GIO   fd 6 wrote 0 bytes
  
28363 lighttpd GIO   fd 6 read 0 bytes
  
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)
28363 lighttpd GIO   fd 6 wrote 0 bytes
  
28363 lighttpd GIO   fd 6 read 0 bytes
  
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)
28363 lighttpd GIO   fd 6 wrote 0 bytes
  
28363 lighttpd GIO   fd 6 read 0 bytes
  
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)
28363 lighttpd GIO   fd 6 wrote 0 bytes
  
28363 lighttpd GIO   fd 6 read 0 bytes
  
28363 lighttpd RET   kevent 0
28363 lighttpd CALL  gettimeofday(0x7fffeb20,0)
28363 lighttpd RET   gettimeofday 0
28363 lighttpd CALL  kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffeb20)


tcpdump shows:-
12:10:29.475255 IP (tos 0x10, ttl  64, id 9536, offset 0, flags [DF], proto: TCP (6), length: 64) client.61224 > server.80: S, cksum 0x6d22 (incorrect (-> 0xedfa), 291994449:291994449(0) win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 3661727139 0,sackOK,eol>
12:10:29.481396 IP (tos 0x0, ttl  61, id 25503, offset 0, flags [DF], proto: TCP (6), length: 60) server.80 > client.61224: S, cksum 0xbf22 (correct), 3444532576:3444532576(0) ack 291994450 win 65535 <mss 1460,nop,wscale 9,sackOK,timestamp 3136311843 3661727139>
12:10:29.481419 IP (tos 0x10, ttl  64, id 9538, offset 0, flags [DF], proto: TCP (6), length: 52) client.61224 > server.80: ., cksum 0x6d16 (incorrect (-> 0x6bd2), 1:1(0) ack 1 win 33304 <nop,nop,timestamp 3661727145 3136311843>
12:10:29.487519 IP (tos 0x10, ttl  61, id 25504, offset 0, flags [DF], proto: TCP (6), length: 40) server.80 > client.61224: R, cksum 0x20c7 (correct), 3444532577:3444532577(0) win 0


This may have been raised before, back in 2003, as bug kern/57380
but it was closed after no response from the reporter.

Another possibly related issue is:-
http://trac.lighttpd.net/trac/ticket/1734


I've currently got one of the production machines offline
with this error ( hence the important flag ) in the hope
that someone can suggest a test which will shed more light
on the issue before I restart it.

   Regards
   Steve 





This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to [EMAIL PROTECTED]

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: lighttpd failing to accept new connections ( connection reset )

2008-08-28 Thread Jeremy Chadwick
On Thu, Aug 28, 2008 at 01:13:57PM +0100, Steven Hartland wrote:
> We're using lighttpd here for a new project and we're having issues
> whereby it simply stops processing after 1-2 days.
>
> Having looked at it in some detail this morning it seems that
> the kernel is resetting the connection without notifying the
> lighttpd process there is a new connection attempt. I assume
> that the listen queue is full but why kevent is not notifying
> lighttpd that there are outstanding events is beyond me.



Can you change the polling method in lighttpd to use poll or select
instead of kqueue?  This would help in determining if the problem is
with the daemon itself or the kevent system.
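
The cross-check suggested here can be illustrated outside lighttpd. The sketch below is not lighttpd code, and the function name `listen_socket_ready` is made up for illustration. It asks poll(2) whether a descriptor is readable, which for a listening socket means a completed connection is waiting to be accepted. In lighttpd itself the switch is a configuration change (the `server.event-handler` option) rather than a code change.

```c
#include <poll.h>

/*
 * Hypothetical helper: ask poll(2) whether fd is readable.
 * For a listening socket, "readable" means accept() would not block.
 *
 * Returns 1 if readable, 0 if the timeout expired with nothing pending,
 * and -1 on a poll() error or if only an error condition was reported.
 */
int
listen_socket_ready(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return (-1);	/* poll() itself failed, see errno */
	if (n == 0)
		return (0);	/* timed out, nothing pending */

	/* POLLERR, POLLHUP or POLLNVAL without POLLIN counts as an error. */
	return ((pfd.revents & POLLIN) ? 1 : -1);
}
```

If poll() reports the listen socket as ready while kevent() on the same socket keeps returning 0, that would point towards kqueue rather than the daemon.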

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: lighttpd failing to accept new connections ( connection reset ) / possible kqueue bug

2008-08-28 Thread Steven Hartland


- Original Message - 
From: Jeremy Chadwick [EMAIL PROTECTED]


> Can you change the polling method in lighttpd to use poll or select
> instead of kqueue?  This would help in determining if the problem is
> with the daemon itself or the kevent system.


Yep already scheduled that change for our London node tomorrow
morning. ATM we are seeing this issue every 1 - 2 days so it may
take a little while before I can answer that question.

I've had a look through the source and I can't see any reason why
kevent would suddenly stop notifying the app that new connections
are present. Event registration appears to only be done once on
app startup and similarly unregisters are only done on shutdown,
so my current thinking is there may be a problem with kqueue
itself.

I don't suppose you're aware of any way to query the status of this
in the kernel or app given I have a node in this hung state?

   Regards
   Steve




Re: lighttpd failing to accept new connections ( connection reset )

2008-08-28 Thread Robert Watson


On Thu, 28 Aug 2008, Steven Hartland wrote:

> We're using lighttpd here for a new project and we're having issues whereby
> it simply stops processing after 1-2 days.
>
> Having looked at it in some detail this morning it seems that the kernel is
> resetting the connection without notifying the lighttpd process there is a
> new connection attempt. I assume that the listen queue is full but why
> kevent is not notifying lighttpd that there are outstanding events is beyond
> me.


The connections getting reset without application notification is a classic 
symptom of a full listen queue.  A couple of questions:


(1) What FreeBSD version?

(2) Are you using accept filters?

(3) If possible, are you able to instrument lighttpd so that you can trigger
    it to query SO_LISTENQLIMIT, SO_LISTENQLEN, and SO_LISTENINCQLEN on the
    listen socket once things have gone wrong?  These respectively (and perhaps
    obviously) query the current administrative limit on queue depth, the
    current queue depth of completed connections, and the current queue depth
    of incomplete connections.  The last of these will only be used with
    accept filters on recent FreeBSD network stacks (since the syncache was
    added).

Hopefully doing (3) will allow us to try to determine whether it's indeed the 
case that somehow the listen queue or event handling has gotten wedged in 
some way.
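
The instrumentation in (3) could look roughly like the sketch below. It is only an illustration: the helper name `dump_listen_queue` is hypothetical, how and when lighttpd would call it (for example from a signal-triggered debug path) is left open, and the three socket options are FreeBSD-specific, so the code is guarded to still compile elsewhere.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: print the listen queue state of listen_fd to out.
 * Returns the number of options successfully read (0 to 3).
 */
int
dump_listen_queue(int listen_fd, FILE *out)
{
	int ok = 0;

#if defined(SO_LISTENQLIMIT) && defined(SO_LISTENQLEN) && \
    defined(SO_LISTENINCQLEN)
	static const struct {
		int		 opt;
		const char	*name;
	} opts[] = {
		{ SO_LISTENQLIMIT,  "SO_LISTENQLIMIT"  },	/* backlog limit */
		{ SO_LISTENQLEN,    "SO_LISTENQLEN"    },	/* completed queue */
		{ SO_LISTENINCQLEN, "SO_LISTENINCQLEN" },	/* incomplete queue */
	};
	size_t i;

	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
		int value = 0;
		socklen_t len = sizeof(value);

		if (getsockopt(listen_fd, SOL_SOCKET, opts[i].opt,
		    &value, &len) == 0) {
			fprintf(out, "%s = %d\n", opts[i].name, value);
			ok++;
		} else {
			fprintf(out, "%s: %s\n", opts[i].name,
			    strerror(errno));
		}
	}
#else
	/* These options are not defined on this platform. */
	(void)listen_fd;
	fprintf(out, "listen queue socket options not available here\n");
#endif
	return (ok);
}
```

A completed-queue length at or above the limit while the process sits idle in kevent() would be consistent with the full listen queue theory.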


In terms of analyzing the state of the machine -- if you have a kernel.debug 
around and are willing to do a bit of digging, the best thing to do would be 
to track down the listen socket and directly inspect it using kgdb to dump its 
field contents.  This can be done on a live box by attaching kgdb to kernel 
memory using /dev/mem as the target device.  You can find the kernel memory 
address of the listen socket by tracking it down in fstat -- a typical entry 
might look like this:


  root inetd   11589* internet stream tcp c535

So you can do a print *(struct socket *)0xc535 to print out the socket 
structure once attached to /dev/mem.  If you need more pointers on how to do 
this, send me a private e-mail and I can walk you through it in detail.


Robert N M Watson
Computer Laboratory
University of Cambridge





Re: lighttpd failing to accept new connections ( connection reset )

2008-08-28 Thread Steven Hartland


- Original Message - 
From: Robert Watson [EMAIL PROTECTED]
> The connections getting reset without application notification is a classic
> symptom of a full listen queue.  A couple of questions:


Yep, that's what I thought.


> (1) What FreeBSD version?


7.0-RELEASE-p2 (amd64)


> (2) Are you using accept filters?


The module's there but not loaded, so no.


> (3) If possible, are you able to instrument lighttpd so that you can trigger
>     it to query SO_LISTENQLIMIT, SO_LISTENQLEN, and SO_LISTENINCQLEN on the
>     listen socket once things have gone wrong?  These respectively (and perhaps
>     obviously) query the current administrative limit on queue depth, the
>     current queue depth of completed connections, and the current queue depth
>     of incomplete connections.  The last of these will only be used with
>     accept filters on recent FreeBSD network stacks (since the syncache was
>     added).
>
> Hopefully doing (3) will allow us to try to determine whether it's indeed the
> case that somehow the listen queue or event handling has gotten wedged in
> some way.


This should be possible, I'll have a look, assuming the kgdb stuff doesn't
turn up the required results.

   Regards
   Steve


