Sockets stuck in CLOSED state...

2008-06-18 Thread Ali Niknam

Dear All,

Recently I've been upgrading some of my machines from FreeBSD 6.x amd64
to FreeBSD 7.0 amd64.


After upgrading I noticed a weird error/bug: after several thousand TCP
connections, some appear to hang in the 'CLOSED' state.


netstat -n gives:
...
tcp4  0   0  1.2.3.4.*  4.5.6.7.42149   CLOSED
tcp4  39  0  1.2.3.4.*  4.5.6.7.54103   CLOSED
tcp4  35  0  1.2.3.4.*  4.5.6.7.41718   CLOSED
tcp4  38  0  1.2.3.4.*  4.5.6.7.55618   CLOSED
tcp4  41  0  1.2.3.4.*  4.5.6.7.44230   CLOSED
tcp4  39  0  1.2.3.4.*  4.5.6.7.49439   CLOSED
...

These never go away; their number gradually increases until the
application starts giving errors (probably because some socket or
file descriptor limit is reached). When the application is killed, these
entries disappear.
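
One way to check the descriptor-limit guess from inside the process is to count the descriptors that are currently open and compare that number with the soft limit. This is a minimal sketch in C, not code from the application; the helper name count_open_fds and the 65536 scan cap are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: count descriptors currently open in this process
 * by probing each possible descriptor number with fcntl(F_GETFD).
 * Returns the count, or -1 if the limit cannot be read. */
int count_open_fds(void)
{
    struct rlimit rl;
    long max;
    long fd;
    int count = 0;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* Cap the scan (assumed value) so a very large or unlimited soft
     * limit does not turn this into an extremely long loop. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 65536)
        max = 65536;
    else
        max = (long)rl.rlim_cur;

    for (fd = 0; fd < max; fd++) {
        /* fcntl fails with EBADF when fd is not an open descriptor. */
        if (fcntl((int)fd, F_GETFD) != -1)
            count++;
    }
    return count;
}
```

If the count keeps climbing alongside the CLOSED entries in netstat, the process is probably still holding those descriptors, which would fit the entries disappearing when the application is killed.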


The application in question is a self-written, multithreaded DNS server
that has been running fine for years without any trouble on FreeBSD 5.x
as well as 6.x, and on 6.x in both 32-bit and 64-bit builds.


Of course that doesn't mean that the application is error free; however,
after extensive testing I really cannot find anything wrong with the
application itself, so I'm thinking maybe there's a change somewhere
that causes this? I know that the TCP/network code has been completely
redone...


What basically happens in the application is this:
 - one main TCP thread runs an infinite while loop waiting for new
   connections to arrive
 - as soon as one arrives, a new thread is spawned that handles the
   newly created stream
 - it reads some bytes, writes some bytes, then closes it
 - the thread exits
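
The flow in the list above can be sketched roughly as follows. This is an illustrative C sketch, not the application's code: the names handle_connection, spawn_handler and accept_loop are hypothetical, the echo stands in for the real DNS handling, and the use of detached threads is an assumption (the post does not say whether the threads are detached or joined).

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-connection worker: reads some bytes, writes some bytes (here it
 * simply echoes them back), closes the stream, then the thread ends. */
void *handle_connection(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0) {
        ssize_t w = write(fd, buf, (size_t)n);
        (void)w;                 /* sketch only: write errors are ignored */
    }
    close(fd);                   /* every exit path must reach this close */
    return NULL;                 /* same effect as calling pthread_exit() */
}

/* Hand an accepted descriptor to a new detached thread. Returns 0 on
 * success. If the thread cannot be created, the descriptor is closed
 * here so that it cannot leak. */
int spawn_handler(int fd)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, handle_connection,
                        (void *)(intptr_t)fd);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        close(fd);
    return rc;
}

/* Main TCP thread: loop forever waiting for new connections. */
void accept_loop(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;        /* transient, keep accepting */
            return;              /* give up on other errors */
        }
        spawn_handler(fd);
    }
}
```

In a design like this, a descriptor can only stay open if some path skips close(), for example a failed pthread_create() that is not handled, so those paths are the first things worth auditing.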

What appears to happen is this: after the new thread is spawned it tries
to read 2 bytes (the DNS TCP length information). It gets back 0 bytes
(EOF) and therefore closes the socket and calls pthread_exit. However, in
netstat that same stream often appears to have bytes 'stuck' in the
receive queue...
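
For reference, DNS over TCP prefixes each message with a two-byte length field in network byte order. The sketch below shows one way to read that prefix while telling apart a clean EOF, a truncated prefix and a read error. The function names read_full and read_dns_length are hypothetical and this is not the application's actual code.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Read up to len bytes, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than len only if EOF was
 * reached), or -1 on a read error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);

        if (n == 0)
            break;                      /* EOF: peer closed its side */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Read the two-byte DNS-over-TCP length prefix.
 * Returns 1 and stores the length on success, 0 on EOF before any
 * byte arrived, and -1 on a read error or a truncated prefix. */
int read_dns_length(int fd, uint16_t *out_len)
{
    unsigned char hdr[2];
    ssize_t n = read_full(fd, hdr, sizeof hdr);

    if (n == 0)
        return 0;
    if (n != 2)
        return -1;
    *out_len = (uint16_t)((hdr[0] << 8) | hdr[1]);  /* big-endian */
    return 1;
}
```

Whatever the return value, the caller still has to close the descriptor; the three cases only differ in what is worth logging.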


I really can't see how this can cause hanging sockets in the 'CLOSED'
state. Even if the incoming queue isn't read entirely, a call to close
should close it. Also, I really can't find any documentation, in netstat
or elsewhere, about the 'CLOSED' state...



Any help would be greatly appreciated!


Kind Regards,


Ali Niknam
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Sockets stuck in CLOSED state...

2008-06-18 Thread Wojciech Puchar

> ...
> tcp4  0   0  1.2.3.4.*  4.5.6.7.42149   CLOSED
> tcp4  39  0  1.2.3.4.*  4.5.6.7.54103   CLOSED
> tcp4  35  0  1.2.3.4.*  4.5.6.7.41718   CLOSED
> tcp4  38  0  1.2.3.4.*  4.5.6.7.55618   CLOSED
> tcp4  41  0  1.2.3.4.*  4.5.6.7.44230   CLOSED
> tcp4  39  0  1.2.3.4.*  4.5.6.7.49439   CLOSED
> ...
>
> These never go away; they gradually increase and increase until the
> application starts giving errors (probably because some socket or
> filedescriptor limit is reached). When the application is killed these
> entries disappear.
>
> The application in question is a self written DNS server, multithreaded, and
> running fine for years without any troubles on both BSD 5.x as well as 6.x.
> Also 32bits as well as 64bits on 6.x.


Do a stupid thing: in your source, add

#define socket TEST_SOCKET
#define connect TEST_CONNECT
#define bind TEST_BIND
#define listen TEST_LISTEN
/* ...and the same for every other network function you use */


Then write one .c file in which all these TEST_* functions are defined,
each doing the same as the original PLUS logging to a file.
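
A minimal sketch of what such a wrapper file might look like, in C. Only the name TEST_SOCKET comes from the #define lines above; TEST_CLOSE, test_log_open and the log line format are hypothetical additions for illustration. This file must be compiled without the #define redirects, otherwise the wrappers would call themselves.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shared log stream. stdio locks the stream for each call, so lines
 * written by different threads do not interleave. */
static FILE *test_log;

/* Hypothetical helper: open the log file for appending.
 * Returns 0 on success, -1 on failure. */
int test_log_open(const char *path)
{
    test_log = fopen(path, "a");
    return test_log ? 0 : -1;
}

/* Same as socket(), plus one log line per call. */
int TEST_SOCKET(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);
    int saved = errno;           /* logging must not clobber errno */

    if (test_log) {
        fprintf(test_log, "socket -> fd=%d errno=%d\n",
                fd, fd < 0 ? saved : 0);
        fflush(test_log);
    }
    errno = saved;
    return fd;
}

/* Same as close(), plus one log line per call (hypothetical name). */
int TEST_CLOSE(int fd)
{
    int rc = close(fd);
    int saved = errno;

    if (test_log) {
        fprintf(test_log, "close fd=%d -> rc=%d errno=%d\n",
                fd, rc, rc < 0 ? saved : 0);
        fflush(test_log);
    }
    errno = saved;
    return rc;
}
```

A descriptor that shows up in an accept or socket line but never in a close line is a candidate for one of the stuck entries.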


After a while (once you see these closed/unclosed connections), stop it
and look at the logs.



I'm almost sure you will see where the problem is.

Possibly the threads implementation changed...


Re: Sockets stuck in CLOSED state...

2008-06-18 Thread Ali Niknam

Wojciech Puchar wrote:

> #define socket TEST_SOCKET
> ...
> and write one .c program where all these TEST_* functions are defined,
> doing the same as  original PLUS logging to file.
>
> after a while (when you see this closed/unclosed connections) stop it
> and look at logs.


Thank you for the suggestions. I had considered that myself; however, the
server handles about 300 DNS queries per second, so that's not easy to
log. And even if it is logged, there is so much information that it's
nearly impossible to comprehend.
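
One way around the log volume, offered only as a sketch, is to record when each descriptor was opened in a small table instead of logging every call, and to report only the entries that stay open for too long. All names here (track_open, track_close, track_report) and the TRACK_MAX bound are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define TRACK_MAX 4096           /* assumed upper bound on fd numbers */

static time_t opened_at[TRACK_MAX];          /* 0 means "not open" */
static pthread_mutex_t track_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record that fd was opened (for example, returned by accept) at
 * time now. now must be nonzero, as real time() values are. */
void track_open(int fd, time_t now)
{
    if (fd < 0 || fd >= TRACK_MAX)
        return;
    pthread_mutex_lock(&track_lock);
    opened_at[fd] = now;
    pthread_mutex_unlock(&track_lock);
}

/* Forget fd. Call this just BEFORE close(fd): once close() returns,
 * the kernel may hand the same number to another thread. */
void track_close(int fd)
{
    if (fd < 0 || fd >= TRACK_MAX)
        return;
    pthread_mutex_lock(&track_lock);
    opened_at[fd] = 0;
    pthread_mutex_unlock(&track_lock);
}

/* Print every descriptor open for longer than max_age seconds.
 * Returns the number of such descriptors. */
int track_report(FILE *out, time_t now, time_t max_age)
{
    int fd;
    int stale = 0;

    pthread_mutex_lock(&track_lock);
    for (fd = 0; fd < TRACK_MAX; fd++) {
        if (opened_at[fd] != 0 && now - opened_at[fd] > max_age) {
            fprintf(out, "fd %d open for %ld s\n",
                    fd, (long)(now - opened_at[fd]));
            stale++;
        }
    }
    pthread_mutex_unlock(&track_lock);
    return stale;
}
```

Calling track_report once a minute with a generous max_age would produce output only for the handful of connections that get stuck, rather than for every query.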


The thing is that the problem does not always occur; the same IP can
connect and do queries thousands of times before one connection gets
stuck.


To give you an idea: after about 24 hours (so that's about 26 million 
queries) I get about 10 stuck connections.


> i'm almost sure you will notice where is a problem.
> possibly threads implementation changed...


I can imagine; still, as far as I know, it should not be possible to be 
stuck in CLOSED...
