If it's ESTABLISHED, can't you get lsof to tell you what other process has a port 46502 TCP socket open?
e.g.

  # lsof -i :80    # but use 46502 :)
  COMMAND   PID  USER     FD  TYPE DEVICE     SIZE/OFF NODE NAME
  curl      2619 robstar  3u  IPv4 3188168535 0t0      TCP  foo.bar:51400->baz.biff:http (ESTABLISHED)
  lighttpd  3217 www-data 4u  IPv4 9500       0t0      TCP  *:http (LISTEN)
  lighttpd  3217 www-data 5u  IPv6 9501       0t0      TCP  *:http (LISTEN)

--Rob*

On Thu, Nov 28, 2013 at 07:00:25PM +0100, Pieter Hintjens wrote:
> Lacking more information, it looked like normal TCP behavior. Since
> you're on Linux and the socket is ESTABLISHED, it's something else.
>
> Can you get a small test case that reproduces this reliably?
>
> On Thu, Nov 28, 2013 at 6:05 PM, Andrew Hume <[email protected]> wrote:
> > fedora 18
> > but netstat didn’t say WAIT, it said ESTABLISHED
> >
> > tcp 0 0 135..249:46502 135..249:46502 ESTABLISHED
> >
> > (eliding identical interior octets for privacy).
> >
> > i’m trying to understand your argument; are you saying this is a TCP hiccup?
> > it doesn’t smell like that; especially as it never used to happen and now
> > it happens quite often.
> >
> > On Nov 28, 2013, at 7:11 AM, Pieter Hintjens <[email protected]> wrote:
> >
> > What OS are you using?
> >
> > I've seen this symptom before, where a server cannot re-bind to a TCP
> > socket when there is an old client connection still connected to the
> > defunct socket. If you run netstat -a you'll see the socket in a wait
> > state, forever. When the client disconnects and restarts, it all works
> > again.
> >
> > The problem is not solvable afaik at the lower levels. The new server
> > cannot force the socket out of a wait state (SO_REUSEADDR does
> > nothing), and the client does not (afaik) get an error on the socket.
> >
> > One solution is to detect the error using heartbeats, and then
> > explicitly close the socket at the client side, which frees the
> > server-side port for new connections.
> >
> > I do not recall seeing the problem on Linux, only on AIX and Windows,
> > which is why I wonder what OS you're using.
> >
> > It would be nice to add the heartbeating into ZMTP and libzmq if we
> > had budget to do that (and if this is in fact the problem).
> >
> > -Pieter
> >
> > On Thu, Nov 28, 2013 at 3:42 PM, Andrew Hume <[email protected]>
> > wrote:
> >
> > a few months ago, i moved to czmq 3.2.3 and i’ve been quite happy except for
> > one issue.
> > i notice this rarely, so i’ve let it sit but now its become a nuisance.
> >
> > i have a stats_server which binds a PULL on port 46502.
> > i have a hist_server which connects a PUSH to port 46502.
> > ordinarily, the stats_server stays up for ever, while every now and then,
> > we restart the hist_server process. so far, so good. and like always,
> > we can start these servers in either order and it all works.
> >
> > what happens when we forget to restart the stats_server?
> > the hist_server runs happily, sending stats over the channel on 46502.
> > after (hours, days), we finally observe the stats_server is not up and we
> > start it. it now fails because port 46502 “is in use”.
> >
> > this seems to be a bug to me.

_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
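
For anyone who wants to try the heartbeat idea quoted above (detect a dead peer, then explicitly close the socket on the client side), here is a minimal sketch in Python over a plain TCP socket. It is only an illustration under stated assumptions: the PING/PONG line protocol and the function name peer_alive are made up for this example and are not part of ZMTP, libzmq, or czmq, and a PUSH/PULL pair would need a separate return channel for the reply.

```python
import socket


def peer_alive(sock: socket.socket, timeout: float = 1.0) -> bool:
    """Send one PING line and wait for a PONG line (hypothetical protocol).

    Returns True if the peer answered. Otherwise the socket is closed,
    so the client stops holding its end of the connection.
    """
    reply = b""
    try:
        sock.settimeout(timeout)  # applies to each send/recv call below
        sock.sendall(b"PING\n")
        # Read until a full line arrives, the peer closes, or recv times out.
        while not reply.endswith(b"\n"):
            chunk = sock.recv(16)
            if not chunk:
                break
            reply += chunk
    except OSError:
        # Covers timeouts, resets, and other socket errors.
        reply = b""
    if reply.startswith(b"PONG"):
        return True
    sock.close()  # explicit client-side close, as suggested in the thread
    return False
```

A client could call peer_alive(sock) periodically and reconnect whenever it returns False. Whether that would actually cure the ESTABLISHED case reported here is untested; running lsof as shown above is still the way to see which process holds port 46502.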
