Hi all,

I've got an issue with ZMQ_ROUTER sockets that I'm having a hard time working around. I'd love some advice, though I suspect the answer is that what I want to do isn't possible.
Say I have a ROUTER socket listening on a port, with peers connecting and disconnecting randomly over TCP. For all intents and purposes, these peers have random identities. Most of the time a peer disconnects "cleanly": the TCP connection is terminated by a FIN or RST packet, and ZMQ cleans up the file descriptor. Sometimes, however, a peer dies silently, for example because of a network or power outage. In those cases the ROUTER socket keeps the file descriptor around forever.

I know the peer is dead because all my peers heartbeat to each other, and its heartbeats have stopped. I thought that sending data to a dead peer would tear down the connection, since the underlying TCP socket would eventually start returning errors. It doesn't, so ZMQ must be dropping my message before it ever reaches the underlying socket. TCP keep-alives don't work all that well at raising errors on a dead connection either.

The socket monitor tells me that someone has connected to the ROUTER socket on its bound port with a specific file descriptor, but so many of these connections come in that I can't associate a particular file descriptor with a particular peer. So from my heartbeats I know, on the application side, that peer XYZ is dead, and I'd like to tell the ROUTER socket to close its underlying file descriptor. From the monitor I know that I have a bunch of file descriptors open, but I can't map them to peers. If I could, I'd just call os.close() on that file descriptor and hope ZMQ handles it gracefully.

After a few hours of uptime, my process hits the OS file descriptor limit and stops accepting new connections at the ZeroMQ level. I can have the process quit when it detects this, but that forces all the functioning peers to reconnect and redo some work, so I'd like to avoid it.
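The bookkeeping described above can be sketched in Python. Heartbeats tell you which identity is dead, but nothing tells you which file descriptor belongs to it. The sketch assumes a hypothetical monitor event that reports each peer's identity together with its file descriptor on connect and disconnect. libzmq 4.1.2 has no such event. `PeerRegistry`, its methods, and `HEARTBEAT_TIMEOUT` are illustrative names, not part of libzmq or pyzmq. The sketch only returns the (identity, fd) pairs of silent peers. Whether closing such a descriptor behind ZMQ's back is safe is exactly the open question here, so the sketch deliberately does not call `os.close()`.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical heartbeat timeout in seconds; tune to the real heartbeat interval.
HEARTBEAT_TIMEOUT = 10.0


@dataclass
class Peer:
    fd: int           # file descriptor reported by the (hypothetical) connect event
    last_seen: float  # monotonic time of the last heartbeat from this peer


class PeerRegistry:
    """Map ROUTER peer identities to socket fds and last-heartbeat times.

    Assumes a hypothetical monitor event delivering (identity, fd) pairs;
    libzmq 4.1.2 provides no such event.
    """

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT) -> None:
        self.timeout = timeout
        self._peers: Dict[bytes, Peer] = {}

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def on_connect(self, identity: bytes, fd: int, now: Optional[float] = None) -> None:
        # A fresh connection counts as the peer's first heartbeat.
        self._peers[identity] = Peer(fd, self._now(now))

    def on_heartbeat(self, identity: bytes, now: Optional[float] = None) -> None:
        # Heartbeats from identities we never saw connect are ignored.
        peer = self._peers.get(identity)
        if peer is not None:
            peer.last_seen = self._now(now)

    def on_disconnect(self, identity: bytes) -> None:
        # Clean FIN/RST teardown: ZMQ has already released the fd.
        self._peers.pop(identity, None)

    def reap_dead(self, now: Optional[float] = None) -> List[Tuple[bytes, int]]:
        """Remove and return (identity, fd) for peers silent longer than the timeout."""
        t = self._now(now)
        dead = [(ident, p.fd) for ident, p in self._peers.items()
                if t - p.last_seen > self.timeout]
        for ident, _fd in dead:
            del self._peers[ident]
        return dead
```

Because every method accepts an explicit `now`, the timeout logic can be checked without real sockets or waiting on the clock.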
I've scanned the previous discussions on this, and exposing it somehow has been mentioned, but I don't see anything along these lines in the latest API (I'm looking at the 4.1.2 release). Any suggestions on how I could work around this?

I'm thinking of extending the socket monitor with a new event type, something like ZMQ_PEER_CONNECT/ZMQ_PEER_DISCONNECT, that passes back the peer ID and file descriptor. I haven't gone through the ZMQ code enough yet to know how much work this would be, though.

Thanks in advance,
-- 
Marcin
_______________________________________________
zeromq-dev mailing list
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
