lsof lists the files (including sockets) that running processes currently hold open. Every open file consumes some system resources, so a steadily growing count would certainly add to processor and/or memory utilization. I would check whether the processor or memory is constrained while the stress test runs. If either is at 100% utilization, new network connections may fail to be established, which would be consistent with the "hanging" behavior you observe.
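If it helps, here is a minimal sketch of one way to watch the open-file count from inside a process while the stress test runs. It is in Python and Linux only, since it reads /proc. The helper names count_open_fds and fd_headroom are made up for illustration. Comparing the count against the per-process descriptor limit (RLIMIT_NOFILE) is my assumption about a useful threshold, not something your report confirms.

```python
import os
import resource
import socket


def count_open_fds(pid="self"):
    """Count a process's open descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Return (open_count, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft


if __name__ == "__main__":
    before, limit = fd_headroom()
    # Simulate sockets that are opened but never closed.
    leaked = [socket.socket() for _ in range(5)]
    after, _ = fd_headroom()
    print(f"open fds: {before} -> {after} (soft limit {limit})")
    for s in leaked:
        s.close()
```

Logging this number periodically in the parent server process should show whether the count keeps climbing toward the limit in the minutes before the hang.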
In theory, the open files could accumulate to a critical number after 15-20 minutes, which is when your stress test hangs. It seems like you have almost solved the problem.

On 3/8/15, Jithendra Reddy <[email protected]> wrote:
> Hi,
>
> We have implemented a REQ-REP socket communication. In brief the
> application does the following:
> 1. Client asks for a free tcp port through REQ socket
> 2. Server listens at REP socket, looks for a free tcp port. Forks a child
> process and runs ZeroMQ REP socket listening at the free port. Parent
> process sends back this free port detail to client
> 3. Client then starts communicating to child process at the received port
> using REQ-REP ZeroMQ socket
>
> The above application has issues, if we do stress test. We are running
> nearly 30 clients in one minute. Stress test works fine for a while (15-20
> minutes) and then hangs.
>
> We see that message is sent from REQ socket, but not received at REP
> socket.
> How to troubleshoot this issue?
>
> We see that lsof is increasing as the stress test progresses. We do close
> sockets in the application and also set linger to 0. Could the increased
> lsof cause hang?
>
> Your inputs to resolve the hang and to troubleshoot the issue will be
> helpful.
>
> Regards
>
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
