Hi, I've found the culprit that caused the data loss.

When ZMQ sends a large message, the stream_engine writes the data out over multiple out_event calls. The ZMQ linger option only guarantees that messages are delivered to the peer pipe. Because of the speculative write, out_event is called at least once, but a large message needs several out_event calls before it is completely written. The stream_engine can be terminated before enough out_event calls have run, and whatever is still unwritten is lost. So a longer linger value will not resolve this issue. A workaround seems to be adding a sleep before closing. I'm going to submit a pull request to resolve the issue.

Thanks
Min

On Jan 27, 2013, at 12:42 AM, Yu Dongmin <[email protected]> wrote:

> Hi,
>
> My guess was that the issue might be in libzmq (the zeromq C library) when
> a lot of large messages were being sent.
>
> Thanks
> Min
>
> On Jan 26, 2013, at 4:01 PM, Ritesh Adval <[email protected]> wrote:
>
>> Hi Min,
>>
>> Thanks for the update. Just to confirm, are you saying that this issue is
>> in the zeromq C library or in the jzmq C wrapper?
>>
>> Just an update: when I replaced the DEALER socket that connects to the
>> broker's ROUTER socket with a REQ socket, and replaced the DEALER socket
>> that connects to the broker's DEALER socket with a REP socket, I no longer
>> see message loss in the same test. (The REQ socket does "send" then "recv",
>> and REP does the opposite, "recv" then "send".)
>>
>> -Ritesh
>> Sent from my iPhone.
>>
>>
>> On Jan 25, 2013, at 8:42 PM, Min <[email protected]> wrote:
>>
>>> I was able to reproduce the issue with jzmq, even on zeromq 3.2.2.
>>>
>>> What I discovered is that sometimes roughly the last 30K bytes of a 45K
>>> message were not delivered to the in-router on a raw close.
>>> I didn't build equivalent C code, but since jzmq is a thin wrapper around
>>> the native C library, the C library could have the same problem.
>>>
>>> I haven't found a clear solution yet.
>>>
>>> Thanks
>>> Min
>>>
>>>
>>> On Thu, Jan 24, 2013 at 6:39 AM, Ritesh Adval <[email protected]> wrote:
>>> Hello,
>>>
>>> I have created a bug for this issue with instructions and a Java test case.
>>> It's at https://zeromq.jira.com/browse/LIBZMQ-497
>>>
>>> Thanks
>>> Ritesh
>>>
>>>
>>> On Tue, Jan 22, 2013 at 6:30 PM, Ritesh Adval <[email protected]> wrote:
>>> Thanks Min,
>>>
>>> I will create a bug with instructions and a unit test. I was also
>>> experimenting with the Java-only version of zeromq
>>> (https://github.com/zeromq/jeromq). When running the same test, it does
>>> not drop messages but has some other issue.
>>>
>>> -Ritesh
>>>
>>>
>>> On Mon, Jan 21, 2013 at 11:53 PM, Min <[email protected]> wrote:
>>> Ritesh,
>>>
>>> If you can reproduce the problem, Java code should be fine.
>>>
>>> The community could look into it.
>>>
>>> Thanks
>>> Min
>>>
>>> On Thursday, January 17, 2013, Ritesh Adval wrote:
>>>
>>> Hi Charles,
>>>
>>> I have a test program in Java. I am not a C programmer, so it will
>>> probably take me some time to reproduce this in C. Can someone first take
>>> a look at my Java program to check that I am not doing anything stupid?
>>> Should I create a bug and attach the Java Maven project?
>>> It's very easy to run: all you need is zeromq 2.2.0 installed, plus jzmq
>>> built and installed (https://github.com/zeromq/jzmq).
>>> I can add instructions to the bug report. Once it's confirmed that the
>>> program looks right, I can try to create a C version of the test, but it
>>> will take me some time.
>>>
>>> Let me know.
>>>
>>> Thanks
>>> Ritesh
>>>
>>>
>>> On Wed, Jan 16, 2013 at 10:55 PM, Charles Remes <[email protected]> wrote:
>>> On Jan 16, 2013, at 4:08 PM, Ritesh Adval <[email protected]> wrote:
>>>
>>> > Hi Charles,
>>> >
>>> > Yes, I close the socket in my thread after sending 100 messages, and I
>>> > expect LINGER to make sure the messages are sent to the other end. I
>>> > expected context termination to block until any pending messages are
>>> > sent, but that's not happening: context termination returns quickly.
>>> >
>>> > Just now I tried again in my unit test, explicitly setting LINGER to
>>> > Integer.MAX_VALUE on all my sockets, and the test still failed with
>>> > messages getting dropped.
>>> >
>>> > The interesting thing is that only the 100th message (the last one)
>>> > from some of my concurrent threads is getting dropped.
>>>
>>> Time to show someone the code. That's the easiest way to figure it out.
>>> If you can reproduce this in C, that will get a lot more attention.
>>>
>>> Here's how to open an issue:
>>>
>>> http://www.zeromq.org/docs:issue-tracking
>>>
>>> cr
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
