Hi,

I've found the culprit that caused the data loss.

When ZMQ sends a large message, the stream_engine sends the data through
multiple out_event calls.

The ZMQ linger option only guarantees that messages are delivered to the peer
pipe. Because of the speculative write, out_event is called at least once, but
a large message requires multiple out_event calls.

The stream_engine can be terminated before it has made enough out_event calls
to write the whole message.
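
To illustrate the mechanism, here is a toy model in plain Java. It is not
libzmq code: the class, the method names, the 8192-byte per-call limit and the
45 * 1024 message size are all assumptions chosen for illustration. It only
shows how a message that needs several out_event-style calls is cut short if
the engine stops early.

```java
/** Toy model only, not libzmq code. All names and sizes here are assumptions. */
final class PartialFlushSketch {

    /** Hypothetical number of bytes written by one out_event-style call. */
    static final int BATCH_SIZE = 8192;

    /** Number of calls needed to write a message of the given size completely. */
    static int callsNeeded(int messageSize, int batchSize) {
        return (messageSize + batchSize - 1) / batchSize; // ceiling division
    }

    /** Bytes that reach the peer if the engine is terminated after `calls` calls. */
    static int deliveredBytes(int messageSize, int batchSize, int calls) {
        long written = (long) calls * batchSize;
        return (int) Math.min(written, messageSize);
    }

    public static void main(String[] args) {
        int messageSize = 45 * 1024; // assumed size for the "45K" message
        int needed = callsNeeded(messageSize, BATCH_SIZE);
        System.out.println("calls needed: " + needed);
        for (int calls = 1; calls <= needed; calls++) {
            int delivered = deliveredBytes(messageSize, BATCH_SIZE, calls);
            System.out.println(calls + " call(s): delivered " + delivered
                    + ", lost " + (messageSize - delivered));
        }
    }
}
```

With these assumed numbers the message needs 6 calls, and stopping after 2
calls would leave 29,696 bytes unsent. The real figures depend on libzmq
internals and socket buffer sizes.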


So a longer linger value will not resolve this issue. A workaround seems to be
to add a sleep before closing the socket.


I'm going to submit a pull request to resolve the issue.

Thanks
Min

On Jan 27, 2013, at 12:42 AM, Yu Dongmin <[email protected]> wrote:

> Hi,
> 
> My guess was that libzmq (the ZeroMQ C library) might have an issue when
> large messages are sent heavily.
> 
> Thanks
> Min
> 
> On Jan 26, 2013, at 4:01 PM, Ritesh Adval <[email protected]> wrote:
> 
>> Hi Min,
>> 
>> Thanks for the update. Just to confirm: are you saying that this issue is
>> in the ZeroMQ C library or in the jzmq C wrapper?
>> 
>> Just an update: when I replaced the DEALER socket that connects to the
>> broker's ROUTER socket with a REQ socket, and replaced the DEALER socket
>> that connects to the broker's DEALER socket with a REP socket, I no longer
>> see message loss in the same test. (The REQ socket does "send" and then
>> "recv"; the REP socket does the opposite, "recv" and then "send".)
>> 
>> -Ritesh
>> 
>> 
>> On Jan 25, 2013, at 8:42 PM, Min <[email protected]> wrote:
>> 
>>> I was able to reproduce the issue with jzmq, even on ZeroMQ 3.2.2.
>>> 
>>> What I discovered is that roughly the last 30K bytes of a 45K message were
>>> sometimes not delivered to the in-router on a raw close.
>>> I didn't build equivalent C code, but since jzmq is a thin wrapper around
>>> the native C library, the C library could have the same problem.
>>> 
>>> I haven't found a clear solution yet.
>>> 
>>> Thanks
>>> Min
>>> 
>>> 
>>> On Thu, Jan 24, 2013 at 6:39 AM, Ritesh Adval <[email protected]> 
>>> wrote:
>>> Hello,
>>> 
>>> I have created a bug for this issue with instructions and a Java test case.
>>> It's at https://zeromq.jira.com/browse/LIBZMQ-497
>>> 
>>> Thanks
>>> Ritesh
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jan 22, 2013 at 6:30 PM, Ritesh Adval <[email protected]> 
>>> wrote:
>>> Thanks Min,
>>> 
>>> I will create a bug with instructions and a unit test. I was also
>>> experimenting with the Java-only version of ZeroMQ
>>> (https://github.com/zeromq/jeromq). When running the same test, it does not
>>> drop messages but has some other issue.
>>> 
>>> -Ritesh
>>> 
>>> 
>>> 
>>> On Mon, Jan 21, 2013 at 11:53 PM, Min <[email protected]> wrote:
>>> Ritesh,
>>> 
>>> If you can reproduce the problem, Java code should be fine.
>>> 
>>> The community could look into it.
>>> 
>>> Thanks
>>> Min
>>> 
>>> On Thursday, January 17, 2013, Ritesh Adval wrote:
>>> 
>>> Hi Charles,
>>> 
>>> I have a test program in Java. I am not a C programmer, so it will probably
>>> take me some time to reproduce this in C. Can someone first take a look at
>>> my Java program to check that I am not doing anything stupid? Should I
>>> create a bug and attach the Java Maven project?
>>> It's very easy to run: all you need is ZeroMQ 2.2.0 installed and jzmq
>>> built and installed (https://github.com/zeromq/jzmq).
>>> I can add instructions to the bug report. Once it is confirmed that the
>>> program looks right, I can try to create a C version of the test, but it
>>> will take me some time.
>>> 
>>> Let me know.
>>> 
>>> Thanks
>>> Ritesh
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Jan 16, 2013 at 10:55 PM, Charles Remes <[email protected]> 
>>> wrote:
>>> On Jan 16, 2013, at 4:08 PM, Ritesh Adval <[email protected]> wrote:
>>> 
>>> > Hi Charles,
>>> >
>>> > Yes, I close the socket in my thread after sending 100 messages, and I
>>> > expected LINGER to make sure the messages are sent to the other end. I
>>> > expected context termination to block until any pending messages are
>>> > sent, but that's not happening; context termination returns quickly.
>>> >
>>> > Just now I tried again in my unit test, setting LINGER to
>>> > Integer.MAX_VALUE explicitly on all my sockets, and the test still
>>> > failed with messages getting dropped.
>>> >
>>> > The interesting thing is that only the 100th message (the last one) from
>>> > some of my concurrent threads is getting dropped.
>>> 
>>> Time to show someone the code. That's the easiest way to figure it out. If 
>>> you can reproduce this in C, that will get a lot more attention.
>>> 
>>> Here's how to open an issue:
>>> 
>>> http://www.zeromq.org/docs:issue-tracking
>>> 
>>> cr
>>> 
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev