Hi Alexander,

What I'd do is write a simple C listener so you can trace what's
happening and learn whether the issue is in libzmq, clrzmq, or your
app.
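
A minimal sketch of the tracing part, under a few assumptions. The listener
itself would just loop on zmq_msg_recv() and hand every frame to helpers like
the ones below. The names frame_message_id and trace_frame are made up for
illustration, and I am guessing that your message ID is a 32-bit little-endian
integer at the start of the frame; adjust if your wire format differs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRACE_HEX_BYTES 8   /* how many leading bytes to dump per frame */

/* Read the message ID from the start of a frame.
 * ASSUMPTION: the ID is a 32-bit little-endian integer in the first four
 * bytes. Returns 0 on success, -1 if the frame is too short to hold an ID. */
static int frame_message_id(const unsigned char *data, size_t size, int32_t *id)
{
    if (data == NULL || size < 4)
        return -1;
    uint32_t raw = (uint32_t)data[0]
                 | ((uint32_t)data[1] << 8)
                 | ((uint32_t)data[2] << 16)
                 | ((uint32_t)data[3] << 24);
    memcpy(id, &raw, sizeof raw);   /* reinterpret the bits as signed */
    return 0;
}

/* Format one trace line for a received frame: its index within the message,
 * its size, the "more frames follow" flag, the decoded ID (or '?' if the
 * frame is too short), and a hex dump of the leading bytes.
 * data must be valid for size bytes. Returns what snprintf returns. */
static int trace_frame(char *out, size_t out_size, int index,
                       const unsigned char *data, size_t size, int more)
{
    assert(out != NULL && out_size > 0);

    char hex[2 * TRACE_HEX_BYTES + 1] = "";
    size_t shown = size < TRACE_HEX_BYTES ? size : TRACE_HEX_BYTES;
    for (size_t i = 0; i < shown; i++)
        snprintf(hex + 2 * i, 3, "%02x", (unsigned)data[i]);

    int32_t id;
    if (frame_message_id(data, size, &id) == 0)
        return snprintf(out, out_size,
                        "frame %d: %lu bytes more=%d id=%ld hex=%s",
                        index, (unsigned long)size, more, (long)id, hex);
    return snprintf(out, out_size,
                    "frame %d: %lu bytes more=%d id=? hex=%s",
                    index, (unsigned long)size, more, hex);
}
```

In the listener you would call trace_frame() with the results of
zmq_msg_data(), zmq_msg_size() and zmq_msg_more() for each frame and print the
line. If the C listener keeps printing sane single-frame messages after the C#
side has gone quiet, that would point above libzmq.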

-Pieter

On Thu, Jul 25, 2013 at 6:18 PM, Alexander Zhitlenok
<[email protected]> wrote:
> Hi Pieter, thank you for your reply.
>
> 1. I don't think we have a data corruption problem on the sender side.
> Even if a sender did send corrupted data for some reason, that should not
> stop the listener from receiving data. However, that is exactly what we
> see! After receiving 3-5 corrupted messages (arriving at the same moment),
> the ReceiveFrame() method (clrzmq.dll) never again returns a non-empty
> message. So it looks like something is broken on the listener side.
>
> 2. Since our app is a C# app and we call libzmq through clrzmq, it's almost
> impossible for our layer to corrupt the C++ data buffer. Our layer passes a
> managed byte array to the clrzmq layer. As far as I can tell from the clrzmq
> code, it copies our managed data into a reusable unmanaged buffer and passes
> that buffer to zmq_send() synchronously. If zmq_send() works asynchronously
> internally (as you explained to me), the clrzmq unmanaged buffer could be
> corrupted. But if clrzmq had such an obvious bug, surely a lot of users
> would have run into it? And how can I fix it without changing the clrzmq
> code?
>
> Just to reiterate, it does not seem possible that sending corrupted data from 
> a single sender would cause the listener side to stop receiving messages from 
> all senders.
>
> Sincerely,
> Alex
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Pieter Hintjens
> Sent: Thursday, July 25, 2013 8:21 AM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Multicast messages corrupted
>
> Alex, check you're not reusing or freeing the message data buffer too soon. 
> zmq_send() is asynchronous and happens in the background. If you reuse the 
> message data buffer or free it, what will be sent (some short time after the 
> send call itself) will be garbage.
>
> On Tue, Jul 23, 2013 at 7:42 PM, Alexander Zhitlenok 
> <[email protected]> wrote:
>> This is what we actually have. In that part of our app, multiple
>> clients send short messages to one subscriber. As soon as a message is
>> received, we do nothing but log the number of frames and the first
>> integer from the frame, which is our message ID. (After that we put the
>> message in a queue and process it in another thread.) Since our messages
>> are short, they are all single-frame. However, at some point the
>> subscriber receives a 3- or 4-frame message with a garbage ID (first
>> int). These unexpected messages arrive in a batch, 3-5 at the same time.
>> After we get these 3-5 unexpected messages (the processing is actually
>> just catching an exception and doing some logging), no more messages
>> arrive.
>>
>> When I say "we receive a message", I mean we do nothing but call
>> ReceiveFrame on the ZmqSocket object (clrzmq.dll).
>>
>> I readily admit that we may be doing something wrong, but we do nothing
>> else at the stage where messages are first received.
>>
>> Thank you,
>>
>> Alex
>>
>> (we use win7\epgm\clrzmq)
>>
>>
>>
>>
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Steven McCoy
>> Sent: Tuesday, July 23, 2013 8:19 AM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Multicast messages corrupted
>>
>>
>>
>> On 22 July 2013 15:06, Alexander Zhitlenok
>> <[email protected]>
>> wrote:
>>
>> All works fine, sometimes for hours; however, at some unpredictable
>> moment we start receiving corrupted messages. After 4-5 corrupted
>> messages, our custom C# layer stops receiving messages. I'm not sure
>> yet (still testing) whether the ZeroMQ C++ layer still receives messages.
>>
>>
>>
>>
>>
>> Ideally the corruption should not come from the wire, as each
>> packet is checksum-verified.
>>
>>
>>
>> This leaves corruption in software and hardware. You really need to
>> capture in parallel with other clients to narrow down the scope of
>> corruption. The implication in your message is that multiple Windows
>> machines are receiving, and thus it is likely to be somewhere in the
>> software stack.
>>
>>
>>
>> Preferably, capture the wire traffic so that you can try replaying it.
>>
>>
>>
>> --
>>
>> Steve-o
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>