Hi Alexander,

What I'd do is write a simple C listener so you can trace what's happening and find out whether the issue is in libzmq, clrzmq, or your app.
-Pieter

On Thu, Jul 25, 2013 at 6:18 PM, Alexander Zhitlenok <[email protected]> wrote:
> Hi Pieter, thank you for your reply.
>
> 1. I don't think we have a problem with data corruption on the sender side.
> Even if for some reason a sender sends corrupted data, that should not stop
> the listener from receiving data. However, that's what we see! After receiving
> 3-5 corrupted messages (arriving at the same moment), the ReceiveFrame() method
> (clrzmq.dll) never again returns a non-empty message. So it looks like
> something is broken on the listener side.
>
> 2. Since our app is a C# app and we call libzmq through clrzmq, it's almost
> impossible for our layer to destroy the C++ data buffer. Our layer calls the
> clrzmq layer, passing a managed byte array. As I understand the clrzmq code,
> it copies our managed data into a reusable unmanaged buffer and passes that
> unmanaged buffer to zmq_send() synchronously. If zmq_send() internally works
> asynchronously (as you explained to me), the clrzmq C++ data buffer could be
> corrupted. But if clrzmq had such an obvious bug, wouldn't a lot of users have
> run into it? And how can I fix it without changing the clrzmq code?
>
> Just to reiterate, it does not seem possible that sending corrupted data from
> a single sender would cause the listener side to stop receiving messages from
> all senders.
>
> Sincerely,
> Alex
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Pieter Hintjens
> Sent: Thursday, July 25, 2013 8:21 AM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Multicast messages corrupted
>
> Alex, check you're not reusing or freeing the message data buffer too soon.
> zmq_send() is asynchronous and happens in the background. If you reuse the
> message data buffer or free it, what will be sent (some short time after the
> send call itself) will be garbage.
>
> On Tue, Jul 23, 2013 at 7:42 PM, Alexander Zhitlenok
> <[email protected]> wrote:
>> This is what we actually see. In that part of our app, multiple clients
>> send short messages to one subscriber. As soon as a message is received,
>> we do nothing but log the number of frames and the first integer from
>> the frame, which is our message ID. (After that we put the message in a
>> queue and process it in another thread.) Since our messages are short,
>> they are all single-frame messages. However, at some moment the
>> subscriber receives a 3- or 4-frame message with a garbage ID (first
>> int). These unexpected messages arrive "in a pack", 3-5 at the same
>> time. After we get these 3-5 unexpected messages (the processing is
>> actually just catching an exception and doing some logging), no more
>> messages arrive.
>>
>> When I say "we receive a message", I mean we do nothing but call
>> ReceiveFrame on the ZmqSocket object (clrzmq.dll).
>>
>> I can easily admit that we are doing something wrong, but we do not do
>> anything at the initial stage of receiving messages.
>>
>> Thank you,
>>
>> Alex
>>
>> (we use win7\epgm\clrzmq)
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Steven McCoy
>> Sent: Tuesday, July 23, 2013 8:19 AM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Multicast messages corrupted
>>
>> On 22 July 2013 15:06, Alexander Zhitlenok
>> <[email protected]>
>> wrote:
>>
>> All works fine, sometimes for hours, but at some unpredictable moment
>> we start receiving corrupted messages. After 4-5 corrupted messages,
>> our custom C# layer stops receiving messages. I'm not sure yet (still
>> testing) whether the ZeroMQ C++ layer is still receiving messages.
>>
>> Ideally these should not be messages corrupted on the wire, as each
>> packet is checksum-verified.
>>
>> This leaves corruption in software and hardware.
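The receive step described above counts the frames and reads the first integer as the message ID. That step can be made defensive, so an unexpected 3- or 4-frame message is logged and skipped instead of throwing. Below is a sketch in C. The message layout is an assumption, not something stated in the thread: a single frame that starts with a 4-byte little-endian ID, as BitConverter writes it on x86. The names read_message_id() and msg_check are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of checking one received message against the expected format. */
typedef enum { MSG_OK, MSG_TOO_MANY_FRAMES, MSG_TOO_SHORT } msg_check;

/* Assumed format: exactly one frame whose first 4 bytes are a
   little-endian int32 ID. Bytes are assembled one by one, so the
   frame needs no particular alignment. */
msg_check read_message_id(size_t frame_count, const unsigned char *frame,
                          size_t frame_size, int32_t *id_out)
{
    assert(id_out != NULL);
    if (frame_count != 1)
        return MSG_TOO_MANY_FRAMES;
    if (frame_size < 4)
        return MSG_TOO_SHORT;
    uint32_t raw = (uint32_t)frame[0]
                 | (uint32_t)frame[1] << 8
                 | (uint32_t)frame[2] << 16
                 | (uint32_t)frame[3] << 24;
    *id_out = (int32_t)raw;
    return MSG_OK;
}
```

One general ZeroMQ point also applies here. If the receiving code stops partway through a multipart message, the following receive calls return the remaining frames. A handler that gives up on a bad first frame should therefore still read the rest of that message before returning to its normal loop.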
>> You really need to capture in parallel with other clients to narrow down
>> the scope of the corruption. The implication in your message is that
>> multiple Windows machines are receiving, and thus it is likely to be
>> somewhere in the software stack.
>>
>> A capture of the wire traffic, which could then be replayed, would be
>> preferable.
>>
>> --
>>
>> Steve-o
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
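Steven's point that each packet is checksum-verified can be made concrete. As I understand RFC 3208, the PGM checksum is the standard Internet checksum from RFC 1071: the ones' complement of the ones' complement sum of the packet's 16-bit words. Below is a sketch with the hypothetical name internet_checksum(). It also shows why "ideally" is the right word. The sum is order-independent, so a packet whose 16-bit words have been reordered still has the same checksum and would pass verification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, big-endian 16-bit words.
   A 64-bit accumulator cannot overflow for any realistic packet size. */
uint16_t internet_checksum(const unsigned char *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}
```

A receiver verifies a packet by running the same sum over the data with the checksum field included. The result is 0 when the checksum matches.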
