On 19/01/2012, at 5:35 AM, Chuck Remes wrote:

>>
>> Yes, I did that because I got EAGAIN. If I take out the loop on EAGAIN,
>> I get .. well I get EAGAIN (code 35 on OSX).
>
> It doesn't matter if you are using a REQ socket for blocking or non-blocking
> writes. For that socket type you must adhere to a strict send/recv/send/recv
> pattern. Don't do that.

Ok..

>> Note: I only get this problem when the client sends the message
>> to the server, so the server IS reading the message .. well,
>> it's doing something in response to the message from the client.
>
> May I assume the server has connected via a REP socket?

The code is written in Felix, and it is intended to be the Felix version
of the Hello World example: hwclient/hwserver documented in the zguide.

The Felix compiler generates C++, so I can inspect the generated C++ code
(to check that my binding is doing the "right thing"). It looks good to me:
i.e. the zmq binding is right, and so is the use of it.

I'm hoping that this is not the case, i.e. that the binding or my use of it
is wrong after all. The reason is that the alternative is a bug in the Felix
compiler or Felix run time system causing a corruption, and that will be
extremely hard to track down!

This happened once before, when integrating Google's RE2 regex library, and
the problem turned out to be leaving off a "hint" to the garbage collector
on the library binding .. and this one took almost a year to find (because
the problem only occurred when enough allocations had happened to trigger
the GC, and none of my regression tests do that AND use RE2).

> This is a fairly common error. You might want to scan the guide again...
> don't worry, we've all had to read it 3 or 4 times before it sank in. :)

As above: the problem is that I'm actually *implementing* the guide examples :)

The loop on EAGAIN was only added after I got the "resource temporarily
unavailable" message (EAGAIN), and the correct behaviour for that is to
retry AFAIK... If I should not retry, ZMQ should not issue that error code.

The C version of this code (from the zguide) works fine. So there is
a problem in the Felix-generated code somewhere. It is not impossible
that there is a memory corruption and the error code is a spurious and
lucky side effect of it.
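
For reference, the strict send/recv/send/recv alternation mentioned in the
quoted reply can be pictured as a two-state machine. The following is only a
toy model in plain C, not the libzmq API: the names toy_req, toy_send and
toy_recv are made up for illustration, and the -1 return merely stands in
for whatever error a real socket would report. As far as the zmq_send
documentation goes, a call made in the wrong state is reported as EFSM,
whereas EAGAIN means a non-blocking send could not be done at the moment, so
the two cases are worth telling apart when debugging.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the REQ lockstep rule. This is NOT libzmq; all names
   here are hypothetical and exist only to illustrate the alternation. */
typedef enum { READY_TO_SEND, AWAITING_REPLY } req_state;

typedef struct {
    req_state state;
} toy_req;

/* Models a send: allowed only when no reply is outstanding.
   Returns 0 on success, -1 if called out of turn. */
static int toy_send(toy_req *s)
{
    assert(s != NULL);
    if (s->state != READY_TO_SEND)
        return -1;              /* second send in a row: rejected */
    s->state = AWAITING_REPLY;
    return 0;
}

/* Models a recv: allowed only after a send.
   Returns 0 on success, -1 if called out of turn. */
static int toy_recv(toy_req *s)
{
    assert(s != NULL);
    if (s->state != AWAITING_REPLY)
        return -1;              /* recv with nothing outstanding: rejected */
    s->state = READY_TO_SEND;
    return 0;
}
```

Starting from READY_TO_SEND, the sequence toy_send, toy_recv, toy_send
succeeds at every step, while two toy_send calls in a row fail on the second
one. Retrying that second call in a loop can never succeed until a recv
happens, which is the reason a retry loop is the wrong remedy for an
out-of-turn call, whatever the right remedy for a genuine EAGAIN may be.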

--
john skaller
[email protected]
