On Mon, Jul 9, 2012 at 11:40 AM, Raphael Bauduin <[email protected]> wrote:
> On Wed, Jul 4, 2012 at 5:20 PM, Chuck Remes <[email protected]> wrote:
>>
>> On Jul 4, 2012, at 5:05 AM, Raphael Bauduin wrote:
>>
>>> Hi,
>>>
>>> I'm using the ruby zmq bindings in a web application. I regularly get
>>> error message "ZMQ::Error: Interrupted system call" related to a send.
>>> This is in a Ruby on Rails application served with passenger, which
>>> spawns worker processes. I think I have identified a process that
>>> generated this error, and an strace on it shows no activity at all.
>>> This process however keeps open a connection to the mysql server. An
>>> accumulation of such errors will eventually become problematic server
>>> side, in addition to clients getting an error page and messages being
>>> lost.
>>
>> I'm assuming this happens under MRI. Is it 1.8.x or 1.9.x?
>
> It is REE: ruby 1.8.7 (2012-02-08 MBARI 8/0x6770 on patchlevel 358)
> [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2012.02
>
>>
>> Do you see the same behavior when running your app with JRuby or Rubinius?
>>
>
> The problem is that I don't have the problem systematically. It
> happens once every x days in production where there are thousands of
> page views that run the code in question. So it's very hard to
> reproduce.
>
>>> I'm looking for advice in avoiding this error and possibly for further
>>> debugging hints. Related to that I have several questions:
>>> - Should I simply catch this exception, and retry the send if needed?
>>> As this is done in the process sending the page content back to the
>>> client, won't it possibly make some requests too slow? (This could
>>> still be better than an error as we have currently)
>>
>> Using exception handling for flow control in Ruby can be slow. But unless
>> you are building the next amazon.com then it probably won't hurt you too
>> much. You could give this a try though it's always better to figure out the
>> actual underlying cause and fix it. Using exceptions here is just a band-aid.
>>
>>> - If my understanding is correct, the problem occurs with blocking
>>> syscalls, and requests having the error don't return any content to
>>> the client. But what happens if I make the send non blocking?
>>> (http://zeromq.github.com/rbzmq/classes/ZMQ/Socket.html#M000010)
>>
>> Try it and see.
>
> My question was more about knowing if the same problem could occur. As
> mentioned above, I can't reproduce the problem systematically.
>
>>
>>> - Finally, what might interrupt the syscall? Any interesting read about
>>> this?
>>
>> Something in your app is generating a signal. The technique I use to figure
>> out these kinds of errors is to run my app under other Ruby runtimes. Most
>> of the time they will fail differently and/or give me an exact backtrace
>> pointing to the source of the problem.
>
> Can it also be a signal coming from outside the app, eg passenger?
>
> Or can it be due to the fact that I set the LINGER option?
> s.setsockopt(ZMQ::LINGER,100)
> ..
> s.send(m)
> s.close
>
> Any suggestion on this would be really welcome!
>
>>
>> Lastly, you may want to look at the ffi-rzmq gem (disclaimer: I'm its
>> maintainer). It has a different API from the zmq gem but it appears to enjoy
>> wider usage by the community so it may be a bit more stable.
>
> Thanks for the tip, I add it as an option, but I'd like to understand
> what's going on too.

I think I have identified the cause of the problem: EINTR is not
handled in the rbzmq code. I am thinking of replacing this call (see
https://github.com/zeromq/rbzmq/blob/master/rbzmq.c#L1573 )

    rc = zmq_send (s, &msg, flags);

with this:

    int do_loop=1;
    while ( do_loop>0) {
        rc = zmq_send (s, &msg, flags);
        if (rc==0 || zmq_errno () != EINTR) do_loop=0;
    }

I've run it successfully in my staging env. Any contraindications?

thx

Raph
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
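
For anyone who wants to try the retry-on-EINTR idea outside of rbzmq, here is a minimal standalone sketch of the same pattern. It does not call libzmq at all: `send_fn`, `send_retry_eintr`, `flaky_send` and the `max_retries` cap are hypothetical names made up for illustration, and the stand-in reports failure through plain `errno` rather than `zmq_errno ()`. The cap is an extra assumption on top of the loop above, so that a process receiving signals continuously cannot spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocking send such as zmq_send():
 * returns 0 on success, -1 on failure with errno set. */
typedef int (*send_fn)(void *ctx);

/* Call fn until it succeeds, fails with an error other than EINTR,
 * or has been retried max_retries times (illustrative safety cap,
 * not part of the original proposal). Returns the last result. */
static int send_retry_eintr(send_fn fn, void *ctx, int max_retries)
{
    int rc;
    int attempts = 0;

    assert(fn != NULL);
    do {
        rc = fn(ctx);
    } while (rc != 0 && errno == EINTR && attempts++ < max_retries);
    return rc;
}

/* Test double: fails with EINTR *remaining times, then succeeds. */
static int flaky_send(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

With `flaky_send` set to be interrupted twice, `send_retry_eintr (flaky_send, &n, 5)` should return 0 on the third call. Any other error is returned to the caller immediately, which matches the `zmq_errno () != EINTR` exit in the loop above.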
