Re: [zeromq-dev] Interrupted System Call: advice to handle it

Raphael Bauduin Mon, 06 Aug 2012 04:58:11 -0700

On Mon, Jul 9, 2012 at 11:40 AM, Raphael Bauduin <[email protected]> wrote:
> On Wed, Jul 4, 2012 at 5:20 PM, Chuck Remes <[email protected]> wrote:
>>
>> On Jul 4, 2012, at 5:05 AM, Raphael Bauduin wrote:
>>
>>> Hi,
>>>
>>> I'm using the Ruby zmq bindings in a web application. I regularly get
>>> the error message "ZMQ::Error: Interrupted system call" on a send.
>>> This is in a Ruby on Rails application served with passenger, which
>>> spawns worker processes. I think I have identified a process that
>>> generated this error, and an strace on it shows no activity at all.
>>> This process however keeps open a connection to the mysql server. An
>>> accumulation of such errors will eventually become problematic server
>>> side, in addition to clients getting an error page and messages being
>>> lost.
>>
>> I'm assuming this happens under MRI. Is it 1.8.x or 1.9.x?
>
> It is REE: ruby 1.8.7 (2012-02-08 MBARI 8/0x6770 on patchlevel 358)
> [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2012.02
>
>>
>> Do you see the same behavior when running your app with JRuby or Rubinius?
>>
>
> The difficulty is that the problem does not occur systematically. It
> happens once every x days in production, where thousands of page
> views run the code in question, so it is very hard to reproduce.
>
>>> I'm looking for advice in avoiding this error and possibly for further
>>> debugging hints. Related to that I have several questions:
>>> - Should I simply catch this exception, and retry the send if needed?
>>> As this is done in the process sending the page content back to the
>>> client, won't it possibly make some requests too slow? (This could
>>> still be better than an error as we have currently)
>>
>> Using exception handling for flow control in Ruby can be slow. But unless 
>> you are building the next amazon.com then it probably won't hurt you too 
>> much. You could give this a try though it's always better to figure out the 
>> actual underlying cause and fix it. Using exceptions here is just a band-aid.
>>
>>> - If my understanding is correct, the problem occurs with blocking
>>> syscalls, and requests having the error don't return any content to
>>> the client. But what happens if I make the send non blocking?
>>> (http://zeromq.github.com/rbzmq/classes/ZMQ/Socket.html#M000010)
>>
>> Try it and see.
>
> My question was more about knowing if the same problem could occur. As
> mentioned above, I can't reproduce the problem systematically.
>
>>
>>> - Finally, what might interrupt the syscall? Any interesting read about 
>>> this?
>>
>> Something in your app is generating a signal. The technique I use to figure 
>> out these kinds of errors is to run my app under other Ruby runtimes. Most 
>> of the time they will fail differently and/or give me an exact backtrace 
>> pointing to the source of the problem.
>
> Can it also be a signal coming from outside the app, e.g. from Passenger?
>
> Or can it be due to the fact that I set the LINGER option?
>   s.setsockopt(ZMQ::LINGER,100)
>   ..
>   s.send(m)
>   s.close
>
> Any suggestion on this would be really welcome!
>
>>
>> Lastly, you may want to look at the ffi-rzmq gem (disclaimer: I'm its 
>> maintainer). It has a different API from the zmq gem but it appears to enjoy 
>> wider usage by the community so it may be a bit more stable.
>
> Thanks for the tip. I'll add it as an option, but I'd like to understand
> what's going on too.



I think I have identified the cause of the problem: rbzmq does not
handle EINTR.

I am thinking of replacing this call (see the code at
https://github.com/zeromq/rbzmq/blob/master/rbzmq.c#L1573 ):

        rc = zmq_send (s, &msg, flags);

with this:

    /* Retry the send for as long as it is interrupted by a signal (EINTR). */
    do {
        rc = zmq_send (s, &msg, flags);
    } while (rc != 0 && zmq_errno () == EINTR);

I've run it successfully in my staging environment. Are there any
drawbacks I should be aware of?
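
For anyone who wants to try the retry-on-EINTR idea outside rbzmq, here
is a minimal sketch under stated assumptions. fake_send() is a
hypothetical stand-in for zmq_send() that fails with EINTR a set number
of times before succeeding, and send_retry_eintr() with its max_retries
cap is an illustrative helper, not part of libzmq or rbzmq. The sketch
reads errno directly where the patch calls zmq_errno().

```c
#include <errno.h>

/* Hypothetical stand-in for zmq_send(): fails with EINTR while
   interruptions_left > 0, then succeeds. Not part of libzmq or rbzmq. */
int interruptions_left = 0;

int fake_send (void)
{
    if (interruptions_left > 0) {
        interruptions_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}

/* Illustrative helper (an assumption, not the rbzmq code): call send_fn
   again while it fails with EINTR, up to max_retries retries.
   Returns 0 on success, or -1 with errno left as set by send_fn. */
int send_retry_eintr (int (*send_fn) (void), int max_retries)
{
    int rc;
    int retries = 0;

    do {
        rc = send_fn ();
    } while (rc != 0 && errno == EINTR && retries++ < max_retries);

    return rc;
}
```

With interruptions_left set to 2, send_retry_eintr (fake_send, 3)
returns 0 after the third call. With it set to 5, the helper gives up
after four calls and returns -1 with errno still EINTR. Whether a cap is
wanted at all is a design choice; the loop in the patch retries without
limit.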

thx

Raph
