Hi,
I've run into a bug in czmq 3.0.0 that is very difficult to reproduce. It
doesn't look like it's fixed on the master branch yet either. I'm curious
whether anyone else has run into it, and whether there are any thoughts on
my suggested fix.

I've got a timer and a poller registered on a zloop.

If the timer expires, its handler removes the poller (zloop_poller_end) and
deletes the socket.

In an extremely rare case, the zmq_poll in zloop_start returns with both
the timer AND the poller ready at the same time. The timer is handled
first; its handler removes and then deletes the poller, which sets the
"need_rebuild" flag. However, the pollers are then serviced before the
pollset is rebuilt, so a callback fires on a pollitem that has already been
deleted (and the arg data I get is a bad pointer).

My first instinct is to check the "need_rebuild" flag and break after
handling a timer, as is already done after handling a pollitem:

In zloop.c zloop_start():

Handling of timer:
      rc = timer->handler (self, timer->timer_id, timer->arg);
      //  Suggested change: if (rc == -1 || self->need_rebuild)
      if (rc == -1)
          break;      //  Timer handler signaled break

Handling of poller (already checks the flag):
      rc = poller->handler (self, &self->pollset [item_nbr], poller->arg);
      if (rc == -1 || self->need_rebuild)
          break;
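
To make the ordering problem concrete, here is a toy model of a single loop iteration in which the timer and the poller are both ready. It is only a sketch and not czmq code: toy_loop_t, on_timer, on_poller and toy_iteration are hypothetical names, and it assumes that honouring "need_rebuild" after the timer means the pollers are not serviced until the pollset has been rebuilt.

```c
#include <stdbool.h>

//  Toy model of one reactor iteration. NOT czmq code; all names here
//  are hypothetical and exist only to illustrate the ordering issue.
typedef struct {
    bool need_rebuild;      //  Set when the poller list has changed
    bool poller_active;     //  False once the poller has been removed
    bool stale_callback;    //  Set if a handler ran for a removed poller
} toy_loop_t;

//  Timer handler: removes the poller, as zloop_poller_end would
static int
on_timer (toy_loop_t *loop)
{
    loop->poller_active = false;
    loop->need_rebuild = true;
    return 0;
}

//  Poller handler: records whether it ran for a removed poller
static int
on_poller (toy_loop_t *loop)
{
    if (!loop->poller_active)
        loop->stale_callback = true;
    return 0;
}

//  Runs one iteration with both the timer and the poller ready.
//  check_rebuild_after_timer selects the suggested extra check.
//  Returns true if a handler ran for an already removed poller.
bool
toy_iteration (bool check_rebuild_after_timer)
{
    toy_loop_t loop = { false, true, false };

    int rc = on_timer (&loop);
    if (rc == -1 || (check_rebuild_after_timer && loop.need_rebuild))
        return loop.stale_callback;     //  Pollers skipped this round

    on_poller (&loop);                  //  Stale pollset still serviced
    return loop.stale_callback;
}
```

In this model, toy_iteration (false) reports a stale callback and toy_iteration (true) does not. Whether a plain break after the timer handler is enough to skip the poller pass in the real zloop_start depends on how the two loops are structured there, so that part would need checking against the actual source.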

Any thoughts on this?

Thanks,
Matt Spencer
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
