Hi all,

Here's more info about the problem. It's a multi-faceted issue, so bear 
with me, please:

1. Mailboxes are used to communicate between 0MQ's I/O threads and 
the user's application threads.

2. The communication is done by sending commands. Each command is ~50B long.

3. There can be multiple threads sending commands to a single mailbox; 
each command has to be written to the mailbox (actually a socketpair) 
atomically. Thus, the command size must not exceed PIPE_BUF (see the 
POSIX atomicity guarantees).

4. Only a single thread (the mailbox owner) reads from the mailbox.

5. Most command interchange scenarios are per-connection. Given that 
multiple connections can live within a single I/O thread, a single 
mailbox can carry multiple conversations at the same time.

6. Each conversation is designed to have at most N commands in flight 
at any particular point in time (N being a small integer constant, such 
as 1, 2 or 3). Thus, if you are seeing mailbox overflow when connected 
to only a single peer, there's a bug in one of the interaction patterns 
that causes it to produce an unlimited number of commands.

7. Unfortunately, if there are many connections, the individual 
conversations can add up and overflow the buffer. In theory, this can 
be prevented by limiting the number of connections any 0MQ socket can 
handle in parallel.

8. Note that commands sent to the application (user's) thread can't be 
read unless the user calls some 0MQ function and thus allows the 
library to process the commands. Thus, prolonged periods of not calling 
0MQ can result in overflow (if many connections are being established 
or going away etc. in the meantime).

9. The problem in pt. 8 is somewhat alleviated by the recent "reaper 
thread" patch, which passes a socket closed with zmq_close to a 
dedicated thread that handles subsequent incoming commands 
asynchronously.

10. If the buffer space is depleted, mailbox_t tries to expand the 
buffer. This approach has a couple of problems: a. it may cause sending 
to be non-atomic; b. there's a system limit on the size of the buffer; 
c. on OSX the whole buffer resizing seems to be broken.

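The arithmetic behind points 6, 7 and 10 can be sketched the same way. 
The helper names below are hypothetical, and this is an illustration 
of the general technique (setsockopt with SO_SNDBUF), not a copy of 
what mailbox_t does. The first function computes the worst-case number 
of bytes queued when every conversation has its full allowance of 
commands in flight; the second asks the kernel for a bigger send 
buffer and reports what was actually granted, which may differ from 
the request because of the system limit mentioned in 10b.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Worst case for one mailbox: every connection has max_in_flight
   commands queued at the same time. */
size_t worst_case_bytes(size_t connections, size_t max_in_flight,
                        size_t command_size)
{
    return connections * max_in_flight * command_size;
}

/* Request a send buffer of `wanted` bytes and return the size the
   kernel reports afterwards, or -1 on error. The reported size can be
   smaller or larger than the request, depending on the platform. */
long grow_send_buffer(int fd, int wanted)
{
    int granted = 0;
    socklen_t len = sizeof granted;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof wanted) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) != 0)
        return -1;
    return granted;
}
```

For example, 1000 connections with N = 3 and 50-byte commands come to 
150000 bytes, which is the kind of total that a default socket buffer 
may not hold.
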
Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
