Hi all,

Here are some comments on the topic:
Every 0MQ/TCP connection has four buffers:

1. the 0MQ buffer on the sender, limited by the sender's HWM
2. the TCP tx buffer on the sender, limited by the sender's SNDBUF
3. the TCP rx buffer on the receiver, limited by the receiver's RCVBUF
4. the 0MQ message buffer on the receiver, limited by the receiver's HWM

You can think of the whole thing as a series of tubes, each with a particular capacity. If a buffer is full, either because it is being filled too quickly or because it is being emptied too slowly, it applies backpressure, i.e. it stops accepting new data from upstream. That causes the upstream buffer to fill up and, when it hits its limit, to apply backpressure further upstream, and so on.

One consequence of this model is that if there is an unlimited buffer somewhere in the chain, the backpressure from the other buffers will cause all the messages to accumulate there when congestion hits. That's why I recently proposed setting the default HWM to a finite value. Even an arbitrary number like 1000 is better than an infinite buffer.

PUB/SUB is a special case in that the buffers, when full, start dropping messages instead of applying backpressure. The reason is to avoid blocking the whole distribution tree because of a single slow consumer. Anyway, pub/sub is not the problem we are solving here, so this comment is a bit off-topic.

As for the monitoring stuff, I assume you have something like a parallelised pipeline: the messages pass through several processing steps, always being forwarded to the next step (worker app) by some central device (broker). What would really help in such a case is to set HWM to reasonable values everywhere in the topology and let the excess messages queue in the devices. A smart device can then do the monitoring, i.e. periodically publish the number of messages it holds or whatever. I am not sure whether any such smart devices with monitoring exist. I dimly recall that the pyzmq project may contain something like that, but I am not sure.
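As a rough illustration of why the queue depth in one well-chosen place is the number worth watching, here is a toy simulation of the buffer chain described above. It is only a sketch and does not use 0MQ at all: the function name `simulate`, the tick-based model and the capacities are assumptions made up for the example, with messages counted as plain integers.

```python
import math


def simulate(capacities, produce_per_tick, consume_per_tick, ticks):
    """Toy model of a chain of buffers (hypothetical helper, not a 0MQ API).

    capacities[0] is the buffer nearest the producer, capacities[-1] the
    one nearest the consumer. Use math.inf for an unlimited buffer.
    Returns (fill levels, number of messages the producer could not enqueue).
    """
    levels = [0] * len(capacities)
    blocked = 0
    for _ in range(ticks):
        # The consumer drains the last buffer at its own (slow) pace.
        levels[-1] -= min(levels[-1], consume_per_tick)

        # Each buffer pulls from its upstream neighbour only while it has
        # room. Walking from the consumer end lets freed space propagate
        # upstream within the same tick.
        for i in range(len(levels) - 1, 0, -1):
            room = capacities[i] - levels[i]
            moved = min(levels[i - 1], room)
            levels[i - 1] -= moved
            levels[i] += moved

        # The producer is refused (backpressure) once the first buffer is full.
        room = capacities[0] - levels[0]
        accepted = min(produce_per_tick, room)
        levels[0] += accepted
        blocked += produce_per_tick - accepted
    return levels, blocked


if __name__ == "__main__":
    # Finite limits everywhere: every buffer fills up and the producer
    # ends up being throttled.
    print(simulate([1000, 64, 64, 1000], 10, 1, 1000))
    # Unlimited first buffer: the producer is never throttled and the
    # backlog piles up in that one buffer.
    print(simulate([math.inf, 64, 64, 1000], 10, 1, 1000))
```

With finite limits everywhere the producer is eventually refused, whereas with one unlimited buffer the whole backlog ends up in that buffer. In a real topology that buffer would be the place where a device could report its depth.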
In any case, I believe smart devices are the area where the most value can be added, and that they will ultimately become a significant part of the 0MQ ecosystem.

If you have no devices in the topology, monitoring becomes more complex. The easiest way is probably to monitor the applications' memory usage. If it grows, the messages are likely queueing there.

The issue was discussed at the 0MQ conference recently. The need for explicit monitoring of the library was expressed. It's not 100% clear how to do it, though. The options mentioned were:

1. write the statistics to the syslog
2. publish them in-process using the 0MQ sys://log transport
3. publish them to the outside world using the 0MQ tcp transport
4. expose the statistics using socket options

Martin
