Francesco <francesco.monto...@gmail.com> writes:

> Here's what I get varying the spin_loop duration:
Thanks for sharing it. It's cool to see the effect in action!

> Do you think the CPU load of the zmq background thread could be caused
> by the much more frequent TCP ACKs coming from the SUB when the
> "batching" suddenly stops happening?

Well, I don't know enough about all the mechanisms to say for certain that
the TCP ACKs are the driver of the effect. Though, that certainly sounds
reasonable to me.

> Your suggestion is that if the application thread is fast enough (spin
> loop is "short enough") then the while() loop body is actually executed
> 2-3-4 times and we send() a large TCP packet, thereby reducing both the
> syscall overhead and the number of TCP ACKs from the SUB (and thus the
> kernel overhead).
> If instead the application thread is not fast enough (spin loop is "too
> long") then the while() loop body executes only once and my 300B frames
> are passed one by one to zmq::tcp_write() and the send() syscall. That
> would kill the performance of the zmq background thread.
> Is that correct?

Yep, that's the basic premise I had. Though, I don't know the exact
mechanisms beyond "more stuff happens when many tiny packets are sent". :)

> Now the other $1M question: if that's the case, is there any tuning I
> can do to force the zmq background thread to wait for some time before
> invoking send()?
> I'm thinking that I could try to replace the TCP_NODELAY option that is
> set on the tcp socket with TCP_CORK and see what happens. In this way I
> basically go in the opposite direction of the throughput-vs-latency
> tradeoff...
> Or maybe I could change the libzmq source code to invoke tcp_write()
> only e.g. every N times out_event() is invoked? I think I risk getting
> some bytes stuck in the stream engine if at some point I stop sending
> out messages though...
>
> Any other suggestion?

Nothing specific. As you say, it's a throughput-vs-latency problem. And in
this case it is a bit more complicated because the particular size/time
parameters bring the problem to a place where the Nagle "step function"
matters.

Two approaches to try, though with maybe not much hope of huge
improvements, are to push Nagle's algorithm out of libzmq and either back
down into the TCP stack or up into the application layer.

I don't know how to tell libzmq to give this optimization back to the TCP
stack. I recall reading (maybe on this list) about someone doing work in
this direction. I don't remember the outcome of that work either, but I'd
guess there was not much benefit. The libzmq developers took the effort to
bring Nagle up into libzmq (presumably) because libzmq has more knowledge
than exists down in the TCP stack and so can perform the optimization
more... er, optimally.

Likewise, doing message batching in the application may or may not help.
But in this case it would be rather easy to try, and there are two
approaches: either send N 300B parts in an N-part multipart message, or
perform the join/split operations in the application layer. In particular,
if the application can deal directly with concatenated parts, so that no
explicit join/split is required, then you may solve this problem. At the
least, reading N 300B blocks "in place" on the recv() side should be easy
enough. As an example, zproto-generated code uses this trick to
"unpack-in-place" highly structured data.
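For what it's worth, here is a minimal sketch of what those two options
could look like with the plain C API. PART_SIZE, BATCH and handle_block()
are placeholder names made up for the example, not anything from your
code, and the numbers are only illustrative:

    /* Sketch only: two application-level batching options. */
    #include <zmq.h>
    #include <stddef.h>

    #define PART_SIZE 300
    #define BATCH 4

    /* Option 1: send BATCH small parts as one multipart message.
     * libzmq queues the parts together, which may let the I/O thread
     * write them out in fewer, larger chunks. */
    static void send_multipart_batch (void *pub,
                                      const char *buf /* BATCH*PART_SIZE bytes */)
    {
        for (int i = 0; i < BATCH; i++) {
            int flags = (i < BATCH - 1) ? ZMQ_SNDMORE : 0;
            zmq_send (pub, buf + (size_t) i * PART_SIZE, PART_SIZE, flags);
        }
    }

    /* Placeholder for whatever the application does with one 300B block. */
    static void handle_block (const char *block, size_t len)
    {
        (void) block; (void) len;
    }

    /* Option 2: the PUB concatenates BATCH parts into a single frame and
     * the SUB walks the blocks "in place", with no explicit split. Assumes
     * each frame holds a whole number of PART_SIZE blocks. */
    static void recv_and_walk (void *sub)
    {
        char buf[BATCH * PART_SIZE];
        int n = zmq_recv (sub, buf, sizeof buf, 0);
        if (n < 0)
            return;
        for (int off = 0; off + PART_SIZE <= n; off += PART_SIZE)
            handle_block (buf + off, PART_SIZE);
    }

The multipart form keeps each 300B part as its own frame on the wire,
while the concatenated form also saves the per-part framing, at the cost
of the receiver needing to know the block size.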
My other general suggestion is to step back and see what the application
actually requires w.r.t. throughput vs. latency. Testing the limits is one
(interesting) thing, but in practical use, will the app actually come
close to the limit?

If it really must push 300B messages at low latency and at 1 Gbps, then
using faster links may be appropriate. E.g., 10 GbE can give better than
2 Gbps throughput for 300B messages [1] while keeping latency low.

-Brett.

[1] http://wiki.zeromq.org/results:10gbe-tests-v432
    http://wiki.zeromq.org/results:100gbe-tests-v432