Mike,

I too am using 0MQ in an HPC context and am worrying about sending large objects...

> The code in question takes a few separate data structures (matrices and
> associated metadata) and sends it to another process over a socket. The
> most straightforward way for me to accomplish this with 0MQ is to create a
> message (I know the data structure's size ahead of time), pack the
> structures into contiguous memory managed by the message, and send with the
> 0MQ API. The issue that I'm running into is that the matrices are rather
> large, and I don't want to incur the penalty of copying them into contiguous
> memory as required by the 0MQ API.

In my experience so far, it is not clear that avoiding copying is something that you want to do, even for large messages. Here is my logic:

* If you have large messages, there are two issues: the time cost of memcpy and the memory footprint.
* The latency improvement that you get by not memcpying is less than you might imagine, and it doesn't kick in until you get quite large messages.
* The memory footprint can be *worse* if you don't copy.

This may seem non-intuitive, but here is the argument. If you don't copy the original object, but simply point 0MQ to it, the original object *has* to remain alive until 0MQ sends the message. Options:

1) Pass a deallocator function to 0MQ. This is a nice idea, but if you are using multiple threads in 0MQ (most people do), your deallocator will be called in the IO thread. Thus, you will have to introduce a lock or something similar to make the deallocator threadsafe. BUT, the time it takes to acquire that lock destroys any latency benefit you got by not memcpying.

2) Don't pass a deallocator, but instead just make sure to hold the original object until 0MQ sends the message. But 0MQ doesn't have any way of telling you that the message has been sent. Thus, you end up holding onto the original object longer than you wanted, increasing the memory footprint.

Summary:

* You don't want to mess with locks that manage data across the IO/app thread boundary.
* memcpy is not that slow, and by using it, you enable 0MQ to deallocate the msg ASAP.

With all that said, I would *love* to figure out a way of doing fast non-copying sends, so hopefully others can help figure this out.

Cheers,

Brian

> 0MQ seems to have a mechanism to avoid excessive copying by providing the
> memory for the underlying message. However, this functionality requires the
> input data to be already stored in a single chunk of contiguous memory.
>
> Has anyone considered implementing a message in 0MQ that would allow a
> developer to send several regions of memory as a message's content? One
> idea to accomplish this would be to create a separate message constructor
> that took a list of memory regions (base + length) representing the
> message's content. Another idea to accomplish this is to provide a
> streaming interface that would allow a developer to append to a message
> without having to copy data.
>
> What are your thoughts on this idea? If this is something that would be
> generally useful to others, I'd be happy to contribute to an investigation
> or development effort.
>
> Thanks,
> Mike
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>

--
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
[email protected]
[email protected]
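
For reference, here is a rough sketch of the copy-based approach discussed above: gather a list of memory regions into one contiguous buffer that the sender owns, so the original objects can be released right away. It uses only the Python standard library and does not touch 0MQ itself. The names pack_regions and unpack_regions and the header layout (a region count followed by per-region lengths) are assumptions made for this illustration, not part of any 0MQ API.

```python
import struct
from array import array


def pack_regions(regions):
    """Copy several buffer-like regions into one contiguous bytearray.

    Layout (an assumption for this sketch): a 4-byte little-endian region
    count, one 8-byte length per region, then the region bytes back to back.
    """
    views = [memoryview(r).cast("B") for r in regions]
    header = struct.pack("<I", len(views)) + b"".join(
        struct.pack("<Q", v.nbytes) for v in views
    )
    total = len(header) + sum(v.nbytes for v in views)
    buf = bytearray(total)  # total size is known before any data is copied
    buf[:len(header)] = header
    offset = len(header)
    for v in views:
        buf[offset:offset + v.nbytes] = v  # one bulk copy per region
        offset += v.nbytes
    return buf


def unpack_regions(buf):
    """Return memoryview slices over each region without copying again."""
    view = memoryview(buf)
    (count,) = struct.unpack_from("<I", view, 0)
    lengths = struct.unpack_from("<%dQ" % count, view, 4)
    offset = 4 + 8 * count
    parts = []
    for n in lengths:
        parts.append(view[offset:offset + n])
        offset += n
    return parts


# Hypothetical payload: a small "matrix" of doubles plus metadata.
matrix = array("d", [1.0, 2.0, 3.0, 4.0])
meta = b"rows=2;cols=2"
msg = pack_regions([matrix, meta])
del matrix  # the packed copy is independent, so the original can go now
parts = unpack_regions(msg)
```

The point of the sketch is the ownership story rather than speed: once pack_regions returns, nothing outside the buffer has to stay alive, so no deallocator and no lock across the IO/app thread boundary are needed. The resulting buffer is what you would then hand to the messaging layer as a single message body.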
