Oliver,
>>> It combines concepts from OpenMP and Intel's TBB.
>>>
>> I am not the expert on either of the two, but the idea of making 0MQ
>> infrastructure look more friendly to OpenMP/TBB developers looks like an
>> interesting idea. If you are interested in discussing it, comparing the
>>
> I'm also far from an expert on either, but we have been researching ways
> to leverage multi-cores for our players out of a 10+ year old codebase
> so over the last year I've been trying to squeeze in time to find a way
> to migrate from monolithic single threading to parallelism and looked at
> several approaches. OpenMP is considered "lightweight" because you
> simply start by marking-up your code with #pragmas, making it a good
> stepping stone.
>
> Unfortunately, the parallelism is achieved by creating threads when you
> enter a parallel region of code, which then spinlock/futex. So
> ultimately, for a longer-term parallelization you're going to want to do
> more.
>
> Intel TBB has lots of pros and cons. It provides various useful
> templates, classes and algorithms. See
> http://www.threadingbuildingblocks.org/files/documentation/index.html.
> There are both Open Source and Commercial licenses for Intel TBB
> (http://www.threadingbuildingblocks.org/).
>
> Ultimately, you wind up dealing with the minutiae of parallelism, which
> you can avoid by message passing. ZeroMQ's advantage is going to be
> scalability and the comparative simplicity of encapsulated message
> passing, at a slight cost in performance: I would /strongly/ encourage
> you to even minimally flesh out the zmq_queue, zmq_forwarder and
> zmq_streamer documentation :)
>
>> OpenMP/TBB/0MQ approaches, benchmarks etc. you can possibly write a blog
>> about it to post on zeromq.org.
>>
> This reminded me of a discussion I started on TBB's forums, you might
> like my last two posts on the thread:
> http://software.intel.com/en-us/forums/showthread.php?t=73155&p=2&o=d&s=lr
>
> Ha - reviewing my original post there, I can see the seeds of
> Async::Worker :)
The above comparison is interesting. It would be good to have it
accessible somewhere on the website. I'll give it some thought.
>>> This is a somewhat weak example because the work being done by the
>>> worker is so trivial, but even so on a virtual quad-core machine
>>> building with -O0 I see a 35-40% reduction in processing time.
>>>
>> The worker being trivial, the large reduction in processing time is even more
>> impressive.
>>
> The great shame is that - by passing pointers - this first version would
> /seem/ to preclude scalability across machines, but the very first thing
> I wanted to pass was a { std::string ; std::vector ; }.
Actually, when using inproc:// transport 0MQ passes pointers between the
threads under the hood. Yet you can trivially change it to tcp:// when
scaling to multiple boxes. The only overhead is serialising your
structures into a binary BLOB and deserialising them on the other side.
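
To make the serialisation step concrete, here is a minimal sketch of
flattening a string-plus-vector structure into a binary BLOB and back. It
is an illustration only, not anything 0MQ provides: the Payload struct,
its field names, the int32 element type and the length-prefixed layout
are all assumptions, and the integers are written in host byte order, so
it would need a fixed byte order before being used between machines of
different endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical payload; the vector's element type is an assumption.
struct Payload {
    std::string name;
    std::vector<std::int32_t> values;
};

// Append a 32-bit unsigned integer in host byte order.
static void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    unsigned char buf[sizeof v];
    std::memcpy(buf, &v, sizeof v);
    out.insert(out.end(), buf, buf + sizeof v);
}

// Read a 32-bit unsigned integer and advance the read position.
static std::uint32_t get_u32(const std::vector<unsigned char>& in,
                             std::size_t& pos) {
    if (in.size() - pos < sizeof(std::uint32_t))
        throw std::runtime_error("truncated blob");
    std::uint32_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Layout: [name length][name bytes][value count][raw int32 values]
std::vector<unsigned char> serialize(const Payload& p) {
    assert(p.name.size() <= UINT32_MAX && p.values.size() <= UINT32_MAX);
    std::vector<unsigned char> out;
    put_u32(out, static_cast<std::uint32_t>(p.name.size()));
    out.insert(out.end(), p.name.begin(), p.name.end());
    put_u32(out, static_cast<std::uint32_t>(p.values.size()));
    const unsigned char* raw =
        reinterpret_cast<const unsigned char*>(p.values.data());
    out.insert(out.end(), raw,
               raw + p.values.size() * sizeof(std::int32_t));
    return out;
}

// Inverse of serialize(); throws if the blob is shorter than it claims.
Payload deserialize(const std::vector<unsigned char>& in) {
    Payload p;
    std::size_t pos = 0;
    const std::uint32_t name_len = get_u32(in, pos);
    if (in.size() - pos < name_len)
        throw std::runtime_error("truncated blob");
    p.name.assign(reinterpret_cast<const char*>(in.data() + pos), name_len);
    pos += name_len;
    const std::uint32_t count = get_u32(in, pos);
    const std::size_t bytes =
        static_cast<std::size_t>(count) * sizeof(std::int32_t);
    if (in.size() - pos < bytes)
        throw std::runtime_error("truncated blob");
    p.values.resize(count);
    if (bytes != 0)
        std::memcpy(p.values.data(), in.data() + pos, bytes);
    return p;
}
```

The resulting byte vector is what you would copy into a message body
before sending it over tcp://; with inproc:// you could skip this step
and pass the pointer instead.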
> The workload I was going to perform on them wasn't very hefty, and I
> thought "by the time I'm done creating a Worker and a message and
> serializing the string and vector ... I've lost any gains". It's almost
> like I need to offload the work of serializing the data to another local
> thread...
>
> I suspect you see where I'm going with that :)
Yup.
> The most obvious weak point in my current implementation is that I
> failed to do zero-copy on the pointer itself! I need to figure out what
> stupid thing I did wrong there because eliminating that extra allocation
> would significantly improve throughput.
I'm a bit lost here. What extra allocation? If you are passing just the
pointer, it's 8 bytes (on 64-bit architectures). Messages shorter than 30
bytes are called VSMs (very small messages) in 0MQ and are passed
*without* any extra memory allocations.
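
As a rough sketch of the size argument: the pointer value itself can be
copied into a fixed buffer smaller than the 30-byte figure above, with no
heap allocation for the payload. This does not reproduce 0MQ's internals.
The Job struct, the helper names and the kVsmLimit constant are
hypothetical, and passing a raw pointer this way only makes sense between
threads of one process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size limit taken from the figure quoted above; treat it as an assumption.
constexpr std::size_t kVsmLimit = 30;

// Hypothetical work item owned by the sender until the receiver takes over.
struct Job {
    int id;
};

// Copy the pointer value (not the Job itself) into a small stack buffer.
// Returns the number of payload bytes written.
std::size_t pack_pointer(Job* job, unsigned char (&buf)[kVsmLimit]) {
    static_assert(sizeof(Job*) < kVsmLimit,
                  "pointer must fit in a very small message");
    std::memcpy(buf, &job, sizeof job);
    return sizeof job;
}

// Recover the pointer on the receiving side of the same process.
Job* unpack_pointer(const unsigned char* buf) {
    Job* job = nullptr;
    std::memcpy(&job, buf, sizeof job);
    return job;
}
```

Whichever side unpacks the pointer then owns the Job and is responsible
for freeing it.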
Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev