Hi David,

As general advice, you want roughly one thread per core. You then have to divide those threads among your components according to which ones have more work to do. It can seem mysterious, but it just requires that you first understand what your components are doing and then do some testing with stats collection.

For example, I've found that a 1:2 spout-to-bolt ratio works best in some of my topologies, i.e. a spout parallelism of, say, 2 and a parallelism of 4 for the bolt that does the heavy work.
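To make the "one thread per core, split by workload" rule concrete, here is a minimal Java sketch. It assumes the whole core count is the thread budget and that relative workload can be expressed as integer weights (1 for the spout, 2 for the bolt, matching the 1:2 ratio). `ParallelismPlanner`, `allocate`, and the component names are hypothetical and not part of Storm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Storm): splits a thread budget across
 * topology components in proportion to how much work each one does.
 */
public class ParallelismPlanner {

    /**
     * Returns a parallelism hint per component. Each component gets
     * floor(totalThreads * weight / weightSum), with a minimum of 1; any
     * threads left over from rounding down go to the heaviest component.
     */
    public static Map<String, Integer> allocate(int totalThreads, Map<String, Integer> weights) {
        int weightSum = 0;
        String heaviest = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            weightSum += e.getValue();
            if (heaviest == null || e.getValue() > weights.get(heaviest)) {
                heaviest = e.getKey();
            }
        }

        Map<String, Integer> hints = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int n = Math.max(1, totalThreads * e.getValue() / weightSum);
            hints.put(e.getKey(), n);
            assigned += n;
        }

        // Hand any threads lost to integer rounding to the busiest component.
        int leftover = totalThreads - assigned;
        if (leftover > 0 && heaviest != null) {
            hints.put(heaviest, hints.get(heaviest) + leftover);
        }
        return hints;
    }

    public static void main(String[] args) {
        // Weights encode the 1:2 spout-to-bolt ratio; component names are hypothetical.
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("busSpout", 1);
        weights.put("parseBolt", 2);

        System.out.println(allocate(6, weights));  // {busSpout=2, parseBolt=4}
        System.out.println(allocate(10, weights)); // {busSpout=3, parseBolt=7}
    }
}
```

The numbers it produces would be starting points for the parallelism hint argument of Storm's `TopologyBuilder.setSpout` and `setBolt`, to be adjusted after testing with stats collection rather than taken as final.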

Regarding worker processes versus threads, to be honest I haven't yet seen enough data to say which matters more. At the end of the day, you just want as little CPU contention as possible for the components doing most of the work.

I will post something on my site on this topic in the next day or so; I'll reply back on this thread when I do.

Cheers,

Lajos



theconsultantcto.com
Enterprise Lucene/Solr

On 16/03/2014 12:59, David Crossland wrote:
Hi, I have a 3-node cluster: 2 medium instances and 1 small (I'm probably
going to upgrade the small one to a medium), with 10 cores in total. My
main bottleneck is a service bus that has approx. 3.5 million JSON string
messages published to it per day, and I don't seem to be consuming them
fast enough.
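As a quick sanity check, the daily volume can be turned into the sustained rate the topology has to keep up with. This is a minimal Java sketch; the `IngestRate` class, the 3x peak-to-average factor, and the choice of 2 spout threads are illustrative assumptions, not figures from the thread.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope sketch: converts a daily message volume into the
 * sustained rate a topology has to consume. The peak factor and thread
 * count used in main() are assumptions for illustration only.
 */
public class IngestRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average messages per second needed to keep up with messagesPerDay. */
    static double averagePerSecond(long messagesPerDay) {
        return (double) messagesPerDay / SECONDS_PER_DAY;
    }

    /** Rate each consumer thread must sustain if traffic peaks at peakFactor times the average. */
    static double perThreadAtPeak(long messagesPerDay, double peakFactor, int consumerThreads) {
        return averagePerSecond(messagesPerDay) * peakFactor / consumerThreads;
    }

    public static void main(String[] args) {
        long daily = 3_500_000L;
        System.out.printf(Locale.ROOT, "average: %.1f msg/s%n", averagePerSecond(daily)); // average: 40.5 msg/s
        // Assumed: traffic peaks at 3x the average and is consumed by 2 spout threads.
        System.out.printf(Locale.ROOT, "per thread at peak: %.1f msg/s%n", perThreadAtPeak(daily, 3.0, 2));
    }
}
```

An average of roughly 40 messages per second is modest, which suggests the per-message work in the bolts, rather than raw volume, is where to look first.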

I've tried setting the parallelism hint to a number of values (8, 20, 64,
128...), all pretty much stabs in the dark.

I'm looking for some advice on how to configure this in my environment.
I assume there is some relationship between the number of cores and the
amount of parallelism I should specify to get the best performance and
throughput.

I also wonder how the number of workers fits into this. Again, I'm taking
a bit of a stab in choosing 12, one for each slot on the supervisors.

Any pointers you can give me would be appreciated.

Thanks
David
