Hi,

We have a topology with 34 bolts and a single spout, running 101 tasks in 101 executors.

When running the topology in a single worker, we reach a certain throughput "x".

Since some of the bolts will be rather memory-intensive, we decided to split the topology across 4 workers, on four physically separated server machines.
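For context, the worker split is configured roughly like this (a simplified sketch, not our actual topology code; class and bolt names are placeholders, and the import path may differ depending on the Storm version):

```java
import org.apache.storm.Config;

// Hedged sketch of the 4-worker setup described above.
// (On older Storm releases the package is backtype.storm.Config.)
public class TopologySetupSketch {
    public static Config buildConfig() {
        Config conf = new Config();
        conf.setNumWorkers(4);  // one worker per physical server machine
        // The 101 executors come from the parallelism hints given to
        // builder.setSpout(...) / builder.setBolt(...) when wiring the topology.
        return conf;
    }
}
```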

When running that 4-worker setup, we see only a slight increase in throughput. None of the servers is under heavy CPU load, and all bolts show a capacity below 0.5 according to the web UI.

The spout implementation is able to easily serve 100 times more tuples, so it's not a problem of "starvation" at that level.

Am I right in concluding that the bottleneck is not within the topology implementation, but in the inter-worker communication setup? Otherwise I would have expected to see some bolt running at a capacity of 1.0 or higher.

(We had run the same test on older hardware, where the topology was able to consume all available CPU. There, spreading across more machines led to a significant throughput increase, and "capacity" was reported as >1 for some bolts.)

How would I proceed to identify the actual bottleneck(s)?

Regards,
Jens
