All tasks on a worker run in the same JVM. This can be good for performance
in some cases (like a localOrShuffle grouping), but it can also cause issues.
If you have 100 tasks and one of them runs wild and crashes, it will take
down the entire JVM. Similarly for memory -- if one task exhausts the heap,
it can take out every other task in that JVM.

Some applications also run into issues when multiple instances co-exist in
the same JVM (usually because they rely on static variables).
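
As a purely hypothetical illustration (not from the original thread), this is
the kind of bolt that bites people: it keeps state in a static field, so every
task of this bolt that lands in the same worker JVM shares (and races on) that
field, while tasks placed in other workers do not. The class and field names
are made up.

import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

// Sketch of a bolt whose state is per-JVM rather than per-task.
public class CountingBolt extends BaseBasicBolt {

    // Shared by every task of this bolt running in the same worker JVM,
    // and updated without synchronization from multiple executor threads.
    private static long seen = 0;

    @Override
    public void prepare(Map conf, TopologyContext context) {
        // Task-local state belongs in instance fields initialized here instead.
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        seen++;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields needed for this sketch.
    }
}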


Additionally, workers cannot be shared across different topologies, so if
you only have one worker per machine, you can't run multiple topologies on
that machine. There's no really good rule of thumb for the number of workers
versus the size of each worker; it's all application-dependent.
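
For concreteness, this is roughly how the worker count is requested when
submitting a topology with the Storm 0.9 Java API (backtype.storm packages).
The topology name, component ids, parallelism hints, and worker count below
are made up for illustration; the bolt is the hypothetical CountingBolt
sketched above, and TestWordSpout is the sample spout bundled with Storm.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Executors (threads): 2 for the spout, 4 for the bolt.
        builder.setSpout("words", new TestWordSpout(), 2);
        builder.setBolt("counts", new CountingBolt(), 4)
               .shuffleGrouping("words");

        Config conf = new Config();
        // Spread those 6 executors over 3 worker JVMs instead of 1.
        // More workers gives better fault/memory isolation; fewer workers
        // keeps more transfers intra-process.
        conf.setNumWorkers(3);

        StormSubmitter.submitTopology("worker-count-example", conf,
                builder.createTopology());
    }
}

With setNumWorkers(1) everything shares a single JVM and all transfers stay
in-process; raising it trades some serialization overhead for the isolation
described above.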


On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <
[email protected]> wrote:

> Hi,
> If you have: (1) a topology composed by a certain number of spouts and
> bolts, (2) each task assigned to a single executor and (3) a multi-core
> machine that acts as supervisor, how many workers should you define for the
> latter?
>
> If my understanding is correct, you will have distinct threads running
> each spout and bolt independently of the number of workers you spawn.
> Also, the inter-worker communication between tasks will be slower than the
> intra-worker one, won't it?
>
> Is there any reason related to the application throughput for having
> multiple workers?
>
> Thank you very much!
>
