Hi Jon, thank you for your answer!

I imagined fault tolerance and number of topologies would play a role wrt
to the number of workers.

I hoped the number or workers could be used to limit the resource
utilization of a topology for a given supervisor. That is, if a supervisor
node has 4 workers and the topology runs on 1, then the topology will use
1/4 of the supervisor resources (e.g., 1 out of 4 cores). Since the number
of executors/threads is independent of the number of workers, that will not
be the case, isn't it?

As an example:
1) 1 supervisor
2) 4 topologies T1 (10 tasks), T2 (1 task), T3 (1 task) and T4 (1 task),
3) 4 workers (each topology assigned to 1 distinct worker)
and all tasks perform the same amount of work.
At their maximum throughput, topology T1 will consume 10/13 of the
supervisor resources.
If this correct?

Thanks again!


On 30 March 2014 19:11, Jon Logan <[email protected]> wrote:

> All tasks on a worker run in the same JVM. This can be good for
> performance in some cases (like a localShuffle), but can cause issues. If
> you have 100 tasks, and one of them runs wild, and crashes, that will take
> down the entire JVM. Similarly for memory -- if you run out of memory, it
> can take out more.
>
> Some applications also run into issues co-existing multiple instances in
> the same JVM (usually relying on static variables).
>
>
> Additionally, workers cannot be shared across different topologies. So if
> you only have one worker per machine, you can't run multiple topologies on
> that machine. There's no real good rule of thumb for number of workers vs
> size of workers. It's all application-dependent.
>
>
> On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <
> [email protected]> wrote:
>
>> Hi,
>> If you have: (1) a topology composed by a certain number of spouts and
>> bolts, (2) each task assigned to a single executor and (3) a multi-core
>> machine that acts as supervisor, how many workers should you define for the
>> latter?
>>
>> If my understanding is correct, you will have distinct threads running
>> each spout and bolt independently of the number of workers you span.
>> Also, the inter-worker communication between tasks will be slower than
>> the intra-worker one, isn't it?
>>
>> Is there any reason related to the application throughput for having
>> multiple workers?
>>
>> Thank you very much!
>>
>
>

Reply via email to