Hi Jon, thank you for your answer! I imagined fault tolerance and number of topologies would play a role wrt to the number of workers.
I hoped the number or workers could be used to limit the resource utilization of a topology for a given supervisor. That is, if a supervisor node has 4 workers and the topology runs on 1, then the topology will use 1/4 of the supervisor resources (e.g., 1 out of 4 cores). Since the number of executors/threads is independent of the number of workers, that will not be the case, isn't it? As an example: 1) 1 supervisor 2) 4 topologies T1 (10 tasks), T2 (1 task), T3 (1 task) and T4 (1 task), 3) 4 workers (each topology assigned to 1 distinct worker) and all tasks perform the same amount of work. At their maximum throughput, topology T1 will consume 10/13 of the supervisor resources. If this correct? Thanks again! On 30 March 2014 19:11, Jon Logan <[email protected]> wrote: > All tasks on a worker run in the same JVM. This can be good for > performance in some cases (like a localShuffle), but can cause issues. If > you have 100 tasks, and one of them runs wild, and crashes, that will take > down the entire JVM. Similarly for memory -- if you run out of memory, it > can take out more. > > Some applications also run into issues co-existing multiple instances in > the same JVM (usually relying on static variables). > > > Additionally, workers cannot be shared across different topologies. So if > you only have one worker per machine, you can't run multiple topologies on > that machine. There's no real good rule of thumb for number of workers vs > size of workers. It's all application-dependent. > > > On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano < > [email protected]> wrote: > >> Hi, >> If you have: (1) a topology composed by a certain number of spouts and >> bolts, (2) each task assigned to a single executor and (3) a multi-core >> machine that acts as supervisor, how many workers should you define for the >> latter? >> >> If my understanding is correct, you will have distinct threads running >> each spout and bolt independently of the number of workers you span. >> Also, the inter-worker communication between tasks will be slower than >> the intra-worker one, isn't it? >> >> Is there any reason related to the application throughput for having >> multiple workers? >> >> Thank you very much! >> > >
