The actual CPU utilization is determined by contention for the processor time by the operating system. While it's likely that T1 would use more resources, it's not guaranteed. The workers could be running at different scheduling priority levels, or one worker may be blocking on IO or something.
Additionally, you can set the number of tasks independently from the number of threads. So one worker could have 100 tasks, but be running them on only 2 threads. On Sun, Mar 30, 2014 at 2:01 PM, Vincenzo Gulisano < [email protected]> wrote: > Hi Jon, thank you for your answer! > > I imagined fault tolerance and number of topologies would play a role wrt > to the number of workers. > > I hoped the number or workers could be used to limit the resource > utilization of a topology for a given supervisor. That is, if a supervisor > node has 4 workers and the topology runs on 1, then the topology will use > 1/4 of the supervisor resources (e.g., 1 out of 4 cores). Since the number > of executors/threads is independent of the number of workers, that will not > be the case, isn't it? > > As an example: > 1) 1 supervisor > 2) 4 topologies T1 (10 tasks), T2 (1 task), T3 (1 task) and T4 (1 task), > 3) 4 workers (each topology assigned to 1 distinct worker) > and all tasks perform the same amount of work. > At their maximum throughput, topology T1 will consume 10/13 of the > supervisor resources. > If this correct? > > Thanks again! > > > On 30 March 2014 19:11, Jon Logan <[email protected]> wrote: > >> All tasks on a worker run in the same JVM. This can be good for >> performance in some cases (like a localShuffle), but can cause issues. If >> you have 100 tasks, and one of them runs wild, and crashes, that will take >> down the entire JVM. Similarly for memory -- if you run out of memory, it >> can take out more. >> >> Some applications also run into issues co-existing multiple instances in >> the same JVM (usually relying on static variables). >> >> >> Additionally, workers cannot be shared across different topologies. So if >> you only have one worker per machine, you can't run multiple topologies on >> that machine. There's no real good rule of thumb for number of workers vs >> size of workers. It's all application-dependent. >> >> >> On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano < >> [email protected]> wrote: >> >>> Hi, >>> If you have: (1) a topology composed by a certain number of spouts and >>> bolts, (2) each task assigned to a single executor and (3) a multi-core >>> machine that acts as supervisor, how many workers should you define for the >>> latter? >>> >>> If my understanding is correct, you will have distinct threads running >>> each spout and bolt independently of the number of workers you span. >>> Also, the inter-worker communication between tasks will be slower than >>> the intra-worker one, isn't it? >>> >>> Is there any reason related to the application throughput for having >>> multiple workers? >>> >>> Thank you very much! >>> >> >> >
