All tasks on a worker run in the same JVM. This can be good for performance in some cases (for example, with a local-or-shuffle grouping, tuples can move between tasks in the same worker without serialization), but it can also cause issues. If you have 100 tasks and one of them runs wild and crashes, it will take down the entire JVM. The same goes for memory: if one task runs the worker out of memory, it can take the other tasks down with it.
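For example, here is a minimal sketch of the two knobs involved (the numbers, the class name, and the heap value are placeholders, and the packages are the pre-1.0 backtype.storm ones from the era of this thread; newer releases use org.apache.storm):

    import backtype.storm.Config;

    public class WorkerIsolationConfig {
        public static Config build() {
            Config conf = new Config();
            // Spread the topology's executors over several worker JVMs so that a
            // crash or OOM in one worker only takes down the tasks assigned to it.
            conf.setNumWorkers(4);
            // Cap each worker JVM's heap; the value here is illustrative only.
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1024m");
            return conf;
        }
    }

More workers means a smaller blast radius per JVM, at the cost of more traffic crossing worker boundaries.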
Some applications also run into issues when multiple instances co-exist in the same JVM (usually because they rely on static variables). Additionally, workers cannot be shared across topologies, so if you only have one worker per machine, you can't run multiple topologies on that machine. There's no good rule of thumb for the number of workers vs. the size of each worker; it's all application-dependent.

On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <[email protected]> wrote:

> Hi,
> If you have: (1) a topology composed of a certain number of spouts and
> bolts, (2) each task assigned to a single executor, and (3) a multi-core
> machine that acts as supervisor, how many workers should you define for
> that machine?
>
> If my understanding is correct, you will have distinct threads running
> each spout and bolt independently of the number of workers you spawn.
> Also, the inter-worker communication between tasks will be slower than
> the intra-worker one, won't it?
>
> Is there any reason related to application throughput for having
> multiple workers?
>
> Thank you very much!
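To make the knobs in the quoted question concrete, here is a rough sketch of how they map to code (SentenceSpout and SplitBolt are placeholders, the parallelism numbers are arbitrary, and the packages are the pre-1.0 backtype.storm ones):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class ParallelismSketch {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Parallelism hints set the number of executor threads per component,
            // independent of how many worker JVMs those threads are packed into.
            builder.setSpout("sentences", new SentenceSpout(), 4);  // placeholder spout
            builder.setBolt("splitter", new SplitBolt(), 8)         // placeholder bolt
                   .localOrShuffleGrouping("sentences");            // prefer a task in the same worker

            Config conf = new Config();
            // With 2 workers, tuples that cross a worker boundary pay serialization
            // and network costs; tuples that stay inside a worker do not.
            conf.setNumWorkers(2);

            StormSubmitter.submitTopology("parallelism-sketch", conf, builder.createTopology());
        }
    }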
