Hi, I was trying to come up with an approach to evaluate the parallelism needed for a topology.
Assuming I have 5 machines with 8 cores and 32 gb. And my topology has one spout and 5 bolts. 1. Define one worker port per CPU to start off. (= 8 workers per machine ie 40 workers over all) 2. Each worker spawns one executor per component per worker, it translates to 6 executors per worker which is 40x6= 240 executors. 3. Of this, if the bolt logic is CPU intensive, then leave parallelism hint at 40 (total workers), else increase parallelism hint beyond 40 till you hit a number beyond which there is no more visible performance. Does this look right? Thanks Kashyap
