Hi,
I was hoping that someone on here would be able to help me with a conceptual issue.

I understand how Storm implements parallelism. I am researching how to model the performance of Storm topologies, so I have dug around in the source code quite a bit. However, I still can't quite wrap my head around tasks.

I know they are linked to fields groupings, so that tuples with the same field value will always go to the same executor. If task state were preserved through a rebalance then this would make sense, as the state would follow the task and tuples would continue to be routed correctly. But, as I understand it, task state is not preserved through a rebalance by default.

In this stateless case having tasks doesn't seem to make sense. Couldn't you arbitrarily number the executors of each component and use those numbers for routing tuples? That would remove the upper scaling limit for each component of the topology.

Of course, if you have a state-saving system (stateful bolts etc.) tasks make sense, and having tasks also simplifies the hash functions that do the routing. So is this the reason they exist, and is it the case that in the stateless case they are not strictly required (other than to make routing simpler)?

I am concerned that I am missing something fundamental.

Thanks in advance,

Thomas Cooper
PhD Student
Newcastle University, School of Computer Science
Twitter: @tomncooper
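
P.S. To make the question concrete, here is a minimal Python sketch of the routing as I picture it. It is not Storm's actual code: `choose_task`, `choose_executor_directly`, `assign_tasks` and the CRC32 hash are hypothetical stand-ins for illustration, and it assumes that a fields grouping picks a task as a hash of the field value modulo the (fixed) number of tasks, with tasks split into contiguous ranges across executors.

```python
import zlib
from typing import Dict


def stable_hash(field_value: str) -> int:
    # Stand-in for whatever hash Storm applies to the grouping fields (assumption).
    return zlib.crc32(field_value.encode("utf-8"))


def choose_task(field_value: str, num_tasks: int) -> int:
    # Task-based routing: depends only on the fixed task count.
    return stable_hash(field_value) % num_tasks


def choose_executor_directly(field_value: str, num_executors: int) -> int:
    # Task-free alternative from the question: hash straight onto executors.
    return stable_hash(field_value) % num_executors


def assign_tasks(num_tasks: int, num_executors: int) -> Dict[int, int]:
    # Split task ids into contiguous, near-equal ranges, one range per executor.
    if not 1 <= num_executors <= num_tasks:
        raise ValueError("need 1 <= num_executors <= num_tasks")
    base, extra = divmod(num_tasks, num_executors)
    mapping: Dict[int, int] = {}
    task = 0
    for executor in range(num_executors):
        size = base + (1 if executor < extra else 0)
        for _ in range(size):
            mapping[task] = executor
            task += 1
    return mapping


if __name__ == "__main__":
    num_tasks = 8
    for num_executors in (2, 4):  # before and after a hypothetical rebalance
        task_to_executor = assign_tasks(num_tasks, num_executors)
        for key in ("alice", "bob", "carol"):
            task = choose_task(key, num_tasks)
            print(
                f"executors={num_executors} key={key} "
                f"task={task} executor={task_to_executor[task]} "
                f"direct_executor={choose_executor_directly(key, num_executors)}"
            )
```

If this picture is right, the task chosen for a key never changes across a rebalance, but the executor that ends up processing the key can change under both schemes. That is why I am unsure what the fixed task count buys when no state follows the task.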
