Hi,

I was hoping that someone here would be able to help me with a conceptual 
issue.


I understand how Storm implements parallelism. I am researching how to model 
the performance of Storm topologies so I have dug around in the source code 
quite a bit. However, I still can't quite wrap my head around tasks.


I know they are linked to fields groupings, so that tuples with the same field 
value always go to the same task (and therefore the same executor). If task 
state were preserved through a rebalance, this would make sense: the state 
would follow the task and tuples would continue to be routed correctly. But, as 
I understand it, task state is not preserved through a rebalance by default. In 
this stateless case, having tasks doesn't seem to make sense. Couldn't you 
arbitrarily number the executors of each component and use those numbers for 
routing tuples? That would remove the upper scaling limit for each component of 
the topology (the number of executors cannot exceed the number of tasks).
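
To make the comparison concrete, here is a rough Python sketch of the two 
routing schemes I have in mind. It is my own simplification, not Storm's actual 
code: the function names are made up, CRC32 is only a stand-in for whatever 
hash Storm applies to the grouping fields, and the executor counts are 
arbitrary.

```python
import zlib


def stable_hash(field_value):
    """Deterministic stand-in for the hash of the grouping field (assumption)."""
    return zlib.crc32(field_value.encode("utf-8"))


def route_to_task(field_value, num_tasks):
    """Task-based routing: the task count never changes after submission."""
    return stable_hash(field_value) % num_tasks


def route_to_executor(field_value, num_executors):
    """Hypothetical task-free routing: hash directly over numbered executors."""
    return stable_hash(field_value) % num_executors


keys = ["user-%d" % i for i in range(1000)]

# Task-based routing depends only on the fixed task count, so a rebalance
# (which changes the executor count) leaves every key on the same task.
NUM_TASKS = 8
task_of = {k: route_to_task(k, NUM_TASKS) for k in keys}

# Task-free routing depends on the executor count, so scaling from 4 to 6
# executors sends many keys to a different executor index.
moved = sum(1 for k in keys if route_to_executor(k, 4) != route_to_executor(k, 6))
print("keys that change executor index:", moved, "of", len(keys))
```

In the stateless case the keys that move lose nothing, which is why I wonder 
whether the fixed task count is really needed there.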

Of course, if you have a state-saving system (statefulBolt etc.), tasks make 
sense, and having tasks also simplifies the hash functions that do the routing. 
So is this the reason they exist, and is it true that in the stateless case 
they are not strictly required (other than to make routing simpler)?
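
For the stateful case, this is roughly how I picture state following a task 
across a rebalance. Again, this is only a sketch under my own assumptions: 
`assign_tasks` is a hypothetical helper that spreads tasks over executors in 
contiguous ranges, and the state dictionary is a placeholder for whatever a 
real state store would hold.

```python
def assign_tasks(num_tasks, num_executors):
    """Map each task id to an executor index in contiguous, near-even ranges."""
    if not 1 <= num_executors <= num_tasks:
        # This is the upper scaling limit: no more executors than tasks.
        raise ValueError("executor count must be between 1 and the task count")
    return {task: task * num_executors // num_tasks for task in range(num_tasks)}


NUM_TASKS = 8  # fixed when the topology is submitted

# State is keyed by task id, not by executor index (placeholder contents).
task_state = {task: {"tuples_seen": 0} for task in range(NUM_TASKS)}
task_state[5]["tuples_seen"] = 42

before = assign_tasks(NUM_TASKS, 2)  # 2 executors, 4 tasks each
after = assign_tasks(NUM_TASKS, 4)   # rebalanced to 4 executors, 2 tasks each

# Task 5 moves from executor 1 to executor 2, but its id and state are unchanged,
# so tuples hashed to task 5 still find the state they were building up.
print(before[5], after[5], task_state[5])
```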

I am concerned that I am missing something fundamental.


Thanks in advance,


Thomas Cooper
PhD Student
Newcastle University, School of Computer Science
Twitter: @tomncooper
