Hi - We are concerned with some loads accumulating on certain workers, and having that negatively impact the efficiency of the system.
Would a CustomStreamGrouping that is based on the broadcast of load stats be a way to deal with this? For example, in our system, the content stream and processing overhead can be adjusted by external systems, creating stream with processing costs that vary per tuple. We are currently using shuffleGrouping, which means that there is a chance that the most “expensive” tuples will land on the same task for processing, which would be bad (for those tuples) since other tasks might be underutilized by comparison. Fieldgrouping isn’t helpful since it may also route loads disproportionately across the cluster. So, I’m wondering if we can create a CustomStreamGrouping that will: - periodically post worker stats (heap/cpu usage) + worker task assignments to zookeeper - periodically read worker stats + task assignments of other workers from zookeeper - during chooseTasks() impl, refer to the worker stats to determine which worker is least-loaded, and use the taskIds for that worker to route the tuple. I haven’t tried this yet, but I know there is an issue (https://issues.apache.org/jira/browse/STORM-162) created a while back suggesting to add load balancing to shuffle grouping, and this seems like a simpler alternative. Thanks Tyson
