There are other cases where in-memory state gets lost (a worker crash, or restarting a worker from the UI or Nimbus), so handling only rebalance doesn't help much, IMO.
We may need to add a note to this doc about what Arun said. If I were a Storm user, I might not have thought of that on my own.
http://storm.apache.org/releases/1.0.3/State-checkpointing.html

If we don't think in-memory state is meant for any real use case, supporting only Redis state doesn't look like enough:

1. Some people still think Redis is not suitable for persistent storage, and memory is not cheap enough.
2. Users have to install and maintain Redis (or even Redis Cluster) even if it's not part of their stack.

I haven't used stateful bolts heavily, so I'm not sure what the overhead is, but would it make sense to store state in non-in-memory storage like HBase, or even HDFS or an RDB?
Ideally, it would be awesome if we could provide a solution that doesn't require users to maintain anything else. (Maybe Blobstore if it meets the performance requirements, or implementing distributed storage with RocksDB, etc.)

Maybe we have room to improve state storage support.

Thanks,
Jungtaek Lim (HeartSavioR)

On Mon, Feb 20, 2017 at 1:24 PM, Arun Mahadevan <[email protected]> wrote:

This is expected with in-memory state, which stores the state in a local hash map and is not intended for any real use cases. And I don't think there is any value in serializing the in-memory state during rebalance. How would you resurrect the state if the task gets reassigned to a different host?

Better to use the Redis state implementation, or write a state implementation that uses distributed memory, for example on top of memcached.

Arun

From: anshu shukla <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 20, 2017 at 8:50 AM
To: "[email protected]" <[email protected]>
Subject: Rebalancing Stateful bolts in storm 1.0.2

Hey,

I was running an in-memory stateful bolt, and while doing a rebalance the state for the tasks gets lost.

Can anyone suggest a workaround to retain the state while rebalancing without using stores like Redis (for example, on rebalancing, serializing the state along with the task while regrouping tasks to other threads)? Please suggest some general ideas about the possibilities.

--
Thanks & Regards,
Anshu Shukla
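
For reference, switching from in-memory state to the Redis state implementation mentioned in the thread is mostly a matter of topology configuration. The sketch below only builds the two configuration entries described on the State-checkpointing page linked above, using a plain `java.util.Map` so it runs without Storm on the classpath. The class name `RedisStateConfigSketch`, the helper `buildConf`, and the host/port values are hypothetical, and only the `host` and `port` fields of `jedisPoolConfig` are shown; the doc page lists the remaining fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: builds the config entries that
// select Redis-backed key-value state for a topology.
public class RedisStateConfigSketch {
    // Config keys as documented on the State-checkpointing page.
    public static final String STATE_PROVIDER = "topology.state.provider";
    public static final String STATE_PROVIDER_CONFIG = "topology.state.provider.config";

    // redisHost and redisPort are placeholder assumptions for illustration.
    public static Map<String, Object> buildConf(String redisHost, int redisPort) {
        Map<String, Object> conf = new HashMap<>();
        // Replace the default in-memory provider with the Redis one.
        conf.put(STATE_PROVIDER, "org.apache.storm.redis.state.RedisKeyValueStateProvider");
        // The provider config is passed as a JSON string.
        String providerConfig = "{\"jedisPoolConfig\": {\"host\": \"" + redisHost
                + "\", \"port\": " + redisPort + "}}";
        conf.put(STATE_PROVIDER_CONFIG, providerConfig);
        return conf;
    }

    public static void main(String[] args) {
        // Print the entries that would be merged into the topology config.
        buildConf("localhost", 6379).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real topology these entries would be put into the Storm `Config` object (itself a map) before submitting, or set in storm.yaml. With state kept in Redis, it no longer lives only in the worker's memory, which is the limitation Arun describes for the in-memory implementation.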
