We maintain object state in memory to maximize bolt performance and reduce the number of data-access calls. To date, this in-memory “cache” has been persisted to a Cassandra column family using the taskId as the row key. However, Nathan once pointed out that the taskId is not entirely reliable across re-assignment, though in our testing a task does tend to get the same taskId as long as the deployment looks identical.
So the question is: what is the best “key” to use for state that a worker can rely on across deployments and/or rebalancing? We “build up” actions by writing to Cassandra, and then execute the actions once we reach a set threshold. However, each task is responsible for only a subset of the keys, based on the grouping. In the event those groupings change (rebalance?), is there a way for a task to determine programmatically which “keys” it is responsible for recovering?

Thanks!
Bryan
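
To make the recovery question concrete, here is a minimal sketch of the kind of check a task might run at startup. It is an illustration under stated assumptions, not a description of how Storm's built-in fields grouping assigns tuples: the class `KeyOwnership` and both of its methods are hypothetical, and the hash-modulo partition function is assumed to be the same one the topology uses for routing (for example via a custom stream grouping). In a bolt, `taskIndex` and `numTasks` would presumably come from `TopologyContext` (`getThisTaskIndex()` and the size of `getComponentTasks(getThisComponentId())`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which persisted keys a task should recover.
final class KeyOwnership {

    // Assumed partition function. It only gives correct answers if tuples are
    // routed to tasks with this same function; Storm's own grouping hash may differ.
    static int ownerIndex(String key, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Returns the subset of persisted keys owned by the task at taskIndex.
    static List<String> ownedKeys(List<String> persistedKeys, int taskIndex, int numTasks) {
        List<String> owned = new ArrayList<>();
        for (String key : persistedKeys) {
            if (ownerIndex(key, numTasks) == taskIndex) {
                owned.add(key);
            }
        }
        return owned;
    }
}
```

With this approach the Cassandra row key would be the grouping key itself rather than the taskId, so ownership is recomputed from the current task count after a rebalance instead of being tied to an identifier that may be re-assigned.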
