We maintain object state in memory to maximize bolt performance and reduce 
the number of data access calls.  To date, the “cache” of in-memory state has 
been persisted to a Cassandra column family using the taskId as the row key.  
However, Nathan pointed out at one point that the taskId is not entirely 
reliable across re-assignment, though in our testing it does tend to be 
re-assigned as long as the deployment looks identical.

So the question is: what’s the best “key” to use for state that a worker can 
rely on across deployments and/or rebalancing?  We “build up” actions by 
writing to Cassandra, and then execute the actions once we reach a set 
threshold.  However, each task is responsible for only a subset of keys, 
determined by the grouping.  In the event those groupings change (rebalance?), 
is there a way for a task to know programmatically which “keys” it is 
responsible for recovering?
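
To make the setup concrete, here is a simplified sketch of the pattern described 
above.  It is illustrative only, not our actual code: the ActionBuffer class, 
its method names, and the threshold value are made up for this example, and a 
plain dict stands in for the Cassandra column family.

```python
from collections import defaultdict


class ActionBuffer:
    """Illustrative stand-in for one task's in-memory state (hypothetical)."""

    def __init__(self, task_id, store, threshold=3):
        self.task_id = task_id      # row key used for persistence
        self.store = store          # dict standing in for the column family
        self.threshold = threshold  # hypothetical flush threshold
        # On startup, recover whatever was persisted under this task id.
        saved = store.get(task_id, {})
        self.pending = defaultdict(list, {k: list(v) for k, v in saved.items()})

    def add(self, key, action):
        """Record an action for a grouping key.

        Returns the accumulated batch once the threshold is reached,
        otherwise None.
        """
        self.pending[key].append(action)
        batch = None
        if len(self.pending[key]) >= self.threshold:
            batch = self.pending.pop(key)
        self._persist()
        return batch

    def _persist(self):
        # Write the whole in-memory state under the task id row key.
        self.store[self.task_id] = {k: list(v) for k, v in self.pending.items()}
```

In this sketch, a new ActionBuffer created with the same task_id finds the 
pending actions again, while one created with a different task_id starts 
empty even though the data is still sitting in the store.  That second case 
is the one I am worried about.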

Thanks!
Bryan

