Hello,

Are you familiar with fields grouping? The idea is that the same bolt instance
would always update the value of a specific key (similar to cookie stickiness
on a web load balancer).

https://storm.apache.org/documentation/Concepts.html

"Fields grouping: The stream is partitioned by the fields specified in the 
grouping. For example, if the stream is grouped by the "user-id" field, tuples 
with the same "user-id" will always go to the same task, but tuples with 
different "user-id"'s may go to different tasks."
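
To make that concrete for your case, here is a rough Python sketch of the idea,
not Storm code. It assumes a simple hash-modulo routing scheme (Storm's actual
internals may differ), and the names task_for, route and aggregate are
hypothetical helpers made up for illustration. The point is only that every
tuple with the same (X, Y) lands on the same task, so the read-modify-write of
Z for a given key is never done by two bolt instances at once.

```python
import zlib
from collections import defaultdict

NUM_TASKS = 4  # matches the bolt parallelism mentioned in the question


def task_for(key, num_tasks=NUM_TASKS):
    """Pick a task index from the grouping fields (assumed hash-modulo scheme)."""
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return digest % num_tasks


def route(tuples, num_tasks=NUM_TASKS):
    """Bucket (x, y, delta) tuples by the task that would receive them."""
    per_task = defaultdict(list)
    for x, y, delta in tuples:
        per_task[task_for((x, y), num_tasks)].append((x, y, delta))
    return per_task


def aggregate(per_task):
    """Sum deltas per key; each key is only ever touched by its own task."""
    totals = {}
    for items in per_task.values():
        for x, y, delta in items:
            totals[(x, y)] = totals.get((x, y), 0) + delta
    return totals


# Hypothetical sample stream: (X, Y, amount to add to Z)
stream = [("a", 1, 50), ("b", 2, 10), ("a", 1, 50), ("c", 3, 5), ("b", 2, 10)]
per_task = route(stream)
totals = aggregate(per_task)
```

In a real topology you would get this routing by declaring the bolt with
something like fieldsGrouping("spout", new Fields("x", "y")), where the
component id and field names are placeholders for whatever yours are called.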


-Itai


________________________________

From: Kushan Maskey <[email protected]>
Sent: Tuesday, January 20, 2015 8:55 PM
To: [email protected]
Subject: URGENT!! Race condition

We are having a major issue updating a Cassandra database: we see a race
condition in a bolt.

Here is an example:

I have a column family with two partitioning columns, say X and Y. There is
another column, Z, which is basically an aggregated number. We are supposed to
update Z based on X and Y. Storm is reading a huge volume of data from Kafka.
When the spout receives a message, the first bolt reads the database for that
combination of X and Y and gets the value of Z. It then updates Z and stores
it back into the database. Bolt parallelism is set to 4, which means 4
instances of the bolt are trying to update the database. So when the first
bolt instance (B1) reads the value of Z as, say, 100, the second instance (B2)
reads it as 100 at the same time. Once B1 completes execution and Z is now
150, B2 still holds 100, so the value of Z ends up out of sync.

How can we prevent a race condition like this? It is causing us a major
nuisance.

Any help is highly appreciated. Thanks.

--
Kushan Maskey
