This looks like stateful processing. Have you looked at using Trident?

-Rajiv

> On Jan 20, 2015, at 12:26 PM, Kushan Maskey 
> <[email protected]> wrote:
> 
> Thanks Keith and Itai,
> 
> We are using fieldsGrouping. Initially we were using shuffleGrouping; we saw 
> this problem and then moved to fieldsGrouping, with better results, until now. 
> I think the bolt's parallelism, which we have set to 4, is the culprit here. 
> My understanding is that parallelism means threading; correct me if I am 
> wrong.
> 
> --
> Kushan Maskey
> 
>> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]> wrote:
>> Hello,
>> 
>> Are you familiar with fields grouping? The idea is that the same bolt 
>> instance would always update the value of a specific key (similar to web 
>> load balancer cookie stickiness).
>> https://storm.apache.org/documentation/Concepts.html
>> "Fields grouping: The stream is partitioned by the fields specified in the 
>> grouping. For example, if the stream is grouped by the "user-id" field, 
>> tuples with the same "user-id" will always go to the same task, but tuples 
>> with different "user-id"'s may go to different tasks."
>> 
>> 
>> Itai
>> 
>>  
>> From: Kushan Maskey <[email protected]>
>> Sent: Tuesday, January 20, 2015 8:55 PM
>> To: [email protected]
>> Subject: URGENT!! Race condition
>>  
>> We are having a major issue updating a Cassandra database: we see a race 
>> condition in a bolt.
>> 
>> Here is an example.
>> 
>> I have a column family with two partitioning columns, say X and Y. There is 
>> another column, Z, which is basically an aggregated number. We are supposed 
>> to update Z based on X and Y. Storm is reading a huge volume of data from 
>> Kafka. When the spout receives a message, the first bolt reads the database 
>> for that combination of X and Y and gets the value of Z. Then it updates the 
>> value of Z and stores it back into the database. Bolt parallelism is set to 
>> 4, which means 4 instances of the bolt are trying to update the database. So 
>> when the first bolt (B1) reads the value of Z to be, say, 100, at the same 
>> time the second bolt (B2) also reads it to be 100. Once B1 completes 
>> execution, the value of Z is 150, but B2 still has 100, so the value of Z 
>> ends up out of sync.
>> 
>> How can we prevent a race condition like this? This is causing a major 
>> nuisance for us.
>> 
>> Any help is highly appreciated. Thanks.
>> 
>> --
>> Kushan Maskey
> 
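The fields-grouping idea discussed in this thread can be illustrated with a rough sketch in plain Java. It does not use the Storm or Cassandra APIs: the class `FieldsGroupingSketch`, the helpers `taskFor`, `readModifyWrite` and `run`, and the in-memory `store` map are all hypothetical stand-ins. Each single-threaded executor plays the role of one bolt task, and hashing on (X, Y) sends every tuple for a given key to the same task, so the unsynchronized read-modify-write on Z is never run concurrently for one key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FieldsGroupingSketch {
    // In-memory stand-in for the Cassandra table: "X|Y" -> aggregated Z.
    static final Map<String, Integer> store = new ConcurrentHashMap<>();

    // The same (x, y) pair always maps to the same task index.
    static int taskFor(String x, String y, int numTasks) {
        return Math.floorMod(Objects.hash(x, y), numTasks);
    }

    // Unsynchronized read-then-write, like the bolt described in the thread.
    static void readModifyWrite(String key, int delta) {
        int z = store.getOrDefault(key, 0); // read Z
        store.put(key, z + delta);          // write Z back
    }

    // Routes `messages` tuples to `numTasks` single-threaded tasks and
    // returns the sum of all Z values once every task has finished.
    static int run(int numTasks, int messages) {
        store.clear();
        List<ExecutorService> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(Executors.newSingleThreadExecutor());
        }
        for (int i = 0; i < messages; i++) {
            String x = "x" + (i % 3);
            String y = "y" + (i % 2);
            String key = x + "|" + y;
            // Key-based routing: one key is only ever handled by one thread.
            tasks.get(taskFor(x, y, numTasks)).submit(() -> readModifyWrite(key, 1));
        }
        try {
            for (ExecutorService t : tasks) {
                t.shutdown();
                t.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (int z : store.values()) {
            total += z;
        }
        return total;
    }

    public static void main(String[] args) {
        // Each message adds 1 to some Z, so no lost updates means a total of 10000.
        System.out.println(run(4, 10_000));
    }
}
```

This only holds under the assumption that the grouping covers every field of the key (both X and Y here) and that no other component writes the same rows. If tuples were instead handed to tasks at random, as with shuffle grouping, two tasks could read the same Z at the same time and one update would be lost.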
