Not at the moment. We have been using KafkaSpout for all our other projects
but have not looked into using Trident. How would it help resolve the issue
we are facing at the moment? We also need to keep in mind the development
time it would take to implement Trident, while KafkaSpout has been working
fine for all the other projects.

--
Kushan Maskey

On Tue, Jan 20, 2015 at 3:05 PM, Rajiv Onat <[email protected]> wrote:

> Seems like stateful processing. Have you looked at using Trident?
>
> -Rajiv
>
> On Jan 20, 2015, at 12:26 PM, Kushan Maskey <
> [email protected]> wrote:
>
> Thanks Keith and Itai,
>
> We are using fieldsGrouping. Initially we were using shuffleGrouping; we saw
> this problem and then moved to fieldsGrouping, with better results, until
> now. I suspect the bolt's parallelism, which we have set to 4, is
> the culprit here. My understanding is that parallelism means threading;
> correct me if I am wrong.
>
> --
> Kushan Maskey
>
> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]> wrote:
>
>>  Hello,
>>
>>
>>  Are you familiar with fields grouping? The idea is that the same bolt
>> instance always updates the value of a specific key (similar to cookie
>> stickiness in a web load balancer).
>>
>> https://storm.apache.org/documentation/Concepts.html
>>
>> "Fields grouping: The stream is partitioned by the fields specified
>> in the grouping. For example, if the stream is grouped by the "user-id"
>> field, tuples with the same "user-id" will always go to the same task, but
>> tuples with different "user-id"'s may go to different tasks."
>>
>>
>>  Itai
>>
>>  ------------------------------
>>
>> *From:* Kushan Maskey <[email protected]>
>> *Sent:* Tuesday, January 20, 2015 8:55 PM
>> *To:* [email protected]
>> *Subject:* URGENT!! Race condition
>>
>>  We are having a major issue updating a Cassandra database, where
>> we see a race condition in a bolt.
>>
>>  Here is an example,
>>
>>  I have a column family with two partitioning columns, say X and Y.
>> There is another column, Z, which is basically an aggregated number. We are
>> supposed to update Z based on X and Y. Storm is reading a huge volume of
>> data from Kafka. When the spout receives a message, the first bolt reads the
>> database for that combination of X and Y and gets the value of Z. Then it
>> updates the value of Z and stores it back into the database. Bolt parallelism
>> is set to 4, which means 4 instances of the bolt are trying to update the
>> database. So when the first instance (B1) reads the value of Z as, say, 100,
>> the second instance (B2) reads it as 100 at the same time. Once B1 completes
>> execution the value of Z is 150, but B2 still holds 100, so the value of Z
>> ends up out of sync.
>>
>>  How can we prevent a race condition like this? It is causing a
>> major nuisance for us.
>>
>>  Any help is highly appreciated. Thanks.
>>
>>    --
>> Kushan Maskey
>>
>>
>
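
The fields-grouping idea discussed in this thread can be illustrated outside Storm. What follows is a minimal plain-Java sketch, not Storm or Cassandra code. It assumes that every update for a given (X, Y) pair is routed to one single-threaded task, so the unsynchronized read-then-write of Z never runs concurrently for the same key. All names here (KeyedRoutingSketch, store, addToZ, taskFor, run) are hypothetical, the map stands in for the column family, and the hash-mod-task-count routing only approximates what a fields grouping does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class KeyedRoutingSketch {

    // Hypothetical stand-in for the column family: "X|Y" -> Z.
    static final Map<String, Long> store = new ConcurrentHashMap<>();

    // Non-atomic read-modify-write, like a SELECT followed by an UPDATE.
    // Safe here only because each key is handled by exactly one thread.
    static void addToZ(String x, String y, long delta) {
        String key = x + "|" + y;
        long current = store.getOrDefault(key, 0L); // read Z
        store.put(key, current + delta);            // write Z back
    }

    // Same (x, y) always maps to the same task index.
    static int taskFor(String x, String y, int numTasks) {
        return Math.floorMod(Objects.hash(x, y), numTasks);
    }

    // Sends updatesPerKey increments for three keys through numTasks
    // single-threaded tasks and returns the sum of all Z values.
    static long run(int numTasks, int updatesPerKey) {
        store.clear();
        List<ExecutorService> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(Executors.newSingleThreadExecutor());
        }
        String[][] keys = {{"x1", "y1"}, {"x1", "y2"}, {"x2", "y1"}};
        for (int n = 0; n < updatesPerKey; n++) {
            for (String[] k : keys) {
                int task = taskFor(k[0], k[1], numTasks);
                tasks.get(task).submit(() -> addToZ(k[0], k[1], 1));
            }
        }
        try {
            for (ExecutorService t : tasks) {
                t.shutdown();
                t.awaitTermination(30, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 3 keys x 1000 updates each
    }
}
```

With per-key routing no increment is lost, so the total equals the number of updates submitted. In a Storm topology the corresponding step would be to group on both X and Y in the fields grouping. That helps only if this bolt is the sole writer for each (X, Y) row; any other writer to the same row would reintroduce the race.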
