"B1 and B2 are the same bolt but running on 2 separate tasks." This
confusing me a bit. So both B1 and B2 are same bolt code ?and they both
doing cassandra lookups and updates? In this case I would be using
fieldsGrouping here as well builder.setBolt("B1", new Bolt1(),
bolt_parallelism_hint).fieldsGrouping(MainBolt);



On Tue, Jan 20, 2015, at 02:33 PM, Nathan Leung wrote:
> I assume Bolt2 in the snippet is the bolt in question? What do
> declareOutputFields and emit in Bolt1 look like? Are you able to show
> the logic of Bolt2?


> On Jan 20, 2015 5:08 PM, "Kushan Maskey"
> <[email protected]> wrote:
>> B1 and B2 are the same bolt but running on 2 separate tasks.
>>
>>
>> Here is the snippet of the topologyBuilder function I have.
>>
>> spout_parallelism_hint = 4; bolt_parallelism_hint = 4;
>>
>> private static void buildTopology(TopologyBuilder builder) {
>> KafkaSpout spout = new
>> KafkaSpout(getSpoutConfig(propMap.get(KAFKA_TOPIC), "ID1"));
>>
>> builder.setSpout(SPOUT_NAME, spout, spout_parallelism_hint);
>>
>> builder.setBolt("MainBolt", new MainBolt(),
>> bolt_parallelism_hint).shuffleGrouping(SPOUT_NAME);
>> builder.setBolt("B1", new Bolt1(),
>> bolt_parallelism_hint).shuffleGrouping(MainBolt);
>>
>> // go to store sales bolts first builder.setBolt("B2", new Bolt2(),
>> bolt_parallelism_hint).fieldsGrouping(B1, new Fields("X"));
>>
>> // split on assoc, dept and vendor builder.setBolt("B3", new Bolt3(),
>> bolt_parallelism_hint).shuffleGrouping(B2); } I got bunch of other
>> bolts pretty much doing the same thing as above.
>>
>> LMK if that is sufficient. Thanks.
>>
>>
>> --
>> Kushan Maskey
>>
>> On Tue, Jan 20, 2015 at 3:45 PM, Nathan Leung
>> <[email protected]> wrote:
>>> Actually I thought about it and you should not have to do
>>> fieldsGrouping on both X and Y; one should be sufficient. In your
>>> original email, are B1 and B2 the same bolt, but different tasks, or
>>> are they different bolts entirely? As Harsha pointed out, it may
>>> help if you give more details of how your topology is constructed.
>>>
>>> On Tue, Jan 20, 2015 at 4:42 PM, Kushan Maskey
>>> <[email protected]> wrote:
>>>> I am only fieldGrouping on X and not Y. Is it necessary to
>>>> fieldGroup by both the fields? Is there any sample document I can
>>>> look at? Thanks.
>>>>
>>>> --
>>>> Kushan Maskey
>>>> 817.403.7500
>>>> M. Miller & Associates[1] [email protected]
>>>>
>>>> On Tue, Jan 20, 2015 at 3:14 PM, Nathan Leung <[email protected]>
>>>> wrote:
>>>>> which fields are you doing fieldsGrouping on? If you do fields
>>>>> grouping on X and Y, why are you having a race condition in a
>>>>> separate bolt task? Each X and Y combo should always go to the
>>>>> same bolt task with fieldsGrouping, and the scenario you describe
>>>>> should work properly whether you have 1 task, 4 tasks, or 100
>>>>> tasks.
>>>>>
>>>>> On Tue, Jan 20, 2015 at 4:11 PM, Kushan Maskey
>>>>> <[email protected]> wrote:
>>>>>> Not at the moment. We have been using KafkaSpout for all the
>>>>>> other projects but have not looked into using trident. How would
>>>>>> it help resolve the issue we are facing at the moment. We also
>>>>>> need to keep in mind the development time it would take to
>>>>>> implement triedent. While KafkaSpout has been working fine with
>>>>>> all the other projects.
>>>>>>
>>>>>> --
>>>>>> Kushan Maskey
>>>>>>
>>>>>> On Tue, Jan 20, 2015 at 3:05 PM, Rajiv Onat <[email protected]>
>>>>>> wrote:
>>>>>>> Seems like stateful processing, have you looked at using
>>>>>>> trident ?
>>>>>>>
>>>>>>> -Rajiv
>>>>>>>
>>>>>>> On Jan 20, 2015, at 12:26 PM, Kushan Maskey
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Keith and Itai,
>>>>>>>>
>>>>>>>> We are using fieldGrouping. Initially we were using
>>>>>>>> suffleGrouping, we saw this problem and then moved to
>>>>>>>> fieldGrouping, with better result, until now. I am thinking due
>>>>>>>> to bolts parallelism which we have set it to 4, is the culprit
>>>>>>>> here. My understanding of parallelism is threading, correct me
>>>>>>>> if I am not incorrect.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Kushan Maskey
>>>>>>>>
>>>>>>>> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Hello,


>>>>>>>>>


>>>>>>>>> Are you familiar with field grouping ? The idea is that the
>>>>>>>>> same bolt instance would always update the value of a specific
>>>>>>>>> key (similar to web load balancer cookie stickiness).


>>>>>>>>> https://storm.apache.org/documentation/Concepts.html


>>>>>>>>> **"Fields grouping***: The stream is partitioned by the fields
>>>>>>>>> specified in the grouping. For example, if the stream is
>>>>>>>>> grouped by the "user-id" field, tuples with the same "user-id"
>>>>>>>>> will always go to the same task, but tuples with
 different "user-id"'s may go to different tasks."*


>>>>>>>>> **


>>>>>>>>> Itai


>>>>>>>>>
>>>>>>>>> *From:* Kushan Maskey <[email protected]>
>>>>>>>>> *Sent:* Tuesday, January 20, 2015 8:55 PM *To:*
>>>>>>>>> [email protected] *Subject:* URGENT!! Race condition
>>>>>>>>>
>>>>>>>>> We are having a major issue trying to update Cassandra
>>>>>>>>> database where we see race condition in a bolt.
>>>>>>>>>
>>>>>>>>> Here is an example,
>>>>>>>>>
>>>>>>>>> I have a columnfamily, where i have 2 partitioning columns say
>>>>>>>>> X and Y. There is another columns Z which basically aggregated
>>>>>>>>> number. We are suppose to update Z based on X and Y. Storm is
>>>>>>>>> reading a huge volume of data from Kafka. When sport receives
>>>>>>>>> a
 message, first bolt reads the database for that combination of X and Y
 and get the value of Z. Then it updates the value Z and store it back
 into the database. Bolt parallelism is set to be 4 which mean 4
 instances of bolt are trying to update the database. So when first bolt
 (B1) read the value of Z to be say 100, same time the second bolt (B2)
 also read it to be 100, but once B1 completed execution and the value
 of Z is now 150, B2 still has 100 so the value of Z is out of sync.
>>>>>>>>>
>>>>>>>>> How can we prevent the race condition like this? This is
>>>>>>>>> causing a major nuisance to us.
>>>>>>>>>
>>>>>>>>> Any help is highly appreciated. Thanks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Kushan Maskey
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



Links:

  1. http://mmillerassociates.com/

Reply via email to