Hi Kushan,

The best way to avoid this is probably to use a fields grouping on (X, Y)
in the topology, so that all tuples with the same (X, Y) go to the same bolt
instance. That guarantee doesn't hold if your bolt is multithreaded, of
course, but then it has its own race-condition problems.
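
To make the grouping idea concrete, here is a rough plain-Java sketch of why
hashing on (X, Y) keeps each key on one bolt instance. It is a conceptual
model only, not Storm's actual routing code; the class name, method name and
sample keys are made up for illustration. In a real topology the equivalent
would be something like fieldsGrouping(...) with new Fields("x", "y"), where
the field names depend on what your spout declares.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: mimics the idea behind a fields grouping,
// not Storm's actual implementation. All names here are hypothetical.
class FieldsGroupingSketch {

    // Pick a task index from the grouping key (x, y). The same key always
    // yields the same index, so one bolt instance owns each (x, y) pair.
    static int taskFor(String x, String y, int numTasks) {
        int hash = Arrays.asList(x, y).hashCode();
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // bolt parallelism from the original question
        List<String[]> tuples = Arrays.asList(
            new String[] {"x1", "y1"},
            new String[] {"x2", "y7"},
            new String[] {"x1", "y1"},
            new String[] {"x3", "y2"});

        Map<Integer, Integer> perTask = new HashMap<>();
        for (String[] t : tuples) {
            int task = taskFor(t[0], t[1], numTasks);
            perTask.merge(task, 1, Integer::sum);
            System.out.println("(" + t[0] + ", " + t[1] + ") -> task " + task);
        }
        // Per-task tuple counts: a rough analogue of what the Storm UI shows.
        System.out.println("tuples per task: " + perTask);
    }
}
```

Because both ("x1", "y1") tuples land on the same task, the read and the
write for that key are serialized inside one single-threaded bolt instance.
The per-task counts also show how a skewed key distribution would pile up
on a few tasks.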

The only downside is that if your values of (X, Y) aren't evenly
distributed, some bolt instances will be overwhelmed and some will starve.
You can check this in the Storm UI by looking at how many tuples each
instance gets. If it ends up being a problem, you'll want to rethink your
Cassandra schema. In Cassandra 2.1.x, counter columns have gotten
significantly faster and may be a good way to go. Regardless of the DBMS,
read-update-write is always prone to race conditions unless the database
has transaction support.
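
As a small illustration of the read-update-write problem, the sketch below
replays the interleaving from your example deterministically, using an
in-memory map as a stand-in for the Cassandra table. The class and method
names are hypothetical, and the second bolt's delta of 30 is an assumed
value (your message only gives B1's update from 100 to 150). The second
method models an increment-style update, which is roughly what a counter
column gives you with a statement of the form UPDATE ... SET z = z + ?.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the interleaving from the question. The map is a
// stand-in for the Cassandra table (an assumption, not a real client).
class LostUpdateSketch {

    // Read-update-write: both writers read Z before either writes back.
    static long readUpdateWrite(long initial, long delta1, long delta2) {
        Map<String, Long> table = new HashMap<>();
        table.put("x|y", initial);
        long seenByB1 = table.get("x|y");     // B1 reads Z
        long seenByB2 = table.get("x|y");     // B2 reads the same Z
        table.put("x|y", seenByB1 + delta1);  // B1 writes its result
        table.put("x|y", seenByB2 + delta2);  // B2 overwrites B1's update
        return table.get("x|y");
    }

    // Increment-style update: each writer sends only its delta and the
    // store applies the deltas one after another.
    static long incrementOnly(long initial, long delta1, long delta2) {
        Map<String, Long> table = new HashMap<>();
        table.put("x|y", initial);
        table.merge("x|y", delta1, Long::sum); // B1: z = z + delta1
        table.merge("x|y", delta2, Long::sum); // B2: z = z + delta2
        return table.get("x|y");
    }

    public static void main(String[] args) {
        // B1 adds 50 (100 -> 150 in the question); B2's 30 is assumed.
        System.out.println(readUpdateWrite(100, 50, 30)); // prints 130
        System.out.println(incrementOnly(100, 50, 30));   // prints 180
    }
}
```

With read-update-write, B1's update of 50 is lost and Z ends at 130 instead
of 180. With increments, neither writer needs to know the current value, so
the order of the two updates doesn't matter.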

Hope this helps,

Keith.

On Tue Jan 20 2015 at 1:57:15 PM Kushan Maskey <
[email protected]> wrote:

> We are having a major issue updating a Cassandra database: we see a race
> condition in a bolt.
>
> Here is an example.
>
> I have a column family with two partitioning columns, say X and Y.
> There is another column, Z, which is basically an aggregated number. We are
> supposed to update Z based on X and Y. Storm is reading a huge volume of
> data from Kafka. When the spout receives a message, the first bolt reads
> the database for that combination of X and Y and gets the value of Z. Then
> it updates the value of Z and stores it back into the database. Bolt
> parallelism is set to 4, which means 4 instances of the bolt are trying to
> update the database. So when the first bolt (B1) reads the value of Z as,
> say, 100, the second bolt (B2) reads it as 100 at the same time. Once B1
> completes execution, the value of Z is 150, but B2 still has 100, so the
> value of Z is out of sync.
>
> How can we prevent a race condition like this? It is causing a major
> nuisance for us.
>
> Any help is highly appreciated. Thanks.
>
> --
> Kushan Maskey
>
>
