1. If messages from two spouts can trigger updates to the same row, you
will ideally need to process them on a single thread. If updates can be
triggered at the same time, you will need some sort of master epoch and
will have to compare timestamps to work out the order in which to apply
these updates. Additionally, if your end database or your schema supports
versioning, that would probably be the most lock-free setup you can
achieve. Generally speaking, this problem is out of scope for Storm.
However, to process events for a given row with a rowid, e.g. xyz, you can
use FieldsGrouping on the rowid field, which guarantees that all tuples for
this row always go to the same instance of a Bolt.

2. As mentioned earlier, you need either locking or versioning to control
this. As a starting point, reviewing the MVCC (multi-version concurrency
control) concept might help.
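
To make the timestamp comparison from point 1 concrete, here is a minimal
sketch in plain Java. It assumes every update carries a timestamp taken from
one shared clock (the "master epoch"), and that all updates for a rowid have
already been routed to one Bolt instance, e.g. by declaring
fieldsGrouping(..., new Fields("rowid")) for each spout. The class and method
names (RowSequencer, apply) are made up for illustration and are not part of
Storm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, meant to live inside a single Bolt instance.
// It is not thread-safe on purpose: one instance handles one row at a time.
class RowSequencer {
    // Timestamp of the last update applied to each row.
    private final Map<String, Long> lastApplied = new HashMap<>();
    // Current value of each row (a String here only to keep the sketch short).
    private final Map<String, String> rows = new HashMap<>();

    // Applies the update only if it is newer than what the row already holds.
    // Returns false for stale or duplicate updates, which are ignored.
    boolean apply(String rowId, long epochMillis, String value) {
        Long seen = lastApplied.get(rowId);
        if (seen != null && seen >= epochMillis) {
            return false;
        }
        lastApplied.put(rowId, epochMillis);
        rows.put(rowId, value);
        return true;
    }

    String get(String rowId) {
        return rows.get(rowId);
    }
}
```

Note that this drops late arrivals instead of reordering them. If every
update has to be applied, you would need to buffer and sort by timestamp
instead.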

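The versioning idea from point 2 can be sketched as an optimistic
compare-and-set: read the row together with its version, compute the update,
and write it back only if the version has not changed in the meantime. This
is a simplified in-memory illustration, not full MVCC and not a Storm API.
VersionedStore, Versioned and writeIfUnchanged are hypothetical names, and a
real setup would rely on whatever versioning your database offers.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store that keeps a version number next to each row.
class VersionedStore {
    // Immutable snapshot of a row: the value plus the version it was written at.
    static final class Versioned {
        final long version;
        final String value;

        Versioned(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, Versioned> rows = new ConcurrentHashMap<>();

    Versioned read(String rowId) {
        return rows.get(rowId);
    }

    // Writes newValue only if the row is still the snapshot the caller read.
    // Pass null as expected when the row is not supposed to exist yet.
    // Returns false if another writer got in first; the caller should then
    // re-read the row and retry.
    boolean writeIfUnchanged(String rowId, Versioned expected, String newValue) {
        if (expected == null) {
            return rows.putIfAbsent(rowId, new Versioned(1, newValue)) == null;
        }
        // Versioned does not override equals, so replace() compares by identity:
        // it succeeds only if the stored object is the one that was read.
        return rows.replace(rowId, expected, new Versioned(expected.version + 1, newValue));
    }
}
```

A writer that gets false back has lost the race. It re-reads and tries again,
so no lock is held while the update is being computed.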

On Thu, May 4, 2017 at 8:39 AM, Ramin Farajollah (BLOOMBERG/ 731 LEX) <
[email protected]> wrote:

> Hi,
>
> The questions are around sequencing and synchronization of certain tuples.
>
> In my use case, I have a few spouts that act upon millions of cached rows
> before the updated rows successfully exit the topology (published to
> clients).
>
> A new tuple (an update) from spout A may result in thousands of updated
> rows. The same with spout B, except that the updates may or may not overlap.
>
> Also, performance is important.
>
>
> The questions are:
>
> 1. How can I ensure the updates for each row are applied in the order of
> arrival? (As a given row can be updated from multiple spouts/streams)
>
> 2. How can I ensure a new update does not step over in-flight updates?
> (Probably the same as the last question)
>
> Thank you
>
>
>
> << "A mind is like a parachute. It doesn't work if
> it is not open." Frank Zappa >>
>
