Thanks for replying. I understand the db approaches.
----- Original Message -----
From: Ambud Sharma <[email protected]>
To: RAMIN FARAJOLLAH, [email protected]
At: 06-May-2017 12:44:45

1. If messages from two spouts can trigger updates to the same row, you will ideally need to process them on a single thread. If there is a possibility that updates can be triggered at the same time, you will need some sort of master epoch and will have to compare timestamps to determine the order in which to apply them. Additionally, if your end database or your schema supports versioning, that is probably the most lock-free setup you can achieve. Generally speaking, this problem is out of scope for Storm. However, to process events for a given row with a rowid (e.g. xyz), you can use FieldsGrouping, which guarantees that all tuples for this row always go to the same instance of a Bolt.

2. As mentioned earlier, you need either locking or versioning to control this. For starters, reviewing the MVCC concept might help.

On Thu, May 4, 2017 at 8:39 AM, Ramin Farajollah (BLOOMBERG/ 731 LEX) <[email protected]> wrote:

Hi,

The questions are around sequencing and synchronization of certain tuples.

In my use case, I have a few spouts that act upon millions of cached rows before the updated rows successfully exit the topology (published to clients). A new tuple (an update) from spout A may result in thousands of updated rows. The same is true of spout B, except that the updates may or may not overlap. Also, performance is important.

The questions are:

1. How can I ensure the updates for each row are applied in the order of arrival? (A given row can be updated from multiple spouts/streams.)
2. How can I ensure a new update does not step over in-flight updates? (Probably the same as the last question.)

Thank you

<< "A mind is like a parachute. It doesn't work if it is not open." Frank Zappa >>
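
For reference, the "master epoch and compare timestamps" idea from the reply could be sketched roughly as below. This is only an illustration under assumptions, not anything Storm provides: the class name VersionedRowStore, the String row values, and the long epoch are all hypothetical, and the in-memory map stands in for whatever cache or database actually holds the rows. An update is applied only if its epoch is strictly newer than the one already stored for that row, so a late update cannot overwrite a newer one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory row cache; not part of Storm.
final class VersionedRowStore {

    // Immutable snapshot of a row: its value plus the epoch of the update that wrote it.
    private static final class Versioned {
        final long epoch;
        final String value;

        Versioned(long epoch, String value) {
            this.epoch = epoch;
            this.value = value;
        }
    }

    private final ConcurrentMap<String, Versioned> rows = new ConcurrentHashMap<>();

    // Applies the update only if its epoch is strictly newer than the stored one.
    // Returns true if the update was applied, false if it was stale (or tied).
    boolean apply(String rowId, long epoch, String value) {
        Versioned candidate = new Versioned(epoch, value);
        // ConcurrentHashMap.merge is atomic per key, so two threads updating
        // the same row cannot interleave inside the comparison.
        Versioned winner = rows.merge(rowId, candidate,
                (current, incoming) -> incoming.epoch > current.epoch ? incoming : current);
        return winner == candidate;
    }

    // Returns the current value of the row, or null if the row is unknown.
    String get(String rowId) {
        Versioned v = rows.get(rowId);
        return v == null ? null : v.value;
    }
}
```

With this sketch, apply("xyz", 100L, "from spout A") followed by apply("xyz", 90L, "late update from spout B") leaves the row holding the value from spout A, and the second call returns false. If all tuples for a rowid are already routed to one Bolt instance with FieldsGrouping, the same comparison can be done without any concurrent map, since only one thread touches a given row.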
