Use a fields grouping on the device id; that way the same bolt instance will always handle a given device.
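To illustrate why this works, here is a minimal JDK-only sketch of the kind of routing a fields grouping performs. It is not Storm's actual code. The idea is to hash the grouping field and take the result modulo the number of bolt tasks. The names FieldsGroupingSketch, chooseTask, route and the deviceId field are hypothetical. Storm's real hash function also differs between versions, so treat this as an illustration of the principle only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustration only (not Storm's implementation): route each event by hashing
 * a hypothetical "deviceId" field modulo the number of bolt tasks, so every
 * event for one device lands on the same task.
 */
public class FieldsGroupingSketch {

    /** Pick a task index in [0, numTasks) for the given device id. */
    static int chooseTask(String deviceId, int numTasks) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(deviceId.hashCode(), numTasks);
    }

    /** Route (deviceId, reading) pairs into per-task queues, keeping arrival order within each queue. */
    static Map<Integer, List<String>> route(List<String[]> events, int numTasks) {
        Map<Integer, List<String>> queues = new LinkedHashMap<>();
        for (String[] e : events) {
            int task = chooseTask(e[0], numTasks);
            queues.computeIfAbsent(task, k -> new ArrayList<>()).add(e[0] + ":" + e[1]);
        }
        return queues;
    }

    public static void main(String[] args) {
        List<String[]> events = List.of(
            new String[] {"device-1", "t1"},
            new String[] {"device-2", "t1"},
            new String[] {"device-1", "t2"},
            new String[] {"device-1", "t3"});
        // All "device-1" readings end up in the same task's queue, in arrival order.
        route(events, 4).forEach((task, q) -> System.out.println("task " + task + " -> " + q));
    }
}
```

Because only one task ever receives a given device's events, that task can update the device's state without a distributed lock. In the topology itself, this corresponds to declaring the bolt with something like fieldsGrouping("<spout id>", new Fields("deviceId")). The spout id and field name there are placeholders for your own.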
You may still hit a case where a bolt further downstream is handling a device when new data for that device comes in, but whether that matters depends on how your application is set up. If you do all of your writing in one bolt, you will be OK.

On Feb 12, 2015 6:21 AM, "Legg John" <[email protected]> wrote:
> Hi
>
> After doing lots of reading and building a POC for our use case we are
> still unsure as to whether Storm can handle our use case:
>
> - We have an inbound stream of sensor data for millions of devices
> (which have unique identifiers).
> - We need to perform aggregation of this stream on a per device
> level. The aggregation will read data that has already been processed (and
> persisted) in previous batches.
> - *Key point:* When we process data for a particular device we need
> to ensure that no other processes are processing data for that particular
> device. This is because the outcome of our processing will affect the
> downstream processing for that device. Effectively we need a distributed
> lock.
> - In addition the event device data needs to be processed in the order
> that the events occurred.
>
> Essentially we can’t have two batches for the same device being processed
> at the same time.
>
> Can Storm handle our use case?
>
> Any advice appreciated.
>
> Regards
> John
