Hi,

After doing a lot of reading and building a POC for our use case, we are still unsure whether Spark Streaming can handle it:
* We have an inbound stream of sensor data from millions of devices (each with a unique identifier).
* We need to aggregate this stream on a per-device level. The aggregation will read data that has already been processed (and persisted) in previous batches.
* Key point: when we process data for a particular device, we need to ensure that no other process is processing data for that same device, because the outcome of our processing affects the downstream processing for that device. Effectively we need a distributed lock.
* In addition, each device's events need to be processed in the order in which they occurred. Essentially, we can't have two batches for the same device being processed at the same time.

Can Spark handle our use case? Any advice appreciated.

Regards,
John
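For what it's worth, the usual way to approximate per-device mutual exclusion in Spark Streaming is to partition the stream by device ID (e.g. via `groupByKey` or stateful operators such as `updateStateByKey`/`mapWithState`), so that within a batch all events for one device are handled by a single task. Below is a minimal, Spark-free Python simulation of that keyed-partitioning idea; all names (`Event`, `route`, `process_partitions`) are hypothetical, and note that ordering *across* batches would still need separate handling:

```python
# Simulation (not Spark itself) of hash-partitioning by key: if every event
# for a device is routed to the same partition, and each partition is
# consumed by a single worker, then per-device processing is serialized and
# ordered within a batch -- standing in for the "distributed lock".
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    device_id: str
    seq: int  # event-time order within a device

NUM_PARTITIONS = 4

def route(device_id: str) -> int:
    """Deterministic routing: the same device always lands in the same partition."""
    return hash(device_id) % NUM_PARTITIONS

def process_partitions(events):
    # Group events into partitions, preserving arrival order within each one.
    partitions = defaultdict(list)
    for e in events:
        partitions[route(e.device_id)].append(e)

    # Each partition is processed by exactly one worker, so per-device
    # state updates never race with each other.
    state = defaultdict(list)
    for part in partitions.values():
        for e in part:
            state[e.device_id].append(e.seq)
    return state

events = [Event("dev-A", 1), Event("dev-B", 1),
          Event("dev-A", 2), Event("dev-B", 2)]
state = process_partitions(events)
assert state["dev-A"] == [1, 2]  # per-device order preserved
assert state["dev-B"] == [1, 2]
```

The same reasoning is why many people pair Spark with a partitioned source like Kafka, keyed by device ID: the source then guarantees per-key ordering on ingest as well.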