Hi! I wanted to run something by y'all, since I'm working on a new project that's a bit outside my expertise. I'm not sure Storm is the right fit, as there are some unique aspects. Here's the data flow:
1. Data arrives and is appended to one of 10M buffers (partitioned by userID).
   1a. Each buffer holds only up to a few hundred records over a 24-hour period.
2. When data arrives, a computation is run on it that may also use data already in the buffer. The model executed is unique to each buffer.
3. Depending on the results of the computation, the data in the buffer is left unchanged, updated, or aggregated and pushed to another system.
4. Once every X hours, measured from the last message received, all non-empty buffers have a computation executed and the results pushed to another system.

I'm looking for something like a hybrid ring buffer/stream processor, but I can only seem to find one component or the other (Kafka, Spark, Kinesis, etc.).

Does this make sense? Is there enough detail? Can I achieve real-time processing, buffering, and data/computational locality?

-B

--
Bradford Stephens
Freelance CTO & Startup Exec
22acacia.com
(530) 763-DATA
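To make steps 1 through 4 concrete, here is a minimal single-process sketch in Python. Everything in it is hypothetical: the names `BufferedProcessor`, `UserBuffer`, and `Action`, the 500-record cap standing in for "a few hundred", the per-user model lookup, and the reading of step 4 as a per-buffer inactivity timeout of X hours. It only shows the buffer/compute/flush logic, not how to spread 10M buffers across machines.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Deque, Dict, List, Tuple


class Action(Enum):
    KEEP = "keep"      # step 3: leave the buffer unchanged
    UPDATE = "update"  # step 3: replace the buffer contents with the model's output
    FLUSH = "flush"    # step 3: push the aggregate downstream and clear the buffer


# Hypothetical per-user model: (buffer contents, new record) -> (action, payload).
Model = Callable[[List[float], float], Tuple[Action, List[float]]]


@dataclass
class UserBuffer:
    # Assumed cap of 500 records ("a few hundred"); the deque drops the oldest beyond that.
    records: Deque[float] = field(default_factory=lambda: deque(maxlen=500))
    last_seen: float = 0.0


class BufferedProcessor:
    def __init__(self, idle_seconds: float,
                 model_for: Callable[[str], Model],
                 emit: Callable[[str, List[float]], None]) -> None:
        self.idle_seconds = idle_seconds  # X hours, expressed in seconds
        self.model_for = model_for        # step 2: looks up the model for a given buffer
        self.emit = emit                  # stands in for "push to another system"
        self.buffers: Dict[str, UserBuffer] = {}

    def on_message(self, user_id: str, record: float, now: float) -> None:
        # Step 1: append to this user's buffer, creating it on first use.
        if user_id not in self.buffers:
            self.buffers[user_id] = UserBuffer()
        buf = self.buffers[user_id]
        buf.records.append(record)
        buf.last_seen = now
        # Step 2: run this user's model over the buffer (which now includes the new record).
        action, payload = self.model_for(user_id)(list(buf.records), record)
        # Step 3: act on the result; KEEP needs no work.
        if action is Action.UPDATE:
            buf.records.clear()
            buf.records.extend(payload)
        elif action is Action.FLUSH:
            self.emit(user_id, payload)
            buf.records.clear()

    def sweep(self, now: float, final: Callable[[List[float]], List[float]]) -> None:
        # Step 4: run a final computation on every non-empty buffer that has been
        # idle for at least idle_seconds, push the result, and clear the buffer.
        for user_id, buf in self.buffers.items():
            if buf.records and now - buf.last_seen >= self.idle_seconds:
                self.emit(user_id, final(list(buf.records)))
                buf.records.clear()
```

A driver would call `sweep()` periodically from a timer. At 10M users the buffers would presumably be sharded by userID across workers (for example, a fields grouping on userID in Storm) rather than held in one dict, so each worker owns a disjoint slice of the state.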
