Jianneng,

On Wed, Oct 8, 2014 at 8:44 AM, Jianneng Li <jiannen...@berkeley.edu> wrote:
> I understand that Spark Streaming uses micro-batches to implement
> streaming, while traditional streaming systems use the record-at-a-time
> processing model. The performance benefit of the former is throughput, and
> of the latter is latency. I'm wondering what it would take to implement
> record-at-a-time processing for Spark Streaming? Would it be something that
> is feasible to prototype in one or two months?
I think this goes so much against the fundamental design of Spark Streaming that there would be nothing left of Spark Streaming by the time you were done. Spark is built around the RDD, that is, a distributed dataset, and Spark Streaming is essentially a wrapper that collects incoming data into one RDD per batch interval and then processes each interval as a batch job. "One item at a time" simply does not fit this model. Even if you *were* able to prototype something in one or two months, I would expect the performance to be abysmal.

Tobias
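P.S. To make the micro-batch point concrete, here is a minimal sketch (assuming the current 1.x streaming API and a hypothetical socket source on localhost:9999). Note that every operation below runs once per batch interval over the RDD collected in that interval; there is no per-record processing path to hook into:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")
    // A DStream is just a sequence of RDDs, one per 1-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)

    // Word count, executed as a batch job over each interval's RDD.
    lines.flatMap(_.split(" "))
         .map(word => (word, 1))
         .reduceByKey(_ + _)
         .foreachRDD { rdd => rdd.take(10).foreach(println) }

    ssc.start()
    ssc.awaitTermination()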