Jianneng,

On Wed, Oct 8, 2014 at 8:44 AM, Jianneng Li <jiannen...@berkeley.edu> wrote:
> I understand that Spark Streaming uses micro-batches to implement
> streaming, while traditional streaming systems use the record-at-a-time
> processing model. The performance benefit of the former is throughput, and
> of the latter is latency. I'm wondering what it would take to implement
> record-at-a-time processing for Spark Streaming? Would it be something that
> is feasible to prototype in one or two months?
I think this goes so much against the fundamental design of Spark Streaming that there would be nothing left of Spark Streaming by the time you were done. Spark is built around the RDD, that is, a distributed dataset, and Spark Streaming is essentially a wrapper that collects incoming data into one RDD per batch interval and then processes each interval as a batch job. "One item at a time" simply does not fit this model. Even if you *were* able to prototype something in one or two months, I would expect the performance to be abysmal.

Tobias
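P.S. To make the micro-batch point concrete, here is a minimal sketch (assuming the current 1.x streaming API and a hypothetical socket source on localhost:9999). Note that every operation below runs once per batch interval over the RDD collected in that interval; there is no per-record processing path to hook into:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")
    // A DStream is just a sequence of RDDs, one per 1-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)

    // Word count, executed as a batch job over each interval's RDD.
    lines.flatMap(_.split(" "))
         .map(word => (word, 1))
         .reduceByKey(_ + _)
         .foreachRDD { rdd => rdd.take(10).foreach(println) }

    ssc.start()
    ssc.awaitTermination()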