Another difference I see is something like Spark SQL, where there are
logical plans, physical plans, code generation, and all those optimizations.
I don't see them in Kafka Streaming at this time.
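
To make the logical plan vs. physical plan idea concrete, here is a toy
sketch in plain Python. It is not Spark or Kafka code: the names `optimize`,
`execute`, and the tuple-based plan format are made up for illustration, and
the only "optimization" shown is fusing adjacent filters.

```python
# Toy illustration only: none of these names come from Spark or Kafka.
from typing import Callable, Iterable, List, Tuple

Op = Tuple[str, Callable]  # ("map", fn) or ("filter", predicate)


def optimize(plan: List[Op]) -> List[Op]:
    """Rewrite the logical plan: fuse adjacent filters into one predicate."""
    optimized: List[Op] = []
    for kind, fn in plan:
        if kind == "filter" and optimized and optimized[-1][0] == "filter":
            prev = optimized.pop()[1]
            # Bind prev/fn as defaults so each fused lambda keeps its own pair.
            optimized.append(("filter", lambda x, p=prev, q=fn: p(x) and q(x)))
        else:
            optimized.append((kind, fn))
    return optimized


def execute(plan: List[Op], rows: Iterable) -> list:
    """'Physical' execution: apply every operator to each row in one pass."""
    out = []
    for row in rows:
        keep = True
        for kind, fn in plan:
            if kind == "map":
                row = fn(row)
            elif not fn(row):  # a filter rejected the row
                keep = False
                break
        if keep:
            out.append(row)
    return out


logical_plan = [
    ("filter", lambda x: x > 2),
    ("filter", lambda x: x % 2 == 0),
    ("map", lambda x: x * 10),
]
physical_plan = optimize(logical_plan)
result = execute(physical_plan, range(10))
```

The point is only that the engine is free to rewrite what the user wrote
before running it, as long as the result stays the same. A real optimizer
does far more than this.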

On Sun, Jun 11, 2017 at 2:19 PM, kant kodali <kanth...@gmail.com> wrote:

> I appreciate the responses; however, I see the other side of the argument
> and I actually feel they are now competitors in the streaming space, in
> some sense.
>
> Kafka Streaming can indeed do map, reduce, join, and window operations.
> Likewise, data can be ingested into Kafka from many sources and the results
> sent out to many sinks. Look up "Kafka Connect".
>
> Regarding event-at-a-time vs. micro-batch: I hear one group of people
> saying Spark Streaming is real time and another group saying Kafka
> Streaming is the true real time. So do we say micro-batch is real time, or
> event-at-a-time is real time?
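
The two processing models can be sketched in plain Python. This is a toy
under stated assumptions, not Spark or Kafka code: the function names are
hypothetical, and batches are cut by event count to keep the example short,
whereas Spark Streaming cuts micro-batches by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def event_at_a_time(events: Iterable[int],
                    handle: Callable[[int], int]) -> Iterator[int]:
    """Emit one result per event as soon as that event arrives."""
    for event in events:
        yield handle(event)


def micro_batch(events: Iterable[int], batch_size: int,
                handle_batch: Callable[[List[int]], int]) -> Iterator[int]:
    """Buffer events, then process each buffer as one small batch.

    Assumption: batches are cut by count here; a time-based cut would
    follow the same buffering pattern.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield handle_batch(batch)


events = [1, 2, 3, 4, 5]
per_event = list(event_at_a_time(events, lambda e: e * 2))
per_batch = list(micro_batch(events, 2, sum))
```

In the first model every event produces output immediately, so latency is
bounded by the per-event work. In the second, an event waits until its batch
closes, which adds latency but lets the engine amortize overhead across the
batch.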
>
> It is a well-known fact that Spark is more popular with data scientists who
> want to run ML algorithms and so on, but I also hear that people can use the
> H2O package along with Kafka Streaming. How efficient each of these
> approaches is, I have no clue.
>
> The major difference I see is actually the *Spark Scheduler*. I don't
> think Kafka Streaming has anything like it; instead, it just lets you
> run lambda expressions on a stream and write the results out to a specific
> topic/partition, and from there one can use Kafka Connect to write them out
> to any sink. In short, the optimizations built into the Spark scheduler
> don't seem to exist in Kafka Streaming. So if I were deciding which
> framework to use, this is an additional question I would think about:
> "Do I want my stream to go through the scheduler and, if so, why or why
> not?"
>
> Above all, please correct me if I am wrong :)
>
>
>
>
> On Sun, Jun 11, 2017 at 12:41 PM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
>> Just to elaborate on what Vincent wrote: Kafka Streaming provides true
>> record-at-a-time processing capabilities, whereas Spark Streaming provides
>> micro-batching capabilities on top of Spark. Depending on your use case,
>> you may find one better than the other. Both provide stateless and stateful
>> stream processing capabilities.
>>
>>
>>
>> A few more things to consider:
>>
>>    1. If you don’t already have a Spark cluster, but have a Kafka cluster,
>>    it may be easier to use Kafka Streaming since you don’t need to set up
>>    and manage another cluster.
>>    2. On the other hand, if you already have a Spark cluster, but don’t
>>    have a Kafka cluster (in case you are using some other messaging system),
>>    Spark Streaming is a better option.
>>    3. If you already know and use Spark, you may find it easier to
>>    program with Spark Streaming API even if you are using Kafka.
>>    4. Spark Streaming may give you better throughput. So you have to
>>    decide what is more important for your stream processing application –
>>    latency or throughput?
>>    5. Kafka streaming is relatively new and less mature than Spark
>>    Streaming
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* vincent gromakowski [mailto:vincent.gromakow...@gmail.com]
>> *Sent:* Sunday, June 11, 2017 12:09 PM
>> *To:* yohann jardin <yohannjar...@hotmail.com>
>> *Cc:* kant kodali <kanth...@gmail.com>; vaquar khan <
>> vaquar.k...@gmail.com>; user <user@spark.apache.org>
>> *Subject:* Re: What is the real difference between Kafka streaming and
>> Spark Streaming?
>>
>>
>>
>> I think Kafka Streams is good when the processing of each row is
>> independent of the others (row parsing, data cleaning...).
>>
>> Spark is better when processing groups of rows (group by, ML, window
>> functions...).
>>
>>
>>
>> On June 11, 2017 at 8:15 PM, "yohann jardin" <yohannjar...@hotmail.com>
>> wrote:
>>
>> Hey,
>>
>> Kafka can also do streaming on its own:
>> https://kafka.apache.org/documentation/streams
>> I don’t know much about it, unfortunately. I can only repeat what I heard
>> at conferences: one should give Kafka Streaming a try when one's whole
>> pipeline uses Kafka. I have no pros or cons to offer on this topic.
>>
>> *Yohann Jardin*
>>
>> On 6/11/2017 at 7:08 PM, vaquar khan wrote:
>>
>> Hi Kant,
>>
>> Kafka is the message broker, used with producers and consumers, and
>> Spark Streaming is used for the real-time processing. Kafka and Spark
>> Streaming work together; they are not competitors.
>>
>> Spark Streaming reads data from Kafka and processes it in micro-batches.
>> In simple terms, it collects data for some time, builds an RDD, and then
>> processes these micro-batches.
>>
>>
>>
>>
>>
>> Please read the docs:
>> https://spark.apache.org/docs/latest/streaming-programming-guide.html
>>
>>
>>
>> Spark Streaming is an extension of the core Spark API that enables
>> scalable, high-throughput, fault-tolerant stream processing of live data
>> streams. Data can be ingested from many sources like *Kafka, Flume,
>> Kinesis, or TCP sockets*, and can be processed using complex algorithms
>> expressed with high-level functions like map, reduce, join and window.
>> Finally, processed data can be pushed out to filesystems, databases, and
>> live dashboards. In fact, you can apply Spark’s machine learning
>> <https://spark.apache.org/docs/latest/ml-guide.html> and graph processing
>> <https://spark.apache.org/docs/latest/graphx-programming-guide.html>
>> algorithms on data streams.
>>
>>
>>
>> Regards,
>>
>> Vaquar khan
>>
>>
>>
>> On Sun, Jun 11, 2017 at 3:12 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> I am trying hard to figure out what the real difference is between Kafka
>> Streaming and Spark Streaming, other than saying that one can be used as
>> part of microservices (since Kafka Streaming is just a library) and the
>> other is a standalone framework by itself.
>>
>>
>>
>> If I can accomplish the same job one way or the other, this is a sort of
>> puzzling question for me, so it would be great to know what Spark Streaming
>> can do that Kafka Streaming cannot do efficiently, or whatever.
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Regards,
>>
>> Vaquar Khan
>>
>> +1 -224-436-0783
>>
>> Greater Chicago
>>
>>
>>
>>
>>
>>
>
