Another difference I see is that Spark SQL has logical plans, physical plans, code generation, and the optimizations that come with them. I don't see anything like that in Kafka Streaming at this time.
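
As a rough illustration of what plan-level optimization means, here is a toy sketch in plain Python. It does not use Spark or Kafka, and the `Filter`, `Project`, `optimize` and `execute` names are hypothetical, invented for this example only. It shows the general idea: the query is first described as a plan, the plan is rewritten (here by one simple rule that merges adjacent filters), and only then is it executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, object]


@dataclass
class Filter:
    """Plan node (hypothetical): keep rows for which the predicate is true."""
    predicate: Callable[[Row], bool]


@dataclass
class Project:
    """Plan node (hypothetical): keep only the named columns."""
    columns: List[str]


Plan = List[Union[Filter, Project]]


def optimize(plan: Plan) -> Plan:
    """Toy rewrite rule: merge adjacent Filter nodes into a single Filter."""
    optimized: Plan = []
    for node in plan:
        if isinstance(node, Filter) and optimized and isinstance(optimized[-1], Filter):
            prev = optimized.pop().predicate
            cur = node.predicate
            # Bind prev/cur as defaults so each merged lambda keeps its own pair.
            optimized.append(Filter(lambda row, p=prev, c=cur: p(row) and c(row)))
        else:
            optimized.append(node)
    return optimized


def execute(plan: Plan, rows: List[Row]) -> List[Row]:
    """Run the plan node by node; each node makes one pass over the rows."""
    for node in plan:
        if isinstance(node, Filter):
            rows = [r for r in rows if node.predicate(r)]
        else:
            rows = [{c: r[c] for c in node.columns} for r in rows]
    return rows


rows = [
    {"user": "a", "age": 31, "country": "FR"},
    {"user": "b", "age": 17, "country": "FR"},
    {"user": "c", "age": 45, "country": "US"},
]

# The plan as the user wrote it: two separate filters, then a projection.
logical_plan: Plan = [
    Filter(lambda r: r["age"] > 18),
    Filter(lambda r: r["country"] == "FR"),
    Project(["user"]),
]

# After the rewrite there is one filter pass instead of two.
optimized_plan = optimize(logical_plan)
result = execute(optimized_plan, rows)
print(len(logical_plan), len(optimized_plan), result)
```

The rewritten plan returns the same rows as the original one but with fewer passes over the data. A real optimizer applies many such rules and also chooses a physical execution strategy, which this sketch does not attempt.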

On Sun, Jun 11, 2017 at 2:19 PM, kant kodali <kanth...@gmail.com> wrote:

> I appreciate the responses; however, I see the other side of the argument,
> and I actually feel they are now competitors in the streaming space in some
> sense.
>
> Kafka Streaming can indeed do map, reduce, join and window operations.
> Likewise, data can be ingested into Kafka from many sources and the
> results sent out to many sinks. Look up "Kafka Connect".
>
> Regarding event-at-a-time vs. micro-batch: I hear one group of people
> saying Spark Streaming is real time and another group saying Kafka
> Streaming is the true real time. So do we say micro-batch is real time,
> or event-at-a-time is real time?
>
> It is a well-known fact that Spark is more popular with data scientists who
> want to run ML algorithms and so on, but I also hear that people can use the
> H2O package along with Kafka Streaming. How efficient each of these
> approaches is, I have no clue.
>
> The major difference I see is actually the *Spark Scheduler*. I don't
> think Kafka Streaming has anything like it; instead, it just lets you
> run lambda expressions on a stream and write the output to a specific
> topic/partition, and from there one can use Kafka Connect to write it out to
> any sink. In short, all the optimizations built into the Spark scheduler
> don't seem to exist in Kafka Streaming, so if I were deciding which
> framework to use, this is an additional question I would think about:
> "Do I want my stream to go through the scheduler, and if so, why or why
> not?"
>
> Above all, please correct me if I am wrong :)
>
> On Sun, Jun 11, 2017 at 12:41 PM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
>> Just to elaborate on what Vincent wrote: Kafka Streaming provides true
>> record-at-a-time processing capabilities, whereas Spark Streaming provides
>> micro-batching capabilities on top of Spark. Depending on your use case,
>> you may find one better than the other.
>> Both provide stateless and stateful stream processing capabilities.
>>
>> A few more things to consider:
>>
>> 1. If you don’t already have a Spark cluster but have a Kafka cluster,
>>    it may be easier to use Kafka Streaming, since you don’t need to set up
>>    and manage another cluster.
>> 2. On the other hand, if you already have a Spark cluster but don’t
>>    have a Kafka cluster (in case you are using some other messaging
>>    system), Spark Streaming is a better option.
>> 3. If you already know and use Spark, you may find it easier to
>>    program with the Spark Streaming API even if you are using Kafka.
>> 4. Spark Streaming may give you better throughput. So you have to
>>    decide what is more important for your stream processing application:
>>    latency or throughput.
>> 5. Kafka Streaming is relatively new and less mature than Spark
>>    Streaming.
>>
>> Mohammed
>>
>> *From:* vincent gromakowski [mailto:vincent.gromakow...@gmail.com]
>> *Sent:* Sunday, June 11, 2017 12:09 PM
>> *To:* yohann jardin <yohannjar...@hotmail.com>
>> *Cc:* kant kodali <kanth...@gmail.com>; vaquar khan <vaquar.k...@gmail.com>; user <user@spark.apache.org>
>> *Subject:* Re: What is the real difference between Kafka streaming and Spark Streaming?
>>
>> I think Kafka Streams is good when the processing of each row is
>> independent of the others (row parsing, data cleaning...).
>>
>> Spark is better when processing groups of rows (group by, ML, window
>> functions...).
>>
>> On June 11, 2017 at 8:15 PM, "yohann jardin" <yohannjar...@hotmail.com> wrote:
>>
>> Hey,
>>
>> Kafka can also do streaming on its own: https://kafka.apache.org/documentation/streams
>> I don’t know much about it, unfortunately. I can only repeat what I heard
>> at conferences: that one should give Kafka Streaming a try when the
>> whole pipeline is using Kafka. I have no pros/cons to argue on this
>> topic.
>>
>> *Yohann Jardin*
>>
>> On 6/11/2017 at 7:08 PM, vaquar khan wrote:
>>
>> Hi Kant,
>>
>> Kafka is the message broker, used through producers and consumers, and
>> Spark Streaming is used for the real-time processing. Kafka and Spark
>> Streaming work together; they are not competitors.
>>
>> Spark Streaming reads data from Kafka and processes it in micro-batches.
>> In simple terms, it collects data for some time, builds an RDD, and then
>> processes these micro-batches.
>>
>> Please read the docs: https://spark.apache.org/docs/latest/streaming-programming-guide.html
>>
>> Spark Streaming is an extension of the core Spark API that enables
>> scalable, high-throughput, fault-tolerant stream processing of live data
>> streams. Data can be ingested from many sources like *Kafka, Flume,
>> Kinesis, or TCP sockets*, and can be processed using complex algorithms
>> expressed with high-level functions like map, reduce, join and window.
>> Finally, processed data can be pushed out to filesystems, databases, and
>> live dashboards. In fact, you can apply Spark’s machine learning
>> <https://spark.apache.org/docs/latest/ml-guide.html> and graph processing
>> <https://spark.apache.org/docs/latest/graphx-programming-guide.html>
>> algorithms on data streams.
>>
>> Regards,
>>
>> Vaquar khan
>>
>> On Sun, Jun 11, 2017 at 3:12 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I am trying hard to figure out what the real difference is between Kafka
>> Streaming and Spark Streaming, other than saying one can be used as part of
>> microservices (since Kafka Streaming is just a library) and the other is a
>> standalone framework by itself.
>>
>> If I can accomplish the same job one way or the other, this is a
>> puzzling question for me, so it would be great to know what Spark Streaming
>> can do that Kafka Streaming cannot do efficiently, or whatever.
>>
>> Thanks!
>>
>> --
>> Regards,
>> Vaquar Khan
>> +1 -224-436-0783
>> Greater Chicago
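
To make the event-at-a-time vs. micro-batch distinction from the quoted messages concrete, here is a minimal sketch in plain Python. It uses neither Kafka nor Spark, and the function names are hypothetical. One simplifying assumption: batches are cut by record count so the example is deterministic, whereas Spark Streaming cuts batches by a time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_record_at_a_time(
    events: Iterable[int], handle: Callable[[int], int]
) -> Iterator[int]:
    """Emit one result per event, as soon as that event arrives."""
    for event in events:
        yield handle(event)


def process_micro_batches(
    events: Iterable[int],
    batch_size: int,
    handle_batch: Callable[[List[int]], int],
) -> Iterator[int]:
    """Buffer events and emit one result per full batch."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:
        # Flush the last, possibly smaller, batch.
        yield handle_batch(batch)


events = [1, 2, 3, 4, 5]

# Record-at-a-time: five inputs produce five outputs, each available immediately.
per_record = list(process_record_at_a_time(events, lambda x: x * 2))

# Micro-batch: the first output appears only after two events have been buffered.
per_batch = list(process_micro_batches(events, 2, sum))

print(per_record)
print(per_batch)
```

The trade-off mentioned in the thread shows up here in miniature: the per-record version has the lowest delay per event, while the batched version waits for a batch to fill but can amortize per-call overhead across several records.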