Thank you for your answer. I have a couple of follow-up questions:

1. Does it support the 'exactly once' semantics that Spark and Storm support?
2. (Related to 1) What happens when an error occurs during processing?
3. Is there a plan for adding Machine Learning support on top of Flink? Say Alternating Least Squares, basic Naive Bayes?
4. When you say Flink manages itself, does it mean I don't have to fiddle with the number of partitions (Spark) or the number of reducers / mappers (Hadoop?) to optimize performance? (In some cases this might be needed.)
5. How far along is the Python API? I don't see the specs on the website.

On Thu, Dec 25, 2014 at 4:31 AM, Márton Balassi <[email protected]> wrote:

> Dear Samarth,
>
> Besides the discussions you have mentioned [1] I can recommend one of our
> recent presentations [2], especially the distinguishing Flink section (from
> slide 16).
>
> It is generally a difficult question as both the systems are rapidly
> evolving, so the answer can become outdated quite fast. However there are
> fundamental design features that are highly unlikely to change, for example
> Spark uses "true" batch processing, meaning that intermediate results are
> materialized (mostly in memory) as RDDs. Flink's engine is internally more
> like streaming, forwarding the results to the next operator asap. The
> latter can yield performance benefits for more complex jobs. Flink also
> gives you a query optimizer, spills gracefully to disk when the system runs
> out of memory and has some cool features around serialization. For
> performance numbers and some more insight please check out the presentation
> [2] and do not hesitate to post a follow-up mail here if you come across
> something unclear or extraordinary.
>
> [1]
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark
> [2] http://www.slideshare.net/GyulaFra/flink-apachecon
>
> Best,
>
> Marton
>
> On Tue, Dec 23, 2014 at 6:19 PM, Samarth Mailinglist <
> [email protected]> wrote:
>
>> Hey folks, I have a noob question.
>>
>> I already looked up the archives and saw a couple of discussions
>> <http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark>
>> about Spark and Flink.
>>
>> I am familiar with spark (the python API, esp MLLib), and I see many
>> similarities between Flink and Spark.
>>
>> How does Flink distinguish itself from Spark?
>>
