Re: Comparison of storm and flink
Hi,

I don't have a perfect list available, but these are some of the things to keep in mind:

1) End-to-end latency. Some systems (like Spark) use micro-batching, which introduces a latency of seconds.
2) Do you get "exactly once" guarantees? Storm can give you that, but then the throughput drops significantly.
3) Ease of programming. How 'nice' is the API you have to work with?
4) Resilience of state. If you need to keep state across several events, does the framework support this, and does it have built-in recovery of this state in case of a failure?
5) Tools. What kind of "ready to run" tools are available? E.g. k-means and things like that.
6) Deployment. How do you run it? Do you need a separate infrastructure, or can you deploy it on an existing YARN/Mesos/... cluster?
7) Security. Can it access Kerberos-secured resources (like HBase, HDFS or any other service) in a long-running situation?

As a final note: I've been hacking at Storm for over a year now, and last summer I found Flink. Today Storm is no longer an option for me, and we are taking down what we already had running.

Niels Basjes

On 23 Jan 2016 20:59, "Vinaya M S" wrote:

> Hi Flink user group,
>
> I am working on a project for the Insight Data Engineering Program in New
> York to compare streaming tools. The program is designed for software
> engineers and those straight from the university to transition to a data
> engineering role. After completing the project, we present demos of the
> project to several companies in NYC that we are interested in working for
> (including top companies like NY Times, Capital One, Bloomberg, etc).
>
> I have decided to work on a project to compare streaming tools, namely
> Flink and Storm. I already have Twitter data stored and would like to
> design tests to benchmark the two tools if possible.
>
> I wanted to be extra-careful in constructing a benchmark to work on and
> present at companies here in NY. Do you have any recommendations for tests
> to run with the Twitter data that I have that would showcase when to use,
> and when not to use, Flink compared to Storm?
>
> Thanks!
> Vinaya
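For criterion 1 in the list above (end-to-end latency), one engine-independent way to compare the two tools might be to stamp each tweet with the time it enters the pipeline, record the time it reaches the sink, and summarize the differences offline. The following is only a minimal sketch under that assumption; the record layout and the function names (`latencies_ms`, `percentile`, `summarize`) are hypothetical and not part of Flink or Storm.

```python
import math
from typing import Dict, Iterable, List, Tuple


def latencies_ms(records: Iterable[Tuple[int, int]]) -> List[int]:
    """Turn (ingest_time_ms, sink_time_ms) pairs into per-event latencies.

    Assumption: both timestamps come from clocks that are in sync,
    for example the same machine or NTP-synchronized hosts.
    """
    return [sink - ingest for ingest, sink in records]


def percentile(values: List[int], p: float) -> int:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Summarize end-to-end latency for one benchmark run."""
    lat = latencies_ms(records)
    return {
        "count": len(lat),
        "p50_ms": percentile(lat, 50),
        "p99_ms": percentile(lat, 99),
        "max_ms": max(lat),
    }
```

Running the same summary over the sink output of a Flink job and a Storm topology fed with identical input would give comparable numbers. Reporting percentiles rather than an average is a deliberate choice here, since a small number of very slow events (for instance, ones that wait for a batch boundary) would otherwise be hidden.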
Comparison of storm and flink
Hi Flink user group,

I am working on a project for the Insight Data Engineering Program in New York to compare streaming tools. The program is designed for software engineers and those straight from the university to transition to a data engineering role. After completing the project, we present demos of the project to several companies in NYC that we are interested in working for (including top companies like NY Times, Capital One, Bloomberg, etc).

I have decided to work on a project to compare streaming tools, namely Flink and Storm. I already have Twitter data stored and would like to design tests to benchmark the two tools if possible.

I wanted to be extra-careful in constructing a benchmark to work on and present at companies here in NY. Do you have any recommendations for tests to run with the Twitter data that I have that would showcase when to use, and when not to use, Flink compared to Storm?

Thanks!
Vinaya
Re: Comparison of storm and flink
Hi Vinaya,

1. Comparing streaming tools (in this case Storm and Flink) should not be based on performance benchmarks only! For example, slides 16-36 list over 96 criteria that we identified at Capital One to compare two streaming tools: http://www.slideshare.net/sbaltagi/flink-vs-spark/17

2. Now, if you are focusing on performance only, I'd suggest a few related resources:

- Benchmarking Streaming Computation Engines at Yahoo! (December 16, 2015): http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
  Code on GitHub: https://github.com/yahoo/streaming-benchmarks
- Some Flink contributors have started work on performance scripts for Flink, Spark, and MapReduce (Apache Flink: Performance and Testing): https://github.com/project-flink/flink-perf
- Some first numbers on the performance of streaming jobs with Apache Flink are in the section 'Show me the numbers' here: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
  The code used is at: https://github.com/dataArtisans/performance
- Yangjun Wang is currently working on his Master's thesis at Aalto University in Helsinki, Finland. The topic of his thesis is building a standard benchmark system for stream processing systems like Apache Storm, Spark and Flink. Code on GitHub: https://github.com/wangyangjun/StreamBench/tree/master/StreamBench

3. I am giving a talk in NYC on Tuesday, February 2nd, 2016 on Apache Flink, and I will be touching a bit on benchmarks: http://www.meetup.com/New-York-City-NYC-Apache-Flink-Meetup/events/228113118/
You are welcome to attend.

Thanks,
Slim Baltagi