Thanks gents for your guidelines. Appreciate the time.

This is my first shot at the benchmarking goals:
• What worked?
• What did not work?
• What were setup and usability like?
• What issues did we run into?
• How did we resolve these issues?
• Were we able to get the system operational?
• Were we able to get the results we wanted?
• Comparative Performance Analysis?

I am still researching the parameters I need to take into account for the last bullet, Comparative Performance Analysis.

Have a great weekend.

From: Jean-Baptiste Onofré <[email protected]>
To: [email protected]
Sent: Friday, May 27, 2016 1:33 AM
Subject: Re: Beam Flink vs Beam Spark Benchmarking

Hi Ismaël,

as discussed together, the pipeline code should clearly not use a runner-specific pipeline options object, so that it stays runner agnostic.

Something like:

    SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class)

should not be used.

It's better to use something like:

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

However, I think we may improve the factory a bit.

Regards
JB

On 05/27/2016 10:22 AM, Ismaël Mejía wrote:
>
> I spent last week running tests on multiple runners, and in theory
> you should not have to change many things. However, you must take care
> not to mix runner-specific dependencies when you create your project (e.g.
> you don't want to mix specific classes like FlinkPipelineOptions or
> SparkPipelineOptions in your code).
>
> Good practice for benchmarking is a trickier subject, e.g. you must be
> sure that both runners are using at least similar parallelism levels.
> Of course there are many dimensions in benchmarking, particularly in
> this space. The real question you have to start with is what you want
> to benchmark (throughput, resource utilisation, etc.). Is your pipeline
> batch only, or streaming too? Then try to create a scenario that you
> can reproduce, where you expect similar behaviour among runners.
>
> But one thing is clear: you have to expect some differences, since the
> internal model of each runner is different, as is their maturity
> level (at least at this point).
>
> Ismaël
>
>
> On Fri, May 27, 2016 at 1:19 AM, amir bahmanyari <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hi Colleagues,
>     I have implemented the Java version of the MIT's Linear Road
>     algorithm as a Beam app.
>     I sanity tested it in a Flink Cluster (FlinkRunner). Works fine.
>     Receives tuples from Kafka, executes the LR algorithm, and produces
>     the correct results.
>     I would like to repeat the same in a Spark cluster.
>     I am assuming that, other than changing the type of the Runner
>     (Flink vs Spark) at runtime, I should not make any code changes.
>     Is that the right assumption based on what Beam is promising
>     regarding unifying of the underlying streaming engines?
>
>     The real question is: What should I take into consideration if I
>     want to Benchmark Flink vs Spark by executing my same Beam LR app in
>     both engines?
>     How would you approach the benchmarking process? What would you be
>     looking for to compare? etc.
>     Thanks so much for your valuable time.
>     Amir-
>
>

--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
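
On the open "Comparative Performance Analysis" point: one possible starting place, following the suggestion above to first decide what to benchmark, is to reduce each run to a throughput figure and a latency percentile. The sketch below is only an illustration in plain Java with no Beam dependency. The class name RunnerComparison, its methods, and the idea that each run records a tuple count, an elapsed time, and per-tuple latencies are all assumptions, not anything Beam provides. The numbers in main are placeholders that show the arithmetic; they are not measurements from any runner.

```java
import java.util.Arrays;

// Hypothetical helper for summarising one benchmark run; not part of Beam.
final class RunnerComparison {

    /** Tuples processed per second, given a tuple count and wall-clock time. */
    public static double throughput(long tuples, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return tuples * 1000.0 / elapsedMillis;
    }

    /** Nearest-rank percentile of the latencies, for p in (0, 100]. */
    public static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no latencies recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Placeholder values only, to show how a run would be summarised.
        long tuples = 1000;
        long elapsedMillis = 500;
        long[] latenciesMillis = {12, 15, 11, 40, 13};

        System.out.printf("throughput: %.1f tuples/s%n", throughput(tuples, elapsedMillis));
        System.out.printf("median latency: %d ms%n", percentile(latenciesMillis, 50));
        System.out.printf("p95 latency: %d ms%n", percentile(latenciesMillis, 95));
    }
}
```

The same summary would be produced once per runner, with the same input and similar parallelism on both sides, so that the two sets of figures are comparable.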
