Re: Beam Flink vs Beam Spark Benchmarking

amir bahmanyari Fri, 27 May 2016 12:02:56 -0700

Thanks gents for your guidelines. Appreciate the time. This is my first shot
at the benchmarking goals:
            • What worked?
            • What did not work?
            • What was setup and usability like?
            • What issues did we run into?
            • How did we resolve these issues?
            • Were we able to get the system operational?
            • Were we able to get the results we wanted?
            • Comparative Performance Analysis?
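
For the Comparative Performance Analysis item, one possible starting point
is to record the same measurement for several runs on each runner and
compare a summary statistic. The following is only a minimal sketch,
assuming throughput (tuples per second) is the metric of interest. The
class and method names are hypothetical, nothing here is Beam API, and the
numbers in main are placeholders, not measurements.

```java
import java.util.Arrays;

public class RunnerComparison {

    // Throughput of a single run, in tuples per second.
    static double throughput(long tuples, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return tuples * 1000.0 / elapsedMillis;
    }

    // Median of several runs; less sensitive to one slow outlier than the mean.
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only (hypothetical, not measured):
        // the same tuple count replayed three times, with made-up durations.
        long tuples = 1_000_000L;
        long[] elapsedMillis = {2_000L, 2_500L, 4_000L};

        double[] perRun = new double[elapsedMillis.length];
        for (int i = 0; i < elapsedMillis.length; i++) {
            perRun[i] = throughput(tuples, elapsedMillis[i]);
        }
        System.out.println("median tuples/sec: " + median(perRun));
    }
}
```

The same helper would be fed with the timings collected from each runner,
keeping the input data and parallelism identical between them.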


I am still researching the parameters I need to take into account for the last
bullet: Comparative Performance Analysis.

Have a great weekend.

      From: Jean-Baptiste Onofré <[email protected]>
 To: [email protected] 
 Sent: Friday, May 27, 2016 1:33 AM
 Subject: Re: Beam Flink vs Beam Spark Benchmarking
   
Hi Ismaël,

As we discussed, the pipeline code should not use a runner-specific
pipeline options object, so that it stays runner agnostic.

Something like:

  SparkPipelineOptions options =
      PipelineOptionsFactory.as(SparkPipelineOptions.class);

should not be used.
It's better to use something like:

  PipelineOptions options =
      PipelineOptionsFactory.fromArgs(args).withValidation().create();


However, I think we could improve the factory a bit.

Regards
JB

On 05/27/2016 10:22 AM, Ismaël Mejía wrote:
> 
> I spent last week running tests on multiple runners, and in theory
> you should not have to change much. However, you must take care not to
> mix runner-specific dependencies when you create your project (e.g.
> you don't want to mix specific classes like FlinkPipelineOptions or
> SparkPipelineOptions in your code).
>
> Good practices for benchmarking are a trickier subject. For example,
> you must be sure that both runners are using at least similar
> parallelism levels. Of course there are many dimensions in
> benchmarking, and in this space in particular. The real question you
> have to start with is: what do you want to benchmark (throughput,
> resource utilisation, etc.)? Is your pipeline batch only, or streaming
> too? Then try to create a scenario that you can reproduce and where you
> expect similar behaviour among runners.
>
> But one thing is clear, you have to expect some differences since the
> internal model of each runner is different as well as their maturity
> level (at least at this point).
>
> Ismaël
>
>
> On Fri, May 27, 2016 at 1:19 AM, amir bahmanyari <[email protected]
> <mailto:[email protected]>> wrote:
>
>    Hi Colleagues,
>    I have implemented a Java version of MIT's Linear Road
>    algorithm as a Beam app.
>    I sanity tested it in a Flink Cluster (FlinkRunner). Works fine.
>    Receives tuples from Kafka, executes the LR algorithm, and produces
>    the correct results.
>    I would like to repeat the same in a Spark cluster.
>    I am assuming that, other than changing the type of the Runner
>    (Flink vs Spark) at runtime, I should not make any code changes.
>    Is that the right assumption based on what Beam is promising
>    regarding unifying of the underlying streaming engines?
>
>    The real question is: What should I take into consideration if I
>    want to Benchmark Flink vs Spark by executing my same Beam LR app in
>    both engines?
>    How would you approach the benchmarking process? What would you be
>    looking for to compare? etc.
>    Thanks so much for your valuable time.
>    Amir-
>
>
>

-- 
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
