Re: Beam Flink vs Beam Spark Benchmarking

Ismaël Mejía Fri, 27 May 2016 01:23:15 -0700


I spent last week running tests on multiple runners, and in theory you
should not need to change much. However, you must take care not to mix
runner-specific dependencies when you create your project (e.g. you don't
want to reference runner-specific classes like FlinkPipelineOptions or
SparkPipelineOptions in your code).

Good practice for benchmarking is a trickier subject. For example, you
must make sure that both runners use at least similar parallelism levels.
Benchmarking has many dimensions, particularly in this space, so the real
question to start with is what you want to benchmark (throughput, resource
utilisation, etc.). Is your pipeline batch only, or streaming too? Then
try to create a scenario that you can reproduce and where you expect
similar behaviour across runners.
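
To make that concrete, here is a rough sketch in plain Java (no Beam
dependency) of how repeated runs could be summarised once you have picked
throughput as the metric. Everything in it is an assumption for
illustration: the RunnerComparison and Run names are hypothetical, and the
numbers in main are placeholders, not measurements. It only checks that
both sets of runs used the same parallelism and reports the median
throughput of each set.

```java
/** Hypothetical helper for summarising repeated benchmark runs; not part of Beam. */
final class RunnerComparison {

    /** One measured run of the same pipeline on one runner. */
    static final class Run {
        final String runner;    // label only, e.g. "flink" or "spark"
        final int parallelism;  // parallelism the run was configured with
        final long records;     // number of input records processed
        final double seconds;   // wall-clock duration of the run

        Run(String runner, int parallelism, long records, double seconds) {
            this.runner = runner;
            this.parallelism = parallelism;
            this.records = records;
            this.seconds = seconds;
        }

        /** Records processed per second for this run. */
        double throughput() {
            return records / seconds;
        }
    }

    /** Median throughput over repeated runs, which is less sensitive to outliers than the mean. */
    static double medianThroughput(Run[] runs) {
        if (runs.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        double[] t = new double[runs.length];
        for (int i = 0; i < runs.length; i++) {
            t[i] = runs[i].throughput();
        }
        java.util.Arrays.sort(t);
        int n = t.length;
        return n % 2 == 1 ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
    }

    /** True only if every run in both sets used the same parallelism. */
    static boolean sameParallelism(Run[] a, Run[] b) {
        if (a.length == 0 || b.length == 0) {
            return false;
        }
        int p = a[0].parallelism;
        for (Run r : a) {
            if (r.parallelism != p) return false;
        }
        for (Run r : b) {
            if (r.parallelism != p) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Placeholder numbers for illustration only, NOT real measurements.
        Run[] flink = {
            new Run("flink", 4, 1_000_000L, 50.0),
            new Run("flink", 4, 1_000_000L, 40.0),
            new Run("flink", 4, 1_000_000L, 45.0),
        };
        Run[] spark = {
            new Run("spark", 4, 1_000_000L, 50.0),
            new Run("spark", 4, 1_000_000L, 40.0),
            new Run("spark", 4, 1_000_000L, 45.0),
        };
        if (!sameParallelism(flink, spark)) {
            System.out.println("Parallelism differs; the comparison is not meaningful.");
            return;
        }
        System.out.printf("flink median records/s: %.1f%n", medianThroughput(flink));
        System.out.printf("spark median records/s: %.1f%n", medianThroughput(spark));
    }
}
```

The same shape would work for another metric (latency, CPU time): swap
throughput() for whatever you decided to measure, and keep the input data
and cluster setup identical between runs so the scenario stays reproducible.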

But one thing is clear: you have to expect some differences, since the
internal model of each runner is different, as is their maturity level
(at least at this point).

Ismaël

On Fri, May 27, 2016 at 1:19 AM, amir bahmanyari <[email protected]>
wrote:

> Hi Colleagues,
> I have implemented the Java version of the MIT's Linear Road algorithm as
> a Beam app.
> I sanity tested it in a Flink Cluster (FlinkRunner). Works fine.
> Receives tuples from Kafka, executes the LR algorithm, and produces the
> correct results.
> I would like to repeat the same in a Spark cluster.
> I am assuming that, other than changing the type of the Runner (Flink vs
> Spark) at runtime, I should not make any code changes.
> Is that the right assumption based on what Beam is promising regarding
> unifying of the underlying streaming engines?
>
> The real question is: What should I take into consideration if I want to
> Benchmark Flink vs Spark by executing my same Beam LR app in both engines?
> How would you approach the benchmarking process? What would you be looking
> for to compare? etc.
> Thanks so much for your valuable time.
> Amir-
>
