I spent last week running tests on multiple runners. In theory you should not need to change much, but you must take care not to mix runner-specific dependencies into your project (e.g. you don't want runner-specific classes like FlinkPipelineOptions or SparkPipelineOptions in your code).
As for good practices for benchmarking, that is a trickier subject; e.g. you must make sure that both runners use at least similar parallelism levels. Of course there are many dimensions to benchmarking, particularly in this space. The real question to start with is what you want to benchmark (throughput, resource utilisation, etc.). Is your pipeline batch only, or streaming too? Then try to create a scenario that you can reproduce and where you expect similar behaviour among runners. But one thing is clear: you have to expect some differences, since the internal model of each runner is different, as is their maturity level (at least at this point).

Ismaël

On Fri, May 27, 2016 at 1:19 AM, amir bahmanyari <[email protected]> wrote:

> Hi Colleagues,
> I have implemented the Java version of the MIT's Linear Road algorithm as
> a Beam app.
> I sanity tested it in a Flink Cluster (FlinkRunner). Works fine.
> Receives tuples from Kafka, executes the LR algorithm, and produces the
> correct results.
> I would like to repeat the same in a Spark cluster.
> I am assuming that, other than changing the type of the Runner (Flink vs
> Spark) at runtime, I should not make any code changes.
> Is that the right assumption based on what Beam is promising regarding
> unifying of the underlying streaming engines?
>
> The real question is: What should I take into consideration if I want to
> Benchmark Flink vs Spark by executing my same Beam LR app in both engines?
> How would you approach the benchmarking process? What would you be looking
> for to compare? etc.
> Thanks so much for your valuable time.
> Amir-
>
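
P.S. To make the "reproducible scenario" point a bit more concrete, here is a minimal sketch in plain Java of how one might measure throughput the same way for each runner. It is only an illustration under stated assumptions: the class name ThroughputBenchmark and its methods are hypothetical, it does not use any Beam API, and the Runnable stands in for "run the same pipeline on a fixed input" with whichever runner you selected. It does a few unmeasured warm-up runs, then reports the median records-per-second over several measured runs so that a single slow run does not dominate.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Beam: repeatable throughput measurement. */
public class ThroughputBenchmark {

    /** Median of the samples; less sensitive to one outlier run than the mean. */
    public static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Records per second for a single run. */
    public static double throughput(long records, long elapsedNanos) {
        return records / (elapsedNanos / 1_000_000_000.0);
    }

    /**
     * Runs the job `warmups` times without measuring, then `runs` times with
     * timing, and returns the median throughput in records per second.
     * `recordsPerRun` must be the same fixed input size for every runner.
     */
    public static double measure(Runnable job, long recordsPerRun, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            job.run();
        }
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            job.run();
            // Guard against a zero reading from a very coarse clock.
            long elapsed = Math.max(1, System.nanoTime() - start);
            samples[i] = throughput(recordsPerRun, elapsed);
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): replace with code that submits the
        // same pipeline, on the same input, to the runner under test.
        Runnable job = () -> {
            long checksum = 0;
            for (long i = 0; i < 1_000_000; i++) {
                checksum += i;
            }
            if (checksum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        double recordsPerSecond = measure(job, 1_000_000, 2, 5);
        System.out.println("median throughput (records/s): " + recordsPerSecond);
    }
}
```

You would run this once per runner with the same input size and comparable parallelism settings, and compare the medians rather than individual runs. For a streaming pipeline, wall-clock time per run is probably not the right metric, so treat this as a starting point for the batch case only.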
