Re: Portable Flink Runner plan

2018-03-08 Thread Kenneth Knowles
I want to nitpick slightly the wording of "Java-only runner". I would like/expect that a runner using some specialized Java execution paths would still be accepting a portable pipeline and using the URNs and URLs to pick out special codepaths, so it is still different than just leaving the old code

Re: Portable Flink Runner plan

2018-03-08 Thread Robert Bradshaw
All runners should support portable execution for Java, which should be just as easy as supporting execution of non-Java pipelines over this API. As for non-portable "specialized" execution of Java, I think it's a tradeoff between the overhead of the portability framework vs. the maintenance cost

Re: Portable Flink Runner plan

2018-03-08 Thread Kenneth Knowles
+1 to Luke's answer of "yes" for everything to be "portable by default". However, I (always) favor decentralizing this decision as long as the "Beam model" is respected. Baseline: - the input pipeline should always be in portable format - the results of execution should match portable execution

Re: Portable Flink Runner plan

2018-03-08 Thread Lukasz Cwik
I ran some very pessimistic pipelines that were shuffle heavy (Random KV -> GBK -> IdentityDoFn) and found that the performance overhead was 15% when executed with Dataflow. This is a while back and there was a lot of inefficiencies due to coder encode/decode cycles and based upon profiling informa

Re: Portable Flink Runner plan

2018-03-08 Thread Thomas Weise
Performance, due to the extra gRPC hop. On Thu, Mar 8, 2018 at 11:08 AM, Lukasz Cwik wrote: > The goal is to use containers (and similar technologies) in the future. It > really hinders pipeline portability between runners if you also have to > deal with the dependency conflicts between Flink/D

Re: Portable Flink Runner plan

2018-03-08 Thread Lukasz Cwik
The goal is to use containers (and similar technologies) in the future. It really hinders pipeline portability between runners if you also have to deal with the dependency conflicts between Flink/Dataflow/Spark/... execution runtimes. What kinds of penalty are you referring to (perf, user complexi

Re: Portable Flink Runner plan

2018-03-08 Thread Thomas Weise
I'm curious if pipelines that are exclusively Java will be executed (when running on Flink or other JVM based runnner) in separate harness containers also? This would impose a significant penalty compared to the current execution model. Will this be something the user can control? Thanks, Thomas

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
@Axel I assigned https://issues.apache.org/jira/browse/BEAM-2588 to you. It might make sense to also grab other issues that you're already working on. > On 7. Mar 2018, at 21:18, Aljoscha Krettek wrote: > > Cool, so we had the same ideas. I thi

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
Cool, so we had the same ideas. I think this indicates that we're not completely on the wrong track with this! ;-) Aljoscha > On 7. Mar 2018, at 21:14, Thomas Weise wrote: > > Ben, > > Looks like we hit the send button at the same time. Is the plan the to derive > the Flink implementation of

Re: Portable Flink Runner plan

2018-03-07 Thread Thomas Weise
Ben, Looks like we hit the send button at the same time. Is the plan the to derive the Flink implementation of the various execution services from those under org.apache.beam.runners.fnexecution ? Thanks On Wed, Mar 7, 2018 at 11:02 AM, Thomas Weise wrote: > What's the plan for the endpoints t

Re: Portable Flink Runner plan

2018-03-07 Thread Thomas Weise
What's the plan for the endpoints that the Flink operator needs to provide (control/data plane, state, logging)? Is the intention to provide base implementations that can be shared across runners and then implement the Flink specific parts on top of it? Has work started on those? If there are subt

Re: Portable Flink Runner plan

2018-03-07 Thread Ben Sidhom
With respect to sharing code for rewriting pipelines: we've already written a few utilities for pipeline fusion and rewriting transforms to work with portable runners. Fusion functions the same way as in the ULR and is as simple as a single method call. However, two things prevent us from complete

Re: Portable Flink Runner plan

2018-03-07 Thread Axel Magnuson
My current solution is sort of a middle ground between the two. I have made a lot of the portable API service logic generalizable, and it relies on the runner implementing a few intefaces to use it. It doesn't use decorators, but my hope is that it will prevent the need for each runner to complet

Re: Portable Flink Runner plan

2018-03-07 Thread Romain Manni-Bucau
Open question: did you think to a way to run the portable api on top of any runner to implement it once? Since runners have primitive it should be doable and avoid a per runner codebase, no? Other benefit: no direct portable api code in runners, yeah :). (Im thinking to a runner decorator or orches

Re: Portable Flink Runner plan

2018-03-07 Thread Ben Sidhom
Yes, Axel has started work on such a shim. Our plan in the short term is to keep the old FlinkRunner around and to call into it to process jobs from the job service itself. That way we can keep the non-portable runner fully-functional while working on portability. Eventually, I think it makes sens

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
Hi, Has anyone started on https://issues.apache.org/jira/browse/BEAM-2588 (FlinkRunner shim for serving Job API). If not I would start on that. My plan is to implement a FlinkJobService that implements JobServiceImplBase, similar to ReferenceRu