Hi Georg -- Great to see you are evaluating Beam for your scenario.

> someone told me that e.g. the flink runner for beam seems to be slower than a
> native flink job. Is this true? Did you observe such characteristics for several
> runners?

This should not be true in a general sense -- the performance should be roughly equivalent. The Flink runner in Beam constructs a "native" Flink pipeline; the overhead of invoking a user-defined function typically amounts to setting a few fields and making a function call, which is negligible. The actual performance of a pipeline tends to depend on other factors -- stragglers, how quickly the system can adapt to changing load, etc. (If there's a gap somewhere, it is likely a bug -- and we'd like to know about it and fix it.)

> in case I want to use some low level functionality (specific to a runner) like
> ML, graph processing or sql-tables api, is it possible to just drop from the
> beam API one level deeper to the actual runner and sort of mesh beam with runner
> native code to integrate these features?

The Beam API, in a general sense, doesn't provide such hooks, as that would break portability. I wouldn't advise this, but technically it wouldn't be hard -- you'd create a PTransform in Beam, and modify the runner to replace it with its own specific implementation. Instead, I'd suggest using Beam's abstractions and, if a pattern or feature is missing, working with us to augment the Beam model accordingly.

Hope this helps -- and that you find Beam a good fit for your case. Please let us know if we can assist any further -- thanks!

Davor
