Thanks guys, yes, it all makes sense. I actually have it implemented the way Ismaël is proposing (using the --runner= parameter), but I just don't like the redundant syntax when submitting to e.g. a Spark cluster (spark-submit bla bla bla --runner=SparkRunner). Not to mention that it allows submitting to e.g. Spark while specifying a different runner (spark-submit bla bla bla --runner=FlinkRunner). So I was really just looking for some cleaner, non-redundant syntax to submit the job.

Cheers,
a.

On Tuesday, 4 April 2017, 18:12, Thomas Groh wrote:

In addition to what Ismaël said, there was another reason why I wanted to be careful with this kind of automatic inference based on the classpath. When you're submitting a job that can potentially run forever, we want to be very explicit about it (since it can easily outlive the process you're submitting it from, and may not loudly signal that the job will still be active). The added complexity from requiring the runner type on the submitter's end is relatively low, especially given that most runners will already require additional configuration to function properly or at all.

On Tue, Apr 4, 2017 at 6:01 AM, Ismaël Mejía wrote:

Antony,

You can do this explicitly when building your pipeline from the command args:

    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

and when you run your app you pass --runner=YourFavoriteRunner and it will resolve. However, different runners can need a bit of tuning. You can look at the examples module for how to enable profiles per runner, and for some instructions on how to execute this with Maven.

https://github.com/apache/beam/tree/master/examples/java

Also remember that if you run in a cluster you have to submit your jar, e.g. with spark-submit or flink run, and this will be different in that style of deployment.

I am not sure that resolving the runners implicitly is a good thing. Because of the issue that Dan mentions, each runner may need to be tuned with different options. Additionally, if we have multiple runners in the classpath we would need to define some priority to resolve them, and I don't think it is a good thing to prefer one runner over the others.

Ismaël
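
For the mismatch case mentioned in this thread (spark-submit combined with --runner=FlinkRunner), one possible guard is a small pre-flight check in the launcher's main method. The following is only a sketch in plain Java, not part of the Beam API: the class name RunnerArgCheck, its methods, and the hard-coded expected runner name are all hypothetical, and it assumes the --runner=Value form of the argument.

```java
import java.util.Optional;

/** Hypothetical pre-flight check; plain Java, not part of the Beam API. */
final class RunnerArgCheck {

    // Assumes the flag is written as --runner=Value (single argument).
    private static final String FLAG = "--runner=";

    private RunnerArgCheck() {}

    /** Returns the value of the first non-empty --runner=... argument, if any. */
    public static Optional<String> findRunner(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(FLAG) && arg.length() > FLAG.length()) {
                return Optional.of(arg.substring(FLAG.length()));
            }
        }
        return Optional.empty();
    }

    /**
     * Fails loudly unless --runner is present and equals the expected name.
     * Nothing is inferred from the classpath; the runner stays explicit.
     */
    public static String requireRunner(String[] args, String expected) {
        String runner = findRunner(args).orElseThrow(
            () -> new IllegalArgumentException("Missing --runner=<RunnerName>"));
        if (!runner.equals(expected)) {
            throw new IllegalArgumentException(
                "Expected --runner=" + expected + " but got --runner=" + runner);
        }
        return runner;
    }

    public static void main(String[] args) {
        // "SparkRunner" is hard-coded here purely for illustration, as in a
        // launcher jar that is only ever handed to spark-submit.
        String runner = requireRunner(args, "SparkRunner");
        System.out.println("Submitting with " + runner);
    }
}
```

This does not remove the redundancy on the command line, but it would turn a mismatched runner into an immediate error at submission time, before the args are handed on to PipelineOptionsFactory.fromArgs(args).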
