I agree that the Spark runner should support submitting programmatically to a cluster, not instead of "spark-submit" but in addition to it, so that Beam users get a good portability experience while Spark users get a quick ramp-up into Beam.
You can follow https://issues.apache.org/jira/browse/BEAM-981 on this.

On Wed, Apr 5, 2017 at 6:25 PM Kenneth Knowles <[email protected]> wrote:

> The solution we want is removing the requirement of using spark-submit, so
> that you really do just run the same program with different flags to target
> different runners. I can see how in the near term this looks a bit odd.
>
> On Wed, Apr 5, 2017 at 3:42 AM, Antony Mayi <[email protected]> wrote:
>
> thanks guys,
>
> yes, it all makes sense. I actually have it implemented the way Ismaël is
> proposing (using the --runner= parameter) but just don't like the redundant
> syntax when submitting to e.g. a Spark cluster (spark-submit bla bla bla
> --runner SparkRunner), not to mention that it allows submitting to e.g. Spark
> while specifying a different runner (spark-submit bla bla bla --runner
> FlinkRunner). So I was really just looking for a cleaner, non-redundant
> syntax to submit the job.
>
> cheers,
> a.
>
> On Tuesday, 4 April 2017, 18:12, Thomas Groh <[email protected]> wrote:
>
> In addition to what Ismaël said, there was another reason why I wanted to
> be careful with this kind of automatic inference based on the classpath.
> When you're submitting a job that can potentially run forever, we want to
> be very explicit about it (since it can easily outlive the process you're
> submitting it from, and may not loudly signal that the job will still be
> active). The added complexity from requiring the runner type on the
> submitter's end is relatively low, especially given that most runners will
> already require additional configuration to function properly or at all.
>
> On Tue, Apr 4, 2017 at 6:01 AM, Ismaël Mejía <[email protected]> wrote:
>
> Antony, you can do this explicitly when building your pipeline from
> the command args:
>
> Options options =
>     PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
>
> and when you run your app you pass --runner=YourFavoriteRunner and it
> will resolve; however, different runners can need a bit of tuning. You
> can look at the examples module for how to enable profiles per runner,
> and for some instructions on how to execute this with Maven.
>
> https://github.com/apache/beam/tree/master/examples/java
>
> Also remember that if you run on a cluster you have to submit your
> jar, e.g. with spark-submit or flink run, and this will be different in
> that style of deployment.
>
> I am not sure that resolving the runners implicitly is a good thing,
> both because of the issue that Dan mentions (each runner may need to be
> tuned with different options) and because if we have multiple runners
> on the classpath we would need to define some priority to resolve them,
> and I don't think it is a good thing to prefer one runner over the others.
>
> Ismaël
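To make the explicit-selection argument from Thomas and Ismaël concrete, here is a minimal, stdlib-only Java sketch of the policy they describe: the runner is taken only from a `--runner=` argument, and the set of runners available on the classpath is used to validate that choice, never to pick one. `ExplicitRunner`, `runnerFlag` and `resolve` are hypothetical names for illustration only, not Beam API; in real Beam code the argument parsing is done by `PipelineOptionsFactory.fromArgs(args)`, and here the list of available runners is simply passed in as an assumption rather than discovered from the classpath.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (NOT part of the Beam SDK) sketching explicit runner
 * selection: the user must name the runner, and the classpath contents are
 * only used to check that the named runner is actually available.
 */
final class ExplicitRunner {
    private ExplicitRunner() {}

    /** Returns the value of the first --runner=... argument, if present. */
    static Optional<String> runnerFlag(String[] args) {
        final String prefix = "--runner=";
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return Optional.of(arg.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }

    /**
     * Resolves the runner from the command line only. If no --runner is given,
     * it fails instead of preferring one of the available runners, which is
     * the priority problem Ismaël points out.
     */
    static String resolve(String[] args, List<String> runnersOnClasspath) {
        String requested = runnerFlag(args).orElseThrow(() ->
                new IllegalArgumentException(
                        "No --runner given; refusing to guess among " + runnersOnClasspath));
        if (!runnersOnClasspath.contains(requested)) {
            throw new IllegalArgumentException(
                    "Runner " + requested + " is not on the classpath: " + runnersOnClasspath);
        }
        return requested;
    }
}
```

For example, `ExplicitRunner.resolve(new String[] {"--runner=SparkRunner"}, Arrays.asList("SparkRunner", "FlinkRunner"))` returns `"SparkRunner"`, while the same call without `--runner` throws rather than silently choosing one of the two runners.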
