Re: automatic runner inference

Amit Sela Wed, 05 Apr 2017 12:20:45 -0700

I agree that the Spark runner should support submitting programmatically to
a cluster, not instead of "spark-submit" but in addition to it, so that Beam
users get a good portability experience while Spark users get a quick
ramp-up into Beam.


You can follow this at https://issues.apache.org/jira/browse/BEAM-981.

On Wed, Apr 5, 2017 at 6:25 PM Kenneth Knowles <[email protected]> wrote:

> The solution we want is removing the requirement of using spark-submit, so
> that you really do just run the same program with different flags to target
> different runners. I can see how in the near term this looks a bit odd.
>
> On Wed, Apr 5, 2017 at 3:42 AM, Antony Mayi <[email protected]> wrote:
>
> Thanks, guys.
>
> Yes, it all makes sense. I actually have it implemented the way Ismaël is
> proposing (using the --runner= parameter), but I don't like the redundant
> syntax when submitting to, e.g., a Spark cluster (spark-submit bla bla bla
> --runner SparkRunner), not to mention that it allows submitting to Spark
> while specifying a different runner (spark-submit bla bla bla --runner
> FlinkRunner). So I was really just looking for a cleaner, non-redundant
> syntax to submit the job.
>
> cheers,
> a.
>
>
>
>
> On Tuesday, 4 April 2017, 18:12, Thomas Groh <[email protected]> wrote:
>
>
> In addition to what Ismaël said, there was another reason why I wanted to
> be careful with this kind of automatic inference based on the classpath.
> When you're submitting a job that can potentially run forever, we want to
> be very explicit about it (since it can easily outlive the process you're
> submitting it from, and may not loudly signal that the job will still be
> active). The added complexity from requiring the runner type on the
> submitter's end is relatively low, especially given that most runners will
> already require additional configuration to function properly or at all.
>
> On Tue, Apr 4, 2017 at 6:01 AM, Ismaël Mejía <[email protected]> wrote:
>
> Antony, you can do this explicitly when building your pipeline from
> the command-line args:
>
> Options options =
>     PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
>
> When you run your app, you pass --runner=YourFavoriteRunner and it
> will resolve. However, different runners may need a bit of tuning. You
> can look at the examples module to see how to enable profiles per runner,
> along with some instructions on how to execute this with Maven.
>
> https://github.com/apache/beam/tree/master/examples/java
>
> Also remember that if you run on a cluster you have to submit your
> jar, e.g. with spark-submit or flink run, so the invocation will be
> different in that style of deployment.
>
> I am not sure that resolving the runners implicitly is a good thing.
> As Dan mentions, each runner may need to be tuned with different
> options. Additionally, if we have multiple runners on the classpath we
> would need to define some priority to resolve them, and I don't think
> it is a good thing to prefer one runner over the others.
>
> Ismaël
>
>
>
>
>
>
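
To make the trade-off discussed in the thread concrete, here is a minimal sketch of explicit versus inferred runner selection. It is not Beam code and does not model how Beam actually looks up runners: the class RunnerResolution, its methods, and the list of "available" runner names are all hypothetical, standing in for whatever runners happen to be on the classpath. It only illustrates the point that an explicit --runner= flag is unambiguous, while inference breaks down as soon as more than one runner is present.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from Beam.
final class RunnerResolution {

    // Returns the value of the first --runner=NAME argument, if present.
    public static Optional<String> explicitRunner(String[] args) {
        String prefix = "--runner=";
        return Arrays.stream(args)
                .filter(arg -> arg.startsWith(prefix))
                .map(arg -> arg.substring(prefix.length()))
                .findFirst();
    }

    // An explicit flag wins, but must name an available runner.
    // Without a flag, inference succeeds only when exactly one runner
    // is available; otherwise no runner is preferred over another.
    public static String resolve(String[] args, List<String> available) {
        Optional<String> explicit = explicitRunner(args);
        if (explicit.isPresent()) {
            String name = explicit.get();
            if (!available.contains(name)) {
                throw new IllegalArgumentException(
                        "Unknown runner " + name + "; available: " + available);
            }
            return name;
        }
        if (available.size() == 1) {
            return available.get(0);
        }
        throw new IllegalStateException(
                "Cannot infer a runner from " + available + "; pass --runner=<name>");
    }

    public static void main(String[] args) {
        // Assumed stand-in for the runners found on the classpath.
        List<String> available = List.of("SparkRunner", "FlinkRunner");

        // Explicit selection is unambiguous.
        System.out.println(resolve(new String[] {"--runner=SparkRunner"}, available));

        // Inference with two candidates has no principled answer.
        try {
            resolve(new String[0], available);
        } catch (IllegalStateException e) {
            System.out.println("Inference failed: " + e.getMessage());
        }
    }
}
```

Failing loudly in the ambiguous case, rather than picking a runner by some priority order, mirrors Ismaël's objection to preferring one runner over the others.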
