Thanks for the information. So it looks like we can't easily run portable
pipelines on a Dataproc cluster at the moment.
> you can set --output_executable_path to create a jar that you can then
submit to yarn via spark-submit.
I tried to create a jar, but I ran into a problem. I left an error
> So hopefully setting --spark-master-url to be yarn will work too.
This is not supported.
On Tue, Jun 23, 2020 at 2:58 PM Xinyu Liu wrote:
> I am doing some prototyping on this too. I used spark-submit script
> instead of the rest api. In my simple setup, I ran
> SparkJobServerDriver.main()
I am doing some prototyping on this too. I used spark-submit script instead
of the rest api. In my simple setup, I ran SparkJobServerDriver.main()
directly in the AM as a spark job, which will submit the python job to the
default spark master url pointing to "local". I also use --files in the
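The setup Xinyu describes — running the Spark job server itself inside the application master — might be launched along these lines. This is only a sketch: the jar file name is a placeholder for the shaded `beam-runners-spark` job-server artifact, and the cluster details depend on your environment.

```shell
# Launch the Beam Spark job server as the Spark application itself,
# so SparkJobServerDriver.main() runs in the YARN application master.
# The job server then submits the Python pipeline against the default
# "local" master, as described above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.beam.runners.spark.SparkJobServerDriver \
  beam-runners-spark-job-server.jar \
  --spark-master-url=local
```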
Hi Kamil, there is a JIRA for this:
https://issues.apache.org/jira/browse/BEAM-8970 It's theoretically possible
but remains untested as far as I know :)
As I indicated in a comment, you can set --output_executable_path to create
a jar that you can then submit to yarn via spark-submit.
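For concreteness, the jar-based flow might look roughly like this (`my_pipeline.py` and the output path are placeholders; `--output_executable_path` is the Beam Python SDK flag mentioned above):

```shell
# Step 1: run the Python pipeline with the Spark runner, but instead of
# executing it, package the translated pipeline into a self-contained jar.
python my_pipeline.py \
  --runner=SparkRunner \
  --output_executable_path=./pipeline.jar \
  --environment_type=DOCKER

# Step 2: submit the generated jar to YARN like any ordinary Spark app.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  ./pipeline.jar
```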
If you can
Hi all,
I'm trying to run a Beam pipeline using Spark on YARN. My pipeline is
written in Python, so I need to use a portable runner. Does anybody know
how I should configure job server parameters, especially
--spark-master-url? Is there anything else I need to be aware of while
using such a setup?
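For reference, the usual portable-runner setup (without YARN) has two parts: a job server pointed at a Spark master, and a Python pipeline pointed at the job server. A sketch, using Beam's published job-server Docker image — the Spark master host is a placeholder, and note that per the replies above, passing yarn as --spark-master-url is not supported:

```shell
# Start the Beam Spark job server against a standalone Spark master.
docker run --net=host apache/beam_spark_job_server:latest \
  --spark-master-url=spark://spark-master-host:7077

# Submit the Python pipeline to the job server (default port 8099).
python my_pipeline.py \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=DOCKER
```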