Re: Running Beam pipeline using Spark on YARN

2020-06-24 Thread Kamil Wasilewski
Thanks for the information. So it looks like we can't easily run portable pipelines on a Dataproc cluster at the moment.
> you can set --output_executable_path to create a jar that you can then submit to YARN via spark-submit.
I tried to create a jar, but I ran into a problem. I left an error...

Re: Running Beam pipeline using Spark on YARN

2020-06-23 Thread Kyle Weaver
> So hopefully setting --spark-master-url to be yarn will work too.
This is not supported.

On Tue, Jun 23, 2020 at 2:58 PM Xinyu Liu wrote:
> I am doing some prototyping on this too. I used the spark-submit script instead of the REST API. In my simple setup, I ran SparkJobServerDriver.main()...

Re: Running Beam pipeline using Spark on YARN

2020-06-23 Thread Xinyu Liu
I am doing some prototyping on this too. I used the spark-submit script instead of the REST API. In my simple setup, I ran SparkJobServerDriver.main() directly in the AM as a Spark job, which submits the Python job to the default Spark master URL pointing to "local". I also use --files in the...
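For context, the client side of the setup Xinyu describes might look like the sketch below: once SparkJobServerDriver is running inside the YARN AM, the Python pipeline only needs to target its endpoint. The host placeholder, the port, and the environment settings are illustrative assumptions, not values from the thread.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Point the portable Python SDK at the job server started in the AM.
    # <job-server-host> is a placeholder; 8099 is the default job port.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=<job-server-host>:8099",
        "--environment_type=PROCESS",
        # The boot binary path is an assumption; it depends on how the
        # SDK harness is provisioned on the worker nodes.
        '--environment_config={"command": "/opt/apache/beam/boot"}',
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["hello", "beam"])
           | beam.Map(print))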

Re: Running Beam pipeline using Spark on YARN

2020-06-23 Thread Kyle Weaver
Hi Kamil, there is a JIRA for this: https://issues.apache.org/jira/browse/BEAM-8970 It's theoretically possible but remains untested as far as I know :) As I indicated in a comment, you can set --output_executable_path to create a jar that you can then submit to YARN via spark-submit. If you can...
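In Python options form, the jar-based flow Kyle describes might look roughly like this. --output_executable_path is a real Beam pipeline option, but the file paths and the exact spark-submit invocation below are assumptions, not a tested recipe.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Instead of executing the pipeline, write it out as a self-contained jar.
    options = PipelineOptions([
        "--runner=SparkRunner",
        "--output_executable_path=/tmp/beam-on-spark.jar",
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create([1, 2, 3])
           | beam.Map(print))

    # The resulting jar could then be handed to YARN, e.g.:
    #   spark-submit --master yarn --deploy-mode cluster /tmp/beam-on-spark.jar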

Running Beam pipeline using Spark on YARN

2020-06-23 Thread Kamil Wasilewski
Hi all, I'm trying to run a Beam pipeline using Spark on YARN. My pipeline is written in Python, so I need to use a portable runner. Does anybody know how I should configure the job server parameters, especially --spark-master-url? Is there anything else I need to be aware of while using such a setup?
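For comparison, the standard (non-YARN) portable setup looks roughly like the sketch below: start the Spark job server with --spark-master-url pointing at a standalone master, then run the Python pipeline against its endpoint. Hostnames and ports are placeholders, not values from the thread.

    # Job server, e.g. via Docker (flags abbreviated):
    #   docker run --net=host apache/beam_spark_job_server:latest \
    #       --spark-master-url=spark://localhost:7077
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",  # default job server endpoint
        "--environment_type=LOOPBACK",    # fine for local experiments
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["a", "b", "c"])
           | beam.Map(print))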