> Looks like for Java, a standalone Job Service is not required to run beam functions on Spark, and spark-submit handles everything in cluster mode. But this is not the case for Python runner.
That's correct.

> Are you aware of any example in Python that runs in a (i.e. Kubernetes) cluster?

Not that I'm aware of, but there's been some work on Beam Python+Flink+k8s: https://github.com/apache/beam/pull/9872. I am planning on doing something similar for the Spark runner.

On Tue, Oct 29, 2019 at 8:57 AM Matthew K. <[email protected]> wrote:

> Thanks Tom,
>
> Looks like for Java, a standalone Job Service is not required to run beam
> functions on Spark, and spark-submit handles everything in cluster mode.
> But this is not the case for Python runner. Are you aware of any example in
> Python that runs in a (i.e. Kubernetes) cluster?
>
> *Sent:* Monday, October 28, 2019 at 6:21 PM
> *From:* "Tom Barber" <[email protected]>
> *To:* [email protected], "Matthew K." <[email protected]>
> *Subject:* Re: Running Python Beam Functions on Spark Kubernetes Cluster
>
> As my webserver needs to move house tomorrow, here's a snippet version of
> the post in case the link isn't available:
> https://gitlab.com/spiculedata/spark-beam-demo/snippets/1908248
>
> On 28 October 2019 at 23:16:20, Tom Barber ([email protected]) wrote:
>
> I spent a while figuring that out a week or two ago and wrote up a blog
> post on it:
> https://www.spicule.co.uk/news/post/2019-09-30-running-an-apache-beam-pipeline-over-spark-on-kubernetes
>
> And some sample code here: https://gitlab.com/spiculedata/spark-beam-demo
>
> The actual submit command looks something like this:
>
> ./spark-submit --master k8s://https:// --deploy-mode cluster --name spark-demo --class com.example.beam.ProcessHealth2 --conf spark.executor.instances=5 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=/spark: local:///opt/wordcount-app-1.0.0-shaded.jar "--runner=SparkRunner" "--awsKey=" "--awsSecret=" "--outputPath=s3:///" "--awsRegion=us-east-1"
>
> Tom
>
> On 28 October 2019 at 22:50:51, Matthew K. ([email protected]) wrote:
>
> I would like to run Beam functions on Spark cluster created on a
> Kubernetes using `spark-submit`. However, it is not clear how to integrate
> Beam's Job Service with non-standalone Spark master (on Kubernetes)
> launched by `spark-submit`.
>
> Any information is appreciated.
>
> Thanks
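
For reference, the Java submit command quoted in the thread can be assembled programmatically, which keeps the long list of `--conf` flags readable. This is only a minimal sketch: the flag names and default values come from Tom's command, while the function name `build_spark_submit_args` and the placeholder master URL and image are hypothetical, because the originals were elided in the email. It only builds and prints the command; it does not run `spark-submit`.

```python
import shlex
from typing import List, Sequence


def build_spark_submit_args(
    master_url: str,
    container_image: str,
    app_jar: str,
    main_class: str,
    pipeline_args: Sequence[str],
    name: str = "spark-demo",
    executor_instances: int = 5,
    service_account: str = "spark",
) -> List[str]:
    """Return the argv list for a cluster-mode spark-submit on Kubernetes."""
    # Spark settings taken from the command quoted in the thread.
    conf = {
        "spark.executor.instances": str(executor_instances),
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        "spark.kubernetes.container.image": container_image,
    }
    args = [
        "./spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        "--name", name,
        "--class", main_class,
    ]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # The application jar comes after the Spark options, followed by the
    # Beam pipeline options, which are passed through to the main class.
    args.append(app_jar)
    args.extend(pipeline_args)
    return args


example = build_spark_submit_args(
    master_url="k8s://https://<api-server>",   # placeholder, elided in the email
    container_image="<registry>/spark:<tag>",  # placeholder, elided in the email
    app_jar="local:///opt/wordcount-app-1.0.0-shaded.jar",
    main_class="com.example.beam.ProcessHealth2",
    pipeline_args=["--runner=SparkRunner", "--awsRegion=us-east-1"],
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting mistakes when the pipeline options contain special characters, and the list can be handed to `subprocess.run` unchanged if the command is to be executed.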
