Hi Buvana,

Running Beam Python on Spark on Kubernetes is more complicated, because Beam has its own solution for running Python code [1]. Unfortunately there's no guide that I know of for Spark yet; however, we do have instructions for Flink [2]. Beam's Flink and Spark runners, and I assume GCP's (unofficial) Flink and Spark [3] operators, are probably similar enough that it shouldn't be too hard to port the YAML from the Flink operator to the Spark operator. I filed an issue for it [4], but I probably won't have the bandwidth to work on it myself for a while.

- Kyle

[1] https://beam.apache.org/roadmap/portability/
[2] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md
[3] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/
[4] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/870

On Sat, Apr 11, 2020 at 4:33 PM Ramanan, Buvana (Nokia - US/Murray Hill) <[email protected]> wrote:

> Thank you, Rahul for your very useful response. Can you please extend your
> response by commenting on the procedure for Beam python pipeline?
>
> *From: *rahul patwari <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Friday, April 10, 2020 at 10:57 PM
> *To: *user <[email protected]>
> *Subject: *Re: SparkRunner on k8s
>
> Hi Buvana,
>
> You can submit a Beam Pipeline to Spark on k8s like any other Spark
> Pipeline using the spark-submit script.
>
> Create an Uber Jar of your Beam code and provide it as the primary
> resource to spark-submit. Provide the k8s master and the container image to
> use as arguments to spark-submit.
>
> Refer https://spark.apache.org/docs/latest/running-on-kubernetes.html to
> know more about how to run Spark on k8s.
>
> The Beam pipeline will be translated to a Spark Pipeline using Spark APIs
> in Runtime.
>
> Regards,
> Rahul
>
> On Sat, Apr 11, 2020 at 4:38 AM Ramanan, Buvana (Nokia - US/Murray Hill) <
> [email protected]> wrote:
>
> Hello,
>
> I newly joined this group and I went through the archive to see if any
> discussion exists on submitting Beam pipelines to a SparkRunner on k8s.
>
> I run my Spark jobs on a k8s cluster in the cluster mode. Would like to
> deploy my beam pipeline on a SparkRunner with k8s underneath.
>
> The Beam documentation:
> https://beam.apache.org/documentation/runners/spark/
> does not discuss about k8s (though there is mention of Mesos and YARN).
>
> Can someone please point me to relevant material in this regard? Or,
> provide the steps for running my beam pipeline in this configuration?
>
> Thank you,
> Regards,
> Buvana
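
For reference, the spark-submit steps Rahul describes in the quoted message (an uber jar as the primary resource, plus the k8s master and the container image as arguments) might be assembled roughly as below. This is only a sketch for a JVM (Java) Beam pipeline, not the Python case discussed in the reply at the top. The API server URL, image name, main class, and jar path are hypothetical placeholders, and the helper function is made up for illustration; the spark-submit options themselves are the ones described in the Spark on Kubernetes documentation linked in the thread.

```python
import shlex


def build_spark_submit_args(k8s_api_url, container_image, main_class,
                            uber_jar, executor_instances=2):
    """Build a spark-submit argument list for a Beam uber jar on Kubernetes.

    Hypothetical helper: every value passed in is a placeholder to be
    replaced with the details of your own cluster and build.
    """
    return [
        "spark-submit",
        # The k8s master is the Kubernetes API server URL prefixed with k8s://
        "--master", f"k8s://{k8s_api_url}",
        "--deploy-mode", "cluster",
        # Main class of the Beam pipeline inside the uber jar
        "--class", main_class,
        # Container image used for the Spark driver and executors
        "--conf", f"spark.kubernetes.container.image={container_image}",
        "--conf", f"spark.executor.instances={executor_instances}",
        # Primary resource: the uber jar (local:// means "already in the image")
        uber_jar,
        # Pipeline option telling Beam to translate the pipeline for Spark
        "--runner=SparkRunner",
    ]


if __name__ == "__main__":
    args = build_spark_submit_args(
        k8s_api_url="https://k8s.example.com:6443",          # assumed
        container_image="example/beam-spark:latest",         # assumed
        main_class="com.example.MyBeamPipeline",             # assumed
        uber_jar="local:///opt/app/my-beam-pipeline-bundled.jar",  # assumed
    )
    # Print the command for inspection instead of running it.
    print(" ".join(shlex.quote(a) for a in args))
```

The script only prints the command so it can be reviewed before being run against a real cluster.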
