Re: One click to run Spark on Kubernetes
Hi Bo Yang, Would it be something along the lines of Apache Livy? Thanks, Prasad On Tue, Feb 22, 2022 at 10:22 PM bo yang wrote: > It is not a standalone Spark cluster. In more detail, it deploys a Spark Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) and an extra REST Service. When people submit a Spark application to that REST Service, the REST Service will create a CRD inside the Kubernetes cluster. Then the Spark Operator will pick up the CRD and launch the Spark application. The one-click tool intends to hide these details, so people can just submit Spark applications without needing to deal with too many deployment details. > > On Tue, Feb 22, 2022 at 8:09 PM Bitfox wrote: >> Can it be a cluster installation of Spark, or just a standalone node? >> Thanks >> On Wed, Feb 23, 2022 at 12:06 PM bo yang wrote: >>> Hi Spark Community, >>> We built an open source tool to deploy and run Spark on Kubernetes with a one-click command. For example, on AWS, it could automatically create an EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will be able to use curl or a CLI tool to submit a Spark application. After the deployment, you could also install Uber Remote Shuffle Service to enable Dynamic Allocation on Kubernetes. >>> Anyone interested in using or working together on such a tool? >>> Thanks, >>> Bo >>> -- Regards, Prasad Paravatha
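The flow bo describes (REST submission, then a SparkApplication CRD, then the operator launching the app) can be sketched roughly as below. This is a hedged illustration only: the REST endpoint and payload are hypothetical, and the CRD fields follow the spark-on-k8s-operator's public v1beta2 API rather than anything specific to the one-click tool.

```shell
# Illustrative sketch only: the REST endpoint below is hypothetical; the CRD
# shape follows the GoogleCloudPlatform spark-on-k8s-operator v1beta2 API.

# What the user-facing one-click submission might look like (endpoint assumed):
echo "curl -X POST http://spark-gateway.example.com/api/v1/submit -d @app.json"

# The kind of SparkApplication CRD the REST service would create, which the
# Spark Operator then picks up and launches:
cat > /tmp/spark-pi.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/spark:v3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  sparkVersion: "3.2.1"
  driver:
    cores: 1
    memory: 1g
  executor:
    instances: 2
    cores: 1
    memory: 1g
EOF
echo "kubectl apply -f /tmp/spark-pi.yaml"
```

The point of the extra REST layer is that users never touch the CRD: the operator watches for SparkApplication objects and does the actual pod creation.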
Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images
Apologies, please ignore my previous message On Mon, Feb 21, 2022 at 5:56 PM Prasad Paravatha wrote: > FYI, I am getting a 404 for https://hub.docker.com/apache/spark > On Mon, Feb 21, 2022 at 5:51 PM Holden Karau wrote: >> Yeah I think we should still adopt that naming convention, however no one has taken the time to write a script to do it yet, so until we get that script merged I think we'll just have one build. I can try and do that for the next release, but it would be a great 2nd issue for someone getting more familiar with the release tooling. >> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh < mich.talebza...@gmail.com> wrote: >>> Ok, thanks for the correction. >>> The docker pull line shows as follows: >>> docker pull apache/spark:v3.2.1 >>> So this only tells me the Spark version (3.2.1). >>> I thought we discussed deciding on the docker naming conventions in detail, and broadly agreed on what needs to be in the naming convention. For example, in this thread: >>> Time to start publishing Spark Docker Images? - mich.talebza...@gmail.com - Gmail (google.com) <https://mail.google.com/mail/u/0/?hl=en-GB#search/publishing/FMfcgzGkZQSzbXWQDWfddGDNRDQfPCpg> dated 22nd July 2021 >>> Referring to that, I think the broad agreement was that the docker image name should be of the form: >>> The image name provides: >>> - Built for spark, spark-py (PySpark), or spark-r >>> - Spark version: 3.1.1, 3.1.2, 3.2.1 etc. >>> - Scala version: 2.12 >>> - The OS version based on Java: 8-jre-slim-buster, 11-jre-slim-buster, meaning Java 8 and Java 11 respectively >>> I believe it is a good thing and we ought to adopt that convention. 
For >>> example: >>> spark-py-3.2.1-scala_2.12-11-jre-slim-buster >>> HTH >>> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. >>> On Mon, 21 Feb 2022 at 21:58, Holden Karau wrote: >>>> My bad, the correct link is: >>>> https://hub.docker.com/r/apache/spark/tags >>>> On Mon, Feb 21, 2022 at 1:17 PM Mich Talebzadeh < mich.talebza...@gmail.com> wrote: >>>>> Well, that docker link is not found! Maybe a permission issue. >>>>> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. >>>>> On Mon, 21 Feb 2022 at 21:09, Holden Karau wrote: >>>>>> We are happy to announce the availability of Spark 3.1.3! >>>>>> Spark 3.1.3 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend all 3.1 users to upgrade to thi
Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images
this release:* >>>>> >>>>> We've also started publishing docker containers to the Apache >>>>> Dockerhub, >>>>> these contain non-ASF artifacts that are subject to different license >>>>> terms than the >>>>> Spark release. The docker containers are built for Linux x86 and ARM64 >>>>> since that's >>>>> what I have access to (thanks to NV for the ARM64 machines). >>>>> >>>>> You can get them from https://hub.docker.com/apache/spark (and >>>>> spark-r and spark-py) :) >>>>> (And version 3.2.1 is also now published on Dockerhub). >>>>> >>>>> Holden >>>>> >>>>> -- >>>>> Twitter: https://twitter.com/holdenkarau >>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>> >>>> >>> >>> -- >>> Twitter: https://twitter.com/holdenkarau >>> Books (Learning Spark, High Performance Spark, etc.): >>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> >> > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > -- Regards, Prasad Paravatha
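The naming convention proposed earlier in this thread can be sketched as a small tag-assembly helper. This is illustrative only; the helper is not part of any Spark release tooling, and the component values are just the examples given in the thread.

```shell
# Sketch of the proposed image-name convention (not actual release tooling):
#   <flavor>-<spark version>-scala_<scala version>-<java/OS base tag>
make_image_name() {
  local flavor="$1" spark_ver="$2" scala_ver="$3" base_tag="$4"
  echo "${flavor}-${spark_ver}-scala_${scala_ver}-${base_tag}"
}

make_image_name spark-py 3.2.1 2.12 11-jre-slim-buster
# -> spark-py-3.2.1-scala_2.12-11-jre-slim-buster, matching Mich's example
```

A tag assembled this way answers at a glance what the bare apache/spark:v3.2.1 tag does not: whether the image is the Scala, PySpark, or R build, and which Scala and Java versions it was built against.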
Re: docker image distribution in Kubernetes cluster
can have limited/restricted versions of the image, or one with some additional software that they use on the executors during processing. >> So, in your case you only need to provide the first one, since the other two configs will be copied from it. >> Regards >> Khalid >> On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh, wrote: >> Just a correction: the Spark 3.2 documentation states <https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration> the following:

Property Name: spark.kubernetes.container.image
Default: (none)
Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
Since: 2.3.0

Property Name: spark.kubernetes.driver.container.image
Default: (value of spark.kubernetes.container.image)
Meaning: Custom container image to use for the driver.
Since: 2.3.0

Property Name: spark.kubernetes.executor.container.image
Default: (value of spark.kubernetes.container.image)
Meaning: Custom container image to use for executors.
Since: 2.3.0

>> So both the driver and executor images default to the container image. In my opinion, they are redundant and will potentially add confusion, so they should be removed? >> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. 
On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh wrote: >> Hi, >> We have three conf parameters to distribute the docker image with spark-submit in a Kubernetes cluster. >> These are: >> spark-submit --verbose \ --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \ --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \ --conf spark.kubernetes.container.image=${IMAGEGCP} \ >> When the above is run, it shows: >> (spark.kubernetes.driver.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages) >> (spark.kubernetes.executor.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages) >> (spark.kubernetes.container.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages) >> You will notice that I am using the same docker image for driver, executor and container. In Spark 3.2 (actually in recent Spark versions), I cannot see any reference to the driver or executor image properties. Are these deprecated? Spark still appears to accept them. >> Thanks >> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. >> -- Regards, Prasad Paravatha
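Per the documentation quoted above, the driver and executor image properties default to spark.kubernetes.container.image, so only the base property is needed when all containers share one image. A minimal sketch follows; the master URL, namespace, and application path are placeholders. Note also that the `*.docker.image` spellings in the question are, I believe, from the pre-2.3 spark-on-k8s fork; upstream Spark uses `*.container.image`, and SparkConf accepts unknown keys silently, which is likely why the old names appear to work.

```shell
# Minimal sketch: one image for every container. Driver/executor image confs
# are omitted because they default to spark.kubernetes.container.image.
# Master URL, namespace, and image are placeholders, not a working cluster.
IMAGE="eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages"

build_submit_cmd() {
  printf '%s' "spark-submit --verbose \
--master k8s://https://kubernetes.example.com:443 \
--deploy-mode cluster \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image=${IMAGE} \
local:///opt/spark/examples/src/main/python/pi.py"
}

build_submit_cmd
```

Setting the driver/executor variants only makes sense when those containers genuinely need different images, e.g. a slimmer executor image.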
Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal
This is a great feature/idea. I'd love to get involved in some form (testing and/or documentation). This could be my 1st contribution to Spark! On Tue, Nov 30, 2021 at 10:46 PM John Zhuge wrote: > +1 Kudos to Yikun and the community for starting the discussion! > On Tue, Nov 30, 2021 at 8:47 AM Chenya Zhang wrote: >> Thanks folks for bringing up the topic of natively integrating Volcano and other alternative schedulers into Spark! >> +Weiwei, Wilfred, Chaoran. We would love to contribute to the discussion as well. >> From our side, we have been using and improving on one alternative resource scheduler, Apache YuniKorn (https://yunikorn.apache.org/), for Spark on Kubernetes in production at Apple, with solid results in the past year. It is capable of supporting gang scheduling (similar to PodGroups), multi-tenant resource queues (similar to YARN), FIFO, and other handy features like bin packing to enable efficient autoscaling, etc. >> Natively integrating with Spark would provide more flexibility for users and reduce the extra cost and potential inconsistency of maintaining different layers of resource strategies. One interesting topic we hope to discuss more is dynamic allocation, which would benefit from native coordination between Spark and resource schedulers in K8s & cloud environments for optimal resource efficiency. >> On Tue, Nov 30, 2021 at 8:10 AM Holden Karau wrote: >>> Thanks for putting this together, I'm really excited for us to add better batch scheduling integrations. >>> On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang wrote: >>>> Hey everyone, >>>> I'd like to start a discussion on "Support Volcano/Alternative Schedulers Proposal". >>>> This SPIP is proposed to make Spark's K8s schedulers provide more YARN-like features (such as queues and minimum resources before scheduling jobs) that many folks want on Kubernetes. 
>>>> The goal of this SPIP is to improve the current Spark K8s scheduler implementation, add batch scheduling capabilities, and support Volcano as one of the implementations. >>>> Design doc: https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg >>>> JIRA: https://issues.apache.org/jira/browse/SPARK-36057 >>>> Part of PRs: >>>> Ability to create resources: https://github.com/apache/spark/pull/34599 >>>> Add PodGroupFeatureStep: https://github.com/apache/spark/pull/34456 >>>> Regards, >>>> Yikun >>> -- Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau > -- John Zhuge -- Regards, Prasad Paravatha
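For context on what gang scheduling adds, here is a rough sketch of the Volcano side: a PodGroup CRD that requires a minimum number of pods (driver plus executors) to be schedulable before any are started. This is a hedged illustration based on Volcano's public scheduling.volcano.sh/v1beta1 API, not on the SPIP's final design; how Spark wires its pods to such a PodGroup is exactly what the SPIP proposes to define.

```shell
# Hedged sketch: a Volcano PodGroup expressing a gang-scheduling floor for a
# Spark app (1 driver + 2 executors are placed together or not at all).
# Field names follow Volcano's scheduling.volcano.sh/v1beta1 API.
cat > /tmp/spark-pi-podgroup.yaml <<'EOF'
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-pi-podgroup
  namespace: spark
spec:
  minMember: 3            # driver + 2 executors must fit before scheduling
  queue: default          # multi-tenant queue, similar to YARN queues
  minResources:
    cpu: "3"
    memory: 3Gi
EOF
echo "kubectl apply -f /tmp/spark-pi-podgroup.yaml"
```

Without such a floor, a partially scheduled Spark app can hold executors while waiting for a driver slot (or vice versa), which is the deadlock-style waste gang scheduling avoids.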
Re: [ANNOUNCE] Apache Spark 3.2.0
Works now, thanks. One minor thing: the version naming convention could cause confusion, i.e. the name on this UI vs the tgz file name. > On Oct 19, 2021, at 10:09 AM, Gengliang Wang wrote: > Hi Prasad, > Thanks for reporting the issue. The link was wrong. It should be fixed now. Could you try again on https://spark.apache.org/downloads.html? >> On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha wrote: >> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz >> FYI, unable to download from this location. Also, I don't see a Hadoop 3.3 version in the dist. >>> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD wrote: >>> Many thanks! >>> From: Gengliang Wang >>> Sent: Tuesday, 19 October 2021 16:16 >>> To: dev ; user >>> Subject: [ANNOUNCE] Apache Spark 3.2.0 >>> Hi all, >>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. >>> We'd like to thank our contributors and users for their contributions and early feedback to this release. This release would not have been possible without you. >>> To download Spark 3.2.0, head over to the download page: https://spark.apache.org/downloads.html >>> To view the release notes: https://spark.apache.org/releases/spark-release-3-2-0.html
Re: [ANNOUNCE] Apache Spark 3.2.0
https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz FYI, unable to download from this location. Also, I don't see a Hadoop 3.3 version in the dist. > On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD wrote: > Many thanks! > From: Gengliang Wang > Sent: Tuesday, 19 October 2021 16:16 > To: dev ; user > Subject: [ANNOUNCE] Apache Spark 3.2.0 > Hi all, > Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. > We'd like to thank our contributors and users for their contributions and early feedback to this release. This release would not have been possible without you. > To download Spark 3.2.0, head over to the download page: https://spark.apache.org/downloads.html > To view the release notes: https://spark.apache.org/releases/spark-release-3-2-0.html
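The broken link in this thread appears to have been an artifact-name mismatch (the dist directory did not contain a hadoop3.3 build). A hedged sketch of fetching a release through the ASF mirror redirector and verifying its checksum follows; the commands are only echoed, the hadoop3.2 artifact name and the closer.lua action=download parameter are assumptions worth double-checking against the downloads page, and checksums should always be taken from apache.org directly rather than from a mirror.

```shell
# Hedged sketch: download a Spark release via the ASF mirror redirector and
# verify it. Commands are echoed rather than executed; the artifact name
# (hadoop3.2, which the thread suggests was the correct build) is assumed.
DIST="spark-3.2.0-bin-hadoop3.2.tgz"
MIRROR_URL="https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/${DIST}?action=download"
# Checksums/signatures must come from apache.org directly, never a mirror:
SHA_URL="https://downloads.apache.org/spark/spark-3.2.0/${DIST}.sha512"

echo "curl -L -o ${DIST} '${MIRROR_URL}'"
echo "curl -o ${DIST}.sha512 '${SHA_URL}'"
echo "sha512sum -c ${DIST}.sha512"
```

Checking the artifact name against the dist listing first would have caught the 404 reported above before any download attempt.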