Re: One click to run Spark on Kubernetes

2022-02-22 Thread Prasad Paravatha
Hi Bo Yang,
Would it be something along the lines of Apache Livy?

Thanks,
Prasad


On Tue, Feb 22, 2022 at 10:22 PM bo yang  wrote:

> It is not a standalone Spark cluster. In more detail, it deploys the Spark
> Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
> and an extra REST Service. When people submit a Spark application to that
> REST Service, it creates a CRD inside the Kubernetes cluster; the Spark
> Operator then picks up the CRD and launches the Spark application. The
> one-click tool intends to hide these details, so people can just submit
> Spark applications without dealing with too many deployment details.
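> 
> For context, a minimal sketch of the kind of SparkApplication CRD the
> Spark Operator consumes (the REST Service would generate something
> similar; the image, jar path, and resource values below are illustrative):
> 
> kubectl apply -f - <<'EOF'
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>   name: spark-pi
>   namespace: default
> spec:
>   type: Scala
>   mode: cluster
>   image: apache/spark:v3.2.1
>   mainClass: org.apache.spark.examples.SparkPi
>   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
>   sparkVersion: "3.2.1"
>   driver:
>     cores: 1
>     memory: 512m
>   executor:
>     instances: 2
>     cores: 1
>     memory: 512m
> EOF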
>
> On Tue, Feb 22, 2022 at 8:09 PM Bitfox  wrote:
>
>> Can it be a cluster installation of Spark, or just a standalone node?
>>
>> Thanks
>>
>> On Wed, Feb 23, 2022 at 12:06 PM bo yang  wrote:
>>
>>> Hi Spark Community,
>>>
>>> We built an open-source tool to deploy and run Spark on Kubernetes with
>>> a one-click command. For example, on AWS, it can automatically create an
>>> EKS cluster, a node group, an NGINX ingress, and the Spark Operator. You
>>> can then use curl or a CLI tool to submit Spark applications. After the
>>> deployment, you can also install Uber's Remote Shuffle Service to enable
>>> Dynamic Allocation on Kubernetes.
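>>>
>>> As a rough sketch, a submission after the deployment could look like the
>>> following (the endpoint path and payload are purely illustrative, not the
>>> tool's actual API):
>>>
>>> curl -X POST http://<ingress-host>/api/v1/spark-applications \
>>>   -H 'Content-Type: application/json' \
>>>   -d '{"name": "spark-pi", "image": "apache/spark:v3.2.1",
>>>        "mainClass": "org.apache.spark.examples.SparkPi"}'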
>>>
>>> Anyone interested in using or working together on such a tool?
>>>
>>> Thanks,
>>> Bo
>>>
>>>

-- 
Regards,
Prasad Paravatha


Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Prasad Paravatha
Apologies, please ignore my previous message

On Mon, Feb 21, 2022 at 5:56 PM Prasad Paravatha 
wrote:

> FYI, I am getting 404 for https://hub.docker.com/apache/spark
>
> On Mon, Feb 21, 2022 at 5:51 PM Holden Karau  wrote:
>
>> Yeah, I think we should still adopt that naming convention; however, no one
>> has taken the time to write a script to do it yet, so until we get that
>> script merged I think we'll just have one build. I can try to do that for
>> the next release, but it would be a great second issue for someone getting
>> more familiar with the release tooling.
>>
>> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Ok thanks for the correction.
>>>
>>> The docker pull line shows as follows:
>>>
>>> docker pull apache/spark:v3.2.1
>>>
>>>
>>> So this only tells me the Spark version, 3.2.1.
>>>
>>>
>>> I thought we discussed the docker naming conventions in detail and
>>> broadly agreed on what needs to be in the naming convention. For example,
>>> in this thread:
>>>
>>> "Time to start publishing Spark Docker Images?"
>>> <https://mail.google.com/mail/u/0/?hl=en-GB#search/publishing/FMfcgzGkZQSzbXWQDWfddGDNRDQfPCpg>
>>> dated 22nd July 2021
>>>
>>>
>>> Referring to that, I think the broad agreement was that the docker image
>>> name should be of the form below, where the name encodes:
>>>
>>>    - The build flavour: spark, spark-py (PySpark), or spark-r
>>>    - The Spark version: 3.1.1, 3.1.2, 3.2.1, etc.
>>>    - The Scala version: 2.12
>>>    - The OS/Java base: 8-jre-slim-buster or 11-jre-slim-buster, meaning
>>>    Java 8 and Java 11 respectively
>>>
>>> I believe it is a good thing and we ought to adopt that convention. For
>>> example:
>>>
>>>
>>> spark-py-3.2.1-scala_2.12-11-jre-slim-buster
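>>>
>>> As an illustration, a small shell sketch assembling such a tag from its
>>> four fields (values are examples; the pull only works once a tag of this
>>> form is actually published):
>>>
>>> FLAVOUR=spark-py                # spark, spark-py, or spark-r
>>> SPARK_VERSION=3.2.1
>>> SCALA_VERSION=2.12
>>> JRE_BASE=11-jre-slim-buster     # Java 11
>>> TAG="${FLAVOUR}-${SPARK_VERSION}-scala_${SCALA_VERSION}-${JRE_BASE}"
>>> docker pull "apache/spark:${TAG}"   # hypothetical tag layout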
>>>
>>>
>>> HTH
>>>
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 21 Feb 2022 at 21:58, Holden Karau  wrote:
>>>
>>>> My bad, the correct link is:
>>>>
>>>> https://hub.docker.com/r/apache/spark/tags
>>>>
>>>> On Mon, Feb 21, 2022 at 1:17 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Well, that docker link is not found! Maybe it's a permission issue.
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>>
>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>>> any loss, damage or destruction of data or any other property which may
>>>>> arise from relying on this email's technical content is explicitly
>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>> arising from such loss, damage or destruction.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, 21 Feb 2022 at 21:09, Holden Karau 
>>>>> wrote:
>>>>>
>>>>>> We are happy to announce the availability of Spark 3.1.3!
>>>>>>
>>>>>> Spark 3.1.3 is a maintenance release containing stability fixes. This
>>>>>> release is based on the branch-3.1 maintenance branch of Spark. We
>>>>>> strongly recommend that all 3.1 users upgrade to this release.

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Prasad Paravatha
*New in this release:*
>>>>>
>>>>> We've also started publishing docker containers to the Apache Dockerhub;
>>>>> these contain non-ASF artifacts that are subject to different license
>>>>> terms than the Spark release. The docker containers are built for Linux
>>>>> x86 and ARM64, since that's what I have access to (thanks to NV for the
>>>>> ARM64 machines).
>>>>>
>>>>> You can get them from https://hub.docker.com/apache/spark (and
>>>>> spark-r and spark-py) :)
>>>>> (And version 3.2.1 is also now published on Dockerhub).
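>>>>>
>>>>> For example, to pull a specific architecture explicitly (the --platform
>>>>> flag is optional; docker selects the matching architecture by default):
>>>>>
>>>>> docker pull --platform linux/amd64 apache/spark:v3.2.1
>>>>> docker pull --platform linux/arm64 apache/spark:v3.2.1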
>>>>>
>>>>> Holden
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


-- 
Regards,
Prasad Paravatha


Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Prasad Paravatha
 can have a
>> limited/restricted version of the image, or one with some additional
>> software that is used on the executors during processing.
>>
>>
>>
>> So, in your case you only need to provide the first one, since the other
>> two configs will be copied from it.
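>>
>> For instance, this minimal sketch (the image name is illustrative, in the
>> form the docs use) is equivalent to setting all three:
>>
>> spark-submit \
>>   --conf spark.kubernetes.container.image=example.com/repo/spark:v1.0.0 \
>>   ...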
>>
>>
>>
>> Regards
>>
>> Khalid
>>
>>
>>
>> On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh, 
>> wrote:
>>
>> Just a correction: the Spark 3.2 documentation
>> <https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration>
>> states the following:
>>
>>
>>
>> spark.kubernetes.container.image
>>   Default: (none)    Since: 2.3.0
>>   Container image to use for the Spark application. This is usually of the
>>   form example.com/repo/spark:v1.0.0. This configuration is required and
>>   must be provided by the user, unless explicit images are provided for
>>   each different container type.
>>
>> spark.kubernetes.driver.container.image
>>   Default: (value of spark.kubernetes.container.image)    Since: 2.3.0
>>   Custom container image to use for the driver.
>>
>> spark.kubernetes.executor.container.image
>>   Default: (value of spark.kubernetes.container.image)    Since: 2.3.0
>>   Custom container image to use for executors.
>>
>> So both the driver and executor images default to the container image. In
>> my opinion they are redundant and potentially add confusion, so should
>> they be removed?
>>
>>
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>>
>>
>>
>> On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh 
>> wrote:
>>
>> Hi,
>>
>>
>>
>> We have three conf parameters for specifying the docker image with
>> spark-submit in a Kubernetes cluster.
>>
>>
>>
>> These are
>>
>>
>>
>> spark-submit --verbose \
>>   --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>
>>
>>
>> When the above is run, it shows:
>>
>>
>>
>> (spark.kubernetes.driver.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.executor.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.container.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>
>>
>>
>> You will notice that I am using the same docker image for the driver,
>> executor, and container. In the Spark 3.2 documentation (and in recent
>> Spark versions generally), I cannot see any reference to the driver or
>> executor docker.image keys. Are these deprecated? Yet Spark appears to
>> still accept them.
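>>
>> For reference, an equivalent sketch using the documented 3.x keys; the
>> driver and executor keys are optional since they default to the container
>> image:
>>
>> spark-submit --verbose \
>>   --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.driver.container.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.executor.container.image=${IMAGEGCP} \
>>   ...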
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>

-- 
Regards,
Prasad Paravatha


Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Prasad Paravatha
This is a great feature/idea.
I'd love to get involved in some form (testing and/or documentation). This
could be my 1st contribution to Spark!

On Tue, Nov 30, 2021 at 10:46 PM John Zhuge  wrote:

> +1 Kudos to Yikun and the community for starting the discussion!
>
> On Tue, Nov 30, 2021 at 8:47 AM Chenya Zhang 
> wrote:
>
>> Thanks folks for bringing up the topic of natively integrating Volcano
>> and other alternative schedulers into Spark!
>>
>> +Weiwei, Wilfred, Chaoran. We would love to contribute to the discussion
>> as well.
>>
>> From our side, we have been using and improving one alternative resource
>> scheduler, Apache YuniKorn (https://yunikorn.apache.org/), for Spark on
>> Kubernetes in production at Apple, with solid results over the past year.
>> It supports gang scheduling (similar to PodGroups), multi-tenant resource
>> queues (similar to YARN), FIFO, and other handy features like bin packing
>> to enable efficient autoscaling.
>>
>> Natively integrating with Spark would provide more flexibility for users
>> and reduce the extra cost and potential inconsistency of maintaining
>> different layers of resource strategies. One interesting topic we hope to
>> discuss further is dynamic allocation, which would benefit from native
>> coordination between Spark and the resource schedulers in K8s and cloud
>> environments for optimal resource efficiency.
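>>
>> (For background, a sketch of the Spark-side settings involved today: on
>> K8s, dynamic allocation without an external shuffle service relies on
>> shuffle tracking.)
>>
>> spark-submit \
>>   --conf spark.dynamicAllocation.enabled=true \
>>   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>>   ...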
>>
>>
>> On Tue, Nov 30, 2021 at 8:10 AM Holden Karau 
>> wrote:
>>
>>> Thanks for putting this together, I’m really excited for us to add
>>> better batch scheduling integrations.
>>>
>>> On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang 
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I'd like to start a discussion on "Support Volcano/Alternative
>>>> Schedulers Proposal".
>>>>
>>>> This SPIP proposes to make the Spark K8s scheduler support more
>>>> YARN-like features (such as queues and minimum resources before
>>>> scheduling jobs) that many folks want on Kubernetes.
>>>>
>>>> The goal of this SPIP is to improve the current Spark K8s scheduler
>>>> implementation, add the ability to do batch scheduling, and support
>>>> Volcano as one of the implementations.
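>>>>
>>>> To make "minimum resources before scheduling" concrete, here is a sketch
>>>> of a Volcano PodGroup (this is Volcano's own CRD, not a Spark API; all
>>>> values are illustrative):
>>>>
>>>> kubectl apply -f - <<'EOF'
>>>> apiVersion: scheduling.volcano.sh/v1beta1
>>>> kind: PodGroup
>>>> metadata:
>>>>   name: spark-job-pg
>>>> spec:
>>>>   minMember: 3           # gang scheduling: wait until 3 pods can run
>>>>   minResources:
>>>>     cpu: "3"
>>>>     memory: 6Gi
>>>>   queue: default
>>>> EOF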
>>>>
>>>> Design doc:
>>>> https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg
>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-36057
>>>> Part of PRs:
>>>> Ability to create resources https://github.com/apache/spark/pull/34599
>>>> Add PodGroupFeatureStep: https://github.com/apache/spark/pull/34456
>>>>
>>>> Regards,
>>>> Yikun
>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>
> --
> John Zhuge
>


-- 
Regards,
Prasad Paravatha


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Prasad Paravatha
Works now, thanks!
Minor thing: the version naming convention could cause confusion. The name
on this UI differs from the tgz file name.




> On Oct 19, 2021, at 10:09 AM, Gengliang Wang  wrote:
> 
> 
> Hi Prasad,
> 
> Thanks for reporting the issue. The link was wrong. It should be fixed now.
> Could you try again on https://spark.apache.org/downloads.html?
> 
>> On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha 
>>  wrote:
>> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>> 
>> FYI, I am unable to download from this location.
>> Also, I don’t see a Hadoop 3.3 version in the dist.
>> 
>> 
>>>> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD 
>>>>  wrote:
>>>> 
>>> 
>>> Many thanks! 
>>> 
>>>  
>>> 
>>> From: Gengliang Wang  
>>> Sent: Dienstag, 19. Oktober 2021 16:16
>>> To: dev ; user 
>>> Subject: [ANNOUNCE] Apache Spark 3.2.0
>>> 
>>>  
>>> 
>>> Hi all,
>>> 
>>>  
>>> 
>>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
>>> contribution from the open-source community, this release managed to 
>>> resolve in excess of 1,700 Jira tickets.
>>> 
>>>  
>>> 
>>> We'd like to thank our contributors and users for their contributions and 
>>> early feedback to this release. This release would not have been possible 
>>> without you.
>>> 
>>>  
>>> 
>>> To download Spark 3.2.0, head over to the download page: 
>>> https://spark.apache.org/downloads.html
>>> 
>>>  
>>> 
>>> To view the release notes: 
>>> https://spark.apache.org/releases/spark-release-3-2-0.html


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Prasad Paravatha
https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz

FYI, I am unable to download from this location.
Also, I don’t see a Hadoop 3.3 version in the dist.


> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD 
>  wrote:
> 
> 
> Many thanks! 
>  
> From: Gengliang Wang  
> Sent: Dienstag, 19. Oktober 2021 16:16
> To: dev ; user 
> Subject: [ANNOUNCE] Apache Spark 3.2.0
>  
> Hi all,
>  
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
> contribution from the open-source community, this release managed to resolve 
> in excess of 1,700 Jira tickets.
>  
> We'd like to thank our contributors and users for their contributions and 
> early feedback to this release. This release would not have been possible 
> without you.
>  
> To download Spark 3.2.0, head over to the download page: 
> https://spark.apache.org/downloads.html
>  
> To view the release notes: 
> https://spark.apache.org/releases/spark-release-3-2-0.html