RE: [Fork] RE: One click to run Spark on Kubernetes

2022-02-23 Thread Agarwal, Janak
Mich,

Not sure I follow you, since I do not fully understand what GKE conventional is 
(at first glance, it appears to help customers set up a Kubernetes 
environment).
EMR on EKS offers a fully managed control plane (among other benefits, such as 
a Spark UI for completed jobs) that allows customers to focus on running Spark 
applications on their EKS cluster.

Thanks,
Janak

Re: Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

2022-02-23 Thread Weiwei Yang
Thank you, Yikun.
I am working on SPARK-37809 and SPARK-38310, which are the major items for
the YuniKorn part.
Keep in mind we also need to add the documentation.
Thanks for building up the common pieces; great work.


Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

2022-02-23 Thread Yikun Jiang
First, many thanks to all of you (the Spark/Volcano/YuniKorn communities) for
your help in making this SPIP happen!

Especially: @dongjoon-hyun @holdenk @william-wang @attilapiros @HyukjinKwon
@martin-g @yangwwei @tgravescs

The SPIP is nearing completion; the basic functionality can be considered beta
quality.

I have also drafted a short slide deck to show how to use it and to help you
understand what we have done:
https://docs.google.com/presentation/d/1XDsTWPcsBe4PQ-1MlBwd9pRl8mySdziE_dJE6iATNw8

Below is a recap of the current implementation and the next steps for the SPIP:

*# Existing work*
*## Basic part:*
- SPARK-36059 *New configuration:* ability to specify "schedulerName" in
driver/executor for Spark on K8S
- SPARK-37331 *New workflow:* ability to create pre-populated resources before
the driver pod for Spark on K8S
- SPARK-37145 *New developer API:* support user feature steps with
configuration for Spark on K8S
- *(reviewing)* *New Job Configurations* for Spark on K8S:
  - SPARK-38188: spark.kubernetes.job.queue
  - SPARK-38187: spark.kubernetes.job.[minCPU|minMemory]
  - SPARK-38189: spark.kubernetes.job.priorityClassName

*## Volcano Part:*
- SPARK-37258 *New volcano extension* in kubernetes-client
fabric8io/kubernetes-client#3579
- SPARK-36061 *New profile:* -Pvolcano
- SPARK-36061 *New Feature Step:* VolcanoFeatureStep
- SPARK-36061 *New integration test:*
 *- Passed on x86 and Arm64 (Linux on Huawei Kunpeng 920 and macOS on Apple
Silicon M1).*
 - Tests the basic Volcano workflow
 - Runs all existing tests on top of Volcano.

*## Yunikorn Part:*
@yangwwei will start work on the YuniKorn feature step this week.
I will help complete the YuniKorn integration based on previous experience.

*# Next Plan*
There are four main tasks to be completed before the v3.3 code freeze:
1. (reviewing) SPARK-38188: Support queue scheduling configuration
https://github.com/apache/spark/pull/35553
2. (reviewing) SPARK-38187: Support resource reservation (minCPU/minMemory
configuration)
https://github.com/apache/spark/pull/35640
3. (reviewing) SPARK-38189: Support priority scheduling (priorityClassName
configuration)
https://github.com/apache/spark/pull/35639
4. (WIP) SPARK-37809: YuniKorn integration

Several miscellaneous tasks are also planned before 3.3:
1. Integrate the Volcano deployment into the integration tests (x86 and Arm)
- Add it to the Spark Kubernetes integration tests once cross-compile support
lands: https://github.com/volcano-sh/volcano/pull/1571
2. Complete the documentation and test guideline.

Please feel free to contact me if you have any other concerns! Thanks!

[1] https://issues.apache.org/jira/browse/SPARK-36057
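
For readers who want to see the pieces above combined, here is a command-line
sketch. It is illustrative only: the `scheduler.name` and `featureSteps`
properties come from SPARK-36059/SPARK-37145 as recapped above, while
`spark.kubernetes.job.queue` is one of the proposals still under review, so
its final name may change; the API server address, image, and queue are
placeholders.

```shell
# Sketch: submit SparkPi to a Kubernetes cluster using the Volcano scheduler.
# Property names follow the JIRAs recapped above; verify against released docs.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.driver.scheduler.name=volcano \
  --conf spark.kubernetes.executor.scheduler.name=volcano \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.job.queue=default \
  local:///opt/spark/examples/jars/spark-examples.jar 1000
```

Note that Spark must be built with the Volcano profile mentioned above
(`-Pvolcano`) for the feature step to be on the classpath.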


Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
It uses Helm to deploy the Spark Operator and NGINX. For the other parts, such
as creating the EKS cluster, IAM roles, and node groups, it uses the AWS SDK
to provision those AWS resources.
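
For the curious, the Helm half of that flow might look roughly like the
following. This is a minimal sketch using the publicly documented
spark-on-k8s-operator and ingress-nginx charts, not necessarily the exact
charts, release names, or namespaces this tool uses.

```shell
# Hedged sketch: install the Spark Operator and an NGINX ingress controller
# with Helm into an existing cluster. Release names and namespaces are
# arbitrary placeholders.
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace

helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```

The EKS/IAM provisioning side would go through the AWS SDK or CLI and is
omitted here.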



Re: [Fork] RE: One click to run Spark on Kubernetes

2022-02-23 Thread Mich Talebzadeh
Thanks Janak, so the same as GKE conventional or GKE Autopilot.


Putting conventional aside, why do you think customers should choose a fully
managed package for Spark?

Thanks



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: One click to run Spark on Kubernetes

2022-02-23 Thread Bjørn Jørgensen
So if I get this right, you will make a Helm chart to deploy Spark and some
other components on K8S?


-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


RE: [Fork] RE: One click to run Spark on Kubernetes

2022-02-23 Thread Agarwal, Janak
Hey Mich,

EMR on EKS works on both EKS Fargate
and EKS managed/self-managed EC2-based node groups.

Thanks,
Janak
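
For anyone wanting a concrete picture, job submission on EMR on EKS goes
through the `emr-containers` API. A hedged sketch with the AWS CLI; the
virtual cluster ID, account ID, role name, and release label below are
placeholders and flags may vary by CLI version.

```shell
# Hedged sketch: run SparkPi on an EMR on EKS virtual cluster.
# All identifiers in angle brackets must be replaced with real values.
aws emr-containers start-job-run \
  --virtual-cluster-id <virtual-cluster-id> \
  --name spark-pi \
  --execution-role-arn arn:aws:iam::<account-id>:role/<emr-eks-job-role> \
  --release-label emr-6.5.0-latest \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "local:///usr/lib/spark/examples/jars/spark-examples.jar",
      "entryPointArguments": ["1000"],
      "sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi"
    }
  }'
```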


Re: [Fork] RE: One click to run Spark on Kubernetes

2022-02-23 Thread Mich Talebzadeh
Hi Janak,

Are you talking about EKS Fargate?
Thanks









On Wed, 23 Feb 2022 at 17:47, Agarwal, Janak  wrote:

> [Reducing to thread participants to avoid spamming the entire community’s
> mailboxes]
>
>
>
> Sarath, Bo, Mich,
>
>
>
> Have you read about EMR on EKS?
> We help customers run Spark workloads on EKS. Today, EMR on EKS supports
> running Spark workloads on your EKS cluster. You will need to set up the EKS
> cluster yourself. To achieve one-click, all you really need to do is set up
> the EKS cluster. As mentioned earlier, setting up an EKS cluster is fairly
> simple. We can help you do that if it helps. Want to give EMR on EKS a
> spin as you decide your path forward?
>
> 
>
>
>
> Best,
>
> Janak
>
>
>
> *From:* Sarath Annareddy 
> *Sent:* Wednesday, February 23, 2022 7:41 AM
> *To:* bo yang 
> *Cc:* Mich Talebzadeh ; Spark Dev List <
> dev@spark.apache.org>; user 
> *Subject:* RE: [EXTERNAL] One click to run Spark on Kubernetes
>
>
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
> Hi bo
>
>
>
> I am interested in contributing.
>
> But I don’t have free access to any cloud provider, and I am not sure how I
> can get free access. I know Google, AWS, and Azure only provide temporary
> free access, which may not be sufficient.
>
>
>
> Guidance is appreciated.
>
>
>
> Sarath
>
> Sent from my iPhone
>
>
>
> On Feb 23, 2022, at 2:01 AM, bo yang  wrote:
>
> 
>
> Right, normally people start with a simple script, then add more stuff, like
> permissions and more components. After some time, people want to run the
> script consistently in different environments. Things become complex.
>
>
>
> That is why we want to see whether people are interested in such a "one
> click" tool to make things easy.
>
>
>
>
>
> On Tue, Feb 22, 2022 at 11:31 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
> Hi,
>
>
>
> There are two distinct actions here; namely Deploy and Run.
>
>
>
> Deployment can be done by a command line script with autoscaling. In the
> newer versions of Kubernetes you don't even need to specify the node
> types; you can leave it to the Kubernetes cluster to scale up and down and
> decide on the node type.
>
>
>
> The second point is running Spark itself, which you will need to submit.
> However, that depends on setting up access permissions, the use of service
> accounts, and pulling the correct Docker images for the driver and the
> executors. Those details add to the complexity.
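
The run step Mich describes maps onto standard Spark-on-K8s submission. A
minimal sketch, assuming a service account granted `edit` rights and
placeholder API-server and image values:

```shell
# Sketch of the "Run" step: create a service account for the driver, then
# spark-submit against the Kubernetes API server. Values in angle brackets
# are placeholders.
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit --serviceaccount=default:spark --namespace=default

./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.executor.instances=2 \
  local:///opt/spark/examples/jars/spark-examples.jar 1000
```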
>
>
>
> Thanks
>
> On Wed, 23 Feb 2022 at 04:06, bo yang  wrote:
>
> Hi Spark Community,
>
>
>
> We built an open source tool to deploy and run Spark on Kubernetes with a
> one click command. For example, on AWS, it could automatically create an
> EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will
> be able to use curl or a CLI tool to submit Spark applications. After the
> deployment, you could also install Uber Remote Shuffle Service to enable
> Dynamic Allocation on Kubernetes.
>
>
>
> Anyone interested in using or working together on such a tool?
>
>
>
> Thanks,
>
> Bo
>
>
>
>


Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Hi Sarath, let's follow up offline on this.



Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-23 Thread John Zhuge
Holden has graciously agreed to shepherd the SPIP. Thanks!

On Thu, Feb 10, 2022 at 9:19 AM John Zhuge  wrote:

> The vote is now closed and the vote passes. Thank you to everyone who took
> the time to review and vote on this SPIP. I’m looking forward to adding
> this feature to the next Spark release. The tracking JIRA is
> https://issues.apache.org/jira/browse/SPARK-31357.
>
> The tally is:
>
> +1s:
>
> Walaa Eldin Moustafa
> Erik Krogen
> Holden Karau (binding)
> Ryan Blue
> Chao Sun
> L C Hsieh (binding)
> Huaxin Gao
> Yufei Gu
> Terry Kim
> Jacky Lee
> Wenchen Fan (binding)
>
> 0s:
>
> -1s:
>
> On Mon, Feb 7, 2022 at 10:04 PM Wenchen Fan  wrote:
>
>> +1 (binding)
>>
>> On Sun, Feb 6, 2022 at 10:27 AM Jacky Lee  wrote:
>>
>>> +1 (non-binding). Thanks John!
>>> It's great to see ViewCatalog moving on, it's a nice feature.
>>>
>>> Terry Kim wrote on Sat, Feb 5, 2022, at 11:57:
>>>
 +1 (non-binding). Thanks John!

 Terry

 On Fri, Feb 4, 2022 at 4:13 PM Yufei Gu  wrote:

> +1 (non-binding)
> Best,
>
> Yufei
>
> `This is not a contribution`
>
>
> On Fri, Feb 4, 2022 at 11:54 AM huaxin gao 
> wrote:
>
>> +1 (non-binding)
>>
>> On Fri, Feb 4, 2022 at 11:40 AM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> On Thu, Feb 3, 2022 at 7:25 PM Chao Sun  wrote:
>>> >
>>> > +1 (non-binding). Looking forward to this feature!
>>> >
>>> > On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue  wrote:
>>> >>
>>> >> +1 for the SPIP. I think it's well designed and it has worked
>>> quite well at Netflix for a long time.
>>> >>
>>> >> On Thu, Feb 3, 2022 at 2:04 PM John Zhuge 
>>> wrote:
>>> >>>
>>> >>> Hi Spark community,
>>> >>>
>>> >>> I’d like to restart the vote for the ViewCatalog design proposal
>>> (SPIP).
>>> >>>
>>> >>> The proposal is to add a ViewCatalog interface that can be used
>>> to load, create, alter, and drop views in DataSourceV2.
>>> >>>
>>> >>> Please vote on the SPIP until Feb. 9th (Wednesday).
>>> >>>
>>> >>> [ ] +1: Accept the proposal as an official SPIP
>>> >>> [ ] +0
>>> >>> [ ] -1: I don’t think this is a good idea because …
>>> >>>
>>> >>> Thanks!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Ryan Blue
>>> >> Tabular
>>>
>>>
>>>
>
> --
> John Zhuge
>


-- 
John Zhuge


Re: One click to run Spark on Kubernetes

2022-02-23 Thread Sarath Annareddy
Hi bo

How do we start?

Is there a plan? Onboarding, Arch/design diagram, tasks lined up etc


Thanks 
Sarath 


Sent from my iPhone



Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Hi Sarath, thanks for your interest and willingness to contribute! The project
supports local development using Minikube: there is the same one-click command,
with one extra argument, to deploy all the components in Minikube, and people
can use that to develop on their local MacBook.
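For local experimentation along these lines, the usual pattern is a Minikube cluster plus the Spark Operator Helm chart; a rough sketch follows. The resource sizes and chart repository URL are generic assumptions, not the one-click tool's own commands (which are printed here rather than executed, so the sketch stays runnable anywhere):

```shell
# Sketch of a local Spark-on-Kubernetes dev loop with Minikube.
# These are generic commands, not this project's actual CLI.
CMDS=$(cat <<'EOF'
minikube start --cpus 4 --memory 8192
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace
kubectl get pods -n spark-operator
EOF
)
echo "$CMDS"
```

With the operator running locally, the same SparkApplication manifests used against EKS can be applied unchanged against Minikube.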




Re: One click to run Spark on Kubernetes

2022-02-23 Thread Sarath Annareddy
Hi bo

I am interested to contribute. 
But I don’t have free access to any cloud provider, and I am not sure how I can
get it. I know Google, AWS, and Azure only provide temporary free access, which
may not be sufficient.

Guidance is appreciated.

Sarath 

Sent from my iPhone



Spark 3.1.3 docker pre-built with Python Data science packages

2022-02-23 Thread Mich Talebzadeh
Some people asked me whether it was possible to create a Docker image (Spark
3.1.3) with Python packages geared towards data science etc., having the
following packages pre-built:

pyyaml, TensorFlow, Theano, Pandas, Keras, NumPy, SciPy, Scrapy, SciKit-Learn,
XGBoost, Matplotlib, Seaborn, Bokeh, Plotly, pydot, Statsmodels


OK, I built and pushed this to the Docker repository. It is called

spark-py-pthonpackages-3.1.3-scala_2.12-11-jre-slim-buster

It is 1.3 GB, compared to 432.79 MB for the normal spark-py image,

and you can download it from


https://hub.docker.com/repository/docker/michtalebzadeh/spark_dockerfiles/tags?page=1=last_updated
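For anyone who would rather build a similar image themselves, the general shape is just a spark-py base image plus a pip layer. A minimal sketch follows; the base image tag and the unpinned package list are assumptions for illustration, not the exact Dockerfile behind the image above:

```shell
# Sketch: generate a Dockerfile layering DS packages onto a spark-py base.
# The FROM tag is a placeholder for a spark-py image built with Spark's
# bin/docker-image-tool.sh; pin package versions for reproducible builds.
cat > Dockerfile.ds <<'EOF'
FROM example-registry/spark-py:3.1.3
USER root
RUN pip install --no-cache-dir \
    pyyaml tensorflow theano pandas keras numpy scipy scrapy scikit-learn \
    xgboost matplotlib seaborn bokeh plotly pydot statsmodels
USER 185
EOF
cat Dockerfile.ds
```

Build with something like `docker build -f Dockerfile.ds -t spark-py-ds:3.1.3 .`, and expect an image in the gigabyte range as noted above, since TensorFlow alone is several hundred MB.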


These are the packages installed inside this Docker image:

docker run -u 0 -it 7621929f9c97 bash

root@bb71cb7a89de:/opt/spark/work-dir# pip list

Package                      Version
---------------------------- -------------------
absl-py                      1.0.0
astunparse                   1.6.3
attrs                        21.4.0
Automat                      20.2.0
bokeh                        2.4.2
cachetools                   5.0.0
certifi                      2021.10.8
cffi                         1.15.0
charset-normalizer           2.0.12
constantly                   15.1.0
cryptography                 36.0.1
cssselect                    1.1.0
cycler                       0.11.0
flatbuffers                  2.0
fonttools                    4.29.1
gast                         0.5.3
google-auth                  2.6.0
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.44.0
h2                           3.2.0
h5py                         3.6.0
hpack                        3.0.0
hyperframe                   5.2.0
hyperlink                    21.0.0
idna                         3.3
importlib-metadata           4.11.1
incremental                  21.3.0
itemadapter                  0.4.0
itemloaders                  1.0.4
Jinja2                       3.0.3
jmespath                     0.10.0
joblib                       1.1.0
keras                        2.8.0
Keras-Preprocessing          1.1.2
kiwisolver                   1.3.2
libclang                     13.0.0
lxml                         4.8.0
Markdown                     3.3.6
MarkupSafe                   2.1.0
matplotlib                   3.5.1
numpy                        1.22.2
oauthlib                     3.2.0
opt-einsum                   3.3.0
packaging                    21.3
pandas                       1.4.1
parsel                       1.6.0
patsy                        0.5.2
Pillow                       9.0.1
pip                          22.0.3
plotly                       5.6.0
priority                     1.3.0
Protego                      0.2.1
protobuf                     3.19.4
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pycparser                    2.21
PyDispatcher                 2.0.5
pydot                        1.4.2
pyOpenSSL                    22.0.0
pyparsing                    3.0.7
python-dateutil              2.8.2
pytz                         2021.3
PyYAML                       6.0
queuelib                     1.6.2
requests                     2.27.1
requests-oauthlib            1.3.1
rsa                          4.8
scikit-learn                 1.0.2
scipy                        1.8.0
Scrapy                       2.5.1
seaborn                      0.11.2
service-identity             21.1.0
setuptools                   60.9.3
six                          1.16.0
statsmodels                  0.13.2
tenacity                     8.0.1
tensorboard                  2.8.0
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.8.0
tensorflow-io-gcs-filesystem 0.24.0
termcolor                    1.1.0
tf-estimator-nightly         2.8.0.dev2021122109
Theano                       1.0.5
threadpoolctl                3.1.0
tornado                      6.1
Twisted                      22.1.0
typing_extensions            4.1.1
urllib3                      1.26.8
w3lib                        1.22.0
Werkzeug                     2.0.3
wheel                        0.34.2
wrapt                        1.13.3
xgboost                      1.5.2
zipp                         3.7.0
zope.interface               5.4.0

Let me know how it works for you.



Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Right, normally people start with a simple script, then add more stuff, like
permissions and more components. After some time, people want to run the
script consistently in different environments, and things become complex.

That is why we want to see whether people are interested in such a "one
click" tool to make things easy.

