Spark on Kubernetes focused workshops

2022-05-19 Thread Agarwal, Janak


Team, is there a meetup or workshop focused on Spark on Kubernetes?
If not, is there any interest in creating a once-a-month sync to exchange notes 
and best practices?

Thanks,
Janak


RE: [Fork] ]RE: One click to run Spark on Kubernetes

2022-02-23 Thread Agarwal, Janak
Mich,

Not sure I follow you, since I do not fully understand what GKE conventional is 
(at first glance, it appears to help customers set up a Kubernetes 
environment).
EMR on EKS offers a fully managed control plane (among other benefits, such as 
the Spark UI for completed jobs) that allows customers to focus on running 
Spark applications on their EKS cluster.

Thanks,
Janak

From: Mich Talebzadeh 
Sent: Wednesday, February 23, 2022 11:54 AM
To: Agarwal, Janak 
Cc: Spark dev list 
Subject: RE: [EXTERNAL] [Fork] ]RE: One click to run Spark on Kubernetes


CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.


Thanks Janak,  the same as GKE conventional or GKE autopilot. 
<https://cloud.google.com/kubernetes-engine>

Putting conventional aside, why do you think customers should choose a fully 
managed package for Spark?

thanks




 
   view my Linkedin 
profile<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

 https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.





RE: [Fork] ]RE: One click to run Spark on Kubernetes

2022-02-23 Thread Agarwal, Janak
Hey Mich,

EMR on EKS<https://aws.amazon.com/emr/features/eks/> works on both EKS-Fargate 
and EKS-managed/self-managed EC2 based node groups.

Thanks,
Janak

From: Mich Talebzadeh 
Sent: Wednesday, February 23, 2022 10:46 AM
To: Agarwal, Janak 
Cc: Spark dev list 
Subject: RE: [EXTERNAL] [Fork] ]RE: One click to run Spark on Kubernetes




Hi Janak,

Are you talking about EKS Fargate?
Thanks







 




On Wed, 23 Feb 2022 at 17:47, Agarwal, Janak <jana...@amazon.com> wrote:
[Reducing to thread participants to avoid spamming the entire community’s 
mailboxes]

Sarath, Bo, Mich,

Have you read about EMR on EKS<https://aws.amazon.com/emr/features/eks/>? We 
help customers run Spark workloads on EKS. Today, EMR on EKS supports 
running Spark workloads on your own EKS cluster, which you need to set up 
yourself. So to achieve one-click, all you really need to do is set up the 
EKS cluster. As mentioned earlier, setting up an EKS cluster is fairly simple, 
and we can help you with that if useful. Want to give EMR on EKS a spin as you 
decide your path forward?
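
For illustration only: once an EKS cluster exists and has been registered with EMR on EKS as a virtual cluster, a Spark job is typically submitted through the AWS CLI. The sketch below only prints such a command so it can be reviewed first. The function name, the job name `sample-job`, and every argument value are placeholders and do not come from this thread.

```shell
#!/bin/sh
# Sketch: print (do not run) an EMR on EKS job submission command.
# Every argument is a caller-supplied placeholder.
build_emr_job_cmd() {
  virtual_cluster_id="$1"  # ID of the EMR virtual cluster mapped to the EKS namespace
  role_arn="$2"            # IAM role the job runs as
  release_label="$3"       # EMR release label to run with
  entry_point="$4"         # location of the Spark application

  # Minimal job driver JSON; real jobs usually add sparkSubmitParameters too.
  job_driver="{\"sparkSubmitJobDriver\":{\"entryPoint\":\"${entry_point}\"}}"

  printf "aws emr-containers start-job-run --virtual-cluster-id %s --name %s --execution-role-arn %s --release-label %s --job-driver '%s'\n" \
    "${virtual_cluster_id}" "sample-job" "${role_arn}" "${release_label}" "${job_driver}"
}
```

Calling `build_emr_job_cmd <virtual-cluster-id> <role-arn> <release-label> <entry-point>` prints a single command line that can be inspected and then pasted into a shell with AWS credentials configured.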


Best,
Janak

From: Sarath Annareddy <sarath.annare...@gmail.com>
Sent: Wednesday, February 23, 2022 7:41 AM
To: bo yang <bobyan...@gmail.com>
Cc: Mich Talebzadeh <mich.talebza...@gmail.com>; Spark Dev List 
<dev@spark.apache.org>; user <u...@spark.apache.org>
Subject: RE: [EXTERNAL] One click to run Spark on Kubernetes




Hi bo

I am interested in contributing.
But I don’t have free access to any cloud provider, and I am not sure how I can 
get it. I know Google, AWS, and Azure only provide temporary free access, which 
may not be sufficient.

Guidance is appreciated.

Sarath
Sent from my iPhone

On Feb 23, 2022, at 2:01 AM, bo yang <bobyan...@gmail.com> wrote:

Right, normally people start with a simple script, then add more stuff, like 
permissions and more components. After some time, people want to run the script 
consistently in different environments, and things become complex.

That is why we want to see whether people are interested in such a "one click" 
tool to make things easy.


On Tue, Feb 22, 2022 at 11:31 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Hi,

There are two distinct actions here, namely Deploy and Run.

Deployment can be done by a command-line script with autoscaling. In the newer 
versions of Kubernetes you don't even need to specify the node types; you can 
leave it to the Kubernetes cluster to scale up and down and decide on the node 
type.

The second action is running the Spark application, which you will need to 
submit. That, however, depends on setting up access permissions, using service 
accounts, and pulling the correct Docker images for the driver and the 
executors. Those details add to the complexity.
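
To make the Run step concrete, here is a minimal sketch of how the service account and container image mentioned above surface as `spark-submit` settings in Kubernetes cluster mode. The function name and all argument values are placeholders chosen for illustration; the function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print (do not run) a spark-submit command for Kubernetes cluster mode.
# All values are placeholders supplied by the caller.
build_submit_cmd() {
  api_server="$1"       # URL of the Kubernetes API server, e.g. https://<host>:<port>
  namespace="$2"        # namespace the driver and executors run in
  service_account="$3"  # service account the driver uses to create executor pods
  image="$4"            # container image shared by the driver and the executors
  app="$5"              # application jar or script, e.g. a local:// path in the image

  # printf repeats the '%s ' format for each argument, joining them with spaces.
  printf '%s ' \
    spark-submit \
    --master "k8s://${api_server}" \
    --deploy-mode cluster \
    --conf "spark.kubernetes.namespace=${namespace}" \
    --conf "spark.kubernetes.authenticate.driver.serviceAccountName=${service_account}" \
    --conf "spark.kubernetes.container.image=${image}"
  printf '%s\n' "${app}"
}
```

The service account still needs RBAC permission to create pods in the chosen namespace, which is part of the setup work described above and is not covered by this sketch.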

Thanks




 




On Wed, 23 Feb 2022 at 04:06, bo yang <bobyan...@gmail.com> wrote:
Hi Spark Community,

We built an open source tool to deploy and run Spark on Kubernetes with a 
one-click command. For example, on AWS, it can automatically create an EKS 
cluster, node group, NGINX ingress, and Spark Operator. You will then be able 
to use curl or a CLI tool to submit Spark applications. After the deployment, 
you can also install Uber Remote Shuffle Service to enable Dynamic Allocation 
on Kubernetes.
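
As a rough picture of what such a deployment automates, the sketch below prints the kind of manual steps involved on AWS. It is not the tool's implementation. It assumes `eksctl` and `helm` are installed and that the `ingress-nginx` and `spark-operator` chart repositories have already been added under those names; the function name, the node group naming scheme, and the namespaces are hypothetical.

```shell
#!/bin/sh
# Sketch: print (do not run) the manual steps a one-click deployment would wrap.
# Chart and repository names are assumptions; adjust to your environment.
print_deploy_plan() {
  cluster="$1"  # EKS cluster name
  region="$2"   # AWS region
  nodes="$3"    # number of worker nodes in the node group

  echo "eksctl create cluster --name ${cluster} --region ${region} --without-nodegroup"
  echo "eksctl create nodegroup --cluster ${cluster} --region ${region} --name ${cluster}-ng --nodes ${nodes}"
  echo "helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace"
  echo "helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace"
}
```

Printing the plan first makes it easy to review each step before running it, which matters because the cluster and node group incur cloud costs.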

Anyone interested in using or working together on such a tool?

Thanks,
Bo



RE: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-04 Thread Agarwal, Janak
Hello Folks, Happy new year to one and all.

I’m from the EMR on EKS team. We help customers to run Spark workloads on 
Kubernetes.
My team had similar ideas, and we have also sourced requirements from customers 
who use EMR on EKS / Spark on EKS. Would love to participate in the design to 
help solve the problem for the vast majority of Spark on Kubernetes users.

Any guidance on how to best contribute?

Best,
Janak

From: Mich Talebzadeh 
Sent: Tuesday, January 4, 2022 2:12 AM
To: Yikun Jiang 
Cc: dev; Weiwei Yang; Holden Karau; wang.platf...@gmail.com; Prasad Paravatha; 
John Zhuge; Chenya Zhang; Chaoran Yu; Wilfred Spiegelenburg; Klaus Ma
Subject: RE: [EXTERNAL] [DISCUSSION] SPIP: Support Volcano/Alternative 
Schedulers Proposal




Interesting, thanks.

Do you have a ballpark figure (a rough numerical estimate) for how much 
adding Volcano as an alternative scheduler is going to improve Spark on k8s 
performance?

Thanks



 




On Tue, 4 Jan 2022 at 09:43, Yikun Jiang <yikunk...@gmail.com> wrote:

Hi, folks! Wishing you all the best in 2022.


I'd like to share the current status on "Support Customized K8S Scheduler in 
Spark".

https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg/edit#heading=h.1quyr1r2kr5n


Framework/Common support

- The Volcano and Yunikorn teams joined the discussion and completed the initial 
doc on the framework/common part.

- SPARK-37145 (under review): We proposed to extend the customized scheduler by 
just using a custom feature step; once merged, it will meet the requirements of 
a customized scheduler. After this, the user can enable the feature step and 
scheduler like:

spark-submit \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.VolcanoFeatureStep \
  --conf spark.kubernetes.job.queue=xxx

(As above, the VolcanoFeatureStep will help to set the Spark scheduler queue 
according to the user-specified conf.)

- SPARK-37331: Added the ability to create Kubernetes resources before driver 
pod creation.

- SPARK-36059: Add the ability to specify a scheduler in driver/executor.

Once all of the above is done, the framework/common support will be ready for 
most customized schedulers.



Volcano part:

- SPARK-37258: Upgrade kubernetes-client to 5.11.1 to add Volcano scheduler API 
support.

- SPARK-36061: Add a VolcanoFeatureStep to help users create a PodGroup with 
user-specified minimum resources required. There is also a WIP commit to show a 
preview of this.
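
For readers unfamiliar with Volcano, a PodGroup is a custom resource that tells the scheduler the minimum resources a job needs before any of its pods are started. The sketch below renders what such a manifest might look like. It is an illustration, not the output of the proposed VolcanoFeatureStep; the function name, the fixed `minMember: 1`, and the argument values are assumptions.

```shell
#!/bin/sh
# Sketch: render a Volcano PodGroup manifest with user-specified minimum resources.
# Illustrative only; the actual feature step may set different fields.
render_podgroup() {
  name="$1"        # PodGroup name
  queue="$2"       # Volcano queue the job is scheduled in
  min_cpu="$3"     # minimum total CPU required before scheduling
  min_memory="$4"  # minimum total memory required before scheduling

  cat <<EOF
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: ${name}
spec:
  queue: ${queue}
  minMember: 1
  minResources:
    cpu: "${min_cpu}"
    memory: "${min_memory}"
EOF
}
```

The rendered text can be reviewed and then piped to `kubectl apply -f -` on a cluster where Volcano is installed.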


Yunikorn part:

- @WeiweiYang is completing the doc of the Yunikorn part and implementing the 
Yunikorn part.


Regards,
Yikun


On Thu, 2 Dec 2021 at 02:00, Weiwei Yang <w...@apache.org> wrote:
Thank you Yikun for the info, and thanks for inviting me to a meeting to 
discuss this.
I appreciate your effort to put these together, and I agree that the purpose is 
to make Spark easy/flexible enough to support other K8s schedulers (not just 
for Volcano).
As discussed, could you please help to abstract out the things in common and 
allow Spark to plug different implementations? I'd be happy to work with you 
guys on this issue.


On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang <yikunk...@gmail.com> wrote:
@Weiwei @Chenya

> Thanks for bringing this up. This is quite interesting, we definitely should 
> participate more in the discussions.

Thanks for your reply, and you are welcome to join the discussion. I think the 
input from Yunikorn is very critical.

> The main thing here is, the Spark community should make Spark pluggable in 
> order to support other schedulers, not just for Volcano. It looks like this 
> proposal is pushing really hard for adopting PodGroup, which isn't part of 
> K8s yet, that to me is problematic.

Definitely yes, we are o