yangwwei commented on pull request #35663:
URL: https://github.com/apache/spark/pull/35663#issuecomment-1060230905


   Hi @dongjoon-hyun 
   
   I'd like to work with you to figure out the best way to solve this. Apologies for the long comment; it has two main parts: 1) why a YuniKorn feature step; 2) if not a feature step, what the alternative is.
   
   ### 1) Why a YuniKorn feature step
   
   In the [proposal](https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg), the goal is to give users a customizable and consistent way to use a 3rd-party K8s scheduler with Spark. We proposed the following user-facing changes when submitting Spark jobs:
   
   ```
   --conf spark.kubernetes.driver.scheduler.name=xxx
   --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.XxxFeatureStep
   --conf spark.kubernetes.job.queue=default
   --conf spark.kubernetes.job.min.cpu=4
   --conf spark.kubernetes.job.min.memory=8G
   ```
   
   Both Volcano and YuniKorn will honor these configs and set up the driver/executor pods accordingly via a feature step. That's why I was waiting for @Yikun to finish the general parts of the K8s feature-step implementation before submitting this PR. On the YuniKorn side, the logic is not much different from Volcano's: we need to set a few pod annotations for the app ID and job queue, and also create a K8s CRD instance. In YuniKorn's case, that is application.yunikorn.apache.org ([CRD definition](https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/templates/crds/application-definition.yaml)). The only difference is that the PodGroup is a mandatory resource for Volcano, whereas the app CRD is optional in YuniKorn. So in the 1st phase, my PR doesn't introduce the CRD creation, but at least we get the basic integration working.
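   For concreteness, here is a minimal sketch of what such a YuniKorn step could look like. It assumes the `KubernetesFeatureConfigStep` interface with a constructor taking `KubernetesConf`; the `spark.kubernetes.job.queue` key and the queue annotation name are illustrative placeholders from the proposal, not a final API:
   
   ```scala
   // Sketch only -- not the code in this PR.
   import io.fabric8.kubernetes.api.model.PodBuilder
   
   import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
   import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
   
   class YuniKornFeatureStep(kubernetesConf: KubernetesConf)
     extends KubernetesFeatureConfigStep {
   
     override def configurePod(pod: SparkPod): SparkPod = {
       // Proposed (hypothetical) config key for the target queue.
       val queue = kubernetesConf.sparkConf
         .get("spark.kubernetes.job.queue", "default")
       // YuniKorn groups driver/executor pods into one app via annotations.
       val annotated = new PodBuilder(pod.pod)
         .editOrNewMetadata()
           .addToAnnotations("yunikorn.apache.org/app-id", kubernetesConf.appId)
           .addToAnnotations("yunikorn.apache.org/queue", queue)
         .endMetadata()
         .build()
       SparkPod(annotated, pod.container)
     }
   }
   ```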
   
   ### 2) If not a feature step, what's the alternative
   
   @Yikun summarized the alternative [here](https://github.com/apache/spark/pull/35663#issuecomment-1056101229): use the annotation placeholders introduced in https://github.com/apache/spark/pull/35704. I looked into this approach; it looks like we would need to set up something like:
   
   ```
   --conf spark.kubernetes.driver.scheduler.name=yunikorn
   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}
   --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}
   --conf spark.kubernetes.job.queue=default
   --conf spark.kubernetes.job.min.cpu=4
   --conf spark.kubernetes.job.min.memory=8G
   ```
   
   This can work for the 1st phase. However, I am not sure how to achieve our 2nd-phase target once the [CRD](https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/templates/crds/application-definition.yaml) is introduced. Are you suggesting we use this approach for the 1st phase and add the feature step in the 2nd phase? Will that lead to a different experience for end-users?
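   To make the 2nd-phase question concrete: with a feature step, emitting the app CRD next to the pods would be natural, since a step can already return additional resources. A rough sketch, continuing the class above and assuming fabric8's `GenericKubernetesResource` plus the `v1alpha1` `Application` kind from the CRD definition (details illustrative):
   
   ```scala
   import io.fabric8.kubernetes.api.model.{GenericKubernetesResource, HasMetadata, ObjectMetaBuilder}
   
   // Inside the YuniKornFeatureStep sketch above: return the optional app CRD
   // so spark-submit creates it together with the driver pod.
   override def getAdditionalKubernetesResources(): Seq[HasMetadata] = {
     val app = new GenericKubernetesResource()
     app.setApiVersion("yunikorn.apache.org/v1alpha1")
     app.setKind("Application")
     app.setMetadata(new ObjectMetaBuilder().withName(kubernetesConf.appId).build())
     Seq(app)
   }
   ```
   
   With the annotation-placeholder approach alone, I don't see an equivalent hook for creating the CRD at submit time.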
   
   I'd really appreciate it if you could share your thoughts. Thanks!
   
   

