yangwwei edited a comment on pull request #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1060230905
Hi @dongjoon-hyun, I want to work with you to find the best way to solve this. Apologies for the long comment; it has two main parts: 1) why a YuniKorn feature step; 2) if not a feature step, what the alternative is.

### 1) Why a YuniKorn feature step

In the [proposal](https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg), the goal is to give users a customizable and consistent way to use a 3rd-party K8s scheduler with Spark. We proposed the following user-facing changes when submitting Spark jobs:

```
--conf spark.kubernetes.driver.scheduler.name=xxx
--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.XxxFeatureStep
--conf spark.kubernetes.job.queue=default
--conf spark.kubernetes.job.min.cpu=4
--conf spark.kubernetes.job.min.memory=8G
```

Both Volcano and YuniKorn honor these configs and set up the driver/executor pods accordingly via a feature step. That is why I waited for @Yikun to finish the general feature-step work before submitting this PR.

On the YuniKorn side, the logic is not very different from Volcano's. We need to set a few pod annotations for the app ID and the job queue, and optionally create a K8s CRD; for YuniKorn that CRD is `application.yunikorn.apache.org` ([CRD definition](https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/templates/crds/application-definition.yaml)). A rough sketch of what such a step could look like is at the end of this comment. The only difference is that PodGroup is a mandatory resource for Volcano, while the app CRD is optional in YuniKorn. So in the 1st phase, my PR does not introduce the CRD creation, but we at least have the basic integration working.

BTW, YuniKorn has already passed the ASF graduation vote, so it will become an Apache TLP in a few weeks.

### 2) If not a feature step, what's the alternative

@Yikun summarized the alternative [here](https://github.com/apache/spark/pull/35663#issuecomment-1056101229): use the annotation placeholders introduced via https://github.com/apache/spark/pull/35704. I looked into this approach; it looks like we would need to set something like:

```
--conf spark.kubernetes.driver.scheduler.name=yunikorn
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.job.queue=default
--conf spark.kubernetes.job.min.cpu=4
--conf spark.kubernetes.job.min.memory=8G
```

This can work for the 1st phase. However, I am not sure how to achieve our 2nd-phase target, where the [CRD](https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/templates/crds/application-definition.yaml) is introduced. Are you suggesting we use this approach for the 1st phase and add the feature step in the 2nd phase? Wouldn't that lead to a different user experience for end users? I would really appreciate it if you could share your thoughts. Thanks!
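---

For reference, here is a rough sketch of what a YuniKorn feature step could look like, written in the shape of Spark's existing `KubernetesFeatureConfigStep` trait. The class name, the `yunikorn.apache.org/queue` annotation key, and the `spark.kubernetes.job.queue` lookup are illustrative assumptions for this comment, not the actual code in this PR:

```scala
package org.apache.spark.deploy.k8s.features.scheduler

import io.fabric8.kubernetes.api.model.PodBuilder

import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep

// Sketch only: names and annotation keys are assumptions, not the final PR code.
class YuniKornFeatureStep(conf: KubernetesConf) extends KubernetesFeatureConfigStep {

  override def configurePod(pod: SparkPod): SparkPod = {
    // Illustrative lookup of the proposed user-facing queue config.
    val queue = conf.sparkConf.get("spark.kubernetes.job.queue", "default")
    val annotated = new PodBuilder(pod.pod)
      .editOrNewMetadata()
        // app-id groups the driver and executor pods into one YuniKorn application
        .addToAnnotations("yunikorn.apache.org/app-id", conf.appId)
        // queue tells YuniKorn which queue the application should run in
        .addToAnnotations("yunikorn.apache.org/queue", queue)
      .endMetadata()
      .build()
    SparkPod(annotated, pod.container)
  }

  // Phase 2 (not in this PR): getAdditionalKubernetesResources() could return
  // the optional application.yunikorn.apache.org CRD object here.
}
```

In a later phase, `getAdditionalKubernetesResources()` would be the natural place to return the optional `application.yunikorn.apache.org` CRD object, which is why the feature-step approach looks like the better fit for the 2nd phase than annotation placeholders alone.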
