yangwwei commented on pull request #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1056075063
hi @dongjoon-hyun thank you for sharing your thoughts in the [PR](https://github.com/apache/spark/pull/35704). Maybe this PR gives the impression that we just need to add some annotations, but the actual integration will be much more complicated than that. The AppID and queue name are only the very first pieces of integrating with a scheduler; taking advantage of the scheduler's features requires much more than that. [This section in the SPIP doc](https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg/edit#heading=h.s1rofza4711d) gives some more context. The full story will require us to support many things, most of which are already supported on YARN, such as priority, preemption, gang scheduling, etc. For these features we will need to add more logic to tweak the pod spec, or to add additional K8s resources, and different scheduler implementations have different semantics for supporting them. That's why we want to introduce a scheduler feature step: it lets us customize this per scheduler, e.g. with a VolcanoFeatureStep or a YuniKornFeatureStep.

The 1st phase for YuniKorn, as well as for Volcano, is simple: let the Spark job be scheduled natively by a customized scheduler. But it doesn't stop here; building on the added feature step, we can do more integration in the 2nd and 3rd phases. Hope this clarifies things. Thanks!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
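To illustrate the "scheduler feature step" idea described in the comment above, here is a toy Python sketch. It is not Spark's actual API: the function names, the `root.spark` queue, and the annotation key are illustrative assumptions; the real implementation is a Scala feature step that customizes the driver/executor pod spec.

```python
import copy

# Toy sketch (NOT Spark's actual API): a "feature step" takes a pod spec
# and returns a customized copy. Each scheduler integration (Volcano,
# YuniKorn, ...) supplies its own step with its own semantics.

def yunikorn_feature_step(pod_spec, queue):
    """Route the pod to a custom scheduler and tag it with a queue.
    The annotation key below is an assumption for illustration."""
    pod = copy.deepcopy(pod_spec)  # leave the base spec untouched
    pod["spec"]["schedulerName"] = "yunikorn"
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    annotations["yunikorn.apache.org/queue"] = queue
    return pod

def apply_feature_steps(pod_spec, steps):
    """Phase-1 integration: fold every configured step over the base pod."""
    for step in steps:
        pod_spec = step(pod_spec)
    return pod_spec

base = {"metadata": {"name": "spark-exec-1"}, "spec": {"containers": []}}
pod = apply_feature_steps(
    base, [lambda p: yunikorn_feature_step(p, "root.spark")])
print(pod["spec"]["schedulerName"])  # -> yunikorn
```

Later phases (priority, preemption, gang scheduling) would then be additional mutations, or extra K8s resources emitted alongside the pod, inside the same per-scheduler step.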
