Thanks Klaus! I am interested in more details.

On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma <klaus1982...@gmail.com> wrote:
> Hi team,
>
> I'm the kube-batch/Volcano founder, and I'm excited to hear that the Spark
> community also has such requirements :)
>
> Volcano provides several features for batch workloads, e.g. fair-share,
> queue, reservation, preemption/reclaim and so on.
> It has been used in several production environments with Spark; if necessary,
> I can give an overall introduction to Volcano's features and those use
> cases :)
>
> -- Klaus
>
> On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>> I believe that, from a technical point of view, spending time and effort
>> plus talent on batch scheduling on Kubernetes could be rewarding. However,
>> if I may say so, I doubt whether such an approach and the so-called
>> democratization of Spark on whatever platform really should be of great
>> focus.
>>
>> Having worked on Google Dataproc <https://cloud.google.com/dataproc> (a
>> fully managed and highly scalable service for running Apache Spark, Hadoop
>> and more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize, much like one can do with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>> Moreover, Spark (and I stand corrected) from the ground up already has a
>> lot of resiliency and redundancy built in. It is truly an enterprise-class
>> product (requires enterprise-class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market.
>> In short, I can see commercial enterprises will work on
>> these platforms, but maybe the great talents on the dev team should focus
>> on stuff like the perceived limitation of SSS in dealing with chains of
>> aggregations (if I am correct, this is not yet supported on streaming
>> datasets).
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> view my LinkedIn profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> I think these approaches are good, but there are limitations (e.g. dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they
>>> are interested. The first place that I checked was #sig-scheduling, which
>>> is fairly quiet on the Kubernetes Slack, but if there are more places to
>>> look for folks interested in batch scheduling on Kubernetes we should
>>> definitely give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> ".. I know that one of the Spark on Kube operators
>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>> start exploring..."
>>>>
>>>> There seems to be ongoing work on say Volcano as part of Cloud Native
>>>> Computing Foundation <https://cncf.io/> (CNCF).
>>>> For example, through
>>>> https://github.com/volcano-sh/volcano
>>>>
>>>> There may be value-add in collaborating with such groups through CNCF
>>>> in order to have a collective approach to such work. There also seems to
>>>> be some work on Integration of Spark with Volcano for Batch Scheduling
>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>.
>>>>
>>>> What is not very clear is the degree of progress of these projects. You
>>>> may be kind enough to elaborate on KPIs for each of these projects and
>>>> where you think your contributions are going to be.
>>>>
>>>> HTH,
>>>>
>>>> Mich
>>>>
>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>> was wondering if folks have experience with the different batch
>>>>> scheduler options that they prefer? I was thinking so that we can
>>>>> better support dynamic allocation it might make sense for us to
>>>>> support using different schedulers and I wanted to see if there are
>>>>> any that the community is more interested in?
>>>>>
>>>>> I know that one of the Spark on Kube operators supports
>>>>> volcano/kube-batch so I was thinking that might be a place I start
>>>>> exploring but also want to be open to other schedulers that folks
>>>>> might be interested in.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Holden :)
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
John Zhuge