Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
xieshuaihu closed pull request #46278: [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool URL: https://github.com/apache/spark/pull/46278 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
xieshuaihu commented on PR #46278: URL: https://github.com/apache/spark/pull/46278#issuecomment-2088566513 @hvanhovell @HyukjinKwon There are two reasons to support setting the scheduler pool in Spark Connect. 1. Vanilla Spark supports the fair scheduler and pools. If the server runs every job in one fixed pool, clients cannot make full use of the fair scheduler. This missing feature could block users from adopting Connect in their environments (at least, we need it in ours). 2. In a multi-user environment, it is important to limit the execution resources allocated to some jobs (or to allocate more resources to others). I think supporting multi-user environments is a selling point of Connect, so it is worth bringing scheduler pools to Connect. Further discussion is welcome.
Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
hvanhovell commented on PR #46278: URL: https://github.com/apache/spark/pull/46278#issuecomment-2086174926 I am not 100% sure we should expose this as a client-side conf. A client shouldn't have to set these things. Can't we just make the Connect server use a specific scheduler pool?
Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
xieshuaihu commented on PR #46278: URL: https://github.com/apache/spark/pull/46278#issuecomment-2084934296 @HyukjinKwon I added a new RPC to make the `setSchedulerPool` API less confusing. Could you let me know whether this PR is heading in the right direction? If so, I will add more unit tests.
Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
HyukjinKwon commented on PR #46278: URL: https://github.com/apache/spark/pull/46278#issuecomment-2084282381 Oh, okay. I misread the PR. I thought you were making `spark.scheduler.mode` a runtime conf. Okay, I see now; it makes sense.
Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]
xieshuaihu commented on PR #46278: URL: https://github.com/apache/spark/pull/46278#issuecomment-2084128139 Let me clarify this question. In vanilla Spark, we can do this:

```scala
import org.apache.spark.sql.SparkSession

// Create the context. These configs can only be set once, at startup,
// or passed to spark-submit: --conf spark.scheduler.mode=FAIR --conf spark.scheduler.allocation.file=file:///path/to/file
SparkSession.builder
  .config("spark.scheduler.mode", "FAIR")
  .config("spark.scheduler.allocation.file", "file:///path/to/file")
  .getOrCreate()

// In one thread, set its pool to "pool1"; jobs submitted from this thread run in pool1
new Thread(() => {
  val spark = SparkSession.builder.getOrCreate()
  spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
}).start()

// In another thread, set its pool to "pool2"; jobs submitted from this thread run in pool2
new Thread(() => {
  val spark = SparkSession.builder.getOrCreate()
  spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
}).start()

// Note: pool1 and pool2 should be defined in the file 'file:///path/to/file'
```

But Spark Connect currently doesn't support setting a job's pool; in other words, jobs submitted by a Connect client cannot change the SparkContext local property "spark.scheduler.pool".
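Because "spark.scheduler.pool" is a thread-local property, a Connect server that honors a client's pool choice would have to set it on the thread that actually submits the jobs for that request, and restore it afterwards. The sketch below shows only that server-side step, assuming the requested pool has already reached the server somehow (a new RPC, a session conf, or a fixed server-side setting; that part is left open). `SchedulerPools` and `withSchedulerPool` are hypothetical names, not part of Spark or of this PR.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper (not a Spark API): run `body` on the current thread with
// the given fair-scheduler pool, then restore whatever pool the thread had before.
object SchedulerPools {
  val PoolKey = "spark.scheduler.pool"

  def withSchedulerPool[T](sc: SparkContext, pool: Option[String])(body: => T): T = {
    // getLocalProperty returns null if this thread has no pool set
    val previous = sc.getLocalProperty(PoolKey)
    // Only override the pool when one was requested; otherwise keep the current one
    pool.foreach(p => sc.setLocalProperty(PoolKey, p))
    try body
    // setLocalProperty(key, null) removes the key, so an unset pool stays unset
    finally sc.setLocalProperty(PoolKey, previous)
  }
}
```

A server could wrap each request's execution in something like `SchedulerPools.withSchedulerPool(spark.sparkContext, Some("pool1")) { df.count() }`. Restoring the previous value matters if execution threads are reused across requests, so one client's pool does not leak into the next request served by that thread.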