Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-05-06 Thread via GitHub


xieshuaihu closed pull request #46278: [SPARK-48040][CONNECT][WIP]Spark connect 
supports scheduler pool
URL: https://github.com/apache/spark/pull/46278


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-05-01 Thread via GitHub


xieshuaihu commented on PR #46278:
URL: https://github.com/apache/spark/pull/46278#issuecomment-2088566513

   @hvanhovell @HyukjinKwon 
   There are two reasons to support setting the scheduler pool in Spark Connect.
   
   1. Vanilla Spark supports the fair scheduler and pools. If the server runs in a 
specific pool, clients cannot make full use of the fair scheduler. 
This missing feature could block users from adopting Connect in their environment. 
(At least, we want this feature in our environment.)
   2. In a multi-user environment, it's important to limit the execution resources 
allocated to some jobs (or to allocate more resources to others). I think 
supporting multi-user environments is a selling point of Connect, so it's better 
to bring scheduler pools to Connect.
   
   Further discussion is welcome.





Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-04-30 Thread via GitHub


hvanhovell commented on PR #46278:
URL: https://github.com/apache/spark/pull/46278#issuecomment-2086174926

   I am not 100% sure we should expose this as a client-side conf. A client 
shouldn't have to set these things. Can't we just make the Connect server use a 
specific scheduler pool?
   





Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-04-30 Thread via GitHub


xieshuaihu commented on PR #46278:
URL: https://github.com/apache/spark/pull/46278#issuecomment-2084934296

   @HyukjinKwon 
   I added a new RPC to make the `setSchedulerPool` API less confusing.
   
   Please let me know whether this PR is heading in the right direction. If it 
is, more unit tests will be added.





Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub


HyukjinKwon commented on PR #46278:
URL: https://github.com/apache/spark/pull/46278#issuecomment-2084282381

   Oh, okay. I misread the PR. I thought you were making `spark.scheduler.mode` a 
runtime conf. Okay, I get it now; that makes sense.





Re: [PR] [SPARK-48040][CONNECT][WIP]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub


xieshuaihu commented on PR #46278:
URL: https://github.com/apache/spark/pull/46278#issuecomment-2084128139

   Let me clarify this question.
   
   In vanilla Spark, we can do this:
   
   ```scala
   import org.apache.spark.sql.SparkSession

   // Create the session. These configs can be set only once, at startup,
   // or passed to spark-submit:
   //   --conf spark.scheduler.mode=FAIR
   //   --conf spark.scheduler.allocation.file=file:///path/to/file
   SparkSession.builder
     .config("spark.scheduler.mode", "FAIR")
     .config("spark.scheduler.allocation.file", "file:///path/to/file")
     .getOrCreate()

   // In one thread, set its pool to "pool1"
   new Thread(() => {
     val spark = SparkSession.builder.getOrCreate()
     spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
   }).start()

   // In another thread, set its pool to "pool2"
   new Thread(() => {
     val spark = SparkSession.builder.getOrCreate()
     spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
   }).start()

   // Note: pool1 and pool2 should be defined in 'file:///path/to/file'
   ```
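   
   For illustration, an allocation file defining the two pools named in the 
snippet above might look like the sketch below. The element names follow 
Spark's fair scheduler allocation file format. The `schedulingMode`, `weight`, 
and `minShare` values are placeholders assumed for this example and are not 
taken from the PR.
   
   ```xml
   <?xml version="1.0"?>
   <!-- Hypothetical contents of file:///path/to/file; values are illustrative -->
   <allocations>
     <pool name="pool1">
       <schedulingMode>FAIR</schedulingMode>
       <weight>1</weight>
       <minShare>2</minShare>
     </pool>
     <pool name="pool2">
       <schedulingMode>FIFO</schedulingMode>
       <weight>2</weight>
       <minShare>3</minShare>
     </pool>
   </allocations>
   ```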
   
   Spark Connect, however, currently doesn't support setting a job's pool. In 
other words, jobs submitted by a Connect client cannot change the SparkContext 
local property `spark.scheduler.pool`.

