ulysses-you commented on pull request #28778: URL: https://github.com/apache/spark/pull/28778#issuecomment-644768163
Yeah, parallelism is a physical concept, but the resource is also shared among sessions. I run a long-lived Spark application with plenty of cores and memory (so `defaultParallelism` is large), and SQL queries execute in parallel and share those resources. Some queries read Hive tables that contain many small files, so a single query can end up holding all of the task slots. I tried to increase the file size in each partition to reduce the partition number, so that other queries could be assigned more tasks, but the only knob available is `defaultParallelism`, and changing it affects every query. As said above, I think Spark needs to provide a way to control the parallelism of each sql/session, so that a user can reduce the parallelism when a query reads small files.
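For context, a partial workaround with the existing file-source configs (not the per-session parallelism control discussed here) is to raise the per-partition read size for one session, so a scan over many small files produces fewer partitions and therefore fewer tasks. A sketch, assuming Spark's standard `spark.sql.files.*` configs; the 512 MB value is only an illustration:

```sql
-- Raise the max bytes packed into a single read partition for this
-- session only (default is 128 MB); small files are coalesced up to
-- this size, so the scan schedules fewer tasks.
SET spark.sql.files.maxPartitionBytes=536870912;

-- Each additional file also contributes this estimated open cost when
-- packing, which further discourages one-task-per-small-file splits.
SET spark.sql.files.openCostInBytes=8388608;
```

This only helps for file-based scans, though; it does not cap the total tasks a single query can occupy, which is why a per-sql/session parallelism limit would still be useful.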