ulysses-you commented on pull request #28778: URL: https://github.com/apache/spark/pull/28778#issuecomment-644767912
Yeah, parallelism is a physical concept, but it is also shared among sessions. I run a long-lived Spark application with plenty of cores and memory (meaning `defaultParallelism` is large), where SQL queries execute in parallel and share the resources. Some queries read Hive tables that contain many small files, so a single query can end up holding all of the task slots. I tried to increase the file size per partition, which reduces the partition count so that other queries can be assigned more tasks. But the only knob I can turn is `defaultParallelism`, and changing it affects every query. As said above, I think Spark needs to provide a way to control parallelism per SQL statement/session (in this case, file-scan parallelism) so that users can reduce the parallelism of a query that reads small files.
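For the file-scan case specifically, a partial workaround already exists: `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` are session-scoped SQL confs, so one session can pack many small files into fewer scan partitions without affecting other sessions. A minimal sketch (the table name is hypothetical, and the byte values are illustrative assumptions):

```sql
-- In the session that reads many small files, raise the per-partition
-- target size so the scan produces fewer, larger partitions.
-- spark.sql.files.maxPartitionBytes is session-scoped; other sessions
-- in the same application keep their own value.
SET spark.sql.files.maxPartitionBytes=536870912;  -- 512 MB

-- Raise the estimated cost of opening a file so that many tiny files
-- are packed into a single partition (assumes files are much smaller
-- than this value).
SET spark.sql.files.openCostInBytes=8388608;      -- 8 MB

SELECT * FROM small_file_hive_table;              -- hypothetical table
```

This only covers the read-side partitioning, though; it does not give a general per-session cap on parallelism, which is what the comment is asking for.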
