ulysses-you commented on pull request #28778: URL: https://github.com/apache/spark/pull/28778#issuecomment-644768163
Yeah, parallelism is a physical concept, but the resource is also shared among sessions. I run a long-lived Spark application with plenty of cores and memory (so `defaultParallelism` is large), and SQL queries execute in parallel and share those resources. Some queries read Hive tables that contain many small files, so a single query can end up holding all of the task slots. I tried to increase the file size in each partition to reduce the partition number, so that other queries could be assigned more tasks, but the only knob available is `defaultParallelism`, and changing it affects every query. As said above, I think Spark needs to provide a way to control the parallelism of each sql/session, so that a user can reduce the parallelism when a query reads small files.
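For context, a partial workaround with the existing file-source configs (not the per-session parallelism control discussed here) is to raise the per-partition read size for one session, so a scan over many small files produces fewer partitions and therefore fewer tasks. A sketch, assuming Spark's standard `spark.sql.files.*` configs; the 512 MB value is only an illustration:

```sql
-- Raise the max bytes packed into a single read partition for this
-- session only (default is 128 MB); small files are coalesced up to
-- this size, so the scan schedules fewer tasks.
SET spark.sql.files.maxPartitionBytes=536870912;

-- Each additional file also contributes this estimated open cost when
-- packing, which further discourages one-task-per-small-file splits.
SET spark.sql.files.openCostInBytes=8388608;
```

This only helps for file-based scans, though; it does not cap the total tasks a single query can occupy, which is why a per-sql/session parallelism limit would still be useful.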