ulysses-you commented on pull request #28778: URL: https://github.com/apache/spark/pull/28778#issuecomment-644767912
Yeah, parallelism is a physical concept, but it is also shared among sessions. I run a long-lived Spark application with plenty of cores and memory (meaning `defaultParallelism` is large), where SQL queries execute in parallel and share the resources. Some queries read Hive tables that contain many small files, so a single query can end up holding all of the task slots. I tried to increase the file size per partition, which reduces the partition count so that other queries can be assigned more tasks. But the only knob I can turn is `defaultParallelism`, and changing it affects every query. As said above, I think Spark needs to provide a way to control parallelism per SQL statement/session (in this case, file-scan parallelism) so that users can reduce the parallelism of a query that reads small files.
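For the file-scan case specifically, a partial workaround already exists: `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` are session-scoped SQL confs, so one session can pack many small files into fewer scan partitions without affecting other sessions. A minimal sketch (the table name is hypothetical, and the byte values are illustrative assumptions):

```sql
-- In the session that reads many small files, raise the per-partition
-- target size so the scan produces fewer, larger partitions.
-- spark.sql.files.maxPartitionBytes is session-scoped; other sessions
-- in the same application keep their own value.
SET spark.sql.files.maxPartitionBytes=536870912;  -- 512 MB

-- Raise the estimated cost of opening a file so that many tiny files
-- are packed into a single partition (assumes files are much smaller
-- than this value).
SET spark.sql.files.openCostInBytes=8388608;      -- 8 MB

SELECT * FROM small_file_hive_table;              -- hypothetical table
```

This only covers the read-side partitioning, though; it does not give a general per-session cap on parallelism, which is what the comment is asking for.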
