cloud-fan opened a new pull request #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328 ## What changes were proposed in this pull request? Sometimes when you explain a query, you will get stuck for a while. What's worse, you will get stuck again if you explain again. This is caused by `FileSourceScanExec`: 1. In its `toString`, it needs to report the number of partitions it reads. This needs to query the hive metastore. 2. In its `outputOrdering`, it needs to get all the files. This needs to query the hive metastore. This PR fixes by: 1. `toString` do not need to report the number of partitions it reads. We should report it via SQL metrics. 2. The `outputOrdering` is not very useful. We can only apply it if a) all the bucket columns are read. b) there is only one file in each bucket. This condition is really hard to meet, and even if we meet, sorting an already sorted file is pretty fast and avoiding the sort is not that useful. I think it's worth to give up this optimization so that explain don't need to get stuck. ## How was this patch tested? existing tests
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
