[GitHub] [spark] cxzl25 commented on pull request #32583: [SPARK-35437][SQL] Use expressions to filter Hive partitions at client side

GitBox Sun, 08 Aug 2021 04:31:51 -0700


cxzl25 commented on pull request #32583:
URL: https://github.com/apache/spark/pull/32583#issuecomment-894783546



   > > Hive: Time taken: 2.816 seconds, Fetched: 10 row(s)
   > > Spark: Time taken: 248 seconds, Fetched: 10 row(s)
   > > Patch and verify:
   > > Time taken: 19.241 seconds, Fetched 10 row(s)
   > 
   > @cxzl25 I'm a little surprised that Hive is still so much faster than 
Spark even with the patch. Curious if you have any insight on this.
   
   With this patch, Spark's `listPartitionsByFilter` and Hive's 
`doEvalClientSide` take about the same time per call. However, Spark calls 
`listPartitionsByFilter` from several places, so the total time is higher:
   `PruneHiveTablePartitions`
   `OptimizeMetadataOnlyQuery`
   `HiveTableScanExec`
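   
   To illustrate why repeated calls add up, here is a toy sketch, not Spark code. `FakeMetastoreClient`, `list_partitions_by_filter`, and `run_query` are hypothetical names made up for this example, and it assumes all three call sites fire for one query, which may not hold for every plan or configuration.

```python
from collections import Counter


class FakeMetastoreClient:
    """Hypothetical stand-in for a metastore client; not Spark's or Hive's API."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.calls = Counter()  # number of listing calls per caller

    def list_partitions_by_filter(self, predicate, caller):
        # Every call re-evaluates the predicate over all partition specs,
        # so the cost is paid again each time a caller asks.
        self.calls[caller] += 1
        return [p for p in self.partitions if predicate(p)]


def run_query(client, predicate):
    # Call sites named in the comment above. Assumption: all three run
    # for this query; in practice that depends on the plan.
    callers = (
        "PruneHiveTablePartitions",
        "OptimizeMetadataOnlyQuery",
        "HiveTableScanExec",
    )
    result = []
    for caller in callers:
        result = client.list_partitions_by_filter(predicate, caller)
    return result


# 30 made-up daily partitions; the filter keeps the last 10.
partitions = [{"dt": f"2021-08-{day:02d}"} for day in range(1, 31)]
client = FakeMetastoreClient(partitions)
matched = run_query(client, lambda p: p["dt"] >= "2021-08-21")
total_calls = sum(client.calls.values())
print(len(matched), total_calls)
```

   In this model each listing costs about the same, but the query pays for it once per call site, so even with a per-call cost close to Hive's, the total can be a multiple of it.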
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


