[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

GitBox Thu, 19 Aug 2021 22:23:08 -0700


LuciferYang commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-902439693



   > I understand you want to avoid the duplicate footer lookup. In Parquet at 
least we can just pass the footer from either ParquetFileFormat or 
ParquetPartitionReaderFactory to SpecificParquetRecordReaderBase for reuse, 
which I think is much simpler than using a cache.
   
   If Spark later adds a scheduling strategy that ensures tasks reading the 
same ORC file are scheduled on the same executor, the benefits of 
`fileMetaCache` will be more obvious. In fact, in our production environment, 
about 100,000 SQL queries use this feature every day.
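
To illustrate the locality argument, here is a toy simulation rather than code from this PR. It assumes each executor keeps its own footer cache keyed by file path, and it counts how many footer reads (cache misses) occur under two hypothetical task placements. The function `count_footer_reads`, the file names, and the executor count are made up for the example and are not Spark APIs.

```python
from collections import defaultdict

def count_footer_reads(tasks, assign):
    """Count footer reads (cache misses) over all executors.

    tasks  -- list of file paths, one entry per task (split)
    assign -- function (task_index, path) -> executor id
    """
    caches = defaultdict(set)  # executor id -> file paths whose footer is cached
    reads = 0
    for i, path in enumerate(tasks):
        executor = assign(i, path)
        if path not in caches[executor]:
            reads += 1  # miss: this executor has to read the footer itself
            caches[executor].add(path)
    return reads

NUM_EXECUTORS = 4
FILES = ["a.orc", "b.orc", "c.orc"]  # hypothetical files
tasks = [path for path in FILES for _ in range(4)]  # 4 splits per file

# Placement that ignores the file: splits of one file land on different executors.
round_robin = count_footer_reads(tasks, lambda i, path: i % NUM_EXECUTORS)

# Placement with file affinity: all splits of a file go to the same executor.
affinity = count_footer_reads(
    tasks, lambda i, path: FILES.index(path) % NUM_EXECUTORS
)

print("footer reads, round-robin:", round_robin)
print("footer reads, file affinity:", affinity)
```

In this toy setup every split misses the cache under round-robin placement, while file affinity needs only one footer read per file. Real scheduling, cache eviction, and split sizes would change the numbers.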
   
   
   
   


