LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902439693
> I understand you want to avoid the duplicate footer lookup. In Parquet at least we can just pass the footer from either ParquetFileFormat or ParquetPartitionReaderFactory to SpecificParquetRecordReaderBase for reuse, which I think is much simpler than using a cache.

If we can add scheduling strategies to Spark in the future to ensure that tasks reading the same ORC file are scheduled on the same executor, the benefits of `fileMetaCache` will be more obvious. In fact, in our production environment, about 100,000 SQL queries use this feature every day.
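To illustrate why executor placement matters for a per-executor footer cache, here is a rough sketch. It is only a toy model, not Spark code: `count_footer_reads`, the executor ids, and the file name are all hypothetical, and it assumes each executor keeps its own cache keyed by file path with no eviction.

```python
from collections import defaultdict


def count_footer_reads(assignments):
    """Count footer reads for a list of (executor_id, file_path) task assignments.

    Toy model: every executor has its own cache keyed by file path,
    and a footer is read from storage only on a cache miss.
    """
    cached = defaultdict(set)  # executor_id -> file paths whose footer is cached
    reads = 0
    for executor, path in assignments:
        if path not in cached[executor]:
            reads += 1  # cache miss: the footer must be read and parsed
            cached[executor].add(path)
    return reads


# Four tasks, each reading a different split of the same (hypothetical) file.
scattered = [(f"exec-{i}", "part-0000.orc") for i in range(4)]
colocated = [("exec-0", "part-0000.orc")] * 4

print(count_footer_reads(scattered))  # one footer read per executor
print(count_footer_reads(colocated))  # a single footer read, then cache hits
```

In this model the cache only saves work when tasks for the same file land on the same executor, which is the point about scheduling above.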
