LuciferYang edited a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902383334
> In Hive it's common that the same file name (e.g., 000000_0) gets used when doing insert overwrite. Even if we check file size and other stuff, it can't completely prevent us from hitting a stale cache. Can we add `ctime` or `mtime` of the file to the `PartitionedFile` and use this information for check?

Similarly, how do we ensure that the `FileStatus` cache (`SharedInMemoryCache`) stays correct when a user overwrites a file but does not send a `refreshTable` command to the current `SparkApp`? That cache has the same problem with reused file names. I feel that solving this consistency problem may need to become an independent topic :)
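
To make the `mtime` suggestion concrete, here is a minimal sketch in Python rather than Spark code. `FileKey`, `MetaCache`, and `demo` are hypothetical names that are not part of Spark or of this PR. The sketch assumes the cache key is the tuple of path, length, and modification time, so a file rewritten under the same name and with the same size still misses the cache.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKey:
    """Hypothetical cache key: path, size, and modification time."""
    path: str
    length: int
    mtime_ns: int


def file_key(path):
    st = os.stat(path)
    return FileKey(path, st.st_size, st.st_mtime_ns)


class MetaCache:
    """Toy in-memory cache standing in for any per-file metadata cache."""

    def __init__(self):
        self._entries = {}

    def get(self, path, loader):
        key = file_key(path)  # re-stat on every lookup
        if key not in self._entries:
            self._entries[key] = loader(path)
        return self._entries[key]


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    cache = MetaCache()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "000000_0")
        with open(path, "wb") as f:
            f.write(b"old!")
        # Pin mtime explicitly so the example does not depend on clock resolution.
        os.utime(path, ns=(1_000_000_000, 1_000_000_000))
        first = cache.get(path, read_bytes)

        # "Insert overwrite": same name, same size, different content.
        with open(path, "wb") as f:
            f.write(b"new!")
        os.utime(path, ns=(2_000_000_000, 2_000_000_000))
        second = cache.get(path, read_bytes)
    return first, second
```

In this sketch the second lookup reloads the file because only `mtime_ns` differs between the two keys. It does not address the `FileStatus` cache question: the key is only as fresh as the `os.stat` call behind it, so a cached listing that is never refreshed would still hand out the old modification time. The toy cache also never evicts the stale entry.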
