HeartSaVioR edited a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-625605044
> If we want to do full TTL then a separate GC would be good to delete files matching the 2nd and 3rd bullet points (of course only after they are removed from the metadata).

Yeah, I didn't deal with this because there may be reader queries which still read from an old version of the metadata, which may still contain excluded files. (A batch query would read all available files, so there's still a chance of a race condition.)

> What I see as a potential problem is that FS timestamp may be different from local time (not yet checked how Hadoop handles time).

While I'm not sure it's a real problem (as we rely on the last modified time while reading files), I eliminated the case by adding a "commit time" to each entry and applying retention based on that commit time. So I guess this concern is no longer valid.
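
For illustration, here is a minimal sketch of the commit-time-based retention idea described above. The `FileEntry` case class, its field names, and the `retentionMs` parameter are hypothetical stand-ins, not the actual classes or config touched by this PR; the real sink log entries in Spark carry more fields.

```scala
// Hypothetical sketch of retention based on an explicit commit timestamp
// stored on each metadata entry, instead of the file system's modified time.
case class FileEntry(path: String, size: Long, commitTimeMs: Long)

object RetentionFilter {
  /** Keep only entries whose commit time falls within the retention window. */
  def applyRetention(
      entries: Seq[FileEntry],
      retentionMs: Long,
      nowMs: Long = System.currentTimeMillis()): Seq[FileEntry] = {
    entries.filter(e => nowMs - e.commitTimeMs <= retentionMs)
  }
}
```

Keying retention off a commit time written into the entry avoids depending on the file system clock being in sync with the driver's local time, which is the concern raised in the quoted comment.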
