mikedias commented on issue #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query
URL: https://github.com/apache/spark/pull/22952#issuecomment-466247024

I think the new file will not be processed until the stream restarts, because the obsolete files are not removed from the `seenFiles` map. Only when the stream restarts is `seenFiles` rebuilt from the `metadataLog` information, and at that point it won't contain the obsolete files. The timestamp does not play a role here: the current code checks only the filename to decide whether a file is new (#23782 proposes an option to also consider the timestamp).
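
To make the scenario concrete, here is a toy Python model of the behavior described above. It is not Spark's actual code: `ToyFileSource`, its methods, and the file name `data.csv` are hypothetical, and it assumes that cleanup drops the entry from the metadata log while leaving the in-memory seen set untouched.

```python
class ToyFileSource:
    """Toy model only; names and structure are illustrative, not Spark's."""

    def __init__(self, metadata_log=None):
        # Entries that survive in the (hypothetical) metadata log.
        self.metadata_log = list(metadata_log or [])
        # On start or restart, the seen set is rebuilt from the metadata log.
        self.seen_files = set(self.metadata_log)

    def is_new(self, name, timestamp):
        # The timestamp is deliberately ignored: only the file name is checked.
        return name not in self.seen_files

    def process(self, name, timestamp):
        if not self.is_new(name, timestamp):
            return False  # skipped as already seen
        self.seen_files.add(name)
        self.metadata_log.append(name)
        return True

    def clean_up(self, name):
        # Assumption: cleanup removes the log entry but not the seen_files entry.
        self.metadata_log.remove(name)

    def restart(self):
        # A restart builds a fresh source from whatever the log still contains.
        return ToyFileSource(self.metadata_log)


source = ToyFileSource()
first = source.process("data.csv", timestamp=1)     # processed
source.clean_up("data.csv")                         # file becomes obsolete
second = source.process("data.csv", timestamp=2)    # same name, newer file
restarted = source.restart()
third = restarted.process("data.csv", timestamp=2)  # picked up after restart
print(first, second, third)
```

Under these assumptions the second call is skipped even though the file is newer, and the same file is only picked up once the source has been restarted.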
