mikedias commented on issue #22952: [SPARK-20568][SS] Provide option to clean 
up completed files in streaming query
URL: https://github.com/apache/spark/pull/22952#issuecomment-466247024
 
 
   I think what will happen is that the new file will never get processed 
until the stream restarts, because the obsolete files are not removed from the 
`seenFiles` map. Only when the stream restarts will `seenFiles` be rebuilt 
from the `metadataLog` information, and then it won't contain the obsolete files. 
   
   And the timestamp does not play a role here. The current code only checks 
the filename to decide whether a file is new (#23782 proposes an option 
to also consider the timestamp). 
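
   To make the scenario concrete, here is a minimal toy model of the behaviour 
described above. It is only a sketch, not Spark's actual code: the class 
`SeenFilesModel`, its methods, and the example path are hypothetical, and the 
restart is modelled by assuming the rebuilt map no longer lists the obsolete file.

```python
class SeenFilesModel:
    """Toy stand-in for the `seenFiles` map (hypothetical, not Spark's class)."""

    def __init__(self, entries=()):
        # Maps path -> timestamp recorded when the file was first seen.
        self._seen = dict(entries)

    def is_new(self, path, timestamp):
        # Only the path is checked; the timestamp is deliberately ignored,
        # mirroring the filename-only check described in the comment.
        return path not in self._seen

    def add(self, path, timestamp):
        self._seen[path] = timestamp


# The stream processes a file and records it.
seen = SeenFilesModel()
seen.add("/data/in/a.csv", 100)

# The file is cleaned up, then a new file with the same name and a later
# timestamp arrives. The stale entry is still in the map, so it is skipped.
same_name_later = seen.is_new("/data/in/a.csv", 200)

# Assumption: after a restart the map is rebuilt without the obsolete entry,
# so the same file is now treated as new.
restarted = SeenFilesModel(entries=[])
after_restart = restarted.is_new("/data/in/a.csv", 200)

print(same_name_later, after_restart)
```

   In this model the later timestamp makes no difference to `is_new`, which is 
the point of the second paragraph above.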
