HeartSaVioR commented on issue #24128: [SPARK-27188][SS] FileStreamSink: 
provide a new option to disable metadata log
URL: https://github.com/apache/spark/pull/24128#issuecomment-474124945
 
 
   There's also another possible workaround: if we're really OK with excluding 
old output files from the metadata even though they're not actually deleted from 
the output directory, we can expose an option to set a retention policy (most 
likely a time to live) and have FileStreamSink filter out entries that have 
become older than the TTL. Readers would then be unable to read some output 
files even though those files still exist, but that is a policy set by end 
users, so it might be OK. We can still expose an option for File(Stream)Sources 
to ignore the metadata.
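
   To make the TTL idea concrete, here is a minimal sketch of the filtering step only. It is not Spark code: `SinkEntry`, `retain_fresh_entries`, and the `ttl_ms` parameter are hypothetical names for illustration, and the sketch assumes each metadata entry records the modification time of its output file.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SinkEntry:
    """Hypothetical stand-in for one entry in the sink's metadata log."""
    path: str
    modification_time_ms: int


def retain_fresh_entries(
    entries: Iterable[SinkEntry], now_ms: int, ttl_ms: int
) -> List[SinkEntry]:
    """Keep only entries whose file was modified within the last ttl_ms.

    Entries older than the cutoff are dropped from the metadata only;
    the files themselves are left untouched in the output directory.
    """
    if ttl_ms < 0:
        raise ValueError("ttl_ms must be non-negative")
    cutoff_ms = now_ms - ttl_ms
    return [e for e in entries if e.modification_time_ms >= cutoff_ms]
```

   Under this assumption the filter would run whenever the metadata log is compacted, so the log stops growing without bound while the retention period stays under the end user's control.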
   
   So we have some alternatives to this patch, and each of them has its 
trade-offs. What's our preference? I believe this is an ongoing issue that end 
users are already struggling with, so we have to take one of these approaches 
even if it's not ideal.

