HeartSaVioR edited a comment on issue #24128: [SPARK-27188][SS] FileStreamSink: provide a new option to disable metadata log
URL: https://github.com/apache/spark/pull/24128#issuecomment-474124945

There is also another possible workaround: if we are really OK with excluding old output files from the metadata even though they are not actually deleted from the directory, we can expose an option to set a retention policy (mostly a time to live) and force FileStreamSink to filter out entries that have become old based on the TTL. Readers cannot read some of the output files even though these are not deleted, but that is a policy set by end users, so it might be OK.

We may still want to expose an option for File(Stream)Sources to ignore metadata.

So we have some alternatives to this patch, and each of them has its trade-offs. What's our preference? I believe this is an ongoing issue that end users are already struggling with, so we have to take one of the approaches even if it is not ideal.
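
To make the TTL idea concrete, here is a minimal sketch of the filtering step, written in Python for brevity rather than in Spark's own Scala. The names `SinkFileEntry`, `modification_time_ms`, and `filter_by_ttl` are hypothetical and are not part of FileStreamSink or any Spark API; the sketch only assumes that each metadata entry carries a path and a modification timestamp.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SinkFileEntry:
    # Hypothetical stand-in for one entry in the sink's metadata log.
    path: str
    modification_time_ms: int


def filter_by_ttl(entries: List[SinkFileEntry], now_ms: int, ttl_ms: int) -> List[SinkFileEntry]:
    """Keep only entries whose age (now - modification time) is within the TTL.

    The files themselves are not touched; expired entries are only dropped
    from the list that would be written back to the metadata log.
    """
    return [e for e in entries if now_ms - e.modification_time_ms <= ttl_ms]


if __name__ == "__main__":
    now_ms = 10_000_000
    one_hour_ms = 3_600_000
    entries = [
        SinkFileEntry("part-0000", 0),          # age 10,000,000 ms: older than TTL
        SinkFileEntry("part-0001", 9_000_000),  # age 1,000,000 ms: within TTL
    ]
    kept = filter_by_ttl(entries, now_ms, one_hour_ms)
    print([e.path for e in kept])
```

Under this sketch, a reader that relies on the metadata would no longer see `part-0000`, even though the file still exists in the output directory, which is the trade-off described above.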
