HeartSaVioR edited a comment on issue #24128: [SPARK-27188][SS] FileStreamSink: provide a new option to disable metadata log
URL: https://github.com/apache/spark/pull/24128#issuecomment-474115781

> I'm not comfortable adding an option to just turn it off; there are all sorts of ways that could cause more subtle issues than at-least-once semantics.

I totally understand the discomfort with disabling the metadata log, but as I described in the JIRA issue and the PR description, there's no workaround other than leaving end users to clean things up by hand.

I could give it another try and have FileStreamSink check in the background for output files that have been deleted (for example, by end users applying a retention policy) and exclude them when compacting the metadata. I guess that is the ideal approach, but it definitely brings overhead and maybe some new configuration options as well.

Regarding subtle issues, it would be better for us to share concrete possible issues (instead of "something might happen") if we can think of any; that would help steer us in the right direction.
