viirya commented on pull request #32702: URL: https://github.com/apache/spark/pull/32702#issuecomment-851107724
> Which steps end users require to do to resolve such case with your PR? Deleting metadata directory and letting read path to ignore the metadata?

Currently, in this use case, when users change the query and the checkpoint no longer works, they clean up the metadata directory and run the changed query with a new checkpoint. They have another Spark app reading from the streaming query output. But because Spark respects the metadata, the other Spark app can only read the files written by the changed streaming query (i.e. the files recorded in the metadata). The files written before the streaming query was changed are now ignored by Spark.

> I know this is a valid workaround to unblock such case end users would be stuck on reusing directory, but they should be quite cautious as they must remember the state of directory; the metadata won't have some parts of output, which is easy to forget. Once they forget the fact and also forget setting the flag on read query, only the parts of output will be read and they will complain about the result of read query without indicating what they did.
>
> So just allowing end users to ignore metadata is simple, but the risks on turning on the flag are not that simple. Let's take our responsibility to guide the meaning of ignoring metadata and try to provide the possible risks.

I agree. That is why this config is internal-only so far. I should also add more cautious wording to the config doc. I have discussed this with the users; it seems to me they should know what they are asking for and be cautious about the effect of this config.
