HeartSaVioR commented on issue #26639: [SPARK-29999][SS] Handle FileStreamSink metadata correctly for empty partition
URL: https://github.com/apache/spark/pull/26639#issuecomment-558395855

@gengliangwang Thanks for the pointer to the root issue; I missed it. Given the root issue, this problem is not limited to DSv2. The fix takes the easiest and least intrusive approach, since the problem only occurs for the streaming sink.

There's another possible fix if we are not happy with that: let `commitTask` (optionally) receive the actual list of files written instead of relying on its own tracking list. That would change the interface of FileCommitProtocol. (`FileFormatDataWriter.commit()` can pass the list, since it tracks the files via `statsTrackers`, but SparkHadoopWriter doesn't, so the list would need to be optional, and we would still need to check file existence when no hint is given.) Would that make sense? I feel the change is bigger than a follow-up issue, since it would change the interface.

@gatorsmile This is not only a DSv2 issue but also a DSv1 issue; I think we'd be better off moving the new UT to FileStreamSinkSuite so that it covers both. WDYT? If that works for you, I'll craft a follow-up PR. Thanks!
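
To make the alternative fix (an optional file list passed to `commitTask`) a bit more concrete, here is a rough, language-neutral sketch in Python rather than Scala. Everything in it is hypothetical: `SketchCommitProtocol`, `new_task_temp_file`, `commit_task` and the `written_files` parameter are illustrative names, not the actual FileCommitProtocol signatures. It only shows the intended behaviour: trust the caller's list when one is supplied, and fall back to an existence check when it is not.

```python
import os
import tempfile
from typing import List, Optional


class SketchCommitProtocol:
    """Hypothetical, simplified stand-in for a commit protocol; not Spark's actual API."""

    def __init__(self) -> None:
        # Paths handed out to writers, whether or not anything was written to them.
        self._tracked: List[str] = []

    def new_task_temp_file(self, path: str) -> str:
        # The path is recorded at hand-out time, before any data is written.
        self._tracked.append(path)
        return path

    def commit_task(self, written_files: Optional[List[str]] = None) -> List[str]:
        if written_files is not None:
            # Hint supplied: use the caller's list of files actually written.
            return list(written_files)
        # No hint: keep only the tracked paths that exist on disk.
        return [p for p in self._tracked if os.path.exists(p)]


def demo() -> None:
    with tempfile.TemporaryDirectory() as d:
        proto = SketchCommitProtocol()
        written = proto.new_task_temp_file(os.path.join(d, "part-00000"))
        # Handed out for an empty partition, so the file is never created.
        proto.new_task_temp_file(os.path.join(d, "part-00001"))
        with open(written, "w") as f:
            f.write("row\n")

        # Caller without a hint (the SparkHadoopWriter-like case).
        print(proto.commit_task())
        # Caller with a hint (the FileFormatDataWriter-like case).
        print(proto.commit_task([written]))


if __name__ == "__main__":
    demo()
```

In both calls only the file that was really written should end up in the committed list; the difference is whether the protocol has to touch the file system to find that out.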
