HeartSaVioR edited a comment on issue #26639: [SPARK-29999][SS] Handle 
FileStreamSink metadata correctly for empty partition
URL: https://github.com/apache/spark/pull/26639#issuecomment-558395855
 
 
   @gengliangwang Thanks for pointing out the root issue; I missed it. Given 
the root cause, the problem is not limited to DSv2.
   
   The fix takes the simplest, least intrusive approach, since the issue has 
only been observed in the streaming sink. If we are not happy with that, there 
is another possible fix: let `commitTask` (optionally) receive the actual list 
of files written instead of relying on its own tracking list. That would 
change the interface of FileCommitProtocol. (`FileFormatDataWriter.commit()` 
can pass the list, as it tracks the files via `statsTrackers`, but 
SparkHadoopWriter does not, so the list would need to be optional, and we 
would still need to check file existence when no list is provided.)
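
A rough sketch of what the alternative could look like, to make the trade-off concrete. Everything here is hypothetical and simplified: `SketchCommitProtocol`, `TrackingCommitProtocol`, and this `TaskCommitMessage` are illustrative stand-ins, not Spark's actual `FileCommitProtocol` API, and local `java.nio` paths are used in place of Hadoop's `FileSystem`.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.mutable.ArrayBuffer

object CommitSketch {
  // Hypothetical stand-in for the message a task sends back to the driver.
  final case class TaskCommitMessage(files: Seq[String])

  // Hypothetical, simplified protocol; NOT Spark's real FileCommitProtocol.
  trait SketchCommitProtocol {
    // Hands out a file path; the caller may or may not actually create the file.
    def newTaskTempFile(dir: Path, name: String): Path

    // `writtenFiles` is the optional hint: callers that track their output
    // pass Some(list); callers that do not simply omit the argument.
    def commitTask(writtenFiles: Option[Seq[String]] = None): TaskCommitMessage
  }

  class TrackingCommitProtocol extends SketchCommitProtocol {
    // Every path handed out, including ones that were never written
    // (for example, when the partition turned out to be empty).
    private val tracked = ArrayBuffer.empty[String]

    override def newTaskTempFile(dir: Path, name: String): Path = {
      val path = dir.resolve(name)
      tracked += path.toString
      path
    }

    override def commitTask(writtenFiles: Option[Seq[String]]): TaskCommitMessage = {
      val files = writtenFiles match {
        // Hint provided: trust the caller's list of files actually written.
        case Some(fs) => fs
        // No hint: fall back to the tracking list, keeping only existing files.
        case None => tracked.filter(f => Files.exists(Paths.get(f))).toList
      }
      TaskCommitMessage(files)
    }
  }
}
```

In this sketch a caller that knows its output would call `commitTask(Some(files))`, while a caller without that knowledge would call `commitTask()` and pay for the existence check. The default argument keeps existing call sites source-compatible, but it is still an interface change, which is the concern raised above.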
   
   Would that make sense? I feel the change goes beyond a simple follow-up, 
as it would change the interface.
   
   @gatorsmile This affects DSv1 as well as DSv2; I think we should move the 
new UT to FileStreamSinkSuite so that it covers both. WDYT? If that works for 
you, I'll craft a follow-up PR. Thanks!

