HeartSaVioR commented on issue #26639: [SPARK-29999][SS] Handle FileStreamSink 
metadata correctly for empty partition
URL: https://github.com/apache/spark/pull/26639#issuecomment-558395855
 
 
   @gengliangwang Thanks for the pointer to the root issue; I missed it. Given 
the root issue, this problem isn't limited to DSv2.
   
   In fact, the fix takes the easiest and least intrusive approach, since the 
problem occurs only for the streaming sink. There's another possible fix if we 
are not happy with that: let `commitTask` (optionally) receive the actual list 
of files written instead of relying on its own tracking list. That would change 
the interface of FileCommitProtocol. (`FileFormatDataWriter.commit()` can pass 
the list, as it tracks the files with `statsTrackers`, but SparkHadoopWriter 
doesn't, so the list would need to be optional, and we would need to check file 
existence when no list is given.)
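   
   To make the alternative concrete, here is a rough sketch of the idea only. 
Every name in it (`SketchCommitProtocol`, `writtenFiles`, `fileExists`, the 
simplified `TaskCommitMessage`) is a hypothetical stand-in and not the actual 
FileCommitProtocol signature; it assumes the tracking list may contain paths 
that were requested but never written.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes or signatures.
final case class TaskCommitMessage(files: Seq[String])

// fileExists is an assumed hook standing in for a file system existence check.
class SketchCommitProtocol(fileExists: String => Boolean) {
  // Paths the protocol tracked itself when task files were requested.
  private val trackedFiles = scala.collection.mutable.ArrayBuffer.empty[String]

  def newTaskTempFile(path: String): String = {
    trackedFiles += path
    path
  }

  // writtenFiles = Some(list): the caller knows what it actually wrote, so use that list.
  // writtenFiles = None: no hint from the caller, so keep only tracked paths that exist.
  def commitTask(writtenFiles: Option[Seq[String]] = None): TaskCommitMessage = {
    val files: Seq[String] =
      writtenFiles.getOrElse(trackedFiles.filter(fileExists).toList)
    TaskCommitMessage(files)
  }
}
```

   A caller that tracks its own output would pass `Some(files)`, while a caller 
without that information would call `commitTask()` and fall back to the 
existence check.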
   
   Would that make sense? I feel the change is bigger than a follow-up issue, 
as it would change the interface.
   
   @gatorsmile This is not only a DSv2 issue but also a DSv1 issue; I think 
we'd better move the new UT to FileStreamSinkSuite so that it covers both. 
WDYT? If that works for you, I'll craft a follow-up PR. Thanks!
