pan3793 commented on PR #49089:
URL: https://github.com/apache/spark/pull/49089#issuecomment-3688567464

   @wangyum @cloud-fan @dongjoon-hyun, I think we should have another flag to
control the behavior of this change, because `BlockMissingException` can be
caused by either:
   
   - a transient network issue, e.g., a transient DNS failure that prevents the
client from connecting to any of the HDFS DataNodes (DNs) that store a replica
of the block
   - or permanent storage corruption, e.g., large-scale disk damage that
permanently corrupts all replicas
   
   @wangyum, from your description, I think you hit the first case, right? In
that case, it does make sense not to ignore the `BlockMissingException`.
   
   But the intention of the `ignoreCorruptFiles` feature is to let users skip
permanently corrupted files; that is, we still want to ignore the
`BlockMissingException` in the permanent-storage-corruption case when
`ignoreCorruptFiles` is enabled.
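To make the proposal concrete, here is a minimal sketch of how a second flag could separate the two cases. It is written in Python only for brevity (Spark itself is Scala). `BlockMissingError` is a stand-in for Hadoop's `BlockMissingException`, which extends `IOException`; `ReadOptions`, `treat_block_missing_as_corrupt`, `should_skip`, `read_files`, and `demo_reader` are hypothetical names for illustration, not Spark APIs or configs. It also assumes, as a simplification, that any other I/O or runtime error is treated as corruption when `ignoreCorruptFiles` is on.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


class BlockMissingError(IOError):
    """Stand-in for Hadoop's BlockMissingException (an IOException subclass)."""


@dataclass(frozen=True)
class ReadOptions:
    # Plays the role of Spark's ignoreCorruptFiles setting.
    ignore_corrupt_files: bool
    # HYPOTHETICAL extra flag: True treats a missing block as permanent
    # corruption (skip the file); False treats it as possibly transient (rethrow).
    treat_block_missing_as_corrupt: bool


def should_skip(error: Exception, opts: ReadOptions) -> bool:
    """Decide whether a failed file read is skipped (True) or rethrown (False)."""
    if not opts.ignore_corrupt_files:
        return False
    if isinstance(error, BlockMissingError):
        # The new flag only changes how missing blocks are classified.
        return opts.treat_block_missing_as_corrupt
    # Simplifying assumption: other I/O and runtime errors count as corruption.
    return isinstance(error, (IOError, RuntimeError))


def read_files(paths: Iterable[str], reader: Callable[[str], list[str]],
               opts: ReadOptions) -> list[str]:
    """Read every path, skipping files whose errors should_skip() accepts."""
    rows: list[str] = []
    for path in paths:
        try:
            rows.extend(reader(path))
        except Exception as error:
            if should_skip(error, opts):
                continue  # skip this file and keep reading the rest
            raise  # surface the failure, e.g. so the task can be retried


    return rows


def demo_reader(path: str) -> list[str]:
    """Toy reader: the path "bad" simulates a block with no reachable replica."""
    if path == "bad":
        raise BlockMissingError(f"could not obtain a block of {path}")
    return [path]
```

With `ReadOptions(True, True)`, `read_files(["a", "bad", "b"], demo_reader, ...)` returns `["a", "b"]`; with `ReadOptions(True, False)`, the `BlockMissingError` propagates instead, which is the behavior this PR wants for transient failures.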

