pan3793 commented on PR #49089: URL: https://github.com/apache/spark/pull/49089#issuecomment-3688567464
@wangyum @cloud-fan @dongjoon-hyun, I think we should have another flag to control the behavior of this change, because `BlockMissingException` can be caused by either:

- a transient network issue, e.g., a transient DNS problem prevents the client from connecting to any of the HDFS DataNodes that store the block's replicas, or
- permanent storage corruption, e.g., large-scale disk damage leaves all replicas of the block permanently corrupt.

@wangyum, from your description I think you hit the first case, right? In that case, this change does make sense. But the intention of the `ignoreCorruptFiles` feature is to let users skip reading permanently corrupted files; that is to say, when `ignoreCorruptFiles` is enabled, we still want to ignore `BlockMissingException` in the permanent storage corruption case.
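To make the proposal concrete, here is a minimal Python sketch of the decision such an extra flag would control. It is a model, not Spark code. Several names in it are assumptions. `BlockMissingException` is a stand-in class; it subclasses `OSError` to mirror the fact that Hadoop's class extends `IOException`. `ignore_block_missing` is a hypothetical name for the proposed flag, not an existing Spark config. Other errors are treated roughly the way `ignoreCorruptFiles` handles I/O and runtime failures.

```python
class BlockMissingException(OSError):
    """Stand-in for org.apache.hadoop.hdfs.BlockMissingException (an IOException subclass)."""


def should_skip_file(exc: BaseException,
                     ignore_corrupt_files: bool,
                     ignore_block_missing: bool) -> bool:
    """Return True to skip the file that raised `exc`, False to rethrow and fail the task.

    `ignore_block_missing` is a hypothetical flag, not a real Spark config.
    """
    if not ignore_corrupt_files:
        # Without ignoreCorruptFiles, every read failure surfaces to the user.
        return False
    # Check the subclass first, since BlockMissingException is also an OSError.
    if isinstance(exc, BlockMissingException):
        # Ambiguous cause: transient (e.g. DNS) or permanent (all replicas corrupt).
        # The hypothetical flag lets the user say which case to assume.
        return ignore_block_missing
    # Other I/O and runtime errors keep the existing "treat as corrupt" behaviour.
    return isinstance(exc, (OSError, RuntimeError))


err = BlockMissingException("Could not obtain block")
print(should_skip_file(err, True, False))  # False: rethrow, so a transient failure is not silently dropped
print(should_skip_file(err, True, True))   # True: assume permanent corruption and skip the file
```

With the flag off, a transient `BlockMissingException` fails loudly instead of silently losing data. With it on, users who know their storage is permanently damaged keep today's skip-and-continue behavior.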
