viirya commented on pull request #35994:
URL: https://github.com/apache/spark/pull/35994#issuecomment-1082465493


   > Is it a unique issue of HDFS? If so, I’m surprised that HDFS client cannot 
survive from transient errors. Is Spark the right layer to fix this issue?
   
   The error happened when opening files on S3. I'm not deeply familiar with 
the HDFS or S3 client internals, but I don't see any retrying happening there. 
Spark already has similar retry mechanisms in several places, so while I'm not 
sure this is exactly the right layer to fix the issue, it seems like a 
consistent approach?
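   A minimal sketch of the kind of retry loop being discussed, assuming a 
generic callable action in place of the actual `FileSystem.open` call; the 
attempt count, backoff, and helper names here are illustrative assumptions, 
not what the PR implements:

   ```java
   import java.util.concurrent.Callable;

   public class RetryOpen {
       // Hypothetical retry helper: re-runs `action` up to `maxAttempts`
       // times, sleeping with a simple linear backoff between attempts,
       // and rethrows the last failure if every attempt fails.
       public static <T> T withRetries(Callable<T> action, int maxAttempts,
                                       long backoffMs) throws Exception {
           Exception last = null;
           for (int attempt = 1; attempt <= maxAttempts; attempt++) {
               try {
                   return action.call();
               } catch (Exception e) {
                   last = e;  // remember the (possibly transient) failure
                   if (attempt < maxAttempts) {
                       Thread.sleep(backoffMs * attempt);
                   }
               }
           }
           throw last;  // all attempts exhausted
       }
   }
   ```

   A real version would likely restrict retries to specific transient 
exception types rather than catching everything.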
   
   > In addition, will the same issue happen after `open`? For example, when 
reading the file content? Do we need to worry about other places as well? This 
is not the only place that touches FileSystem in the driver.
   
   I wouldn't exclude that possibility. For now the retry is constrained to 
`open` only, since that is where the issue occurs. It also seems the safest 
place to retry, as I don't want to change behavior unexpectedly elsewhere.
   
   > Could you also add a unit test to verify the retry code? For example, you 
can use a fake file system to simulate the errors from `open`.
   
   Okay. I'll try to add one.
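   A hedged sketch of the suggested fake-file-system approach: a stand-in 
whose `open` throws for the first N calls and then succeeds, so a retry 
wrapper can be exercised deterministically. The class and method names are 
invented for illustration; a real test would fake Hadoop's `FileSystem` 
instead.

   ```java
   import java.io.IOException;
   import java.util.concurrent.atomic.AtomicInteger;

   // Simulates transient open() failures: the first `failures` calls throw,
   // subsequent calls return a dummy "stream" token.
   public class FlakyOpenSource {
       private final AtomicInteger calls = new AtomicInteger();
       private final int failures;

       public FlakyOpenSource(int failures) {
           this.failures = failures;
       }

       public String open(String path) throws IOException {
           if (calls.incrementAndGet() <= failures) {
               throw new IOException("simulated transient error opening " + path);
           }
           return "stream:" + path;
       }

       public int openCalls() {
           return calls.get();
       }
   }
   ```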
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


