Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/18979
  
    > To mimic S3-like behavior, you can overwrite the file system
`spark.hadoop.fs.$scheme.impl`
    
    @gatorsmile: you will be able to do something better soon, as S3A is adding
an inconsistent AWS client to the `hadoop-aws` JAR, which you can then enable
to guarantee consistency delays and to inject intermittent faults (throttling,
transient network events) into the system. All it takes is a config option to
switch to this client, plus the chaos-monkey-esque probabilities and delays.
This is what I'm already using, and you will be able to as well. That is, no
need to switch clients: just set
`spark.hadoop.fs.s3a.s3.client.factory.impl=org.apache.hadoop.fs.s3a.InconsistentS3ClientFactory`
 and wait for the stack traces.
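    
    For reference, here's a sketch of what that could look like in
`spark-defaults.conf`. The `failinject.*` property names and values below are
assumptions based on the S3A testing docs for this work and may differ between
Hadoop versions:
    
    ```properties
    # Sketch only: switch S3A to the fault-injecting client (assumed available
    # in the hadoop-aws JAR shipping with this work).
    spark.hadoop.fs.s3a.s3.client.factory.impl  org.apache.hadoop.fs.s3a.InconsistentS3ClientFactory
    
    # Chaos-monkey-esque knobs (names are assumptions): how long newly created
    # objects stay invisible to listings, and the probability of injecting a
    # delay or a throttle response on a given operation.
    spark.hadoop.fs.s3a.failinject.inconsistency.msec         5000
    spark.hadoop.fs.s3a.failinject.inconsistency.probability  0.5
    spark.hadoop.fs.s3a.failinject.throttle.probability       0.1
    ```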
    
    The S3A FS itself [needs to do
more](https://issues.apache.org/jira/browse/HADOOP-14531) to handle throttling
& failures (retry, and add failure metrics so throttling & error rates can be
measured). Knowing throttling rates is important, as it helps identify perf
problems due to bad distribution of work across a bucket, excess use of KMS key
lookups..., things that surface in support calls.
    
    This patch restores Spark 2.3 to the behaviour it had in Spark 2.2: a brief
delay between object creation and visibility does not cause the task to fail.

