GitHub user NicoK opened a pull request:
https://github.com/apache/flink/pull/5602
[FLINK-8801][yarn/s3] fix Utils#setupLocalResource() relying on consistent
read-after-write
## What is the purpose of the change
> "Amazon S3 provides read-after-write consistency for PUTS of new objects in your
> S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD
> or GET request to the key name (to find if the object exists) before creating
> the object, Amazon S3 provides eventual consistency for read-after-write."
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
Some S3 file system implementations may actually execute such a request for
the about-to-write object and thus the read-after-write is only eventually
consistent. `org.apache.flink.yarn.Utils#setupLocalResource()` currently relies
on a consistent read-after-write since it accesses the remote resource to get
file size and modification timestamp. Since the local resource is available at
that point, we can read this metadata from it directly and circumvent the problem.
Please note that this PR is built upon #5601.
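The idea described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code of `Utils#setupLocalResource()`: the class `LocalResourceMetadataSketch`, the holder `ResourceMetadata`, and both helper methods are hypothetical names, and only `java.nio.file` calls are used. The point is that size and modification time come from the local copy, so no HEAD or GET request is sent for the freshly written remote object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; this is not the code of Utils#setupLocalResource(). */
public class LocalResourceMetadataSketch {

    /** File size and modification time as they would be registered for the resource. */
    static final class ResourceMetadata {
        final long sizeBytes;
        final long modificationTimeMillis;

        ResourceMetadata(long sizeBytes, long modificationTimeMillis) {
            this.sizeBytes = sizeBytes;
            this.modificationTimeMillis = modificationTimeMillis;
        }
    }

    /** Reads the metadata from the local copy, so no request goes to the remote file system. */
    static ResourceMetadata fromLocalFile(Path localFile) {
        try {
            return new ResourceMetadata(
                    Files.size(localFile),
                    Files.getLastModifiedTime(localFile).toMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper (assumption): writes the given bytes to a fresh temporary file. */
    static Path writeTempFile(byte[] content) {
        try {
            Path file = Files.createTempFile("resource", ".jar");
            return Files.write(file, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path local = writeTempFile(new byte[] {1, 2, 3});
        ResourceMetadata meta = fromLocalFile(local);
        System.out.println(meta.sizeBytes + " bytes, modified at " + meta.modificationTimeMillis);
    }
}
```

Metadata taken this way only helps if the uploaded copy ends up with the same modification time as the local file, which is why the change log below also lists preserving modification times on upload.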
## Brief change log
- do not retrieve the remote object after writing it just for getting file
statistics
- preserve file modification times for uploaded resources so that they pass the
modification-time check done by YARN
## Verifying this change
This change is already covered by existing tests, such as
`YARNSessionCapacitySchedulerITCase` for YARN accepting the uploaded resources
and `YarnFileStageTestS3ITCase` for the upload path via S3.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
- The S3 file system connector: **yes**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **JavaDocs**
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NicoK/flink flink-8801
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5602.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5602
commit 9570f0d0afe528e2814006a120e7c424b96753d0
Author: Nico Kruber
Date: 2018-02-27T16:23:20Z
[FLINK-8801][yarn/s3] fix Utils#setupLocalResource() relying on consistent
read-after-write
"Amazon S3 provides read-after-write consistency for PUTS of new objects in your
S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD
or GET request to the key name (to find if the object exists) before creating
the object, Amazon S3 provides eventual consistency for read-after-write."
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
Some S3 file system implementations may actually execute such a request for the
about-to-write object and thus the read-after-write is only eventually
consistent. org.apache.flink.yarn.Utils#setupLocalResource() currently relies on
a consistent read-after-write since it accesses the remote resource to get file
size and modification timestamp. Since there we have access to the local
resource, we can use this metadata directly instead and circumvent the problem.
commit 216d9674a5116ecd0d7d52aedd8126e2b3e12eea
Author: Nico Kruber
Date: 2018-02-27T16:29:00Z
[FLINK-8336][yarn/s3][tests] harden YarnFileStageTest upload test for
eventual consistent read-after-write
In case the newly written object cannot be read (yet), we do 4 more retries to
retrieve the value and wait 50ms each. While this does not solve all the cases
it should make the (rare) case of the written object not being available for
read even more unlikely.
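A minimal sketch of such a bounded retry, assuming a read that reports "not visible yet" as an empty `Optional`. The class `ReadRetrySketch` and the method `readWithRetries` are hypothetical names and not taken from the test code; only the numbers (4 extra attempts, 50 ms wait) come from the commit message.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the code used in YarnFileStageTest. */
public class ReadRetrySketch {

    /**
     * Tries the read once and, while the result is empty, up to extraRetries more
     * times, sleeping waitMillis before each additional attempt.
     */
    static <T> Optional<T> readWithRetries(
            Supplier<Optional<T>> read, int extraRetries, long waitMillis) {
        Optional<T> result = read.get();
        for (int i = 0; i < extraRetries && !result.isPresent(); i++) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return result;
            }
            result = read.get();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated eventually consistent store: the object shows up on the third read.
        Optional<String> value = readWithRetries(
                () -> ++attempts[0] < 3 ? Optional.<String>empty() : Optional.of("content"),
                4, 50L);
        System.out.println("present=" + value.isPresent() + " after " + attempts[0] + " attempts");
    }
}
```

As the commit message says, a fixed number of retries only makes a failed read less likely; it does not guarantee that the object becomes visible in time.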
---