[
https://issues.apache.org/jira/browse/HADOOP-13007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058166#comment-17058166
]
Steve Loughran commented on HADOOP-13007:
-
h2. Presto features
I just had a big look through the code to see what is directory logic was.
* docs:
https://prestosql.io/docs/current/connector/hive.html#amazon-s3-configuration
* source
https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java
h3. Interesting points
* glacier objects can be skipped
* list always adds the trailing / for non-empty and then does the scan
* it doesn't create directories, so no need to recreate them
(I was about to say "demand create AWS client", but I misread it. Interesting
thought though
h2. listLocatedStatus
does a list under path p + /
{code}
String key = keyFromPath(path);
if (!key.isEmpty()) {
key += PATH_SEPARATOR;
}
ListObjectsRequest request = new ListObjectsRequest()
.withBucketName(getBucketName(uri))
.withPrefix(key)
.withDelimiter(PATH_SEPARATOR);
{code}
mapping to stats can skip "GLACIER" == getStorageClass; they aren't visible at
all.
{code}
private Iterator statusFromObjects(List
objects)
{
// NOTE: for encrypted objects, S3ObjectSummary.size() used below is
NOT correct,
// however, to get the correct size we'd need to make an additional
request to get
// user metadata, and in this case it doesn't matter.
return objects.stream()
.filter(object -> !object.getKey().endsWith(PATH_SEPARATOR))
.filter(object -> !skipGlacierObjects ||
!isGlacierObject(object))
.map(object -> new FileStatus(
object.getSize(),
false,
1,
BLOCK_SIZE.toBytes(),
object.getLastModified().getTime(),
qualifiedPath(new Path(PATH_SEPARATOR +
object.getKey()
.map(this::createLocatedFileStatus)
.iterator();
}
{code}
Note: this feeds into delete(); intentional or not
{code}
public boolean mkdirs(Path f, FsPermission permission)
{
// no need to do anything for S3
return true;
}
{code}
* but getFileStatus does do: HEAD, HEAD + /, LIST + /
{code}
return new FileStatus(
getObjectSize(path, metadata),
S3_DIRECTORY_OBJECT_CONTENT_TYPE.equals(metadata.getContentType()),
1,
BLOCK_SIZE.toBytes(),
lastModifiedTime(metadata),
qualifiedPath(path));
{code}
* file length inferred from metadata -"x-amz-unencrypted-content-length"
parsed. docs acknowlege that the results of a list are not consistent.
* dir marker content type used for "is this a dir"
h3. input stream
optimised for forward reads;
* doesn't do a HEAD at open
* lazy seek
* skips range (default 1MB) before (non-draining) abort() and reopen
* simply opens with initial range in GET, but no limit
* maps 416 -> EOFException
h3. delete()
- doesn't do bulk deletes, just lists children and recursively calls delete on
them (very, very inefficient)
-for recursive delete, if glacier files are skipped, they don't get deleted.
Interesting idea
h3. Metrics
implements AWS SDK stats collection in
com.facebook.presto.hive.s3.PrestoS3FileSystemMetricCollector; these feed back
to the Presto metrics
h3. create()
if overwrite=true, skips all checks (even for dest being a dir).
{code}
if ((!overwrite) && exists(path)) {
throw new IOException("File already exists:" + path);
}
{code}
Should we do that? it's, well, aggressive, but for apps which know what they
are doing...
1. A new transfer manager is created for each output stream
1. The entire file is written to the staging dir
1. the transfer manager is given the file to upload; it will partition and
upload how it chooses
so: no incremental writes, disk always used. But good simplicitly and no risk
of partial uploads remaining around.
omitting all existence checks does make for a faster write and avoids all 404s
sneaking in.
h2. Overall analysis
It's nicely minimal; optimised for directory trees with no markers.
* input stream seems a lot less optimised than our code, which works better for
backwards seeking clients.
* output stream avoids 404s by omitting all probes. Something to consider
* Metric wireup from AWS SDK looks simple enough for us to copy
* it does handle CSE where actual length < list length, but caller has to do a
HEAD to see this; if it runs off the result of listLocatedStatus and then did a
seek off EOF-16, things would fail. Similarly: splitting.
What can we adopt?
# metrics
# maybe: dest is dir for overwrite=true
# dir content type get/check (ie
{code}
isDir == (path.endswith"/ && len==0) ||
content_type==_"application/x-directory")
{code}
>