Steve Loughran created HADOOP-13695:
---------------------------------------

             Summary: S3A to use a thread pool for async path operations
                 Key: HADOOP-13695
                 URL: https://issues.apache.org/jira/browse/HADOOP-13695
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 2.8.0
            Reporter: Steve Loughran


S3A path operations are often slow because of directory scanning, mock directory 
create/delete, etc. Many of these could be done asynchronously:

* because deletion is eventually consistent, deleting parent dirs after an 
operation has returned doesn't alter the behaviour, except in the special case 
of operation failure.
* scanning for paths/parents of a file in the create operation only needs to 
complete before the close() operation instantiates the object; there is no need 
to block create().
* parallelized COPY calls would permit asynchronous rename.
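
As a rough illustration of the second point, the path scan could be submitted 
when the stream is created and only awaited in close(). This is a minimal 
sketch, not S3A code: the class DeferredCheckStream and the parentIsValid 
callable are hypothetical names, and the actual upload is left as a comment.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical output stream wrapper; illustrates deferring the path check. */
final class DeferredCheckStream {
  private final Future<Boolean> parentCheck;
  private boolean closed;

  /** Called from create(): starts the scan and returns without waiting. */
  DeferredCheckStream(ExecutorService pool, Callable<Boolean> parentIsValid) {
    this.parentCheck = pool.submit(parentIsValid);
  }

  /** The scan result is only needed here, just before the object appears. */
  void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    try {
      if (!parentCheck.get()) {
        throw new IOException("parent path is not a directory");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while awaiting path check", e);
    } catch (ExecutionException e) {
      throw new IOException("path check failed", e.getCause());
    }
    // Assumption: the PUT that instantiates the object would be issued here.
  }
}
```

One consequence of this shape is that a bad destination path is reported by 
close() rather than by create(), which callers would need to be ready for.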

We could either use the thread pool used for block writes, or somehow isolate 
low-cost path ops (GET, DELETE) from the more expensive calls (COPY, PUT) so 
that a thread doing basic IO doesn't block for the duration of a long op. 
Maybe also use {{Semaphore.tryAcquire()}} and only start async work if there 
actually is an idle thread, doing it synchronously if not. It may depend on 
the operation: path query/cleanup before/after a write is something which could 
be scheduled as just more futures in the block write.
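
The tryAcquire() idea might look something like the sketch below. It is an 
illustration under assumptions, not a proposed patch: OpportunisticExecutor is 
a hypothetical name, the permit count is assumed to match the number of 
threads set aside for path operations, and it uses Java 8's CompletableFuture 
for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: go async only when a slot is free, else run inline. */
final class OpportunisticExecutor {
  private final ExecutorService pool;
  private final Semaphore idleSlots;

  OpportunisticExecutor(ExecutorService pool, int slots) {
    this.pool = pool;
    this.idleSlots = new Semaphore(slots);
  }

  <T> CompletableFuture<T> submit(Supplier<T> op) {
    if (idleSlots.tryAcquire()) {
      try {
        // A slot was free: run in the pool, release the permit when done.
        return CompletableFuture.supplyAsync(op, pool)
            .whenComplete((result, failure) -> idleSlots.release());
      } catch (RuntimeException e) {
        // e.g. the pool rejected the task; give the permit back.
        idleSlots.release();
        throw e;
      }
    }
    // No idle slot: do the work synchronously in the caller's thread.
    CompletableFuture<T> done = new CompletableFuture<>();
    try {
      done.complete(op.get());
    } catch (RuntimeException e) {
      done.completeExceptionally(e);
    }
    return done;
  }
}
```

The caller gets a future either way, so code that later waits on the result 
does not need to know which path was taken.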



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
