Ajay Sachdev created HDFS-13398:
-----------------------------------

             Summary: Hdfs recursive listing operation is very slow
                 Key: HDFS-13398
                 URL: https://issues.apache.org/jira/browse/HDFS-13398
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 2.7.1
         Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
(Object Store).
            Reporter: Ajay Sachdev
             Fix For: 2.7.1


The hdfs dfs -ls -R command lists directories sequentially and is very slow on 
an HCFS system. We have seen it take around 6 minutes for a structure of 40K 
directories and files.

The proposal is to use a multithreaded approach to speed up the recursive 
list, du and count operations.

We have tried a ForkJoinPool implementation to improve the performance of the 
recursive listing operation.

[https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]

commit id: 82387c8cd76c2e2761bd7f651122f83d45ae8876
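
For readers who want a feel for the ForkJoinPool idea without opening the 
branch, here is a minimal sketch. It is not the code from the commit above. 
It assumes java.nio.file on a local directory tree as a stand-in for the HCFS 
listing call, and the names ParallelListing, ListTask and listRecursively are 
hypothetical. Each directory is listed by one task, which forks a subtask per 
subdirectory and joins them afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: parallel recursive listing with ForkJoinPool.
// java.nio.file stands in for the remote file system listing call.
public class ParallelListing {

    /** Lists one directory and forks a subtask for each subdirectory. */
    static class ListTask extends RecursiveTask<List<Path>> {
        private final Path dir;

        ListTask(Path dir) {
            this.dir = dir;
        }

        @Override
        protected List<Path> compute() {
            List<Path> found = new ArrayList<>();
            List<ListTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    found.add(entry);
                    // Do not follow symlinks, so link cycles cannot recurse forever.
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        ListTask sub = new ListTask(entry);
                        sub.fork(); // list the subdirectory asynchronously
                        subtasks.add(sub);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Collect the results of all forked subdirectory listings.
            for (ListTask sub : subtasks) {
                found.addAll(sub.join());
            }
            return found;
        }
    }

    /** Returns every entry below root (root itself is not included). */
    static List<Path> listRecursively(Path root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ListTask(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // 16 is an arbitrary parallelism level chosen for illustration.
        listRecursively(root, 16).forEach(System.out::println);
    }
}
```

Because each listing call blocks on I/O, the parallelism level would likely 
need to be higher than the number of CPU cores to hide the latency of a remote 
object store.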

Another implementation uses the Java ExecutorService to run listing operations 
in multiple threads in parallel. This has significantly reduced the time, from 
6 minutes to 40 seconds.
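
As a rough illustration of the ExecutorService variant, the sketch below lists 
the tree one level at a time and submits one listing job per directory to a 
fixed thread pool. It is only an assumption about how such an implementation 
could look, not the code that produced the timings above. java.nio.file is 
again a stand-in for the HCFS listing call, and the names ExecutorListing, 
listOne and listRecursively are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: level-by-level recursive listing with ExecutorService.
public class ExecutorListing {

    /** Lists the direct children of one directory (the slow, blocking call). */
    static List<Path> listOne(Path dir) throws IOException {
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                children.add(entry);
            }
        }
        return children;
    }

    /** Returns every entry below root (root itself is not included). */
    static List<Path> listRecursively(Path root, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Path> found = new ArrayList<>();
        try {
            List<Path> level = Collections.singletonList(root);
            while (!level.isEmpty()) {
                // One job per directory in the current level.
                List<Callable<List<Path>>> jobs = new ArrayList<>();
                for (Path dir : level) {
                    jobs.add(() -> listOne(dir));
                }
                List<Path> next = new ArrayList<>();
                // invokeAll blocks until every job in this level has finished.
                for (Future<List<Path>> result : pool.invokeAll(jobs)) {
                    for (Path entry : result.get()) {
                        found.add(entry);
                        // Subdirectories form the next level; symlinks are not followed.
                        if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                            next.add(entry);
                        }
                    }
                }
                level = next;
            }
        } finally {
            pool.shutdown();
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // 16 threads is an arbitrary value chosen for illustration.
        listRecursively(root, 16).forEach(System.out::println);
    }
}
```

Processing one level at a time keeps the code simple and avoids worker threads 
waiting on each other, at the cost of a barrier between levels. A real 
implementation would presumably also need to keep the output order consistent 
with the sequential command.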


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
