[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-14 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474674#comment-16474674
 ] 

Ajay Sachdev commented on HDFS-13398:
-

Hi Rushabh,

Thats a good question. As you may know FJP is very critical for multi-core 
systems and it forks large tasks into smaller subtasks and then joins (waits) 
individual results to form final result (Divide and Conquer approach). We tried 
different number for parallelism (ie number of threads) in ForkJoinPool 
framework such as 8, 16, 32 and 64. The use case was a directory/file hierarchy 
structure of 40K+ and a nested tree-level. In this scenario we were able to 
demonstrate that a numberOfThreads=32 was ideal configuration. So our tests 
were based off these variables.

FsShell commands (single threaded approach) -

ls -R -> 12 mins

du -> 14 mins

count -> 14 mins

 

FsShell commands (FJP approach) -

ls -R -> 3 mins

du -> 2 mins

count -> 9 mins

 

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch, 
> parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-14 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474369#comment-16474369
 ] 

Rushabh S Shah commented on HDFS-13398:
---

[~ajaysachdev]: Thanks for the patch.
Do you have performance numbers for {{ForkJoinPool}} approach. Sorry if I have 
missed that part.
Also for {{fs.threads}}, did you compare with different #threads and evaluate 
what #threads is recommended.

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch, 
> parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-14 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474335#comment-16474335
 ] 

Ajay Sachdev commented on HDFS-13398:
-

Hi Yiqun,

Thanks for taking the time to take a look at the patch. I have also uploaded 
the 2nd version of the patch against apache trunk. I would appreciate if you 
could provide some feedback against that version as well. This patch is 
intended for our customer which is using ViPRFS interface and they only want to 
run -ls -R/-du and count commands of FsShell. I agree that if there is any 
mutable shared state that any of sub-commands such as usagesTable in 
FsUsage.java class then we will wrap synchronized block around it to make it 
thread-safe. As a long term solution we would like to optimize all FsShell 
commands to take use of this ForkJoin Pool Framework and then all sub-commands 
implementation can ensure that any shared state are thread-safe.

 

Thanks

Ajay

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch, 
> parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-14 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474147#comment-16474147
 ] 

Ajay Sachdev commented on HDFS-13398:
-

Hi Mukul,

I have attached the new patch to JIRA.

 

Cheers,

Ajay

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch, 
> parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-14 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474010#comment-16474010
 ] 

Yiqun Lin commented on HDFS-13398:
--

Hi [~ajaysachdev], I just take a quick look for the v001 patch. One comment 
from me:

In the v001 patch, we introduced the ForkJoinPool to execute commands in an 
asyn way. Since we make this change in Command class that means this will also 
make sense for other fs sub-commands, not only -du, -ls commands. Will this be 
thread-safe to run in other fs commands?

In addition, looks like this JIRA should move into Hadoop Common.

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-13 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473546#comment-16473546
 ] 

Mukul Kumar Singh commented on HDFS-13398:
--

Thanks [~ajaysachdev].

I am still not able to see the "HDFS-13398.002.patch" patch in Attachments 
section of the jira.
I tried reviewing the HDFS-13398.001.patch but the patch does not apply on the 
latest trunk.

Please submit the patch by using the Attachment section of the jira. 

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-12 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473104#comment-16473104
 ] 

Ajay Sachdev commented on HDFS-13398:
-

HDFS-13398.002.patch

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-11 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472922#comment-16472922
 ] 

Mukul Kumar Singh commented on HDFS-13398:
--

Hi [~ajaysachdev], I cannot see a patch uploaded to the jira. Can you please 
upload the patch to the jira :)


> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-11 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472582#comment-16472582
 ] 

Ajay Sachdev commented on HDFS-13398:
-

I have attached apache hadoop trunk multithreading diff. Please take a look at 
code review and provide comments.

 

Thanks

Ajay

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-09 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469244#comment-16469244
 ] 

Mukul Kumar Singh commented on HDFS-13398:
--

Thanks for the updated patch [~ajaysachdev]. The Apache Hadoop trunk can be 
found at "git clone git://git.apache.org/hadoop.git". Please refer to 
https://wiki.apache.org/hadoop/GitAndHadoop for details.

Please rebase the patch over the latest Apache Hadoop trunk obtained from the 
above location.

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-05-02 Thread Ajay Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461122#comment-16461122
 ] 

Ajay Sachdev commented on HDFS-13398:
-

Hi Mukul/Arpit,

Sorry for late response. We would like to get a formal review process for this 
patch before we deploy on customer site. I will work on items #3 and #4 above. 
Also I was unable to find trunk branch in 

[https://github.com/hortonworks/hadoop-release.] So we have used HDP tag -

HDP-2.6.2.0-205-tag

I have also attached the patch against this tag. I would appreciate if you 
could take a look at code and get a review as well.

 

Thanks

Ajay[^HDFS-13398.001.patch]

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-04-17 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441039#comment-16441039
 ] 

Mukul Kumar Singh commented on HDFS-13398:
--

Thanks for working on this [~ajaysachdev]. Please find my comments as following.

1) The current patch is not applying right now. Can you please rebase the patch 
over latest trunk ?
2) Also can you please upload the patch with filename as 
"HDFS-13398.001.patch". Please follow the guidelines for naming the patch as 
https://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch.
3) Also can the config "fs.threads" can be replaced with a command line 
argument. I feel that will help in controlling the parallelization for each 
command. We can certainly have a default value in the case it is not specified.
4) Also it will be great if some unit tests can be added for the patch.


> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Assignee: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-04-11 Thread Arpit Khare (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434512#comment-16434512
 ] 

Arpit Khare commented on HDFS-13398:


[~ajaysachdev] : Is this the same functionality which we are trying to achive?
https://issues.apache.org/jira/browse/HDFS-11786

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13398) Hdfs recursive listing operation is very slow

2018-04-11 Thread Arpit Khare (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434508#comment-16434508
 ] 

Arpit Khare commented on HDFS-13398:


Cc: [~arpitagarwal]

> Hdfs recursive listing operation is very slow
> -
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>Reporter: Ajay Sachdev
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for a 
> HCFS system. We have seen around 6 mins for 40K directory/files structure.
> The proposal is to use multithreading approach to speed up recursive list, du 
> and count operations.
> We have tried a ForkJoinPool implementation to improve performance for 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation is to use Java Executor Service to improve performance 
> to run listing operation in multiple threads in parallel. This has 
> significantly reduced the time to 40 secs from 6 mins.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org