[jira] [Resolved] (HADOOP-15268) Back port HADOOP-13972 to 2.8.1 and 2.8.3

2018-02-26 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S resolved HADOOP-15268.

  Resolution: Invalid
Target Version/s: 2.8.3, 2.8.1  (was: 2.8.1, 2.8.3)

This is not required.

> Back port HADOOP-13972 to 2.8.1 and 2.8.3
> -
>
> Key: HADOOP-15268
> URL: https://issues.apache.org/jira/browse/HADOOP-15268
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 2.8.1, 2.8.3
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
>Priority: Major
>
> Back port the HADOOP-13972 to branch-2.8.1 and branch-2.8.3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15268) Back port HADOOP-13972 to 2.8.1 and 2.8.3

2018-02-26 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378131#comment-16378131
 ] 

Omkar Aradhya K S commented on HADOOP-15268:


Thanks [~jojochuang] I will delete this sub-task. This is not required.

> Back port HADOOP-13972 to 2.8.1 and 2.8.3
> -
>
> Key: HADOOP-15268
> URL: https://issues.apache.org/jira/browse/HADOOP-15268
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 2.8.1, 2.8.3
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
>Priority: Major
>
> Back port the HADOOP-13972 to branch-2.8.1 and branch-2.8.3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15268) Back port HADOOP-13972 to 2.8.1 and 2.8.3

2018-02-26 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-15268:
---
Summary: Back port HADOOP-13972 to 2.8.1 and 2.8.3  (was: Back port to 
2.8.1 and 2.8.3)

> Back port HADOOP-13972 to 2.8.1 and 2.8.3
> -
>
> Key: HADOOP-15268
> URL: https://issues.apache.org/jira/browse/HADOOP-15268
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 2.8.1, 2.8.3
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
>Priority: Major
>
> Back port the HADOOP-13972 to branch-2.8.1 and branch-2.8.3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15268) Back port to 2.8.1 and 2.8.3

2018-02-26 Thread Omkar Aradhya K S (JIRA)
Omkar Aradhya K S created HADOOP-15268:
--

 Summary: Back port to 2.8.1 and 2.8.3
 Key: HADOOP-15268
 URL: https://issues.apache.org/jira/browse/HADOOP-15268
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/adl
Affects Versions: 2.8.3, 2.8.1
Reporter: Omkar Aradhya K S
Assignee: Omkar Aradhya K S
 Fix For: 2.8.3, 2.8.1


Back port the HADOOP-13972 to branch-2.8.1 and branch-2.8.3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13972) ADLS to support per-store configuration

2018-02-13 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362074#comment-16362074
 ] 

Omkar Aradhya K S commented on HADOOP-13972:


Hi [~jzhuge] do you have any further information from where you left off on 
this feature?

> ADLS to support per-store configuration
> ---
>
> Key: HADOOP-13972
> URL: https://issues.apache.org/jira/browse/HADOOP-13972
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Priority: Major
>
> Useful when distcp needs to access 2 Data Lake stores with different SPIs.
> Of course, a workaround is to grant the same SPI access permission to both 
> stores, but sometimes it might not be feasible.
> One idea is to embed the store name in the configuration property names, 
> e.g., {{dfs.adls.oauth2..client.id}}. Per-store keys will be consulted 
> first, then fall back to the global keys.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-24 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024138#comment-16024138
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


Thanks [~yzhangal].

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, HADOOP-14407.branch2.002.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-24 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022643#comment-16022643
 ] 

Omkar Aradhya K S edited comment on HADOOP-14407 at 5/24/17 10:06 AM:
--

[~yzhangal] Thanks for pointing this out! I have couple of doubts:
1. My tests are present in 
"/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java" 
--> testToString(), testParseCopyBufferSize().
2. The 
"/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java 
--> " doesn't have "blocksPerChunk=0" so how was it passing earlier?
3. Why are there 2 redundant "testToString()" in 2 different classes?
4. Can I re-open this issue and add the patch here itself?
5. What is the procedure to test branch-2 patches? I see a "-1" from Hadoop QA 
for the branch-2 patch above!





was (Author: omkarksa):
[~yzhangal] Thanks for pointing this out! I have couple of doubts:
1. The 
"/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java" 
doesn't have "blocksPerChunk=0" so how was it passing earlier?
2. Can I re-open this issue and add the patch here itself?
3. What is the procedure to test branch-2 patches? I see a "-1" from Hadoop QA 
for the branch-2 patch above!



> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-24 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022643#comment-16022643
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


[~yzhangal] Thanks for pointing this out! I have couple of doubts:
1. The 
"/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java" 
doesn't have "blocksPerChunk=0" so how was it passing earlier?
2. Can I re-open this issue and add the patch here itself?
3. What is the procedure to test branch-2 patches? I see a "-1" from Hadoop QA 
for the branch-2 patch above!



> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-21 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019155#comment-16019155
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


Thanks [~yzhangal] for the quick reviews and commits.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-19 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017770#comment-16017770
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


[~yzhangal] I have backported the feature to branch-2 and uploaded the patch 
(HADOOP-14407.004.branch2.patch). Please do the needful.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-19 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.004.branch2.patch

Feature patch backported to branch-2

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-19 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017038#comment-16017038
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


{quote}
I committed to trunk. When trying to backport to branch-2, saw quite some 
conflicts. Would you please help doing branch-2 version and other ones you 
prefer?
{quote}

[~yzhangal] Thanks. OK, I will work on the branch-2 patch today.


> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Patch Available  (was: Open)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Open  (was: Patch Available)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.004.patch

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> HADOOP-14407.004.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Patch Available  (was: Open)

submit new patch - removed "long" variable type for copybuffersize

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.004.patch

uploading new patch - removed "long" variable type for copybuffersize

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, HADOOP-14407.004.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Open  (was: Patch Available)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Patch Available  (was: Open)

Submit new patch with cosmetic changes

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Open  (was: Patch Available)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-18 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.003.patch

new patch with cosmetic changes

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, HADOOP-14407.003.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015158#comment-16015158
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


{quote}
Thanks for the updated patch Omkar Aradhya K S, are you guys still looking into 
setting input and output buffer to different size? Or any chance we need to do 
that in the future?
{quote}
[~yzhangal] Thanks for checking the patch. As explained in the previous commit, 
we don't need to do this change since even a small copybiffersize can give huge 
boos in performance.

{quote}
Somehow your submitting the patch did not trigger a jenkins test, maybe there 
is an infra issue.
{quote}
Thanks for pointing this out. Yes, even I waited for quite some time, but there 
was no result! Do I need any additional permissions for this?
Also, can you point out how exactly you triggered the build? May be I missed 
something?

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> HADOOP-14407.002.patch, TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014375#comment-16014375
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


{quote}
Also, we will do more investigations into introducing both input and output 
copybuffersize configurations.
{quote}
We did a small set of runs to see at what level the copybuffersize stagnates in 
performance:
!TotalTime-vs-CopyBufferSize.jpg!
In this case, with copybuffersize set to just 128KB, we get >3x performance!

{quote}
If there is benefit of doing this I will submit a new patch with the changes or 
else we will go ahead with this patch.
{quote}
Since even a small increase in the copybuffersize give the desired performance, 
we will not need two separate copybuffersize for input and output.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: TotalTime-vs-CopyBufferSize.jpg

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, 
> TotalTime-vs-CopyBufferSize.jpg
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Open  (was: Patch Available)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Patch Available  (was: Open)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013785#comment-16013785
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


[~yzhangal] Thanks for reviewing. Please find attached the new patch with the 
fixes.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-17 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.002.patch

Attached new patch (HADOOP-14407.002.patch) with all required fixes.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-12 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007968#comment-16007968
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


[~yzhangal] I have submitted the patch. Could you please check this 
(HADOOP-14407.001.patch)?
Also, we will do more investigations into introducing both input and output 
copybuffersize configurations.
If there is benefit of doing this I will submit a new patch with the changes or 
else we will go ahead with this patch.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-12 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Attachment: HADOOP-14407.001.patch

Initial patch with the required changes - HADOOP-14407.001.patch

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14407.001.patch
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-12 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Patch Available  (was: In Progress)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-12 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Status: Open  (was: Patch Available)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work started] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-12 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-14407 started by Omkar Aradhya K S.
--
> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-11 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007606#comment-16007606
 ] 

Omkar Aradhya K S commented on HADOOP-14407:


Thanks [~yzhangal]. 
We found that we don't need 2 buffers. 
Just making the existing copy buffer size configurable should do. 
I will submit the patch soon.

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Omkar Aradhya K S
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-10 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Description: Currently, the RetriableFileCopyCommand has a fixed copy 
buffer size of just 8KB. We have noticed in our performance tests that with 
bigger buffer sizes we saw upto ~3x performance boost. Hence, making the copy 
buffer size a configurable setting via the new parameter .  
(was: The minimum unit of work for a distcp task is a file. We have files that 
are greater than 1 TB with a block size of  1 GB. If we use distcp to copy 
these files, the tasks either take a long long long time or finally fails. A 
better way for distcp would be to copy all the source blocks in parallel, and 
then stich the blocks back to files at the destination via the HDFS Concat API 
(HDFS-222))

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Yongjun Zhang
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 
> 8KB. We have noticed in our performance tests that with bigger buffer sizes 
> we saw upto ~3x performance boost. Hence, making the copy buffer size a 
> configurable setting via the new parameter .



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-10 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Release Note: The copy buffer size can be configured via the new parameter 
. By default the  is set to 8KB.  (was: If  a 
positive value is passed to command line switch -blocksperchunk, files with 
more blocks than this value will be split into chunks of `` 
blocks to be transferred in parallel, and reassembled on the destination. By 
default, `` is 0 and the files will be transmitted in their 
entirety without splitting. This switch is only applicable when both the source 
file system supports getBlockLocations and target supports concat. 
)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Yongjun Zhang
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-10 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Affects Version/s: (was: 0.21.0)
   2.9.0

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Yongjun Zhang
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-10 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-14407:
---
Hadoop Flags:   (was: Reviewed)

> DistCp - Introduce a configurable copy buffer size
> --
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Omkar Aradhya K S
>Assignee: Yongjun Zhang
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size

2017-05-10 Thread Omkar Aradhya K S (JIRA)
Omkar Aradhya K S created HADOOP-14407:
--

 Summary: DistCp - Introduce a configurable copy buffer size
 Key: HADOOP-14407
 URL: https://issues.apache.org/jira/browse/HADOOP-14407
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools/distcp
Affects Versions: 0.21.0
Reporter: Omkar Aradhya K S
Assignee: Yongjun Zhang
 Fix For: 2.9.0, 3.0.0-alpha3


The minimum unit of work for a distcp task is a file. We have files that are 
greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
files, the tasks either take a long long long time or finally fails. A better 
way for distcp would be to copy all the source blocks in parallel, and then 
stich the blocks back to files at the destination via the HDFS Concat API 
(HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

2017-04-11 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965371#comment-15965371
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


[~steve_l] I was able to test the bits with HDI 3.3, which is *2.7.1*. 
However, I was wondering if we can go as back as *2.5.x*/*2.2.x*?

> Enable distcp to copy blocks in parallel
> 
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.branch2.patch, 
> HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

2017-04-11 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964044#comment-15964044
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


{quote}
Yongjun Zhang Sure, I will finish testing this by early next week.
{quote}
[~yzhangal] I was able to do some basic tests and it works! Thanks for the 
patch.

The branch-2 is *2.9.0*. However, will this patch work on older versions like 
*2.2.x*?

> Enable distcp to copy blocks in parallel
> 
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.branch2.patch, 
> HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

2017-04-07 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960541#comment-15960541
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


{quote}
Hi Omkar Aradhya K S, wonder if you could help run this branch-2 patch on ADLS 
too if possible?
{quote}
[~yzhangal] Sure, I will finish testing this by early next week.

> Enable distcp to copy blocks in parallel
> 
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.branch2.patch, 
> HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

2017-04-05 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956675#comment-15956675
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


{quote}
BTW, Steve still has an item for you to follow-up here

https://issues.apache.org/jira/browse/HADOOP-11794?focusedCommentId=15938217=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15938217
{quote}
[~yzhangal] Sorry for the late reply. Thanks for pointing this out. I almost 
missed this!

{quote}
Omkar: if ADL doesn't implement the distcp contract test, you might want to 
follow up this patch with a distcp test that forces the use of the concat 
operation.
{quote}
[~steve_l] I will look into this.


> Enable distcp to copy blocks in parallel
> 
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

2017-03-30 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950360#comment-15950360
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


[~yzhangal] Thanks for re-considering the suggestions and re-doing the patch to 
accommodate all FileSystem implementations.
{quote}
I just committed to trunk. Will work on branch-2 version asap (tried and see 
quite some conflicts).
{quote}
Could you please elaborate on how you plan to proceed with backporting?


> Enable distcp to copy blocks in parallel
> 
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-30 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948924#comment-15948924
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


[~yzhangal] I have tested the patch with ADLS and it works without any changes. 
Thanks.

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-28 Thread Omkar Aradhya K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Aradhya K S updated HADOOP-11794:
---

Hi Yongjun,

I am on vacation till tomorrow.
Would it be late if I review it tomorrow?
If you are held up because om me, I can reach home and try it today itself. 
Please let me know.

Regards,
Omkar



> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-24 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940043#comment-15940043
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


{quote}
This is usually handled in the same ticket, and by cherry-picking the patch. 
Backporting doesn't usually warrant a new JIRA unless the implementation is 
significantly different.
{quote}
[~chris.douglas], [~yzhangal] Thanks for all the clarifications.

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-23 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937918#comment-15937918
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


[~yzhangal] Thanks for reconsidering the comments and making the required 
changes for FlieSystem compatibility.
[~steve_l], [~jzhuge], [~atm], [~chris.douglas] Thanks for providing clarity 
and a way ahead.

{quote}
Omkar Aradhya K S, can you post your patch?
{quote}
I made only the following changes in my patch to get it working with ADLS on 
hadoop 2.7:
# Remove all the checks for *DistributedFileSystem*
# Use {code}fs.concat{code} instead of {code}dstdistfs.concat{code}
# Use {code}fs.getFileBlockLocations{code} instead of 
{code}dfs.getBlockLocations{code}
# Use (now deprecated) {code}final DFSClient dfs = new DFSClient(conf);{code} 
instead of {code}final DFSClient dfs = new 
DFSClient(DFSUtilClient.getNNAddress(conf), conf);{code}

[~yzhangal] Once the new patch with all above changes is checked in, we need to 
back port it to older versions of hadoop, which will be addressed by new JIRAs?

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> HADOOP-11794.009.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-22 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936826#comment-15936826
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


Thanks [~steve_l], [~jzhuge],

Yes, this could be one way to do it. Let me see if there is any other way.
{quote}
There is always the option of doing that: sending in an invalid concat() 
request and differentiating between: UnsupportedException and any other 
response, then assuming that the "any other response" exception means that it 
is implemented, but that the arguments were invalid. concat("/", new Path[0]) 
should be enough.
{quote}

That's right ADLS supports both *concat* and *getFileBlockLocations* and as 
commented before in this JIRA, I was able to get this patch working with ADLS 
with these changes and one more change.

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-22 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935814#comment-15935814
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


Thanks for the clarification [~yzhangal].
About this ...
{quote}
About other file systems, I did not get to test various file systems with this 
patch (except for DistributedFileSystem), we could follow-up with new jira to 
relax the file system requirement, and add corresponding tests for the 
corresponding file system?
{quote}
Is there any reason *not* to use *FileSystem.concat* & 
*FileSystem.getFileBlockLocations* ?

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-21 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935743#comment-15935743
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


{quote}
The main reason of checking DistributedFileSystem is the support of 
getBlockLocations, and concat feature. I'm not sure whether we can assume other 
File System support that.
{quote}
The *getFileBlockLocations* and *concat* are APIs that are part of 
*FileSystem.java* from [hadoop 
v1.2.1|https://hadoop.apache.org/docs/r1.2.1/api/index.html]
{quote}
The current patch is for trunk where client and server code are separated. When 
we backport this change to other version of hadoop, we can make the change 
accordingly, for example, to use DFSUtil. 
{quote}
You could just use the default constructor that would internally get the 
NNAddress:
{code}
final DFSClient dfs = new DFSClient(conf);
{code}

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-20 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934045#comment-15934045
 ] 

Omkar Aradhya K S edited comment on HADOOP-11794 at 3/21/17 3:46 AM:
-

I was trying to evaluate your patch with ADLS:
Tried the bits on a HDInsight 3.5 cluster (this comes with hadoop 2.7)
Observed following compatibility issues:
 a. You are checking for instance of *DistributedFileSystem* in many places 
and all other *FileSystem* implementations don’t implement 
*DistributedFileSystem*
  i.Could this be changed to something more compatible with other 
*FileSystem* implementations?
 b. You are using the new *DFSUtilClient*, which makes DistCp incompatible 
with older versions of Hadoop
 i. Can this be changed to be backward compatible?
If the compatibility issues are addressed, the DistCp with your feature would 
be available for other *FileSystem* implementations and also would be backward 
compatible.


was (Author: omkarksa):
I was trying to evaluate your patch with ADLS:
Tried the bits on a HDInsight 3.5 cluster (this comes with hadoop 2.7)
Observed following compatibility issues:
 a. You are checking for instance of {code}DistributedFileSystem{code} in 
many places and all other {code}FileSystem{code} implementations don’t 
implement {code}DistributedFileSystem{code}
  i.Could this be changed to something more compatible with other 
{code}FileSystem{code} implementations?
 b. You are using the new {code}DFSUtilClient{code}, which makes DistCp 
incompatible with older versions of Hadoop
 i. Can this be changed to be backward compatible?
If the compatibility issues are addressed, the DistCp with your feature would 
be available for other {code}FileSystem{code} implementations and also would be 
backward compatible.

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11794) distcp can copy blocks in parallel

2017-03-20 Thread Omkar Aradhya K S (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934045#comment-15934045
 ] 

Omkar Aradhya K S commented on HADOOP-11794:


I was trying to evaluate your patch with ADLS:
Tried the bits on a HDInsight 3.5 cluster (this comes with hadoop 2.7)
Observed following compatibility issues:
 a. You are checking for instance of {code}DistributedFileSystem{code} in 
many places and all other {code}FileSystem{code} implementations don’t 
implement {code}DistributedFileSystem{code}
  i.Could this be changed to something more compatible with other 
{code}FileSystem{code} implementations?
 b. You are using the new {code}DFSUtilClient{code}, which makes DistCp 
incompatible with older versions of Hadoop
 i. Can this be changed to be backward compatible?
If the compatibility issues are addressed, the DistCp with your feature would 
be available for other {code}FileSystem{code} implementations and also would be 
backward compatible.

> distcp can copy blocks in parallel
> --
>
> Key: HADOOP-11794
> URL: https://issues.apache.org/jira/browse/HADOOP-11794
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Yongjun Zhang
> Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, 
> HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, 
> HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, 
> MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are 
> greater than 1 TB with a block size of  1 GB. If we use distcp to copy these 
> files, the tasks either take a long long long time or finally fails. A better 
> way for distcp would be to copy all the source blocks in parallel, and then 
> stich the blocks back to files at the destination via the HDFS Concat API 
> (HDFS-222)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org