[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392540#comment-16392540
 ] 

genericqa commented on HADOOP-14999:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
18s{color} | {color:green} hadoop-aliyun in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913728/HADOOP-14999.009.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fd903c602e94 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14287/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14287/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999

[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.06.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider 
> does not share delegation tokens (a client uses the KMS address/port as the 
> key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.
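
A minimal, hedged sketch of the failure mode described above: the token service 
key is the KMS host:port (what {{SecurityUtil.buildTokenService}} produces from 
the URL), so a token stored under one instance's address is invisible to a 
lookup keyed by another instance's address. The addresses are illustrative 
only; this is not the patch.

{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class KmsTokenKeySketch {
  public static void main(String[] args) {
    Credentials creds = new Credentials();
    Text kms1 = new Text("10.0.0.1:9600");   // token fetched from instance 1
    Text kms2 = new Text("10.0.0.2:9600");   // request routed to instance 2
    creds.addToken(kms1, new Token<>());
    System.out.println(creds.getToken(kms1)); // found
    System.out.println(creds.getToken(kms2)); // null: the token is not shared
  }
}
{code}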






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: (was: HADOOP-14445.06.patch)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider 
> does not share delegation tokens (a client uses the KMS address/port as the 
> key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392510#comment-16392510
 ] 

Xiao Chen commented on HADOOP-14445:


Patch 6 to fix pre-commit errors.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider 
> does not share delegation tokens (a client uses the KMS address/port as the 
> key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.06.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider 
> does not share delegation tokens (a client uses the KMS address/port as the 
> key for the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.009.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed to upload files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the output into multiple small blocks and uploads them in 
> parallel. Then, producing output and uploading blocks happen asynchronously.
>  - avoid buffering an overly large result on local disk. As an extreme 
> example, a task may output 100GB or even more; we would otherwise need to 
> write all 100GB to local disk and then upload it, which is inefficient and 
> limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it can run out of local disk space.
>  - if the output file is too large, the task waits a long time to upload the 
> result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel and greatly improves performance.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> task fails, the whole file upload fails and we abort the current upload.
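
The pattern described above, as a minimal hedged sketch: a plain bounded 
{{ExecutorService}} stands in for {{SemaphoredDelegatingExecutor}}, and 
{{uploadBlock}} is a hypothetical stand-in for the OSS multi-part upload call. 
This is not the actual {{AliyunOSSBlockOutputStream}} code.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockUploaderSketch {
  private static final int MAX_RETRIES = 3;  // per-block retries, as in item 3

  // A bounded pool caps how many block uploads run concurrently.
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final List<Future<String>> parts = new ArrayList<>();

  // Called as each small block of task output becomes ready on local disk.
  void submitBlock(byte[] block, int partNumber) {
    parts.add(pool.submit(() -> {
      IOException last = null;
      for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        try {
          return uploadBlock(block, partNumber);  // hypothetical part upload
        } catch (IOException e) {
          last = e;
        }
      }
      throw last;  // one failed block fails the whole upload
    }));
  }

  // On close(): wait for every part; any failure propagates and the caller
  // aborts the whole multi-part upload.
  void complete() throws Exception {
    for (Future<String> f : parts) {
      f.get();
    }
    pool.shutdown();
  }

  // Stand-in for the OSS SDK multi-part upload call; returns the part etag.
  private String uploadBlock(byte[] block, int partNumber) throws IOException {
    throw new UnsupportedOperationException("stand-in for the OSS SDK call");
  }
}
{code}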






[jira] [Commented] (HADOOP-15234) NPE when initializing KMSWebApp

2018-03-08 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392402#comment-16392402
 ] 

Xiao Chen commented on HADOOP-15234:


Thanks all for the comments.
bq. Sorry for asking you to make 2-3 revisions for such a simple patch.
While I appreciate the friendliness, IMO there is nothing to be sorry about. 
We're all trying to solve the problem in the best way. Even if the problem 
appears simple, as it turned out here the possible solutions are many, and I 
think it's common to iterate over a few patches - it's simply development. :)

bq. without unit tests
I'm okay here given this is a supportability improvement.

bq. Preconditions
Since here we're just doing a one-off at service startup time, which is a rare 
operation and not performance critical, I'd vote for readability. Null check 
and throw is fine by me too.

bq. should we throw in the implementation of 
Maybe I misunderstood, please let me know if so.
The factory [depends 
on|https://github.com/apache/hadoop/blob/113f401f41ee575cb303ceb647bc243108d93a04/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderFactory.java#L97]
 the null check to create the correct provider out of all the factories loaded 
by service loader. So throwing in one of them would not work. 
It may be reasonable to throw instead of returning null in 
{{KeyProviderFactory#get}}, but that class is {{InterfaceAudience.Public}}.

I do have one comment on the patch: can we add {{providerString}} to the 
message being thrown, so the exception is more self-explanatory?
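
A hedged sketch of that suggestion (variable names and message wording are 
assumptions, not the actual HADOOP-15234 patch):

{code:java}
// Sketch only: fail fast with providerString in the message. Guava's
// Preconditions.checkNotNull(ref, template, args) formats %s placeholders.
import java.net.URI;
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class KeyProviderInitSketch {
  static KeyProvider init(String providerString, Configuration conf)
      throws Exception {
    KeyProvider keyProvider =
        KeyProviderFactory.get(new URI(providerString), conf);
    // get() returns null when no loaded factory recognizes the URI scheme,
    // so name the offending provider string in the exception.
    return Preconditions.checkNotNull(keyProvider,
        "No KeyProvider was initialized for '%s'; check the provider URI "
        + "configuration", providerString);
  }
}
{code}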

> NPE when initializing KMSWebApp
> ---
>
> Key: HADOOP-15234
> URL: https://issues.apache.org/jira/browse/HADOOP-15234
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Reporter: Xiao Chen
>Assignee: fang zhenyi
>Priority: Major
> Attachments: HADOOP-15234.001.patch, HADOOP-15234.002.patch
>
>
> During KMS startup, if the {{keyProvider}} is null, it will NPE inside 
> KeyProviderExtension.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.crypto.key.KeyProviderExtension.(KeyProviderExtension.java:43)
>   at 
> org.apache.hadoop.crypto.key.CachingKeyProvider.(CachingKeyProvider.java:93)
>   at 
> org.apache.hadoop.crypto.key.kms.server.KMSWebApp.contextInitialized(KMSWebApp.java:170)
> {noformat}
> We're investigating the exact scenario that could lead to this, but the NPE 
> and log around it can be improved.






[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392299#comment-16392299
 ] 

Aaron Fabbri commented on HADOOP-15209:
---

Noticed you just mentioned cancelling the patch; never mind my last "is it 
ready" comment.

My first feedback is about CopyCommitter#deleteMissing().  The goal seems to be 
to reduce no-op deletes, but you have 3 retries with 1-second sleeps on failed 
deletes.  Ideally we'd only do that for S3, or add a config flag (default 
false) to enable retries there.  Really we should be able to query the FS for 
capabilities and only retry for eventually consistent stores.

Just ping me when you think this is ready to commit and I'll re-review.
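
For illustration, a hedged sketch of the config-gated retry idea from the 
comment above (the property name {{distcp.delete.retries.enabled}} is 
hypothetical, not a real DistCp option):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class DeleteWithOptionalRetry {
  static boolean delete(FileSystem fs, Path path, Configuration conf)
      throws IOException, InterruptedException {
    // hypothetical flag, default false, so strongly consistent stores
    // never pay for the sleeps
    boolean retryEnabled =
        conf.getBoolean("distcp.delete.retries.enabled", false);
    int attempts = retryEnabled ? 3 : 1;
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return fs.delete(path, true);
      } catch (IOException e) {
        last = e;
        if (i < attempts - 1) {
          Thread.sleep(1000L);  // the 1-second sleep noted in the review
        }
      }
    }
    throw last;
  }
}
{code}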

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if it is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.






[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures

2018-03-08 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392206#comment-16392206
 ] 

Arpit Agarwal commented on HADOOP-12897:


Pre-commit only runs unit tests for the changed module to save time.

> KerberosAuthenticator.authenticate to include URL on IO failures
> 
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, 
> HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, 
> HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you 
> get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} 
> handler, which can't be called here as it's not in the {{hadoop-auth}} module.
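
A hedged sketch of the kind of wrapping this asks for, illustrative only and 
not the committed patch (the message format mirrors the "Error while 
authenticating with endpoint" text visible in the KMS test logs elsewhere in 
this digest):

{code:java}
import java.io.IOException;
import java.net.URL;

final class WrapWithUrlSketch {
  // hadoop-auth cannot call NetUtils.wrapException, so rethrow with the URL
  static IOException wrap(URL url, IOException cause) {
    return new IOException(
        "Error while authenticating with endpoint: " + url, cause);
  }
}
{code}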






[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures

2018-03-08 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392205#comment-16392205
 ] 

Ajay Kumar commented on HADOOP-12897:
-

[~xiaochen], [~arpitagarwal] Sorry I missed this earlier. Not sure why 
pre-commit didn't catch it. Thanks for fixing it.

> KerberosAuthenticator.authenticate to include URL on IO failures
> 
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, 
> HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, 
> HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you 
> get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} 
> handler, which can't be called here as it's not in the {{hadoop-auth}} module.






[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392198#comment-16392198
 ] 

Íñigo Goiri commented on HADOOP-13144:
--

I think I'm too used to the HDFS runs... Two full +1 from Yetus in a day!

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch, 
> HADOOP-13144.002.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.
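
A hedged sketch of the enhancement under discussion: keep N connections per 
key and rotate across them, instead of the old 1:1 mapping. This is a generic 
pattern, not the actual {{org.apache.hadoop.ipc.Client}} patch.

{code:java}
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ConnectionPool<K, C> {
  private final int connsPerKey;   // would come from a new conf key
  private final ConcurrentHashMap<K, List<C>> pools = new ConcurrentHashMap<>();
  private final AtomicInteger rr = new AtomicInteger();

  ConnectionPool(int connsPerKey) { this.connsPerKey = connsPerKey; }

  // Round-robin over the key's connections instead of the 1:1 mapping, so
  // one hot client is no longer serialized on a single connection thread.
  C get(K key, Supplier<C> factory) {
    List<C> pool = pools.computeIfAbsent(key, k -> {
      List<C> list = new CopyOnWriteArrayList<>();
      for (int i = 0; i < connsPerKey; i++) {
        list.add(factory.get());
      }
      return list;
    });
    return pool.get(Math.floorMod(rr.getAndIncrement(), connsPerKey));
  }
}
{code}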






[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures

2018-03-08 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392186#comment-16392186
 ] 

Arpit Agarwal commented on HADOOP-12897:


Thanks for committing the fix [~xiaochen].

> KerberosAuthenticator.authenticate to include URL on IO failures
> 
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, 
> HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, 
> HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you 
> get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} 
> handler, which can't be called here as it's not in the {{hadoop-auth}} module.






[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392185#comment-16392185
 ] 

Arpit Agarwal commented on HADOOP-15280:


Thanks for taking care of this [~bharatviswa] and [~xiaochen].

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].






[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392178#comment-16392178
 ] 

genericqa commented on HADOOP-13144:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
39s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13144 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913669/HADOOP-13144.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7825417f3dac 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14286/testReport/ |
| Max. process+thread count | 1355 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14286/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Enhancing IPC client throughput via multiple connections per user
> -
>
> 

[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-08 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392125#comment-16392125
 ] 

Akira Ajisaka commented on HADOOP-15293:


+1, LGTM

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}






[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HADOOP-13144:
-
Attachment: HADOOP-13144.002.patch

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch, 
> HADOOP-13144.002.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.






[jira] [Commented] (HADOOP-15297) Make S3A etag => checksum feature optional

2018-03-08 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392043#comment-16392043
 ] 

Devaraj Das commented on HADOOP-15297:
--

Seems fine except for a minor issue: there is an empty test() method that you 
should remove.

> Make S3A etag => checksum feature optional
> --
>
> Key: HADOOP-15297
> URL: https://issues.apache.org/jira/browse/HADOOP-15297
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15297-001.patchh
>
>
> HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
> checksums.
> Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
> s3a.
> Rather than revert  I want to make it an option, off by default. Once we are 
> happy with distcp in future, we can turn it on.
> Why an option? Because it lines up for a successor to distcp which saves src 
> and dest checksums to a file and can then verify whether or not files have 
> really changed. Currently distcp relies on dest checksum algorithm being the 
> same as the src for incremental updates, but if either of the stores don't 
> serve checksums, silently downgrades to not checking. 
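
A hedged sketch of such an off-by-default gate (the property name matches what 
this JIRA proposes, but treat it as an assumption here, not the final patch):

{code:java}
import org.apache.hadoop.conf.Configuration;

final class EtagChecksumGateSketch {
  // Default false: getFileChecksum() would then return null, so distcp
  // falls back to length-only comparison instead of mismatching etags.
  static boolean etagChecksumEnabled(Configuration conf) {
    return conf.getBoolean("fs.s3a.etag.checksum.enabled", false);
  }
}
{code}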






[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392038#comment-16392038
 ] 

Aaron Fabbri commented on HADOOP-15209:
---

I will try to review / test this today.

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if it is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.






[jira] [Comment Edited] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392038#comment-16392038
 ] 

Aaron Fabbri edited comment on HADOOP-15209 at 3/8/18 10:42 PM:


I will try to review / test this today. You feel like this is ready to commit?


was (Author: fabbri):
I will try to review / test this today.

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if it is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.






[jira] [Updated] (HADOOP-15299) Bump Hadoop's Jackson 2 dependency 2.9.x

2018-03-08 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15299:
---
Description: There are a few new CVEs open against Jackson 2.7.x. It 
doesn't (necessarily) mean Hadoop is vulnerable to the attack - I don't know 
that it is, but fixes were released for Jackson 2.8.x and 2.9.x but not 2.7.x 
(which we're on). We shouldn't be on an unmaintained line, regardless. HBase is 
already on 2.9.x, we have a shaded client now, the API changes are relatively 
minor and so far in my testing I haven't seen any problems. I think many of our 
usual reasons to hesitate upgrading this dependency don't apply.  (was: There 
are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) mean 
Hadoop is vulnerable to the attack - I don't know that it is, but fixes were 
released for 2.8.x and 2.9.x but not 2.7.x (which we're on). We shouldn't be on 
an unmaintained line, regardless. HBase is already on 2.9.x, we have a shaded 
client now, the API changes are relatively minor and so far in my testing I 
haven't seen any problems. I think many of our usual reasons to hesitate 
upgrading this dependency don't apply.)

> Bump Hadoop's Jackson 2 dependency 2.9.x
> 
>
> Key: HADOOP-15299
> URL: https://issues.apache.org/jira/browse/HADOOP-15299
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
>
> There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) 
> mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes 
> were released for Jackson 2.8.x and 2.9.x but not 2.7.x (which we're on). We 
> shouldn't be on an unmaintained line, regardless. HBase is already on 2.9.x, 
> we have a shaded client now, the API changes are relatively minor and so far 
> in my testing I haven't seen any problems. I think many of our usual reasons 
> to hesitate upgrading this dependency don't apply.






[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391984#comment-16391984
 ] 

Wei Yan commented on HADOOP-13144:
--

+1 from my side. RPC.java is not an actively updated part of the code. Maybe 
[~cnauroth], [~steve_l] can help take a look?

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.






[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391878#comment-16391878
 ] 

genericqa commented on HADOOP-13144:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 52s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 7 new + 368 unchanged - 0 fixed = 375 total (was 368) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 18s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
47s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13144 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913638/HADOOP-13144.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a18365f71f7e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/testReport/ |
| Max. process+thread count | 1589 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14285/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Commented] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output

2018-03-08 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391870#comment-16391870
 ] 

Devaraj Das commented on HADOOP-15277:
--

+1

> remove .FluentPropertyBeanIntrospector from CLI operation log output
> 
>
> Key: HADOOP-15277
> URL: https://issues.apache.org/jira/browse/HADOOP-15277
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15277-001.patch
>
>
> When hadoop metrics is started, a message about bean introspection appears.
> {code}
> 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when 
> creating PropertyDescriptor for public final void 
> org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
>  Ignoring this property.
> {code}
> When using wasb or s3a,. this message appears in the client logs, because 
> they both start metrics
> I propose to raise the log level to ERROR for that class in log4j.properties
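
For reference, the log4j.properties change being proposed would be a one-liner 
along these lines (the package is inferred from the 
{{beanutils.FluentPropertyBeanIntrospector}} logger name in the message above):

{code}
# raise this class to ERROR so the INFO message no longer reaches CLI output
log4j.logger.org.apache.commons.beanutils.FluentPropertyBeanIntrospector=ERROR
{code}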






[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-08 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391856#comment-16391856
 ] 

Virajith Jalaparti commented on HADOOP-15292:
-

[~ste...@apache.org], Thanks for reviewing and committing it. [~elgoiri] and 
[~chris.douglas], thanks for the reviews.

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).
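
To make the two read paths concrete, a hedged sketch (illustrative only, not 
the RetriableFileCopyCommand patch itself):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

final class ReadPathsSketch {
  // positioned read: stateless, so HDFS sets up a new BlockReader per call
  static int pread(FSDataInputStream in, long offset, byte[] buf)
      throws IOException {
    return in.read(offset, buf, 0, buf.length);
  }

  // seek once, then stream: one BlockReader serves the subsequent reads
  static int seekThenRead(FSDataInputStream in, long offset, byte[] buf)
      throws IOException {
    if (in.getPos() != offset) {
      in.seek(offset);
    }
    return in.read(buf, 0, buf.length);
  }
}
{code}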






[jira] [Commented] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391853#comment-16391853
 ] 

Steve Loughran commented on HADOOP-15300:
-

wasb updates every time, as does adl:
{code:java}
File System Counters
FILE: Number of bytes read=1640418
FILE: Number of bytes written=1636188
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
WASB: Number of bytes read=0
WASB: Number of bytes written=915753
WASB: Number of read operations=0
WASB: Number of large read operations=0
WASB: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=308
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=16
Total committed heap usage (bytes)=408944640
File Input Format Counters 
Bytes Read=34752
File Output Format Counters 
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=12212
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
  118.98 real  13.33 user  1.83 sys
{code}
Updated
{code:java}
2018-03-08 15:21:44,045 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
FILE: Number of bytes read=1635633
FILE: Number of bytes written=1630856
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
WASB: Number of bytes read=0
WASB: Number of bytes written=910462
WASB: Number of read operations=0
WASB: Number of large read operations=0
WASB: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=306
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=18
Total committed heap usage (bytes)=457179136
File Input Format Counters 
Bytes Read=35264
File Output Format Counters 
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=10566
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
  129.40 real  14.55 user  2.08 sys
{code}

> distcp -update to WASB and ADL copies up all the files, always
> --
>
> Key: HADOOP-15300
> URL: https://issues.apache.org/jira/browse/HADOOP-15300
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the 
> source files are copied up every time. In contrast, if you use hdfs:// or 
> s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums 
> for a diff, but s3a is just returning file length and relying on distcp logic 
> being "if either src or dest doesn't do checksums, only compare file len"
> somehow that's not kicking in. Tested for file:  and hdfs sources, wasb and 
> adl dests
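
For reference, a hedged sketch of the skip rule the description refers to (not 
the actual DistCp code):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

final class SkipDecisionSketch {
  static boolean canSkip(FileSystem srcFs, FileStatus src,
                         FileSystem dstFs, FileStatus dst) throws IOException {
    if (src.getLen() != dst.getLen()) {
      return false;                 // different length: must copy
    }
    FileChecksum srcSum = srcFs.getFileChecksum(src.getPath());
    FileChecksum dstSum = dstFs.getFileChecksum(dst.getPath());
    if (srcSum == null || dstSum == null) {
      return true;   // a store without checksums: length match is enough
    }
    return srcSum.equals(dstSum);   // both sides expose checksums
  }
}
{code}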






[jira] [Commented] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391851#comment-16391851
 ] 

Steve Loughran commented on HADOOP-15300:
-

distcp to s3a, first run:

{code}
2018-03-08 15:09:17,385 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1658)) - Job job_local1068976850_0001 completed 
successfully
2018-03-08 15:09:17,394 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
FILE: Number of bytes read=1622306
FILE: Number of bytes written=1634552
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=897647
S3A: Number of read operations=1688
S3A: Number of large read operations=0
S3A: Number of write operations=902
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=306
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=63
Total committed heap usage (bytes)=752877568
File Input Format Counters 
Bytes Read=34752
File Output Format Counters 
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=32392
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
{code}

Second run:
{code}
2018-03-08 15:10:07,937 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1658)) - Job job_local864019435_0001 completed 
successfully
2018-03-08 15:10:07,944 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 24
File System Counters
FILE: Number of bytes read=724653
FILE: Number of bytes written=1651348
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=0
S3A: Number of read operations=389
S3A: Number of large read operations=0
S3A: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=69
Input split bytes=304
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=6
Total committed heap usage (bytes)=529530880
File Input Format Counters 
Bytes Read=34752
File Output Format Counters 
Bytes Written=11169
DistCp Counters
Bandwidth in Btyes=0
Bytes Skipped=461862
DIR_COPY=27
Files Skipped=69
{code}

> distcp -update to WASB and ADL copies up all the files, always
> --
>
> Key: HADOOP-15300
> URL: https://issues.apache.org/jira/browse/HADOOP-15300
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the
> source files are copied up every time. In contrast, if you use hdfs:// or
> s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums
> for a diff, but s3a just returns the file length and relies on the distcp
> logic of "if either src or dest doesn't do checksums, only compare file
> lengths". Somehow that's not kicking in. Tested for file: and hdfs: sources,
> wasb and adl destinations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always

2018-03-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15300:
---

 Summary: distcp -update to WASB and ADL copies up all the files, 
always
 Key: HADOOP-15300
 URL: https://issues.apache.org/jira/browse/HADOOP-15300
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/adl, fs/azure
Affects Versions: 3.1.0
Reporter: Steve Loughran


If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the 
source files are copied up every time. In contrast, if you use hdfs:// or 
s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums 
for a diff, but s3a just returns the file length and relies on the distcp 
logic of "if either src or dest doesn't do checksums, only compare file 
lengths".

Somehow that's not kicking in. Tested for file: and hdfs: sources, wasb and 
adl destinations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391846#comment-16391846
 ] 

Steve Loughran commented on HADOOP-15209:
-

Yes, as Jenkins never runs any of the store tests...we can't give it the 
credentials. The policy for submitting a patch to a specific store is: declare 
the endpoint you ran against, and no declaration == no review. We're strict 
even with ourselves.
Changes to the layers indirectly used by the stores (hadoop common, distcp) 
aren't so well managed. If people know they are going to interfere with a 
store then they should test.

Cancelling the current patch, as I've decided that the retry logic is 
over-convoluted. I'm just going to ignore delete(path) returning false, as all 
the connectors just mean "no file there, so no operation attempted"...except 
for FTP.
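
As a rough sketch (illustrative only, not the actual patch), the simplification looks like this, using the standard {{FileSystem.delete(Path, boolean)}} call:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class DeleteQuietlySketch {
  // A false return from delete() is treated as "no file there, so no
  // operation attempted" and ignored, rather than triggering retry logic.
  static void deleteIgnoringFalse(FileSystem fs, Path path) throws IOException {
    if (!fs.delete(path, false)) {
      // Nothing was deleted; for most connectors this means the path was
      // already gone, so there is nothing more to do.
    }
  }
}
{code}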

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15299) Bump Hadoop's Jackson 2 dependency to 2.9.x

2018-03-08 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-15299:
--

 Summary: Bump Hadoop's Jackson 2 dependency to 2.9.x
 Key: HADOOP-15299
 URL: https://issues.apache.org/jira/browse/HADOOP-15299
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.1.0, 3.2.0
Reporter: Sean Mackrory
Assignee: Sean Mackrory


There are a few new CVEs open against Jackson 2.7.x. It doesn't (necessarily) 
mean Hadoop is vulnerable to the attack - I don't know that it is, but fixes 
were released for 2.8.x and 2.9.x but not 2.7.x (which we're on). We shouldn't 
be on an unmaintained line, regardless. HBase is already on 2.9.x, we have a 
shaded client now, the API changes are relatively minor and so far in my 
testing I haven't seen any problems. I think many of our usual reasons to 
hesitate upgrading this dependency don't apply.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15209:

Status: Open  (was: Patch Available)

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-03-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391796#comment-16391796
 ] 

Aaron Fabbri commented on HADOOP-15206:
---

Ah.. yes, I didn't notice the read() call.  Thank you, makes sense now.

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2
>
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, 
> HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, 
> HADOOP-15206.008.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following two problem behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100.
>  
> 2. Duplicate Record:
> Two input splits have the same BZip2 records when the input split size is small.
>  
> Set the split size to 1 and tested loading 100 records (0, 1, 2, ..., 99):
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when executing a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391727#comment-16391727
 ] 

Hudson commented on HADOOP-15280:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13798 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13798/])
HADOOP-15280. TestKMS.testWebHDFSProxyUserKerb and (xiao: rev 
a906a226458a0b4c4b2df61d9bcf375a1d194925)
* (edit) 
hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java


> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391711#comment-16391711
 ] 

Xiao Chen commented on HADOOP-14445:


Failed tests in TestKMS are not related. The TestLBKMSCP test case should be 
removed, now that we do not have the URI format configuration - will do that in 
the next rev.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
> InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
> url.getPort());
> Text service = SecurityUtil.buildTokenService(serviceAddr);
> dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.
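
To make the lookup mismatch concrete, here is a toy illustration (mine, not the Hadoop code) of why keying on host:port defeats sharing behind a load balancer:
{code:java}
final class TokenServiceKeySketch {
  // The client keys its credential lookup on host:port, so a token obtained
  // from one KMS instance is never found when the load balancer routes the
  // next request to a different instance.
  static String serviceKey(String host, int port) {
    return host + ":" + port; // e.g. "kms-1.example.com:9600" vs "kms-2.example.com:9600"
  }
}
{code}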



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391669#comment-16391669
 ] 

Íñigo Goiri commented on HADOOP-13144:
--

Thanks [~ywskycn] for trying this out; I posted [^HADOOP-13144.001.patch] with 
the fixes for compilation.
I submitted the patch so Yetus should cover the next ones.

In general, this touches a pretty sensitive part of the Hadoop code, but I 
think the modifications are minimal.
At the same time, as [~ywskycn] pointed out, it helps dramatically with the 
performance of the Routers for HDFS.
We would open a separate JIRA for the Router connection creation if this goes 
in.

Anybody available for a review?
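
For readers skimming the thread, the core idea is roughly the following (an illustrative sketch under assumed names, not the attached patch):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class MultiConnectionCacheSketch {
  // Widen the connection cache key with an index so one user/ticket + address
  // can fan out across several connections instead of serializing on a
  // single connection thread.
  private final ConcurrentHashMap<String, Connection> cache =
      new ConcurrentHashMap<>();
  private final AtomicLong nextSlot = new AtomicLong();
  private final int connectionsPerUser; // assumed tuning knob, e.g. 4

  MultiConnectionCacheSketch(int connectionsPerUser) {
    this.connectionsPerUser = connectionsPerUser;
  }

  Connection get(String remoteAddressTicketProtocol) {
    // Round-robin over the per-user slots rather than always reusing slot 0.
    long slot = nextSlot.getAndIncrement() % connectionsPerUser;
    return cache.computeIfAbsent(remoteAddressTicketProtocol + "#" + slot,
        Connection::new);
  }

  static final class Connection {
    final String id;
    Connection(String id) { this.id = id; }
  }
}
{code}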

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15280:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

Thanks [~rchiang] for filing the Jira and [~bharatviswa] for the fix.

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HADOOP-13144:
-
Attachment: HADOOP-13144.001.patch

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HADOOP-13144:
-
Assignee: Íñigo Goiri
  Status: Patch Available  (was: Open)

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
> Attachments: HADOOP-13144.000.patch, HADOOP-13144.001.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391663#comment-16391663
 ] 

Xiao Chen commented on HADOOP-15280:


I was thinking more of walking into the cause of the exception and checking the 
cause there in the util. But I don't feel strongly. Let's fix the trunk tests 
for now. +1

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391647#comment-16391647
 ] 

Íñigo Goiri commented on HADOOP-15209:
--

The ADL-related tests seem to be ignored 
([report|https://builds.apache.org/job/PreCommit-HADOOP-Build/14263/testReport/org.apache.hadoop.fs.adl.live/TestAdlContractDistCpLive/]).
Is that expected?

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391646#comment-16391646
 ] 

genericqa commented on HADOOP-15293:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
52s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913624/HADOOP-15293.2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 04f186aeffef 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7ef4d94 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14283/testReport/ |
| Max. process+thread count | 1357 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14283/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: 

[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391645#comment-16391645
 ] 

genericqa commented on HADOOP-15280:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-common-project/hadoop-kms: The patch 
generated 0 new + 97 unchanged - 1 fixed = 97 total (was 98) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 59s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
59s{color} | {color:green} hadoop-kms in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15280 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913627/HADOOP-15280.01.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 360098f02dee 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7ef4d94 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14284/testReport/ |
| Max. process+thread count | 319 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-kms U: 
hadoop-common-project/hadoop-kms |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14284/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple 

[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-03-08 Thread Aki Tanaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391568#comment-16391568
 ] 

Aki Tanaka commented on HADOOP-15206:
-

In my understanding, skipBytes-- will basically never be executed.

bufferedIn.read() is executed only when skipBytes is 0, which usually means that 
the file position is at the end of the split. However, InputStream.skip says 
"Skips over and discards {{n}} bytes of data from the input stream. The 
{{skip}} method may, for a variety of reasons, end up skipping over some 
smaller number of bytes, possibly {{0}}. The actual number of bytes skipped is 
returned." 
([https://docs.oracle.com/javase/7/docs/api/java/io/FilterInputStream.html]). 
So I thought InputStream.skip() might return 0 even if the position is not at 
the end of the split. 

 

Please let me know if my understanding is wrong. Thank you.

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2
>
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, 
> HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, 
> HADOOP-15206.008.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following two problem behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100.
>  
> 2. Duplicate Record:
> Two input splits have the same BZip2 records when the input split size is small.
>  
> Set the split size to 1 and tested loading 100 records (0, 1, 2, ..., 99):
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when executing a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15277:

Description: 
When hadoop metrics is started, a message about bean introspection appears.
{code}
18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when 
creating PropertyDescriptor for public final void 
org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
 Ignoring this property.
{code}

When using wasb or s3a, this message appears in the client logs, because they 
both start metrics.

I propose to raise the log level to ERROR for that class in log4j.properties.

  was:
when using the default logs, I get told off by beanutils
{code}
18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when 
creating PropertyDescriptor for public final void 
org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
 Ignoring this property.
{code}

This is a distraction.

I propose to raise the log level to ERROR for that class in log4j.properties


> remove .FluentPropertyBeanIntrospector from CLI operation log output
> 
>
> Key: HADOOP-15277
> URL: https://issues.apache.org/jira/browse/HADOOP-15277
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15277-001.patch
>
>
> When hadoop metrics is started, a message about bean introspection appears.
> {code}
> 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when 
> creating PropertyDescriptor for public final void 
> org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
>  Ignoring this property.
> {code}
> When using wasb or s3a, this message appears in the client logs, because 
> they both start metrics.
> I propose to raise the log level to ERROR for that class in log4j.properties.
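
For illustration, the proposal would be a one-line change along these lines; the package prefix is an assumption based on the commons-beanutils class named in the log above:
{code}
# Hypothetical log4j.properties entry implementing the proposal.
log4j.logger.org.apache.commons.beanutils.FluentPropertyBeanIntrospector=ERROR
{code}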



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13622) `-atomic` should not be supported while using `distcp` command in object file system

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391525#comment-16391525
 ] 

Steve Loughran commented on HADOOP-13622:
-

Having been playing with distcp, I consider HADOOP-15281 a priority item, as it 
means that every upload forces a rename of data, even without the -atomic 
option.

> `-atomic` should not be supported while using `distcp` command in object file 
> system
> 
>
> Key: HADOOP-13622
> URL: https://issues.apache.org/jira/browse/HADOOP-13622
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.3
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Minor
>
> After discussing with [~ste...@apache.org] in HADOOP-13593, I get the point 
> that none of the object stores support atomic renames. So I filed a new JIRA 
> and am ready to provide a patch to disable `distcp -atomic` on object file 
> systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15297) Make S3A etag => checksum feature optional

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15297:

Summary: Make S3A etag => checksum feature optional  (was: Make s3a etag -> 
checksum publishing option)

> Make S3A etag => checksum feature optional
> --
>
> Key: HADOOP-15297
> URL: https://issues.apache.org/jira/browse/HADOOP-15297
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15297-001.patch
>
>
> HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
> checksums.
> Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
> s3a.
> Rather than revert, I want to make it an option, off by default. Once we are 
> happy with distcp in the future, we can turn it on.
> Why an option? Because it lines up for a successor to distcp which saves src 
> and dest checksums to a file and can then verify whether or not files have 
> really changed. Currently distcp relies on the dest checksum algorithm being 
> the same as the src for incremental updates, but if either of the stores 
> doesn't serve checksums, it silently downgrades to not checking.
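
The shape of such an option, as an illustrative sketch (names are mine; the actual property and wiring are whatever the patch defines):
{code:java}
final class EtagChecksumGateSketch {
  // When the option is off (the proposed default), the store serves no
  // checksum at all, so distcp falls back to its length-only comparison
  // instead of comparing etags against HDFS checksums.
  static byte[] checksumToServe(boolean etagChecksumEnabled, byte[] etag) {
    return etagChecksumEnabled ? etag : null; // null == "no checksum served"
  }
}
{code}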



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-03-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391522#comment-16391522
 ] 

Jason Lowe commented on HADOOP-15206:
-

skipBytes is decremented because of the read() call.  The skip() call is not 
guaranteed to be able to skip, and the workaround in that case is to try to 
read().  If the read() is successful then we were able to skip one more byte 
and need to account for that in the total number of bytes trying to be skipped.
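
For anyone following along, here is a minimal self-contained sketch of the skip-with-read fallback pattern being described (illustrative only, not the actual HADOOP-15206 patch):
{code:java}
import java.io.IOException;
import java.io.InputStream;

final class SkipFullySketch {
  // Skip exactly skipBytes bytes, working around the fact that
  // InputStream.skip() may legally skip fewer bytes than requested,
  // possibly zero.
  static void skipFully(InputStream in, long skipBytes) throws IOException {
    while (skipBytes > 0) {
      long skipped = in.skip(skipBytes);
      if (skipped == 0) {
        // skip() made no progress; fall back to a single-byte read().
        if (in.read() == -1) {
          break; // end of stream: nothing left to skip
        }
        skipBytes--; // the read() consumed one byte, account for it
      } else {
        skipBytes -= skipped;
      }
    }
  }
}
{code}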


> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2
>
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, 
> HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, 
> HADOOP-15206.008.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following two problem behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested loading 100 records (0, 1, 2, ..., 99):
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100.
>  
> 2. Duplicate Record:
> Two input splits have the same BZip2 records when the input split size is small.
>  
> Set the split size to 1 and tested loading 100 records (0, 1, 2, ..., 99):
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when executing a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497
 ] 

Bharat Viswanadham edited comment on HADOOP-15280 at 3/8/18 4:46 PM:
-

[~xiaochen]

As we are wrapping the exception with a new message, checking the original 
cause exception will get the original message. I have not added any utility 
methods to GenericTestUtils; let me know if you want to take a different 
approach than the one proposed in the patch.
Attached v02 patch.


was (Author: bharatviswa):
As we are wrapping the exception with a new message, checking the original 
cause exception will get original message.
 Attached v02 patch.

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497
 ] 

Bharat Viswanadham commented on HADOOP-15280:
-

As we are wrapping the exception with a new message, checking the original 
cause exception will get original message.
Attached v02 to patch.

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391497#comment-16391497
 ] 

Bharat Viswanadham edited comment on HADOOP-15280 at 3/8/18 4:44 PM:
-

As we are wrapping the exception with a new message, checking the original 
cause exception will get original message.
 Attached v02 patch.


was (Author: bharatviswa):
As we are wrapping the exception with a new message, checking the original 
cause exception will get original message.
Attached v02 to patch.

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-08 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HADOOP-15280:

Attachment: HADOOP-15280.01.patch

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch, HADOOP-15280.01.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391493#comment-16391493
 ] 

Steve Loughran commented on HADOOP-15209:
-

FWIW, I managed to make distcp fail by running it against an S3 store w/ simulated 
inconsistency turned on (no S3Guard); the operation saw duplicate entries in the 
directory listing at the destination.
{code}
2018-03-08 16:33:17,517 [Thread-131] WARN  mapred.LocalJobRunner 
(LocalJobRunner.java:run(590)) - job_local148600535_0001
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File 
s3a://hwdev-steve-frankfurt-new/SLOW/hadoop-auth/src and 
s3a://hwdev-steve-frankfurt-new/SLOW/hadoop-auth/src would cause duplicates. 
Aborting
at 
org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:175)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93)
at 
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:89)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
at 
org.apache.hadoop.tools.mapred.CopyCommitter.listTargetFiles(CopyCommitter.java:575)
at 
org.apache.hadoop.tools.mapred.CopyCommitter.deleteMissing(CopyCommitter.java:402)
at 
org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:117)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:567)
2018-03-08 16:33:18,469 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1660)) - Job job_local148600535_0001 failed with 
state FAILED due to: NA
2018-03-08 16:33:18,478 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
FILE: Number of bytes read=1621092
FILE: Number of bytes written=1632776
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=895927
S3A: Number of read operations=1673
S3A: Number of large read operations=0
S3A: Number of write operations=904
Map-Reduce Framework
Map input records=96
{code}

I'm not going to fix that here.

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if it is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores and, if 
> the store throttles deletes, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.
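
A minimal sketch of one way such a history could stay small, assuming the 
target listing is processed in sorted order so remembering a single "last 
deleted directory" suffices (the {{isDescendant}} helper is hypothetical, not 
the committed patch):
{code:java}
Path lastDeletedDir = null;
for (FileStatus status : sortedTargetsMissingAtSource) {
  Path target = status.getPath();
  // an ancestor was already deleted, so this path is already gone: skip it
  if (lastDeletedDir != null && isDescendant(lastDeletedDir, target)) {
    continue;
  }
  targetFS.delete(target, true);
  if (status.isDirectory()) {
    lastDeletedDir = target;  // track one entry only, keeping heap use O(1)
  }
}
{code}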



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2018-03-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391491#comment-16391491
 ] 

Wei Yan commented on HADOOP-13144:
--

Thanks for the patch, [~elgoiri]. I tried it yesterday and it worked well. The 
Router RPC throughput is greatly improved, and RPC handlers are no longer 
blocked on the connection itself. BTW, it also needs new function 
implementations in the classes ProtobufRpcEngine and TestRPC.StoppedRpcEngine.

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Priority: Minor
> Attachments: HADOOP-13144.000.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.
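
A hedged sketch of one way to lift that limit (the {{withIndex}} helper and the 
config value are hypothetical, not HADOOP-13144's actual patch): fold a 
rotating index into the cache key so a single user/ticket + address can fan 
out over several connections:
{code:java}
private final ConcurrentMap<ConnectionId, Connection> connections =
    new ConcurrentHashMap<>();
private final AtomicInteger nextIndex = new AtomicInteger();

Connection getConnection(ConnectionId remoteId, int maxConnsPerUser) {
  // same remote address/ticket/protocol, but up to maxConnsPerUser distinct
  // keys, so calls round-robin over that many sockets instead of one
  int index = Math.floorMod(nextIndex.getAndIncrement(), maxConnsPerUser);
  ConnectionId key = remoteId.withIndex(index);  // hypothetical helper
  return connections.computeIfAbsent(key, Connection::new);
}
{code}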



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13023) Distcp with -update feature on first time raw data not working

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13023:

Component/s: tools/distcp

> Distcp with -update feature on first time raw data not working
> --
>
> Key: HADOOP-13023
> URL: https://issues.apache.org/jira/browse/HADOOP-13023
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.6.0
>Reporter: Mavin Martin
>Priority: Major
>
> When attempting a distcp with the -update feature on raw encrypted 
> data, the distcp reports success, but reading the encrypted file on the 
> target_path does not work since the keyName does not exist.
> Please see my example to reproduce the issue.
> {code}
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/tedDEF00013
> [root@xxx bin]# hdfs dfs -ls -R /tmp
> drwxr-xr-x   - xxx xxx  0 2016-04-14 00:22 /tmp/a
> drwxr-xr-x   - xxx xxx  0 2016-04-14 00:00 /tmp/a/ted
> -rw-r--r--   3 xxx xxx 33 2016-04-14 00:00 /tmp/a/ted/test.txt
> [root@xxx bin]# hadoop distcp -update /.reserved/raw/tmp/a/ted 
> /.reserved/raw/tmp/a-with-update/ted
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/tedDEF00013
> [root@xxx bin]# hadoop distcp /.reserved/raw/tmp/a/ted 
> /.reserved/raw/tmp/a-no-update/ted
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/tedDEF00013
> /tmp/a-no-update/ted  DEF00013
> {code}
> The crypto zone for 'a-with-update' should have been created since this is a 
> new destination.  You can verify this by looking at 'a-no-update'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-08 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391474#comment-16391474
 ] 

Takanobu Asanuma commented on HADOOP-15293:
---

Thanks for the review, [~ste...@apache.org]. Actually, I had been wondering 
whether to go that way or with the 1st patch's approach.

Uploaded a new patch. This is certainly simpler.

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-08 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HADOOP-15293:
--
Attachment: HADOOP-15293.2.patch

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch, HADOOP-15293.2.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15298) provide non-zero default for the Azure rename & delete thread pool sizes

2018-03-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15298:
---

 Summary: provide non-zero default for the Azure rename & delete 
thread pool sizes
 Key: HADOOP-15298
 URL: https://issues.apache.org/jira/browse/HADOOP-15298
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, tools/distcp
Affects Versions: 3.0.0
Reporter: Steve Loughran


If you provide non-zero values for the rename & delete thread pools, distcp gets 
faster.
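
In the meantime, a hedged example of setting the pools yourself via the 
documented hadoop-azure keys (their current defaults are 0, i.e. serial 
rename/delete; the value 10 is just an illustration):
{code:java}
Configuration conf = new Configuration();
conf.setInt("fs.azure.rename.threads", 10);  // threads for parallel renames
conf.setInt("fs.azure.delete.threads", 10);  // threads for parallel deletes
{code}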



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15158) AliyunOSS: Supports role based credential in URL

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391230#comment-16391230
 ] 

Steve Loughran commented on HADOOP-15158:
-

Is this adding the idea of putting user:secret into the URI? If so, I'm going 
to have to -1 it on security grounds.

If you look at HADOOP-3733 you can see the effort I had to put in to try and 
keep secrets embedded in s3n/s3a URLs out of logs, and even then it failed. If 
you put confidential secrets in URLs, they get into Paths, which get into error 
messages and stack traces, and so into bug reports. I know this; I've seen it. 
It's why I'm getting close to cutting the user:secret feature from S3A 
entirely, except if users explicitly enable it with an option whose name makes 
clear you shouldn't be doing it, e.g. "fs.s3a.dangerous.secrets.in.uris".

S3A does per-bucket settings and so lets you keep secrets out of URLs; ADL has 
just added this (HADOOP-13972). I believe this is the better way to do it, as 
it also lets you tune any other option on a container-by-container basis.
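
As a hedged illustration of that per-bucket pattern (bucket name and values 
are placeholders): any fs.s3a. option can be overridden for one bucket via 
{{fs.s3a.bucket.<bucketname>.<option>}}, keeping credentials in configuration 
or a credential provider rather than in the URI:
{code:java}
Configuration conf = new Configuration();
// applies only to s3a://mybucket/, overriding the global fs.s3a.* settings
conf.set("fs.s3a.bucket.mybucket.access.key", "ACCESS-KEY-PLACEHOLDER");
conf.set("fs.s3a.bucket.mybucket.secret.key", "SECRET-KEY-PLACEHOLDER");
{code}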

> AliyunOSS: Supports role based credential in URL
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch, HADOOP-15158.004.patch, HADOOP-15158.005.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391227#comment-16391227
 ] 

Steve Loughran commented on HADOOP-15293:
-

Not that pretty. Why not look for the string "recognized SSL 
message"? You'd get a partial match on both.

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-08 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391154#comment-16391154
 ] 

Rushabh S Shah commented on HADOOP-15292:
-

[~ste...@apache.org]: This seems like a performance improvement to the distcp tool.
Should we backport it to branch-2.8 as well?

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).
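
For reference, a minimal sketch of the two read paths being compared, assuming 
an open {{FSDataInputStream}} {{in}} over HDFS and an example offset:
{code:java}
long offset = 1024L * 1024;  // example non-zero source offset
byte[] buf = new byte[8192];

// positioned read ("pread"): leaves stream state alone, but on HDFS each call
// may create a fresh BlockReader and issue a new readBlock() to the datanode
in.read(offset, buf, 0, buf.length);

// sequential alternative: seek once, then stream through one BlockReader
in.seek(offset);
IOUtils.readFully(in, buf, 0, buf.length);
{code}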



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391140#comment-16391140
 ] 

Hudson commented on HADOOP-15273:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13797 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13797/])
HADOOP-15273.distcp can't handle remote stores with different checksum (stevel: 
rev 7ef4d942dd96232b0743a40ed25f77065254f94d)
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (edit) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java


> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*
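
A hedged sketch of the distinction at stake (not the committed patch): check 
the checksum algorithm names before comparing checksum values, so a protocol 
mismatch is reported as such rather than as a block-size problem:
{code:java}
FileChecksum srcSum = srcFS.getFileChecksum(srcPath);
FileChecksum dstSum = dstFS.getFileChecksum(dstPath);
if (srcSum != null && dstSum != null
    && !srcSum.getAlgorithmName().equals(dstSum.getAlgorithmName())) {
  // e.g. an HDFS MD5-of-CRC checksum vs. an object store's own algorithm
  throw new IOException("Checksum algorithms differ: "
      + srcSum.getAlgorithmName() + " vs " + dstSum.getAlgorithmName());
}
{code}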



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391120#comment-16391120
 ] 

Hudson commented on HADOOP-15292:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13796 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13796/])
HADOOP-15292. Distcp's use of pread is slowing it down. Contributed by (stevel: 
rev 3bd6b1fd85c44354c777ef4fda6415231505b2a4)
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* (edit) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java


> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

Committed to branch-3.1+; reran the copy mapper test first.
 
Test-wise, this shows we need some more realistic store distcp tests; 
specifically, we need to test HDFS <--> store rather than just local <--> store, 
and also intra-store and inter-store copies, which will make it a fairly complex 
piece of work.

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15292:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

+1 committed. Ran the S3A distcp test first.

 

Thanks for finding & fixing this, Virajith!

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-15292:
---

Assignee: Virajith Jalaparti

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390999#comment-16390999
 ] 

genericqa commented on HADOOP-14445:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m  
2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 10s{color} | {color:orange} hadoop-common-project: The patch generated 12 
new + 288 unchanged - 7 fixed = 300 total (was 295) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 13s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  4m 33s{color} 
| {color:red} hadoop-kms in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
39s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider 
|
|   | hadoop.crypto.key.kms.server.TestKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-14445 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913556/HADOOP-14445.05.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux e8eba6ba1c08 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page

2018-03-08 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390984#comment-16390984
 ] 

Takanobu Asanuma commented on HADOOP-15296:
---

Thanks for reviewing and committing it, [~linyiqun]!

> Fix a wrong link for RBF in the top page
> 
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15296.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390932#comment-16390932
 ] 

genericqa commented on HADOOP-14999:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
3s{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HADOOP-14999 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913562/diff-between-patch7-and-patch8.txt
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.
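
A hedged sketch of that flow ({{store}}, {{uploadId}}, and the part-upload 
calls are placeholder names, not the patch itself): each flushed block becomes 
an asynchronous part-upload task on the bounded executor, and close() waits 
for every part before completing, or else aborts, the multipart upload:
{code:java}
private final List<ListenableFuture<PartETag>> partFutures = new ArrayList<>();

private void uploadBlockAsync(final File block, final int partNumber) {
  // SemaphoredDelegatingExecutor bounds how many part uploads run at once
  partFutures.add(executorService.submit(
      () -> store.uploadPart(block, uploadId, partNumber)));  // retried inside
}

@Override
public synchronized void close() throws IOException {
  try {
    List<PartETag> parts = Futures.allAsList(partFutures).get();
    store.completeMultipartUpload(uploadId, parts);
  } catch (InterruptedException | ExecutionException e) {
    store.abortMultipartUpload(uploadId);  // one failed part fails the file
    throw new IOException("multipart upload failed", e);
  }
}
{code}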



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910
 ] 

Genmao Yu edited comment on HADOOP-14999 at 3/8/18 8:42 AM:


Thanks for [~Sammi]'s review.
 1. comment-1: removed unused config entries and refined some configs
 2. comment-2: fixed
 3. comment-3: Sorry, are there any problems? All style checks passed.
{code:java}
Preconditions.checkArgument(v >= min,
    String.format("Value of %s: %d is below the minimum value %d",
        key, v, min));
{code}
4. comment-4: updated the unit test
 5. comment-5: IMHO, 5GB is too large to test in an integration test, and 
{{MULTIPART_UPLOAD_SIZE}} may cover this case, as you mentioned.
 6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also added resource-cleanup 
logic in a {{try-finally}}

Performance test: file upload
|file size|before patch|after patch (parallelism = 4)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


was (Author: unclegen):
Thanks for [~Sammi] 's review. 
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems?

{code}
Preconditions.checkArgument(v >= min,
String.format("Value of %s: %d is below the minimum value %d",
key, v, min));
{code} 
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And 
{{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also add the resource 
clean logic in {{try-finally}}


performance test: test file upload

|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390917#comment-16390917
 ] 

Hudson commented on HADOOP-15296:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13794 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13794/])
HADOOP-15296. Fix a wrong link for RBF in the top page. Contributed by (yqlin: 
rev 4cc9a6d9bb34329d6de30706d5432c7cb675bb88)
* (edit) hadoop-project/src/site/markdown/index.md.vm


> Fix a wrong link for RBF in the top page
> 
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15296.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390912#comment-16390912
 ] 

Genmao Yu commented on HADOOP-14999:


All the tests passed against "oss-cn-shanghai.aliyuncs.com".

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390911#comment-16390911
 ] 

Genmao Yu commented on HADOOP-14999:


diff-between-patch7-and-patch8.txt shows the detailed changes.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: diff-between-patch7-and-patch8.txt

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910
 ] 

Genmao Yu commented on HADOOP-14999:


Thanks for [~Sammi]'s review.
1. comment-1: removed unused config entries and refined some configs
2. comment-2: fixed
3. comment-3: Sorry, are there any problems?

{code}
Preconditions.checkArgument(v >= min,
    String.format("Value of %s: %d is below the minimum value %d",
        key, v, min));
{code}
4. comment-4: updated the unit test
5. comment-5: IMHO, 5GB is too large to test in an integration test, and 
{{MULTIPART_UPLOAD_SIZE}} may cover this case, as you mentioned.
6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also added resource-cleanup 
logic in a {{try-finally}}


Performance test: file upload

|file size|before patch|after patch (parallelism = 4)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.008.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks proceed asynchronously.
>  - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more, and we might need to write 
> this 100GB to local disk and then upload it. Sometimes this is inefficient 
> and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel, greatly improving performance.
> 3. Each task will retry up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the 
> current upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15296) Fix a wrong link for RBF in the top page

2018-03-08 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HADOOP-15296:
---
Affects Version/s: 3.0.0
 Target Version/s: 3.1.0, 3.2.0, 3.0.2
Fix Version/s: 3.0.2
   3.2.0
   3.1.0

Committed to trunk, branch-3.1 and branch-3.0. Thanks [~tasanuma0829] for the 
contribution!

> Fix a wrong link for RBF in the top page
> 
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15296.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15296) Fix a wrong link for RBF in the top page

2018-03-08 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HADOOP-15296:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Fix a wrong link for RBF in the top page
> 
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15296.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org