[jira] [Updated] (HADOOP-15409) S3AFileSystem.verifyBucketExists to move to s3.doesBucketExistV2

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15409:

Priority: Blocker  (was: Major)

Tagging as blocker for 3.2; my task is to review this.

> S3AFileSystem.verifyBucketExists to move to s3.doesBucketExistV2
> 
>
> Key: HADOOP-15409
> URL: https://issues.apache.org/jira/browse/HADOOP-15409
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Blocker
>
> In S3AFileSystem.initialize(), we check for the bucket existing with 
> verifyBucketExists(), which calls s3.doesBucketExist(). But that doesn't 
> check for auth issues.
> s3.doesBucketExistV2() does at least validate credentials, and should be 
> switched to. This will help things fail faster.
> See SPARK-24000.
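For illustration, a minimal sketch of the proposed switch, assuming the AWS SDK v1 
{{AmazonS3}} client held by the filesystem; the wrapper class and exception handling 
below are illustrative, not the actual S3A code:

{code}
// Sketch only: fail fast on a missing bucket *and* on bad credentials.
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.AmazonS3;
import java.io.FileNotFoundException;
import java.io.IOException;

final class BucketProbe {
  static void verifyBucketExists(AmazonS3 s3, String bucket) throws IOException {
    try {
      // doesBucketExistV2 surfaces authentication failures,
      // unlike the older doesBucketExist probe.
      if (!s3.doesBucketExistV2(bucket)) {
        throw new FileNotFoundException("Bucket " + bucket + " does not exist");
      }
    } catch (AmazonClientException e) {
      throw new IOException("Unable to verify bucket " + bucket, e);
    }
  }
}
{code}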



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15609) Retry KMS calls when SSLHandshakeException occurs

2018-07-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555153#comment-16555153
 ] 

Hudson commented on HADOOP-15609:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14633 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14633/])
HADOOP-15609. Retry KMS calls when SSLHandshakeException occurs. (xiao: rev 
81d59506e539673edde12e19c0df5c2edd9d02ad)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java


> Retry KMS calls when SSLHandshakeException occurs
> -
>
> Key: HADOOP-15609
> URL: https://issues.apache.org/jira/browse/HADOOP-15609
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, kms
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.1.1, 3.0.4, 3.2
>
> Attachments: HADOOP-15609.001.patch, HADOOP-15609.002.patch, 
> HADOOP-15609.003.patch, HADOOP-15609.004.patch
>
>
> KMS calls should be retried when a javax.net.ssl.SSLHandshakeException occurs and 
> the FailoverOnNetworkExceptionRetry policy is used.
> For example, in the following stack trace we can see that the KMS provider's 
> connection is lost, an SSLHandshakeException is thrown, and the operation is 
> not retried:
> {code}
> W0711 18:19:50.213472  1508 LoadBalancingKMSClientProvider.java:132] KMS 
> provider at [https://example.com:16000/kms/v1/] threw an IOException:
> Java exception follows:
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002)
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
> at 
> sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
> at 
> sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
> at 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:512)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:502)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:791)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:288)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:284)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:124)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:284)
> at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:532)
> at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:927)
> at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:946)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:311)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:323)
> Caused by: java.io.EOFException: SSL peer shut down incorrectly
> at sun.security.ssl.InputRecord.read(InputRecord.java:505)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
> ... 22 more
> W0711 18:19:50.239328  1508 LoadBalancingKMSClientProvider.java:149] Aborting 
> since the Request has failed with all KMS providers(depending on 
> hadoop.security.kms.client.failover.max.retries=1 setting and numProviders=1) 
> in the group OR the exception is not recoverable
> {code}
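As a rough illustration of the desired behaviour (not the committed patch; the types 
below are placeholders standing in for KMSClientProvider and its operations), the 
load-balancing provider can treat an SSLHandshakeException like any other network 
failure and fail over to the next provider:

{code}
import java.io.IOException;
import java.util.List;
import javax.net.ssl.SSLHandshakeException;

// Placeholder types for the sketch.
interface KmsProvider { }
interface KmsOp<T> { T call(KmsProvider provider) throws IOException; }

final class FailoverSketch {
  static <T> T doOp(List<KmsProvider> providers, KmsOp<T> op) throws IOException {
    IOException last = null;
    for (KmsProvider p : providers) {
      try {
        return op.call(p);
      } catch (SSLHandshakeException e) {
        // Previously this aborted the operation; the improvement is to
        // retry against the next provider in the group instead.
        last = e;
      }
    }
    throw last != null ? last : new IOException("no KMS providers configured");
  }
}
{code}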



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HADOOP-15592) AssumedRoleCredentialProvider to propagate connection settings of S3A FS

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15592.
-
Resolution: Duplicate

> AssumedRoleCredentialProvider to propagate connection settings of S3A FS
> 
>
> Key: HADOOP-15592
> URL: https://issues.apache.org/jira/browse/HADOOP-15592
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> The Assumed Role support of HADOOP-15141 doesn't pass down the various timeout 
> options to the STS connection it builds up. That's OK for testing, but not in 
> production, not if things play up (or you want to set a proxy).
> Proposed: we will have to use that painful builder API so as to be able to 
> set the AWS client config up.
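For illustration only, a sketch of what using that builder API might look like with 
the AWS SDK v1: build the STS client with the same ClientConfiguration (timeouts, 
proxy, etc.) that the S3A connector derives from its fs.s3a.* settings. This is a 
sketch of the idea, not the committed fix:

{code}
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;

final class StsClientSketch {
  static AWSCredentialsProvider assumedRoleProvider(
      ClientConfiguration awsConf,          // built from the fs.s3a.* settings
      AWSCredentialsProvider parentCreds,   // credentials used to call STS
      String roleArn, String sessionName) {
    AWSSecurityTokenService sts = AWSSecurityTokenServiceClientBuilder.standard()
        .withClientConfiguration(awsConf)   // propagates timeouts, proxy, retries
        .withCredentials(parentCreds)
        .build();
    return new STSAssumeRoleSessionCredentialsProvider.Builder(roleArn, sessionName)
        .withStsClient(sts)
        .build();
  }
}
{code}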



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555154#comment-16555154
 ] 

Xiao Chen commented on HADOOP-15593:


Thanks for the clarification [~eyang].

According to the JDK issue, I think we should treat a null endTime the same as 
a destroyed tgt here.
{code}
- * @return the expiration time for this ticket's validity period.
+ * @return the expiration time for this ticket's validity period,
+ * or {@code null} if destroyed.
  */
 public final java.util.Date getEndTime() {
-return (Date) endTime.clone();
+return (endTime == null) ? null : (Date) endTime.clone();
 }
{code}
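A minimal sketch of that guard on the renewer side (illustrative, not the committed 
patch; the 80% figure is a simplified stand-in for UGI's own refresh heuristic):

{code}
import javax.security.auth.kerberos.KerberosTicket;

final class TgtRenewGuard {
  /** Returns the next refresh time, or -1 if the ticket can no longer be renewed. */
  static long nextRefresh(KerberosTicket tgt, long now) {
    if (tgt == null || tgt.isDestroyed() || tgt.getEndTime() == null) {
      return -1;   // treat a null endTime the same as a destroyed ticket
    }
    long tgtEndTime = tgt.getEndTime().getTime();
    // refresh at 80% of the remaining lifetime (simplified heuristic)
    return now + (long) ((tgtEndTime - now) * 0.80f);
  }
}
{code}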

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555140#comment-16555140
 ] 

Eric Yang commented on HADOOP-15593:


[~xiaochen] If tgt.getEndTime().getTime() throws the NullPointerException and we 
fall back to tgtEndTime = now, then nextRefresh is always smaller than now, so the 
renewal thread will not try to renew once more, and the thread stops earlier than 
expected.

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555136#comment-16555136
 ] 

Steve Loughran commented on HADOOP-15426:
-

There's also the option of using an inconsistent DDB connector, as we do for 
the S3 client. I'm semi-motivated to do that.

Good: we can replicate the failures at different steps, including partial 
deletes.

Bad: extra work.

As I can currently replicate it locally, I've got a nice setup where I can 
simply crank back the capacity and generate more load than the table can handle, 
and so trigger the real failure. We can't do that on S3, which is why the 
inconsistent client is needed.

I think I could just do a scale test which makes IO calls on the DDB tables 
without the delays of mixing in S3 calls in between... and I could run this 
within an existing DDB table too, simply by using a store URL different from the 
normal one.
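A hypothetical sketch of that scale-test idea, using the AWS SDK v1 DynamoDB client 
directly; the attribute names and capacity numbers here are made up for illustration:

{code}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.*;

final class ThrottleTrigger {
  static void hammer(AmazonDynamoDB ddb, String table) {
    // 1 read / 1 write capacity unit: easy to exceed from a single client
    ddb.updateTable(new UpdateTableRequest()
        .withTableName(table)
        .withProvisionedThroughput(new ProvisionedThroughput(1L, 1L)));
    try {
      for (int i = 0; i < 10_000; i++) {
        ddb.putItem(new PutItemRequest().withTableName(table)
            .addItemEntry("parent", new AttributeValue("/scale-test"))
            .addItemEntry("child", new AttributeValue("entry-" + i)));
      }
    } catch (ProvisionedThroughputExceededException e) {
      // this is the throttling event the retry handler needs to absorb
    }
  }
}
{code}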

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15426:

Priority: Blocker  (was: Major)

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554997#comment-16554997
 ] 

Steve Loughran edited comment on HADOOP-15426 at 7/25/18 4:55 AM:
--

This is pretty major, as it really means: do the retry logic for the DDB 
metastore, adding in the ability to choose a different throttle policy for DDB 
than for S3 (on the assumption that it takes serious effort to throttle S3, 
but for DDB it can happen from an underprovisioned table, so it can be seen more 
often: we need to add more retries and more backoff before giving up).


was (Author: ste...@apache.org):
This is pretty major, as it really means: do the retry logic for the DDB 
metastore, adding in the ability to choose a different throttle policy for DDB 
than from S3 (on the assumption that it takes serious effort to throttle S3, 
but for DDB it can happen from an underprovisioned table, so can be seen more 
often: need to add more retries, more backoff before giving up)

 

 

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15609) Retry KMS calls when SSLHandshakeException occurs

2018-07-24 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15609:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2
   3.0.4
   3.1.1
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-3.[0-1].
Thanks [~knanasi] for the work here, and all for reviews / comments!

> Retry KMS calls when SSLHandshakeException occurs
> -
>
> Key: HADOOP-15609
> URL: https://issues.apache.org/jira/browse/HADOOP-15609
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, kms
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.1.1, 3.0.4, 3.2
>
> Attachments: HADOOP-15609.001.patch, HADOOP-15609.002.patch, 
> HADOOP-15609.003.patch, HADOOP-15609.004.patch
>
>
> KMS calls should be retried when a javax.net.ssl.SSLHandshakeException occurs and 
> the FailoverOnNetworkExceptionRetry policy is used.
> For example, in the following stack trace we can see that the KMS provider's 
> connection is lost, an SSLHandshakeException is thrown, and the operation is 
> not retried:
> {code}
> W0711 18:19:50.213472  1508 LoadBalancingKMSClientProvider.java:132] KMS 
> provider at [https://example.com:16000/kms/v1/] threw an IOException:
> Java exception follows:
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002)
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
> at 
> sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
> at 
> sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
> at 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:512)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:502)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:791)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:288)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:284)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:124)
> at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:284)
> at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:532)
> at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:927)
> at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:946)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:311)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:323)
> Caused by: java.io.EOFException: SSL peer shut down incorrectly
> at sun.security.ssl.InputRecord.read(InputRecord.java:505)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
> ... 22 more
> W0711 18:19:50.239328  1508 LoadBalancingKMSClientProvider.java:149] Aborting 
> since the Request has failed with all KMS providers(depending on 
> hadoop.security.kms.client.failover.max.retries=1 setting and numProviders=1) 
> in the group OR the exception is not recoverable
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555116#comment-16555116
 ] 

Xiao Chen commented on HADOOP-15593:


If we did what's proposed in [my previous 
comment|https://issues.apache.org/jira/browse/HADOOP-15593?focusedCommentId=16554585=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16554585],
 the case when tgt is destroyed will be handled by the {{return}} statement.

In the rare race where the tgt gets destroyed after the code has gone past 
those lines, the rest of the logic, including the nextRefresh part you pointed 
out, does not depend on the tgt anymore (it only depends on the local var 
{{tgtEndTime}}). We should be fine just letting it retry one more time and return on 
the tgt null check the next time it enters the while loop.

So we don't need to change the {{now > nextRefresh}} part of the code. Did I miss 
anything?

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555083#comment-16555083
 ] 

genericqa commented on HADOOP-15583:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 27m 44s{color} 
| {color:red} root generated 1 new + 1463 unchanged - 5 fixed = 1464 total (was 
1468) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 10 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
25s{color} | {color:red} hadoop-tools/hadoop-aws generated 2 new + 0 unchanged 
- 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
38s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  org.apache.hadoop.fs.s3a.auth.RolePolicies.KMS_KEY_READ should be package 
protected  At RolePolicies.java: At RolePolicies.java:[line 49] |
|  |  org.apache.hadoop.fs.s3a.auth.RolePolicies.KMS_KEY_RW should be package 
protected  At RolePolicies.java: At RolePolicies.java:[line 44] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15583 |
| JIRA Patch URL | 

[jira] [Commented] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554997#comment-16554997
 ] 

Steve Loughran commented on HADOOP-15426:
-

This is pretty major, as it really means: do the retry logic for the DDB 
metastore, adding in the ability to choose a different throttle policy for DDB 
than for S3 (on the assumption that it takes serious effort to throttle S3, 
but for DDB it can happen from an underprovisioned table, so it can be seen more 
often: we need to add more retries and more backoff before giving up).
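As a purely hypothetical sketch of what such a split policy could look like, using 
Hadoop's generic RetryPolicies helper (the policy class and the numbers below are 
illustrative, not the actual S3A retry code):

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

final class ThrottlePolicies {
  // S3 throttling is rare: a handful of quick retries is usually enough.
  static final RetryPolicy S3_THROTTLE =
      RetryPolicies.exponentialBackoffRetry(5, 500, TimeUnit.MILLISECONDS);

  // DDB throttling is expected on an underprovisioned table:
  // retry more often and back off for longer before giving up.
  static final RetryPolicy DDB_THROTTLE =
      RetryPolicies.exponentialBackoffRetry(10, 1_000, TimeUnit.MILLISECONDS);
}
{code}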

 

 

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15426:

Affects Version/s: (was: 3.2.0)
   3.1.0
 Target Version/s: 3.2.0

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work started] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15426 started by Steve Loughran.
---
> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15629) Missing trimming in readlink in case of protocol

2018-07-24 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554989#comment-16554989
 ] 

Giovanni Matteo Fumarola commented on HADOOP-15629:
---

We found some tests failed in the current branch.

Tests in error: 
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryAsEmptyDirectory:1008->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryAsFile:1057->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryAsNonEmptyDirectory:1032->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryAsNonExistentDirectory:966->FSMainOperationsBaseTest.doTestRenameDirectoryAsNonExistentDirectory:983->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryToItself:921->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameDirectoryToNonExistentParent:944->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileAsExistingDirectory:902->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileAsExistingFile:881->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileToDestinationWithParentFile:827->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileToExistingParent:847->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileToItself:856->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameFileToNonExistentDirectory:803->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestFSMainOperationsLocalFileSystem>FSMainOperationsBaseTest.testRenameNonExistentPath:779->FSMainOperationsBaseTest.rename:1136
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryAsEmptyDirectory:1074->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryAsFile:1123->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryAsNonEmptyDirectory:1098->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryAsNonExistentDirectory:1033->FileContextMainOperationsBaseTest.testRenameDirectoryAsNonExistentDirectory:1049->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryToItself:994->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameDirectoryToNonExistentParent:1017->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileAsExistingDirectory:975->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileAsExistingFile:954->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileToDestinationWithParentFile:900->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileToExistingParent:920->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileToItself:929->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameFileToNonExistentDirectory:876->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestLocalFSFileContextMainOperations>FileContextMainOperationsBaseTest.testRenameNonExistentPath:853->FileContextMainOperationsBaseTest.rename:1194
 � InvalidPath
 
TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testDanglingLinkFilePartQual:124
 � InvalidPath
 
TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testDanglingLinkFilePartQual:124
 � InvalidPath

> Missing trimming in readlink in case of protocol
> 
>
>  

[jira] [Assigned] (HADOOP-15629) Missing trimming in readlink in case of protocol

2018-07-24 Thread Íñigo Goiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HADOOP-15629:


Assignee: Giovanni Matteo Fumarola

> Missing trimming in readlink in case of protocol
> 
>
> Key: HADOOP-15629
> URL: https://issues.apache.org/jira/browse/HADOOP-15629
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
>
> When extending the unit tests for the links, we surfaced errors in readLink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15629) Missing trimming in readlink in case of protocol

2018-07-24 Thread Íñigo Goiri (JIRA)
Íñigo Goiri created HADOOP-15629:


 Summary: Missing trimming in readlink in case of protocol
 Key: HADOOP-15629
 URL: https://issues.apache.org/jira/browse/HADOOP-15629
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Íñigo Goiri


When extending the unit tests for the links, we surfaced errors in readLink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15583:

Status: Patch Available  (was: Open)

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch
>
>
> This started off just as sharing credentials across S3A and S3Guard, but in the 
> process it has grown into stabilising the assumed role support 
> so it can be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} list 
> is autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (And hopefully the DDB client) will close this 
> when they are closed themselves. If DDB  has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count of the uses of the credentials list, starting at one: 
> every close() call decrements it, and when it hits zero the cleanup is kicked 
> off.
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles
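A hypothetical sketch of the use-count idea above; the class below is illustrative, 
not the actual AWSCredentialProviderList change:

{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedCredentials implements AutoCloseable {
  private final List<AutoCloseable> providers;                  // the wrapped credential providers
  private final AtomicInteger refCount = new AtomicInteger(1);  // creator holds one reference

  SharedCredentials(List<AutoCloseable> providers) {
    this.providers = providers;
  }

  /** Take another reference before handing the list to a second client (e.g. DDB). */
  SharedCredentials retain() {
    refCount.incrementAndGet();
    return this;
  }

  /** Only the final close() actually releases the underlying providers. */
  @Override
  public void close() throws Exception {
    if (refCount.decrementAndGet() == 0) {
      for (AutoCloseable p : providers) {
        p.close();
      }
    }
  }
}
{code}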



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554963#comment-16554963
 ] 

Steve Loughran commented on HADOOP-15583:
-

Patch 003
 * assumed role tests work properly when the unrestricted client is using 
SSE-KMS, that is: they set the permissions up so that restricted RW has the KMS 
RW perms, and restricted RO only has the decrypt perms.
 * contains HADOOP-15627. S3A ITests failing if bucket explicitly set to 
s3guard+DDB
 * better handling and testing of getBucketLocation permissions issues: DDB 
metastore states what the problem is and ITestAssumeRole tests all enable this 
permission
 * Docs cover DDB, getBucketLocation and kms permissions needed

Testing: S3 US-west. Some DDB overload issues as discussed in HADOOP-15426, and 
I rediscovered the "encryption tests fail if you mandate SSE-KMS" issue, but 
otherwise: fine.

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch
>
>
> This started off just as sharing credentials across S3A and S3Guard, but in the 
> process it has grown into stabilising the assumed role support 
> so it can be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} list 
> is autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (And hopefully the DDB client) will close this 
> when they are closed themselves. If DDB  has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count of the uses of the credentials list, starting at one: 
> every close() call decrements it, and when it hits zero the cleanup is kicked 
> off.
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-15627) S3A ITests failing if bucket explicitly set to s3guard+DDB

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15627.
-
Resolution: Duplicate

> S3A ITests failing if bucket explicitly set to s3guard+DDB
> --
>
> Key: HADOOP-15627
> URL: https://issues.apache.org/jira/browse/HADOOP-15627
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Repeatable failure in {{ITestS3GuardWriteBack.testListStatusWriteBack}}
> Possible causes could include
> * test not setting up the three fs instances
> * (disabled) caching not isolating properly
> * something more serious



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13600) S3a rename() to copy files in a directory in parallel

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554959#comment-16554959
 ] 

genericqa commented on HADOOP-13600:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HADOOP-13600 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13600 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886680/HADOOP-13600.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14939/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> S3a rename() to copy files in a directory in parallel
> -
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HADOOP-13600.001.patch
>
>
> Currently a directory rename does a one-by-one copy, making the request 
> O(files * data). If the copy operations were launched in parallel, the 
> duration of the copy may be reducible to the duration of the longest copy. 
> For a directory with many files, this will be significant.
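An illustrative sketch of the parallel-copy idea (not the attached patch): submit one 
copy task per file and wait for all of them, so the rename lasts roughly as long as 
the slowest single copy. CopyTask is a placeholder for one S3 COPY request:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

final class ParallelCopySketch {
  interface CopyTask { void copy() throws IOException; }   // one COPY request

  static void copyAll(List<CopyTask> files, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> results = new ArrayList<>();
      for (CopyTask f : files) {
        results.add(pool.submit(() -> { f.copy(); return null; }));
      }
      for (Future<?> r : results) {
        try {
          r.get();                     // surface the first copy failure
        } catch (ExecutionException e) {
          throw new IOException("copy failed", e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}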



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15583:

Attachment: HADOOP-15583-003.patch

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch
>
>
> This started off just as sharing credentials across S3A and S3Guard, but in the 
> process it has grown into stabilising the assumed role support 
> so it can be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} list 
> is autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (And hopefully the DDB client) will close this 
> when they are closed themselves. If DDB  has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count of the uses of the credentials list, starting at one: 
> every close() call decrements it, and when it hits zero the cleanup is kicked 
> off.
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-14528) s3a encryption tests fail when dest bucket has SSE-KMS enabled

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14528.
-
Resolution: Won't Fix

something to be aware of when you run the tests: if you force encryption on in a 
bucket, disable all the encryption tests.

> s3a encryption tests fail when dest bucket has SSE-KMS enabled
> --
>
> Key: HADOOP-14528
> URL: https://issues.apache.org/jira/browse/HADOOP-14528
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
> Environment: test bucket with SSE-KMS required
>Reporter: Steve Loughran
>Priority: Minor
>
> When testing against a bucket set up to require SSE-KMS, and with the bucket 
> settings enabling this & providing the key in {{ 
> fs.s3a.server-side-encryption.key}}, some of the encryption tests fail.
> Not sure whether this can/should be fixed, except by saying "disable 
> encryption tests here"; that is: don't try to be clever about detecting this 
> condition and skipping the tests automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14528) s3a encryption tests fail when dest bucket has SSE-KMS enabled

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14528:

Summary: s3a encryption tests fail when dest bucket has SSE-KMS enabled  
(was: s3a encryption tests fail when dest bucket has 
fs.s3a.server-side-encryption.key  set)

> s3a encryption tests fail when dest bucket has SSE-KMS enabled
> --
>
> Key: HADOOP-14528
> URL: https://issues.apache.org/jira/browse/HADOOP-14528
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
> Environment: test bucket with SSE-KMS required
>Reporter: Steve Loughran
>Priority: Minor
>
> When testing against a bucket set up to require SSE-KMS, and with the bucket 
> settings enabling this & providing the key in {{ 
> fs.s3a.server-side-encryption.key}}, some of the encryption tests fail.
> Not sure whether this can/should be fixed, except by saying "disable 
> encryption tests here"; that is: don't try to be clever about detecting this 
> condition and skipping the tests automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13853) S3ADataBlocks.DiskBlock to lazy create dest file for faster 0-byte puts

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13853.
-
Resolution: Won't Fix

for s3 io, the local FS isn't normally the bottleneck

> S3ADataBlocks.DiskBlock to lazy create dest file for faster 0-byte puts
> ---
>
> Key: HADOOP-13853
> URL: https://issues.apache.org/jira/browse/HADOOP-13853
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> Looking at traces of work, there's invariably a PUT of a _SUCCESS at the end, 
> which, with disk output, adds the overhead of creating, writing to and then 
> reading a 0 byte file.
> With a lazy create, the creation could be postponed until the first write, 
> with special handling in the {{startUpload()}} operation to return a null 
> stream, rather than reopen the file. Saves on some disk IO: create, read, 
> delete
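
A minimal sketch of the lazy-create idea, with hypothetical names rather than the actual S3ADataBlocks classes; the destination file is only created on the first write, so a 0-byte put never touches the local disk:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative lazy-create disk block; names are hypothetical. */
class LazyDiskBlock {
  private final File destFile;
  private OutputStream out;          // created on first write only

  LazyDiskBlock(File destFile) {
    this.destFile = destFile;
  }

  void write(byte[] data, int off, int len) throws IOException {
    if (out == null) {
      out = new FileOutputStream(destFile);   // postponed creation
    }
    out.write(data, off, len);
  }

  /**
   * On upload: a block that was never written to returns null,
   * so the caller can PUT an empty object without any disk IO.
   */
  InputStream startUpload() throws IOException {
    if (out == null) {
      return null;                 // 0-byte put: no file was created
    }
    out.close();
    return new FileInputStream(destFile);
  }
}
{code}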



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-12949) Add HTrace to the s3a connector

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-12949.
-
Resolution: Won't Fix

closing as WONTFIX as htrace is being pulled from hadoop. Once we've settled on 
a new trace framework, we can create a new piece of work around this

> Add HTrace to the s3a connector
> ---
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Madhawa Gunasekara
>Assignee: Madhawa Gunasekara
>Priority: Major
>
> Hi All, 
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly 
> important in Hadoop. But we don't have distributed tracing for these yet. It 
> would be interesting to add distributed tracing here. It would enable 
> collecting really interesting data like probability distributions of PUT and 
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature, Please shed some light on this 
> Thanks,
> Madhawa



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554938#comment-16554938
 ] 

Steve Loughran commented on HADOOP-15426:
-

* retry sleep time needs to be bigger/configurable, as it clearly doesn't 
recover
* list/get calls need to retry too
* and the batch write calls should be wrapped in retry logic to handle transient IO 
problems (see the sketch below)
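
A rough sketch of the kind of wrapper suggested for the batch write path; the operation callable, retry limit and sleep times are illustrative, not the values S3Guard uses:

{code:java}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

/** Illustrative retry wrapper; limits and sleep times are made up. */
final class ThrottleRetry {
  static <T> T withBackoff(Callable<T> op) throws IOException {
    long sleepMillis = 100;
    final int maxAttempts = 7;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts) {
          throw new IOException("Giving up after " + attempt + " attempts", e);
        }
        try {
          Thread.sleep(sleepMillis);       // back off before retrying
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted during backoff");
        }
        sleepMillis *= 2;                  // exponential backoff
      }
    }
  }
}
{code}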

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-15426:
---

Assignee: Steve Loughran

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554923#comment-16554923
 ] 

Steve Loughran commented on HADOOP-15349:
-

I'm pleased to say I can now trigger DDB overloads, and the new message is 
being printed
{code}
[ERROR] 
testFakeDirectoryDeletion(org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost)  
Time elapsed: 32.643 s  <<< ERROR!
java.io.IOException: Max retries exceeded (5) for DynamoDB. This may be because 
write threshold of DynamoDB is set too low.
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:693)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:672)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$move$4(DynamoDBMetadataStore.java:625)
at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:127)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:624)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1072)
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:862)
at 
org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:299)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554903#comment-16554903
 ] 

Eric Yang commented on HADOOP-15593:


[~xiaochen] Good catch on the logic. I think your suggestion to check 
isDestroyed() to stop the renew thread makes sense. Patch 004 is incomplete. If a 
tgt is destroyed, it cannot be renewed. This code will stop the renew thread:

{code}
  if (now > nextRefresh) {
LOG.error("TGT is expired. Aborting renew thread for {}.",
getUserName());
return;
  }
{code}

This part of the code needs to be removed for the renewal thread to retry; an illustrative sketch of the isDestroyed() guard follows.
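
For illustration only, a sketch of the isDestroyed() guard being discussed, using the public {{KerberosTicket}} API; this is not the actual UserGroupInformation patch:

{code:java}
import javax.security.auth.kerberos.KerberosTicket;

/** Sketch of the renewal-loop guard being discussed; not the actual UGI code. */
final class TgtRenewalGuard {

  /** Returns true if the ticket is still usable for renewal. */
  static boolean canRenew(KerberosTicket tgt) {
    // a destroyed ticket cannot be renewed; checking first avoids the
    // NullPointerException seen when getEndTime() is called on it
    return tgt != null && !tgt.isDestroyed() && tgt.getEndTime() != null;
  }

  static void renewLoop(KerberosTicket tgt) throws InterruptedException {
    while (canRenew(tgt)) {
      long end = tgt.getEndTime().getTime();
      long nextRefresh = end - 60_000L;     // illustrative refresh margin
      long now = System.currentTimeMillis();
      if (now < nextRefresh) {
        Thread.sleep(nextRefresh - now);
      } else {
        // renew or re-login here, then re-read the ticket (omitted in sketch)
        return;
      }
    }
    // fall through: ticket destroyed or expired, thread exits quietly
  }
}
{code}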

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15544) ABFS: validate packing, transient classpath, hadoop fs CLI

2018-07-24 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554894#comment-16554894
 ] 

Da Zhou commented on HADOOP-15544:
--

Thank you [~ste...@apache.org] for sharing the steps. On my end I can run the 
hadoop command without issue.

What I did: I followed the steps you shared, then appended the required jar 
paths (the azure jar path and 
/hadoop-dist/target/hadoop-3.2.0-SNAPSHOT/share/hadoop/tools/lib/*) to 
*$HADOOP_CLASSPATH*.

After that I tried WASB:
{code:java}
./bin/hadoop fs -ls 
wasb://TEST_CONTAINER_NAME@TEST_ACCOUNT.blob.core.windows.net/{code}
Then tried ABFS:
{code:java}
./bin/hadoop fs -ls 
abfs://TEST_CONTAINER_NAME@TEST_ACCOUNT.dfs.core.windows.net/{code}
Both can return the results successfully.

Could you share your core-site.xml settings and the failure message with me? I 
suspect it might be related to the configuration.

Thanks,
Da

> ABFS: validate packing, transient classpath, hadoop fs CLI
> --
>
> Key: HADOOP-15544
> URL: https://issues.apache.org/jira/browse/HADOOP-15544
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: dependencies.txt
>
>
> Validate the packaging and dependencies of ABFS
> * hadoop-cloud-storage artifact to export everything needed
> * {{hadoop fs -ls abfs://path}} to work in ASF distributions
> * check the transient CP (e.g. Spark)
> Spark master's hadoop-cloud module depends on hadoop-cloud-storage if you 
> build with the hadoop-3.1 profile, so it should automatically get in there. 
> Just need to check that it picks it up too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15612) Improve exception when tfile fails to load LzoCodec

2018-07-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554887#comment-16554887
 ] 

Hudson commented on HADOOP-15612:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14630 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14630/])
HADOOP-15612. Improve exception when tfile fails to load LzoCodec. (gera: rev 
6bec03cfc8bdcf6aa3df9c22231ab959ba31f2f5)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/file/tfile/Compression.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/file/tfile/TestCompression.java


> Improve exception when tfile fails to load LzoCodec 
> 
>
> Key: HADOOP-15612
> URL: https://issues.apache.org/jira/browse/HADOOP-15612
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: HADOOP-15612.001.patch, HADOOP-15612.002.patch, 
> HADOOP-15612.003.patch
>
>
> When hadoop-lzo is not on classpath you get
> {code:java}
> java.io.IOException: LZO codec class not specified. Did you forget to set 
> property io.compression.codec.lzo.class?{code}
> which is probably rarely the real cause given the default class name. The 
> real root cause is not attached to the exception thrown.
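
For illustration, a sketch of the kind of change proposed: attach the underlying classloading failure as the cause rather than discarding it. The class and message here are only an example, not the Compression.java patch:

{code:java}
import java.io.IOException;

/** Illustrative only: preserve the root cause when a codec class fails to load. */
final class CodecLoading {
  static Class<?> loadLzoCodec(String className) throws IOException {
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException | LinkageError e) {
      // attach the real failure instead of a bare "did you forget to set
      // io.compression.codec.lzo.class?" message
      throw new IOException("LZO codec " + className
          + " could not be loaded. Is hadoop-lzo on the classpath?", e);
    }
  }
}
{code}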



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-15572) Test S3Guard ops with assumed roles & verify required permissions

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15572.
-
Resolution: Duplicate

> Test S3Guard ops with assumed roles & verify required permissions
> -
>
> Key: HADOOP-15572
> URL: https://issues.apache.org/jira/browse/HADOOP-15572
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> We haven't documented permissions for S3Guard (WiP of mine); when I try to 
> test using the AssumedRoleCredentialProvider & a role nominally restricted to 
> R/W of S3guard *but not create/delete*, I can still create and destroy buckets
> Either I've got my list wrong, or how S3Guard sets up its auth isn't right and 
> it is somehow falling back to the full role.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15426:

Description: 
managed to create on a parallel test run
{code}
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: 
The level of configured provisioned throughput for the table was exceeded. 
Consider increasing your provisioning level with the UpdateTable API. (Service: 
AmazonDynamoDBv2; Status Code: 400; Error Code: 
ProvisionedThroughputExceededException; Request ID: 
RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured 
provisioned throughput for the table was exceeded. Consider increasing your 
provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status 
Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 
RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
at 

{code}

We should be able to handle this. 400 "bad things happened" error though, not 
the 503 from S3.

h3. We need a retry handler for DDB throttle operations

  was:
managed to create on a parallel test run
{code}
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: 
The level of configured provisioned throughput for the table was exceeded. 
Consider increasing your provisioning level with the UpdateTable API. (Service: 
AmazonDynamoDBv2; Status Code: 400; Error Code: 
ProvisionedThroughputExceededException; Request ID: 
RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured 
provisioned throughput for the table was exceeded. Consider increasing your 
provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status 
Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 
RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
at 

{code}

We should be able to handle this. 400 "bad things happened" error though, not 
the 503 from S3.


> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554863#comment-16554863
 ] 

Steve Loughran commented on HADOOP-15426:
-

Screenshot of the state of play; this is from an mvn integration test with 
parallelism of 6. Autoscale kicks in, but not enough to stop throttle events 
coming back. Repeatable.

Happens against US-west from a laptop in our Sunnyvale office; 1+Gbps link; 
traceroute to AWS S3 says 20 hops but a latency of < 3 millis
{code}
20  s3-us-west-1-w.amazonaws.com (54.231.237.43)  2.319 ms  2.175 ms  2.140 ms
{code}

The reason I'm seeing this now is probably that my test system is so close to 
the DDB and S3 stores that it's overloading them, whereas when testing against 
AWS Ireland from the UK we've got a tangible RTT.

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15604) Test if the unprocessed items in S3Guard DDB metadata store caused by I/O thresholds

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554858#comment-16554858
 ] 

Steve Loughran commented on HADOOP-15604:
-

Is this happening in the S3A committer?

> Test if the unprocessed items in S3Guard DDB metadata store caused by I/O 
> thresholds
> 
>
> Key: HADOOP-15604
> URL: https://issues.apache.org/jira/browse/HADOOP-15604
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> When there are ~50 files being committed, each in their own thread from the 
> commit pool, the DDB repo is probably being overloaded just from one single 
> process doing task commit. We should be backing off more, especially given 
> that failing on a write could potentially leave the store inconsistent with 
> the FS (renames, etc.)
> It would be nice to have some tests to prove that the I/O thresholds are the 
> reason for unprocessed items in DynamoDB metadata store
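
A sketch of how unprocessed items from a DynamoDB batch write can be retried with backoff, using the AWS SDK v1 document API; the retry cap and sleep values are illustrative and this is not the DynamoDBMetadataStore code:

{code:java}
import com.amazonaws.services.dynamodbv2.document.BatchWriteItemOutcome;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.TableWriteItems;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;
import java.io.IOException;
import java.util.List;
import java.util.Map;

/** Illustrative sketch; not the S3Guard DynamoDBMetadataStore implementation. */
final class BatchWriteWithRetry {
  static void writeWithRetry(DynamoDB ddb, TableWriteItems items)
      throws IOException, InterruptedException {
    BatchWriteItemOutcome outcome = ddb.batchWriteItem(items);
    Map<String, List<WriteRequest>> unprocessed = outcome.getUnprocessedItems();
    long sleepMillis = 100;                     // illustrative backoff start
    int attempts = 0;
    while (!unprocessed.isEmpty()) {
      if (++attempts > 10) {                    // illustrative cap
        throw new IOException("Unprocessed items remain after "
            + attempts + " attempts; DDB capacity may be too low");
      }
      Thread.sleep(sleepMillis);
      sleepMillis *= 2;                         // exponential backoff
      outcome = ddb.batchWriteItemUnprocessed(unprocessed);
      unprocessed = outcome.getUnprocessedItems();
    }
  }
}
{code}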



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15426:

Attachment: Screen Shot 2018-07-24 at 15.16.46.png

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: Screen Shot 2018-07-24 at 15.16.46.png
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554847#comment-16554847
 ] 

Steve Loughran commented on HADOOP-15426:
-

Managed to recreate this in a parallel test run with bucket capacity = 5, but 
autoscale set to 100. 

This means 
# yes, you can overload a bucket in getFileStatus
# any claims that the client retries are observably false
# autoscale isn't that responsive

{code}
isioned throughput for the table was exceeded. Consider increasing your 
provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status 
Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 
M1D4K4KTA5VCMQ82HRG14SRPE7VV4KQNSO5AEMVJF66Q9ASUAAJG)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException(S3AUtils.java:397)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:192)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.get(DynamoDBMetadataStore.java:474)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2112)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2090)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2054)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:2009)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2326)
at 
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
at 
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
at 
org.apache.hadoop.fs.contract.AbstractContractSeekTest.setup(AbstractContractSeekTest.java:56)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: 
com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: 
The level of configured provisioned throughput for the table was exceeded. 
Consider increasing your provisioning level with the UpdateTable API. (Service: 
AmazonDynamoDBv2; Status Code: 400; Error Code: 
ProvisionedThroughputExceededException; Request ID: 
M1D4K4KTA5VCMQ82HRG14SRPE7VV4KQNSO5AEMVJF66Q9ASUAAJG)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2925)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2901)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1640)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1616)
at 
com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.doLoadItem(GetItemImpl.java:77)
at 
com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItem(GetItemImpl.java:66)
at 
com.amazonaws.services.dynamodbv2.document.Table.getItem(Table.java:608)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.getConsistentItem(DynamoDBMetadataStore.java:459)
at 

[jira] [Commented] (HADOOP-14927) ITestS3GuardTool failures in testDestroyNoBucket()

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554846#comment-16554846
 ] 

genericqa commented on HADOOP-14927:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
30s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-14927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910491/HADOOP-14927.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 75d5fabd69fa 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ea2c6c8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14938/testReport/ |
| Max. process+thread count | 333 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14938/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ITestS3GuardTool failures in testDestroyNoBucket()
> --
>
> Key: HADOOP-14927
> URL: 

[jira] [Updated] (HADOOP-15426) S3guard throttle events => 400 error code => exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15426:

Summary: S3guard throttle events => 400 error code => exception  (was: 
S3guard throttle event on delete => 400 error code => exception)

> S3guard throttle events => 400 error code => exception
> --
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15583:

Description: 
started off just on sharing credentials across S3A and S3Guard, but in the 
process it has grown into stabilising the assumed role support so 
it can be used for more than just testing.

Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
shutdown"


h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync

S3Guard builds its DDB auth chain itself, which stops it having to worry about 
being created standalone vs part of an S3AFS, but it means its authenticators 
are in a separate chain.

When you are using short-lived assumed roles or other session credentials 
updated in the S3A FS authentication chain, you need that same set of 
credentials picked up by DDB. Otherwise, at best you are doubling load; at 
worst, the DDB connector may not get refreshed credentials.

Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an optional 
ref to aws credentials. If set: don't create a new set. 

There's one little complication here: our {{AWSCredentialProviderList}} list is 
autocloseable; its close() will go through all children and close them. 
Apparently the AWS S3 client (and hopefully the DDB client) will close this 
when they are closed themselves. If DDB has the same set of credentials as the 
FS, then there could be trouble if they are closed in one place when the other 
still wants to use them.

Solution: keep a use count on the credentials list, starting at one; every 
close() call decrements it, and when the count hits zero the cleanup is kicked off.

h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
the s3a connection settings, including proxy.

h3. Issue: we're not using getPassword() to get user/password for proxy binding 
for STS. Fix: use that and pass down the bucket ref for per-bucket secrets in a 
JCEKS file.

h3. Issue: hard to debug what's going wrong :)

h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket when 
the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
generated profiles

  was:
started off just on sharing credentials across S3A and S3Guard, but in the 
process it has grown to becoming one of stabilising the assumed role support so 
it can be used for more than just testing.

Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
shutdown"


h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync

S3Guard builds its DDB auth chain itself, which stops it having to worry about 
being created standalone vs part of an S3AFS, but it means its authenticators 
are in a separate chain.

When you are using short-lived assumed roles or other session credentials 
updated in the S3A FS authentication chain, you need that same set of 
credentials picked up by DDB. Otherwise, at best you are doubling load, at 
worse: the DDB connector may not get refreshed credentials.

Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an optional 
ref to aws credentials. If set: don't create a new set. 

There's one little complication here: our {{AWSCredentialProviderList}} list is 
autocloseable; it's close() will go through all children and close them. 
Apparently the AWS S3 client (And hopefully the DDB client) will close this 
when they are closed themselves. If DDB  has the same set of credentials as the 
FS, then there could be trouble if they are closed in one place when the other 
still wants to use them.

Solution; have a use count the uses of the credentials list, starting at one: 
every close() call decrements, and when this hits zero the cleanup is kicked off

h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
the s3a connection settings, including proxy.

h3. issue: we're not using getPassword() to get user/password for proxy binding 
for STS. Fix: use that and pass down the bucket ref for per-bucket secrets in a 
JCEKS file.

h3. Issue; hard to debug what's going wrong :)


> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch
>
>
> started off just on sharing credentials across S3A and S3Guard, but in the 
> process it has grown into stabilising the assumed role support 
> so it can be used for more than just testing.
> Was: "S3Guard to 

[jira] [Commented] (HADOOP-13230) s3a's use of fake empty directory blobs does not interoperate with other s3 tools

2018-07-24 Thread Steve Jacobs (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554808#comment-16554808
 ] 

Steve Jacobs commented on HADOOP-13230:
---

Could this be implemented by replacing the HEAD request for the fakedir entry with 
a listObjects call? That would be the same number of API calls in the 'empty 
fakeDir' case, and no more work in the populated-directory case.

 

Recently I ran into this issue using Presto to insert into Hive partitions. 
Presto does not use the S3A driver, and does not delete the fakedir objects.
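
A sketch of the probe suggested above, using the AWS SDK v1 listObjectsV2 call; the helper name is hypothetical and this is not the S3AFileSystem code:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

/** Illustrative sketch of replacing the fake-dir HEAD with a LIST. */
final class DirProbe {
  /**
   * Returns true if anything exists under the directory prefix,
   * whether or not the trailing "dir/" marker object is present.
   */
  static boolean hasChildren(AmazonS3 s3, String bucket, String dirKey) {
    String prefix = dirKey.endsWith("/") ? dirKey : dirKey + "/";
    ListObjectsV2Request req = new ListObjectsV2Request()
        .withBucketName(bucket)
        .withPrefix(prefix)
        .withMaxKeys(2);          // marker object plus at most one real child
    ListObjectsV2Result result = s3.listObjectsV2(req);
    for (S3ObjectSummary summary : result.getObjectSummaries()) {
      if (!summary.getKey().equals(prefix)) {
        return true;              // a real object other than the marker exists
      }
    }
    return false;
  }
}
{code}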

> s3a's use of fake empty directory blobs does not interoperate with other s3 
> tools
> -
>
> Key: HADOOP-13230
> URL: https://issues.apache.org/jira/browse/HADOOP-13230
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Aaron Fabbri
>Priority: Major
>
> Users of s3a may not realize that, in some cases, it does not interoperate 
> well with other s3 tools, such as the AWS CLI.  (See HIVE-13778, IMPALA-3558).
> Specifically, if a user:
> - Creates an empty directory with hadoop fs -mkdir s3a://bucket/path
> - Copies data into that directory via another tool, i.e. aws cli.
> - Tries to access the data in that directory with any Hadoop software.
> Then the last step fails because the fake empty directory blob that s3a wrote 
> in the first step, causes s3a (listStatus() etc.) to continue to treat that 
> directory as empty, even though the second step was supposed to populate the 
> directory with data.
> I wanted to document this fact for users. We may mark this as won't-fix, "by 
> design". It may also be interesting to brainstorm solutions and/or a config 
> option to change the behavior if folks care.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554801#comment-16554801
 ] 

Steve Loughran commented on HADOOP-15583:
-

Stack trace if your role doesn't have access to the SSE-KMS key used in the test 
configuration. Tests need to make sure the assumed role created has valid KMS access.

{code}

[ERROR] 
testPartialDeleteSingleDelete(org.apache.hadoop.fs.s3a.auth.ITestAssumeRole)  
Time elapsed: 1.888 s  <<< FAILURE!
java.lang.AssertionError
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
at 
java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at 
java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.IntPipeline.forEach(IntPipeline.java:404)
at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:560)
at 
org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.touchFiles(ITestAssumeRole.java:608)
at 
org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.executePartialDelete(ITestAssumeRole.java:776)
at 
org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.testPartialDeleteSingleDelete(ITestAssumeRole.java:748)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.AssertionError: java.nio.file.AccessDeniedException: 
fork-0001/test/testPartialDeleteSingleDelete/file-10: put on 
fork-0001/test/testPartialDeleteSingleDelete/file-10: 
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: 
Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 
0EFFB121C9AB6F2D; S3 Extended Request ID: 
vbk+mO9d1DS/Rs6HpacmadZHC/M9zlioTGoqATbkg7bfd8DLKuMBjUm1OytRFMdPJSbvl85qbSQ=), 
S3 Extended Request ID: 
vbk+mO9d1DS/Rs6HpacmadZHC/M9zlioTGoqATbkg7bfd8DLKuMBjUm1OytRFMdPJSbvl85qbSQ=:AccessDenied
at org.apache.hadoop.test.LambdaTestUtils.eval(LambdaTestUtils.java:644)
at 
org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.lambda$touchFiles$11(ITestAssumeRole.java:609)
at 
java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
at 
java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:114)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.nio.file.AccessDeniedException: 
fork-0001/test/testPartialDeleteSingleDelete/file-10: put on 
fork-0001/test/testPartialDeleteSingleDelete/file-10: 
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: 
Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 
0EFFB121C9AB6F2D; S3 Extended Request ID: 
vbk+mO9d1DS/Rs6HpacmadZHC/M9zlioTGoqATbkg7bfd8DLKuMBjUm1OytRFMdPJSbvl85qbSQ=), 
S3 Extended 

[jira] [Commented] (HADOOP-14212) Expose SecurityEnabled boolean field in JMX for other services besides NameNode

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554795#comment-16554795
 ] 

genericqa commented on HADOOP-14212:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 38m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 42s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 23s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m  1s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}346m 40s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.datanode.TestDataNodeMXBean |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.server.namenode.TestFSImageWithXAttr |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-14212 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932899/HADOOP-14212.006.patch |
| Optional Tests |

[jira] [Updated] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13936:

Target Version/s: 3.2.0

> S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
> -
>
> Key: HADOOP-13936
> URL: https://issues.apache.org/jira/browse/HADOOP-13936
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1, 3.1.0, 3.1.1
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
>Priority: Blocker
>
> As part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, 
> which deletes keys from S3 in batches (default is 1000). But DynamoDB is 
> updated only at the end of this operation. This can cause issues when 
> deleting a large number of keys. 
> E.g., it is possible to get an exception after deleting 1000 keys, and in 
> such cases DynamoDB would not be updated. This can cause DynamoDB to go out 
> of sync. 
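
A minimal sketch of the interleaved approach the description implies: update the
metadata store after each batch delete rather than once at the end. The
{{metadataStore.deleteEntries()}} call and the {{pathToKey()}} helper are
illustrative only, not the existing S3A code.

{code:java}
// Sketch only: delete from S3 in batches and update the metadata store per batch,
// so a failure part-way through leaves DynamoDB consistent with what was removed.
List<DeleteObjectsRequest.KeyVersion> batch = new ArrayList<>();
for (Path path : pathsToDelete) {
  batch.add(new DeleteObjectsRequest.KeyVersion(pathToKey(path)));  // pathToKey: assumed helper
  if (batch.size() == MAX_ENTRIES_TO_DELETE) {
    s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(batch));
    metadataStore.deleteEntries(batch);   // hypothetical per-batch metadata update
    batch.clear();
  }
}
if (!batch.isEmpty()) {
  s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(batch));
  metadataStore.deleteEntries(batch);     // hypothetical per-batch metadata update
}
{code}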



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13936:

Affects Version/s: 3.1.1
   3.1.0

> S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
> -
>
> Key: HADOOP-13936
> URL: https://issues.apache.org/jira/browse/HADOOP-13936
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1, 3.1.0, 3.1.1
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
>Priority: Blocker
>
> As part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, 
> which deletes keys from S3 in batches (default is 1000). But DynamoDB is 
> updated only at the end of this operation. This can cause issues when 
> deleting a large number of keys. 
> E.g., it is possible to get an exception after deleting 1000 keys, and in 
> such cases DynamoDB would not be updated. This can cause DynamoDB to go out 
> of sync. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14927) ITestS3GuardTool failures in testDestroyNoBucket()

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554790#comment-16554790
 ] 

Steve Loughran commented on HADOOP-14927:
-

OK: so it's related to the region you are running tests against?

> ITestS3GuardTool failures in testDestroyNoBucket()
> --
>
> Key: HADOOP-14927
> URL: https://issues.apache.org/jira/browse/HADOOP-14927
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1, 3.0.0-alpha3, 3.1.0
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
> Attachments: HADOOP-14927.001.patch
>
>
> Hit this when testing for the Hadoop 3.0.0-beta1 RC0.
> {noformat}
> hadoop-3.0.0-beta1-src/hadoop-tools/hadoop-aws$ mvn clean verify 
> -Dit.test="ITestS3GuardTool*" -Dtest=none -Ds3guard -Ddynamo
> ...
> Failed tests: 
>   
> ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 
> Expected an exception, got 0
>   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 
> Expected an exception, got 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Jacobs (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554789#comment-16554789
 ] 

Steve Jacobs commented on HADOOP-15628:
---

We have an in-house object store that has a bug related to Multi-Deletes where, 
if the ACCESS_KEY doesn't own the bucket, the multi-delete ALWAYS fails. I 
checked the AWS docs and found that the response codes coming back were 
correct, and the XML was as well; it just wasn't being parsed by HDFS. (And 
it's not the only tool skipping this check; s3cmd doesn't check it either.) I 
would imagine you can reproduce this with bucket / IAM policies as well, but I 
haven't done so yet.

I'm currently running Hadoop 3.0.2 on the system where I'm reproducing this. 
Roger on not looking at the current rev; I'll work on getting a 3.1 install set 
up to test with. I checked everything except 3.1 and saw the same behavior; 
that was a bad assumption on my part. 

Unfortunately due to the fact that I'm using a custom object store, S3guard 
isn't an option for me. (supposedly this store is strongly consistent though, 
so hopefully that won't cause me too much pain). 

I'll work on reproducing this on 3.1.

I'm also having fakeDir-related issues, and I'm aware of HADOOP-13230. Presto 
doesn't clean fakeDir files up, which has just made tracking down 
delete-related issues very confusing. 
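
For reference, a sketch of what checking the multi-delete response could look
like with the AWS SDK v1 classes bundled with these Hadoop versions. The
{{LOG}} field and the {{deleteRequest}} variable are assumed from surrounding
code, and quiet mode is assumed to be off so deleted keys are echoed back.

{code:java}
try {
  DeleteObjectsResult result = s3.deleteObjects(deleteRequest);
  Set<String> deleted = new HashSet<>();
  for (DeleteObjectsResult.DeletedObject d : result.getDeletedObjects()) {
    deleted.add(d.getKey());
  }
  // compare what came back against what was asked for
  for (DeleteObjectsRequest.KeyVersion kv : deleteRequest.getKeys()) {
    if (!deleted.contains(kv.getKey())) {
      LOG.warn("Key not reported as deleted: {}", kv.getKey());
    }
  }
} catch (MultiObjectDeleteException e) {
  // the SDK raises this when the response XML lists per-key errors
  for (MultiObjectDeleteException.DeleteError err : e.getErrors()) {
    LOG.warn("Delete failed for {}: {} {}", err.getKey(), err.getCode(), err.getMessage());
  }
}
{code}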

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Assignee: Steve Loughran
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15627) S3A ITests failing if bucket explicitly set to s3guard+DDB

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554734#comment-16554734
 ] 

Steve Loughran commented on HADOOP-15627:
-

Fixing in the HADOOP-15583 branch, as that's where I'm seeing these. It's 
coincidental, but

> S3A ITests failing if bucket explicitly set to s3guard+DDB
> --
>
> Key: HADOOP-15627
> URL: https://issues.apache.org/jira/browse/HADOOP-15627
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Repeatable failure in {{ITestS3GuardWriteBack.testListStatusWriteBack}}
> Possible causes could include
> * test not setting up the three fs instances
> * (disabled) caching not isolating properly
> * something more serious



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13649:

Parent: HADOOP-15226  (was: HADOOP-15619)

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, 
> HADOOP-13649.003.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore
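
One possible shape for this (a sketch, not the committed design) is to back the
in-memory maps with a Guava cache, which Hadoop already bundles; {{ttlMillis}}
stands in for whatever configuration value ends up controlling the expiry.

{code:java}
// Entries silently age out after the TTL, bounding how stale the local view can get.
Cache<Path, PathMetadata> fileCache = CacheBuilder.newBuilder()
    .expireAfterWrite(ttlMillis, TimeUnit.MILLISECONDS)
    .build();

fileCache.put(path, new PathMetadata(status));      // on put()
PathMetadata meta = fileCache.getIfPresent(path);   // null once the entry has expired
{code}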



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13756) LocalMetadataStore#put(DirListingMetadata) should also put file metadata into fileHash.

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13756:

Parent: HADOOP-15226  (was: HADOOP-15619)

> LocalMetadataStore#put(DirListingMetadata) should also put file metadata into 
> fileHash.
> ---
>
> Key: HADOOP-13756
> URL: https://issues.apache.org/jira/browse/HADOOP-13756
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-13756.001.patch
>
>
> {{LocalMetadataStore#put(DirListingMetadata)}} only puts the metadata into 
> {{dirHash}}, thus all {{FileStatus}}es are missing from 
> {{LocalMetadataStore#fileHash()}}, which makes it confusing to use.
> So currently, to correctly put file status into the store (and also 
> set the {{authoritative}} flag), you need to run  {code}
> List<PathMetadata> metas = new ArrayList<>();
> boolean authoritative = true;
> for (S3AFileStatus status : files) {
>   PathMetadata meta = new PathMetadata(status);
>   metas.add(meta);
>   store.put(meta);
> }
> DirListingMetadata dirMeta = new DirListingMetadata(parent, metas, authoritative);
> store.put(dirMeta);
> {code}
> Since solely calling {{store.put(dirMeta)}} is not correct, and calling 
> {{store.put(dirMeta);}} after putting all sub-file {{FileStatus}}es does 
> repetitive work, can we just use a {{put(PathMetadata)}} and a 
> {{get/setAuthoritative()}} in the MetadataStore interface instead?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15363) (transient) ITestS3AInconsistency.testOpenFailOnRead S3Guard failure

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554731#comment-16554731
 ] 

Steve Loughran commented on HADOOP-15363:
-

Not seen this for a while; feel free to move to the 3.3 task list

> (transient) ITestS3AInconsistency.testOpenFailOnRead S3Guard failure
> 
>
> Key: HADOOP-15363
> URL: https://issues.apache.org/jira/browse/HADOOP-15363
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
>
> Test failure
> {code}
>   ITestS3AInconsistency.testOpenFailOnRead:162->doOpenFailOnReadTest:185 
> S3Guard failed to handle fail-on-read
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15034) S3Guard instrumentation to include cost of DynamoDB ops as metric

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15034:

Parent: HADOOP-15619  (was: HADOOP-15226)

> S3Guard instrumentation to include cost of DynamoDB ops as metric
> -
>
> Key: HADOOP-15034
> URL: https://issues.apache.org/jira/browse/HADOOP-15034
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Minor
>
> DynamoDB ops can return the cost of the operation in {{ConsumedCapacity}} 
> methods.
> by switching to the API calls which include this in the results are used in 
> {{DynamoDBMetadataStore}}, then we could provide live/aggregate stats on IO 
> capacity used. This could aid in live monitoring S3Guard load, and help 
> assess the cost of queries
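
A sketch of the SDK side of this: ask DynamoDB to return consumed capacity on
each call and feed it into the filesystem instrumentation. The query shape and
the {{instrumentation.addValue()}} call are illustrative; the
{{ReturnConsumedCapacity}} plumbing is the point.

{code:java}
QueryRequest request = new QueryRequest()
    .withTableName(tableName)
    .withKeyConditionExpression("parent = :p")
    .withExpressionAttributeValues(
        Collections.singletonMap(":p", new AttributeValue(parentKey)))
    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL);

QueryResult result = dynamoDB.query(request);
ConsumedCapacity used = result.getConsumedCapacity();
if (used != null && used.getCapacityUnits() != null) {
  // aggregate read capacity units as a live/aggregate filesystem metric
  instrumentation.addValue("s3guard_metadatastore_read_capacity", used.getCapacityUnits());
}
{code}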



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14576) s3guard DynamoDB resource not found: tables not ACTIVE state after initial connection

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14576:

Parent: HADOOP-15619  (was: HADOOP-15226)

> s3guard DynamoDB resource not found: tables not ACTIVE state after initial 
> connection
> -
>
> Key: HADOOP-14576
> URL: https://issues.apache.org/jira/browse/HADOOP-14576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Major
>
> We currently only anticipate tables not being in the ACTIVE state when first 
> connecting. It is possible for a table to be in the ACTIVE state and move to 
> an UPDATING state during partitioning events. Attempts to read or write 
> during that time will result in an AmazonServerException getting thrown. We 
> should try to handle that better...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15573) s3guard set-capacity to not retry on an access denied exception

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15573:

Parent: HADOOP-15619  (was: HADOOP-15226)

> s3guard set-capacity to not retry on an access denied exception
> ---
>
> Key: HADOOP-15573
> URL: https://issues.apache.org/jira/browse/HADOOP-15573
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Minor
>
> when you call {{hadoop s3guard set-capacity}} with restricted access, you are 
> (correctly) blocked by AWS, but the client keeps retrying. It should fail 
> fast on a 400/AccessDenied
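
A sketch of the fail-fast check; {{provisionTable()}} is an illustrative
stand-in for the set-capacity call, and the interesting part is inspecting the
error code before letting any retry policy run.

{code:java}
try {
  provisionTable(readCapacity, writeCapacity);   // illustrative set-capacity call
} catch (AmazonServiceException e) {
  if ("AccessDeniedException".equals(e.getErrorCode())) {
    // unrecoverable: surface immediately instead of retrying
    AccessDeniedException ade = new AccessDeniedException(tableName, null,
        "set-capacity: " + e.getMessage());
    ade.initCause(e);
    throw ade;
  }
  throw e;   // other service errors fall through to the normal retry handling
}
{code}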



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15563) s3guard init and set-capacity to support DDB autoscaling

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15563:

Parent: HADOOP-15619  (was: HADOOP-15226)

> s3guard init and set-capacity to support DDB autoscaling
> 
>
> Key: HADOOP-15563
> URL: https://issues.apache.org/jira/browse/HADOOP-15563
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> To keep costs down on DDB, autoscaling is a key feature: you set the max 
> values and when idle, you don't get billed, *at the cost of delayed scale 
> time and risk of not getting the max value when AWS is busy*
> It can be done from the AWS web UI, but not in the s3guard init and 
> set-capacity calls
> It can be done [through the 
> API|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.HowTo.SDK.html]
> Usual issues then: wiring up, CLI params, testing. It'll be hard to test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13980) S3Guard CLI: Add fsck check command

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13980:

Parent: HADOOP-15619  (was: HADOOP-15226)

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14335) Improve DynamoDB schema update story

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14335:

Parent: HADOOP-15619  (was: HADOOP-15226)

> Improve DynamoDB schema update story
> 
>
> Key: HADOOP-14335
> URL: https://issues.apache.org/jira/browse/HADOOP-14335
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
>
> On HADOOP-13760 I'm realizing that changes to the DynamoDB schema aren't 
> great to deal with. Currently a build of Hadoop is hard-coded to a specific 
> schema version. So if you upgrade from one to the next you have to upgrade 
> everything (and then update the version in the table - which we don't have a 
> tool or document for) before you can keep using S3Guard. We could possibly 
> also make the definition of compatibility a bit more flexible, but it's going 
> to be very tough to do that without knowing what kind of future schema 
> changes we might want ahead of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14585) Ensure controls in-place to prevent clients with significant clock skews pruning aggressively

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14585:

Parent: HADOOP-15619  (was: HADOOP-15226)

> Ensure controls in-place to prevent clients with significant clock skews 
> pruning aggressively
> -
>
> Key: HADOOP-14585
> URL: https://issues.apache.org/jira/browse/HADOOP-14585
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Minor
>
> From discussion on HADOOP-14499:
> {quote}
> bear in mind that we can't guarantee that the clocks of all clients are in 
> sync; you don't want a client whose TZ setting is wrong to aggressively prune 
> things. Had that happen in production with files in shared filestore. This is 
> why ant -diagnostics checks time consistency with temp files...
> {quote}
> {quote}
> temp files work on a shared FS. AWS is actually somewhat sensitive to clocks: 
> if your VM is too far out of time then auth actually fails, its ~+-15 
> minutes. There's some stuff in the Java SDK to actually calculate and adjust 
> clock skew, presumably parsing the timestamp of a failure, calculating the 
> difference and retrying. Which means that the field in SDKGlobalConfiguration 
> could help identify the difference between local time and AWS time.
> {quote}
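
A sketch of the kind of guard being discussed: refuse an aggressive prune when
the local clock looks suspect. {{estimatedClockSkewMillis()}} is a hypothetical
helper (for example, derived from the offset the SDK computes after a
clock-skew failure), and the 15-minute bound mirrors the auth window mentioned
above.

{code:java}
long skew = Math.abs(estimatedClockSkewMillis());   // hypothetical skew estimate
long maxSkew = TimeUnit.MINUTES.toMillis(15);
if (skew > maxSkew) {
  throw new IOException("Refusing to prune: local clock differs from service time by "
      + skew + " ms");
}
metadataStore.prune(cutoffTimeMillis);   // proceed only when the clock looks sane
{code}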



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14000) s3guard metadata stores to support millons of children

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14000:

Parent: HADOOP-15619  (was: HADOOP-15226)

> s3guard metadata stores to support millons of children
> --
>
> Key: HADOOP-14000
> URL: https://issues.apache.org/jira/browse/HADOOP-14000
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Priority: Major
>
> S3 repos can have millions of child entries
> Currently {{DirListingMetaData}} can't and {{MetadataStore.listChildren(Path 
> path)}} won't be able to handle directories that big, for listing, deleting 
> or naming.
> We will need a paged response from the listing operation, something which can 
> be iterated over.
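
A sketch of what an iterable, paged listing could look like, reusing Hadoop's
existing {{org.apache.hadoop.fs.RemoteIterator}} interface; {{fetchPage()}} and
{{ListingPage}} are hypothetical store-side pieces.

{code:java}
public RemoteIterator<PathMetadata> listChildrenPaged(final Path path) {
  return new RemoteIterator<PathMetadata>() {
    private Iterator<PathMetadata> current = Collections.emptyIterator();
    private String nextPageToken = null;
    private boolean exhausted = false;

    @Override
    public boolean hasNext() throws IOException {
      while (!current.hasNext() && !exhausted) {
        ListingPage page = fetchPage(path, nextPageToken);   // hypothetical store call
        current = page.entries().iterator();
        nextPageToken = page.nextToken();
        exhausted = (nextPageToken == null);
      }
      return current.hasNext();
    }

    @Override
    public PathMetadata next() throws IOException {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.next();
    }
  };
}
{code}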



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15193) add bulk delete call to metastore API & DDB impl

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15193:

Parent: HADOOP-15619  (was: HADOOP-15226)

> add bulk delete call to metastore API & DDB impl
> 
>
> Key: HADOOP-15193
> URL: https://issues.apache.org/jira/browse/HADOOP-15193
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> recursive dir delete (and any future bulk delete API like HADOOP-15191) 
> benefits from using the DDB bulk table delete call, which takes a list of 
> deletes and executes. Hopefully this will offer better perf. 
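
A sketch of the batched delete using the SDK's {{batchWriteItem}} call (25
writes per request is the service limit). The parent/child key attributes are
assumed to match the table's key schema, and unprocessed-item handling is
omitted; treat all of it as illustrative.

{code:java}
List<WriteRequest> writes = new ArrayList<>();
for (Path path : pathsToDelete) {
  Map<String, AttributeValue> key = new HashMap<>();
  key.put("parent", new AttributeValue(path.getParent().toUri().getPath()));
  key.put("child", new AttributeValue(path.getName()));
  writes.add(new WriteRequest(new DeleteRequest(key)));
  if (writes.size() == 25) {   // BatchWriteItem limit per request
    ddb.batchWriteItem(new BatchWriteItemRequest(
        Collections.singletonMap(tableName, writes)));
    writes = new ArrayList<>();
  }
}
if (!writes.isEmpty()) {
  ddb.batchWriteItem(new BatchWriteItemRequest(
      Collections.singletonMap(tableName, writes)));
}
{code}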



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13843) S3Guard, MetadataStore to support atomic create(path, overwrite=false)

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13843:

Parent: HADOOP-15619  (was: HADOOP-15226)

> S3Guard, MetadataStore to support atomic create(path, overwrite=false)
> --
>
> Key: HADOOP-13843
> URL: https://issues.apache.org/jira/browse/HADOOP-13843
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Priority: Major
>
> Support atomically enforced file creation. Current s3a can do a check in 
> create() and fail if there is something there, but a new entry only gets 
> created at the end of the PUT; during the entire interval between that check 
> and the close() of the stream, there's nothing to stop other callers creating 
> an object.
> Proposed: s3afs can do a check + create a 0 byte file at the path; that'd 
> need some {{putNoOverwrite(DirListingMetadata)}} call in MetadataStore, 
> followed by a PUT of an 0-byte file to S3. That will increase cost of file 
> creation, though at least with the MD store, the cost of the initial 
> getFileStatus() check is down.
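
A sketch of the proposed flow; {{putNoOverwrite()}} is the hypothetical atomic
insert-if-absent call described above, and the other helpers are illustrative.

{code:java}
PathMetadata marker = new PathMetadata(newEmptyFileStatus(path));   // 0-byte placeholder
if (!metadataStore.putNoOverwrite(marker)) {    // hypothetical: false if an entry already exists
  throw new FileAlreadyExistsException(path.toString());
}
putEmptyObject(pathToKey(path));                // write the 0-byte object to S3 (assumed helper)
// the output stream's close() later overwrites the placeholder with the real data
{code}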



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14109) improvements to S3GuardTool destroy command

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14109:

Parent: HADOOP-15619  (was: HADOOP-15226)

> improvements to S3GuardTool destroy command
> ---
>
> Key: HADOOP-14109
> URL: https://issues.apache.org/jira/browse/HADOOP-14109
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> The S3GuardTool destroy operation initializes dynamoDB, and in doing so has 
> some issues
> # if the version of the table is incompatible, init fails, so table isn't 
> deleteable
> # if the system is configured to create the table on demand, then whenever 
> destroy is called for a table that doesn't exist, it gets created and then 
> destroyed.
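
A sketch of a probe-first destroy path; {{initWithoutCreate()}} is a
hypothetical bind that skips both auto-create and the version check.

{code:java}
try {
  ddb.describeTable(tableName);               // cheap existence probe
} catch (ResourceNotFoundException e) {
  System.out.println("No S3Guard table " + tableName + " to destroy");
  return SUCCESS;                             // nothing to do, and nothing gets created
}
MetadataStore store = initWithoutCreate();    // hypothetical: no auto-create, no version gate
store.destroy();
{code}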



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13454) S3Guard: Provide custom FileSystem Statistics.

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13454:

Parent: HADOOP-15619  (was: HADOOP-15226)

> S3Guard: Provide custom FileSystem Statistics.
> --
>
> Key: HADOOP-13454
> URL: https://issues.apache.org/jira/browse/HADOOP-13454
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Chris Nauroth
>Priority: Major
>
> Provide custom {{FileSystem}} {{Statistics}} with information about the 
> internal operational details of S3Guard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14425) Add more s3guard metrics

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14425:

Parent: HADOOP-15619  (was: HADOOP-15226)

> Add more s3guard metrics
> 
>
> Key: HADOOP-14425
> URL: https://issues.apache.org/jira/browse/HADOOP-14425
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Ai Deng
>Priority: Major
>
> The metrics suggested to add:
> Status:
> S3GUARD_METADATASTORE_ENABLED
> S3GUARD_METADATASTORE_IS_AUTHORITATIVE
> Operations:
> S3GUARD_METADATASTORE_INITIALIZATION
> S3GUARD_METADATASTORE_DELETE_PATH
> S3GUARD_METADATASTORE_DELETE_PATH_LATENCY
> S3GUARD_METADATASTORE_DELETE_SUBTREE_PATCH
> S3GUARD_METADATASTORE_GET_PATH
> S3GUARD_METADATASTORE_GET_PATH_LATENCY
> S3GUARD_METADATASTORE_GET_CHILDREN_PATH
> S3GUARD_METADATASTORE_GET_CHILDREN_PATH_LATENCY
> S3GUARD_METADATASTORE_MOVE_PATH
> S3GUARD_METADATASTORE_PUT_PATH
> S3GUARD_METADATASTORE_PUT_PATH_LATENCY
> S3GUARD_METADATASTORE_CLOSE
> S3GUARD_METADATASTORE_DESTORY
> From S3Guard:
> S3GUARD_METADATASTORE_MERGE_DIRECTORY
> For the failures:
> S3GUARD_METADATASTORE_DELETE_FAILURE
> S3GUARD_METADATASTORE_GET_FAILURE
> S3GUARD_METADATASTORE_PUT_FAILURE
> Etc:
> S3GUARD_METADATASTORE_PUT_RETRY_TIMES



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13649:

Parent: HADOOP-15619  (was: HADOOP-15226)

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, 
> HADOOP-13649.003.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13756) LocalMetadataStore#put(DirListingMetadata) should also put file metadata into fileHash.

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13756:

Parent: HADOOP-15619  (was: HADOOP-15226)

> LocalMetadataStore#put(DirListingMetadata) should also put file metadata into 
> fileHash.
> ---
>
> Key: HADOOP-13756
> URL: https://issues.apache.org/jira/browse/HADOOP-13756
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-13756.001.patch
>
>
> {{LocalMetadataStore#put(DirListingMetadata)}} only puts the metadata into 
> {{dirHash}}, thus all {{FileStatus}}es are missing from 
> {{LocalMetadataStore#fileHash()}}, which makes it confusing to use.
> So currently, to correctly put file status into the store (and also 
> set the {{authoritative}} flag), you need to run  {code}
> List<PathMetadata> metas = new ArrayList<>();
> boolean authoritative = true;
> for (S3AFileStatus status : files) {
>   PathMetadata meta = new PathMetadata(status);
>   metas.add(meta);
>   store.put(meta);
> }
> DirListingMetadata dirMeta = new DirListingMetadata(parent, metas, authoritative);
> store.put(dirMeta);
> {code}
> Since solely calling {{store.put(dirMeta)}} is not correct, and calling 
> {{store.put(dirMeta);}} after putting all sub-file {{FileStatus}}es does 
> repetitive work, can we just use a {{put(PathMetadata)}} and a 
> {{get/setAuthoritative()}} in the MetadataStore interface instead?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15492) increase performance of s3guard import command

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15492:

Parent: HADOOP-15619  (was: HADOOP-15226)

> increase performance of s3guard import command
> --
>
> Key: HADOOP-15492
> URL: https://issues.apache.org/jira/browse/HADOOP-15492
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Major
>
> Some perf improvements which spring to mind having looked at the s3guard 
> import command
> Key points: it can handle the import of a tree with existing data better
> # if the bucket is already under s3guard, then the listing will return all 
> listed files, which will then be put() again.
> # import calls {{putParentsIfNotPresent()}}, but DDBMetaStore.put() will do 
> the parent creation anyway
> # For each entry in the store (i.e. a file), the full parent listing is 
> created, then a batch write created to put all the parents and the actual file
> As a result, it's at risk of doing many more put calls than needed, 
> especially for wide/deep directory trees.
> It would be much more efficient to put all files in a single directory as 
> part of 1+ batch request, with 1 parent tree. Better yet: a get() of that 
> parent could skip the put of parent entries.
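
A sketch of the per-directory batching idea from the description above;
{{putListing()}} stands in for whatever batched put the store ends up exposing.

{code:java}
Map<Path, List<PathMetadata>> byParent = new HashMap<>();
for (S3AFileStatus status : discoveredFiles) {
  byParent.computeIfAbsent(status.getPath().getParent(), p -> new ArrayList<>())
      .add(new PathMetadata(status));
}
for (Map.Entry<Path, List<PathMetadata>> entry : byParent.entrySet()) {
  Path dir = entry.getKey();
  boolean parentKnown = metadataStore.get(dir) != null;   // skip ancestor puts if already present
  metadataStore.putListing(dir, entry.getValue(), parentKnown);  // hypothetical batched put
}
{code}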



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554611#comment-16554611
 ] 

Steve Loughran edited comment on HADOOP-15628 at 7/24/18 7:44 PM:
--

ps: that's not the current delete code. [This 
is|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1401].
 It's critical to have a look at the latest stuff to see if the problem still 
exists there, as if it doesn't, it just gets closed as WORKSFORME. As noted, I 
don't think it is perfect, but since we're only going to be tuning the 3.1+ 
line, you aren't ever going to see the fix in 2.8; 2.9 should already have the 
core fix; I'm just worrying about s3guard

That said: if you can replicate the situation where deleteObjects does a 
partial delete without raising an exception, that is a sign that the AWS SDK 
doesn't do what the javadocs say, so it's something we may need to worry about 
(although as we are on a much newer version of that SDK, again, it may be 
fixed). So: please try to replicate on the latest version, attempting it on the 
command line should be enough to try and trigger it. thanks


was (Author: ste...@apache.org):
ps: that's not the current delete code. [This 
is|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1401].
 It's critical to have a look at the latest stuff to see if the problem still 
exists there, as if it doesn't, it just gets closed as WORKSFORME. As noted, I 
don't think it is perfect, but since we're only going to be tuning the 3.1+ 
line, you aren't ever going to see the fix in 2.8. and as noted, 2.9 should 
have the core fix; I'm just worrying about s3guard

That said: if you can replicate the situation where deleteObjects does a 
partial delete without raising an exception, that is a sign that the AWS SDK 
doesn't do what the javadocs say, so it's something we may need to worry about 
(although as we are on a much newer version of that SDK, again, it may be 
fixed). So: please try to replicate on the latest version, attempting it on the 
command line should be enough to try and trigger it. thanks

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Assignee: Steve Loughran
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15627) S3A ITests failing if bucket explicitly set to s3guard+DDB

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554713#comment-16554713
 ] 

Steve Loughran commented on HADOOP-15627:
-

The latter test failure goes away if you increase the inconsistency time from the 
default 5s to 20s. That is: if your test run is slow enough, the 
inconsistencies will have expired and so the assertions fail. When your test 
bucket is on a different continent, your tests can become slow enough for this 
to surface

> S3A ITests failing if bucket explicitly set to s3guard+DDB
> --
>
> Key: HADOOP-15627
> URL: https://issues.apache.org/jira/browse/HADOOP-15627
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Repeatable failure in {{ITestS3GuardWriteBack.testListStatusWriteBack}}
> Possible causes could include
> * test not setting up the three fs instances
> * (disabled) caching not isolating properly
> * something more serious



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15566) Remove HTrace support

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554695#comment-16554695
 ] 

Steve Loughran commented on HADOOP-15566:
-

Stack is of course correct: we want this stuff used end-to-end. We do this 
today with logging across our JARs; we need something beyond logging to track 
down performance/blame across everything.

We should avoid dictating "you must use reporting tool X" for the analysis: 
that kind of requirement limits which people will want to use the tracing, and 
so how broadly it gets used. I don't want to have to worry about what they do 
with that data.

> Remove HTrace support
> -
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 3.1.0
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: security
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (eg HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554680#comment-16554680
 ] 

Xiao Chen commented on HADOOP-15593:


Best-effort retries make sense in general, so the renewal thread doesn't abort 
prematurely due to intermittent failures.

But could you clarify a little more? If a tgt is destroyed, how can it be 
renewed?
It looks to me like a KDC outage would result in a relogin failure and possibly 
getTGT() returning null without an exception, after which the current code just 
does a null check on tgt and returns without retries. IMO we should be 
consistent with that and just return.
I don't feel strongly that having the last retry, as in patch 4, is a big 
problem, but it's not clear under which scenario it could possibly succeed.
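
For reference, a defensive check of the kind being discussed (a sketch only;
whether to return or keep retrying at this point is exactly what this thread is
debating, and {{getRefreshTime()}} is an assumed existing-style helper):

{code:java}
KerberosTicket tgt = getTGT();
if (tgt == null || tgt.isDestroyed() || tgt.getEndTime() == null) {
  // avoids the NPE from KerberosTicket.getEndTime() on a destroyed ticket (JDK-8154889)
  LOG.warn("TGT is missing, destroyed or has no end time; giving up this renewal round");
  return;
}
long nextRefresh = getRefreshTime(tgt);
{code}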

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554659#comment-16554659
 ] 

Eric Yang commented on HADOOP-15593:


[~xiaochen] The renewal thread is supposed to run until max_renewable_life has 
been reached.  If the ticket end time has expired or is unknown, but 
max_renewable_life has not been reached, we would want the renewal loop to run. 
A KDC outage may cause endTime = null, but a retry should still be attempted. 
Hadoop doesn't seem to have logic to check max_renewable_life, so for now it 
may keep trying in order to prevent a service outage.  We probably want to open 
another ticket for an enhancement to respect max_renewable_life.  It is safer 
to retry than to have the cluster go down because Kerberos tickets cannot be 
renewed.

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15623) Compiling hadoop-azure fails with jdk10 (javax javascript)

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554621#comment-16554621
 ] 

Steve Loughran commented on HADOOP-15623:
-

Might be related to this https://www.infoq.com/news/2018/06/deprecate-nashorn



> Compiling hadoop-azure fails with jdk10 (javax javascript)
> --
>
> Key: HADOOP-15623
> URL: https://issues.apache.org/jira/browse/HADOOP-15623
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Ewan Higgs
>Priority: Major
>
> {code}
> $ java -version
> java version "10.0.1" 2018-04-17
> Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
> Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
> {code}
> {code}
> $ mvn install -DskipShade -Dmaven.javadoc.skip=true -Djava.awt.headless=true 
> -DskipTests -rf :hadoop-azure
> ... 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.7:run 
> (create-parallel-tests-dirs) on project hadoop-azure: An Ant BuildException 
> has occured: Unable to create javax script engine for javascript
> [ERROR] around Ant part 

[jira] [Assigned] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-15628:
---

Assignee: Steve Loughran

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Assignee: Steve Loughran
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15623) Compiling hadoop-azure fails with jdk10 (javax javascript)

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15623:

Component/s: build

> Compiling hadoop-azure fails with jdk10 (javax javascript)
> --
>
> Key: HADOOP-15623
> URL: https://issues.apache.org/jira/browse/HADOOP-15623
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Ewan Higgs
>Priority: Major
>
> {code}
> $ java -version
> java version "10.0.1" 2018-04-17
> Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
> Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
> {code}
> {code}
> $ mvn install -DskipShade -Dmaven.javadoc.skip=true -Djava.awt.headless=true 
> -DskipTests -rf :hadoop-azure
> ... 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.7:run 
> (create-parallel-tests-dirs) on project hadoop-azure: An Ant BuildException 
> has occured: Unable to create javax script engine for javascript
> [ERROR] around Ant part 

[jira] [Comment Edited] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554611#comment-16554611
 ] 

Steve Loughran edited comment on HADOOP-15628 at 7/24/18 6:14 PM:
--

ps: that's not the current delete code. [This 
is|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1401].
 It's critical to have a look at the latest stuff to see if the problem still 
exists there, as if it doesn't, it just gets closed as WORKSFORME. As noted, I 
don't think it is perfect, but since we're only going to be tuning the 3.1+ 
line, you aren't ever going to see the fix in 2.8. and as noted, 2.9 should 
have the core fix; I'm just worrying about s3guard

That said: if you can replicate the situation where deleteObjects does a 
partial delete without raising an exception, that is a sign that the AWS SDK 
doesn't do what the javadocs say, so it's something we may need to worry about 
(although as we are on a much newer version of that SDK, again, it may be 
fixed). So: please try to replicate on the latest version, attempting it on the 
command line should be enough to try and trigger it. thanks


was (Author: ste...@apache.org):
ps: that's not the current delete code. [This 
is}https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1401].
 It's critical to have a look at the latest stuff to see if the problem still 
exists there, as if it doesn't, it just gets closed as WORKSFORME. As noted, I 
don't think it is perfect, but since we're only going to be tuning the 3.1+ 
line, you aren't ever going to see the fix in 2.8. and as noted, 2.9 should 
have the core fix; I'm just worrying about s3guard

That said: if you can replicate the situation where deleteObjects does a 
partial delete without raising an exception, that is a sign that the AWS SDK 
doesn't do what the javadocs say, so it's something we may need to worry about 
(although as we are on a much newer version of that SDK, again, it may be 
fixed). So: please try to replicate on the latest version, attempting it on the 
command line should be enough to try and trigger it. thanks

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554611#comment-16554611
 ] 

Steve Loughran commented on HADOOP-15628:
-

ps: that's not the current delete code. [This 
is|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1401].
It's critical to have a look at the latest code to see if the problem still 
exists there; if it doesn't, this just gets closed as WORKSFORME. As noted, I 
don't think it is perfect, but since we're only going to be tuning the 3.1+ 
line, you aren't ever going to see the fix in 2.8, and, as noted, 2.9 should 
have the core fix; I'm just worrying about S3Guard.

That said: if you can replicate the situation where deleteObjects does a 
partial delete without raising an exception, that is a sign that the AWS SDK 
doesn't do what its javadocs say, so it's something we may need to worry about 
(although as we are on a much newer version of that SDK, again, it may be 
fixed). So: please try to replicate this on the latest version; attempting it on 
the command line should be enough to trigger it. Thanks.

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see whether all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15628:

Description: 
Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API do 
not check to see whether all objects have been successfully deleted. In the event 
of a failure, the API will still return a 200 OK (which isn't checked currently):

[Delete Code from Hadoop 
2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
 
{code:java}
if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
DeleteObjectsRequest deleteRequest =
new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
s3.deleteObjects(deleteRequest);
statistics.incrementWriteOps(1);
keysToDelete.clear();
}
{code}
This should be converted to use the DeleteObjectsResult class from the 
S3Client: 

[Amazon Code 
Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
{code:java}
// Verify that the objects were deleted successfully.
DeleteObjectsResult delObjRes = 
s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
delObjRes.getDeletedObjects().size();
System.out.println(successfulDeletes + " objects successfully deleted.");
{code}
Bucket policies can be misconfigured, and deletes will fail without warning by 
S3A clients.

 

 

 

  was:
Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API do 
not check to see whether all objects have been successfully deleted. In the event 
of a failure, the API will still return a 200 OK (which isn't checked currently):

[Current Delete 
Code|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
 
{code:java}
if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
DeleteObjectsRequest deleteRequest =
new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
s3.deleteObjects(deleteRequest);
statistics.incrementWriteOps(1);
keysToDelete.clear();
}
{code}
This should be converted to use the DeleteObjectsResult class from the 
S3Client: 

[Amazon Code 
Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
{code:java}
// Verify that the objects were deleted successfully.
DeleteObjectsResult delObjRes = 
s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
delObjRes.getDeletedObjects().size();
System.out.println(successfulDeletes + " objects successfully deleted.");
{code}
Bucket policies can be misconfigured, and deletes will fail without warning by 
S3A clients.

 

 

 


> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see whether all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554589#comment-16554589
 ] 

Steve Loughran commented on HADOOP-15628:
-

That's interesting. 

# How did you manage to replicate it? By not granting enough permissions to an object?
# What version have you actually seen this on? <= 2.8.x? HADOOP-11572 tried 
to handle this better; that fix is in 2.9+.
# In HADOOP-15176 and the 3.1 release we've done a lot of work there

1. We actually rely on a MultiObjectsDeleteException being raised on a delete 
failure, which the API says happens "if one or more of the objects couldn't be deleted."
2. We don't have a complete policy on what to do here; currently it's 
catch-log-rethrow.

I'm actually going to do some work on this in the next 10 days, because we need 
to handle this for rename() too (it's a copy & delete, after all): 
HADOOP-13936 and HADOOP-15193 are the issues there. It'd be great if you could 
help by testing the 3.2 RC0 against your buckets (expect this in 
September).

At the same time, we aren't going to be able to recover from the failure. All 
we're going to do is make S3Guard consistent with the remote state (i.e. mark 
deleted files as deleted) and throw again. I don't believe I need to worry about 
the response from deleteObjects, assuming the SDK's guarantee that "partial 
delete failures raise an exception" holds. If you have evidence that this is not 
always the case, and that sometimes partial deletes surface as an incomplete 
result *with no exception raised in the client*, well, that would be cause for 
concern.
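
For reference, here is a minimal sketch of what checking both outcomes could 
look like with the v1 AWS SDK. This is illustrative only, not the S3A code; it 
only uses the SDK's public DeleteObjectsResult and MultiObjectDeleteException 
types, and prints to stdout/stderr rather than using the S3A logger:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsResult;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

class DeleteCheckSketch {
  /** Delete a batch of keys and surface any per-key failures. */
  static DeleteObjectsResult deleteAndCheck(AmazonS3 s3,
      DeleteObjectsRequest request) {
    try {
      DeleteObjectsResult result = s3.deleteObjects(request);
      // On success the result lists every key the service deleted.
      System.out.println(result.getDeletedObjects().size()
          + " objects successfully deleted.");
      return result;
    } catch (MultiObjectDeleteException e) {
      // Partial failure: some keys were deleted, the rest carry per-key errors.
      for (MultiObjectDeleteException.DeleteError err : e.getErrors()) {
        System.err.println("Failed to delete " + err.getKey()
            + ": " + err.getCode() + " " + err.getMessage());
      }
      throw e;
    }
  }
}
{code}

If deleteObjects ever returned normally with fewer entries in getDeletedObjects() 
than keys in the request, that would be exactly the silent partial delete being 
discussed.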

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see whether all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Current Delete 
> Code|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without warning 
> by S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554585#comment-16554585
 ] 

Xiao Chen commented on HADOOP-15593:


Thanks for revving, [~gabor.bota], and thanks [~eyang] for the prompt review. The patch 
looks pretty good. I have 2 comments on the latest patch:

- In case the tgt is destroyed (either isDestroyed()==true or getEndTime 
NPEs), is there any value in retrying? How about we do something like:
{code}
  if (tgt.isDestroyed()) {
    // log and return;
  }
  try {
    tgtEndTime = tgt.getEndTime().getTime();
  } catch (NullPointerException npe) {
    // log and return;
  }
{code}
{{runRenewalLoop}} var won't be necessary if we do this. Thoughts?

- The {{renewalFailures}} and {{renewalFailuresTotal}} metrics need to call 
{{value()}} in order to be logged correctly.
This comes from existing code, but good to fix since we're touching it.

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-07-24 Thread Steve Jacobs (JIRA)
Steve Jacobs created HADOOP-15628:
-

 Summary: S3A Filesystem does not check return from AmazonS3Client 
deleteObjects
 Key: HADOOP-15628
 URL: https://issues.apache.org/jira/browse/HADOOP-15628
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.3, 2.8.4, 2.9.1, 3.1.1
 Environment: Hadoop 3.0.2 / Hadoop 2.8.3

Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
Reporter: Steve Jacobs


Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API do 
not check to see whether all objects have been successfully deleted. In the event 
of a failure, the API will still return a 200 OK (which isn't checked currently):

[Current Delete 
Code|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
 
{code:java}
if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
DeleteObjectsRequest deleteRequest =
new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
s3.deleteObjects(deleteRequest);
statistics.incrementWriteOps(1);
keysToDelete.clear();
}
{code}
This should be converted to use the DeleteObjectsResult class from the 
S3Client: 

[Amazon Code 
Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
{code:java}
// Verify that the objects were deleted successfully.
DeleteObjectsResult delObjRes = 
s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
delObjRes.getDeletedObjects().size();
System.out.println(successfulDeletes + " objects successfully deleted.");
{code}
Bucket policies can be misconfigured, and deletes will fail without warning by 
S3A clients.

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15576) S3A Multipart Uploader to work with S3Guard and encryption

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15576:

Summary: S3A  Multipart Uploader to work with S3Guard and encryption  (was: 
S3A  Multipart Uploader API implementation to work with S3Guard and encryption)

> S3A  Multipart Uploader to work with S3Guard and encryption
> ---
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper; 
> adding any new methods that are needed there.
> # Tests. the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15576) S3A Multipart Uploader API implementation to work with S3Guard and encryption

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554542#comment-16554542
 ] 

Steve Loughran commented on HADOOP-15576:
-

Ewan, still expecting this:

* S3AMultipartUploader to always go through the s3a FS and 
{{WriteOperationsHelper}}, not the s3 client; retry logic to go in there, 
along with the S3Guard integration. That's for completing & aborting uploads 
({{abortMultipartUpload()}}, 
{{newUploadPartRequest()}}, {{finalizeMultipartUpload()}}). That should be enough 
to wire up S3Guard. If something is missing: add it.
* S3AMultipartUploader to declare retry policy on its ops
* test to include the inconsistent s3 client with failures (always) and, if S3Guard 
is enabled, inconsistency turned on. Your store doesn't do inconsistency so 
that's moot, but it may fail - so we need those retries

You must not be creating any AWS request classes yourself, or calling the s3a 
client. If there isn't a method in WriteOperationsHelper to create the 
instance, add it and make sure it works with S3Guard, sets the encryption settings, etc. 
The requirement to declare the retry policy forces you to look at the retry 
logic of the layers underneath and make sure either they all retry *or the new 
code adds it*. It will also help future maintenance.
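
To sketch the shape of that boundary (purely illustrative: the operation names 
below follow this comment plus an initiate call, but the real 
{{WriteOperationHelper}} signatures in hadoop-aws may well differ):

{code:java}
import com.amazonaws.services.s3.model.CompleteMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/** The only surface S3AMultipartUploader would talk to; no raw s3 client. */
interface MultipartWriteOperations {
  String initiateMultiPartUpload(String key) throws IOException;
  UploadPartRequest newUploadPartRequest(String key, String uploadId,
      int partNumber, int size, InputStream data) throws IOException;
  CompleteMultipartUploadResult finalizeMultipartUpload(String key,
      String uploadId, List<PartETag> partETags) throws IOException;
  void abortMultipartUpload(String key, String uploadId) throws IOException;
}
{code}

Retry, S3Guard and encryption logic then live behind that boundary, so declaring 
a retry policy on each uploader operation only has to reason about one layer.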



> S3A  Multipart Uploader API implementation to work with S3Guard and encryption
> --
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper; 
> adding any new methods that are needed there.
> # Tests. the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15576) S3A Multipart Uploader API implementation to work with S3Guard and encryption

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15576:

Description: 
The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
the tests to demonstrate this

# move from low-level calls of the S3A client to calls of WriteOperationHelper; 
adding any new methods that are needed there.
# Tests. the tests of HDFS-13713. 
# test execution, with -DS3Guard, -DAuth

There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and even 
if there was, it might not show that S3Guard was bypassed, because there's no 
checks that listFiles/listStatus shows the newly committed files.

Similarly, because MPU requests are initiated in S3AMultipartUploader, 
encryption settings aren't picked up. Files being uploaded this way *are not 
being encrypted*

  was:
The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
the tests to demonstrate this

# move from low-level calls of the S3A client to calls of WriteOperationHelper; 
adding any new methods that are needed there.
# Tests. the tests of HDFS-13713. 
# test execution, with -DS3Guard, -DAuth

There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and even 
if there was, it might not show that S3Guard was bypassed, because there's no 
checks that listFiles/listStatus shows the newly committed files.


> S3A  Multipart Uploader API implementation to work with S3Guard and encryption
> --
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper; 
> adding any new methods that are needed there.
> # Tests. the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15576) S3A Multipart Uploader API implementation to work with S3Guard and encryption

2018-07-24 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15576:

Summary: S3A  Multipart Uploader API implementation to work with S3Guard 
and encryption  (was: S3A  Multipart Uploader API implementation to work with 
S3Guard)

> S3A  Multipart Uploader API implementation to work with S3Guard and encryption
> --
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper; 
> adding any new methods that are needed there.
> # Tests. the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15566) Remove HTrace support

2018-07-24 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-15566:
--
Labels: security  (was: )

> Remove HTrace support
> -
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 3.1.0
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: security
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (eg HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554453#comment-16554453
 ] 

Eric Yang commented on HADOOP-15593:


[~gabor.bota] Thank you for the patch.  In patch 003, you have a tgt null check 
before entering the while loop.  An NPE will be thrown only for endTime = new 
Date(null); unless the tgt is destroyed or uninitialized, it always 
has an end time.  This is the reason that proposal can work.  The patch 004 
proposal works equally well.  Therefore, +1 from my point of view.

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14212) Expose SecurityEnabled boolean field in JMX for other services besides NameNode

2018-07-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554358#comment-16554358
 ] 

Adam Antal commented on HADOOP-14212:
-

Thank you for the suggestions [~ajayydv], I worked accordingly: removed the 
redundant public keywords from the MXBeans. In TestDataNodeMXBean.java 
the contents of the testDataNodeMXBean test are in a try-with-resources block from 
line 60 to 117, which is why the finally clause got removed - so I see no point 
in keeping it.

I added a test for Kerberos authentication to TestDataNodeMXBean, could you 
please check whether my modifications are appropriate? Another question has 
arisen while I attempted to write tests for ResourceManager- and 
NodeManagerMXBean: I used SaslDataTransferTestCase as the base class for getting 
the Kerberos configuration, but this class is inaccessible from hadoop-yarn. Would 
you please advise how to get around this?

(Moving SaslDataTransferTestCase to hadoop-common is not a good idea since it 
uses DFSConfigKeys from hadoop-hdfs, but introducing a new dependency also has 
some other caveats.) A simpler answer could be just to copy the code needed for 
the tests for those two MXBeans in yarn, or not to write tests for them.

> Expose SecurityEnabled boolean field in JMX for other services besides 
> NameNode
> ---
>
> Key: HADOOP-14212
> URL: https://issues.apache.org/jira/browse/HADOOP-14212
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ray Burgemeestre
>Assignee: Adam Antal
>Priority: Minor
>  Labels: newbie, security
> Attachments: HADOOP-14212.001.patch, HADOOP-14212.002.patch, 
> HADOOP-14212.003.patch, HADOOP-14212.004.patch, HADOOP-14212.005.patch, 
> HADOOP-14212.005.patch, HADOOP-14212.005.patch, HADOOP-14212.006.patch
>
>
> The following commit 
> https://github.com/apache/hadoop/commit/dc17bda4b677e30c02c2a9a053895a43e41f7a12
>  introduced a "SecurityEnabled" field in the JMX output for the NameNode. I 
> believe it would be nice to add this same change to the JMX output of other 
> services: Secondary Namenode, ResourceManager, NodeManagers, DataNodes, etc. 
> So that it can be queried whether Security is enabled in all JMX resources.
> The reason I am suggesting this feature / improvement is that I think it  
> would provide a clean way to check whether your cluster is completely 
> Kerberized or not. I don't think there is an easy/clean way to do this now, 
> other than checking the logs, checking ports etc.? 
> The file where the change was made is 
> hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
>  has the following function now:
> {code:java}
> @Override // NameNodeStatusMXBean
> public boolean isSecurityEnabled() {
> return UserGroupInformation.isSecurityEnabled();
> }
> {code}
> I would be happy to develop a patch if it seems useful by others as well?
> This is a snippet from the JMX output from the NameNode in case security is 
> not enabled:
> {code}
>   {
> "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
> "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
> "NNRole" : "NameNode",
> "HostAndPort" : "node001.cm.cluster:8020",
> "SecurityEnabled" : false,
> "LastHATransitionTime" : 0,
> "State" : "standby"
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14212) Expose SecurityEnabled boolean field in JMX for other services besides NameNode

2018-07-24 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated HADOOP-14212:

Attachment: HADOOP-14212.006.patch

> Expose SecurityEnabled boolean field in JMX for other services besides 
> NameNode
> ---
>
> Key: HADOOP-14212
> URL: https://issues.apache.org/jira/browse/HADOOP-14212
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ray Burgemeestre
>Assignee: Adam Antal
>Priority: Minor
>  Labels: newbie, security
> Attachments: HADOOP-14212.001.patch, HADOOP-14212.002.patch, 
> HADOOP-14212.003.patch, HADOOP-14212.004.patch, HADOOP-14212.005.patch, 
> HADOOP-14212.005.patch, HADOOP-14212.005.patch, HADOOP-14212.006.patch
>
>
> The following commit 
> https://github.com/apache/hadoop/commit/dc17bda4b677e30c02c2a9a053895a43e41f7a12
>  introduced a "SecurityEnabled" field in the JMX output for the NameNode. I 
> believe it would be nice to add this same change to the JMX output of other 
> services: Secondary Namenode, ResourceManager, NodeManagers, DataNodes, etc. 
> So that it can be queried whether Security is enabled in all JMX resources.
> The reason I am suggesting this feature / improvement is that I think it  
> would provide a clean way to check whether your cluster is completely 
> Kerberized or not. I don't think there is an easy/clean way to do this now, 
> other than checking the logs, checking ports etc.? 
> The file where the change was made is 
> hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
>  has the following function now:
> {code:java}
> @Override // NameNodeStatusMXBean
> public boolean isSecurityEnabled() {
> return UserGroupInformation.isSecurityEnabled();
> }
> {code}
> I would be happy to develop a patch if it seems useful by others as well?
> This is a snippet from the JMX output from the NameNode in case security is 
> not enabled:
> {code}
>   {
> "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
> "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
> "NNRole" : "NameNode",
> "HostAndPort" : "node001.cm.cluster:8020",
> "SecurityEnabled" : false,
> "LastHATransitionTime" : 0,
> "State" : "standby"
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554234#comment-16554234
 ] 

genericqa commented on HADOOP-15607:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-aliyun in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15607 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932879/HADOOP-15607.003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 00f2aae744e9 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ff7c2ed |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14936/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14936/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> 

[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-24 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554209#comment-16554209
 ] 

wujinhu commented on HADOOP-15607:
--

[~Sammi] Finally, I successfully reproduced this issue by changing some configurations in my 
local environment. Please help to review again, thanks.

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  
>  
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  xx.xx.xx.xx
>  current PartNumber 3, you given part number 3is not in 
> ascending order
>  
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}
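
A minimal sketch of the kind of fix (not the attached patch): snapshot the 
mutable fields into locals before handing the work to the executor, so a later 
{{blockId++}} or {{blockFile}} reassignment cannot be observed by the pending 
upload. Field and method names are taken from the snippet above and are assumed 
to be instance members of the output stream class.

{code:java}
private void uploadCurrentPart() throws IOException {
  blockFiles.add(blockFile);
  blockStream.flush();
  blockStream.close();
  if (blockId == 0) {
    uploadId = store.getUploadId(key);
  }
  final File partFile = blockFile;     // snapshot taken on the caller's thread
  final int partNumber = blockId + 1;  // snapshot taken on the caller's thread
  ListenableFuture<PartETag> partETagFuture =
      executorService.submit(() ->
          store.uploadPart(partFile, key, uploadId, partNumber));
  partETagsFutures.add(partETagFuture);
  blockFile = newBlockFile();
  blockId++;
  blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
}
{code}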



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554186#comment-16554186
 ] 

genericqa commented on HADOOP-15593:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
20s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15593 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932870/HADOOP-15593.004.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7703aa7e4d45 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8461278 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14935/testReport/ |
| Max. process+thread count | 1483 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14935/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: 

[jira] [Updated] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-24 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15607:
-
Attachment: HADOOP-15607.003.patch

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  
>  
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  xx.xx.xx.xx
>  current PartNumber 3, you given part number 3is not in 
> ascending order
>  
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15395) DefaultImpersonationProvider fails to parse proxy user config if username has . in it

2018-07-24 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554100#comment-16554100
 ] 

Mukul Kumar Singh commented on HADOOP-15395:


Thanks for working on this [~ajayydv], 
+1, the v3 patch looks good to me, I will commit this shortly

> DefaultImpersonationProvider fails to parse proxy user config if username has 
> . in it
> -
>
> Key: HADOOP-15395
> URL: https://issues.apache.org/jira/browse/HADOOP-15395
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HADOOP-15395.00.patch, HADOOP-15395.01.patch, 
> HADOOP-15395.02.patch, HADOOP-15395.03.patch
>
>
> DefaultImpersonationProvider fails to parse proxy user config if username has 
> . in it. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15593:

Attachment: HADOOP-15593.004.patch

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch, HADOOP-15593.004.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE

2018-07-24 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554020#comment-16554020
 ] 

Gabor Bota commented on HADOOP-15593:
-

Hi [~eyang], 
The first approach you've proposed won't work, because tgt.getEndTime() will 
throw an NPE.
My approach will be the following:

{code:java}
long tgtEndTime = now;
if (!tgt.isDestroyed()) {
  // As described in HADOOP-15593 we need to handle the case when
  // tgt.getEndTime() throws an NPE because of JDK issue JDK-8147772.
  // The NPE is only possible if this issue is not fixed in the JDK
  // currently used.
  try {
    tgtEndTime = tgt.getEndTime().getTime();
  } catch (NullPointerException npe) {
    LOG.warn("NPE thrown while getting KerberosTicket endTime. The "
        + "endTime will be set to Time.now()");
  }
}
{code}

This seems like the most straightforward approach to me.

Hi [~xiaochen],
You are right, there's no need to check whether {{tgt != null}}.
The problem with writing a unit test for this is that KerberosTicket#getEndTime is
final, so it cannot be mocked to throw an NPE without using PowerMock.
The best I can do for now is to test the check on the isDestroyed flag.
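
For what it's worth, the isDestroyed branch can be exercised without any mocking by constructing a throwaway KerberosTicket and destroying it. Below is a minimal, self-contained sketch, not the actual patch or a Hadoop test: the class and the helper safeEndTime are made up, and safeEndTime only mirrors the guard proposed above.

{code:java}
import java.net.InetAddress;
import java.util.Date;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KerberosTicket;

public class TgtEndTimeSketch {

  // Mirrors the guard proposed above: fall back to 'now' when the ticket is
  // destroyed or getEndTime() unexpectedly throws an NPE.
  static long safeEndTime(KerberosTicket tgt, long now) {
    long tgtEndTime = now;
    if (!tgt.isDestroyed()) {
      try {
        tgtEndTime = tgt.getEndTime().getTime();
      } catch (NullPointerException npe) {
        // keep the fallback value
      }
    }
    return tgtEndTime;
  }

  public static void main(String[] args) throws Exception {
    KerberosPrincipal client = new KerberosPrincipal("client@EXAMPLE.COM");
    KerberosPrincipal tgs =
        new KerberosPrincipal("krbtgt/EXAMPLE.COM@EXAMPLE.COM");
    Date now = new Date();
    Date end = new Date(now.getTime() + 3600_000L);
    // A throwaway ticket; the encoding and session key bytes are dummies.
    KerberosTicket ticket = new KerberosTicket(
        new byte[] {0}, client, tgs, new byte[] {0}, 1, null,
        now, now, end, null, (InetAddress[]) null);

    // Live ticket: the real end time is used.
    System.out.println(safeEndTime(ticket, now.getTime()) == end.getTime());

    // Destroyed ticket: the guard falls back to 'now' without ever calling
    // getEndTime(), so no NPE can escape.
    ticket.destroy();
    System.out.println(safeEndTime(ticket, now.getTime()) == now.getTime());
  }
}
{code}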

> UserGroupInformation TGT renewer throws NPE
> ---
>
> Key: HADOOP-15593
> URL: https://issues.apache.org/jira/browse/HADOOP-15593
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Blocker
> Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, 
> HADOOP-15593.003.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for f...@example.com,5,main]
> java.lang.NullPointerException
> at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
> at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
> at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.






[jira] [Comment Edited] (HADOOP-15611) Improve log in FairCallQueue

2018-07-24 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553932#comment-16553932
 ] 

Yiqun Lin edited comment on HADOOP-15611 at 7/24/18 9:14 AM:
-

Revisiting this, I suggest we add one additional log statement to print detailed
decay info for each user in the loop of {{DecayRpcScheduler#decayCurrentCounts}}
(see the sketch below).


was (Author: linyiqun):
LGTM, +1. Commit this shortly, :).
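
To make the suggestion above concrete, a per-user decay line could look like the one in this minimal, self-contained sketch. It is only an illustration of the idea, not the Hadoop class or the attached patch; the class, field, and message wording here are made up, and it needs slf4j-api plus a binding (e.g. slf4j-simple) at DEBUG level to print anything.

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DecaySketch {
  private static final Logger LOG = LoggerFactory.getLogger(DecaySketch.class);

  private static final double DECAY_FACTOR = 0.5;
  // user -> {decayed count, raw call count}
  private final Map<String, AtomicLong[]> callCounts = new ConcurrentHashMap<>();

  void record(String user) {
    AtomicLong[] counts = callCounts.computeIfAbsent(
        user, u -> new AtomicLong[] {new AtomicLong(), new AtomicLong()});
    counts[0].incrementAndGet();
    counts[1].incrementAndGet();
  }

  void decayCurrentCounts() {
    LOG.debug("Start to decay current counts.");
    long totalDecayedCount = 0;
    long totalRawCallCount = 0;
    Iterator<Map.Entry<String, AtomicLong[]>> it =
        callCounts.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, AtomicLong[]> entry = it.next();
      long decayed = (long) (entry.getValue()[0].get() * DECAY_FACTOR);
      entry.getValue()[0].set(decayed);
      // The per-user detail line suggested above.
      LOG.debug("Decayed count for user {} is now {} (raw call count {}).",
          entry.getKey(), decayed, entry.getValue()[1].get());
      totalDecayedCount += decayed;
      totalRawCallCount += entry.getValue()[1].get();
      if (decayed == 0) {
        LOG.debug("The decayed count for the user {} is zero and being cleaned.",
            entry.getKey());
        it.remove();
      }
    }
    LOG.debug("After decaying the stored counts, totalDecayedCount: {}, "
        + "totalRawCallCount: {}.", totalDecayedCount, totalRawCallCount);
  }

  public static void main(String[] args) {
    DecaySketch s = new DecaySketch();
    s.record("A"); s.record("A"); s.record("B");
    s.decayCurrentCounts();  // A decays 2 -> 1; B decays 1 -> 0 and is cleaned
    s.decayCurrentCounts();  // A decays 1 -> 0 and is cleaned
  }
}
{code}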

> Improve log in FairCallQueue
> 
>
> Key: HADOOP-15611
> URL: https://issues.apache.org/jira/browse/HADOOP-15611
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Ryan Wu
>Priority: Minor
> Attachments: HADOOP-15611.001.patch, HADOOP-15611.002.patch, 
> HADOOP-15611.003.patch
>
>
> When using the FairCallQueue, we find that some key log messages are missing. Only a 
> few logs are printed, which makes it hard to understand and debug this feature.
> At least the following places could print more logs:
> * DecayRpcScheduler#decayCurrentCounts
> * WeightedRoundRobinMultiplexer#moveToNextQueue






[jira] [Commented] (HADOOP-15611) Improve log in FairCallQueue

2018-07-24 Thread Ryan Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553882#comment-16553882
 ] 

Ryan Wu commented on HADOOP-15611:
--

Hi [~linyiqun], the following output logs are from my local test run.
{noformat}
2018-07-24 14:24:16,431 DEBUG ipc.DecayRpcScheduler 
(DecayRpcScheduler.java:decayCurrentCounts(394)) - Start to decay current 
counts.

2018-07-24 14:24:16,431 DEBUG ipc.DecayRpcScheduler 
(DecayRpcScheduler.java:decayCurrentCounts(415)) - The decayed count for the 
user B is zero and being cleaned.

2018-07-24 14:24:16,431 DEBUG ipc.DecayRpcScheduler 
(DecayRpcScheduler.java:decayCurrentCounts(428)) - After decaying the stored 
counts, totalDecayedCount: 0, totalRawCallCount: 8.
{noformat}
{noformat}
2018-07-24 14:39:35,214 INFO  ipc.WeightedRoundRobinMultiplexer 
(WeightedRoundRobinMultiplexer.java:(78)) - WeightedRoundRobinMultiplexer 
is being used.
2018-07-24 14:39:35,214 DEBUG ipc.WeightedRoundRobinMultiplexer 
(WeightedRoundRobinMultiplexer.java:moveToNextQueue(112)) - Moving to next 
queue from queue index 0 to index 1, number of requests left for current queue: 
2.
2018-07-24 14:39:35,215 DEBUG ipc.WeightedRoundRobinMultiplexer 
(WeightedRoundRobinMultiplexer.java:moveToNextQueue(112)) - Moving to next 
queue from queue index 1 to index 2, number of requests left for current queue: 
1.
2018-07-24 14:39:35,215 DEBUG ipc.WeightedRoundRobinMultiplexer 
(WeightedRoundRobinMultiplexer.java:moveToNextQueue(112)) - Moving to next 
queue from queue index 2 to index 0, number of requests left for current queue: 
4.
{noformat}
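
The queue-switch lines in the second block come from a weighted round-robin pass over the call queues. As a rough, self-contained illustration of that idea only (not the actual WeightedRoundRobinMultiplexer code; the class name, fields, and weights below are made up):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WrrSketch {
  private static final Logger LOG = LoggerFactory.getLogger(WrrSketch.class);

  private final int[] weights;   // how many requests each queue serves per round
  private int currentIndex = 0;  // queue currently being drained
  private int requestsLeft;      // requests left for the current queue this round

  public WrrSketch(int... weights) {
    this.weights = weights.clone();
    this.requestsLeft = weights[0];
  }

  /** Returns the index of the queue to take the next request from. */
  public synchronized int getAndAdvanceCurrentIndex() {
    int index = currentIndex;
    if (--requestsLeft <= 0) {
      moveToNextQueue();
    }
    return index;
  }

  private void moveToNextQueue() {
    int from = currentIndex;
    currentIndex = (from + 1) % weights.length;
    requestsLeft = weights[currentIndex];
    // The kind of DEBUG line shown in the output above.
    LOG.debug("Moving to next queue from queue index {} to index {}, number of "
        + "requests left for current queue: {}.", from, currentIndex, requestsLeft);
  }

  public static void main(String[] args) {
    // Three priority levels with weights 4:2:1, as an example only.
    WrrSketch mux = new WrrSketch(4, 2, 1);
    for (int i = 0; i < 10; i++) {
      System.out.println("serve from queue " + mux.getAndAdvanceCurrentIndex());
    }
  }
}
{code}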

> Improve log in FairCallQueue
> 
>
> Key: HADOOP-15611
> URL: https://issues.apache.org/jira/browse/HADOOP-15611
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Ryan Wu
>Priority: Minor
> Attachments: HADOOP-15611.001.patch, HADOOP-15611.002.patch, 
> HADOOP-15611.003.patch
>
>
> When using the FairCallQueue, we find that some key log messages are missing. Only a 
> few logs are printed, which makes it hard to understand and debug this feature.
> At least the following places could print more logs:
> * DecayRpcScheduler#decayCurrentCounts
> * WeightedRoundRobinMultiplexer#moveToNextQueue


