[jira] [Updated] (HADOOP-14524) Make CryptoCodec Closeable so it can be cleaned up proactively

2017-06-13 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14524:
---
Status: Patch Available  (was: Open)

> Make CryptoCodec Closeable so it can be cleaned up proactively
> --
>
> Key: HADOOP-14524
> URL: https://issues.apache.org/jira/browse/HADOOP-14524
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-14524.01.patch
>
>
> See HADOOP-14523 for motivation. Credit to [~mi...@cloudera.com] for 
> reporting initially there.
> Basically, the {{CryptoCodec}} class is not a Closeable, but the 
> {{OpensslAesCtrCryptoCodec}} implementation of it contains a closeable member 
> (the Random object). Currently cleanup is left to {{finalize()}}, which 
> depends on when a full GC is run. This creates problems if 
> {{OpensslAesCtrCryptoCodec}} is used with {{OsSecureRandom}}, which could let 
> the OS run out of FDs on {{/dev/urandom}} if too many codecs are created.
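The direction described above can be sketched with plain JDK types. This is a hedged illustration only, not the Hadoop implementation: {{CodecSketch}}, {{FdBackedRandom}}, and {{CloseableCodec}} are hypothetical stand-ins for {{CryptoCodec}}, {{OsSecureRandom}}, and the proposed Closeable codec.

```java
import java.io.Closeable;

// Hedged sketch: hypothetical stand-ins, not the Hadoop classes.
public class CodecSketch {

    // Models a random source that holds an OS file descriptor (/dev/urandom).
    static class FdBackedRandom implements Closeable {
        boolean closed = false;
        @Override
        public void close() { closed = true; } // releases the FD promptly
    }

    // Making the codec itself Closeable lets callers release the wrapped
    // random source deterministically instead of waiting for finalize().
    static class CloseableCodec implements Closeable {
        private final FdBackedRandom random;
        CloseableCodec(FdBackedRandom random) { this.random = random; }
        @Override
        public void close() { random.close(); }
    }

    public static void main(String[] args) {
        FdBackedRandom random = new FdBackedRandom();
        // try-with-resources closes the codec (and its FD) proactively
        try (CloseableCodec codec = new CloseableCodec(random)) {
            // ... encrypt/decrypt work would go here ...
        }
        System.out.println(random.closed); // true
    }
}
```

With this shape, callers control when the FD is released, rather than depending on finalization during a full GC.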



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14524) Make CryptoCodec Closeable so it can be cleaned up proactively

2017-06-13 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14524:
---
Attachment: HADOOP-14524.01.patch

> Make CryptoCodec Closeable so it can be cleaned up proactively
> --
>
> Key: HADOOP-14524
> URL: https://issues.apache.org/jira/browse/HADOOP-14524
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-14524.01.patch
>
>
> See HADOOP-14523 for motivation. Credit to [~mi...@cloudera.com] for 
> reporting initially there.
> Basically, the {{CryptoCodec}} class is not a Closeable, but the 
> {{OpensslAesCtrCryptoCodec}} implementation of it contains a closeable member 
> (the Random object). Currently cleanup is left to {{finalize()}}, which 
> depends on when a full GC is run. This creates problems if 
> {{OpensslAesCtrCryptoCodec}} is used with {{OsSecureRandom}}, which could let 
> the OS run out of FDs on {{/dev/urandom}} if too many codecs are created.






[jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files

2017-06-13 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048710#comment-16048710
 ] 

Mingliang Liu commented on HADOOP-14457:


Nice discussion here! This week I'll be at the DataWorks Summit 2017 San Jose. 
I'll review later this week or early next week. I don't want to block though, 
so if you guys reach consensus please go commit. Thanks,

> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch, 
> HADOOP-14457-HADOOP-13345.002.patch, HADOOP-14457-HADOOP-13345.003.patch, 
> HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch, 
> HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch, 
> HADOOP-14457-HADOOP-13345.008.patch, HADOOP-14457-HADOOP-13345.009.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. 
> LocalMetadataStore will sometimes erroneously report that a directory is 
> empty with isAuthoritative = true when it *definitely* has children the 
> metadatastore should know about. It doesn't appear to happen if the children 
> are just directories. The fact that it's returning an empty listing is 
> concerning, but the fact that it says it's authoritative *might* be a second 
> bug.
> {code}
> diff --git 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>  
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>}
>  
>@VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>  return metadataStore;
>}
>  
> diff --git 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
>  
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws 
> Throwable {
>  boolean rename = fs.rename(srcDir, destDir);
>  assertFalse("s3a doesn't support rename to non-empty directory", rename);
>}
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +final FileSystem fs = getFileSystem();
> +final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +try {
> +  fs.mkdirs(parent);
> +  final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +  byte[] srcDataset = dataset(256, 'a', 'z');
> +  writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +  1024, false);
> +
> +  DirListingMetadata list = ms.listChildren(parent);
> +  assertTrue("MetadataStore falsely reports authoritative empty list",
> +  list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +} finally {
> +  fs.delete(parent, true);
> +}
> +  }
>  }
> {code}






[jira] [Updated] (HADOOP-14524) Make CryptoCodec Closeable so it can be cleaned up proactively

2017-06-13 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14524:
---
Description: 
See HADOOP-14523 for motivation. Credit to [~mi...@cloudera.com] for reporting 
initially there.

Basically, the {{CryptoCodec}} class is not a Closeable, but the 
{{OpensslAesCtrCryptoCodec}} implementation of it contains a closeable member 
(the Random object). Currently cleanup is left to {{finalize()}}, which 
depends on when a full GC is run. This creates problems if 
{{OpensslAesCtrCryptoCodec}} is used with {{OsSecureRandom}}, which could let 
the OS run out of FDs on {{/dev/urandom}} if too many codecs are created.

  was:
See HADOOP-14523 for motivation. Credit to [~mi...@cloudera.com] for reporting 
initially there.

Basically, the CryptoCodec class is not a closeable, but the 
OpensslAesCtrCryptoCodec implementation of it contains a closeable member (the 
Random object). Currently it is left for {{finalize()}} to clean up, this would 
create problems if OpensslAesCtrCryptoCodec is used with OsSecureRandom, which 
could let OS run out of FDs on {{/dev/urandom}} if too many codecs created.


> Make CryptoCodec Closeable so it can be cleaned up proactively
> --
>
> Key: HADOOP-14524
> URL: https://issues.apache.org/jira/browse/HADOOP-14524
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> See HADOOP-14523 for motivation. Credit to [~mi...@cloudera.com] for 
> reporting initially there.
> Basically, the {{CryptoCodec}} class is not a Closeable, but the 
> {{OpensslAesCtrCryptoCodec}} implementation of it contains a closeable member 
> (the Random object). Currently cleanup is left to {{finalize()}}, which 
> depends on when a full GC is run. This creates problems if 
> {{OpensslAesCtrCryptoCodec}} is used with {{OsSecureRandom}}, which could let 
> the OS run out of FDs on {{/dev/urandom}} if too many codecs are created.






[jira] [Commented] (HADOOP-14521) KMS client needs retry logic

2017-06-13 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048629#comment-16048629
 ] 

Xiao Chen commented on HADOOP-14521:


I feel printing the information in the log message will at least help people 
debug. So please update the log message in the next rev if you agree.

+1 from me once that's addressed. Thanks for the work here [~shahrs87] and 
Daryn.

> KMS client needs retry logic
> 
>
> Key: HADOOP-14521
> URL: https://issues.apache.org/jira/browse/HADOOP-14521
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, 
> HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, 
> HDFS-11804-trunk-6.patch, HDFS-11804-trunk-7.patch, HDFS-11804-trunk.patch
>
>
> The kms client appears to have no retry logic – at all.  It's completely 
> decoupled from the ipc retry logic.  This has major impacts if the KMS is 
> unreachable for any reason, including but not limited to network connection 
> issues, timeouts, the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although oozie resubmit logic should mask it
> # Non-oozie launchers may experience higher failure rates if they do not 
> already have retry logic.
> # Tasks reading EZ files will fail, probably be masked by framework reattempts
> # EZ file creation fails after creating a 0-length file – client receives 
> EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail
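The general retry pattern under discussion (sweep the configured providers, with a bounded number of retry sweeps) can be sketched as follows. This is a hedged plain-JDK illustration, not the Hadoop KMS client API: {{KmsOp}} and {{tryProviders}} are illustrative names.

```java
import java.io.IOException;
import java.util.List;

// Hedged sketch of retrying an operation across multiple KMS providers.
public class KmsRetrySketch {

    interface KmsOp<T> {
        T run(String providerUrl) throws IOException;
    }

    static <T> T tryProviders(List<String> providers, int maxRetries, KmsOp<T> op)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            for (String url : providers) {
                try {
                    return op.run(url);  // success on any provider wins
                } catch (IOException e) {
                    last = e;            // remember failure, try next provider
                }
            }
            // a real client would back off here before the next sweep
        }
        if (last == null) {
            throw new IOException("no providers configured");
        }
        throw last;                      // all providers exhausted
    }

    public static void main(String[] args) throws IOException {
        // First provider always fails; second succeeds.
        String result = tryProviders(List.of("kms1", "kms2"), 1, url -> {
            if (url.equals("kms1")) throw new IOException("unreachable");
            return "ok from " + url;
        });
        System.out.println(result); // ok from kms2
    }
}
```

The comment thread's point about {{maxNumRetries}} versus {{providers.length}} maps to how many full sweeps of the inner loop are attempted before giving up.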






[jira] [Comment Edited] (HADOOP-14299) Hadoop Renew Thread for proxy users

2017-06-13 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048607#comment-16048607
 ] 

Hongyuan Li edited comment on HADOOP-14299 at 6/14/17 2:07 AM:
---

Use {{UserGroupInformation#loginUserFromKeytabAndReturnUGI}} to get a 
{{UserGroupInformation}} object, then use that object to perform actions via 
{{UserGroupInformation.doAs()}}.


was (Author: hongyuan li):
use UserGroupInformation#loginUserFromKeytabAndReturnUGI got a Subject object, 
use this object to do actions,

> Hadoop Renew Thread for proxy users
> ---
>
> Key: HADOOP-14299
> URL: https://issues.apache.org/jira/browse/HADOOP-14299
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently Hadoop Client has a separate renew thread which is created only for 
> Authentication type Kerberos and not for Proxy. So for proxy users, a yarn 
> client monitoring a long running job will fail after initial ticket lifetime 
> with GSS initiate failed unless there is a manual re-kinit. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1030
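What a renew thread for proxy users could look like is sketched below with plain JDK scheduling. This is a hedged illustration only: {{relogin()}} is a placeholder for the real re-kinit / UGI relogin call, and the class names are hypothetical, not Hadoop's.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch: a daemon task that re-logins before the ticket lifetime
// expires, so long-running clients do not need a manual re-kinit.
public class ReloginSketch {
    static final AtomicInteger RELOGINS = new AtomicInteger();

    static void relogin() {               // stands in for the real UGI relogin
        RELOGINS.incrementAndGet();
    }

    static ScheduledExecutorService startRenewThread(long periodMillis) {
        ScheduledExecutorService renewer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "ticket-renewer");
                t.setDaemon(true);        // do not block JVM shutdown
                return t;
            });
        renewer.scheduleAtFixedRate(ReloginSketch::relogin,
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return renewer;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService renewer = startRenewThread(50);
        Thread.sleep(300);                // let a few renewals happen
        renewer.shutdownNow();
        System.out.println(RELOGINS.get() > 0);
    }
}
```

In real usage the period would be derived from the ticket lifetime, and the task would call the keytab-based relogin rather than a counter.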






[jira] [Commented] (HADOOP-14299) Hadoop Renew Thread for proxy users

2017-06-13 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048607#comment-16048607
 ] 

Hongyuan Li commented on HADOOP-14299:
--

use UserGroupInformation#loginUserFromKeytabAndReturnUGI got a Subject object, 
use this object to do actions,

> Hadoop Renew Thread for proxy users
> ---
>
> Key: HADOOP-14299
> URL: https://issues.apache.org/jira/browse/HADOOP-14299
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently Hadoop Client has a separate renew thread which is created only for 
> Authentication type Kerberos and not for Proxy. So for proxy users, a yarn 
> client monitoring a long running job will fail after initial ticket lifetime 
> with GSS initiate failed unless there is a manual re-kinit. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1030






[jira] [Comment Edited] (HADOOP-14521) KMS client needs retry logic

2017-06-13 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048566#comment-16048566
 ] 

Rushabh S Shah edited comment on HADOOP-14521 at 6/14/17 1:08 AM:
--

bq. We can either update the message, or update the maxNumRetries to 
providers.length to make sure we try at least once on all providers. Either way 
feels okay to me.
I wouldn't worry much about the log message not being entirely accurate. 
If the user sets the number of retries lower than the number of providers, 
then they know the trade-offs involved.
{quote}
Slightly prefer the latter since by assumption all providers should work and we 
don't have retry delay for the whole sweep.
{quote}
Actually, I'd prefer not to override a config value that the user 
intentionally set.
If you strongly feel about updating the log message, I can change the log 
message in the next revision and add the numRetries and provider count. 
Let me know your thoughts.
[~xiaochen]: As always thanks a lot for your time reviewing all the updated 
patches.


was (Author: shahrs87):
bq. We can either update the message, or update the maxNumRetries to 
providers.length to make sure we try at least once on all providers. Either way 
feels okay to me.
I wouldn't worry much about the error message not being entirely accurate. 
If the user is making the number of retries less than the number of providers 
then he/she knows the problems associated with it.
{quote}
Slightly prefer the latter since by assumption all providers should work and we 
don't have retry delay for the whole sweep.
{quote}
Actually I don't prefer overriding the config value which user intentionally 
overrode it.
If you strongly feel about updating the log message, then I can change the log 
message in the next revision and will add the numRetries and providers length. 
Let me know you thoughts.
[~xiaochen]: As always thanks a lot for your time reviewing all the updated 
patches.

> KMS client needs retry logic
> 
>
> Key: HADOOP-14521
> URL: https://issues.apache.org/jira/browse/HADOOP-14521
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, 
> HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, 
> HDFS-11804-trunk-6.patch, HDFS-11804-trunk-7.patch, HDFS-11804-trunk.patch
>
>
> The kms client appears to have no retry logic – at all.  It's completely 
> decoupled from the ipc retry logic.  This has major impacts if the KMS is 
> unreachable for any reason, including but not limited to network connection 
> issues, timeouts, the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although oozie resubmit logic should mask it
> # Non-oozie launchers may experience higher failure rates if they do not 
> already have retry logic.
> # Tasks reading EZ files will fail, probably be masked by framework reattempts
> # EZ file creation fails after creating a 0-length file – client receives 
> EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail






[jira] [Commented] (HADOOP-14521) KMS client needs retry logic

2017-06-13 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048566#comment-16048566
 ] 

Rushabh S Shah commented on HADOOP-14521:
-

bq. We can either update the message, or update the maxNumRetries to 
providers.length to make sure we try at least once on all providers. Either way 
feels okay to me.
I wouldn't worry much about the error message not being entirely accurate. 
If the user sets the number of retries lower than the number of providers, 
then they know the trade-offs involved.
{quote}
Slightly prefer the latter since by assumption all providers should work and we 
don't have retry delay for the whole sweep.
{quote}
Actually, I'd prefer not to override a config value that the user 
intentionally set.
If you strongly feel about updating the log message, I can change the log 
message in the next revision and add the numRetries and provider count. 
Let me know your thoughts.
[~xiaochen]: As always thanks a lot for your time reviewing all the updated 
patches.

> KMS client needs retry logic
> 
>
> Key: HADOOP-14521
> URL: https://issues.apache.org/jira/browse/HADOOP-14521
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, 
> HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, 
> HDFS-11804-trunk-6.patch, HDFS-11804-trunk-7.patch, HDFS-11804-trunk.patch
>
>
> The kms client appears to have no retry logic – at all.  It's completely 
> decoupled from the ipc retry logic.  This has major impacts if the KMS is 
> unreachable for any reason, including but not limited to network connection 
> issues, timeouts, the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although oozie resubmit logic should mask it
> # Non-oozie launchers may experience higher failure rates if they do not 
> already have retry logic.
> # Tasks reading EZ files will fail, probably be masked by framework reattempts
> # EZ file creation fails after creating a 0-length file – client receives 
> EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail






[jira] [Comment Edited] (HADOOP-14450) ADLS Python client inconsistent when used in tandem with AdlFileSystem

2017-06-13 Thread Atul Sikaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048510#comment-16048510
 ] 

Atul Sikaria edited comment on HADOOP-14450 at 6/13/17 11:08 PM:
-

The Python tool with the fix is released now. Run this to upgrade the 
command-line Python tool:
{{pip install azure-datalake-store --upgrade}}




was (Author: asikaria):
The pythin tool with fix is released now. Run this to upgrade the command-line 
python tool:
run pip install azure-datalake-store --upgrade



> ADLS Python client inconsistent when used in tandem with AdlFileSystem
> --
>
> Key: HADOOP-14450
> URL: https://issues.apache.org/jira/browse/HADOOP-14450
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Reporter: Sailesh Mukil
>Assignee: Atul Sikaria
>  Labels: infrastructure
>
> Impala uses the AdlFileSystem connector to talk to ADLS. As a part of the 
> Impala tests, we drop tables and verify that the files belonging to that 
> table have been dropped for all filesystems that Impala supports. These tests 
> however, fail with ADLS.
> If I use the Hadoop ADLS connector to delete a file, and then list the parent 
> directory of that file using the above Python client within the second, the 
> client still says that the file is available in ADLS.
> This is the Python client from Microsoft that we're using in our testing:
> https://github.com/Azure/azure-data-lake-store-python
> Their release notes say that it's still a "pre-release preview":
> https://github.com/Azure/azure-data-lake-store-python/releases
> Questions for the ADLS folks:
> Is this a known issue? If so, will it be fixed soon?
> Or is this expected behavior?
> I'm able to deterministically reproduce it in my tests, with Impala on ADLS.
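Pending an answer from the ADLS folks, one common test-side workaround for this kind of delete-then-list lag is to poll the listing until the file disappears or a deadline passes. This sketch is not from the thread and the names ({{ConsistencyPoll}}, {{awaitCondition}}) are illustrative; in a real test, the condition would wrap the Python client's listing call.

```java
import java.util.function.Supplier;

// Hedged sketch: poll a condition (e.g. "file no longer listed") with a
// bounded deadline, to absorb brief cross-client visibility lag.
public class ConsistencyPoll {

    // Returns true once check.get() is true, or false after timeoutMillis.
    static boolean awaitCondition(Supplier<Boolean> check, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (check.get()) {
                return true;
            }
            Thread.sleep(50);   // brief pause between listing attempts
        }
        return check.get();     // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulated listing that stops showing the file after ~100 ms.
        boolean gone = awaitCondition(
            () -> System.currentTimeMillis() - start > 100, 1000);
        System.out.println(gone); // true
    }
}
```

This only masks the symptom for tests; it does not answer whether the observed behavior is a bug or expected eventual consistency.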






[jira] [Commented] (HADOOP-14450) ADLS Python client inconsistent when used in tandem with AdlFileSystem

2017-06-13 Thread Atul Sikaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048510#comment-16048510
 ] 

Atul Sikaria commented on HADOOP-14450:
---

The Python tool with the fix is released now. Run this to upgrade the 
command-line Python tool:
{{pip install azure-datalake-store --upgrade}}



> ADLS Python client inconsistent when used in tandem with AdlFileSystem
> --
>
> Key: HADOOP-14450
> URL: https://issues.apache.org/jira/browse/HADOOP-14450
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Reporter: Sailesh Mukil
>Assignee: Atul Sikaria
>  Labels: infrastructure
>
> Impala uses the AdlFileSystem connector to talk to ADLS. As a part of the 
> Impala tests, we drop tables and verify that the files belonging to that 
> table have been dropped for all filesystems that Impala supports. These tests 
> however, fail with ADLS.
> If I use the Hadoop ADLS connector to delete a file, and then list the parent 
> directory of that file using the above Python client within the second, the 
> client still says that the file is available in ADLS.
> This is the Python client from Microsoft that we're using in our testing:
> https://github.com/Azure/azure-data-lake-store-python
> Their release notes say that it's still a "pre-release preview":
> https://github.com/Azure/azure-data-lake-store-python/releases
> Questions for the ADLS folks:
> Is this a known issue? If so, will it be fixed soon?
> Or is this expected behavior?
> I'm able to deterministically reproduce it in my tests, with Impala on ADLS.






[jira] [Commented] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048474#comment-16048474
 ] 

Hadoop QA commented on HADOOP-14523:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
23s{color} | {color:red} hadoop-common-project/hadoop-common in trunk has 19 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14523 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872897/HADOOP-14523.02.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 29a9328959eb 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 036a24b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a big 

[jira] [Commented] (HADOOP-14488) s3guard listStatus fails after renaming file into directory

2017-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048428#comment-16048428
 ] 

Hadoop QA commented on HADOOP-14488:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
11s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
38s{color} | {color:red} hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 
new + 6 unchanged - 0 fixed = 7 total (was 6) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
40s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872896/HADOOP-14488-HADOOP-13345-004.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 3fef873f85aa 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HADOOP-13345 / 6a06ed8 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12527/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12527/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12527/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12527/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> s3guard listStatus fails after renaming file into directory
> ---
>
> Key: HADOOP-14488
> URL: https://issues.apache.org/jira/browse/HADOOP-14488
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Assignee: Sean Mackrory
>

[jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HADOOP-14523:

Status: Patch Available  (was: In Progress)

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a big workload. Two 
> excerpts from the analysis done with jxray (www.jxray.com) are given below. 
> It turns out that nearly half of live memory is taken by objects awaiting 
> finalization, and the biggest offender among them is class 
> OpensslAesCtrCryptoCodec:
> {code}
>   401,189K (39.7%) (1 of sun.misc.Cleaner)
>  <-- Java Static: sun.misc.Cleaner.first
>   400,572K (39.6%) (14001 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- 
> sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: 
> sun.misc.Cleaner.first
>   270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
>  <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- 
> j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next 
> <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
> -
>   102,232K (10.1%) (1 of j.l.r.Finalizer)
>  <-- Java Static: java.lang.ref.Finalizer.unfinalized
>   101,676K (10.1%) (8613 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> java.util.zip.ZipFile$ZipFileInflaterInputStream, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: 
> java.lang.ref.Finalizer.unfinalized
> {code}
> This heap dump was taken using 'jmap -dump:live', which forces the JVM to run 
> full GC before dumping the heap. So we are already looking at the heap right 
> after GC, and yet all these unfinalized objects are there. I think this 
> happens because the JVM always runs only one finalization thread, and thus 
> the queue of objects that need finalization may get processed too slowly. My 
> understanding is that finalization works as follows:
> 1. When GC runs, it discovers that object x that overrides finalize() is 
> unreachable.
> 2. x is added to the finalization queue. So technically x is still reachable, 
> it occupies memory, and _all the objects that it references stay in memory as 
> well_.
> 3. The finalization thread processes objects from the finalization queue 
> serially, thus x may stay in memory for a long time.
> 4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory 
> for a long time, it's now in the Old Gen of the heap, so only a full GC can 
> clean it up.
> 5. When full GC finally occurs, x gets cleaned up.
> So finalization is formally reliable, but in practice it's quite possible 
> that a lot of unreachable but unfinalized objects flood the memory. I guess 
> we are seeing all these OpensslAesCtrCryptoCodec objects when they are in 
> phase 3 above. And the really bad thing is that these objects in turn keep in 
> memory a whole lot of other stuff, in particular JobConf objects. Such a 
> JobConf has nothing to do with finalization, yet the GC cannot release it 
> until the corresponding OpensslAesCtrCryptoCodec is gone.
> Here is OpensslAesCtrCryptoCodec.finalize() method with my comments:
> {code}
> protected void finalize() throws Throwable {
>   try {
> Closeable r = (Closeable) this.random;
> r.close();  // Relevant only when (random instanceof OsSecureRandom == 
> true)
>   } catch (ClassCastException e) {
>   }
>   super.finalize();  // Not needed, no finalize() in superclasses
> }
> {code}
> So finalize() in this class, which may keep a whole tree of objects in 
> memory, is relevant only when this codec is configured to use the OsSecureRandom 
> class. The latter reads random bytes from the configured file, and needs 
> finalization to close the input stream associated with that file.
> The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and 
> add it to the only class from this "family" that really needs it, 
> OsSecureRandom. That will ensure that only OsSecureRandom objects (if/when 
> they are used) stay in memory awaiting finalization, and no other, irrelevant 
> objects.
> Note that this solution means that streams are still closed lazily. This, in 
> principle, may cause its own problems. So the most reliable fix would be to 
> call 
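The fix suggested in the description above (move cleanup out of finalize() and into an explicit close()) can be sketched roughly as follows. This is an illustrative stand-in, not the actual Hadoop code: class names, fields, and methods here are hypothetical, and the real OsSecureRandom closes a stream on /dev/urandom rather than flipping a flag.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for OsSecureRandom: cleanup happens in an explicit
// close() the caller invokes, so no object sits on the finalization queue.
class SketchSecureRandom implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        // In the real OsSecureRandom this would close the /dev/urandom stream.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for a Closeable CryptoCodec: closing the codec
// deterministically releases its Random, leaving nothing for finalize().
class SketchCryptoCodec implements Closeable {
    private final SketchSecureRandom random = new SketchSecureRandom();

    @Override
    public void close() throws IOException {
        random.close();
    }

    SketchSecureRandom getRandom() {
        return random;
    }
}
```

A caller would then manage the codec with try-with-resources, so the underlying stream is released as soon as the codec goes out of scope instead of whenever a full GC and the single finalization thread get around to it.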

[jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HADOOP-14523:

Attachment: HADOOP-14523.02.patch

Fixed checkstyle. The test that failed on the previous patch looks unrelated.

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>

[jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HADOOP-14523:

Status: In Progress  (was: Patch Available)

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>

[jira] [Updated] (HADOOP-14488) s3guard listStatus fails after renaming file into directory

2017-06-13 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14488:
---
Attachment: HADOOP-14488-HADOOP-13345-004.patch

So after digging in to make sure this dealt correctly with nested directories, 
I found a few other shortcomings in the InconsistentAmazonS3Client as 
currently used.
* There are separate entry points for individual and bulk deletes - when I 
added tests for delete tracking I only started intercepting individual deletes.
* I made the isChild() function usable for both immediate children and all 
other descendants, and corrected the logic for adding prefixes too.
* Added a test that the client returns the same number of object summaries and 
common prefixes before and after (but still during the delay) a delete.

The test is really a meta-test and doesn't belong in this class. It's just 
testing correct delete behavior in the Inconsistent client, so if it belongs 
anywhere, it's in ITestS3GuardDeleteTracking, which is added in my patches for 
HADOOP-14457. I suspect this patch will otherwise be ready to commit before 
that one, so once we're at that point I'll plan on switching the order of my 
local patches and moving testInconsistentS3ClientDeletes to 
ITestS3GuardDeleteTracking.
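The isChild() generalization mentioned above might look roughly like this. This is a standalone sketch with hypothetical names; the actual InconsistentAmazonS3Client operates on S3 keys and prefixes and may differ in detail.

```java
// Hypothetical helper distinguishing immediate children from arbitrary
// descendants of a path; names and semantics are illustrative only.
final class PathRelations {

    // True when 'candidate' lies anywhere under 'parent'.
    static boolean isDescendant(String parent, String candidate) {
        return candidate.startsWith(parent + "/");
    }

    // True when 'candidate' sits directly inside 'parent' (no deeper nesting).
    static boolean isImmediateChild(String parent, String candidate) {
        if (!isDescendant(parent, candidate)) {
            return false;
        }
        String remainder = candidate.substring(parent.length() + 1);
        return !remainder.contains("/");
    }
}
```

The startsWith check includes the trailing slash so that a sibling with a shared name prefix (e.g. "workspace" under "work") is not misclassified as a descendant.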

> s3guard listStatus fails after renaming file into directory
> ---
>
> Key: HADOOP-14488
> URL: https://issues.apache.org/jira/browse/HADOOP-14488
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-14488-HADOOP-13345-001.patch, 
> HADOOP-14488-HADOOP-13345-002.patch, HADOOP-14488-HADOOP-13345-003.patch, 
> HADOOP-14488-HADOOP-13345-004.patch, output.txt
>
>
> Running scala integration test with inconsistent s3 client & local DDB enabled
> {code}
> fs.rename("work/task-00/part-00", work)
> fs.listStatus(work)
> {code}
> The listStatus(work) call fails with a message about the childStatus not being a 
> child of the parent. 
> Hypothesis: rename isn't updating the child path entry



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-14488) s3guard listStatus fails after renaming file into directory

2017-06-13 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory reassigned HADOOP-14488:
--

Assignee: Sean Mackrory

> s3guard listStatus fails after renaming file into directory
> ---
>
> Key: HADOOP-14488
> URL: https://issues.apache.org/jira/browse/HADOOP-14488
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Assignee: Sean Mackrory
>Priority: Blocker
> Attachments: HADOOP-14488-HADOOP-13345-001.patch, 
> HADOOP-14488-HADOOP-13345-002.patch, HADOOP-14488-HADOOP-13345-003.patch, 
> HADOOP-14488-HADOOP-13345-004.patch, output.txt
>
>
> Running scala integration test with inconsistent s3 client & local DDB enabled
> {code}
> fs.rename("work/task-00/part-00", work)
> fs.listStatus(work)
> {code}
> The listStatus(work) call fails with a message about the childStatus not being a 
> child of the parent. 
> Hypothesis: rename isn't updating the child path entry






[jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files

2017-06-13 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048376#comment-16048376
 ] 

Sean Mackrory commented on HADOOP-14457:


Had a lot of back-and-forth about this offline with [~fabbri]. Notes are below. 
Would like some input from [~liuml07] / [~ste...@apache.org] about this since 
either solution involves adding to the MetadataStore interface. The gist is 
this:

- I think it's reasonable to only do this if authoritative mode is enabled.
- I'm concerned about code complexity: having the DynamoDB implementation 
add all ancestors, and then replicating the same thing under certain conditions 
in the FS for implementations that don't do this. I suspect future 
implementations are more likely than not to require the tree to be traversable 
from the root, as DynamoDB does. I'm actually not clear why Local doesn't have 
this same problem, unless it's just that we don't care about doing a full-table 
scan on an in-memory hashmap.
- [~fabbri] is concerned about performance and wants the MetadataStore 
interface to be as simple as possible to make future implementations easy to 
add, so having more of the logic live in the FS when possible is desirable.

There are 2 options we discussed:

- Add putRecursive to the metadata store interface and use that in 
finishedWrite() instead of having it loop up the directory hierarchy itself. 
For Dynamo or similar implementations, it can just wrap put. Mkdir might be 
able to use it, although it already traverses the tree making sure none of the 
parents are a file anyway, similar to what create() should eventually do.

- Go with the change as-is, but wrap the finishedWrite changes in a 
capabilities API so that it only adds ancestors if the parent needs them and 
won't do it itself.

Personally, I prefer the former. It addresses my concerns about adding 
recursive operations at multiple points in the code (I think it's more likely 
than not that any future implementations will have the same need for DynamoDB 
to have the entire tree be traversable from root anyway), the FS doesn't need 
to decide whether or not it should recurse because it can always assume the 
underlying operation will do it the first time it's called, and the one point 
that operation is implemented can take advantage of any store-specific 
optimizations rather than something unaware of the storage layer just making 
repeated calls.

On the other hand, if we feel like we're going to eventually need a 
capabilities API anyway, we may as well add it now and use that to solve this 
problem too.

Any thoughts, [~ste...@apache.org] / [~liuml07]?

Any nuances I didn't call out here, [~fabbri]?
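For reference, the first option (adding a putRecursive to the MetadataStore interface) could be sketched like this. The method name comes from the discussion above; everything else is a simplified, hypothetical stand-in for the real interface, which deals in PathMetadata rather than plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical stand-in for a MetadataStore: putRecursive
// records an entry plus every ancestor directory, keeping the tree
// traversable from the root the way DynamoDB requires.
class SketchMetadataStore {
    private final Set<String> entries = new HashSet<>();

    void put(String path) {
        entries.add(path);
    }

    // Walks up the directory hierarchy so every ancestor is present.
    void putRecursive(String path) {
        put(path);
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            path = path.substring(0, slash);
            put(path);
            slash = path.lastIndexOf('/');
        }
    }

    boolean contains(String path) {
        return entries.contains(path);
    }
}
```

A store like DynamoDB that already maintains ancestors could implement putRecursive as a plain wrapper around put, while stores that don't would get the ancestor walk here instead of in the FS layer.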

> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch, 
> HADOOP-14457-HADOOP-13345.002.patch, HADOOP-14457-HADOOP-13345.003.patch, 
> HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch, 
> HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch, 
> HADOOP-14457-HADOOP-13345.008.patch, HADOOP-14457-HADOOP-13345.009.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. 
> LocalMetadataStore will sometimes erroneously report that a directory is 
> empty with isAuthoritative = true when it *definitely* has children the 
> metadatastore should know about. It doesn't appear to happen if the children 
> are just directory. The fact that it's returning an empty listing is 
> concerning, but the fact that it says it's authoritative *might* be a second 
> bug.
> {code}
> diff --git 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>  
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>}
>  
>@VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>  return metadataStore;
>}
>  
> diff --git 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
>  
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- 
> 

[jira] [Comment Edited] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files

2017-06-13 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048376#comment-16048376
 ] 

Sean Mackrory edited comment on HADOOP-14457 at 6/13/17 8:40 PM:
-

Had a lot of back-and-forth about this offline with [~fabbri]. Notes are below. 
Would like some input from [~liuml07] / [~ste...@apache.org] about this since 
either solution involves adding to the MetadataStore interface. The gist is 
this:

- I think it's reasonable to only do this if authoritative mode is enabled.
- I'm concerned about code complexity: having the DynamoDB implementation 
add all ancestors, and then replicating the same thing under certain conditions 
in the FS for implementations that don't do this. I suspect future 
implementations are more likely than not to require the tree to be traversable 
from the root, as DynamoDB does. I'm actually not clear why Local doesn't have 
this same problem, unless it's just that we don't care about doing a full-table 
scan on an in-memory hashmap.
- [~fabbri] is concerned about performance and wants the MetadataStore 
interface to be as simple as possible to make future implementations easy to 
add, so having more of the logic live in the FS when possible (but only execute 
when necessary) is desirable.

There are 2 options we discussed:

- Add putRecursive to the metadata store interface and use that in 
finishedWrite() instead of having it loop up the directory hierarchy itself. 
For Dynamo or similar implementations, it can just wrap put. Mkdir might be 
able to use it, although it already traverses the tree making sure none of the 
parents are a file anyway, similar to what create() should eventually do.

- Go with the change as-is, but wrap the finishedWrite changes in a 
capabilities API so that it only adds ancestors if the parent needs them and 
won't do it itself.

Personally, I prefer the former. It addresses my concerns about adding 
recursive operations at multiple points in the code (I think it's more likely 
than not that any future implementations will have the same need for DynamoDB 
to have the entire tree be traversable from root anyway), the FS doesn't need 
to decide whether or not it should recurse because it can always assume the 
underlying operation will do it the first time it's called, and the one point 
that operation is implemented can take advantage of any store-specific 
optimizations rather than something unaware of the storage layer just making 
repeated calls.

On the other hand, if we feel like we're going to eventually need a 
capabilities API anyway, we may as well add it now and use that to solve this 
problem too.

Any thoughts, [~ste...@apache.org] / [~liuml07]?

Any nuances I didn't call out here, [~fabbri]?



[jira] [Commented] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048311#comment-16048311
 ] 

Hadoop QA commented on HADOOP-14523:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
23s{color} | {color:red} hadoop-common-project/hadoop-common in trunk has 19 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 56s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestRaceWhenRelogin |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14523 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872889/HADOOP-14523.01.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4c57da710495 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8633ef8 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts 

[jira] [Commented] (HADOOP-14503) Make RollingAverages a mutable metric

2017-06-13 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048269#comment-16048269
 ] 

Hanisha Koneru commented on HADOOP-14503:
-

Thanks for committing the patch, [~arpitagarwal].
I will post a branch-2 patch soon.

> Make RollingAverages a mutable metric
> -
>
> Key: HADOOP-14503
> URL: https://issues.apache.org/jira/browse/HADOOP-14503
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Fix For: 3.0.0-alpha4
>
> Attachments: HADOOP-14503.001.patch, HADOOP-14503.002.patch, 
> HADOOP-14503.003.patch, HADOOP-14503.004.patch, HADOOP-14503.005.patch, 
> HADOOP-14503.006.patch, HADOOP-14503.007.patch
>
>
> RollingAverages metric extends on MutableRatesWithAggregation metric and 
> maintains a group of rolling average metrics. This class should be allowed to 
> register as a metric with the MetricSystem.






[jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HADOOP-14523:

Status: Patch Available  (was: Open)

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a big workload. Two 
> excerpts from the analysis done with jxray (www.jxray.com) are given below. 
> It turns out that nearly half of live memory is taken by objects awaiting 
> finalization, and the biggest offender among them is class 
> OpensslAesCtrCryptoCodec:
> {code}
>   401,189K (39.7%) (1 of sun.misc.Cleaner)
>  <-- Java Static: sun.misc.Cleaner.first
>   400,572K (39.6%) (14001 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- 
> sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: 
> sun.misc.Cleaner.first
>   270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
>  <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- 
> j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next 
> <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
> -
>   102,232K (10.1%) (1 of j.l.r.Finalizer)
>  <-- Java Static: java.lang.ref.Finalizer.unfinalized
>   101,676K (10.1%) (8613 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> java.util.zip.ZipFile$ZipFileInflaterInputStream, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: 
> java.lang.ref.Finalizer.unfinalized
> {code}
> This heap dump was taken using 'jmap -dump:live', which forces the JVM to run 
> full GC before dumping the heap. So we are already looking at the heap right 
> after GC, and yet all these unfinalized objects are there. I think this 
> happens because the JVM always runs only one finalization thread, and thus 
> the queue of objects that need finalization may get processed too slowly. My 
> understanding is that finalization works as follows:
> 1. When GC runs, it discovers that object x that overrides finalize() is 
> unreachable.
> 2. x is added to the finalization queue. So technically x is still reachable, 
> it occupies memory, and _all the objects that it references stay in memory as 
> well_.
> 3. The finalization thread processes objects from the finalization queue 
> serially, thus x may stay in memory for a long time.
> 4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory 
> for a long time, it's now in the Old Gen of the heap, so only a full GC can clean it 
> up.
> 5. When full GC finally occurs, x gets cleaned up.
> So finalization is formally reliable, but in practice it's quite possible 
> that a lot of unreachable, but unfinalized objects flood the memory. I guess 
> we are seeing all these OpensslAesCtrCryptoCodec objects when they are in 
> phase 3 above. And the really bad thing is that these objects in turn keep in 
> memory a whole lot of other stuff, in particular JobConf objects. Such a 
> JobConf has nothing to do with finalization, yet the GC cannot release it 
> until the corresponding OpensslAesCtrCryptoCodec's is gone.
> Here is OpensslAesCtrCryptoCodec.finalize() method with my comments:
> {code}
> protected void finalize() throws Throwable {
>   try {
> Closeable r = (Closeable) this.random;
> r.close();  // Relevant only when (random instanceof OsSecureRandom == 
> true)
>   } catch (ClassCastException e) {
>   }
>   super.finalize();  // Not needed, no finalize() in superclasses
> }
> {code}
> So, finalize() in this class, which may keep a whole tree of objects in 
> memory, is relevant only when this codec is configured to use the 
> OsSecureRandom class. The latter reads random bytes from the configured file, and needs 
> finalization to close the input stream associated with that file.
> The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and 
> add it to the only class from this "family" that really needs it, 
> OsSecureRandom. That will ensure that only OsSecureRandom objects (if/when 
> they are used) stay in memory awaiting finalization, and no other, irrelevant 
> objects.
> Note that this solution means that streams are still closed lazily. This, in 
> principle, may cause its own problems. So the most reliable fix would be to 
> call OsSecureRandom.close() explicitly when 
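The explicit-close direction suggested above could look roughly like the following toy sketch. All class names here (ToyOsSecureRandom, ToyCryptoCodec) are illustrative stand-ins, not Hadoop's real classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the fix: only the random source that actually owns an OS file
// stream implements close(); the codec no longer needs finalize() at all and
// instead exposes an explicit close() that callers invoke deterministically.
class ToyOsSecureRandom implements Closeable {
    private boolean streamOpen = true;  // stands in for the /dev/urandom stream

    @Override public void close() {
        streamOpen = false;             // close the underlying input stream
    }

    boolean isStreamOpen() { return streamOpen; }
}

class ToyCryptoCodec implements AutoCloseable {
    private final Object random;

    ToyCryptoCodec(Object random) { this.random = random; }

    // No finalize() here: closing the codec closes the random source
    // if (and only if) it is Closeable, so nothing waits on the
    // single finalizer thread.
    @Override public void close() {
        if (random instanceof Closeable) {
            try {
                ((Closeable) random).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With AutoCloseable in place, callers can use try-with-resources, which releases the stream at a deterministic point instead of whenever the finalizer thread gets around to it.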

[jira] [Updated] (HADOOP-14523) OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

2017-06-13 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HADOOP-14523:

Attachment: HADOOP-14523.01.patch

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> -
>
> Key: HADOOP-14523
> URL: https://issues.apache.org/jira/browse/HADOOP-14523
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HADOOP-14523.01.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a big workload. Two 
> excerpts from the analysis done with jxray (www.jxray.com) are given below. 
> It turns out that nearly half of live memory is taken by objects awaiting 
> finalization, and the biggest offender among them is class 
> OpensslAesCtrCryptoCodec:
> {code}
>   401,189K (39.7%) (1 of sun.misc.Cleaner)
>  <-- Java Static: sun.misc.Cleaner.first
>   400,572K (39.6%) (14001 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- 
> sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: 
> sun.misc.Cleaner.first
>   270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
>  <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- 
> j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next 
> <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
> -
>   102,232K (10.1%) (1 of j.l.r.Finalizer)
>  <-- Java Static: java.lang.ref.Finalizer.unfinalized
>   101,676K (10.1%) (8613 of 
> org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, 
> java.util.zip.ZipFile$ZipFileInflaterInputStream, 
> org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
>  <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: 
> java.lang.ref.Finalizer.unfinalized
> {code}
> This heap dump was taken using 'jmap -dump:live', which forces the JVM to run 
> full GC before dumping the heap. So we are already looking at the heap right 
> after GC, and yet all these unfinalized objects are there. I think this 
> happens because the JVM always runs only one finalization thread, and thus 
> the queue of objects that need finalization may get processed too slowly. My 
> understanding is that finalization works as follows:
> 1. When GC runs, it discovers that object x that overrides finalize() is 
> unreachable.
> 2. x is added to the finalization queue. So technically x is still reachable, 
> it occupies memory, and _all the objects that it references stay in memory as 
> well_.
> 3. The finalization thread processes objects from the finalization queue 
> serially, thus x may stay in memory for a long time.
> 4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory 
> for a long time, it's now in the Old Gen of the heap, so only a full GC can clean it 
> up.
> 5. When full GC finally occurs, x gets cleaned up.
> So finalization is formally reliable, but in practice it's quite possible 
> that a lot of unreachable, but unfinalized objects flood the memory. I guess 
> we are seeing all these OpensslAesCtrCryptoCodec objects when they are in 
> phase 3 above. And the really bad thing is that these objects in turn keep in 
> memory a whole lot of other stuff, in particular JobConf objects. Such a 
> JobConf has nothing to do with finalization, yet the GC cannot release it 
> until the corresponding OpensslAesCtrCryptoCodec's is gone.
> Here is OpensslAesCtrCryptoCodec.finalize() method with my comments:
> {code}
> protected void finalize() throws Throwable {
>   try {
> Closeable r = (Closeable) this.random;
> r.close();  // Relevant only when (random instanceof OsSecureRandom == 
> true)
>   } catch (ClassCastException e) {
>   }
>   super.finalize();  // Not needed, no finalize() in superclasses
> }
> {code}
> So, finalize() in this class, which may keep a whole tree of objects in 
> memory, is relevant only when this codec is configured to use the 
> OsSecureRandom class. The latter reads random bytes from the configured file, and needs 
> finalization to close the input stream associated with that file.
> The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and 
> add it to the only class from this "family" that really needs it, 
> OsSecureRandom. That will ensure that only OsSecureRandom objects (if/when 
> they are used) stay in memory awaiting finalization, and no other, irrelevant 
> objects.
> Note that this solution means that streams are still closed lazily. This, in 
> principle, may cause its own problems. So the most reliable fix would be to 
> call OsSecureRandom.close() explicitly when it's 

[jira] [Updated] (HADOOP-14525) org.apache.hadoop.io.Text Truncate

2017-06-13 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HADOOP-14525:
-
Description: 
For Apache Hive, VARCHAR fields are much slower than STRING fields when a 
precision (string length cap) is included.  Keep in mind that this precision is 
the number of UTF-8 characters in the string, not the number of bytes.

The general procedure is:

# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count N number of character code points
# Substring the {{String}} at the correct place
# Convert the String back into a byte array and populate the {{Text}} object

It would be great if the {{Text}} object could offer a truncate/substring 
method based on character count that did not require copying data around.  
Along the same lines, a "getCharacterLength()" method may also be useful to 
determine if the precision has been exceeded.

  was:
For Apache Hive, VARCHAR fields are much slower than STRING fields when a 
precision (string length cap) is included.  Keep in mind that this precision is 
the number of UTF-8 characters in the string, not the number of bytes.

The general procedure is:

# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count N number of character code points
# Substring the {{String}} at the correct place
# Convert the String back into a byte array and populate the {{Text}} object

It would be great if the {{Text}} object could offer a truncate/substring 
method based on character count that did not require copying data around


> org.apache.hadoop.io.Text Truncate
> --
>
> Key: HADOOP-14525
> URL: https://issues.apache.org/jira/browse/HADOOP-14525
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 2.8.1
>Reporter: BELUGA BEHR
>
> For Apache Hive, VARCHAR fields are much slower than STRING fields when a 
> precision (string length cap) is included.  Keep in mind that this precision 
> is the number of UTF-8 characters in the string, not the number of bytes.
> The general procedure is:
> # Load an entire byte buffer into a {{Text}} object
> # Convert it to a {{String}}
> # Count N number of character code points
> # Substring the {{String}} at the correct place
> # Convert the String back into a byte array and populate the {{Text}} object
> It would be great if the {{Text}} object could offer a truncate/substring 
> method based on character count that did not require copying data around.  
> Along the same lines, a "getCharacterLength()" method may also be useful to 
> determine if the precision has been exceeded.
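The copy-free truncation described above boils down to finding the byte offset after N code points directly in the UTF-8 buffer. The sketch below is hypothetical, not a real Text API; it assumes well-formed UTF-8 and derives each sequence length from the lead byte's high bits.

```java
import java.nio.charset.StandardCharsets;

class TextTruncate {
    // Walk the UTF-8 bytes, count code points, and return the byte offset at
    // which a Text-like object could simply cap its length field, with no
    // String conversion and no copying. Assumes valid UTF-8 input.
    static int utf8OffsetAfterChars(byte[] bytes, int len, int maxChars) {
        int offset = 0, chars = 0;
        while (offset < len && chars < maxChars) {
            int b = bytes[offset] & 0xFF;
            if (b < 0x80)       offset += 1;  // 1-byte (ASCII) sequence
            else if (b < 0xE0)  offset += 2;  // 2-byte sequence (lead 0xC0-0xDF)
            else if (b < 0xF0)  offset += 3;  // 3-byte sequence
            else                offset += 4;  // 4-byte sequence
            chars++;
        }
        return Math.min(offset, len);
    }
}
```

A getCharacterLength() along the same lines would run the identical walk over the whole buffer and return the code-point count instead of the offset.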






[jira] [Created] (HADOOP-14525) org.apache.hadoop.io.Text Truncate

2017-06-13 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HADOOP-14525:


 Summary: org.apache.hadoop.io.Text Truncate
 Key: HADOOP-14525
 URL: https://issues.apache.org/jira/browse/HADOOP-14525
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io
Affects Versions: 2.8.1
Reporter: BELUGA BEHR


For Apache Hive, VARCHAR fields are much slower than STRING fields when a 
precision (string length cap) is included.  Keep in mind that this precision is 
the number of UTF-8 characters in the string, not the number of bytes.

The general procedure is:

# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count N number of character code points
# Substring the {{String}} at the correct place
# Convert the String back into a byte array and populate the {{Text}} object

It would be great if the {{Text}} object could offer a truncate/substring 
method based on character count that did not require copying data around






[jira] [Commented] (HADOOP-14299) Hadoop Renew Thread for proxy users

2017-06-13 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048083#comment-16048083
 ] 

Wei-Chiu Chuang commented on HADOOP-14299:
--

I am not familiar with this subject so bear with me.
What is the use case here? If the user impersonating the proxy user renews its 
own credentials, wouldn't that work?

> Hadoop Renew Thread for proxy users
> ---
>
> Key: HADOOP-14299
> URL: https://issues.apache.org/jira/browse/HADOOP-14299
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently Hadoop Client has a separate renew thread which is created only for 
> Authentication type Kerberos and not for Proxy. So for proxy users, a yarn 
> client monitoring a long running job will fail after initial ticket lifetime 
> with GSS initiate failed unless there is a manual re-kinit. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1030
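For context, the kind of renew thread the description refers to can be sketched generically. The Renewable interface and both class names below are hypothetical illustrations, not Hadoop's actual UserGroupInformation API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A background daemon thread that periodically refreshes a credential,
// analogous in spirit to the renewal thread the client starts for
// Kerberos keytab logins but (per this issue) not for proxy users.
interface Renewable {
    void renew();
}

class CredentialRenewer implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "credential-renewer");
            t.setDaemon(true);  // don't keep the JVM alive just for renewal
            return t;
        });

    CredentialRenewer(Renewable cred, long periodMillis) {
        // Renew on a fixed period chosen well inside the ticket lifetime.
        scheduler.scheduleAtFixedRate(cred::renew, periodMillis,
                                      periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override public void close() { scheduler.shutdownNow(); }
}
```

The open question in this issue is what such a thread could actually call for a proxy user, since (as noted in the comments) the client has no keytab or principal for the proxied identity.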






[jira] [Updated] (HADOOP-14299) Hadoop Renew Thread for proxy users

2017-06-13 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HADOOP-14299:
-
Issue Type: Improvement  (was: Bug)

> Hadoop Renew Thread for proxy users
> ---
>
> Key: HADOOP-14299
> URL: https://issues.apache.org/jira/browse/HADOOP-14299
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently Hadoop Client has a separate renew thread which is created only for 
> Authentication type Kerberos and not for Proxy. So for proxy users, a yarn 
> client monitoring a long running job will fail after initial ticket lifetime 
> with GSS initiate failed unless there is a manual re-kinit. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1030






[jira] [Closed] (HADOOP-14513) A little performance improvement of HarFileSystem

2017-06-13 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash closed HADOOP-14513.
-

> A little performance improvement of HarFileSystem
> -
>
> Key: HADOOP-14513
> URL: https://issues.apache.org/jira/browse/HADOOP-14513
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha3
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Trivial
> Attachments: HADOOP-14513.001.patch
>
>
> In the Java source of HarFileSystem.java:
> {code:title=HarFileSystem.java|borderStyle=solid}
> ...
> ...
> private Path archivePath(Path p) {
> Path retPath = null;
> Path tmp = p;
> 
> // p.depth() need not be evaluated on every iteration; depth() is an 
> // expensive calculation
> for (int i=0; i< p.depth(); i++) {
>   if (tmp.toString().endsWith(".har")) {
> retPath = tmp;
> break;
>   }
>   tmp = tmp.getParent();
> }
> return retPath;
>   }
> ...
> ...
> {code}
>  
> I think the following is more suitable:
> {code:title=HarFileSystem.java|borderStyle=solid}
> ...
> ...
> private Path archivePath(Path p) {
> Path retPath = null;
> Path tmp = p;
> 
> // just loop once
> for (int i=0,depth=p.depth(); i< depth; i++) {
>   if (tmp.toString().endsWith(".har")) {
> retPath = tmp;
> break;
>   }
>   tmp = tmp.getParent();
> }
> return retPath;
>   }
> ...
> ...
> {code}






[jira] [Resolved] (HADOOP-14513) A little performance improvement of HarFileSystem

2017-06-13 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved HADOOP-14513.
---
Resolution: Not A Problem

> A little performance improvement of HarFileSystem
> -
>
> Key: HADOOP-14513
> URL: https://issues.apache.org/jira/browse/HADOOP-14513
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha3
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Trivial
> Attachments: HADOOP-14513.001.patch
>
>
> In the Java source of HarFileSystem.java:
> {code:title=HarFileSystem.java|borderStyle=solid}
> ...
> ...
> private Path archivePath(Path p) {
> Path retPath = null;
> Path tmp = p;
> 
> // p.depth() need not be evaluated on every iteration; depth() is an 
> // expensive calculation
> for (int i=0; i< p.depth(); i++) {
>   if (tmp.toString().endsWith(".har")) {
> retPath = tmp;
> break;
>   }
>   tmp = tmp.getParent();
> }
> return retPath;
>   }
> ...
> ...
> {code}
>  
> I think the following is more suitable:
> {code:title=HarFileSystem.java|borderStyle=solid}
> ...
> ...
> private Path archivePath(Path p) {
> Path retPath = null;
> Path tmp = p;
> 
> // just loop once
> for (int i=0,depth=p.depth(); i< depth; i++) {
>   if (tmp.toString().endsWith(".har")) {
> retPath = tmp;
> break;
>   }
>   tmp = tmp.getParent();
> }
> return retPath;
>   }
> ...
> ...
> {code}






[jira] [Commented] (HADOOP-14299) Hadoop Renew Thread for proxy users

2017-06-13 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047486#comment-16047486
 ] 

Prabhu Joseph commented on HADOOP-14299:


[~Hongyuan Li] It looks like the Hadoop client won't have access to the keytab 
/ principal of the proxy user to renew. Someone needs to review and work on 
this JIRA if it is a valid and feasible request.

> Hadoop Renew Thread for proxy users
> ---
>
> Key: HADOOP-14299
> URL: https://issues.apache.org/jira/browse/HADOOP-14299
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently Hadoop Client has a separate renew thread which is created only for 
> Authentication type Kerberos and not for Proxy. So for proxy users, a yarn 
> client monitoring a long running job will fail after initial ticket lifetime 
> with GSS initiate failed unless there is a manual re-kinit. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1030


