[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2018-04-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450978#comment-16450978
 ] 

Hudson commented on HADOOP-14688:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HADOOP-14688. Intern strings in KeyVersion and EncryptedKeyVersion. (xyao: rev 
89ec91cb004ff36b3b7f327167cb3b45b8baadd2)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderCryptoExtension.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java


> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.
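The idea in the description can be illustrated with a minimal sketch that interns the name fields in a constructor. The class `SketchKeyVersion` and the example values are hypothetical. This is not the actual `KeyProvider.KeyVersion` code from the patch; it only shows the technique, assuming the names arrive as freshly decoded `String` objects.

```java
/**
 * Hypothetical, simplified key-version holder. This is NOT Hadoop's
 * KeyProvider.KeyVersion; it only illustrates interning name fields.
 */
class SketchKeyVersion {
  private final String name;         // key name, shared by many versions/EDEKs
  private final String versionName;  // key version name
  private final byte[] material;     // key material (not interned, not shared)

  SketchKeyVersion(String name, String versionName, byte[] material) {
    // String.intern() returns the canonical copy from the JVM string pool, so
    // objects built from equal names end up referencing one String instance.
    this.name = (name == null) ? null : name.intern();
    this.versionName = (versionName == null) ? null : versionName.intern();
    this.material = material;
  }

  String getName() {
    return name;
  }

  String getVersionName() {
    return versionName;
  }

  byte[] getMaterial() {
    return material;
  }
}
```

With this shape, every object built from an equal key version name points at the same canonical `String`, so a batch of objects sharing one or two version names holds one or two strings rather than one per object.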



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157692#comment-16157692
 ] 

Hudson commented on HADOOP-14688:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12811 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12811/])
HADOOP-14688. Intern strings in KeyVersion and EncryptedKeyVersion. (weichiu: 
rev ad32759fd9f33e7bd18758ad1a5a464dab3bcbd9)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderCryptoExtension.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java


> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-09-06 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156000#comment-16156000
 ] 

Xiao Chen commented on HADOOP-14688:


Thanks Wei-Chiu for review and commit. Also thanks Daryn and Misha for 
commenting.

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-09-05 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154008#comment-16154008
 ] 

Wei-Chiu Chuang commented on HADOOP-14688:
--

+1. Will commit today.

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-09-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150085#comment-16150085
 ] 

Wei-Chiu Chuang commented on HADOOP-14688:
--

Thanks [~mi...@cloudera.com] for your insightful comment, and thanks 
[~xiaochen] for the performance test.

I think the performance optimization makes sense, especially in the 
context of key rotation. Hi [~daryn], what do you think? I would like to cast my 
+1 if there are no objections.

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-08-23 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139371#comment-16139371
 ] 

Misha Dmitriev commented on HADOOP-14688:
-

[~daryn]: when a live heap dump is captured, as done here, a full GC is 
performed before the heap snapshot is taken. So if the given application produces 
objects that are very short-lived, i.e. quickly become garbage, we will 
only see those of them that are live at that moment, which is typically not 
much. Consequently, most objects in a live heap dump tend to be relatively 
long-lived.

Furthermore, experience has shown that for reasonably long-lived strings, the 
CPU overhead of interning is small compared to its benefits: reduced memory 
pressure, shorter GC pauses, etc. That is, the cost of a fast internal 
String.intern() call is comparable to the cost of GC scanning and moving around 
all the extra copies of a string that remain in memory without interning.
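The effect described above, many equal-valued strings collapsing onto one canonical object, can be checked with a small self-contained sketch. `InternDemo` and its helpers are hypothetical names. The sketch counts object identities with an `IdentityHashMap`-backed set; it does not measure the CPU or GC costs discussed in the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical demo: how many distinct String objects back a list. */
class InternDemo {

  /** Number of distinct object identities (not distinct values) in the list. */
  static int distinctIdentities(List<String> strings) {
    Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
    seen.addAll(strings);
    return seen.size();
  }

  /** Builds n equal-valued strings, optionally interning each one. */
  static List<String> makeCopies(String value, int n, boolean intern) {
    List<String> out = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      // new String(...) forces a fresh object, much like decoding a value
      // received over the wire or read from an xattr would.
      String copy = new String(value);
      out.add(intern ? copy.intern() : copy);
    }
    return out;
  }

  public static void main(String[] args) {
    String versionName = "exampleKeyVersionName"; // made-up value
    System.out.println(distinctIdentities(makeCopies(versionName, 1000, false))); // 1000
    System.out.println(distinctIdentities(makeCopies(versionName, 1000, true)));  // 1
  }
}
```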

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-08-16 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129144#comment-16129144
 ] 

Xiao Chen commented on HADOOP-14688:


Bumping up on this...

I understand the concern about the added overhead for a normal operation like 
getFileEncryptionInfo. But in internal runs I did not see this interning 
cause any visible impact on NN throughput.

On the other hand, the heap is pretty ugly without this change during re-encryption. 
Attaching a report generated by [jxray|http://www.jxray.com/]. The most relevant 
section is:
{quote}
7. DUPLICATE STRINGS

Total strings: 2,570,432  Unique strings: 1,033,993  Duplicate values: 3,559  Overhead: 170,572K (8.4%)

Top duplicate strings:
      Ovhd          Num char[]s   Num objs   Value

 103,775K (5.1%)     830205       830205    "mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf"
  23,042K (1.1%)     184337       184337    "0OFmjElLqXgtjvWKkgfRoLpUj92dHrEaQCPeh3VDh8V"
   8,668K (0.4%)     184937       184937    "EEK"
   2,853K (0.1%)      12176        12176    "POST /kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt HTTP/1.1"
   2,473K (0.1%)      12177        12177    "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt"
   2,298K (0.1%)      13374        13374    "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek"

{quote}
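The "Num objs" column counts separate `String` objects carrying the same value. As a rough, illustrative analogue that works on an in-memory list rather than a heap dump (jxray's exact definitions of these columns may differ), the hypothetical helper below groups strings by value and counts distinct object identities per value. The redundant objects it reports are the ones interning would eliminate.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: duplicate-string statistics over an in-memory list. */
class DuplicateStringStats {

  /** Maps each distinct value to the number of distinct String objects holding it. */
  static Map<String, Integer> objectsPerValue(List<String> strings) {
    Map<String, Set<String>> byValue = new HashMap<>();
    for (String s : strings) {
      // Group by value (equals), but track members by identity (==).
      byValue
          .computeIfAbsent(s, k -> Collections.<String>newSetFromMap(
              new IdentityHashMap<String, Boolean>()))
          .add(s);
    }
    Map<String, Integer> counts = new HashMap<>();
    byValue.forEach((value, objects) -> counts.put(value, objects.size()));
    return counts;
  }

  /** Number of values backed by more than one String object. */
  static long duplicatedValues(List<String> strings) {
    return objectsPerValue(strings).values().stream().filter(n -> n > 1).count();
  }

  /** Extra String objects beyond one per value; interning would remove these. */
  static int redundantObjects(List<String> strings) {
    return objectsPerValue(strings).values().stream().mapToInt(n -> n - 1).sum();
  }
}
```

For example, three separately allocated copies of one version name plus two copies of another give two duplicated values and three redundant objects; after interning every element, the redundant count drops to zero.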

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-07-27 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103609#comment-16103609
 ] 

Xiao Chen commented on HADOOP-14688:


The heapdumps are too big to attach here, so I uploaded a screenshot of the 
most relevant analysis results.

The 2 most duplicated strings (mG... and 0O...) are the 2 key version names. I 
was running re-encryption on a zone with 1M files, and those files used 2 
different key versions in this run.

I verified that these duplicates go away after interning.

[~daryn], do you think this makes sense? Thanks!

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-14688.01.patch, heapdump analysis.png
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-07-26 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102240#comment-16102240
 ] 

Xiao Chen commented on HADOOP-14688:


Thanks again [~daryn] for the review series! :)

True, the EDEKs are stored in an xattr. During re-encryption, though, the EDEK 
object is constructed and sent to the KMS, which returns a new EDEK.

What's tricky here is that contacting the KMS has to be done outside of the lock. 
Therefore, the EDEK objects have to exist for that period. And since we're 
trying to re-encrypt many EDEKs per batch, there are many in-flight EDEK 
objects. The relevant code is in {{ReencryptionHandler$EDEKReencryptCallable#call}} of 
HDFS-10899.

To make things worse, since the KMS has proven to be the bottleneck here, we'd 
like to multi-thread the 'contact KMS' part, which means even more in-flight 
EDEKs (see the multi-threading part of HDFS-10899's 
[doc|https://issues.apache.org/jira/secure/attachment/12874358/Re-encrypt%20edek%20design%20doc%20V2.pdf]).
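The batch shape described above can be sketched as follows, assuming a fixed thread pool stands in for the multi-threaded "contact KMS" step. Every name here (`ReencryptBatchSketch`, `InFlightEdek`, `fakeKmsReencrypt`) is hypothetical, and the KMS call is a local stub that performs no cryptography. The point is only that all EDEK objects in a batch stay reachable until their tasks complete, so interned names are shared across the whole batch instead of being duplicated per object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of batched, multi-threaded EDEK re-encryption. */
class ReencryptBatchSketch {

  /** Hypothetical EDEK-like holder; name fields are interned on construction. */
  static final class InFlightEdek {
    final String keyName;
    final String keyVersionName;
    final byte[] encryptedKey;

    InFlightEdek(String keyName, String keyVersionName, byte[] encryptedKey) {
      this.keyName = keyName.intern();
      this.keyVersionName = keyVersionName.intern();
      this.encryptedKey = encryptedKey;
    }
  }

  /** Local stub standing in for the remote KMS call; performs no cryptography. */
  static InFlightEdek fakeKmsReencrypt(InFlightEdek edek, String newVersionName) {
    return new InFlightEdek(edek.keyName, newVersionName, edek.encryptedKey.clone());
  }

  /**
   * Submits each EDEK in the batch to a pool and waits for all results in
   * submission order. No lock is held here, mirroring the requirement that
   * KMS calls happen outside the namesystem lock.
   */
  static List<InFlightEdek> reencryptBatch(
      List<InFlightEdek> batch, String newVersionName, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<InFlightEdek>> futures = new ArrayList<>();
      for (InFlightEdek edek : batch) {
        futures.add(pool.submit(() -> fakeKmsReencrypt(edek, newVersionName)));
      }
      List<InFlightEdek> results = new ArrayList<>();
      for (Future<InFlightEdek> f : futures) {
        try {
          results.add(f.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for KMS stub", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("re-encrypt stub failed", e.getCause());
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```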

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-14688.01.patch
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.






[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102198#comment-16102198
 ] 

Hadoop QA commented on HADOOP-14688:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 29s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 10s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
|   | hadoop.net.TestDNS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14688 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879026/HADOOP-14688.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 0665569c2ac5 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 27a1a5f |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/12863/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12863/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12863/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  

[jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion

2017-07-26 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102183#comment-16102183
 ] 

Daryn Sharp commented on HADOOP-14688:
--

Where does interning play a meaningful part in the process of the related 
jira?  I haven't dug into the code, but isn't this information typically encoded 
in an xattr and only present transiently during a namesystem operation?  If so, 
the overhead of interning is likely not worth it.  I.e., you already have a unique 
string; why bother with what is effectively a hash lookup to substitute another string 
if the unique instance is going to be GC'ed soon anyway?
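The characterization of interning as effectively a hash lookup can be illustrated with a simplified map-based interner. `MapInterner` is a hypothetical name. Unlike the JVM's string pool, it holds strong references and never releases entries; it sketches the per-call cost and is not a replacement for `String.intern()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical, simplified interner: a hash lookup keyed by string contents.
 * Entries are held strongly and never removed (unlike the JVM string pool).
 */
class MapInterner {
  private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

  /** Returns the canonical instance equal to s, registering s if it is new. */
  String intern(String s) {
    // hashCode() plus equals() on the contents: the per-call cost in question.
    String existing = pool.putIfAbsent(s, s);
    return existing == null ? s : existing;
  }

  /** Number of canonical strings currently held. */
  int size() {
    return pool.size();
  }
}
```

Each call hashes and compares the string contents. That cost pays off only when the canonical copy replaces duplicates that would otherwise stay reachable, which is the trade-off debated in this thread.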

> Intern strings in KeyVersion and EncryptedKeyVersion
> 
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-14688.01.patch
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name and mostly use 
> no more than a couple of key version names.


