[jira] [Created] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9304:
---

 Summary: Add HdfsClientConfigKeys class to 
TestHdfsConfigFields#configurationClasses
 Key: HDFS-9304
 URL: https://issues.apache.org/jira/browse/HDFS-9304
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mingliang Liu
Assignee: Mingliang Liu


*tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
to add this class to {{TestHdfsConfigFields#configurationClasses}}.

Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
still contains all the client side config keys, though marked @deprecated. As 
we add new client config keys (e.g. [HDFS-9259]), the unit test will fail with 
the following error:
{quote}
hdfs-default.xml has 1 properties missing in  class 
org.apache.hadoop.hdfs.DFSConfigKeys
{quote}

If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
together cover all config keys in {{hdfs-default.xml}}, we need to put both of 
them in {{TestHdfsConfigFields#configurationClasses}}.
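
For illustration, a minimal sketch of what the change could look like, 
assuming the {{TestConfigurationFieldsBase}} pattern (a hypothetical sketch, 
not the attached patch):
{code}
// Hypothetical sketch: validate hdfs-default.xml against both the
// client-side and the server-side config key classes.
public class TestHdfsConfigFields extends TestConfigurationFieldsBase {
  @Override
  public void initializeMemberVariables() {
    xmlFilename = "hdfs-default.xml";
    // Together, the two classes should cover every property declared
    // in hdfs-default.xml.
    configurationClasses =
        new Class[] { HdfsClientConfigKeys.class, DFSConfigKeys.class };
  }
}
{code}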




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973065#comment-14973065
 ] 

Hadoop QA commented on HDFS-9259:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m 42s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 55s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 56s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m  4s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 34s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  67m  2s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 32s | Tests passed in 
hadoop-hdfs-client. |
| | | 124m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768573/HDFS-9259.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 446212a |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13184/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13184/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13184/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13184/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13184/console |


This message was automatically generated.

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC HDFS writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for the HDFS write. The test ran 
> "hadoop fs -copyFromLocal" on a 256MB file across DCs with different 
> SendBufferSize and ReceiveBufferSize values. The results showed that c is 
> much faster than b, and b is faster than a.
> a. SendBufferSize=128k, ReceiveBufferSize=128k (HDFS default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both)
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-25 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu reopened HDFS-9293:
--

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
> Fix For: 2.7.1
>
>
>   In our cluster (Hadoop 2.2.0-HA, 700+ DNs), we found that the standby NN 
> tailed the edit log slowly while holding the FSNamesystem write lock, so the 
> DNs' heartbeat/blockreport IPC requests were blocked. This led the active NN 
> to remove stale DNs that could not send heartbeats because they were blocked 
> registering with the standby NN (fixed in 2.7.1).
>   Below is the standby NN stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if an entry is found in the IPC retryCache, the 
> previous entry must be removed from the priority queue, which is O(N). An 
> updateBlocks call does not need to record an rpcId in the edit log except 
> for a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' 
> entries with repeated rpcIds.
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-25 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu resolved HDFS-9293.
--
Resolution: Duplicate

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
> Fix For: 2.7.1
>
>
>   In our cluster (Hadoop 2.2.0-HA, 700+ DNs), we found that the standby NN 
> tailed the edit log slowly while holding the FSNamesystem write lock, so the 
> DNs' heartbeat/blockreport IPC requests were blocked. This led the active NN 
> to remove stale DNs that could not send heartbeats because they were blocked 
> registering with the standby NN (fixed in 2.7.1).
>   Below is the standby NN stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if an entry is found in the IPC retryCache, the 
> previous entry must be removed from the priority queue, which is O(N). An 
> updateBlocks call does not need to record an rpcId in the edit log except 
> for a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' 
> entries with repeated rpcIds.
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9304:

Status: Patch Available  (was: Open)

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973448#comment-14973448
 ] 

Mingliang Liu commented on HDFS-9259:
-

The failing test {{TestHdfsConfigFields}} is tracked by [HDFS-9304].

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC HDFS writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for the HDFS write. The test ran 
> "hadoop fs -copyFromLocal" on a 256MB file across DCs with different 
> SendBufferSize and ReceiveBufferSize values. The results showed that c is 
> much faster than b, and b is faster than a.
> a. SendBufferSize=128k, ReceiveBufferSize=128k (HDFS default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both)
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9229) Expose size of NameNode directory as a metric

2015-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973446#comment-14973446
 ] 

Hadoop QA commented on HDFS-9229:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 50s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   8m 34s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  11m  5s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m 36s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 44s | The applied patch generated  2 
new checkstyle issues (total was 490, now 491). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  12m 54s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  62m 53s | Tests failed in hadoop-hdfs. |
| | | 133m 57s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.util.TestByteArrayManager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768621/HDFS-9229.004.patch |
| Optional Tests | site javadoc javac unit findbugs checkstyle |
| git revision | trunk / ab8eb87 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13186/console |


This message was automatically generated.

> Expose size of NameNode directory as a metric
> -
>
> Key: HDFS-9229
> URL: https://issues.apache.org/jira/browse/HDFS-9229
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, 
> HDFS-9229.003.patch, HDFS-9229.004.patch
>
>
> Useful for admins in reserving / managing NN local file system space. Also 
> useful when transferring NN backups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9229) Expose size of NameNode directory as a metric

2015-10-25 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973698#comment-14973698
 ] 

Rakesh R commented on HDFS-9229:


Thank you [~wheat9], [~jingzhao], [~zhz] for the useful discussions.

Thank you [~surendrasingh] for taking care of this. I have a few comments:
# Do we need to update the size inside the lock? How about moving it outside?
{code}
+//Update NameDirSize Metric
+namesystem.getFSImage().getStorage().updateNameDirSize();
{code}
# I can see the name dir size will also be updated during the 
FSImage#saveNamespace call 
[FSImage.java#L1061|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L1061],
 so probably we could cover that code path as well, right?
# In tests, can we avoid the constant sleeping? One alternative approach I 
can think of is to call the #rollEditLog() function explicitly, for example 
via HATestUtil#waitForStandbyToCatchUp() (see the sketch after this list)
{code}
+  Thread.sleep(3*1000);
+  checkNNDirSize(cluster.getNameDirs(0), nn0.getNameDirSize());
+  checkNNDirSize(cluster.getNameDirs(1), nn1.getNameDirSize());
{code}
# There is one minor checkstyle warning, please fix it.
{code}
NNStorage.java:1090:23: 'nnDirSizeMap' hides a field.
{code}
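
For comment 3, a rough sketch of a sleep-free variant (a hypothetical test 
fragment: {{getNameDirSize()}} and {{checkNNDirSize()}} come from the patch 
under review, and the helper calls assume the usual MiniDFSCluster/HATestUtil 
APIs):
{code}
// Roll the edit log and wait for the standby to catch up instead of
// sleeping a fixed three seconds.
NameNode nn0 = cluster.getNameNode(0);
NameNode nn1 = cluster.getNameNode(1);
nn0.getRpcServer().rollEditLog();
HATestUtil.waitForStandbyToCatchUp(nn0, nn1);
checkNNDirSize(cluster.getNameDirs(0), nn0.getNameDirSize());
checkNNDirSize(cluster.getNameDirs(1), nn1.getNameDirSize());
{code}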

> Expose size of NameNode directory as a metric
> -
>
> Key: HDFS-9229
> URL: https://issues.apache.org/jira/browse/HDFS-9229
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, 
> HDFS-9229.003.patch, HDFS-9229.004.patch
>
>
> Useful for admins in reserving / managing NN local file system space. Also 
> useful when transferring NN backups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973479#comment-14973479
 ] 

Hadoop QA commented on HDFS-9304:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   8m 28s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 37s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  66m  3s | Tests failed in hadoop-hdfs. |
| | |  90m 53s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768625/HDFS-9304.000.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / ab8eb87 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13187/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13187/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13187/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13187/console |


This message was automatically generated.

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-25 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9305:

Reporter: Chris Nauroth  (was: Arpit Agarwal)

> Delayed heartbeat processing causes storm of subsequent heartbeats
> --
>
> Key: HDFS-9305
> URL: https://issues.apache.org/jira/browse/HDFS-9305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Arpit Agarwal
>
> A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
> expect heartbeat handling to complete relatively quickly.  However, if 
> something unexpected causes heartbeat processing to get blocked, such as a 
> long GC or heavy lock contention within the NameNode, then heartbeat 
> processing would be delayed.  After recovering from this delay, the DataNode 
> then starts sending a storm of heartbeat messages in a tight loop.  In a 
> large cluster with many DataNodes, this storm of heartbeat messages could 
> cause harmful load on the NameNode and make overall cluster recovery more 
> difficult.
> The bug appears to be caused by incorrect timekeeping inside 
> {{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
> from the previous heartbeat time, without any compensation for possible long 
> latency on an individual heartbeat RPC.  The only mitigation would be 
> restarting all DataNodes to force a reset of the heartbeat schedule, or 
> simply wait out the storm until the scheduling catches up and corrects itself.
> This problem would not manifest after a NameNode restart.  In that case, the 
> NameNode would respond to the first heartbeat by telling the DataNode to 
> re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
> schedule to the current time.  I believe the problem would only manifest if 
> the NameNode process stayed alive but processed heartbeats unexpectedly slowly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-25 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-9305:
---

 Summary: Delayed heartbeat processing causes storm of subsequent 
heartbeats
 Key: HDFS-9305
 URL: https://issues.apache.org/jira/browse/HDFS-9305
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
expect heartbeat handling to complete relatively quickly.  However, if 
something unexpected causes heartbeat processing to get blocked, such as a long 
GC or heavy lock contention within the NameNode, then heartbeat processing 
would be delayed.  After recovering from this delay, the DataNode then starts 
sending a storm of heartbeat messages in a tight loop.  In a large cluster with 
many DataNodes, this storm of heartbeat messages could cause harmful load on 
the NameNode and make overall cluster recovery more difficult.

The bug appears to be caused by incorrect timekeeping inside 
{{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
from the previous heartbeat time, without any compensation for possible long 
latency on an individual heartbeat RPC.  The only mitigation would be 
restarting all DataNodes to force a reset of the heartbeat schedule, or simply 
wait out the storm until the scheduling catches up and corrects itself.
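
To make the suspected pattern concrete, here is a toy model (all names are 
invented for illustration; this is not the actual {{BPServiceActor}} code):
{code}
// Toy model of the two scheduling policies.
class HeartbeatScheduler {
  private static final long INTERVAL_MS = 3000;
  private long nextHeartbeatMs = System.currentTimeMillis();

  // Suspected buggy policy: after a long stall, nextHeartbeatMs lags far
  // behind "now", so many heartbeats are immediately due and fire in a
  // tight loop until the schedule catches up.
  void scheduleFromPreviousTime() {
    nextHeartbeatMs += INTERVAL_MS;
  }

  // Storm-free policy: schedule relative to the current time, so one
  // delayed heartbeat cannot create a backlog.
  void scheduleFromNow() {
    nextHeartbeatMs = System.currentTimeMillis() + INTERVAL_MS;
  }

  boolean heartbeatDue() {
    return System.currentTimeMillis() >= nextHeartbeatMs;
  }
}
{code}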

This problem would not manifest after a NameNode restart.  In that case, the 
NameNode would respond to the first heartbeat by telling the DataNode to 
re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
schedule to the current time.  I believe the problem would only manifest if the 
NameNode process stayed alive but processed heartbeats unexpectedly slowly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973758#comment-14973758
 ] 

Yi Liu commented on HDFS-9276:
--

{quote}
To reproduce the bug, please set the following configuration on the NameNode:
dfs.namenode.delegation.token.max-lifetime = 10min
dfs.namenode.delegation.key.update-interval = 3min
dfs.namenode.delegation.token.renew-interval = 3min
The bug will occur after 3 minutes.
{quote}

Your test code doesn't prove anything: the error message "token 
(HDFS_DELEGATION_TOKEN token 330156 for test) is expired" appears because you 
set "dfs.namenode.delegation.token.renew-interval" to 3 min but never let the 
{{test}} user renew the token.

I see what you want to do now. Actually the existing Hadoop code is enough 
for what you want to do. If a user client gets a new delegation token and 
your long-running application can accept it, you can update the credentials 
of the user's UGI on the server through 
{{UserGroupInformation#addCredentials}}; it overwrites old tokens by default. 
Of course, the token's service name must stay the same if you want it to be 
overwritten.

It's not a bug.
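
For reference, a minimal sketch of the pattern described above (the class 
name and the "test" renewer are illustrative, not from any patch):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch: fetch fresh delegation tokens and push them into
// the current (long-running) UGI; tokens with the same service name
// overwrite the old ones by default.
public class RefreshTokens {
  public static void refresh() throws IOException {
    Credentials fresh = new Credentials();
    FileSystem fs = FileSystem.get(new Configuration());
    fs.addDelegationTokens("test", fresh); // "test" is an example renewer
    UserGroupInformation.getCurrentUser().addCredentials(fresh);
  }
}
{code}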

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown 

[jira] [Updated] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9304:

Attachment: HDFS-9304.000.patch

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Liangliang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973546#comment-14973546
 ] 

Liangliang Gu commented on HDFS-9276:
-

In my test code, a new UGI is created and the HDFS delegation token is 
obtained from that new UGI, so a new HDFS delegation token will be returned.
You can reproduce the bug with the test code I provided.

I found this bug when running Spark Streaming in yarn-cluster mode with the 
"--principal --keytab" arguments (for more than 7 days).
This JIRA https://issues.apache.org/jira/browse/SPARK-5342 shows how Spark 
gets a new HDFS delegation token and updates it in the current UGI.

This JIRA https://issues.apache.org/jira/browse/SPARK-8688 fixes the bug when 
updating the token in the current UGI.
But it only fixes the bug in the application master. The bug also occurs in 
the executor, when the executor updates the token.

My patch fixes the bug in Hadoop, so Spark does not need to work around it.



> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at 

[jira] [Comment Edited] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973758#comment-14973758
 ] 

Yi Liu edited comment on HDFS-9276 at 10/26/15 5:39 AM:


{quote}
To reproduce the bug, please set the following configuration on the NameNode:
dfs.namenode.delegation.token.max-lifetime = 10min
dfs.namenode.delegation.key.update-interval = 3min
dfs.namenode.delegation.token.renew-interval = 3min
The bug will occur after 3 minutes.
{quote}

Your test code doesn't prove anything: the error message "token 
(HDFS_DELEGATION_TOKEN token 330156 for test) is expired" appears because you 
set "dfs.namenode.delegation.token.renew-interval" to 3 min but never let the 
{{test}} user renew the token.

I see what you want to do now; it's the same as the later case I commented 
on above. Actually the existing Hadoop code is enough for what you want to 
do. If a user client gets a new delegation token and your long-running 
application can accept it, you can update the credentials of the user's UGI 
on the server through {{UserGroupInformation#addCredentials}}; it overwrites 
old tokens by default. Of course, the token's service name must stay the 
same if you want it to be overwritten.

It's not a bug.


was (Author: hitliuyi):
{quote}
To reproduce the bug, please set the following configuration on the NameNode:
dfs.namenode.delegation.token.max-lifetime = 10min
dfs.namenode.delegation.key.update-interval = 3min
dfs.namenode.delegation.token.renew-interval = 3min
The bug will occur after 3 minutes.
{quote}

Your test code doesn't prove anything: the error message "token 
(HDFS_DELEGATION_TOKEN token 330156 for test) is expired" appears because you 
set "dfs.namenode.delegation.token.renew-interval" to 3 min but never let the 
{{test}} user renew the token.

I see what you want to do now. Actually the existing Hadoop code is enough 
for what you want to do. If a user client gets a new delegation token and 
your long-running application can accept it, you can update the credentials 
of the user's UGI on the server through 
{{UserGroupInformation#addCredentials}}; it overwrites old tokens by default. 
Of course, the token's service name must stay the same if you want it to be 
overwritten.

It's not a bug.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   

[jira] [Comment Edited] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973758#comment-14973758
 ] 

Yi Liu edited comment on HDFS-9276 at 10/26/15 5:41 AM:


{quote}
To reproduce the bug, please set the following configuration on the NameNode:
dfs.namenode.delegation.token.max-lifetime = 10min
dfs.namenode.delegation.key.update-interval = 3min
dfs.namenode.delegation.token.renew-interval = 3min
The bug will occur after 3 minutes.
{quote}

Your test code doesn't prove anything: the error message "token 
(HDFS_DELEGATION_TOKEN token 330156 for test) is expired" appears because you 
set "dfs.namenode.delegation.token.renew-interval" to 3 min but never let the 
{{test}} user renew the token.

I see what you want to do now; it's the same as the later case I commented 
on above. Actually the existing Hadoop code is enough for what you want to 
do. If a user client gets a new delegation token and your long-running 
application can accept it, you can update the credentials of the user's UGI 
on the server through {{UserGroupInformation#addCredentials}}; it overwrites 
old tokens by default. Of course, the token's service name must stay the 
same if you want it to be overwritten.

It's not a bug.


was (Author: hitliuyi):
{quote}
To reproduce the bug, please set the following configuration on the NameNode:
dfs.namenode.delegation.token.max-lifetime = 10min
dfs.namenode.delegation.key.update-interval = 3min
dfs.namenode.delegation.token.renew-interval = 3min
The bug will occur after 3 minutes.
{quote}

Your test code doesn't prove anything: the error message "token 
(HDFS_DELEGATION_TOKEN token 330156 for test) is expired" appears because you 
set "dfs.namenode.delegation.token.renew-interval" to 3 min but never let the 
{{test}} user renew the token.

I see what you want to do now; it's the same as the later case I commented 
on above. Actually the existing Hadoop code is enough for what you want to 
do. If a user client gets a new delegation token and your long-running 
application can accept it, you can update the credentials of the user's UGI 
on the server through {{UserGroupInformation#addCredentials}}; it overwrites 
old tokens by default. Of course, the token's service name must stay the 
same if you want it to be overwritten.

It's not a bug.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
>

[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-10-25 Thread Kanaka Kumar Avvaru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973776#comment-14973776
 ] 

Kanaka Kumar Avvaru commented on HDFS-8836:
---

Thanks for the review [~ajisakaa]. Sorry for the late response; I will update 
the patch sometime this week.

> Skip newline on empty files with getMerge -nl
> -
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Jan Filipiak
>Assignee: Kanaka Kumar Avvaru
>Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, 
> HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
>
> Hello everyone,
> I recently needed to use the newline option -nl with getMerge because the 
> files I needed to merge simply didn't have one. I was merging all the files 
> from one directory, and unfortunately this directory also included empty 
> files, which effectively led to multiple newlines being appended after some 
> files. I needed to remove them manually afterwards.
> In this situation it might be good to have another argument that allows 
> skipping empty files.
> One thing to note when implementing this feature:
> the call IOUtils.copyBytes(in, out, getConf(), false) doesn't return the 
> number of bytes copied, which would be convenient, as one could then skip 
> appending the newline when 0 bytes were copied; alternatively, one could 
> check the file size beforehand.
> I posted this idea on the mailing list 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
>  but I didn't really get many responses, so I thought I might try this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973249#comment-14973249
 ] 

Yi Liu commented on HDFS-9276:
--

[~marsishandsome], I agree with Steve that you need to move this to Hadoop 
Common if the patch only contains changes in common. Before that, I think you 
may have a misunderstanding about how to use delegation tokens.

Do you want to update the delegation token through 
{{FileSystem#addDelegationTokens}}? It will not get a new delegation token 
again if an old one already exists in the credentials, and it may be more 
complicated than you think.
Actually I am curious how you wrote your long-running application. Is your 
application just a user client, or running on YARN, or a separate service? 
If your application is just one user client, that is, not a service accessed 
by many user clients, then you still need to use Kerberos instead of a 
delegation token; but if your application is a real service that serves many 
user clients, then the delegation token is the right choice. The delegation 
token is used by your service to access HDFS on behalf of the user; usually 
your application service can renew the delegation token, but the service 
itself can't get a new delegation token for a specific user. If your 
application service runs longer than the maximum renewable lifetime of the 
user's delegation token, one way is for the user to get a new delegation 
token and for your application service to support some mechanism that lets 
the user update the delegation token and refresh it in that user's UGI 
credentials. Another way is to support proxy-user privileges in your running 
application; refer to YARN-2704. Are you on the right track?
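
To illustrate the renew-versus-reacquire distinction, a hedged sketch (the 
class and the renewal loop are invented for the example; only {{Token#renew}} 
is the standard API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

// Illustrative sketch: a long-running service can renew a delegation
// token it already holds (until max-lifetime is reached), but it cannot
// mint a brand-new token on the user's behalf.
public class TokenRenewLoop {
  public static void renewPeriodically(Token<?> token, Configuration conf)
      throws Exception {
    while (true) {
      long expiryMs = token.renew(conf); // new expiration time in millis
      long sleepMs = Math.max(1000L,
          (expiryMs - System.currentTimeMillis()) / 2);
      Thread.sleep(sleepMs); // renew again halfway to expiration
    }
  }
}
{code}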

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client generates private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 

[jira] [Comment Edited] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973249#comment-14973249
 ] 

Yi Liu edited comment on HDFS-9276 at 10/25/15 1:57 PM:


[~marsishandsome], I agree with Steve that you need to move this to Hadoop 
Common if the patch only contains changes in common. Before that, I think you 
may have a misunderstanding about how to use delegation tokens.

Do you want to update the delegation token through 
{{FileSystem#addDelegationTokens}}? It will not get a new delegation token 
again if an old one already exists in the credentials, and it may be more 
complicated than you think.
Actually I am curious how you wrote your long-running application. Is your 
application just a user client or a separate service? 
If your application is just one user client, that is, not a service accessed 
by many user clients, then you still need to use Kerberos instead of a 
delegation token; but if your application is a real service that serves many 
user clients, then the delegation token is the right choice. The delegation 
token is used by your service to access HDFS on behalf of the user; usually 
your application service can renew the delegation token, but the service 
itself can't get a new delegation token for a specific user. If your 
application service runs longer than the maximum renewable lifetime of the 
user's delegation token, one way is for the user to get a new delegation 
token and for your application service to support some mechanism that lets 
the user update the delegation token and refresh it in that user's UGI 
credentials. Another way is to support proxy-user privileges in your running 
application; refer to YARN-2704. Are you on the right track?



> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> 

[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973256#comment-14973256
 ] 

Yi Liu commented on HDFS-7984:
--

{quote}
String fileLocation = System.getenv(HADOOP_TOKEN_FILE_LOCATION);
...
Credentials cred = Credentials.readTokenStorageFile(
{quote}

The token file referenced by HADOOP_TOKEN_FILE_LOCATION already supports 
multiple tokens.
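
For reference, a minimal sketch of reading such a file (object and variable 
names are illustrative only):
{code}
import java.io.File
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

object ReadTokenFileSketch {
  def main(args: Array[String]): Unit = {
    // HADOOP_TOKEN_FILE_LOCATION points at a token storage file.
    val location = System.getenv("HADOOP_TOKEN_FILE_LOCATION")
    val creds =
      Credentials.readTokenStorageFile(new File(location), new Configuration())
    // A single token storage file can hold any number of tokens.
    creds.getAllTokens.asScala.foreach { t =>
      println(s"${t.getKind} for service ${t.getService}")
    }
  }
}
{code}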

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973365#comment-14973365
 ] 

Allen Wittenauer commented on HDFS-7984:


HADOOP_TOKEN_FILE_LOCATION is also a terrible interface.

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9229) Expose size of NameNode directory as a metric

2015-10-25 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9229:
-
Attachment: HDFS-9229.004.patch

Thanks [~zhz] and [~jingzhao]

Attached the updated patch.

Changes in 004:

1. After NN startup, the metric is updated in the NNStorage constructor.
2. The ANN updates the metric after an edit log roll.
3. The SNN updates the metric after tailing edits from the ANN.
4. Added a new test based on these changes.

Please review...
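
For context, the value such a metric reports is just the on-disk size of the 
configured name directories; a rough sketch of computing it (not the patch 
code; plain local paths are assumed, though {{dfs.namenode.name.dir}} entries 
may also be {{file:}} URIs):
{code}
import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.DFSConfigKeys

object NameDirSizeSketch {
  // Recursively sum the sizes of all files under a directory.
  def sizeOf(f: File): Long =
    if (f.isFile) f.length
    else Option(f.listFiles).map(_.map(sizeOf).sum).getOrElse(0L)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // dfs.namenode.name.dir can be a comma-separated list of directories.
    conf.getTrimmedStrings(DFSConfigKeys.DFS_NAMENODE_NAME_DIR_KEY)
      .foreach(d => println(s"$d => ${sizeOf(new File(d))} bytes"))
  }
}
{code}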


> Expose size of NameNode directory as a metric
> -
>
> Key: HDFS-9229
> URL: https://issues.apache.org/jira/browse/HDFS-9229
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, 
> HDFS-9229.003.patch, HDFS-9229.004.patch
>
>
> Useful for admins in reserving / managing NN local file system space. Also 
> useful when transferring NN backups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973367#comment-14973367
 ] 

Allen Wittenauer commented on HDFS-7984:


Oh, one other thing: there is no way for an end user to create a token file 
with multiple tokens inside it, short of building custom code to do it. (That 
issue is a separate, upcoming JIRA.)
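
For what that custom code would look like, a rough sketch that merges 
delegation tokens from several filesystems into one token file; the cluster 
URIs, renewer name, and output path below are made up for illustration:
{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials

object WriteTokenFileSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val creds = new Credentials()
    // Collect delegation tokens from each cluster into one Credentials object.
    Seq("hdfs://clusterA", "webhdfs://clusterB:50070").foreach { uri =>
      FileSystem.get(new URI(uri), conf).addDelegationTokens("renewer", creds)
    }
    // Persist all collected tokens into a single token storage file.
    creds.writeTokenStorageFile(new Path("file:///tmp/multi.token"), conf)
  }
}
{code}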

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)