[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971490#comment-14971490
 ] 

Haohui Mai commented on HDFS-9129:
--

{code}
+   * The state machine is briefly elaborated in the following diagram.
+   * Specially the start status is always INITIALIZED and the end status is
+   * always OFF. There is no transition to INITIALIZED and no transition from
+   * OFF. Once entered, it will not leave THRESHOLD status until the block and
+   * datanode thresholds are met. Similarly, it will not leave EXTENSION
+   * status until the thresholds are met and the extension period is reached.
+   *
+   *                               thresholds not met
+   *                                   .------.
+   *                                   |      |
+   *                                   V      |
+   * INITIALIZED ----------------> THRESHOLD -'
+   *                              /           \
+   *              thresholds met /             \ thresholds met
+   *           no need extension/               \ need extension
+   *                           /                 \
+   *                          V                   V
+   *                         OFF <---------- EXTENSION -.
+   *                              thresholds met  |      |
+   *                                    &         '------'
+   *                             extension reached
+   */
{code}

It does not give much information compared to figuring out the state machine 
from the code directly. What do "thresholds met" / "extension reached" mean? 
It causes more confusion than explanation.
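For what it's worth, the same transitions read more compactly as code. A minimal sketch only: the status names mirror {{BMSafeModeStatus}} from the patch, but the three boolean fields are stand-ins for the real threshold and extension checks, not the patch's actual logic.

```java
// Sketch of the transitions the quoted diagram describes. The status names
// mirror BMSafeModeStatus from the patch; the three boolean fields are
// stand-ins for the real threshold/extension checks.
enum BMSafeModeStatus { INITIALIZED, THRESHOLD, EXTENSION, OFF }

class SafeModeSketch {
    BMSafeModeStatus status = BMSafeModeStatus.INITIALIZED;
    boolean thresholdsMet;
    boolean needExtension;
    boolean extensionReached;

    void checkSafeMode() {
        switch (status) {
            case THRESHOLD:
                if (thresholdsMet) {
                    status = needExtension ? BMSafeModeStatus.EXTENSION
                                           : BMSafeModeStatus.OFF;
                }
                break;
            case EXTENSION:
                if (thresholdsMet && extensionReached) {
                    status = BMSafeModeStatus.OFF;
                }
                break;
            default:
                // INITIALIZED leaves only via setBlockTotal(); OFF is terminal.
                break;
        }
    }
}
```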

{code}
LOG.error("Non-recognized block manager safe mode status: {}", status);
{code}

Should be an assert.
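A hedged sketch of that suggestion: with an exhaustive switch over the enum, an unrecognized status is a programming error rather than a runtime condition, so throwing an {{AssertionError}} is more appropriate than a log line. The method and class here are illustrative, not the patch's actual code.

```java
// Illustrative only: fail fast on an unrecognized status rather than logging
// and continuing in an undefined state. The enum mirrors BMSafeModeStatus.
class SafeModeCheck {
    enum Status { INITIALIZED, THRESHOLD, EXTENSION, OFF }

    static boolean isInSafeMode(Status status) {
        switch (status) {
            case INITIALIZED:
            case THRESHOLD:
            case EXTENSION:
                return true;
            case OFF:
                return false;
            default:
                // Unreachable with the enum above; guards future additions.
                throw new AssertionError(
                    "Non-recognized block manager safe mode status: " + status);
        }
    }
}
```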

{code}
  /**
   * If the NN is in safemode, and not due to manual / low resources, we
   * assume it must be because of startup. If the NN had low resources during
   * startup, we assume it came out of startup safemode and it is now in low
   * resources safemode.
   */
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;

...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}

How do these two variables stay synchronized? Is the system in a consistent 
state in the middle of the update, between the two writes?
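One way to sidestep the question, sketched under the assumption that the two flags are mutually exclusive: fold them into a single volatile enum so there is exactly one write, and a concurrent reader can never observe a half-updated pair. The enum and field names here are illustrative, not from the patch.

```java
// Sketch: collapsing the two volatile booleans into one volatile enum so a
// concurrent reader can never see a half-updated combination. Names are
// illustrative, not from the patch.
enum SafeModeReason { NONE, MANUAL, RESOURCE_LOW }

class SafeModeState {
    private volatile SafeModeReason reason = SafeModeReason.NONE;

    void enterSafeMode(boolean resourcesLow) {
        // A single write: readers see either the old or the new reason,
        // never an inconsistent pair as with two separate boolean writes.
        reason = resourcesLow ? SafeModeReason.RESOURCE_LOW
                              : SafeModeReason.MANUAL;
    }

    boolean isInManualSafeMode()      { return reason == SafeModeReason.MANUAL; }
    boolean isInResourceLowSafeMode() { return reason == SafeModeReason.RESOURCE_LOW; }
}
```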

{code}
+bmSafeMode = new BlockManagerSafeMode(bm, fsn, conf);
+assertEquals(BMSafeModeStatus.INITIALIZED, getSafeModeStatus());
+assertFalse(bmSafeMode.isInSafeMode());
+// INITIALIZED -> THRESHOLD
+bmSafeMode.setBlockTotal(BLOCK_TOTAL);
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+assertTrue(bmSafeMode.isInSafeMode());
{code}

It makes sense to put it in a test instead of in the {{@Before}} method.

{code}
+// INITIALIZED -> OFF
+Whitebox.setInternalState(bmSafeMode, "status",
+BMSafeModeStatus.INITIALIZED);
+reachBlockThreshold();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.OFF, getSafeModeStatus());
+
+// INITIALIZED -> THRESHOLD
+Whitebox.setInternalState(bmSafeMode, "status",
+BMSafeModeStatus.INITIALIZED);
+Whitebox.setInternalState(bmSafeMode, "blockSafe", 0);
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+
+// stays in THRESHOLD: pending block threshold
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+for (long i = 0; i < BLOCK_THRESHOLD; i++) {
+  Whitebox.setInternalState(bmSafeMode, "blockSafe", i);
+  bmSafeMode.checkSafeMode();
+  assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+}
+
+// THRESHOLD -> EXTENSION
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+reachBlockThreshold();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.EXTENSION, getSafeModeStatus());
+Whitebox.setInternalState(bmSafeMode, "smmthread", null);
+
+// THRESHOLD -> OFF
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+reachBlockThreshold();
+Whitebox.setInternalState(bmSafeMode, "needExtension", false);
{code}

[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971537#comment-14971537
 ] 

Hadoop QA commented on HDFS-9297:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   8m 44s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 54s | There were no new javac warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 37s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 52s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m  5s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  67m 11s | Tests failed in hadoop-hdfs. |
| | |  93m 12s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768333/HDFS-9297.001.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 934d96a |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/console |


This message was automatically generated.

> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971586#comment-14971586
 ] 

Yongjun Zhang commented on HDFS-7284:
-

Hi [~jojochuang],

Thanks for the new rev. I noticed that you changed the default logger setting 
from info to debug; we definitely need to change that back:

log4j.rootLogger=debug,stdout

+1 after that pending jenkins test.




> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}





[jira] [Updated] (HDFS-9279) Decomissioned capacity should not be considered for configured/used capacity

2015-10-23 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9279:
--
Attachment: HDFS-9279-v1.patch

In addition to dfsUsed, the patch also updates xceiverCount and blockPoolUsed, 
but only when a node is neither decommissioning nor decommissioned. 
cacheCapacity and cacheUsed are updated for all nodes that are not 
decommissioned.
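A simplified stand-in for the accounting rule described above. The field names echo HDFS's DatanodeInfo statistics, but this is an illustration of the rule, not the patch itself.

```java
// Illustrative aggregation of the rule described in the comment: usage
// counters skip nodes once decommissioning starts; cache counters skip only
// fully decommissioned nodes. Simplified stand-in, not the actual patch.
enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

class Node {
    AdminState adminState = AdminState.NORMAL;
    long dfsUsed, blockPoolUsed, cacheCapacity, cacheUsed;
    int xceiverCount;
}

class ClusterStats {
    long dfsUsed, blockPoolUsed, cacheCapacity, cacheUsed;
    int xceiverCount;

    void add(Node n) {
        if (n.adminState == AdminState.NORMAL) {
            // Usage counters: only nodes that are neither decommissioning
            // nor decommissioned contribute.
            dfsUsed += n.dfsUsed;
            blockPoolUsed += n.blockPoolUsed;
            xceiverCount += n.xceiverCount;
        }
        if (n.adminState != AdminState.DECOMMISSIONED) {
            // Cache counters: decommissioning nodes still count; only fully
            // decommissioned nodes are excluded.
            cacheCapacity += n.cacheCapacity;
            cacheUsed += n.cacheUsed;
        }
    }
}
```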

> Decomissioned capacity should not be considered for configured/used capacity
> 
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9279-v1.patch
>
>
> Capacity of a decommissioned node is being accounted as configured and used 
> capacity metrics. This gives incorrect perception of cluster usage.
> Once a node is decommissioned, its capacity should be considered similar to a 
> dead node.





[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-7284:
--
Attachment: HDFS-7284.004.patch

Thanks [~yzhangal] for the code review. I am attaching a new version with no 
log4j change.

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}





[jira] [Updated] (HDFS-9299) Give ReplicationMonitor a readable thread name

2015-10-23 Thread Staffan Friberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Staffan Friberg updated HDFS-9299:
--
Attachment: HDFS-9299.001.patch

> Give ReplicationMonitor a readable thread name
> --
>
> Key: HDFS-9299
> URL: https://issues.apache.org/jira/browse/HDFS-9299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Priority: Trivial
> Attachments: HDFS-9299.001.patch
>
>
> Currently the log output from the Replication Monitor shows the class name; 
> setting a name on the thread makes the output easier to read.
> Current
> 2015-10-23 11:07:53,344 
> [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd]
>  INFO  blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.
> After
> 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO  
> blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.





[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-7284:
--
Attachment: HDFS-7284.005.patch

Thanks for catching my bad English :/

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}





[jira] [Created] (HDFS-9299) Give ReplicationMonitor a readable thread name

2015-10-23 Thread Staffan Friberg (JIRA)
Staffan Friberg created HDFS-9299:
-

 Summary: Give ReplicationMonitor a readable thread name
 Key: HDFS-9299
 URL: https://issues.apache.org/jira/browse/HDFS-9299
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Staffan Friberg
Priority: Trivial


Currently the log output from the Replication Monitor shows the class name; 
setting a name on the thread makes the output easier to read.

Current
2015-10-23 11:07:53,344 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd]
 INFO  blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
ReplicationMonitor.


After
2015-10-23 11:07:53,344 [ReplicationMonitor] INFO  blockmanagement.BlockManager 
(BlockManager.java:run(4125)) - Stopping ReplicationMonitor.
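The mechanics of the fix are just the {{Thread}} constructor's name argument. A minimal, self-contained illustration (not the patch itself; the class and method here are made up for the demo):

```java
// Demonstrates the fix: logging frameworks print the thread's name, so giving
// the monitor thread an explicit name replaces the unreadable
// "Class$Inner@hash" with "ReplicationMonitor" in log lines.
public class ThreadNameDemo {
    public static String runNamed() {
        final String[] seen = new String[1];
        Thread t = new Thread(
            () -> seen[0] = Thread.currentThread().getName(),
            "ReplicationMonitor"); // the fix: an explicit, readable name
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }
}
```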





[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971633#comment-14971633
 ] 

Zhe Zhang commented on HDFS-9268:
-

The patch LGTM. One minor suggestion is maybe we can fold {{fuseConnect}} into 
{{fuseConnectAsThreadUid}} to avoid bugs of this kind in the future? Seems we 
should always call {{fuseConnect}} with the thread UID anyway.

> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> JVM crashes when users attempt to use vi to update a file on fuse file system 
> with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to 
> generate the bug, but the same bug is reproducible in trunk)
> The root cause is a segfault in a dfs-fuse method
> To reproduce it do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited   # to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6





[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9266:

Summary: Avoid unsafe split and append on fields that might be IPv6 
literals  (was: hadoop-hdfs - Avoid unsafe split and append on fields that 
might be IPv6 literals)

> Avoid unsafe split and append on fields that might be IPv6 literals
> ---
>
> Key: HDFS-9266
> URL: https://issues.apache.org/jira/browse/HDFS-9266
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9266-HADOOP-11890.1.patch, 
> HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[jira] [Assigned] (HDFS-9299) Give ReplicationMonitor a readable thread name

2015-10-23 Thread Staffan Friberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Staffan Friberg reassigned HDFS-9299:
-

Assignee: Staffan Friberg

> Give ReplicationMonitor a readable thread name
> --
>
> Key: HDFS-9299
> URL: https://issues.apache.org/jira/browse/HDFS-9299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
>Priority: Trivial
> Attachments: HDFS-9299.001.patch
>
>
> Currently the log output from the Replication Monitor shows the class name; 
> setting a name on the thread makes the output easier to read.
> Current
> 2015-10-23 11:07:53,344 
> [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd]
>  INFO  blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.
> After
> 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO  
> blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.





[jira] [Created] (HDFS-9298) remove replica and not add replica with wrong genStamp

2015-10-23 Thread Chang Li (JIRA)
Chang Li created HDFS-9298:
--

 Summary: remove replica and not add replica with wrong genStamp
 Key: HDFS-9298
 URL: https://issues.apache.org/jira/browse/HDFS-9298
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li


Currently, in setGenerationStampAndVerifyReplicas, a replica with a wrong 
genStamp is not really removed; only the StorageLocation of that replica is 
removed. Moreover, we should check the genStamp before addReplicaIfNotPresent.





[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2015-10-23 Thread Staffan Friberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Staffan Friberg updated HDFS-9260:
--
Attachment: HDFS-7435.005.patch

Fixed the last todos.

Handles a new NN with old DNs (unsorted entries); this is inefficient since 
the NN needs to sort the entries. However, it should only be a problem during 
the upgrade cycle, and it is avoidable if the DNs are updated first.

Added a StorageInfoMonitor thread that can compact the TreeSet if the fill 
ratio gets too low.

Added a test to check that unsorted entries are handled correctly.

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: HDFS Block and Replica Management 20151013.pdf, 
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear peoples feedback on this change and also some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.





[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class

2015-10-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971591#comment-14971591
 ] 

Jing Zhao commented on HDFS-9255:
-

Thanks for working on this, Walter! The patch looks pretty good to me. One 
question is: since {{DataNode#blockRecoveryWorker}} is not declared as final, 
can we make sure the BPServiceActor thread can always see its non-null value 
when calling {{getBlockRecoveryWorker}}?
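The concern, restated as a sketch: under the Java Memory Model, only final fields get the freeze-at-construction guarantee; a non-final field initialized in the constructor may be observed as null by another thread if the object is published via a data race. The class names below are illustrative stand-ins, not the actual DataNode code.

```java
// Sketch of the visibility point: declaring the field final gives the JMM's
// safe-publication guarantee, so any thread that obtains a reference to the
// object (even through a data race) sees the constructor-assigned value.
// Names are hypothetical stand-ins for DataNode / BlockRecoveryWorker.
class BlockRecoveryWorkerSketch { }

class DataNodeSketch {
    // final: frozen at end of construction, never seen as null by readers.
    private final BlockRecoveryWorkerSketch blockRecoveryWorker;

    DataNodeSketch() {
        this.blockRecoveryWorker = new BlockRecoveryWorkerSketch();
    }

    BlockRecoveryWorkerSketch getBlockRecoveryWorker() {
        return blockRecoveryWorker;
    }
}
```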

> Consolidate block recovery related implementation into a single class
> -
>
> Key: HDFS-9255
> URL: https://issues.apache.org/jira/browse/HDFS-9255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, 
> HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch
>
>






[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-23 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971590#comment-14971590
 ] 

Elliott Clark commented on HDFS-9289:
-

It had all of the data and the same md5sums when I checked, so the only thing 
different was the genstamps. Not really sure why that happened, but I didn't 
mean to sidetrack this jira.

Test looks nice.

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
>
> we have seen a case of a corrupt block caused by a file completing after a 
> pipelineUpdate, but completing with the old block genStamp. This caused the 
> replicas of two datanodes in the updated pipeline to be viewed as corrupt. 
> Propose to check the genStamp when committing the block





[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971600#comment-14971600
 ] 

Yongjun Zhang commented on HDFS-7284:
-

Sorry, one more thing. I suggest changing:
{code}
* A helper method to output the string representation of a derived class,
{code}
to
{code}
* A helper method to output the string representation of the Block portion of
* a derived class' instance.
{code}


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}





[jira] [Updated] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

邓飞 updated HDFS-9293:
-
Description: 
  In our cluster (hadoop 2.2.0-HA, 700+ DNs), we found that the standby NN 
tailed the editlog slowly and held the fsnamesystem write lock during that 
work, so the DNs' heartbeat/blockreport IPC requests were blocked. This led 
the active NN to remove stale DNs which could not send heartbeats because they 
were blocked at processing standby NN registration (FIXED at 2.7.1).

  Below is the standby NN stack:

"Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable [0x7f0dd1d76000]
   java.lang.Thread.State: RUNNABLE
        at java.util.PriorityQueue.remove(PriorityQueue.java:360)
        at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
        at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
        - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)

When applying an editLogOp, if a matching entry is found in the IPC 
retryCache, the previous entry needs to be removed from the priority queue, 
which is O(N). updateBlocks does not need to record an rpcId in the editlog 
except for 'client request updatePipeline', but we found many 'UpdateBlocksOp' 
with repeated rpcIds.


  was:
  In our cluster (hadoop 2.2.0-HA,700+ DN),we found standby NN tail editlog 
slowly,and hold the fsnamesystem writelock during the work and the DN's 
heartbeart/blockreport IPC request blocked.Lead to Active NN remove stale DN 
which can't send heartbeat  because blocking at process Standby NN Regiest 
common(FIXED at 2.7.1).

  Below is the standby NN  stack:

"Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
[0x7f0dd1d76000]
   java.lang.Thread.State: RUNNABLE
at java.util.PriorityQueue.remove(PriorityQueue.java:360)
at 
org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
- locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
   
When applying an edit log op, if the entry is found in the IPC retry cache, the 
previous entry must be removed from the priority queue (O(N)). The updateBlocks 
op does not need to record an rpcId in the edit log except for a 
client-requested updatePipeline, yet we found many 'UpdateBlocksOp' entries 
with repeated rpcIds in the edit log.

 
  


> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --

[jira] [Updated] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

邓飞 updated HDFS-9293:
-
Description: 
  In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found the standby NN tailing 
the edit log slowly while holding the FSNamesystem write lock, so the DNs' 
heartbeat/blockreport IPC requests were blocked. This led the active NN to 
remove stale DNs that could not send heartbeats because they were blocked 
processing the standby NN's re-registration command (fixed in 2.7.1).

  Below is the standby NN  stack:

"Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
[0x7f0dd1d76000]
   java.lang.Thread.State: RUNNABLE
at java.util.PriorityQueue.remove(PriorityQueue.java:360)
at 
org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
- locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
   
When applying an edit log op, if the entry is found in the IPC retry cache, the 
previous entry must be removed from the priority queue (O(N)). The updateBlocks 
op does not need to record an rpcId in the edit log except for a 
client-requested updatePipeline, yet we found many 'UpdateBlocksOp' entries 
with repeated rpcIds in the edit log.
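The O(N) cost in the stack above comes from `java.util.PriorityQueue.remove(Object)`, which must scan the backing heap array linearly to find an arbitrary element. A minimal stdlib-only sketch (a standalone illustration, not the actual Hadoop LightWeightCache/RetryCache code) of why arbitrary removal is so much costlier than removing the head:

```java
import java.util.PriorityQueue;

// Demonstrates that PriorityQueue.remove(Object) is a linear scan:
// removing an arbitrary element is O(N), unlike poll(), which removes
// the head in O(log N). This mirrors why evicting a prior retry-cache
// entry during edit-log replay is expensive when the queue is large.
public class PriorityQueueRemovalCost {
  public static void main(String[] args) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    int n = 200_000;
    for (int i = 0; i < n; i++) {
      pq.add(i);
    }

    // O(N) per call: remove(Object) searches the heap array for the value.
    long t0 = System.nanoTime();
    for (int i = n - 1; i > n - 1_000; i--) {
      pq.remove(Integer.valueOf(i));   // worst case: elements near the tail
    }
    long linearNs = System.nanoTime() - t0;

    // O(log N) per call: poll() always removes the head.
    long t1 = System.nanoTime();
    for (int i = 0; i < 1_000; i++) {
      pq.poll();
    }
    long logNs = System.nanoTime() - t1;

    System.out.println("remove(Object) x1000: " + linearNs / 1_000_000 + " ms");
    System.out.println("poll()         x1000: " + logNs / 1_000_000 + " ms");
  }
}
```

With many repeated rpcIds forcing repeated evictions, the tailer pays this linear scan over and over while holding the write lock.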

 
  

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found the standby NN tailing 
> the edit log slowly while holding the FSNamesystem write lock, so the DNs' 
> heartbeat/blockreport IPC requests were blocked. This led the active NN to 
> remove stale DNs that could not send heartbeats because they were blocked 
> processing the standby NN's re-registration command (fixed in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> 

[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971546#comment-14971546
 ] 

Tony Wu commented on HDFS-9297:
---

The failed tests are not related to this change.

> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.
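To show the shape of what such a helper does, here is a standalone stdlib-only analogue of "corrupt a block by deleting all its block files" (hypothetical names and a simplified naming convention; this is not the MiniDFSCluster implementation):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Standalone sketch: locate every file belonging to a block id in a
// datanode data directory and delete it, leaving the block unreadable.
// HDFS block files are conventionally named blk_<id> (plus a .meta file).
public class DeleteBlockFiles {
  static int deleteReplicas(Path dataDir, long blockId) throws IOException {
    int deleted = 0;
    String prefix = "blk_" + blockId;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir)) {
      for (Path p : stream) {
        if (p.getFileName().toString().startsWith(prefix)) {
          Files.delete(p);
          deleted++;
        }
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("dn-data");
    Files.createFile(dir.resolve("blk_1073741825"));
    Files.createFile(dir.resolve("blk_1073741825_1001.meta"));
    Files.createFile(dir.resolve("blk_1073741826"));
    // Deletes the block file and its meta file; the other block survives.
    System.out.println(deleteReplicas(dir, 1073741825L));  // prints 2
  }
}
```

Reusing one shared helper for this keeps the corruption logic consistent across tests, which is the point of the proposed change.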



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9298) remove replica and not add replica with wrong genStamp

2015-10-23 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated HDFS-9298:
---
Attachment: HDFS-9298.1.patch

> remove replica and not add replica with wrong genStamp
> --
>
> Key: HDFS-9298
> URL: https://issues.apache.org/jira/browse/HDFS-9298
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: HDFS-9298.1.patch
>
>
> Currently, in setGenerationStampAndVerifyReplicas, a replica with a wrong 
> generation stamp is not actually removed; only the StorageLocation of that 
> replica is removed. Moreover, we should check the genStamp before calling 
> addReplicaIfNotPresent.
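A standalone sketch of the two fixes being proposed (hypothetical class and field names, not the actual Hadoop BlockInfoUnderConstruction code): drop replicas whose generation stamp no longer matches, and refuse to add a replica carrying a stale genStamp in the first place.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed behavior: the replica itself is removed when its
// generation stamp is stale, and a stale replica is rejected up front.
public class ReplicaMap {
  // datanode storage id -> generation stamp of the replica it holds
  private final Map<String, Long> replicas = new HashMap<>();

  void addReplicaIfNotPresent(String storageId, long genStamp, long expectedGenStamp) {
    // Proposed check: reject a replica reporting the wrong genStamp.
    if (genStamp != expectedGenStamp) {
      return;  // stale replica, do not track it
    }
    replicas.putIfAbsent(storageId, genStamp);
  }

  void setGenerationStampAndVerifyReplicas(long newGenStamp) {
    // Actually remove the stale replica, not merely its storage location.
    replicas.values().removeIf(gs -> gs != newGenStamp);
  }

  int replicaCount() {
    return replicas.size();
  }

  public static void main(String[] args) {
    ReplicaMap map = new ReplicaMap();
    map.addReplicaIfNotPresent("DS-1", 1001L, 1001L);
    map.addReplicaIfNotPresent("DS-2", 1000L, 1001L);  // stale, rejected
    map.setGenerationStampAndVerifyReplicas(1002L);    // DS-1 now stale too
    System.out.println(map.replicaCount());            // prints 0
  }
}
```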



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9298) remove replica and not add replica with wrong genStamp

2015-10-23 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated HDFS-9298:
---
Status: Patch Available  (was: Open)

> remove replica and not add replica with wrong genStamp
> --
>
> Key: HDFS-9298
> URL: https://issues.apache.org/jira/browse/HDFS-9298
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: HDFS-9298.1.patch
>
>
> Currently, in setGenerationStampAndVerifyReplicas, a replica with a wrong 
> generation stamp is not actually removed; only the StorageLocation of that 
> replica is removed. Moreover, we should check the genStamp before calling 
> addReplicaIfNotPresent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9266:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Avoid unsafe split and append on fields that might be IPv6 literals
> ---
>
> Key: HDFS-9266
> URL: https://issues.apache.org/jira/browse/HDFS-9266
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9266-HADOOP-11890.1.patch, 
> HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971634#comment-14971634
 ] 

Hadoop QA commented on HDFS-9295:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   8m 14s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m  2s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  52m 13s | Tests failed in hadoop-hdfs. |
| | |  75m 47s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768338/HDFS-9295.002.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / eb6379c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13160/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13160/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13160/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13160/console |


This message was automatically generated.

> Add a thorough test of the full KMS code path
> -
>
> Key: HDFS-9295
> URL: https://issues.apache.org/jira/browse/HDFS-9295
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-9295.001.patch, HDFS-9295.002.patch
>
>
> TestKMS does a good job of testing the ACLs directly, but they are tested out 
> of context. Additional tests are needed that test how the ACLs impact key 
> creation, EZ creation, file creation in an EZ, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-10-23 Thread JIRA
邓飞 created HDFS-9294:


 Summary: DFSClient  deadlock when close file and failed to renew 
lease
 Key: HDFS-9294
 URL: https://issues.apache.org/jira/browse/HDFS-9294
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, hdfs-client
Affects Versions: 2.7.1, 2.2.0
 Environment: Hadoop 2.2.0
Reporter: 邓飞


We found a deadlock in our HBase (0.98) cluster (Hadoop version 2.2.0). It 
appears to be an HDFS bug; at the time our network was unstable.
 Below is the stack:

*
Found one Java-level deadlock:
=
"MemStoreFlusher.1":
  waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
org.apache.hadoop.hdfs.LeaseRenewer),
  which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
"LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
  waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
org.apache.hadoop.hdfs.DFSOutputStream),
  which is held by "MemStoreFlusher.0"
"MemStoreFlusher.0":
  waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
org.apache.hadoop.hdfs.LeaseRenewer),
  which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"

Java stack information for the threads listed above:
===
"MemStoreFlusher.1":
at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
- waiting to lock <0x0002fae5ebe0> (a 
org.apache.hadoop.hdfs.LeaseRenewer)
at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
- locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
at 
org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
at 
org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
at 
org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
at 
org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
- locked <0x00059869eed8> (a java.lang.Object)
at 
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
at java.lang.Thread.run(Thread.java:744)
"LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
at 
org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
- waiting to lock <0x000486ce6620> (a 
org.apache.hadoop.hdfs.DFSOutputStream)
at 
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
- locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:744)
"MemStoreFlusher.0":
at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
- waiting to lock <0x0002fae5ebe0> (a 
org.apache.hadoop.hdfs.LeaseRenewer)
at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
at 

[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970720#comment-14970720
 ] 

邓飞 commented on HDFS-9293:
--

Thanks Walter, it's my mistake; that was fixed in 2.7.1.

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found the standby NN tailing 
> the edit log slowly while holding the FSNamesystem write lock, so the DNs' 
> heartbeat/blockreport IPC requests were blocked. This led the active NN to 
> remove stale DNs that could not send heartbeats because they were blocked 
> processing the standby NN's re-registration command (fixed in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an edit log op, if the entry is found in the IPC retry cache, 
> the previous entry must be removed from the priority queue (O(N)). The 
> updateBlocks op does not need to record an rpcId in the edit log except for 
> a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' 
> entries with repeated rpcIds.
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970837#comment-14970837
 ] 

Hadoop QA commented on HDFS-9276:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m  1s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 13s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 23s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   8m 56s | Tests passed in 
hadoop-common. |
| | |  54m 52s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768270/HDFS-9276.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13151/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13151/artifact/patchprocess/testrun_hadoop-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13151/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13151/console |


This message was automatically generated.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new 

[jira] [Assigned] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-10-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

邓飞 reassigned HDFS-9294:


Assignee: 邓飞

> DFSClient  deadlock when close file and failed to renew lease
> -
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS, hdfs-client
>Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
>Reporter: 邓飞
>Assignee: 邓飞
>
> We found a deadlock in our HBase (0.98) cluster (Hadoop version 2.2.0). It 
> appears to be an HDFS bug; at the time our network was unstable.
> Below is the stack:
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 
> org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
>   - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
>   at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>   - locked <0x00059869eed8> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
>   at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
>   - waiting to lock <0x000486ce6620> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
>   at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
>   at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
>   - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>   at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
>   at java.lang.Thread.run(Thread.java:744)
> "MemStoreFlusher.0":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 

[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970711#comment-14970711
 ] 

Walter Su commented on HDFS-9293:
-

I checked HDFS-7398. I think ClientId/CallId will be reset after logEdit(..).
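For context, the concern is stale per-thread RPC identifiers leaking into later edit-log entries. A standalone sketch of the reset pattern being described (hypothetical names; not the actual Hadoop Server/FSEditLog code):

```java
// Per-thread RPC identifiers kept in ThreadLocals must be cleared after an
// operation is logged; otherwise the next op serialized by the same thread
// can inherit a stale (repeated) rpcId.
public class RpcIds {
  private static final ThreadLocal<byte[]> CLIENT_ID = new ThreadLocal<>();
  private static final ThreadLocal<Integer> CALL_ID = new ThreadLocal<>();

  static void set(byte[] clientId, int callId) {
    CLIENT_ID.set(clientId);
    CALL_ID.set(callId);
  }

  // Analogous to "reset after logEdit(..)": clear the thread-local state so
  // a later op that should carry no rpcId does not reuse this one.
  static void reset() {
    CLIENT_ID.remove();
    CALL_ID.remove();
  }

  static boolean hasRpcId() {
    return CLIENT_ID.get() != null && CALL_ID.get() != null;
  }

  public static void main(String[] args) {
    set(new byte[] {1, 2}, 42);
    System.out.println(hasRpcId());  // true: rpcId attached to this op
    reset();                         // logEdit(..) finished
    System.out.println(hasRpcId());  // false: next op starts clean
  }
}
```

If the reset indeed happens right after logEdit(..), the dirty-rpcId scenario in the report should not occur on that code path.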

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found the standby NN tailing 
> the edit log slowly while holding the FSNamesystem write lock, so the DNs' 
> heartbeat/blockreport IPC requests were blocked. This led the active NN to 
> remove stale DNs that could not send heartbeats because they were blocked 
> processing the standby NN's re-registration command (fixed in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an edit log op, if the entry is found in the IPC retry cache, 
> the previous entry must be removed from the priority queue (O(N)). The 
> updateBlocks op does not need to record an rpcId in the edit log except for 
> a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' 
> entries with repeated rpcIds.
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970735#comment-14970735
 ] 

Hadoop QA commented on HDFS-9255:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 17s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 23s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 28s | The applied patch generated  1 
new checkstyle issues (total was 286, now 264). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 33s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  51m 35s | Tests failed in hadoop-hdfs. |
| | |  97m 58s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768244/HDFS-9255.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13149/console |


This message was automatically generated.

> Consolidate block recovery related implementation into a single class
> -
>
> Key: HDFS-9255
> URL: https://issues.apache.org/jira/browse/HDFS-9255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, 
> HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes

2015-10-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971060#comment-14971060
 ] 

Kihwal Lee commented on HDFS-9290:
--

The fix looks good. One minor nit is that logging at {{INFO}} can sometimes be 
noisy. I think end-users rarely care about the fact that it is talking to an 
older namenode. Let's make it {{DEBUG}}.

> DFSClient#callAppend() is not backward compatible for slightly older NameNodes
> --
>
> Key: HDFS-9290
> URL: https://issues.apache.org/jira/browse/HDFS-9290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Blocker
> Attachments: HDFS-9290.001.patch
>
>
> HDFS-7210 combined 2 RPC calls used at file append into a single one. 
> Specifically, {{getFileInfo()}} is combined with {{append()}}. While backward 
> compatibility for older clients is handled by the new NameNode (via protobuf), 
> a newer client's {{append()}} call does not work with older NameNodes. One will 
> run into an exception like the following:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1560)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1670)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
> {code}
> The cause is that the new client code is expecting both the last block and 
> file info in the same RPC but the old NameNode only replied with the first. 
> The exception itself does not reflect this and one will have to look at the 
> HDFS source code to really understand what happened.
> We can have the client detect it's talking to a old NameNode and send an 
> extra {{getFileInfo()}} RPC. Or we should improve the exception being thrown 
> to accurately reflect the cause of failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971106#comment-14971106
 ] 

Hadoop QA commented on HDFS-9295:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   8m 31s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   8m 32s | The applied patch generated  22  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 33s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m  9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  66m 51s | Tests failed in hadoop-hdfs. |
| | |  91m 52s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestSafeMode |
|   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768294/HDFS-9295.001.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/diffJavacWarnings.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13152/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13152/console |


This message was automatically generated.

> Add a thorough test of the full KMS code path
> -
>
> Key: HDFS-9295
> URL: https://issues.apache.org/jira/browse/HDFS-9295
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-9295.001.patch
>
>
> TestKMS does a good job of testing the ACLs directly, but they are tested out 
> of context.  Additional tests are needed that test how the ACL impact key 
> creation, EZ creation, file creation in an EZ, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-10-23 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971142#comment-14971142
 ] 

James Clampffer commented on HDFS-9117:
---

Thanks for the update Bob! The last patch covered all of my concerns, the 
comments made it much easier to understand.

I have one tiny issue and one nit:
Issue: In configuration.cc #include  can just be 
#include 
Nit: A comment about why 20 is the recursion depth limit for 
Configuration::SubstituteVars could be handy even if it just says "this is how 
the java client does it".  Certainly not a blocker.

Once the include is fixed up I'll +1.

> Config file reader / options classes for libhdfs++
> --
>
> Key: HDFS-9117
> URL: https://issues.apache.org/jira/browse/HDFS-9117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9117.HDFS-8707.001.patch, 
> HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, 
> HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, 
> HDFS-9117.HDFS-8707.006.patch
>
>
> For environmental compatability with HDFS installations, libhdfs++ should be 
> able to read the configurations from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971179#comment-14971179
 ] 

Yongjun Zhang commented on HDFS-7284:
-

Hi [~jojochuang],

It's important to have a consistent block name appear in the log, so people can 
analyze all the actions that happened to a given block by searching for 
"blk_id_timestamp" or "blk_id".

I'd suggest adding the following code to Block.java:

{code}
  /**
   * Returns the block name plus its generation stamp, i.e.
   * blk_&lt;blockId&gt;_&lt;generationStamp&gt;.
   */
  public static String toString(final Block b) {
    return b.getBlockName() + "_" + b.getGenerationStamp();
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public String toString() {
    return toString(this);
  }
{code}

and change the message you are working on to
{code}
 NameNode.blockStateChangeLog.debug("BLOCK* Removing stale replica {}"
  + " of {}", r, Block.toString(r));
{code}

Hi [~andrew.wang], does this sound good to you? I think the replica state that 
comes with {{ReplicaUnderConstruction#toString}} would help debugging.

Thanks.
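With that {{toString}}, every log line carries the searchable block name. A standalone sketch of just the formatting (assuming, as in HDFS, that {{getBlockName()}} returns "blk_" plus the block id):

```java
public class BlockNameSketch {
    // Example values; real ids and generation stamps come from the block.
    static long blockId = 1073741825L;
    static long generationStamp = 1001L;

    static String getBlockName() {
        return "blk_" + blockId;
    }

    // Mirrors the suggested Block.toString(b): name plus generation stamp.
    static String blockToString() {
        return getBlockName() + "_" + generationStamp;
    }

    public static void main(String[] args) {
        System.out.println(blockToString());
    }
}
```

Searching the logs for either the full "blk_1073741825_1001" or the shorter "blk_1073741825" then finds every action taken on the block.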


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9269:
--
Attachment: HDFS-9269.001.patch

rev1: work in progress. updated doc/README

> Need to update the documentation and wrapper for fuse-dfs
> -
>
> Key: HDFS-9269
> URL: https://issues.apache.org/jira/browse/HDFS-9269
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
> Attachments: HDFS-9269.001.patch
>
>
> To reproduce the bug in HDFS-9268, I followed the wiki and the doc, and read the 
> wrapper script of fuse-dfs, but found them badly outdated (the wrapper was 
> last updated four years ago, and the Hadoop project layout has changed 
> dramatically since then). I am creating this JIRA to track the status of the 
> update.
> There are quite a few external blogs/discussion threads floating around the 
> internet which talked about how to update the scripts, but no one took the 
> time to update them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971255#comment-14971255
 ] 

Hadoop QA commented on HDFS-9290:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m 19s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 53s | The applied patch generated  1 
new checkstyle issues (total was 55, now 55). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 34s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |   0m 31s | Tests passed in 
hadoop-hdfs-client. |
| | |  51m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768311/HDFS-9290.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 35a303d |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13154/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13154/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13154/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13154/console |


This message was automatically generated.

> DFSClient#callAppend() is not backward compatible for slightly older NameNodes
> --
>
> Key: HDFS-9290
> URL: https://issues.apache.org/jira/browse/HDFS-9290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Blocker
> Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch
>
>
> HDFS-7210 combined 2 RPC calls used at file append into a single one. 
> Specifically, {{getFileInfo()}} is combined with {{append()}}. While backward 
> compatibility for older clients is handled by the new NameNode (via protobuf), 
> a newer client's {{append()}} call does not work with older NameNodes. One will 
> run into an exception like the following:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1560)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1670)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
> {code}
> The cause is that the new client code is expecting both the last block and 
> file info in the same RPC but the old NameNode only replied with the first.

[jira] [Commented] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space

2015-10-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971252#comment-14971252
 ] 

Allen Wittenauer commented on HDFS-9296:


bq. AD permits group names with space (e.g. "Domain Users").

Yes, but that doesn't mean they are POSIX compliant; POSIX names must match this 
regex: [_a-z][-0-9_a-z]*\$? .

So a definite -1 on this.
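The POSIX rule above is easy to check mechanically. A minimal sketch showing that a plain lowercase name passes while an AD-style name with a space and capitals does not:

```java
import java.util.regex.Pattern;

public class PosixNameSketch {
    // POSIX portable user/group name rule quoted above.
    static final Pattern POSIX = Pattern.compile("[_a-z][-0-9_a-z]*\\$?");

    public static void main(String[] args) {
        System.out.println(POSIX.matcher("hadoop").matches());
        System.out.println(POSIX.matcher("Domain Users").matches());
    }
}
```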

> ShellBasedUnixGroupMapping should support group names with space
> 
>
> Key: HDFS-9296
> URL: https://issues.apache.org/jira/browse/HDFS-9296
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>
> In a typical configuration, group name is obtained from AD through SSSD/LDAP. 
> AD permits group names with space (e.g. "Domain Users").
> Unfortunately, the present implementation of ShellBasedUnixGroupMapping 
> parses the output of shell command "id -Gn", and assumes group names are 
> separated by space. 
> This could be achieved by using a combination of shell scripts, for example, 
> bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut 
> -d":" -f1'
> But I am still looking for a more compact form, and potentially more 
> efficient one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9272) Implement a unix-like cat utility

2015-10-23 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971177#comment-14971177
 ] 

James Clampffer commented on HDFS-9272:
---

Will do.  Is there generally a preference in the HDFS community for taking a 
host and port as separate tokens vs. taking a URI for these sorts of tests?

> Implement a unix-like cat utility
> -
>
> Key: HDFS-9272
> URL: https://issues.apache.org/jira/browse/HDFS-9272
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Minor
> Attachments: HDFS-9272.HDFS-8707.000.patch
>
>
> Implement the basic functionality of "cat" and have it build as a separate 
> executable.
> 2 Reasons for this:
> We don't have any real integration tests at the moment so something simple to 
> verify that the library actually works against a real cluster is useful.
> Eventually I'll make more utilities like stat, mkdir etc.  Once there are 
> enough of them it will be simple to make a C++ implementation of the hadoop 
> fs command line interface that doesn't take the latency hit of spinning up a 
> JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7284:

Labels: supportability  (was: )

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space

2015-10-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9296:
-

 Summary: ShellBasedUnixGroupMapping should support group names 
with space
 Key: HDFS-9296
 URL: https://issues.apache.org/jira/browse/HDFS-9296
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


In a typical configuration, group name is obtained from AD through SSSD/LDAP. 
AD permits group names with space (e.g. "Domain Users").

Unfortunately, the present implementation of ShellBasedUnixGroupMapping parses 
the output of shell command "id -Gn", and assumes group names are separated by 
space. 

Supporting such names could be achieved with a combination of shell commands, 
for example: 

bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut -d":" 
-f1'

But I am still looking for a more compact, and potentially more efficient, form.
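One direction that sidesteps the space-splitting problem entirely, as the one-liner above does, is to parse the numeric output of {{id -G}} (group IDs never contain spaces) and resolve each ID to a name afterwards. A minimal sketch of the parsing half, with the {{getent}} resolution left out:

```java
import java.util.ArrayList;
import java.util.List;

public class GroupIdParseSketch {
    // Numeric GIDs from "id -G" are space-separated and never contain
    // spaces themselves, so splitting on whitespace is unambiguous.
    static List<Long> parseGids(String idOutput) {
        List<Long> gids = new ArrayList<>();
        for (String tok : idOutput.trim().split("\\s+")) {
            gids.add(Long.parseLong(tok));
        }
        return gids;
    }

    public static void main(String[] args) {
        System.out.println(parseGids("1000 1001 513"));
    }
}
```

Each parsed GID would then be resolved to its (possibly space-containing) name via a single lookup, so the name text is never split.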



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9296.
---
Resolution: Duplicate

I filed in the wrong category. A new one is filed as HADOOP-12505

> ShellBasedUnixGroupMapping should support group names with space
> 
>
> Key: HDFS-9296
> URL: https://issues.apache.org/jira/browse/HDFS-9296
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>
> In a typical configuration, group name is obtained from AD through SSSD/LDAP. 
> AD permits group names with space (e.g. "Domain Users").
> Unfortunately, the present implementation of ShellBasedUnixGroupMapping 
> parses the output of shell command "id -Gn", and assumes group names are 
> separated by space. 
> This could be achieved by using a combination of shell scripts, for example, 
> bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut 
> -d":" -f1'
> But I am still looking for a more compact form, and potentially more 
> efficient one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-9295:
---
Attachment: HDFS-9295.001.patch

> Add a thorough test of the full KMS code path
> -
>
> Key: HDFS-9295
> URL: https://issues.apache.org/jira/browse/HDFS-9295
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.6.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-9295.001.patch
>
>
> TestKMS does a good job of testing the ACLs directly, but they are tested out 
> of context.  Additional tests are needed that test how the ACL impact key 
> creation, EZ creation, file creation in an EZ, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9243) TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970914#comment-14970914
 ] 

Wei-Chiu Chuang commented on HDFS-9243:
---

Thanks for the analysis. Please feel free to assign this jira to yourself. 
Because the failure appears quite frequently, it should be possible to improve 
upon it, even though it may not be an issue in production. I am thinking it 
could be resolved by reducing certain timeout parameters so that the test case 
doesn't need to wait as long. 

> TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
> -
>
> Key: HDFS-9243
> URL: https://issues.apache.org/jira/browse/HDFS-9243
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Wei-Chiu Chuang
>Priority: Minor
>
> org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks 
> sometimes time out.
> This is happening on trunk as can be observed in several recent jenkins job. 
> (e.g. https://builds.apache.org/job/Hadoop-Hdfs-trunk/2423/  
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/ 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2351/ 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/472/
> On my local Linux machine, this test case times out 6 out of 10 times. When 
> it does not time out, this test takes about 20 seconds, otherwise it takes 
> more than 60 seconds and then time out.
> I suspect it's a deadlock issue, as dead lock had occurred at this test case 
> in HDFS-5527 before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-9295:
--

 Summary: Add a thorough test of the full KMS code path
 Key: HDFS-9295
 URL: https://issues.apache.org/jira/browse/HDFS-9295
 Project: Hadoop HDFS
  Issue Type: Test
  Components: security, test
Affects Versions: 2.6.1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical


TestKMS does a good job of testing the ACLs directly, but they are tested out 
of context.  Additional tests are needed that test how the ACL impact key 
creation, EZ creation, file creation in an EZ, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-9295:
---
Affects Version/s: (was: 2.6.1)
   2.7.1
   Status: Patch Available  (was: Open)

> Add a thorough test of the full KMS code path
> -
>
> Key: HDFS-9295
> URL: https://issues.apache.org/jira/browse/HDFS-9295
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-9295.001.patch
>
>
> TestKMS does a good job of testing the ACLs directly, but they are tested out 
> of context.  Additional tests are needed that test how the ACL impact key 
> creation, EZ creation, file creation in an EZ, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9285) testTruncateWithDataNodesRestartImmediately occasionally fails

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970919#comment-14970919
 ] 

Wei-Chiu Chuang commented on HDFS-9285:
---

Thanks [~walter.k.su] for the analysis. You have done a fix in HDFS-8729 so I 
am assuming you're the expert :) Please feel free to assign this jira to 
yourself!

> testTruncateWithDataNodesRestartImmediately occasionally fails
> --
>
> Key: HDFS-9285
> URL: https://issues.apache.org/jira/browse/HDFS-9285
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Priority: Minor
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2462/testReport/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestartImmediately/
> Note that this is similar, but appears to be a different failure than 
> HDFS-8729.
> Error Message
> inode should complete in ~3 ms.
> Expected: is 
>  but: was 
> Stacktrace
> java.lang.AssertionError: inode should complete in ~3 ms.
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1192)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1176)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1171)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:798)
> Log excerpt:
> 2015-10-22 06:34:47,281 [IPC Server handler 8 on 8020] INFO  
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=open
> src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
> perm=null   proto=rpc
> 2015-10-22 06:34:47,382 [IPC Server handler 9 on 8020] INFO  
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=open
> src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
> perm=null   proto=rpc
> 2015-10-22 06:34:47,484 [IPC Server handler 0 on 8020] INFO  
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=open
> src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
> perm=null   proto=rpc
> 2015-10-22 06:34:47,585 [IPC Server handler 1 on 8020] INFO  
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=open
> src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
> perm=null   proto=rpc
> 2015-10-22 06:34:47,689 [main] INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1889)) - Shutting down the Mini HDFS Cluster
> 2015-10-22 06:34:47,690 [main] INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdownDataNodes(1935)) - Shutting down DataNode 2
> 2015-10-22 06:34:47,690 [main] WARN  datanode.DirectoryScanner 
> (DirectoryScanner.java:shutdown(529)) - DirectoryScanner: shutdown has been 
> called





[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Liangliang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liangliang Gu updated HDFS-9276:

Attachment: HDFS-9276.02.patch

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, debug1.PNG, 
> debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>   at 
> 
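The private-token behavior described above can be sketched with a toy model 
(illustrative names only; the {{Credentials}} store here is just a plain 
{{Map}}, and the service keys are made up, not the actual Hadoop API): in HA 
mode the token fetched for the logical URI is cloned into per-NameNode 
private entries, so refreshing only the logical entry leaves the clones 
stale.

```java
import java.util.HashMap;
import java.util.Map;

public class HaTokenSketch {
    // Illustrative service keys: one logical HA URI, two physical NNs.
    static final String LOGICAL = "ha-hdfs:mycluster";
    static final String NN1 = "nn1.example.com:8020";
    static final String NN2 = "nn2.example.com:8020";

    // Mimics the buggy update: only the logical entry is replaced,
    // while the per-NN private clones keep the old (expiring) token.
    static void updateLogicalOnly(Map<String, String> creds, String tok) {
        creds.put(LOGICAL, tok);
    }

    // Mimics the fix: propagate the refreshed token to the private
    // entries the client actually uses when talking to each NN.
    static void updateWithPrivates(Map<String, String> creds, String tok) {
        creds.put(LOGICAL, tok);
        creds.put(NN1, tok);
        creds.put(NN2, tok);
    }

    public static void main(String[] args) {
        Map<String, String> creds = new HashMap<>();
        for (String k : new String[]{LOGICAL, NN1, NN2}) {
            creds.put(k, "token-v1");
        }
        updateLogicalOnly(creds, "token-v2");
        // The RPC layer resolves the NN's physical address, so it still
        // reads the stale private token and eventually gets InvalidToken.
        System.out.println(creds.get(NN1)); // prints "token-v1"
        updateWithPrivates(creds, "token-v2");
        System.out.println(creds.get(NN1)); // prints "token-v2"
    }
}
```

This is only a sketch of the stale-clone mechanism, not the patch itself.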

[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970696#comment-14970696
 ] 

邓飞 commented on HDFS-9293:
--

It's a dirty-reference case in FSEditLog:

{code}
  private ThreadLocal<OpInstanceCache> cache =
      new ThreadLocal<OpInstanceCache>() {
        @Override
        protected OpInstanceCache initialValue() {
          return new OpInstanceCache();
        }
      };
{code}

Each NN handler thread initializes its own OpInstanceCache instance and then
reuses it for later operations, such as logUpdateBlocks:

{code}
  public void logUpdateBlocks(String path, INodeFileUnderConstruction file,
      boolean toLogRpcIds) {
    UpdateBlocksOp op = UpdateBlocksOp.getInstance(cache.get())
        .setPath(path)
        .setBlocks(file.getBlocks());
    logRpcIds(op, toLogRpcIds);
    logEdit(op);
  }

  /** Record the RPC IDs if necessary */
  private void logRpcIds(FSEditLogOp op, boolean toLogRpcIds) {
    if (toLogRpcIds) {
      op.setRpcClientId(Server.getClientId());
      op.setRpcCallId(Server.getCallId());
    }
  }
{code}

If a client recovers the pipeline, the cached FSEditLogOp instance gets an
RPC id set. A later UpdateBlocksOp triggered by a call like addBlock, which
is marked @Idempotent and should not log an RPC id, then records the stale
RPC id in the editlog again. That leaves standby NN IPC handler threads
parked and indirectly stalls the active NN.
We found 2.7.1 has the same problem.
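The failure mode can be modeled with a small sketch (hypothetical names
standing in for RpcConstants and the cached FSEditLogOp; this is not the
actual Hadoop code): a reused thread-local op keeps the rpc ids from the
previous call unless logRpcIds also resets them when no id should be
recorded.

```java
public class RpcIdResetSketch {
    // Stand-ins for RpcConstants.DUMMY_CLIENT_ID / an invalid call id.
    static final byte[] DUMMY_CLIENT_ID = new byte[0];
    static final int DUMMY_CALL_ID = -3;

    // Minimal model of a cached, reused edit-log op.
    static class CachedOp {
        byte[] rpcClientId = DUMMY_CLIENT_ID;
        int rpcCallId = DUMMY_CALL_ID;
    }

    // Buggy shape: only sets ids when asked, never clears stale ones,
    // so the next reuse of the cached op inherits the previous rpc id.
    static void logRpcIdsBuggy(CachedOp op, boolean toLogRpcIds,
                               byte[] clientId, int callId) {
        if (toLogRpcIds) {
            op.rpcClientId = clientId;
            op.rpcCallId = callId;
        }
    }

    // Fixed shape: reset the reused op to dummy ids when no rpc id
    // should be recorded for this edit.
    static void logRpcIdsFixed(CachedOp op, boolean toLogRpcIds,
                               byte[] clientId, int callId) {
        if (toLogRpcIds) {
            op.rpcClientId = clientId;
            op.rpcCallId = callId;
        } else {
            op.rpcClientId = DUMMY_CLIENT_ID;
            op.rpcCallId = DUMMY_CALL_ID;
        }
    }

    public static void main(String[] args) {
        byte[] client = {1, 2};
        CachedOp op = new CachedOp();
        logRpcIdsBuggy(op, true, client, 7);   // e.g. updatePipeline
        logRpcIdsBuggy(op, false, client, 7);  // idempotent op: no id wanted
        System.out.println("buggy leaks id: " + (op.rpcCallId == 7));

        CachedOp op2 = new CachedOp();
        logRpcIdsFixed(op2, true, client, 7);
        logRpcIdsFixed(op2, false, client, 7);
        System.out.println("fixed is clean: " + (op2.rpcCallId == DUMMY_CALL_ID));
    }
}
```

The sketch only illustrates why a reset of the reused op is needed; it is
not the actual fix applied in HDFS.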

 



> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (hadoop 2.2.0-HA, 700+ DNs), we found the standby NN tailing 
> the editlog slowly while holding the fsnamesystem write lock, which blocked 
> the DNs' heartbeat/blockreport IPC requests. This led the active NN to remove 
> stale DNs that could not send heartbeats because they were blocked registering 
> with the standby NN (FIXED in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if a matching IPC retryCache entry is found, the 
> previous entry must be removed from the priorityQueue (O(N)). UpdateBlocksOp 
> does not need to record an rpcId in the editlog except for a client-requested 
> updatePipeline, but we found many 'UpdateBlocksOp' entries with repeated 
> rpcIds.
>  
>   
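The O(N) cost mentioned in the description follows from how
java.util.PriorityQueue implements removal of an arbitrary element:
remove(Object) does a linear scan of the backing heap array before sifting,
so every eviction of a matched retry-cache entry is linear in the cache size
while the namesystem write lock is held. A minimal illustration (plain JDK,
no Hadoop classes):

```java
import java.util.PriorityQueue;

public class PqRemoveCost {
    // Remove an arbitrary element by value. PriorityQueue.remove(Object)
    // scans the heap array linearly to locate it: O(N) per call.
    static boolean removeElement(PriorityQueue<Integer> pq, int v) {
        return pq.remove(Integer.valueOf(v));
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int i = 0; i < 100_000; i++) {
            pq.add(i);
        }
        // Near-worst case: the element sits at the tail of the heap array,
        // so the scan touches almost every slot before removing it.
        boolean removed = removeElement(pq, 99_999);
        System.out.println(removed + " " + pq.size()); // prints "true 99999"
    }
}
```

Doing this once per replayed op with a duplicate rpcId is what turns the
edit-log tailing into the RUNNABLE hot spot shown in the stack above.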





[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970689#comment-14970689
 ] 

Walter Su commented on HDFS-9293:
-

relates to HDFS-7609, HDFS-8611

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (hadoop 2.2.0-HA, 700+ DNs), we found the standby NN tailing 
> the editlog slowly while holding the fsnamesystem write lock, which blocked 
> the DNs' heartbeat/blockreport IPC requests. This led the active NN to remove 
> stale DNs that could not send heartbeats because they were blocked registering 
> with the standby NN (FIXED in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if a matching IPC retryCache entry is found, the 
> previous entry must be removed from the priorityQueue (O(N)). UpdateBlocksOp 
> does not need to record an rpcId in the editlog except for a client-requested 
> updatePipeline, but we found many 'UpdateBlocksOp' entries with repeated 
> rpcIds.
>  
>   





[jira] [Resolved] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

邓飞 resolved HDFS-9293.
--
   Resolution: Fixed
Fix Version/s: 2.7.1

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
> Fix For: 2.7.1
>
>
>   In our cluster (hadoop 2.2.0-HA, 700+ DNs), we found the standby NN tailing 
> the editlog slowly while holding the fsnamesystem write lock, which blocked 
> the DNs' heartbeat/blockreport IPC requests. This led the active NN to remove 
> stale DNs that could not send heartbeats because they were blocked registering 
> with the standby NN (FIXED in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if a matching IPC retryCache entry is found, the 
> previous entry must be removed from the priorityQueue (O(N)). UpdateBlocksOp 
> does not need to record an rpcId in the editlog except for a client-requested 
> updatePipeline, but we found many 'UpdateBlocksOp' entries with repeated 
> rpcIds.
>  
>   





[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Liangliang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liangliang Gu updated HDFS-9276:

Attachment: HDFS-9276.03.patch

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>   at 
> 

[jira] [Work started] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate

2015-10-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9293 started by 邓飞.

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 
> 'rpcId',which may cause standby NN too busy  to communicate 
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: 邓飞
>Assignee: 邓飞
>
>   In our cluster (hadoop 2.2.0-HA, 700+ DNs), we found the standby NN tailing 
> the editlog slowly while holding the fsnamesystem write lock, which blocked 
> the DNs' heartbeat/blockreport IPC requests. This led the active NN to remove 
> stale DNs that could not send heartbeats because they were blocked registering 
> with the standby NN (FIXED in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable 
> [0x7f0dd1d76000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>   at 
> org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>   at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>   - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> When applying an editLogOp, if a matching IPC retryCache entry is found, the 
> previous entry must be removed from the priorityQueue (O(N)). UpdateBlocksOp 
> does not need to record an rpcId in the editlog except for a client-requested 
> updatePipeline, but we found many 'UpdateBlocksOp' entries with repeated 
> rpcIds.
>  
>   





[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970736#comment-14970736
 ] 

Hadoop QA commented on HDFS-9276:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768266/HDFS-9276.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13150/console |


This message was automatically generated.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, debug1.PNG, 
> debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970768#comment-14970768
 ] 

Steve Loughran commented on HDFS-9276:
--

Given that the patch is all in hadoop-common, filing a JIRA & patch there will 
run the hadoop-common build & test, which is a bit less brittle than the HDFS 
one.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration to Name Node:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)

[jira] [Commented] (HDFS-8914) Documentation conflict regarding fail-over of Namenode

2015-10-23 Thread Ravindra Babu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970727#comment-14970727
 ] 

Ravindra Babu commented on HDFS-8914:
-

Lars Francke: Are you committing this change, since we have received a +1 from 
Hadoop QA?

> Documentation conflict regarding fail-over of Namenode
> --
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
>Priority: Trivial
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion in 
> the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971312#comment-14971312
 ] 

Arpit Agarwal edited comment on HDFS-9254 at 10/23/15 4:37 PM:
---

So yes it looks like at least the {{SaslRpcClient}} doesn't like principals 
without a host component.

{code}
192.168.56.80:8485: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Kerberos principal name does NOT have the 
expected hostname part: j...@example.com; Host Details : local host is: 
"cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485;
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
at 
org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899)
{code}

Whereas SecurityUtil handles them fine. We should be consistent. I'll file a 
separate bug to fix the {{SaslRpcClient}}, and any other components I run into, 
but also update the doc patch for now. Thanks for the catch.
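The usual way to keep a host part in the principal without a per-host keytab list is the `_HOST` placeholder, which Hadoop expands at runtime via `SecurityUtil.getServerPrincipal`. A minimal plain-Java sketch of that expansion (illustrative only, not the actual SecurityUtil code):

```java
public class PrincipalSketch {
    // Expand the _HOST placeholder the way Hadoop's
    // SecurityUtil.getServerPrincipal conceptually does.
    static String expand(String principal, String fqdn) {
        String[] parts = principal.split("[/@]");
        if (parts.length == 3 && "_HOST".equals(parts[1])) {
            return parts[0] + "/" + fqdn.toLowerCase() + "@" + parts[2];
        }
        // Already concrete, or has no host component at all -- the
        // two-part shape is what SaslRpcClient rejects above.
        return principal;
    }

    public static void main(String[] args) {
        System.out.println(expand("jn/_HOST@EXAMPLE.COM", "cm0.example.com"));
        System.out.println(expand("jn@EXAMPLE.COM", "cm0.example.com"));
    }
}
```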


was (Author: arpitagarwal):
So yes it looks like at least the Journal Node doesn't like principals without 
a host component.

{code}
192.168.56.80:8485: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Kerberos principal name does NOT have the 
expected hostname part: j...@example.com; Host Details : local host is: 
"cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485;
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
at 
org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899)
{code}

Whereas SecurityUtil handles them fine. We should be consistent. I'll file a 
separate bug to fix the JN, and any other components I run into, but also 
update the doc patch for now. Thanks for the catch.

> HDFS Secure Mode Documentation updates
> --
>
> Key: HDFS-9254
> URL: https://issues.apache.org/jira/browse/HDFS-9254
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9254.01.patch
>
>
> Some Kerberos configuration parameters are not documented well enough. 





[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Description: 
Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
blocks with the original file dir instead of the snapshot dir, and {{fsck 
-list-corruptfileblocks -includeSnapshots}} behaves the same.
This can be confusing because even when the original file is deleted, fsck will 
still show that deleted file as corrupted, although what's actually corrupted 
is the snapshot. 

As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.

  was:
For snapshot files, fsck shows corrupt blocks with the original file dir 
instead of the snapshot dir.
This can be confusing since even when the original file is deleted, a new fsck 
run will still show that file as corrupted although what's actually corrupted 
is the snapshot. 

This is true even when given the -includeSnapshots option.


> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behave the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.





[jira] [Updated] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9297:
--
Status: Patch Available  (was: Open)

> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-9272) Implement a unix-like cat utility

2015-10-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971367#comment-14971367
 ] 

Haohui Mai commented on HDFS-9272:
--

bq. Will do. Is there generally a preference in the HDFS community about taking 
a host and port as separate tokens vs taking a URI for these sorts of tests?

IMO either approach is reasonable. Please feel free to choose what is easier.

> Implement a unix-like cat utility
> -
>
> Key: HDFS-9272
> URL: https://issues.apache.org/jira/browse/HDFS-9272
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Minor
> Attachments: HDFS-9272.HDFS-8707.000.patch
>
>
> Implement the basic functionality of "cat" and have it build as a separate 
> executable.
> 2 Reasons for this:
> We don't have any real integration tests at the moment so something simple to 
> verify that the library actually works against a real cluster is useful.
> Eventually I'll make more utilities like stat, mkdir etc.  Once there are 
> enough of them it will be simple to make a C++ implementation of the hadoop 
> fs command line interface that doesn't take the latency hit of spinning up a 
> JVM.





[jira] [Commented] (HDFS-8914) Documentation conflict regarding fail-over of Namenode

2015-10-23 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971395#comment-14971395
 ] 

Lars Francke commented on HDFS-8914:


I'm not a Hadoop committer, we'll have to wait for one.

> Documentation conflict regarding fail-over of Namenode
> --
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
>Priority: Trivial
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion in 
> the reader's mind.





[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-7284:
--
Attachment: HDFS-7284.003.patch

[~yzhangal] Good idea! 
The output of your suggested change is:
{code}
2015-10-23 10:21:18,647 [IPC Server handler 7 on 51002] DEBUG BlockStateChange 
(BlockInfo.java:setGenerationStampAndVerifyReplicas(396)) - BLOCK* Removing 
stale replica 
ReplicaUC[[DISK]DS-b87b985d-6dc7-448e-9d45-dcd6c2c8ec37:NORMAL:127.0.0.1:51003|RBW]
 of blk_1073741826_1002
{code}

Attaching rev3 based on Yongjun's suggestion.

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}
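The richer message in the rev3 output above amounts to including the replica and the block's id and generation stamp in one log line. A plain-Java sketch of building that message (method and parameter names are illustrative, not the actual BlockInfo code):

```java
public class StaleReplicaLogSketch {
    // Build a log line naming both the replica location and the block,
    // mirroring the "Removing stale replica ... of blk_<id>_<genStamp>"
    // output shown in the comment above.
    static String staleReplicaMessage(String replica, long blockId,
                                      long genStamp) {
        return "BLOCK* Removing stale replica " + replica
            + " of blk_" + blockId + "_" + genStamp;
    }

    public static void main(String[] args) {
        System.out.println(staleReplicaMessage(
            "ReplicaUC[[DISK]DS-...:NORMAL:127.0.0.1:51003|RBW]",
            1073741826L, 1002L));
    }
}
```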





[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-23 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971302#comment-14971302
 ] 

Chang Li commented on HDFS-9289:


[~eclark], block on 10.210.31.38 should be marked as corrupt because it's from 
old pipeline right?

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
>
> We have seen a case of a corrupt block caused by completing a file after a 
> pipeline update, where the file was completed with the old block genStamp. This 
> caused the replicas on two datanodes in the updated pipeline to be viewed as 
> corrupt. Propose to check the genStamp when committing the block.
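The proposed check can be sketched in plain Java. This is an illustration of the idea, not the actual FSNamesystem/BlockInfo code: on complete, compare the genStamp the client reports against the one recorded during the pipeline update, and reject the commit on mismatch.

```java
public class CommitBlockSketch {
    final long recordedGenStamp;   // genStamp the NN recorded at pipeline update
    boolean committed;

    CommitBlockSketch(long genStamp) { this.recordedGenStamp = genStamp; }

    // Reject commits that carry a stale generation stamp instead of
    // silently completing the file and corrupting the new replicas.
    void commit(long reportedGenStamp) {
        if (reportedGenStamp != recordedGenStamp) {
            throw new IllegalStateException("commit has stale genStamp "
                + reportedGenStamp + ", expected " + recordedGenStamp);
        }
        committed = true;
    }

    public static void main(String[] args) {
        CommitBlockSketch b = new CommitBlockSketch(1002L); // bumped by update
        try {
            b.commit(1001L);       // client completes with the old genStamp
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        b.commit(1002L);           // matching genStamp succeeds
    }
}
```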





[jira] [Updated] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9297:
--
Attachment: HDFS-9297.001.patch

In this patch:
* Use {{corruptBlockOnDataNodesByDeletingBlockFile()}} to corrupt a block by 
removing all block files.
* Removed the test's own implementation of the same function.

> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path

2015-10-23 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-9295:
---
Attachment: HDFS-9295.002.patch

Fixed compiler warnings.

> Add a thorough test of the full KMS code path
> -
>
> Key: HDFS-9295
> URL: https://issues.apache.org/jira/browse/HDFS-9295
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-9295.001.patch, HDFS-9295.002.patch
>
>
> TestKMS does a good job of testing the ACLs directly, but they are tested out 
> of context. Additional tests are needed that test how the ACLs impact key 
> creation, EZ creation, file creation in an EZ, etc.





[jira] [Commented] (HDFS-9288) Add RapidXML to third-party

2015-10-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971362#comment-14971362
 ] 

Haohui Mai commented on HDFS-9288:
--

I think that requires us to fix pom.xml to exclude these files when checking 
ASF licenses. I think it's okay to separate it to another jira.

> Add RapidXML to third-party
> ---
>
> Key: HDFS-9288
> URL: https://issues.apache.org/jira/browse/HDFS-9288
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9288.HDFS-8707.001.patch
>
>
> Needed for Configuration class.





[jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers

2015-10-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971412#comment-14971412
 ] 

Zhe Zhang commented on HDFS-9079:
-

Thanks Nicholas for the comment.
bq. > ... => 2) Asks NN for new GS => 3) Gets new GS from NN => ...
bq. What is the difference between #2 and #3? Is it just a single RPC?
Yes it's a single RPC. I listed them as 2 steps because other events could 
happen between #2 and #3. E.g. while {{streamer_i}} is waiting for NN response 
{{streamer_j}} might start step #2.

bq. Do you mean that client may update GS without letting NN knowing it?
More details of the proposed protocol can be found [here | 
https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741972=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741972].
 When {{streamer_i}} encounters a DN failure, the coordinator will ask other 
streamers to update their DN to increment GS. After all healthy DNs acknowledge 
that they have bumped their local GSes, the coordinator sends an {{updatePipeline}} 
RPC to NN to update the NN's copy of GS. So there will never be "false stale" 
-- fresh replica being considered as stale. 

bq. How to save step #1?
Good catch, I meant saving steps 2~3.

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> 
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch, HDFS-9079.05.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we 
> shouldn't try to further recover anyway.
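The preallocation idea can be sketched in plain Java. Names and the reservation size are illustrative (NUM_PARITY_BLOCKS = 3 as in an RS-6-3 schema); the real change would live in {{FSN#createNewBlock}} and the striped DataStreamer coordinator.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the NN reserves a contiguous range of generation stamps per
// striped block group, so streamers can bump on DN failure without an
// extra NN round trip (saving steps 2~3 of the sequence above).
public class GenStampRangeSketch {
    static final int NUM_PARITY_BLOCKS = 3;

    final long firstGS;  // handed out by the NN at block-group creation
    final long lastGS;   // initial stamp plus NUM_PARITY_BLOCKS reserved bumps
    final AtomicLong currentGS;

    GenStampRangeSketch(long firstGS) {
        this.firstGS = firstGS;
        this.lastGS = firstGS + NUM_PARITY_BLOCKS;
        this.currentGS = new AtomicLong(firstGS);
    }

    // Each DN failure consumes one reserved stamp; once the range is
    // exhausted, more than NUM_PARITY_BLOCKS failures have occurred and
    // the write should not be recovered further.
    long bumpOnFailure() {
        long gs = currentGS.incrementAndGet();
        if (gs > lastGS) {
            throw new IllegalStateException("too many failures to recover");
        }
        return gs;
    }

    public static void main(String[] args) {
        GenStampRangeSketch g = new GenStampRangeSketch(1000L);
        System.out.println(g.bumpOnFailure()); // first failure, no NN RPC
    }
}
```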





[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-23 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971418#comment-14971418
 ] 

Jitendra Nath Pandey commented on HDFS-9184:


 I think any check at the client side can be followed up as a separate jira. It 
is not so critical, because rogue clients can circumvent a client side check 
anyway.

+1 for the latest patch. 
I also plan to commit it to branch-2, because this patch doesn't change the 
audit logs at all, unless explicitly enabled.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch, 
> HDFS-9184.008.patch, HDFS-9184.009.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper level job issues it. The upper level callers may be specific 
> Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode 
> (NN) is abused/spammed, the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g.Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places interconnected 
> like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which 
> introduces significant overhead. Moreover, passing additional information 
> (via annotations) other than span id from root of the tree to leaf is a 
> significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> are some related discussion on this topic. The final patch implemented the 
> tracking id as a part of delegation token. This protects the tracking 
> information from being changed or impersonated. However, kerberos 
> authenticated connections or insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all the scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat HDFS audit 
> log as a good place for after-the-fact root cause analysis. We propose to put 
> the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client 
> side the threadlocal object is passed to the NN as part of the RPC header 
> (optional), while on the server side the NN retrieves it from the header and 
> puts it in the {{Handler}}'s 
> threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the 
> caller context for each operation. In this way, the existing code is not 
> affected.
> It is still challenging to keep "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.
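The threadlocal mechanism described above can be sketched in plain Java. `CallerContext` here is a stand-in class, not the actual Hadoop one, and the audit-line format is abbreviated; the real implementation also serializes the context into the RPC header between client and NN.

```java
// Sketch: a per-thread caller context that the audit logger appends to
// each operation's audit line when present.
public class CallerContextSketch {
    static final class CallerContext {
        final String id;          // e.g. a Hive query id
        final byte[] signature;   // optional, validated offline, not by the NN
        CallerContext(String id, byte[] signature) {
            this.id = id;
            this.signature = signature;
        }
    }

    // One context per handler thread; setting it never affects other threads.
    static final ThreadLocal<CallerContext> CONTEXT = new ThreadLocal<>();

    static String auditLine(String user, String cmd) {
        CallerContext ctx = CONTEXT.get();
        return "allowed=true ugi=" + user + " cmd=" + cmd
            + (ctx == null ? "" : " callerContext=" + ctx.id);
    }

    public static void main(String[] args) {
        CONTEXT.set(new CallerContext("hive_query_42", null));
        System.out.println(auditLine("alice", "delete"));
    }
}
```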





[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Attachment: HDFS-9231.005.patch

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behave the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.





[jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-23 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971469#comment-14971469
 ] 

Xiao Chen commented on HDFS-9231:
-

Thanks a lot for the review [~yzhangal]!
{quote}
1. The description is not quite accurate per our discussion, suggest to modify. 
Especially the patch actually does change (and fix) the behavior when without 
-includeSnapshots.
{quote}
It was great to talk to you. I have updated the description, and modified the 
patch summary at the end of this comment.
{quote}
2. A possible optimization in FSDirSnapshotOp#getSnapshotFiles. It seems that 
the sf variable could be calculated in caller for once before the loop in the 
caller, and pass to this method.
{quote}
My apologies for the confusion; I added some comments in this method. But 
getting sf for each snapshottable dir is needed, since /d1 and /d2 have 
different snapshot lists.
{quote}
3. final INodesInPath iip = fsd.getINodesInPath4Write(snap, false); maybe 
substituted with call to getINodesInPath
{quote}
Good catch! I updated the code to call {{getINode}} which invokes 
{{getINodesInPath}}.
{quote}
4. The check if (!corruptFileBlocks.isEmpty()) in 
listCorruptFileBlocksWithSnapshot is not needed
{quote}
Good call. Fixed.
{quote}
5. Add comment in listCorruptFileBlocks() before the call 
namenode.getNamesystem().listCorruptFileBlocksWithSnapshot, to indicate that 
snapshottableDirs is only relevant when -includeSnapshots is specified.
{quote}
Added a link to {{FSNamesystem#listCorruptFileBlocksWithSnapshot}} which 
explains that parameter in javadoc.
{quote}
6. In listCorruptFileBlocksWithSnapshot, we can add
{code}
if (snapshottableDirs == null) {
  continue;
}
{code}
to avoid the call to getSnapshotFiles.
{quote}
I'm not sure this is necessary. On one hand, it definitely saves one method 
call. On the other hand, given all the existing loops and checks, I think the 
performance gain from saving one call would be trivial. Also, the nullity 
check of snapshottableDirs is already performed as the first step in 
{{getSnapshotFiles}}.

Attached patch 005 with the above modifications. Updated summary below:
- {{fsck -list-corruptfileblocks -includeSnapshots}} will also show the full 
snapshot dirs.
- {{fsck -list-corruptfileblocks}} without -includeSnapshots will not show 
corrupt blocks that only have snapshot files.
- The NameNode WebUI's way of showing corrupted files/blocks is unchanged.
- Added a sentence in the NN WebUI to hint the admin to run fsck with 
-includeSnapshots if there are snapshots present in the system.
- Some refactoring to reuse existing code in the new methods getSnapshottableDirs 
and listCorruptFileBlocksWithSnapshot.
- The reasoning for keeping changes minimal in the NN WebUI and in fsck without 
-includeSnapshots is that getting all possible snapshots may be slow.
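The listing behavior in the summary above can be illustrated with a small plain-Java sketch. The `.snapshot` path layout matches HDFS ({{<dir>/.snapshot/<name>/<file>}}), but the method names here are made up for the sketch, not the patch's actual helpers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: report corrupt files under their snapshot paths when
// -includeSnapshots is given; skip snapshot-only corrupt files otherwise.
public class SnapshotFsckSketch {
    // Path of a file as captured in snapshot `snapshot` of a
    // snapshottable dir `dir`.
    static String snapshotPath(String dir, String snapshot, String relFile) {
        return dir + "/.snapshot/" + snapshot + "/" + relFile;
    }

    // Paths fsck would list for a corrupt file that exists only in a
    // snapshot (the original was deleted).
    static List<String> corruptPaths(String dir, String snapshot,
                                     String relFile, boolean includeSnapshots) {
        List<String> out = new ArrayList<>();
        if (includeSnapshots) {
            out.add(snapshotPath(dir, snapshot, relFile));
        }
        // Without -includeSnapshots, a snapshot-only corrupt file is not
        // listed, instead of being misreported under the deleted original.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(corruptPaths("/d1", "s1", "file", true));
        System.out.println(corruptPaths("/d1", "s1", "file", false));
    }
}
```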

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behave the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.





[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Status: Patch Available  (was: Open)

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behave the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.





[jira] [Commented] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971312#comment-14971312
 ] 

Arpit Agarwal commented on HDFS-9254:
-

So yes, it looks like at least the JournalNode doesn't like principals without 
a host component.

{code}
192.168.56.80:8485: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Kerberos principal name does NOT have the 
expected hostname part: j...@example.com; Host Details : local host is: 
"cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485;
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
at 
org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899)
{code}

Whereas SecurityUtil handles them fine. We should be consistent. I'll file a 
separate bug to fix the JN, and any other components I run into, but also 
update the doc patch for now. Thanks for the catch.
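The difference between the two behaviors can be shown with a minimal, self-contained sketch. The parsing below is illustrative only (class and method names are hypothetical, not the real SecurityUtil or JournalNode code): a lenient parser treats a principal with no host component as valid, while a strict check like the JN's rejects it with the error quoted above.

```java
// Hypothetical sketch of principal host-component handling; not HDFS code.
public class PrincipalCheckSketch {
    // Splits "primary/host@REALM"; the host part may be absent ("primary@REALM").
    static String hostOf(String principal) {
        int slash = principal.indexOf('/');
        if (slash < 0) return null;               // no host component, still parseable
        int at = principal.indexOf('@', slash);
        return principal.substring(slash + 1, at < 0 ? principal.length() : at);
    }

    // Strict check resembling the JN failure: require a host part.
    static void requireHost(String principal) {
        if (hostOf(principal) == null) {
            throw new IllegalArgumentException(
                "Kerberos principal name does NOT have the expected hostname part: "
                + principal);
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("jn/cm0.example.com@EXAMPLE.COM")); // host present
        System.out.println(hostOf("jn@EXAMPLE.COM"));                 // no host -> null
        try {
            requireHost("jn@EXAMPLE.COM");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Being consistent would mean both sides taking the lenient path (or both the strict one), rather than one component rejecting what another accepts.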

> HDFS Secure Mode Documentation updates
> --
>
> Key: HDFS-9254
> URL: https://issues.apache.org/jira/browse/HDFS-9254
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9254.01.patch
>
>
> Some Kerberos configuration parameters are not documented well enough. 





[jira] [Updated] (HDFS-9077) webhdfs client requires SPNEGO to do renew

2015-10-23 Thread HeeSoo Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeeSoo Kim updated HDFS-9077:
-
Attachment: HDFS-9077.002.patch

> webhdfs client requires SPNEGO to do renew
> --
>
> Key: HDFS-9077
> URL: https://issues.apache.org/jira/browse/HDFS-9077
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
> Attachments: HDFS-9077.001.patch, HDFS-9077.002.patch, HDFS-9077.patch
>
>
> Simple bug.
> webhdfs (the file system) doesn't pass delegation= in its REST call to renew 
> the same token.  This forces a SPNEGO (or other auth) instead of just 
> renewing.





[jira] [Commented] (HDFS-9229) Expose size of NameNode directory as a metric

2015-10-23 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971375#comment-14971375
 ] 

Surendra Singh Lilhore commented on HDFS-9229:
--

Thanks [~wheat9] for the suggestion.
Can I move this metric into {{NameNodeStatusMXBean}}?

> Expose size of NameNode directory as a metric
> -
>
> Key: HDFS-9229
> URL: https://issues.apache.org/jira/browse/HDFS-9229
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, 
> HDFS-9229.003.patch
>
>
> Useful for admins in reserving / managing NN local file system space. Also 
> useful when transferring NN backups.





[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby

2015-10-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971388#comment-14971388
 ] 

Zhe Zhang commented on HDFS-8808:
-

Thanks ATM for reviewing again. I just triggered Jenkins since the last run was 
from a month ago.

> dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
> 
>
> Key: HDFS-8808
> URL: https://issues.apache.org/jira/browse/HDFS-8808
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Gautam Gopalakrishnan
>Assignee: Zhe Zhang
> Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, 
> HDFS-8808-02.patch, HDFS-8808-03.patch, HDFS-8808.04.patch
>
>
> The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the 
> speed with which the fsimage is copied between the namenodes during regular 
> use. However, as a side effect, this also limits transfers when the 
> {{-bootstrapStandby}} option is used. This option is often used during 
> upgrades and could potentially slow down the entire workflow. The request 
> here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth 
> setting.





[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Status: Open  (was: Patch Available)

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch
>
>
> For snapshot files, fsck shows corrupt blocks with the original file dir 
> instead of the snapshot dir.
> This can be confusing since even when the original file is deleted, a new 
> fsck run will still show that file as corrupted although what's actually 
> corrupted is the snapshot. 
> This is true even when given the -includeSnapshots option.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971441#comment-14971441
 ] 

Anu Engineer commented on HDFS-4015:


None of the test failures seem to be related to this patch.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.
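The proposed report boils down to a set difference: block IDs reported by datanodes minus block IDs referenced by the namespace. A toy sketch of that accounting (illustrative only; the real bookkeeping lives in BlockManager and is maintained incrementally, not recomputed):

```java
import java.util.*;

// Illustrative only: compute the "orphaned" block count as the set of
// reported block IDs not referenced by any file in the namespace.
public class OrphanCountSketch {
    public static void main(String[] args) {
        Set<Long> namespaceBlocks = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> reportedBlocks  = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));

        // Blocks both expected and reported (the count safemode already shows).
        long reportedAndExpected = reportedBlocks.stream()
                .filter(namespaceBlocks::contains).count();

        // Blocks reported but unknown to the namespace (the proposed inverse).
        Set<Long> orphaned = new HashSet<>(reportedBlocks);
        orphaned.removeAll(namespaceBlocks);

        System.out.println(reportedAndExpected + " of expected "
            + namespaceBlocks.size() + " blocks have been reported. Additionally, "
            + orphaned.size() + " blocks do not correspond to any file in the namespace.");
    }
}
```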





[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971297#comment-14971297
 ] 

Hadoop QA commented on HDFS-7284:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 25s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  50m 12s | Tests failed in hadoop-hdfs. |
| | |  96m 53s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
|   | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
|   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | org.apache.hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768308/HDFS-7284.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 35a303d |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13153/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13153/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13153/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13153/console |


This message was automatically generated.

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}





[jira] [Updated] (HDFS-8631) WebHDFS : Support list/setQuota

2015-10-23 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-8631:
-
Attachment: HDFS-8631-002.patch

Attached the updated patch.
Please review.

> WebHDFS : Support list/setQuota
> ---
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch
>
>
> Users are able to do quota management from the filesystem object. The same 
> operations can be allowed through the REST API.





[jira] [Updated] (HDFS-9289) check genStamp when complete file

2015-10-23 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated HDFS-9289:
---
Attachment: HDFS-9289.2.patch

The .2 patch includes a test. It also includes the encountered genStamp and the 
expected genStamp in the exception message.

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
>
> We have seen a case of a corrupt block caused by completing a file after a 
> pipelineUpdate, where the file completed with the old block genStamp. This 
> caused the replicas on two datanodes in the updated pipeline to be viewed as 
> corrupt. Propose to check the genStamp when committing the block.
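The proposed check can be sketched in a self-contained way. The class and method names below are stand-ins, not the actual BlockManager API; the point is to fail the commit fast and report both stamps, as the .2 patch's exception message does.

```java
// Hypothetical sketch: reject a commit when the client-reported generation
// stamp no longer matches the block's current one (e.g. after a pipeline update).
public class GenStampCheckSketch {
    static class Block {
        final long id;
        long genStamp;
        Block(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
    }

    // Mirrors the proposed check in commitBlock(): include both stamps on failure.
    static void commitBlock(Block stored, long reportedGenStamp) {
        if (stored.genStamp != reportedGenStamp) {
            throw new IllegalStateException(
                "Commit block " + stored.id + " failed: expected genStamp "
                + stored.genStamp + " but client reported " + reportedGenStamp);
        }
        // ... otherwise complete the file with the up-to-date block metadata ...
    }

    public static void main(String[] args) {
        // genStamp was bumped to 1002 by a pipeline update.
        Block b = new Block(1073741825L, 1002L);
        try {
            commitBlock(b, 1001L); // client still holds the pre-update stamp
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        commitBlock(b, 1002L);
        System.out.println("commit with matching genStamp succeeded");
    }
}
```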





[jira] [Created] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Tony Wu (JIRA)
Tony Wu created HDFS-9297:
-

 Summary: Update TestBlockMissingException to use 
corruptBlockOnDataNodesByDeletingBlockFile()
 Key: HDFS-9297
 URL: https://issues.apache.org/jira/browse/HDFS-9297
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS, test
Affects Versions: 2.7.1
Reporter: Tony Wu
Assignee: Tony Wu
Priority: Trivial


TestBlockMissingException uses its own function to corrupt a block by deleting 
all its block files. HDFS-7235 introduced a helper function 
{{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972300#comment-14972300
 ] 

Hudson commented on HDFS-9290:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2522 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2522/])
HDFS-9290. DFSClient#callAppend() is not backward compatible for (kihwal: rev 
b9e0417bdf2b9655dc4256bdb43683eca1ab46be)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java


> DFSClient#callAppend() is not backward compatible for slightly older NameNodes
> --
>
> Key: HDFS-9290
> URL: https://issues.apache.org/jira/browse/HDFS-9290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch
>
>
> HDFS-7210 combined 2 RPC calls used at file append into a single one. 
> Specifically, {{getFileInfo()}} is combined with {{append()}}. While backward 
> compatibility for older clients is handled by the new NameNode (via protobuf), 
> a newer client's {{append()}} call does not work with older NameNodes. One 
> will run into an exception like the following:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1560)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1670)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
> {code}
> The cause is that the new client code is expecting both the last block and 
> file info in the same RPC but the old NameNode only replied with the first. 
> The exception itself does not reflect this and one will have to look at the 
> HDFS source code to really understand what happened.
> We can have the client detect that it's talking to an old NameNode and send 
> an extra {{getFileInfo()}} RPC. Or we should improve the exception being 
> thrown to accurately reflect the cause of the failure.
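The first option above (detect and fall back) can be sketched as follows. All class and method names here are stand-ins for the real DFSClient/NameNode RPCs, and the "RPC" calls are simulated locally; the point is that a null file status from the combined call signals an older NameNode, so the client issues the separate {{getFileInfo()}} call instead of dereferencing null.

```java
// Hypothetical illustration of a backward-compatibility fallback; not the
// actual DFSClient API. An "old NameNode" returns only the last block.
public class AppendFallbackSketch {
    static class FileStatus { final boolean lazyPersist = false; }
    static class AppendResult {
        final String lastBlock;
        final FileStatus stat;
        AppendResult(String lastBlock, FileStatus stat) {
            this.lastBlock = lastBlock;
            this.stat = stat;
        }
    }

    // Simulates the combined append RPC; old NameNodes omit the file status.
    static AppendResult appendRpc(boolean oldNameNode) {
        return new AppendResult("blk_1", oldNameNode ? null : new FileStatus());
    }

    // Simulates the separate getFileInfo() RPC.
    static FileStatus getFileInfoRpc() { return new FileStatus(); }

    static FileStatus callAppend(boolean oldNameNode) {
        AppendResult r = appendRpc(oldNameNode);
        FileStatus stat = r.stat;
        if (stat == null) {
            // Older NameNode detected: issue the extra RPC instead of NPE-ing later.
            stat = getFileInfoRpc();
        }
        return stat;
    }

    public static void main(String[] args) {
        System.out.println("old NameNode handled: " + (callAppend(true) != null));
        System.out.println("new NameNode handled: " + (callAppend(false) != null));
    }
}
```

The extra RPC costs one round trip only in the legacy case; improving the exception message is the cheaper but less transparent alternative.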





[jira] [Commented] (HDFS-9301) HDFS clients can't construct HdfsConfiguration instances

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972301#comment-14972301
 ] 

Hudson commented on HDFS-9301:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2522 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2522/])
HDFS-9301. HDFS clients can't construct HdfsConfiguration instances. (wheat9: 
rev 15eb84b37e6c0195d59d3a29fbc5b7417bf022ff)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfigurationLoader.java


> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9301
> URL: https://issues.apache.org/jira/browse/HDFS-9301
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch, 
> HDFS-9241.005.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 





[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-4015:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to branch-2 for 2.8.0.

Thanks for contributing this improvement [~anu], and thanks for the reviews 
[~liuml07] and [~jnp].

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972342#comment-14972342
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.





[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972341#comment-14972341
 ] 

Hudson commented on HDFS-9290:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/])
HDFS-9290. DFSClient#callAppend() is not backward compatible for (kihwal: rev 
b9e0417bdf2b9655dc4256bdb43683eca1ab46be)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java


> DFSClient#callAppend() is not backward compatible for slightly older NameNodes
> --
>
> Key: HDFS-9290
> URL: https://issues.apache.org/jira/browse/HDFS-9290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch
>
>
> HDFS-7210 combined 2 RPC calls used at file append into a single one. 
> Specifically, {{getFileInfo()}} is combined with {{append()}}. While backward 
> compatibility for older clients is handled by the new NameNode (via protobuf), 
> a newer client's {{append()}} call does not work with older NameNodes. One 
> will run into an exception like the following:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1560)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1670)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
> {code}
> The cause is that the new client code is expecting both the last block and 
> file info in the same RPC but the old NameNode only replied with the first. 
> The exception itself does not reflect this and one will have to look at the 
> HDFS source code to really understand what happened.
> We can have the client detect that it's talking to an old NameNode and send 
> an extra {{getFileInfo()}} RPC. Or we should improve the exception being 
> thrown to accurately reflect the cause of the failure.





[jira] [Commented] (HDFS-9301) HDFS clients can't construct HdfsConfiguration instances

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972343#comment-14972343
 ] 

Hudson commented on HDFS-9301:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/])
HDFS-9301. HDFS clients can't construct HdfsConfiguration instances. (wheat9: 
rev 15eb84b37e6c0195d59d3a29fbc5b7417bf022ff)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfigurationLoader.java


> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9301
> URL: https://issues.apache.org/jira/browse/HDFS-9301
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch, 
> HDFS-9241.005.patch
>
>
> The changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it now lives only on the 
> server side. This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 
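The use case described above boils down to an idempotent registration step. A dependency-free sketch of what {{HdfsConfigurationLoader.init()}} guarantees: the resource file names are the real HDFS ones, but the registry below is a stand-in for {{Configuration.addDefaultResource()}}, not the actual Hadoop implementation.

```java
// Hypothetical sketch: guarantee hdfs-default.xml and hdfs-site.xml are
// registered before any config lookup, no matter how many callers run init().
// The registry is a stand-in for Configuration.addDefaultResource().
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HdfsConfLoadSketch {
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
        }
    }

    /** Mirrors the intent of HdfsConfigurationLoader.init(): idempotent registration. */
    static void init() {
        addDefaultResource("hdfs-default.xml");
        addDefaultResource("hdfs-site.xml");
    }

    static List<String> defaultResources() {
        return Collections.unmodifiableList(DEFAULT_RESOURCES);
    }

    public static void main(String[] args) {
        init();
        init(); // calling twice must not duplicate entries
        System.out.println(defaultResources());
    }
}
```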





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972344#comment-14972344
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972285#comment-14972285
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8699/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972385#comment-14972385
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 block has been reported 
> which does not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.
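The suggested report is straightforward to assemble. A minimal sketch of such a message builder, assuming the counts are already available from the block manager; the exact wording and class name are illustrative, not what the committed patch uses.

```java
// Hypothetical sketch of the safemode status message proposed above,
// with the orphaned-block count reported alongside the usual threshold line.
public class SafeModeReportSketch {
    static String report(long reported, long expected, long orphaned) {
        StringBuilder sb = new StringBuilder();
        sb.append(reported).append(" of expected ").append(expected)
          .append(" blocks have been reported.");
        if (orphaned > 0) {
            sb.append(" Additionally, ").append(orphaned)
              .append(orphaned == 1 ? " block has been reported which does not"
                                    : " blocks have been reported which do not")
              .append(" correspond to any file in the namespace.")
              .append(" Forcing exit of safemode will unrecoverably remove those data blocks.");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report(90, 100, 1));
    }
}
```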





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972386#comment-14972386
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972284#comment-14972284
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8699/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 block has been reported 
> which does not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972348#comment-14972348
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 block has been reported 
> which does not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972349#comment-14972349
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972363#comment-14972363
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 block has been reported 
> which does not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972365#comment-14972365
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972426#comment-14972426
 ] 

Hadoop QA commented on HDFS-7284:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 17s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 17s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 25s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 13s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  51m 17s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 36s | Tests passed in 
hadoop-hdfs-client. |
| | | 103m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.util.TestByteArrayManager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768378/HDFS-7284.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7781fe1 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13178/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13178/console |


This message was automatically generated.

> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> This tells me only that a replica was removed; it does not identify the block 
> or its generation timestamp. I need the block's id and timestamp in the log 
> file.
> It would be better to add more info, including the block id and timestamp, to 
> the code snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}
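One possible shape for the enriched log line is sketched below, assuming the block id and both generation stamps are in hand at the removal site. The message format and class name are assumptions for illustration, not the wording of the committed patch.

```java
// Hypothetical sketch of the enriched log message the issue asks for: include
// the block id and generation stamps so a stale-replica removal can be traced
// back to a specific block.
public class StaleReplicaLogSketch {
    static String staleReplicaMessage(long blockId, long replicaGenStamp,
                                      long expectedGenStamp, String location) {
        return "BLOCK* Removing stale replica of blk_" + blockId
            + " (replica generation stamp " + replicaGenStamp
            + ", expected " + expectedGenStamp + ") from location: " + location;
    }

    public static void main(String[] args) {
        System.out.println(staleReplicaMessage(1073741825L, 1001L, 1002L, "x.x.x.x:50010"));
    }
}
```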





[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-10-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8831:
-
Attachment: HDFS-8831.02.patch

> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files whinin the zone with trash feature enabled, you 
> will get error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> deleted files inside the same encryption zone. 
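The per-zone trash path can be sketched as plain path arithmetic. The layout below (a `.Trash/<user>/Current` subtree rooted at the zone) is an assumption for illustration of the idea, not necessarily the layout the patches settle on.

```java
// Hypothetical sketch: compute a trash destination that stays inside the
// file's own encryption zone, so the rename never crosses a zone boundary.
public class EncryptionZoneTrashSketch {
    static String trashPathInZone(String zoneRoot, String user, String filePath) {
        if (!filePath.startsWith(zoneRoot + "/")) {
            throw new IllegalArgumentException(filePath + " is not inside zone " + zoneRoot);
        }
        String relative = filePath.substring(zoneRoot.length()); // keeps leading '/'
        return zoneRoot + "/.Trash/" + user + "/Current" + relative;
    }

    public static void main(String[] args) {
        // e.g. /z1_1/startnn.sh deleted by user hdfs
        System.out.println(trashPathInZone("/z1_1", "hdfs", "/z1_1/startnn.sh"));
        // -> /z1_1/.Trash/hdfs/Current/startnn.sh
    }
}
```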





[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972372#comment-14972372
 ] 

Hudson commented on HDFS-9297:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/])
HDFS-9297. Update TestBlockMissingException to use (lei: rev 
5679e46b7f867f8f7f8195c86c37e3db7b23d7d7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Update TestBlockMissingException to use 
> corruptBlockOnDataNodesByDeletingBlockFile()
> 
>
> Key: HDFS-9297
> URL: https://issues.apache.org/jira/browse/HDFS-9297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9297.001.patch
>
>
> TestBlockMissingException uses its own function to corrupt a block by 
> deleting all its block files. HDFS-7235 introduced a helper function 
> {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same 
> thing. We can update this test to use the helper function.





[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972370#comment-14972370
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.
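As a simplified, self-contained sketch of the proposed accounting (hypothetical class and method names, not the actual BlockManager API), the safemode tip can be derived from two sets of block IDs: those the namespace expects and those datanodes have reported.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the proposed safemode accounting.
// Real HDFS tracks this inside BlockManager; this sketch only shows
// the set arithmetic behind the suggested report.
public class SafemodeReport {

  /** Blocks referenced by the namespace (expected). */
  private final Set<Long> namespaceBlocks = new HashSet<>();
  /** Blocks reported by datanodes so far. */
  private final Set<Long> reportedBlocks = new HashSet<>();

  public void addNamespaceBlock(long id) { namespaceBlocks.add(id); }
  public void addReportedBlock(long id)  { reportedBlocks.add(id); }

  /** Reported blocks that the namespace knows about. */
  public long safeBlockCount() {
    Set<Long> safe = new HashSet<>(reportedBlocks);
    safe.retainAll(namespaceBlocks);
    return safe.size();
  }

  /** Reported blocks with no corresponding file: the orphans. */
  public long orphanedBlockCount() {
    Set<Long> orphans = new HashSet<>(reportedBlocks);
    orphans.removeAll(namespaceBlocks);
    return orphans.size();
  }

  /** Safemode tip in the spirit of the message suggested above. */
  public String safemodeTip() {
    return safeBlockCount() + " of expected " + namespaceBlocks.size()
        + " blocks have been reported. Additionally, " + orphanedBlockCount()
        + " blocks have been reported which do not correspond to any file"
        + " in the namespace. Forcing exit of safemode will unrecoverably"
        + " remove those data blocks.";
  }
}
```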




[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972367#comment-14972367
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Committed to trunk. Keeping Jira open for the branch-2 commit.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>


[jira] [Commented] (HDFS-9279) Decomissioned capacity should not be considered for configured/used capacity

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972309#comment-14972309
 ] 

Hadoop QA commented on HDFS-9279:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 29s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  4s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 33s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  64m 26s | Tests failed in hadoop-hdfs. |
| | | 108m 34s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768433/HDFS-9279-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5679e46 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13176/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13176/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13176/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13176/console |


This message was automatically generated.

> Decomissioned capacity should not be considered for configured/used capacity
> 
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch
>
>
> Capacity of a decommissioned node is still counted in the configured and used 
> capacity metrics, which gives an incorrect perception of cluster usage.
> Once a node is decommissioned, its capacity should be treated the same as 
> that of a dead node.
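A minimal sketch of the intended aggregation (hypothetical names, not the actual HeartbeatManager code): configured and used capacity sum only over live, non-decommissioned nodes, just as dead nodes are skipped today.

```java
import java.util.List;

// Hypothetical, simplified capacity aggregation illustrating the fix:
// decommissioned nodes, like dead nodes, contribute nothing to the
// configured/used capacity totals.
public class CapacityStats {

  /** Minimal stand-in for a datanode descriptor. */
  public static final class Node {
    final long capacity;       // configured capacity in bytes
    final long used;           // used capacity in bytes
    final boolean alive;
    final boolean decommissioned;

    public Node(long capacity, long used, boolean alive, boolean decommissioned) {
      this.capacity = capacity;
      this.used = used;
      this.alive = alive;
      this.decommissioned = decommissioned;
    }
  }

  public long configuredCapacity;
  public long usedCapacity;

  /** Aggregate capacity, skipping dead and decommissioned nodes alike. */
  public static CapacityStats aggregate(List<Node> nodes) {
    CapacityStats s = new CapacityStats();
    for (Node n : nodes) {
      if (!n.alive || n.decommissioned) {
        continue; // excluded from cluster capacity, same as a dead node
      }
      s.configuredCapacity += n.capacity;
      s.usedCapacity += n.used;
    }
    return s;
  }
}
```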

