[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-06-15 Thread Wan Chang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-15175:
-
Attachment: HDFS-15175-trunk.1.patch
Labels: NameNode  (was: )
Status: Patch Available  (was: Open)

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-----, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> <RECORD>
>   <OPCODE>OP_REASSIGN_LEASE</OPCODE>
>   <DATA>
>     <TXID>32625021150</TXID>
>     <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
>     <PATH>..</PATH>
>     <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625023743</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816135883</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_TRUNCATE</OPCODE>
>   <DATA>
>     <TXID>32625024049</TXID>
>     <SRC>..</SRC>
>     <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
>     <CLIENTMACHINE>..</CLIENTMACHINE>
>     <NEWLENGTH>185818644</NEWLENGTH>
>     <TIMESTAMP>1581816136336</TIMESTAMP>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818648</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625024993</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816138774</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp; after the truncate, 
> the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp were synced 
> to the JournalNode in the same batch, and the block referenced by the two 
> CloseOps is the same instance, so the first CloseOp was serialized with the 
> wrong block size. When the SNN replays the rolled editlog, TruncateOp does 
> not put the file back into the UnderConstruction state, so when the second 
> CloseOp is applied the file is not under construction and the SNN crashes.
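To spell out the sequence on the active NN (a simplified, hypothetical timeline; the step names are illustrative, not the exact FSNamesystem calls):

{code:java}
// 1. close(file)    - a CloseOp is queued whose blocks[] still aliases the
//                     INode's live Block instances (numBytes = 185818648).
// 2. truncate(file) - shrinks that SAME Block instance in place
//                     (numBytes = 185818644) and queues a TruncateOp.
// 3. logSync()      - the whole batch is serialized; the CloseOp queued in
//                     step 1 now writes 185818644 instead of the 185818648
//                     it held when it was created.
{code}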






[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-06-12 Thread Wan Chang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133970#comment-17133970
 ] 

Wan Chang commented on HDFS-15175:
--

Seems that I can't attach a file in a comment.

Here is the diff content:
{code:java}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
index 963628f9ac4..2d5ed13b6f9 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
@@ -412,6 +412,17 @@ private static void write(List<AclEntry> aclEntries, DataOutputStream out)
     return PBHelperClient.convertXAttrs(proto.getXAttrsList());
   }
 
+  private static Block[] deepCopy(Block[] blocks) {
+    if (blocks == null || blocks.length == 0) {
+      return blocks;
+    }
+    Block[] copy = new Block[blocks.length];
+    for (int i = 0; i < blocks.length; ++i) {
+      copy[i] = blocks[i] == null ? null : new Block(blocks[i]);
+    }
+    return copy;
+  }
+
   @SuppressWarnings("unchecked")
   static abstract class AddCloseOp
       extends FSEditLogOp
@@ -500,7 +511,7 @@ public String getPath() {
         throw new RuntimeException("Can't have more than " + MAX_BLOCKS +
             " in an AddCloseOp.");
       }
-      this.blocks = blocks;
+      this.blocks = FSEditLogOp.deepCopy(blocks);
       return (T)this;
     }
 
@@ -978,7 +989,7 @@ public String getPath() {
     }
 
     AddBlockOp setPenultimateBlock(Block pBlock) {
-      this.penultimateBlock = pBlock;
+      this.penultimateBlock = pBlock == null ? null : new Block(pBlock);
       return this;
     }
 
@@ -987,7 +998,7 @@ Block getPenultimateBlock() {
     }
 
     AddBlockOp setLastBlock(Block lastBlock) {
-      this.lastBlock = lastBlock;
+      this.lastBlock = lastBlock == null ? null : new Block(lastBlock);
       return this;
     }
 
@@ -1090,7 +1101,7 @@ public String getPath() {
     }
 
     UpdateBlocksOp setBlocks(Block[] blocks) {
-      this.blocks = blocks;
+      this.blocks = FSEditLogOp.deepCopy(blocks);
       return this;
     }
 
@@ -2881,7 +2892,8 @@ TruncateOp setTimestamp(long timestamp) {
     }
 
     TruncateOp setTruncateBlock(Block truncateBlock) {
-      this.truncateBlock = truncateBlock;
+      this.truncateBlock = truncateBlock == null ?
+          null : new Block(truncateBlock);
       return this;
     }
{code}
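To make the aliasing concrete, here is a small hypothetical demo (not part of the patch) against org.apache.hadoop.hdfs.protocol.Block; Block(Block) is the same copy constructor the patch relies on, and the sizes are taken from the editlog above:

{code:java}
import org.apache.hadoop.hdfs.protocol.Block;

public class SharedBlockDemo {
  public static void main(String[] args) {
    Block shared = new Block(5568434562L, 185818648L, 4495417845L);
    // What deepCopy does per element: snapshot the block's current state.
    Block snapshot = new Block(shared);

    // A later truncate mutates the shared instance in place.
    shared.setNumBytes(185818644L);

    System.out.println(shared.getNumBytes());   // 185818644 (mutated)
    System.out.println(snapshot.getNumBytes()); // 185818648 (preserved)
  }
}
{code}

Copying in the op setters, rather than in FSEditLogAsync.logEdit as first suggested in an earlier comment, should cover the sync and async editlog paths alike.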


[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-06-12 Thread Wan Chang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133963#comment-17133963
 ] 

Wan Chang commented on HDFS-15175:
--

[~haiyang Hu], sorry for the late response.

You may need the OEV tool to fix the corrupted editlog first.

I attached the patch file. Hope that helps.
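For reference, the OEV round trip looks roughly like this (file names are placeholders):

{code}
# binary editlog -> XML
hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml -p xml
# hand-edit edits.xml to drop/repair the bad record, then XML -> binary
hdfs oev -i edits.xml -o edits.repaired -p binary
{code}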




[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-05-13 Thread Wan Chang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106439#comment-17106439
 ] 

Wan Chang commented on HDFS-15175:
--

Hi [~caiyicong],

I dug into this for a while but could not find a convenient way to reproduce it. I just fixed the issue as mentioned above.




[jira] [Comment Edited] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-04-26 Thread Wan Chang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092532#comment-17092532
 ] 

Wan Chang edited comment on HDFS-15175 at 4/26/20, 6:27 AM:


We encountered this bug in our test environment.

Seems that it can be fixed by a deep copy of the blocks while generating the FSEditLogOp?


was (Author: wanchang):
We encountered this bug in our test environment.

Seems that it can be fixed by a deep copy of the blocks within FSEditLogAsync.logEdit(FSEditLogOp)?




[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-04-26 Thread Wan Chang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092532#comment-17092532
 ] 

Wan Chang commented on HDFS-15175:
--

We encountered this bug in our test environment.

Seems that it can be fixed by a deep copy of the blocks within FSEditLogAsync.logEdit(FSEditLogOp)?




[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-18 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-18 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Status: Patch Available  (was: Open)

submit patch



[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-18 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Attachment: HDFS-10265-002.patch



[jira] [Commented] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-16 Thread Wan Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244503#comment-15244503
 ] 

Wan Chang commented on HDFS-10265:
--

Sorry for the late reply. Thanks for your suggestions, I will update the patch soon.



[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-06 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Description: 
I use the OEV tool to convert an editlog to an XML file, then convert the XML file back to a binary editlog file (so that a lower-version NameNode can load edits generated by a higher-version NameNode). But when an OP_UPDATE_BLOCKS record has no BLOCK tag, the OEV tool doesn't handle the case and exits with InvalidXmlException.

Here is the stack:
{code}
fromXml error decoding opcode null
{{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"},
 {"-2"}, {},
{"3875711"}}
Encountered exception. Exiting: no entry found for BLOCK
org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for 
BLOCK
at 
org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942)
...
{code}

Here is part of the xml file:
{code}
<RECORD>
  <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
  <DATA>
    <TXID>3875711</TXID>
    <PATH>/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5</PATH>
    <RPC_CLIENTID></RPC_CLIENTID>
    <RPC_CALLID>-2</RPC_CALLID>
  </DATA>
</RECORD>
{code}

I tracked the NN's log and found these operations:

0. The file /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 is very small and contains only one block.
1. The client asks the NN to add a block to the file.
2. The client fails to write to the DN and asks the NameNode to abandon the block.
3. The NN removes the block and writes an OP_UPDATE_BLOCKS to the editlog.

Finally the NN generated an OP_UPDATE_BLOCKS record with no BLOCK tag.

In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above.



[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-06 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Status: Patch Available  (was: Open)

submit patch



[jira] [Updated] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-06 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-10265:
-
Attachment: HDFS-10265-001.patch



[jira] [Commented] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-06 Thread Wan Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227787#comment-15227787
 ] 

Wan Chang commented on HDFS-10265:
--

FSEditLogOp$UpdateBlocksOp.fromXml:
{code}
void fromXml(Stanza st) throws InvalidXmlException {
  this.path = st.getValue("PATH");
  List<Stanza> blocks = st.getChildren("BLOCK");
  this.blocks = new Block[blocks.size()];
  for (int i = 0; i < blocks.size(); i++) {
    this.blocks[i] = FSEditLogOp.blockFromXml(blocks.get(i));
  }
  readRpcIdsFromXml(st);
}
{code}

XMLUtils$Stanza.getChildren:
{code}
public List<Stanza> getChildren(String name) throws InvalidXmlException {
  LinkedList<Stanza> children = subtrees.get(name);
  if (children == null) {
    throw new InvalidXmlException("no entry found for " + name);
  }
  return children;
}
{code}

When the OEV tool encounters an OP_UPDATE_BLOCKS operation with no BLOCK tag, it throws this exception and exits.
Both 2.4.1 and 2.7.1 have the same problem.
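A minimal sketch of one possible fix, assuming the existing XMLUtils$Stanza.hasChildren helper: treat a missing BLOCK entry as an empty block list instead of invalid XML.

{code:java}
void fromXml(Stanza st) throws InvalidXmlException {
  this.path = st.getValue("PATH");
  // An OP_UPDATE_BLOCKS record may carry no BLOCK tag at all (e.g. when
  // the file's only block was abandoned), so don't treat the missing
  // entry as invalid XML.
  List<Stanza> blocks = st.hasChildren("BLOCK")
      ? st.getChildren("BLOCK") : new ArrayList<Stanza>();
  this.blocks = new Block[blocks.size()];
  for (int i = 0; i < blocks.size(); i++) {
    this.blocks[i] = FSEditLogOp.blockFromXml(blocks.get(i));
  }
  readRpcIdsFromXml(st);
}
{code}

(java.util.ArrayList would need to be imported where it isn't already.)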



[jira] [Created] (HDFS-10265) OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS has no BLOCK tag

2016-04-05 Thread Wan Chang (JIRA)
Wan Chang created HDFS-10265:


 Summary: OEV tool fails to read edit xml file if OP_UPDATE_BLOCKS 
has no BLOCK tag
 Key: HDFS-10265
 URL: https://issues.apache.org/jira/browse/HDFS-10265
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.7.1, 2.4.1
Reporter: Wan Chang
Assignee: Wan Chang
Priority: Minor


I use the OEV tool to convert an editlog to an XML file, then convert the XML file back to a binary editlog file (so that a lower-version NameNode can load edits generated by a higher-version NameNode). But when an OP_UPDATE_BLOCKS record has no BLOCK tag, the OEV tool doesn't handle the case and exits with InvalidXmlException.

Here is the stack:
{code}
fromXml error decoding opcode null
{{"/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5"},
 {"-2"}, {},
{"3875711"}}
Encountered exception. Exiting: no entry found for BLOCK
org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for 
BLOCK
at 
org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:908)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:3942)
...
{code}

Here is part of the xml file:
{code}
<RECORD>
  <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
  <DATA>
    <TXID>3875711</TXID>
    <PATH>/tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5</PATH>
    <RPC_CLIENTID></RPC_CLIENTID>
    <RPC_CALLID>-2</RPC_CALLID>
  </DATA>
</RECORD>
{code}

I tracked the NN's log and found these operations:

1. The client asks the NN to add a block to the file.
2. The client fails to write to the DN and asks the NameNode to abandon the block.
3. The NN removes the block and writes an OP_UPDATE_BLOCKS to the editlog.

The file /tmp/100M3/slive/data/subDir_13/subDir_7/subDir_15/subDir_11/subFile_5 contains only one block, so the NN generated an OP_UPDATE_BLOCKS record with no BLOCK tag.

In FSEditLogOp$UpdateBlocksOp.fromXml, we need to handle the case above.


