Long time fail over when using QJM
Hi all,

I have been testing QJM HA and it has always worked well, but yesterday I hit a very long failover with QJM. The test is based on CDH4.3.0. The attachment contains the standby namenode's and the journalnodes' logs. The network cable on the active namenode (also a datanode) was pulled out at about 07:24. In the standby namenode log I found lines like this:

2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 Total time for transactions(ms): 3 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46

The information itself looks normal. The problem is that there is no log output at all in the roughly 12 minutes between these two lines. No long GC occurred, so it seems the code was blocked somewhere. Unfortunately, I forgot to capture the jstack output. Hoping for your response.

Best regards,
Mickey
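Silent windows like the 12-minute gap above can be found mechanically rather than by eyeballing the log. A minimal sketch (assuming the default Hadoop log4j timestamp layout shown in the lines above; LogGap is a hypothetical helper, not part of Hadoop):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout used by the default Hadoop log4j pattern,
    // e.g. "2013-08-28 07:24:51,122" (first 23 characters of each line).
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the elapsed time between the timestamps of two log lines. */
    public static Duration gap(String earlierLine, String laterLine) {
        LocalDateTime t1 = LocalDateTime.parse(earlierLine.substring(0, 23), FMT);
        LocalDateTime t2 = LocalDateTime.parse(laterLine.substring(0, 23), FMT);
        return Duration.between(t1, t2);
    }

    public static void main(String[] args) {
        String a = "2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: ...";
        String b = "2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: ...";
        // Whole minutes between the two FSEditLog lines from the report.
        System.out.println("gap = " + gap(a, b).toMinutes() + " minutes"); // prints "gap = 11 minutes"
    }
}
```

Scanning consecutive lines of the full standby log with something like this would pinpoint exactly which component went quiet first.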
Build failed in Jenkins: Hadoop-Hdfs-0.23-Build #714
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/714/changes

Changes:

[tgraves] YARN-1101. Active nodes can be decremented below 0 (Robert Parker via tgraves)

------------------------------------------
[...truncated 7673 lines...]
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[270,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[281,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[10533,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[10544,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[8357,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[8368,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[12641,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[12652,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[9741,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[9752,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[1781,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[1792,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5338,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5349,30] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[6290,37] cannot find symbol
[ERROR] symbol : class Parser
[ERROR] location: package com.google.protobuf
[ERROR]
Hadoop-Hdfs-0.23-Build - Build # 714 - Still Failing
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/714/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 7866 lines...]
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3313,27] cannot find symbol
[ERROR] symbol : method setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3319,8] cannot find symbol
[ERROR] symbol : method makeExtensionsImmutable()
[ERROR] location: class org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3330,10] cannot find symbol
[ERROR] symbol : method ensureFieldAccessorsInitialized(java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto>,java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto.Builder>)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3335,31] cannot find symbol
[ERROR] symbol : class AbstractParser
[ERROR] location: package com.google.protobuf
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3344,4] method does not override or implement a method from a supertype
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4098,12] cannot find symbol
[ERROR] symbol : method ensureFieldAccessorsInitialized(java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto>,java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto.Builder>)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4371,104] cannot find symbol
[ERROR] symbol : method getUnfinishedMessage()
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5264,8] getUnknownFields() in org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto cannot override getUnknownFields() in com.google.protobuf.GeneratedMessage; overridden method is final
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5284,19] cannot find symbol
[ERROR] symbol : method parseUnknownField(com.google.protobuf.CodedInputStream,com.google.protobuf.UnknownFieldSet.Builder,com.google.protobuf.ExtensionRegistryLite,int)
[ERROR] location: class org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5314,15] cannot find symbol
[ERROR] symbol : method setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5317,27] cannot find symbol
[ERROR] symbol : method setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR]
Re: Long time fail over when using QJM
If you're seeing those log messages, the SBN was already active at that time. It only logs that message when it successfully writes transactions, so the failover must have already completed before the logs you're looking at.

-Todd

On Thu, Aug 29, 2013 at 1:18 AM, Mickey huanfeng...@gmail.com wrote:

> Hi all, I have been testing QJM HA and it has always worked well, but
> yesterday I hit a very long failover with QJM. The test is based on
> CDH4.3.0. The network cable on the active namenode (also a datanode) was
> pulled out at about 07:24, and there is no log output on the standby
> namenode in the roughly 12 minutes between 07:24:51 and 07:36:14. [...]

--
Todd Lipcon
Software Engineer, Cloudera
[jira] [Resolved] (HDFS-5142) Namenode crashes with NPE in ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee resolved HDFS-5142.
------------------------------
Resolution: Duplicate

Ahhh. I thought it looked familiar. I had even commented on HDFS-4482.

Namenode crashes with NPE in ReplicationMonitor
-----------------------------------------------
Key: HDFS-5142
URL: https://issues.apache.org/jira/browse/HDFS-5142
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: Kihwal Lee
Priority: Critical

When ReplicationMonitor creates and adds a replication work item for a block, its INodeFile (0.23) or BlockCollection (2.x) is recorded. This is done under the FSN write lock, but the actual chooseTarget() call is made outside the lock. When chooseTarget() is called, FSDirectory#getFullPathName() ends up being called. If the INode was unlinked from its parent after ReplicationMonitor released the lock (e.g. by a delete), this call generates an NPE and crashes the namenode. The path name is actually unused in the existing block placement policy modules, but private implementations might use it. It would be nice if we could avoid calling getFullPathName() at all here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
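The race described in the report can be sketched in miniature. This is a hypothetical simplification, not the actual HDFS code: the INode class and fullPathName() below only mimic the failure mode of capturing an inode under the lock and resolving its path after a concurrent delete has unlinked it.

```java
public class ReplicationRaceSketch {
    static class INode {
        volatile INode parent;   // nulled when a delete unlinks this inode
        final String name;
        INode(String name, INode parent) { this.name = name; this.parent = parent; }
    }

    // Rough analogue of FSDirectory#getFullPathName(): fill a fixed-depth
    // chain by walking parent links, then read the names. If the inode was
    // unlinked concurrently, the walk runs out of parents early, leaves null
    // entries in the chain, and the name lookup throws NPE.
    static String fullPathName(INode inode, int depth) {
        INode[] chain = new INode[depth];
        INode n = inode;
        for (int i = depth - 1; i >= 0 && n != null; i--) {
            chain[i] = n;
            n = n.parent;
        }
        StringBuilder sb = new StringBuilder();
        for (INode c : chain) {
            sb.append('/').append(c.name);   // NullPointerException if c == null
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        INode dir = new INode("dir", null);
        INode file = new INode("file", dir);
        System.out.println(fullPathName(file, 2));  // prints "/dir/file" while still linked

        file.parent = null;  // a concurrent delete unlinks the file
        try {
            fullPathName(file, 2);
        } catch (NullPointerException e) {
            System.out.println("NPE after unlink, as described in the report");
        }
    }
}
```

Since the path is unused by the default block placement policy, the suggestion in the report (skip the path computation entirely, or re-validate the block collection under the lock before resolving it) avoids this window.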
[jira] [Created] (HDFS-5144) Document time unit to NameNodeMetrics.java
Akira AJISAKA created HDFS-5144:
---------------------------------
Summary: Document time unit to NameNodeMetrics.java
Key: HDFS-5144
URL: https://issues.apache.org/jira/browse/HDFS-5144
Project: Hadoop HDFS
Issue Type: Improvement
Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Priority: Minor

In o.a.h.hdfs.server.namenode.metrics.NameNodeMetrics.java, metrics are declared as follows:
{code}
@Metric("Duration in SafeMode at startup") MutableGaugeInt safeModeTime;
@Metric("Time loading FS Image at startup") MutableGaugeInt fsImageLoadTime;
{code}
Since some users may be unsure which unit (sec or msec) is correct, the unit should be documented.
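The fix proposed above is just a description change. A self-contained sketch of what it might look like, with two caveats: the @Metric annotation below is a stand-in defined locally (the real one lives in hadoop-common, as does MutableGaugeInt), and "msec" is an assumed unit that would need to be verified against the NameNode code before patching:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class MetricUnitSketch {
    // Stand-in for org.apache.hadoop.metrics2.annotation.Metric, so this
    // sketch compiles without the hadoop-common dependency.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Metric { String value(); }

    // The unit is now explicit in the description; "msec" is an assumption
    // here, to be confirmed against how the NameNode records these values.
    @Metric("Duration in SafeMode at startup in msec")
    int safeModeTime;

    @Metric("Time loading FS Image at startup in msec")
    int fsImageLoadTime;

    public static void main(String[] args) throws Exception {
        // Reading the description back via reflection shows the documented unit.
        Metric m = MetricUnitSketch.class.getDeclaredField("safeModeTime")
                .getAnnotation(Metric.class);
        System.out.println(m.value());
    }
}
```

Since metrics descriptions surface directly in JMX and monitoring dashboards, putting the unit in the description string is enough to resolve the ambiguity for users.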
Re: Long time fail over when using QJM
2013/8/30 Todd Lipcon t...@cloudera.com

> If you're seeing those log messages, the SBN was already active at that
> time. It only logs that message when successfully writing transactions.
> So, the failover must have already completed before the logs you're
> looking at.
>
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera