Long time fail over when using QJM

2013-08-29 Thread Mickey
Hi, all
I have been testing QJM HA and it usually works well, but yesterday I hit a
very long failover with QJM. The test is based on CDH4.3.0.
Attached are the logs of the standby namenode and the journalnode.
The network cable on the active namenode (also a datanode) was pulled out at
about 07:24. In the standby namenode log I found lines like this:
2013-08-28 07:24:51,122 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
Total time for transactions(ms): 1Number of transactions batched in Syncs:
0 Number of syncs: 0 SyncTimes(ms): 0 41 42
2013-08-28 07:36:14,028 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
32 Total time for transactions(ms): 3Number of transactions batched in
Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46

The entries themselves look normal. The problem is that there is no logging
at all for 12 minutes between the two lines, and no long GC pause occurred.
It seems the code was blocked somewhere. Unfortunately, I forgot to capture
the jstack output.

Looking forward to your response.

Best regards,
Mickey
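[Editor's note: for future reproductions, a thread dump equivalent to `jstack` output can also be captured from inside the JVM via the standard ThreadMXBean API. A minimal, Hadoop-independent sketch; the class name here is invented for illustration:]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Dumps the stacks of all live threads, similar to what `jstack <pid>` prints. */
public class ThreadDumper {
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // lockedMonitors/lockedSynchronizers = true: include lock-owner info,
        // which is what you need to see where a thread is blocked.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```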


Build failed in Jenkins: Hadoop-Hdfs-0.23-Build #714

2013-08-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/714/changes

Changes:

[tgraves] YARN-1101. Active nodes can be decremented below 0 (Robert Parker via 
tgraves)

--
[...truncated 7673 lines...]
[ERROR] location: package com.google.protobuf
[ERROR] 
https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[270,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[281,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[...the same "cannot find symbol: class Parser" error repeated at many more 
locations in DataTransferProtos.java...]

Hadoop-Hdfs-0.23-Build - Build # 714 - Still Failing

2013-08-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/714/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 7866 lines...]
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3313,27]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3319,8]
 cannot find symbol
[ERROR] symbol  : method makeExtensionsImmutable()
[ERROR] location: class 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3330,10]
 cannot find symbol
[ERROR] symbol  : method 
ensureFieldAccessorsInitialized(java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto>,java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto.Builder>)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3335,31]
 cannot find symbol
[ERROR] symbol  : class AbstractParser
[ERROR] location: package com.google.protobuf
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3344,4]
 method does not override or implement a method from a supertype
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4098,12]
 cannot find symbol
[ERROR] symbol  : method 
ensureFieldAccessorsInitialized(java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto>,java.lang.Class<org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto.Builder>)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4371,104]
 cannot find symbol
[ERROR] symbol  : method getUnfinishedMessage()
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5264,8]
 getUnknownFields() in 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto 
cannot override getUnknownFields() in com.google.protobuf.GeneratedMessage; 
overridden method is final
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5284,19]
 cannot find symbol
[ERROR] symbol  : method 
parseUnknownField(com.google.protobuf.CodedInputStream,com.google.protobuf.UnknownFieldSet.Builder,com.google.protobuf.ExtensionRegistryLite,int)
[ERROR] location: class 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5314,15]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5317,27]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 

Re: Long time fail over when using QJM

2013-08-29 Thread Todd Lipcon
If you're seeing those log messages, the SBN was already active at that
time. It only logs that message when successfully writing transactions. So,
the failover must have already completed before the logs you're looking at.

-Todd

On Thu, Aug 29, 2013 at 1:18 AM, Mickey huanfeng...@gmail.com wrote:

 Hi, all
 I have been testing QJM HA and it usually works well, but yesterday I hit a
 very long failover with QJM. The test is based on CDH4.3.0.
 Attached are the logs of the standby namenode and the journalnode.
 The network cable on the active namenode (also a datanode) was pulled out at
 about 07:24. In the standby namenode log I found lines like this:
 2013-08-28 07:24:51,122 INFO
 org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
 Total time for transactions(ms): 1Number of transactions batched in Syncs:
 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
 2013-08-28 07:36:14,028 INFO
 org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
 32 Total time for transactions(ms): 3Number of transactions batched in
 Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46

 The entries themselves look normal. The problem is that there is no logging
 at all for 12 minutes between the two lines, and no long GC pause occurred.
 It seems the code was blocked somewhere. Unfortunately, I forgot to capture
 the jstack output.

 Looking forward to your response.

 Best regards,
 Mickey




-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-5142) Namenode crashes with NPE in ReplicationMonitor

2013-08-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-5142.
--

Resolution: Duplicate

Ahhh, I thought it looked familiar. I even commented on HDFS-4482.

 Namenode crashes with NPE in ReplicationMonitor
 ---

 Key: HDFS-5142
 URL: https://issues.apache.org/jira/browse/HDFS-5142
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: Kihwal Lee
Priority: Critical

 When the ReplicationMonitor creates and queues replication work for a block, its 
 INodeFile (0.23) or BlockCollection (2.x) is recorded. This is done under the 
 FSN write lock, but the actual chooseTarget() call is made outside the lock. 
 When chooseTarget() is called, FSDirectory#getFullPathName() ends up getting 
 called. If the INode was unlinked from its parent after the ReplicationMonitor 
 released the lock (e.g. by a delete), this call generates an NPE and crashes the 
 namenode. 
 The path name is actually unused in the existing block placement policy modules, 
 but private implementations might use it. It would be nice if we could avoid 
 calling getFullPathName() here at all.
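
[Editor's note: the failure mode described above is a classic check-then-act race: a reference captured under the lock is dereferenced after the lock is released, by which time the node may have been unlinked. A simplified, self-contained illustration; all names are invented and this is not the actual HDFS code:]

```java
/**
 * Simplified model of the HDFS-5142 race: a node reference recorded under the
 * (simulated) namesystem write lock is dereferenced after the lock is released,
 * by which time a concurrent delete has unlinked it from its parent.
 */
public class UnlinkRace {
    public static final Node ROOT = new Node(null, "");

    public static class Node {
        public Node parent;        // set to null when the node is unlinked
        public final String name;
        public Node(Node parent, String name) {
            this.parent = parent;
            this.name = name;
        }
    }

    /** Rebuilds the full path by walking parent pointers, like getFullPathName(). */
    public static String fullPath(Node n) {
        StringBuilder sb = new StringBuilder();
        while (n != ROOT) {
            sb.insert(0, "/" + n.name);
            n = n.parent;  // becomes null if the node was unlinked, so the
                           // next iteration throws NullPointerException
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node dir = new Node(ROOT, "data");
        Node file = new Node(dir, "blk_001");
        Node recorded = file;   // captured while holding the (simulated) lock
        file.parent = null;     // lock released; another thread deletes the file
        try {
            System.out.println(fullPath(recorded));
        } catch (NullPointerException e) {
            System.out.println("NPE: node was unlinked after the lock was released");
        }
    }
}
```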

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5144) Document time unit to NameNodeMetrics.java

2013-08-29 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-5144:
---

 Summary: Document time unit to NameNodeMetrics.java
 Key: HDFS-5144
 URL: https://issues.apache.org/jira/browse/HDFS-5144
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Priority: Minor


In o.a.h.hdfs.server.namenode.metrics.NameNodeMetrics.java, metrics are 
declared as follows:

{code}
  @Metric("Duration in SafeMode at startup") MutableGaugeInt safeModeTime;
  @Metric("Time loading FS Image at startup") MutableGaugeInt fsImageLoadTime;
{code}

Since some users may be unsure which unit (seconds or milliseconds) these 
values use, the unit should be documented.
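
One low-risk way to do this is to state the unit directly in each metric's 
description string. Assuming both gauges record milliseconds (to be confirmed 
against the code), the annotations could read, for example:

{code}
  @Metric("Duration in SafeMode at startup, in milliseconds") MutableGaugeInt safeModeTime;
  @Metric("Time loading FS Image at startup, in milliseconds") MutableGaugeInt fsImageLoadTime;
{code}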



Re: Long time fail over when using QJM

2013-08-29 Thread Mickey
2013/8/30 Todd Lipcon t...@cloudera.com

 If you're seeing those log messages, the SBN was already active at that
 time. It only logs that message when successfully writing transactions. So,
 the failover must have already completed before the logs you're looking at.

 -Todd



 --
 Todd Lipcon
 Software Engineer, Cloudera