[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822989#comment-16822989 ]

Yuriy Malygin commented on HDFS-13236:
--------------------------------------

This problem is tracked in another ticket: HDFS-13596.

> Standby NN down with error encountered while tailing edits
> ----------------------------------------------------------
>
>                 Key: HDFS-13236
>                 URL: https://issues.apache.org/jira/browse/HDFS-13236
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: journal-node, namenode
>    Affects Versions: 3.0.0
>            Reporter: Yuriy Malygin
>            Priority: Major
>
> After updating Hadoop from 2.7.3 to 3.0.0, the standby NN goes down with an error encountered while tailing edits from the JN:
> {code:java}
> Feb 28 01:58:31 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:31,594 INFO [FSImageSaver for /one/hadoop-data/dfs of type IMAGE_AND_EDITS] FSImageFormatProtobuf - Image file /one/hadoop-data/dfs/current/fsimage.ckpt_01274897998 of size 4595971949 bytes saved in 93 seconds.
> Feb 28 01:58:33 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:33,445 INFO [Standby State Checkpointer] NNStorageRetentionManager - Going to retain 2 images with txid >= 1274897935
> Feb 28 01:58:33 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:33,445 INFO [Standby State Checkpointer] NNStorageRetentionManager - Purging old image FSImageFile(file=/one/hadoop-data/dfs/current/fsimage_01274897875, cpktTxId=01274897875)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,660 INFO [Edit log tailer] FSImage - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6a168e6f expecting start txid #1274897999
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,660 INFO [Edit log tailer] FSImage - Start loading edits file http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup=1274897999=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848=true, http://srve2916.local:8480/getJournal?jid=datalab-hadoop-backup=1274897999=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848; inProgressOk=true
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,661 INFO [Edit log tailer] RedundantEditLogInputStream - Fast-forwarding stream 'http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup=1274897999torageInfo=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848=true, http://srve2916.local:8480/getJournal?jid=datalab-hadoop-backup=1274897999=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848=true' to transaction ID 1274897999
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,661 INFO [Edit log tailer] RedundantEditLogInputStream - Fast-forwarding stream 'http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup=1274897999=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848=true' to transaction ID 1274897999
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,680 ERROR [Edit log tailer] FSEditLogLoader - Encountered exception on operation AddOp [length=0, inodeId=145550319, path=/kafka/parquet/infrastructureGrace/date=2018-02-28/_temporary/1/_temporary/attempt_1516181147167_20856_r_98_0/part-r-00098.gz.parquet, replication=3, mtime=1519772206615, atime=1519772206615, blockSize=134217728, blocks=[], permissions=root:supergroup:rw-r--r--, aclEntries=null, clientName=DFSClient_attempt_1516181147167_20856_r_98_0_1523538799_1, clientMachine=10.137.2.142, overwrite=false, RpcClientId=, RpcCallId=271996603, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_ADD, txid=1274898002]
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected length 16
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:946)
> Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
> Feb 28 01:58:34
[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532302#comment-16532302 ]

Yuriy Malygin commented on HDFS-13236:
--------------------------------------

Hi [~linyiqun], I created an additional issue about _NotEnoughReplicasException_: HDFS-13718.
[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531436#comment-16531436 ]

Kihwal Lee commented on HDFS-13236:
-----------------------------------

The restart after upgrade issue is being addressed in HDFS-13596.
[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531135#comment-16531135 ]

Yiqun Lin commented on HDFS-13236:
----------------------------------

Hi [~_ph], as I see it, you reported two problems when upgrading the cluster, and they look unrelated. Could you file a new JIRA to track the Rack Awareness problem you found, and let this JIRA focus on the SBN tailing edits error?
[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531085#comment-16531085 ]

Francisco commented on HDFS-13236:
----------------------------------

Hi all,

We are having the same issue after upgrading from 2.8.2 to 3.1.0. We have a cluster with one name node, 3 datanodes and no journal node, which worked without issues for an entire year. The upgrade went fine, but the problem started after a system maintenance that restarted all nodes.

Here is the main information from the log:

2018-07-03 10:57:35,543 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream '/space/hadoop/hadoop_run/head_node/current/edits_00023174228-00023184599' to transaction ID 23174224
2018-07-03 10:57:35,575 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeId=6383051, path=/spark/.sparkStaging/application_1530536780991_0001/commons-compress-1.14.jar, replication=3, mtime=1530538999482, atime=1530538999482, blockSize=134217728, blocks=[], permissions=spark:hadoop:rw-r--r--, aclEntries=null, clientName=DFSClient_NONMAPREDUCE_291933719_1, clientMachine=10.1.19.65, overwrite=true, RpcClientId=, RpcCallId=269330502, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_ADD, txid=23174233]
java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected length 16
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
    at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
    at org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
    at org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:669)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:731)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:968)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:947)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
2018-07-03 10:57:35,577 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than the current value (=6383051), where newValue=6383050
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:669)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:731)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:968)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:947)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.lang.IllegalStateException: Cannot skip to less than the current value (=6383051), where newValue=6383050
    at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1943)
    ... 13 more
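
Both reports in this thread share one signature: an OP_ADD whose RpcClientId is empty while RpcCallId is set. The following is a minimal sketch, not a Hadoop tool, for flagging such operations in NameNode log text. It assumes the one-line op dump format seen in the logs above (fields in the order RpcClientId, RpcCallId, ..., opCode, txid); the function name `find_ops_without_client_id` is hypothetical.

```python
import re

# Assumed format, taken from the op dumps quoted in this thread:
#   "... RpcClientId=, RpcCallId=269330502, ..., opCode=OP_ADD, txid=23174233]"
OP_DUMP = re.compile(
    r"RpcClientId=(?P<client_id>[^,\]]*),\s*"
    r"RpcCallId=(?P<call_id>-?\d+)"
    r".*?opCode=(?P<op_code>\w+),\s*txid=(?P<txid>\d+)",
    re.DOTALL,
)


def find_ops_without_client_id(log_text):
    """Return (op_code, txid, call_id) for each op dump with an empty RpcClientId."""
    hits = []
    for match in OP_DUMP.finditer(log_text):
        if match.group("client_id").strip() == "":
            hits.append(
                (
                    match.group("op_code"),
                    int(match.group("txid")),
                    int(match.group("call_id")),
                )
            )
    return hits
```

Running it over the log text of a failed NameNode start would list the transaction IDs to compare against the edits segment named in the "Fast-forwarding stream" line.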
[jira] [Commented] (HDFS-13236) Standby NN down with error encountered while tailing edits
[ https://issues.apache.org/jira/browse/HDFS-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391622#comment-16391622 ]

Kihwal Lee commented on HDFS-13236:
-----------------------------------

For block placement differences, {{DFSNetworkTopology}} is new in Hadoop 3. It might be related. See HDFS-11419. [~vagarychen] might be able to tell whether it is related.

The missing client ID is very strange, as the op has a call ID. Both come from the handler's thread-local variable set by the RPC server. The client ID field isn't the last field in the edit either. The client ID is created when an RPC client is created, and it is set in the connection context header. It is all internal and automatic. I don't know what happens when the connection is dropped before edit logging, while the call is still being processed. [~daryn], does the server reset the client ID in this case?
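
To make the failure concrete: the exception text says a retry-cache entry is only accepted when the client ID is exactly 16 bytes, so an op replayed with an empty RpcClientId is rejected. The sketch below mirrors that check in Python. It is an illustration under assumptions, not Hadoop's code: the names `new_client_id` and `check_client_id` are hypothetical, and generating the ID from a random UUID is an assumption chosen only because a UUID is 16 bytes long.

```python
import uuid

# The "expected length 16" from the exception message in this thread.
CLIENT_ID_LENGTH = 16


def new_client_id() -> bytes:
    """Assumption: one ID per RPC client, here the raw bytes of a random UUID."""
    return uuid.uuid4().bytes


def check_client_id(client_id: bytes) -> None:
    """Reject any client ID that is not exactly CLIENT_ID_LENGTH bytes long."""
    if len(client_id) != CLIENT_ID_LENGTH:
        raise ValueError(
            f"Invalid clientId - length is {len(client_id)} "
            f"expected length {CLIENT_ID_LENGTH}"
        )
```

With this check, `check_client_id(new_client_id())` passes, while `check_client_id(b"")` raises an error with the same wording as the log, which is the situation the edit log tailer hits when the logged op carries no client ID.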