Re: Namenode EOF Exception
Thanks, I agree I need to upgrade :) I was able to recover the NN following your suggestions; an additional hack was to sync the namespaceID across the datanodes with the namenode.

On May 14, 2012, at 11:48 AM, Harsh J ha...@cloudera.com wrote:

True, I don't recall 0.20.2 (the original release, from a few years ago) carrying these fixes. You ought to upgrade that cluster to the current stable release for the many fixes you can benefit from :)

On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi prash1...@gmail.com wrote:

Thanks Harsh. I am using 0.20.2; I see on the JIRA that this issue was fixed in 0.23. I will try out your suggestions and get back.

On May 14, 2012, at 1:22 PM, Harsh J ha...@cloudera.com wrote:

Your fsimage seems to have gone bad (is it 0-sized? I recall that as a known issue, long since fixed). The easiest fix is to fall back to the last available good checkpoint (from the SNN). Or, if you have multiple dfs.name.dirs, see whether any of the other directories have better/complete files on them, and re-spread those across after testing them out (and backing up the originals). What version are you running? AFAIK, most recent stable versions/distros include NN resource-monitoring threads that should have placed your NN into safemode the moment all its disks ran near out of space.

On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi prash1...@gmail.com wrote:

Hi, I am seeing an issue where the NameNode does not start due to an EOFException. The disk was full and I cleared space up, but I am unable to get past this exception. Any ideas on how this can be resolved?
2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=hadoop
2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2012-05-14 10:10:44,023 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.file.FileContext
2012-05-14 10:10:44,024 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2012-05-14 10:10:44,047 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 205470
2012-05-14 10:10:44,844 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2012-05-14 10:10:44,845 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
2012-05-14 10:10:44,845 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2012-05-14 10:10:44,846 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gridforce-1.internal.salesforce.com/10.0.201.159
************************************************************/
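[Editor's note] The recovery path the thread converges on can be sketched roughly as below. All paths and ID values here are illustrative stand-ins, not taken from the thread; `hadoop namenode -importCheckpoint` was the 0.20-era command for restoring from the SecondaryNameNode's checkpoint. The namespaceID rewrite (the "hack" Prashant mentions) is demonstrated on a throwaway file so it is safe to run as-is.

```shell
# 1. Back up every dfs.name.dir before touching anything, e.g.:
#      cp -a /data/dfs/name /data/dfs/name.bak
#    Then compare fsimage sizes across the copies -- a 0-byte fsimage is the bad one:
#      ls -l /data/dfs/name*/current/fsimage
#
# 2. If no dfs.name.dir copy is intact, restore from the SNN checkpoint
#    (run on the namenode host; 0.20-era syntax):
#      hadoop namenode -importCheckpoint
#
# 3. After recovery, datanodes may refuse to register because the namespaceID
#    stored in ${dfs.data.dir}/current/VERSION no longer matches the namenode's.
#    Simulated below on a temp file; on a real datanode, point VERSION at the
#    actual dfs.data.dir and take the ID from the namenode's own VERSION file.
VERSION=/tmp/demo_datanode_VERSION
printf 'namespaceID=1113838149\nstorageID=DS-1\ncTime=0\n' > "$VERSION"  # stale datanode state
NN_ID=393758283                                   # hypothetical namenode namespaceID
sed -i "s/^namespaceID=.*/namespaceID=${NN_ID}/" "$VERSION"
grep '^namespaceID=' "$VERSION"                   # should now match the namenode's ID
```

As Harsh notes, the real fix is upgrading off 0.20.2: later releases add NN resource monitoring that drops the namenode into safemode before a full disk can corrupt the fsimage in the first place.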