Re: Namenode EOF Exception

2012-05-15 Thread Prashant Kommireddi
Thanks, I agree I need to upgrade :)

I was able to recover NN following your suggestions, and an additional
hack was to sync the namespaceID across data nodes with the namenode.
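The namespaceID hack above can be sketched roughly as follows. This is a self-contained mock (the temp paths and VERSION contents are invented for illustration); on a real 0.20.x cluster the files live at ${dfs.name.dir}/current/VERSION on the namenode and ${dfs.data.dir}/current/VERSION on each datanode, and the datanodes should be stopped before editing.

```shell
# Sketch of syncing namespaceID from the namenode to a datanode.
# All paths below are mocked-up stand-ins, not real cluster locations.
set -eu
work=$(mktemp -d)

# Mock a namenode VERSION file and a datanode VERSION file that disagree.
mkdir -p "$work/nn/current" "$work/dn/current"
printf 'namespaceID=123456789\nlayoutVersion=-18\n' > "$work/nn/current/VERSION"
printf 'namespaceID=987654321\nlayoutVersion=-18\n' > "$work/dn/current/VERSION"

# Read the authoritative ID from the namenode side...
nn_id=$(sed -n 's/^namespaceID=//p' "$work/nn/current/VERSION")

# ...back up the datanode's VERSION, then rewrite its namespaceID to match.
cp "$work/dn/current/VERSION" "$work/dn/current/VERSION.bak"
sed -i "s/^namespaceID=.*/namespaceID=$nn_id/" "$work/dn/current/VERSION"

grep '^namespaceID=' "$work/dn/current/VERSION"
```

With the IDs in sync, the datanodes stop rejecting the namenode with "Incompatible namespaceIDs"-style errors on startup.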

On May 14, 2012, at 11:48 AM, Harsh J ha...@cloudera.com wrote:

 True, I don't recall 0.20.2 (the original release that was a few years
 ago) carrying these fixes. You ought to upgrade that cluster to the
 current stable release for the many fixes you can benefit from :)

 On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi
 prash1...@gmail.com wrote:
 Thanks Harsh. I am using 0.20.2, I see on the Jira this issue was
 fixed for 0.23?

 I will try out your suggestions and get back.

 On May 14, 2012, at 1:22 PM, Harsh J ha...@cloudera.com wrote:

 Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
 known issue long since fixed).

 The easiest way is to fall back to the last available good checkpoint
 (From SNN). Or if you have multiple dfs.name.dirs, see if some of the
 other points have better/complete files on them, and re-spread them
 across after testing them out (and backing up the originals).
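 That triage can be sketched like this, using mock directories in place of real dfs.name.dir entries (every path here is invented for illustration; the 0-byte fsimage mimics the known truncation symptom mentioned above):

```shell
# Sketch: find a dfs.name.dir copy whose fsimage was truncated by the full
# disk, back it up, and re-spread a good copy over it. Mock layout only.
set -eu
work=$(mktemp -d)

# Two mock name dirs: one truncated to 0 bytes, one still complete.
mkdir -p "$work/name1/current" "$work/name2/current"
: > "$work/name1/current/fsimage"                      # 0-byte: the bad copy
printf 'IMGDATA.....' > "$work/name2/current/fsimage"  # non-empty: candidate

# Flag any 0-byte fsimage -- the symptom to look for.
find "$work" -name fsimage -size 0 -print

# Back up the bad copy before overwriting, then copy the good image across.
mv "$work/name1/current/fsimage" "$work/name1/current/fsimage.bad"
cp "$work/name2/current/fsimage" "$work/name1/current/fsimage"
```

 The same pattern applies when restoring from the SNN's last checkpoint: back up whatever is in the namenode's image directory first, then copy the checkpoint files in.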

 Though what version are you running? Cause AFAIK most of the recent
 stable versions/distros include NN resource monitoring threads which
 should have placed your NN into safemode the moment all its disks ran
 near to out of space.
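 For reference, in releases that carry the NameNode resource checker, the free-space threshold below which the NN enters safemode is configurable in hdfs-site.xml. A hedged sketch of that fragment (the property name and 100 MB default are taken from later Hadoop releases; 0.20.2 predates this feature entirely):

```shell
# Write an illustrative hdfs-site.xml fragment to a temp file.
# Property name/default are from later Hadoop releases, not 0.20.x.
set -eu
frag=$(mktemp)
cat > "$frag" <<'EOF'
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- NN enters safemode when free space in any storage dir drops
       below this many bytes (100 MB here) -->
  <value>104857600</value>
</property>
EOF
cat "$frag"
```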

 On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
 prash1...@gmail.com wrote:
 Hi,

 I am seeing an issue where the Namenode does not start due to an
 EOFException. The disk was full and I cleared up space, but I am unable to
 get past this exception. Any ideas on how this can be resolved?

 2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=hadoop
 2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
 2012-05-14 10:10:44,023 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.file.FileContext
 2012-05-14 10:10:44,024 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
 2012-05-14 10:10:44,047 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 205470
 2012-05-14 10:10:44,844 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
 java.io.EOFException
     at java.io.DataInputStream.readFully(DataInputStream.java:180)
     at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
 2012-05-14 10:10:44,845 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
 2012-05-14 10:10:44,845 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
     at java.io.DataInputStream.readFully(DataInputStream.java:180)
     at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

 2012-05-14 10:10:44,846 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at gridforce-1.internal.salesforce.com/10.0.201.159
 ************************************************************/

Re: Namenode EOF Exception

2012-05-14 Thread Harsh J
Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
known issue long since fixed).

The easiest way is to fall back to the last available good checkpoint
(From SNN). Or if you have multiple dfs.name.dirs, see if some of the
other points have better/complete files on them, and re-spread them
across after testing them out (and backing up the originals).

Though what version are you running? Cause AFAIK most of the recent
stable versions/distros include NN resource monitoring threads which
should have placed your NN into safemode the moment all its disks ran
near to out of space.

On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
prash1...@gmail.com wrote:
 Hi,

 I am seeing an issue where the Namenode does not start due to an
 EOFException. The disk was full and I cleared up space, but I am unable to
 get past this exception. Any ideas on how this can be resolved?




-- 
Harsh J


Re: Namenode EOF Exception

2012-05-14 Thread Prashant Kommireddi
Thanks Harsh. I am using 0.20.2, I see on the Jira this issue was
fixed for 0.23?

I will try out your suggestions and get back.

On May 14, 2012, at 1:22 PM, Harsh J ha...@cloudera.com wrote:

 Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
 known issue long since fixed).

 The easiest way is to fall back to the last available good checkpoint
 (From SNN). Or if you have multiple dfs.name.dirs, see if some of the
 other points have better/complete files on them, and re-spread them
 across after testing them out (and backing up the originals).

 Though what version are you running? Cause AFAIK most of the recent
 stable versions/distros include NN resource monitoring threads which
 should have placed your NN into safemode the moment all its disks ran
 near to out of space.

 On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
 prash1...@gmail.com wrote:
 Hi,

 I am seeing an issue where the Namenode does not start due to an
 EOFException. The disk was full and I cleared up space, but I am unable to
 get past this exception. Any ideas on how this can be resolved?




 --
 Harsh J


Re: Namenode EOF Exception

2012-05-14 Thread Harsh J
True, I don't recall 0.20.2 (the original release that was a few years
ago) carrying these fixes. You ought to upgrade that cluster to the
current stable release for the many fixes you can benefit from :)

On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi
prash1...@gmail.com wrote:
 Thanks Harsh. I am using 0.20.2, I see on the Jira this issue was
 fixed for 0.23?

 I will try out your suggestions and get back.

 On May 14, 2012, at 1:22 PM, Harsh J ha...@cloudera.com wrote:

 Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
 known issue long since fixed).

 The easiest way is to fall back to the last available good checkpoint
 (From SNN). Or if you have multiple dfs.name.dirs, see if some of the
 other points have better/complete files on them, and re-spread them
 across after testing them out (and backing up the originals).

 Though what version are you running? Cause AFAIK most of the recent
 stable versions/distros include NN resource monitoring threads which
 should have placed your NN into safemode the moment all its disks ran
 near to out of space.

 On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
 prash1...@gmail.com wrote:
 Hi,

 I am seeing an issue where the Namenode does not start due to an
 EOFException. The disk was full and I cleared up space, but I am unable to
 get past this exception. Any ideas on how this can be resolved?




 --
 Harsh J



-- 
Harsh J