Hi Philippe,

Yes, I did. I restarted the NameNode and the other daemons multiple times.
I found that all my files had somehow become corrupted. I was able to clear
the corrupt status by running the command below:

hdfs fsck / | egrep -v '^\.+$' | grep -v replica | grep -v Replica

But the cleanup deleted all the files from my cluster; only the directory
structure was left.
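
For the record, that pipeline only lists the corrupt paths: it strips fsck's
'.' progress lines and the per-replica lines out of the report, but it
deletes nothing by itself. Files whose blocks are missing are removed with
fsck's -delete option, so the full sequence was presumably:

  hdfs fsck / | egrep -v '^\.+$' | grep -v replica | grep -v Replica
  hdfs fsck / -delete

Note that -delete is irreversible; -move would instead relocate the affected
files to /lost+found.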

My main concern is how this issue happened in the first place, and how to
prevent it from happening in the future.
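
From what I have read so far, one possible culprit (assuming the directories
were left at their defaults) is that dfs.datanode.data.dir resolves to a path
under hadoop.tmp.dir in /tmp, which does not survive a reboot, and on EC2 any
instance-store volume is wiped when the instance is stopped. Pinning the
NameNode and DataNode directories to persistent EBS-backed mounts in
hdfs-site.xml would avoid both; the paths below are only illustrative:

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///mnt/ebs0/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///mnt/ebs0/hdfs/datanode</value>
  </property>

Can someone confirm whether that matches what others have seen?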

Regards
Nishant



On Wed, Feb 15, 2017 at 3:01 PM, Philippe Kernévez <pkerne...@octo.com>
wrote:

> Hi Nishant,
>
> Your NameNode is probably unable to communicate with your DataNodes. Did
> you restart all the HDFS services?
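>
> A quick way to check that, assuming a standard installation, is to ask the
> NameNode which DataNodes it can see and whether it is stuck in safe mode:
>
>   hdfs dfsadmin -report
>   hdfs dfsadmin -safemode get
>
> If the report shows zero live DataNodes, the blocks themselves are
> unreachable even though their metadata still exists on the NameNode.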
>
> Regards,
> Philippe
>
> On Tue, Feb 14, 2017 at 10:43 AM, Nishant Verma <
> nishant.verma0...@gmail.com> wrote:
>
>> Hi
>>
>> I have an open-source Hadoop 2.7.3 cluster (2 masters + 3 slaves) installed
>> on AWS EC2 instances. I am using the cluster to integrate with Kafka
>> Connect.
>>
>> The cluster was set up last month and the Kafka Connect setup was completed
>> a fortnight ago. Since then, we have been able to write Kafka topic records
>> to HDFS and run various operations on them.
>>
>> Since yesterday afternoon, no Kafka topic records are getting committed to
>> the cluster. When I try to open older files, I get the error below. When I
>> copy a new file to the cluster from the local filesystem, it opens fine at
>> first, but after some time it starts throwing a similar IOException:
>>
>> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log
>> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
>> java.io.IOException: No live nodes contain block 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
>> nodes = [], ignoredNodes = null No live nodes contain current block Block 
>> locations: Dead nodes: . Will get new block locations from namenode and 
>> retry...
>> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 
>> IOException, will wait for 499.3472970548959 msec.
>> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log
>> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
>> java.io.IOException: No live nodes contain block 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
>> nodes = [], ignoredNodes = null No live nodes contain current block Block 
>> locations: Dead nodes: . Will get new block locations from namenode and 
>> retry...
>> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 
>> IOException, will wait for 4988.873277172643 msec.
>> 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log
>> 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
>> java.io.IOException: No live nodes contain block 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
>> nodes = [], ignoredNodes = null No live nodes contain current block Block 
>> locations: Dead nodes: . Will get new block locations from namenode and 
>> retry...
>> 17/02/14 07:58:00 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 
>> IOException, will wait for 8598.311122824263 msec.
>> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log No live nodes contain current block Block 
>> locations: Dead nodes: . Throwing a BlockMissingException
>> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log No live nodes contain current block Block 
>> locations: Dead nodes: . Throwing a BlockMissingException
>> 17/02/14 07:58:09 WARN hdfs.DFSClient: DFS Read
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log
>>         at 
>> org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:983)
>>         at 
>> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
>>         at 
>> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
>>         at 
>> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>         at java.io.DataInputStream.read(DataInputStream.java:100)
>>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
>>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
>>         at 
>> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
>>         at 
>> org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
>>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>>         at 
>> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>>         at 
>> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>>         at 
>> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>>         at 
>> org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
>>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
>> cat: Could not obtain block: 
>> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
>> file=/test/inputdata/derby.log
>>
>> When I run hdfs fsck /, I get:
>>
>> Total size:    667782677 B
>>  Total dirs:    406
>>  Total files:   44485
>>  Total symlinks:                0
>>  Total blocks (validated):      43767 (avg. block size 15257 B)
>>   ********************************
>>   UNDER MIN REPL'D BLOCKS:      43766 (99.99772 %)
>>   dfs.namenode.replication.min: 1
>>   CORRUPT FILES:        43766
>>   MISSING BLOCKS:       43766
>>   MISSING SIZE:         667781648 B
>>   CORRUPT BLOCKS:       43766
>>   ********************************
>>  Minimally replicated blocks:   1 (0.0022848265 %)
>>  Over-replicated blocks:        0 (0.0 %)
>>  Under-replicated blocks:       0 (0.0 %)
>>  Mis-replicated blocks:         0 (0.0 %)
>>  Default replication factor:    3
>>  Average block replication:     6.8544796E-5
>>  Corrupt blocks:                43766
>>  Missing replicas:              0 (0.0 %)
>>  Number of data-nodes:          3
>>  Number of racks:               1
>> FSCK ended at Tue Feb 14 07:59:10 UTC 2017 in 932 milliseconds
>>
>>
>> The filesystem under path '/' is CORRUPT
>>
>> That means all my files have somehow become corrupted.
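>>
>> More precisely, the NameNode still holds metadata for 43,766 blocks, but no
>> live DataNode is reporting a replica for any of them; that is why the same
>> 43,766 blocks show up as both MISSING and CORRUPT. The affected paths can
>> also be listed directly with:
>>
>>   hdfs fsck / -list-corruptfileblocks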
>>
>> I want to recover my HDFS data and fix the corrupt health status. I would
>> also like to understand how such an issue occurred so suddenly, and how to
>> prevent it in the future.
>>
>>
>> Thanks
>>
>> Nishant Verma
>>
>
>
>
> --
> Philippe Kernévez
>
>
>
> Directeur technique (Suisse),
> pkerne...@octo.com
> +41 79 888 33 32
>
> Find OCTO on OCTO Talk: http://blog.octo.com
> OCTO Technology http://www.octo.com
>
