Hi Nishant,

Your namenode is probably unable to communicate with your datanodes. Did you
restart all the HDFS services?
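
A quick way to confirm is to check which datanodes the namenode sees and
whether the DataNode processes are actually up. A minimal sketch (it assumes
a default $HADOOP_HOME layout; adjust paths to your installation):

    # Ask the namenode which datanodes it considers live or dead
    hdfs dfsadmin -report

    # On each datanode host, verify the DataNode JVM is running
    jps | grep DataNode

    # If a DataNode is down, look at its log for connection errors
    # (the file name assumes the default log naming scheme)
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

    # Restart HDFS across the cluster (run from the namenode host)
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh

Once the datanodes re-register and send their block reports, the blocks that
fsck currently lists as missing/corrupt should be found again.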

Regards,
Philipp

On Tue, Feb 14, 2017 at 10:43 AM, Nishant Verma <nishant.verma0...@gmail.com> wrote:

> Hi
>
> I have an open-source Hadoop 2.7.3 cluster (2 masters + 3 slaves)
> installed on AWS EC2 instances. I am using the cluster to integrate with
> Kafka Connect.
>
> The cluster was set up last month and the Kafka Connect setup was
> completed a fortnight ago. Since then, we have been able to write Kafka
> topic records to our HDFS and perform various operations on them.
>
> Since yesterday afternoon, no Kafka topic records are getting committed
> to the cluster. When I try to open the older files, I get the error
> below. When I copy a new file to the cluster from local, it can be opened
> at first, but after some time it starts showing a similar IOException:
>
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 
> IOException, will wait for 499.3472970548959 msec.
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 
> IOException, will wait for 4988.873277172643 msec.
> 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:58:00 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 
> IOException, will wait for 8598.311122824263 msec.
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log No live nodes contain current block Block 
> locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log No live nodes contain current block Block 
> locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:983)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
>         at 
> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
>         at 
> org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> cat: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
>
> When I run: hdfs fsck / , I get:
>
> Total size:    667782677 B
>  Total dirs:    406
>  Total files:   44485
>  Total symlinks:                0
>  Total blocks (validated):      43767 (avg. block size 15257 B)
>   ********************************
>   UNDER MIN REPL'D BLOCKS:      43766 (99.99772 %)
>   dfs.namenode.replication.min: 1
>   CORRUPT FILES:        43766
>   MISSING BLOCKS:       43766
>   MISSING SIZE:         667781648 B
>   CORRUPT BLOCKS:       43766
>   ********************************
>  Minimally replicated blocks:   1 (0.0022848265 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     6.8544796E-5
>  Corrupt blocks:                43766
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Tue Feb 14 07:59:10 UTC 2017 in 932 milliseconds
>
>
> The filesystem under path '/' is CORRUPT
>
> That means all my files somehow got corrupted.
>
> I want to recover my HDFS and fix the corrupt health status. I would also
> like to understand how such an issue occurred so suddenly, and how to
> prevent it in the future.
>
>
> Thanks
>
> Nishant Verma
>



-- 
Philippe Kernévez



Directeur technique (Suisse),
pkerne...@octo.com
+41 79 888 33 32

Retrouvez OCTO sur OCTO Talk : http://blog.octo.com
OCTO Technology http://www.octo.com
