write to most datanode fail quickly
Hi,

I'm using HBase with about 20 region servers. One region server quickly failed to write to most of the datanodes and eventually died, while the other region servers are fine. The logs look like this:

java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)

Then several firstBadLink errors:

2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)

Then several "Failed to add a datanode" errors:

2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])

The full log is at http://paste2.org/xfn16jm2. Any suggestion will be appreciated. Thanks.
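For reference, the policy named in the last exception is a client-side setting in hdfs-site.xml. A sketch of the relevant properties follows (names and values as documented for Hadoop 2.x; whether relaxing the policy is appropriate depends on how much temporary under-replication you can tolerate during a write):

```xml
<!-- hdfs-site.xml, client side: datanode replacement on write-pipeline failure -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <!-- DEFAULT: replace a failed datanode only when the pipeline has shrunk
       too far; ALWAYS: always replace; NEVER: never replace (risks
       under-replicated blocks). -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
```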
Re: write to most datanode fail quickly
Which Hadoop release are you using? Have you run fsck?

Cheers
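For anyone following along, fsck can be run roughly as follows. This is only a sketch: the /hbase path is an assumption about the cluster layout, and the commands need a cluster node with the hdfs client on the PATH, so the snippet is guarded to degrade gracefully elsewhere.

```shell
# Sketch: basic HDFS health checks (requires a running cluster).
if command -v hdfs >/dev/null 2>&1; then
  hdfs fsck /                                   # overall filesystem health
  hdfs fsck /hbase -files -blocks -locations    # block placement under the HBase root (path is an assumption)
else
  echo "hdfs client not found; run this on a cluster node"
fi
```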
RE: write to most datanode fail quickly
I'm using Hadoop 2.0.0, and I have not run fsck. Only one region server has these DFS logs, which is strange.

Thanks
Re: write to most datanode fail quickly
Can you check the NameNode log for 132.228.48.20? Have you turned on short-circuit read?

Cheers
RE: write to most datanode fail quickly
Hi,

dfs.client.read.shortcircuit is true. This is the NameNode log at that moment: http://paste2.org/U0zDA9ms It seems there is nothing special in the NameNode log.

Thanks
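One way to double-check what the client actually sees for that key is to read it out of the config file. A sketch follows; the sample file stands in for $HADOOP_CONF_DIR/hdfs-site.xml, and on a live node `hdfs getconf -confKey dfs.client.read.shortcircuit` reads the effective value directly.

```shell
# Sketch: pull a property value out of hdfs-site.xml with grep.
# The sample file below is illustrative; point $conf at your real config.
conf=/tmp/hdfs-site-sample.xml
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>
EOF
# Print the <value> element on the line after the matching <name>.
grep -A1 'dfs.client.read.shortcircuit' "$conf" | grep -o '<value>[^<]*</value>'
```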
Re: write to most datanode fail quickly
132.228.48.20 didn't show up in the snippet (spanning only 3 minutes) you posted. I don't see an error or exception either. Perhaps search in a wider scope.
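A wider search can be sketched as follows. The sample log line is copied from the thread above, and /tmp/sample-rs.log stands in for the real region server log; on an actual node you would point the grep at the whole log directory (including rolled files) to widen the scope.

```shell
# Sketch: count which datanodes the DFSClient marked bad across a log file.
log=/tmp/sample-rs.log
cat > "$log" <<'EOF'
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
EOF
# Extract every "bad datanode <ip>:<port>" occurrence and tally per node.
grep -oh 'bad datanode [0-9.]*:[0-9]*' "$log" | sort | uniq -c | sort -rn
```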
RE: write to most datanode fail quickly
Hi,

The correct IP is 132.228.248.20. I checked the HDFS log on the dead region server; it has some error messages that may be useful: http://paste2.org/NwpcaGVv

Thanks