[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252612#comment-15252612
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

I have posted a new patch as HDFS-10301.002.patch.  The idea here is that we 
know how many storage reports to expect in the block report.  We should not 
remove any storages as zombies until we have seen that many storage reports and 
marked each of those storages with the ID of the latest block report.
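
A rough sketch of that check (illustrative names only; this is not necessarily 
how the 002 patch spells it):

{code}
// Illustrative sketch, not code from HDFS-10301.002.patch; the field and
// method names used here are invented.
void processStorageReport(DatanodeDescriptor node, long blockReportId,
    DatanodeStorageInfo storage) {
  storage.setLastBlockReportId(blockReportId);
  node.incrementStorageReportsSeen(blockReportId);
  // Only prune zombies once every storage report that the DataNode said
  // belongs to this block report has actually been processed.
  if (node.getStorageReportsSeen(blockReportId)
      >= node.getExpectedStorageReports()) {
    for (DatanodeStorageInfo s : node.getStorageInfos()) {
      if (s.getLastBlockReportId() != blockReportId) {
        // At this point a storage still carrying an old id really is stale;
        // an interleaved retransmission can no longer be the explanation.
        removeZombieStorage(node, s);
      }
    }
  }
}
{code}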

I feel that this approach is better than the one used in 001.patch, since it 
correctly handles the "interleaved" case.  It is very difficult to prove that 
we can never get interleaved storage reports for a DataNode, because of issues 
like queuing inside the RPC system, packets getting reordered or delayed by the 
network, and queuing inside the deferred work mechanism added by HDFS-9198.  So 
we should handle this case correctly.

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.01.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from different 
> reports. This screws up the blockReportId field, which makes the NameNode 
> think that some storages are zombie. Replicas from zombie storages are 
> immediately removed, causing missing blocks.





[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252475#comment-15252475
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Hmm.  This is a challenging one.  [~walter.k.su], I think I agree that the 
queue added in HDFS-9198 might be part of the problem here.  In CDH, we haven't 
yet backported the deferred queuing stuff implemented in HDFS-9198, which might 
explain why we never saw this.  Since we don't have a queue, and since NN RPCs 
are almost always handled in the order they arrive, CDH5 doesn't run into this 
"reordering" of resent storage reports.

Independently of this bug, I do think it's concerning that the DN keeps piling 
on retransmissions of FBRs even before the old ones were processed and 
acknowledged.  This kind of behavior will obviously lead to congestion collapse 
if congestion is what caused the original FBRs to be processed but not 
acknowledged.

{code}
void enqueue(List<Runnable> actions) throws InterruptedException {
  synchronized (queue) {
    for (Runnable action : actions) {
      if (!queue.offer(action)) {
        if (!isAlive() && namesystem.isRunning()) {
          ExitUtil.terminate(1, getName() + " is not running");
        }
        long now = Time.monotonicNow();
        if (now - lastFull > 4000) {
          lastFull = now;
          LOG.info("Block report queue is full");
        }
        queue.put(action);
      }
    }
  }
}
{code}
This is going to be problematic when contention gets high, because threads will 
spend a long time waiting to enter the {{synchronized (queue)}} section.  And 
this will not be logged or reflected back to the admin in any way.  
Unfortunately, the operation that you want here, the ability to atomically add 
a bunch of items to the {{BlockingQueue}}, simply is not provided by 
{{BlockingQueue}}.  The solution also seems somewhat brittle since reordering 
could happen because of network issues in a multi-RPC BlockReport.
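
To illustrate the API gap (just an illustration, not code from the patch): the 
obvious bulk call is neither blocking nor atomic on a bounded queue, and 
element-by-element {{put()}} lets another producer interleave its elements:

{code}
// Illustration of the BlockingQueue API gap only; not code from the patch.
import java.util.List;
import java.util.concurrent.BlockingQueue;

class BulkEnqueueSketch {
  static void enqueueBatch(BlockingQueue<Runnable> queue, List<Runnable> batch)
      throws InterruptedException {
    // queue.addAll(batch) is not a substitute: it delegates to add(), which
    // throws IllegalStateException as soon as the bounded queue is full, and
    // it may have inserted only part of the batch by then.

    // put() blocks until space is available, but only one element at a time,
    // so another producer can slip its own elements in between ours.
    for (Runnable r : batch) {
      queue.put(r);
    }
  }
}
{code}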

I'm thinking about this a little more, and it seems like the root of the 
problem is that in the single-RPC case, we're throwing away the information 
about how many storages were in the original report.  We need to find a way to 
include that information in there...



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252377#comment-15252377
 ] 

Konstantin Shvachko commented on HDFS-10301:


Hey Walter, your patch looks good by itself, but it does not address the bug in 
the zombie storage recognition.
It took me some time to review your patch; it would have been easier if you had 
explained your approach.
So your patch reorders block reports for different storages in such a way that 
storages from the same report are placed as a contiguous segment in the block 
report queue, so that processing of different BRs is not interleaved. This 
addresses Daryn's comment rather than solving the reported bug, as BTW Daryn 
correctly stated.
If you want to go forward with reordering of BRs, you should probably do it in 
another issue. I personally am not a supporter, because
# it introduces an unnecessary restriction on the order of execution of block 
reports, and
# it adds even more complexity to the BR processing logic.

I see the main problem here as follows: block reports used to be idempotent per 
storage, but HDFS-7960 made the processing of a subsequent storage dependent on 
the state produced while processing the previous ones. I think idempotency is 
good, and we should keep it. I think we can mitigate the problem in one of the 
following ways:
# Changing the criteria for zombie storage recognition. Why should it depend on 
block report IDs?
# Eliminating the notion of zombie storage altogether. E.g., the NN can ask the 
DN to run {{DirectoryScanner}} if the NN thinks the DN's state is outdated.
# Moving {{curBlockReportId}} from {{DatanodeDescriptor}} to {{StorageInfo}}, 
which would eliminate the global state shared between storages (see the sketch 
below).
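
A very rough sketch of that third option (hypothetical names; the real 
{{DatanodeDescriptor}}/{{DatanodeStorageInfo}} code would differ):

{code}
// Hypothetical sketch of option 3 only; the names are illustrative.
class StorageInfoSketch {
  // The report id lives on the storage itself, so processing a report for one
  // storage never overwrites state that another storage's in-flight report
  // depends on, and interleaved reports become harmless.
  private long curBlockReportId;

  void onStorageReportStarted(long blockReportId) {
    this.curBlockReportId = blockReportId;
  }

  boolean reportedIn(long blockReportId) {
    // Zombie recognition becomes a purely per-storage question.
    return this.curBlockReportId == blockReportId;
  }
}
{code}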

Also, if we cannot come up with a quick solution, then we should probably roll 
back HDFS-7960 for now and revisit it later, because this is a critical bug 
affecting all of our latest releases. And that is a lot of clusters and PBs out 
there.



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251275#comment-15251275
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Thanks for the bug report.  This is a tricky one.

One small correction: HDFS-7960 was not introduced as part of DataNode hotswap.  
It was originally introduced to solve issues caused by HDFS-7575, although it 
fixed issues with hotswap as well.

It seems like we should be able to remove existing DataNode storage report RPCs 
with the old ID from the queue when we receive one with a new block report ID.  
This would also avoid a possible congestion collapse scenario caused by 
repeated retransmissions after the timeout.
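
A rough sketch of that idea (hypothetical types; the HDFS-9198 queue holds 
plain {{Runnable}}s today, so the queued entries would have to expose the 
DataNode and the report id somehow):

{code}
// Hypothetical sketch only: before enqueueing a retransmitted full block
// report, drop any queued report from the same DataNode that carries a
// different (older) report id.
import java.util.Deque;
import java.util.Iterator;

class ReportQueueSketch {
  static final class QueuedReport {
    final String datanodeUuid;
    final long blockReportId;
    QueuedReport(String datanodeUuid, long blockReportId) {
      this.datanodeUuid = datanodeUuid;
      this.blockReportId = blockReportId;
    }
  }

  static void enqueue(Deque<QueuedReport> queue, QueuedReport incoming) {
    synchronized (queue) {
      Iterator<QueuedReport> it = queue.iterator();
      while (it.hasNext()) {
        QueuedReport queued = it.next();
        // Assumes the newest arrival supersedes anything still queued for the
        // same DataNode (the older RPC is the one the DN already gave up on).
        if (queued.datanodeUuid.equals(incoming.datanodeUuid)
            && queued.blockReportId != incoming.blockReportId) {
          it.remove();
        }
      }
      queue.addLast(incoming);
    }
  }
}
{code}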



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247390#comment-15247390
 ] 

Hadoop QA commented on HDFS-10301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 15s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77 with JDK v1.8.0_77 
generated 1 new + 32 unchanged - 1 fixed = 33 total (was 33) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 54s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 34 unchanged - 1 fixed = 35 total (was 35) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 15s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 36s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 52s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Synchronization performed on java.util.concurrent.ArrayBlockingQueue in 

[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247129#comment-15247129
 ] 

Walter Su commented on HDFS-10301:
--

Oh, I see. In this case, the reports are not split. And because the for-loop is 
outside the lock, the two for-loops interleaved.
{code}
for (int r = 0; r < reports.length; r++) {
{code}
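
Schematically (this is not the real {{NameNodeRpcServer}}/{{BlockManager}} 
code, just an illustration of the interleaving):

{code}
// Illustration only, not the actual blockReport() code.
// Two handler threads, one per retransmission of the same FBR, each run a
// loop like this.  The namesystem write lock is taken per storage (inside
// processReport), not around the whole loop, so the iterations of the two
// loops can alternate:
//
//   thread A (BR1): storage[0]  -> node-level report id set to BR1
//   thread B (BR2): storage[0]  -> node-level report id overwritten to BR2
//   thread A (BR1): storage[1]  -> tagged with BR1, which no longer matches
//                                  the node-level id, so it later looks like
//                                  a zombie
for (int r = 0; r < reports.length; r++) {
  // each call acquires and releases the FSN lock independently
  blockManager.processReport(nodeReg, reports[r].getStorage(),
      reports[r].getBlocks(), context);
}
{code}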



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246996#comment-15246996
 ] 

Walter Su commented on HDFS-10301:
--

1. The IPC reader is single-threaded by default. If it's multi-threaded, the 
order of putting RPC requests into {{callQueue}} is unspecified.
2. The IPC {{callQueue}} is FIFO.
3. IPC handlers are multi-threaded. If two handlers are both waiting on the FSN 
lock, the entry order depends on the fairness of the lock.
bq. When constructed as fair, threads contend for entry using an 
*approximately* arrival-order policy. When the currently held lock is released 
either the longest-waiting single writer thread will be assigned the write 
lock... (quoted from 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html)

I think that if the DN can't get an ack from the NN, it shouldn't assume 
anything about the arrival/processing order (especially when it re-establishes 
a connection). Still, I'm curious about how the interleaving happened. Any 
thoughts?



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246571#comment-15246571
 ] 

Konstantin Shvachko commented on HDFS-10301:


Hey Daryn, I am not sure how HDFS-9198 eliminates it from occurring. DataNodes 
still wait for the NN to process each BR, so they can time out and send the 
same block report multiple times. On the NN side, BR processing is 
multi-threaded, so it can still interleave processing of storages from 
different reports. Could you please clarify what I am missing?



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245867#comment-15245867
 ] 

Daryn Sharp commented on HDFS-10301:


Enabling HDFS-9198 will process BRs in FIFO order.  It doesn't solve this 
implementation bug, but it virtually prevents it from occurring.



[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244890#comment-15244890
 ] 

Konstantin Shvachko commented on HDFS-10301:


My DN has the following six storages:
{code}
DS-019298c0-aab9-45b4-8b62-95d6809380ff:NORMAL:kkk.sss.22.105
DS-0ea95238-d9ba-4f62-ae18-fdb9333465ce:NORMAL:kkk.sss.22.105
DS-191fc04b-90be-42c9-b6fb-fdd1517bf4c7:NORMAL:kkk.sss.22.105
DS-4a2e91c7-cdf0-408b-83a6-286c3534d673:NORMAL:kkk.sss.22.105
DS-5b2941f7-2b52-45a8-b135-dcbe488cc65b:NORMAL:kkk.sss.22.105
DS-6849f605-fd83-462d-97c3-cb6949383f7e:NORMAL:kkk.sss.22.105
{code}
Here are the logs for its block reports. All throw the same exception, but I 
pasted it only once.
{code}
2016-04-12 22:31:58,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d25423fb64d,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 19 msec to generate and 60078 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:31:58,931 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.net.SocketTimeoutException: Call From 
dn-hcl1264.my.cluster.com/kkk.sss.22.105 to namenode-ha1.my.cluster.com:9000 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 
remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:178)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:494)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:732)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:872)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 
remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)

2016-04-12 22:32:59,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d334a100bde,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 17 msec to generate and 60066 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:33:59,311 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d414ae386b2,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 16 msec to generate and 60055 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:34:59,409 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d4f4a605732,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 

[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244876#comment-15244876
 ] 

Konstantin Shvachko commented on HDFS-10301:


More details.
# My DataNode has 6 storages. It sends a block report and times out, then it 
sends the same block report five more times with different blockReportIds.
# The NameNode starts executing all six reports around the same time and 
interleaves them, that is, it processes the first storage of BR2 before it 
processes the last storage of BR1. (Color-coded logs are coming.)
# While processing storages from BR2, the NameNode changes the 
lastBlockReportId field to the id of BR2. This messes up the processing of the 
storages from BR1 that have not been processed yet. Namely, these storages are 
considered zombie, and all replicas are removed from them along with the 
storages themselves.
# Each such storage is then reconstructed by the NameNode when it receives a 
heartbeat from the DataNode, but the storage is marked as "stale", and the 
replicas will not be reconstructed until the next block report, which in my 
case is a few hours later.
# I noticed missing blocks because several DataNodes exhibited the same 
behavior and all replicas of the same block were lost.
# The replicas eventually reappeared (several hours later), because DataNodes 
do not physically remove the replicas and report them again in the next block 
report.

The behavior was introduced by HDFS-7960 as a part of the hot-swap feature. I 
did not do a hot-swap and did not fail over the NameNode.
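
Schematically (this is not the exact HDFS-7960 code, just the shape of the 
check that misfires), the pruning step looks like:

{code}
// Schematic of the zombie pruning, not the exact HDFS-7960 code.
// It runs after the "last" storage of a report has been processed; any storage
// whose recorded report id does not match the DataNode-level current id is
// treated as a zombie.
void removeZombieStorages(DatanodeDescriptor node) {
  long curId = node.getCurBlockReportId();   // already overwritten by BR2
  for (DatanodeStorageInfo storage : node.getStorageInfos()) {
    if (storage.getLastBlockReportId() != curId) {
      // Storages still tagged with BR1's id are mistaken for zombies here,
      // and all of their replicas are removed immediately.
      removeStorageAndReplicas(node, storage);   // hypothetical helper
    }
  }
}
{code}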
