[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-16 Thread Daryn Sharp (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382288#comment-17382288
 ] 

Daryn Sharp commented on HDFS-16127:


+1 this has become a source of very nasty problems.

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the
> final ACK to be delivered. If an exception is received during this wait, the
> close is retried, on the assumption that the close did not complete. That
> assumption was invalidated by HDFS-15813, resulting in permanent write
> failures in some close error cases involving slow nodes. There are also less
> frequent cases of data loss.






[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-14 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380634#comment-17380634
 ] 

Kihwal Lee commented on HDFS-16127:
---

There is no unit test in the patch. It isn't easy to reproduce this particular 
bug in a unit test without intrusive instrumentation.







[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380477#comment-17380477
 ] 

Hadoop QA commented on HDFS-16127:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 57s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 26s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 33s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m 27s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 33s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 42s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private

[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379964#comment-17379964
 ] 

Kihwal Lee commented on HDFS-16127:
---

The proposed solution is to check the size of {{ackQueue}} when 
{{waitForAllAcks()}} for the final packet throws an {{IOException}}. If the 
queue is empty, we can assume the last ACK was received and the final packet 
for the block was removed from the queue, meaning no recovery is needed.
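
A minimal sketch of what such a check might look like, written as a standalone helper rather than the actual HDFS-16127.patch; the class and method names here are hypothetical, and only the "ackQueue is empty" test itself comes from the description above.

{code:java}
import java.io.IOException;
import java.util.Queue;

// Hypothetical helper illustrating the proposed check; not the actual patch.
final class CloseRecoveryCheck {
  /**
   * Decides whether a close recovery is really needed after waitForAllAcks()
   * for the final (end-of-block) packet has thrown an IOException.
   *
   * If ackQueue is already empty, the ResponseProcessor received the final
   * ACK and removed the packet before the exception, so the block close
   * actually succeeded and no recovery should be attempted.
   */
  static <P> boolean closeRecoveryNeeded(IOException cause, Queue<P> ackQueue) {
    return !ackQueue.isEmpty();
  }

  private CloseRecoveryCheck() {}
}
{code}

In DataStreamer itself such a check would presumably have to be made under the same lock that guards the queues, since the ResponseProcessor removes packets from ackQueue concurrently.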







[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379961#comment-17379961
 ] 

Kihwal Lee commented on HDFS-16127:
---

*How the bug manifests*
When the DataStreamer thread's main loop sees an empty close packet on 
dataQueue, it flushes everything and waits until all ACKs are received. It 
then sends the close packet to signal the datanodes to finalize the replicas, 
and waits for the ACK to this final packet by calling waitForAllAcks(). Prior 
to HDFS-15813 this wait involved no network activity; since that change the 
streamer sends heartbeat packets while waiting (see the sendHeartbeat frame in 
the stack trace below), so network failures became possible during the wait. 
The following is the client log entry from one of the failure/data loss cases.

{noformat}
org.apache.hadoop.hdfs.DataStreamer: DataStreamer Exception
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.DataStreamer.sendPacket(DataStreamer.java:857)
at 
org.apache.hadoop.hdfs.DataStreamer.sendHeartbeat(DataStreamer.java:875)
at 
org.apache.hadoop.hdfs.DataStreamer.waitForAllAcks(DataStreamer.java:845)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:798)
{noformat}

This exception resulted in a close recovery, because the pipeline stage was set 
to PIPELINE_CLOSE at that point. However, the client had seen no error from its 
ResponseProcessor, meaning it had actually received the final ACK and removed 
the final packet from ackQueue. Since the ResponseProcessor shut itself down 
cleanly, there was no sign of a read or write error from that thread.

The following is the main part of the close recovery code, after a connection 
is successfully established.
{code:java}
  DFSPacket endOfBlockPacket = dataQueue.remove();  // remove the end of block packet
  assert endOfBlockPacket.isLastPacketInBlock();
  assert lastAckedSeqno == endOfBlockPacket.getSeqno() - 1;
  lastAckedSeqno = endOfBlockPacket.getSeqno();
  pipelineRecoveryCount = 0;
  dataQueue.notifyAll();
{code}

The asserts would have stopped the bug from propagating, but they are only 
active during testing. The recovery blindly requeues the contents of ackQueue, 
assuming the unACKed final packet is still there. In this failure case there is 
none, because the final packet was actually ACKed; the datanodes then closed 
their connections normally, which produced the "connection reset" seen by the 
data streamer stuck sending a heartbeat. The recovery then simply dequeues one 
packet from dataQueue and tosses it away, since the end-of-block packet is 
supposed to carry no data, and it even erroneously updates lastAckedSeqno.

At this point the first packet belonging to the next block has been thrown 
away. When the next block is written, the datanodes complain that the first 
packet's offset is non-zero. This is irrecoverable, and after 5 retries the 
write fails permanently.
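
The effect on the queues can be illustrated with a small, self-contained toy model. This is an assumed, simplified illustration: plain ArrayDeques and strings stand in for DataStreamer's dataQueue, ackQueue and DFSPacket, not the real classes.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the faulty close recovery when the final ACK was already received.
public class FaultyCloseRecoveryDemo {
  public static void main(String[] args) {
    Deque<String> dataQueue = new ArrayDeque<>();
    Deque<String> ackQueue = new ArrayDeque<>();

    // The empty close packet of block 1 was already ACKed and removed by the
    // ResponseProcessor, so ackQueue is empty. dataQueue already holds packets
    // of the next block, queued by the writing application.
    dataQueue.add("block2 packet #0 (block offset 0)");
    dataQueue.add("block2 packet #1 (non-zero block offset)");

    // Faulty recovery: requeue ackQueue (a no-op here, it is empty) ...
    while (!ackQueue.isEmpty()) {
      dataQueue.addFirst(ackQueue.removeLast());
    }
    // ... then blindly discard the head of dataQueue, assuming it is the
    // unACKed close packet of block 1.
    String discarded = dataQueue.remove();

    System.out.println("Discarded: " + discarded);
    System.out.println("First packet sent for block 2: " + dataQueue.peek());
    // The datanode sees a first packet with a non-zero offset and rejects it,
    // which is the permanent write failure described above.
  }
}
{code}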

*Data loss*
These errors usually cause a permanent write failure, because the block cannot 
be written when its first packet (actually the second, but the bug makes it the 
first one sent) starts at a non-zero offset. The errors are propagated back to 
the user and result in a task attempt failure, etc.

However, if the remaining bytes to be written to the new block fit in a single 
packet, something worse happens. When the client hits the close-recovery bug, 
the one data packet dropped by the faulty recovery is the only data packet for 
the next block; the packet remaining after it is that block's final close 
packet. When the client continues to the next block, the datanode rejects the 
write because the close packet has a non-zero offset.

But instead of causing a permanent write failure, the client now enters a 
close-recovery phase, since the pipeline stage was set to PIPELINE_CLOSE while 
sending that packet. The new connection header tells the datanodes that this is 
a close recovery, so they simply close the zero-byte block file with the 
specified gen stamp. The recovery appears successful, so the client gets no 
error, and the data in the dropped packet is silently lost.
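
Under the same toy model as above (again an assumed, simplified illustration, not DataStreamer code), the data-loss variant is just the case where dataQueue holds a single data packet followed by the close packet of the next block.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the data-loss variant: the next block's only data packet is
// dropped, and its close packet is then "recovered" as a zero-byte block.
public class SilentDataLossDemo {
  public static void main(String[] args) {
    Deque<String> dataQueue = new ArrayDeque<>();

    // Remaining bytes fit in one packet, so block 2 has exactly one data
    // packet plus its empty close packet.
    dataQueue.add("block2 data packet (the only one)");
    dataQueue.add("block2 close packet");

    // Faulty close recovery for block 1 discards the head of dataQueue,
    // which is block 2's only data packet.
    String discarded = dataQueue.remove();
    System.out.println("Silently dropped: " + discarded);

    // All that is left to send for block 2 is its close packet. The datanode
    // first rejects it (non-zero offset), the client enters close recovery,
    // and the datanodes finalize a zero-byte replica. No error is reported,
    // so the block's data is lost without the writer noticing.
    System.out.println("Only thing left to send for block 2: " + dataQueue.peek());
  }
}
{code}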