[ 
https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278224#comment-17278224
 ] 

Kihwal Lee edited comment on HDFS-15813 at 2/3/21, 5:19 PM:
------------------------------------------------------------

As for the safety of the change, I feel confident since this has been running 
in production for us for many years. To prevent future breakage, though, we 
might need a unit test.

It might be possible to simulate the condition by delaying the ACK for the last 
packet using {{DataNodeFaultInjector}}. If we 1) write data, 2) hflush, 3) 
enable the fault injector and 4) close(), the existing 
{{delaySendingAckToUpstream()}} may be utilized.  Just one datanode is needed. 
Even if the ACK is delayed beyond the value of "dfs.client.socket-timeout", the 
pipeline should not break. Because of the heartbeat packets, the BlockReceiver 
in the datanode won't get a read timeout while the client is waiting for the 
last ack.
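
To illustrate the expected behavior, here is a toy model in plain Java. It is
not Hadoop code and does not use {{MiniDFSCluster}} or
{{DataNodeFaultInjector}}; the class name, method name, packet markers and
timing values are all made up for illustration. A loopback socket with a read
timeout stands in for the BlockReceiver, and a sleep on the sender side stands
in for the delayed ACK. The sketch only shows why heartbeats sent during the
wait should keep the reader from timing out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CompletableFuture;

// Hypothetical toy model; none of these names exist in Hadoop.
class HeartbeatSketch {
    static final int HEARTBEAT = 0;   // stand-in for a heartbeat packet
    static final int LAST_PACKET = 1; // stand-in for the packet sent by close()

    /**
     * Returns true if the receiver saw LAST_PACKET, false if its read timed
     * out (or the connection failed) first. readTimeoutMs plays the role of
     * the socket timeout; ackDelayMs is how long the sender is kept waiting.
     */
    static boolean receiverSurvives(boolean sendHeartbeats, int readTimeoutMs,
                                    int ackDelayMs) throws Exception {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Receiver side: blocking reads guarded by a read timeout.
            CompletableFuture<Boolean> receiver = CompletableFuture.supplyAsync(() -> {
                try (Socket s = server.accept()) {
                    s.setSoTimeout(readTimeoutMs);
                    InputStream in = s.getInputStream();
                    int b;
                    while ((b = in.read()) != -1) {
                        if (b == LAST_PACKET) {
                            return true;
                        }
                        // Anything else is a heartbeat; the next read() call
                        // starts a fresh timeout window.
                    }
                    return false; // sender closed without a last packet
                } catch (SocketTimeoutException e) {
                    return false; // nothing arrived within readTimeoutMs
                } catch (IOException e) {
                    return false;
                }
            });

            // Sender side: wait out the simulated ACK delay, optionally
            // sending heartbeats, then send the last packet.
            try (Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setTcpNoDelay(true);
                OutputStream out = client.getOutputStream();
                int heartbeatIntervalMs = readTimeoutMs / 3;
                long deadline = System.nanoTime() + ackDelayMs * 1_000_000L;
                while (deadline - System.nanoTime() > 0) {
                    if (sendHeartbeats) {
                        out.write(HEARTBEAT);
                    }
                    Thread.sleep(heartbeatIntervalMs);
                }
                out.write(LAST_PACKET);
            } catch (IOException e) {
                // The receiver already gave up and closed the connection.
            }
            return receiver.get();
        }
    }

    public static void main(String[] args) throws Exception {
        // The delay (900 ms) deliberately exceeds the read timeout (300 ms).
        System.out.println("with heartbeats:    " + receiverSurvives(true, 300, 900));
        System.out.println("without heartbeats: " + receiverSurvives(false, 300, 900));
    }
}
```

Barring extreme scheduling delays, the first call should return true and the
second false. A real unit test would assert the equivalent outcome against an
actual write pipeline rather than this stand-in.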


was (Author: kihwal):
As for the safety of the change, I feel confident since this has been running 
in our production for many years. But for preventing future breakages, we might 
need a unit test.

It might be possible to simulate the condition by delaying the ACK for the last 
packet using {{DataNodeFaultInjector}}. If we 1) write data, 2) hflush, 3) 
enable the fault injector and 4) close(), the existing 
{{delaySendingAckToUpstream()}} may be utilized.  Just one datanode is needed. 
Even if the ACK is delayed beyond the value of 
"ipc.client.connection.maxidletime", the pipeline should not break. Because of 
the heartbeat packets, the BlockReceiver in the datanode won't get a read 
timeout while the client is waiting for the last ack.

> DataStreamer: keep sending heartbeat packets while streaming
> ------------------------------------------------------------
>
>                 Key: HDFS-15813
>                 URL: https://issues.apache.org/jira/browse/HDFS-15813
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: HDFS-15813.001.patch, HDFS-15813.002.patch
>
>
> In response to [HDFS-5032], [~daryn] made a change to our internal code to 
> ensure that heartbeats continue during data streaming, even in the face of a 
> slow disk.
> As [~kihwal] noted, absence of heartbeat during flush will be fixed in a 
> separate jira.  It doesn't look like this change was ever pushed back to 
> Apache, so I am providing it here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
