[ 
https://issues.apache.org/jira/browse/HDDS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4388:
--------------------------------------
    Description: Currently, in Ratis, a "writeStateMachine" call is retried 
indefinitely when it times out. When disks are slow or overloaded, or when no 
chunk writer threads become available for 10s, the writeStateMachine call 
times out after 10s. In such cases the same write chunk request keeps being 
retried, causing the same chunk of data to be overwritten repeatedly. The idea 
here is to abort the request once the node failure timeout is reached.  
(was: Currently, in ratis "writeStateMachinecall" gets 
retried indefinitely in event of a timeout. In case, where disks are 
slow/overloaded or number of chunk writer threads are not available for a 
period of 10s, writeStateMachine call times out in 10s. In cases like these, 
the same write chunk keeps on getting retried causing the same chink of data to 
be overwritten. The idea here is to abort the request once the node failure 
timeout reaches.)

> Make writeStateMachineTimeout retry count proportional to node failure timeout
> ------------------------------------------------------------------------------
>
>                 Key: HDDS-4388
>                 URL: https://issues.apache.org/jira/browse/HDDS-4388
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> Currently, in Ratis, a "writeStateMachine" call is retried indefinitely 
> when it times out. When disks are slow or overloaded, or when no chunk 
> writer threads become available for 10s, the writeStateMachine call times 
> out after 10s. In such cases the same write chunk request keeps being 
> retried, causing the same chunk of data to be overwritten repeatedly. The 
> idea here is to abort the request once the node failure timeout is reached.
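
The proposal in the description can be read as a simple retry budget: the
number of timed-out attempts that fit inside the node failure window. The
sketch below illustrates that arithmetic only. The class WriteRetryBudget and
its methods are hypothetical names and are not Ozone or Ratis APIs; the 10s
per-attempt timeout comes from the description, while the node failure timeout
is whatever value the cluster is configured with.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of Ozone or Ratis): derives how many
 * timed-out writeStateMachine attempts fit inside the node failure timeout.
 */
final class WriteRetryBudget {
    private WriteRetryBudget() { }

    /**
     * Returns the number of whole attempts of length writeTimeout that fit
     * into nodeFailureTimeout (integer division, rounded down).
     */
    static long maxRetries(Duration nodeFailureTimeout, Duration writeTimeout) {
        long windowMillis = nodeFailureTimeout.toMillis();
        long perAttemptMillis = writeTimeout.toMillis();
        if (perAttemptMillis <= 0) {
            throw new IllegalArgumentException("writeTimeout must be at least 1 ms");
        }
        if (windowMillis < 0) {
            throw new IllegalArgumentException("nodeFailureTimeout must not be negative");
        }
        return windowMillis / perAttemptMillis;
    }

    /** True once the observed number of timeouts has used up the budget. */
    static boolean shouldAbort(long timeoutsSoFar, long maxRetries) {
        return timeoutsSoFar >= maxRetries;
    }
}
```

For illustration, with an assumed node failure timeout of 300s and the 10s
write timeout from the description, maxRetries returns 30, so the request
would be aborted on the 30th timeout instead of being retried forever.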



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

