[ https://issues.apache.org/jira/browse/HDDS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205210#comment-17205210 ]
Rui Wang commented on HDDS-4237:
--------------------------------

Hi Nicholas, that's great! This Jira is meant to find more bugs, and there will also be work to do during and after it. All of it will need people to work on :)

> Testing Infrastructure Random Failures
> --------------------------------------
>
>                 Key: HDDS-4237
>                 URL: https://issues.apache.org/jira/browse/HDDS-4237
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Priority: Major
>
> Network partitioning can cause a split-brain case where two leaders
> exist. We need some sort of testing infrastructure/framework to simulate
> such a case and verify whether our SCM HA implementation can achieve
> strong consistency under a partitioned network.
> There might be two ways, suggested by Mukul Kumar Singh:
> a) Blockade tests - Blockade is a Docker-based framework where the
> network for one DN can be isolated from the others.
> b) MiniOzoneChaosCluster - This is a unit-test-based test where a
> random datanode is killed; it has helped in finding issues with
> consistency.
> We might need a similar solution for SCM: block the SCM leader's network
> and also increase the timeout so that the old leader does not turn into
> a candidate.
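
The scenario in the issue description (isolate the leader, let the majority side elect a new one, then check that the stale leader cannot commit anything) can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `ToyCluster`, and `run_partition_scenario` are hypothetical names made up for illustration, and nothing here uses Blockade, MiniOzoneChaosCluster, Ratis, or SCM APIs. The old leader is modelled as never stepping down, which roughly corresponds to the increased timeout mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    term: int = 0
    is_leader: bool = False
    log: list = field(default_factory=list)


class ToyCluster:
    """Toy quorum-based cluster (hypothetical; not Ratis or SCM code)."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()  # names of nodes cut off from everyone else

    def reachable(self, src):
        """Names of the nodes that src can talk to, including itself."""
        if src in self.blocked:
            return {src}
        return {n for n in self.nodes if n not in self.blocked}

    def quorum(self):
        return len(self.nodes) // 2 + 1

    def elect(self, name):
        """Make `name` leader only if it can reach a majority."""
        peers = self.reachable(name)
        if len(peers) < self.quorum():
            return False
        new_term = max(self.nodes[p].term for p in peers) + 1
        for p in peers:
            self.nodes[p].term = new_term
            self.nodes[p].is_leader = (p == name)
        return True

    def write(self, leader, value):
        """Commit `value` only if the leader reaches a quorum in its term."""
        node = self.nodes[leader]
        peers = self.reachable(leader)
        if not node.is_leader or len(peers) < self.quorum():
            return False
        if any(self.nodes[p].term > node.term for p in peers):
            node.is_leader = False  # a newer term exists, so step down
            return False
        for p in peers:
            self.nodes[p].log.append(value)
        return True

    def leaders(self):
        return sorted(n.name for n in self.nodes.values() if n.is_leader)


def run_partition_scenario():
    """Isolate the leader and report what each side was able to commit."""
    cluster = ToyCluster(["scm1", "scm2", "scm3"])
    cluster.elect("scm1")
    cluster.write("scm1", "a")
    cluster.blocked.add("scm1")  # block the leader's network
    cluster.elect("scm2")        # the majority side elects a new leader
    stale_ok = cluster.write("scm1", "stale")  # old leader, no quorum
    fresh_ok = cluster.write("scm2", "b")      # new leader, has quorum
    return cluster, stale_ok, fresh_ok
```

In this model both `scm1` and `scm2` believe they are leader after the partition, but only the majority side can commit. A real test would have to check the same property against actual SCM instances rather than in-memory objects.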