[ https://issues.apache.org/jira/browse/HDDS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204623#comment-17204623 ]
Nicholas Jiang commented on HDDS-4237:
--------------------------------------

[~amaliujia], I could work on this issue together with you.

> Testing Infrastructure Random Failures
> --------------------------------------
>
>                 Key: HDDS-4237
>                 URL: https://issues.apache.org/jira/browse/HDDS-4237
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Priority: Major
>
> Network partitioning can cause a split-brain case in which two leaders
> exist at the same time. We need some sort of testing infrastructure or
> framework to simulate such a case and verify whether our SCM HA
> implementation can achieve strong consistency under a partitioned network.
> There might be two ways, suggested by Mukul Kumar Singh:
> a) Blockade tests: Blockade is a Docker-based framework in which the
> network for one DN can be isolated from the others.
> b) MiniOzoneChaosCluster: a unit-test-based approach in which a
> random datanode is killed; this helped in finding consistency issues.
> We might need a similar solution for SCM: block the SCM leader's network
> and also increase the timeout so that the old leader does not turn into a
> candidate.
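
To make the scenario in the issue description concrete, here is a minimal in-memory sketch of what such a test could check. It is not Ozone, Ratis, Blockade, or MiniOzoneChaosCluster code: the ToyCluster class, the node names scm1..scm3, and the majority-ack commit rule are all assumptions made for illustration. The old leader is isolated and never starts an election (standing in for the increased timeout), so two nodes believe they are leader, and the check is that only the majority side can commit.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.term = 0
        self.is_leader = False
        self.log = []  # committed (term, entry) pairs only


class ToyCluster:
    """Hypothetical stand-in for an SCM HA ring; not real Ozone/Ratis code."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()  # nodes cut off from every other node

    def reachable(self, a, b):
        # A node always reaches itself; blocked nodes reach nobody else.
        return a == b or (a not in self.blocked and b not in self.blocked)

    def peers(self, name):
        return [n for n in self.nodes.values() if self.reachable(name, n.name)]

    def majority(self):
        return len(self.nodes) // 2 + 1

    def elect(self, name):
        """Candidate bumps the term and wins only with a majority of votes."""
        cand = self.nodes[name]
        new_term = max(n.term for n in self.peers(name)) + 1
        voters = [n for n in self.peers(name) if n.term < new_term]
        if len(voters) < self.majority():
            return False
        for n in voters:
            n.term = new_term
            n.is_leader = False
        cand.is_leader = True
        return True

    def write(self, name, entry):
        """Commit only if a majority in the leader's own term acknowledges."""
        leader = self.nodes[name]
        if not leader.is_leader:
            return False
        acks = [n for n in self.peers(name) if n.term == leader.term]
        if len(acks) < self.majority():
            return False
        for n in acks:
            n.log.append((leader.term, entry))
        return True

    def leaders(self):
        return sorted(n.name for n in self.nodes.values() if n.is_leader)


def run_partition_scenario():
    cluster = ToyCluster(["scm1", "scm2", "scm3"])
    assert cluster.elect("scm1")
    assert cluster.write("scm1", "a")
    # Block the leader's network. scm1 never calls elect(), which models the
    # increased timeout: it keeps believing it is the leader.
    cluster.blocked.add("scm1")
    assert cluster.elect("scm2")
    both = cluster.leaders()
    stale_write = cluster.write("scm1", "stale")  # isolated old leader
    fresh_write = cluster.write("scm2", "b")      # majority-side leader
    logs = {n.name: list(n.log) for n in cluster.nodes.values()}
    return both, stale_write, fresh_write, logs
```

In this toy model the old leader's log stays a prefix of the majority's log, which is the kind of invariant a real Blockade or chaos-cluster test would need to assert against the actual SCM HA implementation.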