[ https://issues.apache.org/jira/browse/HDDS-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073855#comment-17073855 ]
Attila Doroszlai commented on HDDS-3313:
----------------------------------------
I have no idea what's happening in the _failure 2_ case. The {{key put}} command never returns, and the test hits its 8-minute timeout:
{code:title=Restart OM and Verify Ratis Logs}
09:42:56.649 DEBUG Test timeout 8 minutes active. 474.324 seconds left.
09:42:56.651 INFO Running command 'ozone sh key put o3://omservice/volume1/bucket1/testOMRestart_0_0 NOTICE.txt 2>&1'.
09:50:50.974 FAIL Test timeout 8 minutes exceeded.
{code}
{code:title=log excerpt from leader OM}
2020-04-02 09:42:52 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om3:
nextIndex: updateUnconditionally 59 -> 1
2020-04-02 09:42:52 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:52 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:53 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:53 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om3:
nextIndex: updateUnconditionally 59 -> 51
2020-04-02 09:42:54 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:54 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:54 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:54 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:55 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:55 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:55 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:55 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:56 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:56 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN GrpcLogAppender:212 -
om1@group-D66704EFC61C->om3-GrpcLogAppender: appendEntries Timeout,
request=AppendEntriesRequest:cid=71,entriesCount=8,lastEntry=(t:14, i:58)
2020-04-02 09:42:57 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:57 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 58 -> 57
2020-04-02 09:42:57 WARN GrpcLogAppender:122 -
om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception
2020-04-02 09:42:57 INFO FollowerInfo:50 - om1@group-D66704EFC61C->om2:
nextIndex: updateUnconditionally 59 -> 58
{code}
https://github.com/adoroszlai/hadoop-ozone/runs/554504875
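
For anyone triaging a longer copy of the leader log, a rough Python sketch like the one below may help condense it. It counts the failed appendEntries responses per follower and records the lowest nextIndex the leader fell back to. The function name {{summarize}} and both regular expressions are my own assumptions, derived only from the line format in the excerpt above; they are not part of Ratis or Ozone and may need adjusting for other log layouts.

```python
import re
from collections import Counter

# Assumed pattern for lines such as
#   "om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58"
# \s+ tolerates the line wrapping added by the mail archive.
NEXT_INDEX = re.compile(
    r"(\w+)@group-\w+->(\w+):\s+nextIndex:\s+updateUnconditionally\s+(\d+)\s+->\s+(\d+)"
)

# Assumed pattern for lines such as
#   "om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries"
FAILED_APPEND = re.compile(
    r"(\w+)@group-\w+->(\w+)-AppendLogResponseHandler:\s+Failed appendEntries"
)


def summarize(log_text):
    """Return (failures per follower, lowest nextIndex seen per follower)."""
    # Count "Failed appendEntries" warnings, keyed by follower id.
    failures = Counter(m.group(2) for m in FAILED_APPEND.finditer(log_text))

    # Track the smallest value nextIndex was reset to for each follower.
    lowest = {}
    for m in NEXT_INDEX.finditer(log_text):
        follower, new_index = m.group(2), int(m.group(4))
        lowest[follower] = min(new_index, lowest.get(follower, new_index))
    return failures, lowest
```

Calling {{summarize(open("om1.log").read())}} (the file name is hypothetical) returns the two mappings, which makes it easier to compare how om2 and om3 behave over the whole run rather than over a few seconds.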
> OM HA acceptance test is flaky
> ------------------------------
>
> Key: HDDS-3313
> URL: https://issues.apache.org/jira/browse/HDDS-3313
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Attila Doroszlai
> Assignee: Hanisha Koneru
> Priority: Critical
> Attachments: acceptance.zip
>
>
> {{ozone-om-ha}} test is failing intermittently. Example on master:
> https://github.com/apache/hadoop-ozone/runs/549544110
> {code:title=failure 1}
> 2020-03-31T19:34:02.3757399Z
> ==============================================================================
> 2020-03-31T19:34:02.3762775Z ozone-om-ha-testOMHA :: Smoketest ozone cluster
> startup
> 2020-03-31T19:34:02.3763313Z
> ==============================================================================
> 2020-03-31T19:34:07.9174050Z Stop Leader OM and Verify Failover
> | FAIL |
> 2020-03-31T19:34:07.9174675Z 255 != 0
> 2020-03-31T19:34:07.9176048Z
> ------------------------------------------------------------------------------
> 2020-03-31T19:34:37.4682717Z Test Multiple Failovers
> | FAIL |
> 2020-03-31T19:34:37.4682899Z 1 != 0
> 2020-03-31T19:34:37.4683766Z
> ------------------------------------------------------------------------------
> 2020-03-31T19:35:24.9569154Z Restart OM and Verify Ratis Logs
> | FAIL |
> 2020-03-31T19:35:24.9569529Z 255 != 0
> 2020-03-31T19:35:24.9574925Z
> ------------------------------------------------------------------------------
> 2020-03-31T19:35:24.9575613Z ozone-om-ha-testOMHA :: Smoketest ozone cluster
> startup | FAIL |
> 2020-03-31T19:35:24.9575952Z 3 critical tests, 0 passed, 3 failed
> 2020-03-31T19:35:24.9576076Z 3 tests total, 0 passed, 3 failed
> {code}
> {code:title=failure 2}
> 2020-03-31T20:36:29.5715868Z
> ==============================================================================
> 2020-03-31T20:36:29.5743517Z ozone-om-ha-testOMHA :: Smoketest ozone cluster
> startup
> 2020-03-31T20:36:29.5744025Z
> ==============================================================================
> 2020-03-31T20:37:08.4625840Z Stop Leader OM and Verify Failover
> | PASS |
> 2020-03-31T20:37:08.4626644Z
> ------------------------------------------------------------------------------
> 2020-03-31T20:39:47.9721513Z Test Multiple Failovers
> | PASS |
> 2020-03-31T20:39:47.9723424Z
> ------------------------------------------------------------------------------
> 2020-03-31T21:25:29.1203036Z Restart OM and Verify Ratis Logs
> | FAIL |
> 2020-03-31T21:25:29.1204001Z Test timeout 8 minutes exceeded.
> 2020-03-31T21:25:29.1204954Z
> ------------------------------------------------------------------------------
> 2020-03-31T21:25:29.1220689Z ozone-om-ha-testOMHA :: Smoketest ozone cluster
> startup | FAIL |
> 2020-03-31T21:25:29.1224446Z 3 critical tests, 2 passed, 1 failed
> 2020-03-31T21:25:29.1224833Z 3 tests total, 2 passed, 1 failed
> {code}