[https://issues.apache.org/jira/browse/HDDS-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073855#comment-17073855]

Attila Doroszlai commented on HDDS-3313:
----------------------------------------

I have no idea what's happening in the _failure 2_ case:

{code:title=Restart OM and Verify Ratis Logs}
09:42:56.649    DEBUG   Test timeout 8 minutes active. 474.324 seconds left.
09:42:56.651    INFO    Running command 'ozone sh key put o3://omservice/volume1/bucket1/testOMRestart_0_0 NOTICE.txt 2>&1'.
09:50:50.974    FAIL    Test timeout 8 minutes exceeded.
{code}

{code:title=log excerpt from leader OM}
2020-04-02 09:42:52 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om3: nextIndex: updateUnconditionally 59 -> 1
2020-04-02 09:42:52 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:52 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:53 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:53 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:53 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om3: nextIndex: updateUnconditionally 59 -> 51
2020-04-02 09:42:54 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:54 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:54 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:54 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:55 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:55 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:55 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:55 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:56 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:56 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
2020-04-02 09:42:56 WARN  GrpcLogAppender:212 - om1@group-D66704EFC61C->om3-GrpcLogAppender:  appendEntries Timeout, request=AppendEntriesRequest:cid=71,entriesCount=8,lastEntry=(t:14, i:58)
2020-04-02 09:42:57 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:57 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 58 -> 57
2020-04-02 09:42:57 WARN  GrpcLogAppender:122 - om1@group-D66704EFC61C->om2-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-04-02 09:42:57 INFO  FollowerInfo:50 - om1@group-D66704EFC61C->om2: nextIndex: updateUnconditionally 59 -> 58
{code}

https://github.com/adoroszlai/hadoop-ozone/runs/554504875

> OM HA acceptance test is flaky
> ------------------------------
>
>                 Key: HDDS-3313
>                 URL: https://issues.apache.org/jira/browse/HDDS-3313
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: test
>            Reporter: Attila Doroszlai
>            Assignee: Hanisha Koneru
>            Priority: Critical
>         Attachments: acceptance.zip
>
>
> {{ozone-om-ha}} test is failing intermittently. Example on master: https://github.com/apache/hadoop-ozone/runs/549544110
> {code:title=failure 1}
> 2020-03-31T19:34:02.3757399Z ==============================================================================
> 2020-03-31T19:34:02.3762775Z ozone-om-ha-testOMHA :: Smoketest ozone cluster startup
> 2020-03-31T19:34:02.3763313Z ==============================================================================
> 2020-03-31T19:34:07.9174050Z Stop Leader OM and Verify Failover                                    | FAIL |
> 2020-03-31T19:34:07.9174675Z 255 != 0
> 2020-03-31T19:34:07.9176048Z ------------------------------------------------------------------------------
> 2020-03-31T19:34:37.4682717Z Test Multiple Failovers                                               | FAIL |
> 2020-03-31T19:34:37.4682899Z 1 != 0
> 2020-03-31T19:34:37.4683766Z ------------------------------------------------------------------------------
> 2020-03-31T19:35:24.9569154Z Restart OM and Verify Ratis Logs                                      | FAIL |
> 2020-03-31T19:35:24.9569529Z 255 != 0
> 2020-03-31T19:35:24.9574925Z ------------------------------------------------------------------------------
> 2020-03-31T19:35:24.9575613Z ozone-om-ha-testOMHA :: Smoketest ozone cluster startup               | FAIL |
> 2020-03-31T19:35:24.9575952Z 3 critical tests, 0 passed, 3 failed
> 2020-03-31T19:35:24.9576076Z 3 tests total, 0 passed, 3 failed
> {code}
> {code:title=failure 2}
> 2020-03-31T20:36:29.5715868Z ==============================================================================
> 2020-03-31T20:36:29.5743517Z ozone-om-ha-testOMHA :: Smoketest ozone cluster startup
> 2020-03-31T20:36:29.5744025Z ==============================================================================
> 2020-03-31T20:37:08.4625840Z Stop Leader OM and Verify Failover                                    | PASS |
> 2020-03-31T20:37:08.4626644Z ------------------------------------------------------------------------------
> 2020-03-31T20:39:47.9721513Z Test Multiple Failovers                                               | PASS |
> 2020-03-31T20:39:47.9723424Z ------------------------------------------------------------------------------
> 2020-03-31T21:25:29.1203036Z Restart OM and Verify Ratis Logs                                      | FAIL |
> 2020-03-31T21:25:29.1204001Z Test timeout 8 minutes exceeded.
> 2020-03-31T21:25:29.1204954Z ------------------------------------------------------------------------------
> 2020-03-31T21:25:29.1220689Z ozone-om-ha-testOMHA :: Smoketest ozone cluster startup               | FAIL |
> 2020-03-31T21:25:29.1224446Z 3 critical tests, 2 passed, 1 failed
> 2020-03-31T21:25:29.1224833Z 3 tests total, 2 passed, 1 failed
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)