[jira] [Updated] (HDDS-3039) SCM sometimes cannot exit safe mode

Arpit Agarwal (Jira) Thu, 11 Jun 2020 10:46:25 -0700


     [ https://issues.apache.org/jira/browse/HDDS-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Arpit Agarwal updated HDDS-3039:
--------------------------------
    Labels: Triaged  (was: TriagePending)

> SCM sometimes cannot exit safe mode
> -----------------------------------
>
>                 Key: HDDS-3039
>                 URL: https://issues.apache.org/jira/browse/HDDS-3039
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Attila Doroszlai
>            Priority: Critical
>              Labels: Triaged
>
> Sometimes SCM cannot exit safe mode:
> {code:title=https://github.com/apache/hadoop-ozone/pull/563/checks?check_run_id=453543576}
> 2020-02-18T19:12:28.1108180Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 139.821 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.fsck.TestContainerMapper
> 2020-02-18T19:12:28.1169327Z [ERROR] 
> org.apache.hadoop.ozone.fsck.TestContainerMapper  Time elapsed: 139.813 s  
> <<< ERROR!
> 2020-02-18T19:12:28.1202534Z java.util.concurrent.TimeoutException: 
> ...
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:164)
>   at 
> org.apache.hadoop.ozone.fsck.TestContainerMapper.init(TestContainerMapper.java:71)
> {code}
> despite nodes and pipeline being ready:
> {code}
> 2020-02-18 19:10:18,045 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(169)) - Nodes are 
> ready. Got 3 of 3 DN Heartbeats.
> ...
> 2020-02-18 19:10:18,847 [RatisPipelineUtilsThread] INFO  
> pipeline.PipelineStateManager (PipelineStateManager.java:addPipeline(54)) - 
> Created pipeline Pipeline[ Id: b56478a3-8816-459e-a007-db5ee4a5572e, Nodes: 
> 86e97873-2dbd-4f1b-b418-cf9fba405476{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: 
> null}0fb407c1-4cda-4b3e-8e64-20c845872684{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: 
> null}31baa82d-441c-41be-94c9-8dd7468b728e{ip: 172.17.0.2, host: bedb6e0ff851, 
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:ALLOCATED, leaderId:null ]
> ...
> 2020-02-18 19:12:17,108 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(169)) - Nodes are 
> ready. Got 3 of 3 DN Heartbeats.
> 2020-02-18 19:12:17,108 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:lambda$waitForClusterToBeReady$0(172)) - Waiting 
> for cluster to exit safe mode
> 2020-02-18 19:12:17,151 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:shutdown(370)) - Shutting down the Mini Ozone 
> Cluster
> {code}
> [~shashikant] also noticed this in other integration tests.
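
The failure mode in the report (a readiness wait that ends in a TimeoutException even though one of its conditions is already satisfied) can be illustrated with a minimal sketch. This is not the MiniOzoneClusterImpl code: the class WaitForSketch, the helper waitFor, and the two suppliers are hypothetical names, and the timeout is shortened so the example finishes quickly.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

  /**
   * Polls the check until it returns true. Throws TimeoutException if the
   * check is still false once timeoutMillis has elapsed.
   */
  public static void waitFor(BooleanSupplier check, long intervalMillis,
      long timeoutMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical stand-ins for the two conditions visible in the log:
    // datanode heartbeats arrive, but safe mode is never exited.
    BooleanSupplier nodesReady = () -> true;
    BooleanSupplier outOfSafeMode = () -> false;

    try {
      // Both conditions must hold, so the wait runs until the deadline.
      waitFor(() -> nodesReady.getAsBoolean() && outOfSafeMode.getAsBoolean(),
          10, 100);
      System.out.println("cluster ready");
    } catch (TimeoutException e) {
      System.out.println("timed out: " + e.getMessage());
    }
  }
}
```

In this sketch the combined check keeps returning false for as long as the second condition does, which matches the pattern in the log above: "Nodes are ready" is reported repeatedly while the wait for safe mode exit continues until the timeout.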



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
