[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster

2020-06-29 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3379:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Clients unable to failover after the OzoneManager leader is restart in 
> MiniOzoneChaosCluster
> 
>
> Key: HDDS-3379
> URL: https://issues.apache.org/jira/browse/HDDS-3379
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Mukul Kumar Singh
> Priority: Major
> Labels: MiniOzoneChaosCluster, TriagePending
>
> Clients are unable to fail over after the OzoneManager leader is restarted in
> MiniOzoneChaosCluster.
> The failure occurs after the following restart events.
> {code}
> ➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
> 2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
> 2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
> 2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
> 2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
> 2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
> 2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
>   at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>   at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>   at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>   at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> 2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
> 2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
>   at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>   at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>   at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>   at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> ➜  chaos-2020-04-11-21-51-52-IST
> {code}
> This results in the following exception.
> {code}
> 2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
> java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
> at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
> at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
> at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
> at 
> 
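
The egrep filter in the issue description can also be approximated in a short script when the shutdown/restart sequence needs to be extracted programmatically. The following is only a minimal sketch, not part of the Ozone code base: the function name `restart_events` and the regular expression are illustrative assumptions, and it assumes each log record occupies a single line in complete.log, with message text matching the "Shutting down OzoneManager" and "Restarting OzoneManager" lines quoted in the report.

```python
import re

# Hypothetical pattern, modeled on the log messages quoted in the issue:
#   "<timestamp> [...] INFO ... - Shutting down OzoneManager omNode-3"
#   "<timestamp> [...] INFO ... - Restarting OzoneManager omNode-3"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"(?P<action>Shutting down|Restarting)\s+OzoneManager\s+(?P<node>omNode-\d+)"
)


def restart_events(lines):
    """Return (timestamp, action, node) tuples for OM shutdown/restart records."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            events.append(
                (match.group("ts"), match.group("action"), match.group("node"))
            )
    return events


# Two records copied from the issue description, joined onto single lines.
SAMPLE_LINES = [
    "2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl "
    "(MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down "
    "OzoneManager omNode-3",
    "2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl "
    "(MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting "
    "OzoneManager omNode-3",
    "2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] "
    "INFO  failure.Failures (FailureManager.java:start(66)) - starting failure "
    "manager 60 60 SECONDS",
]

if __name__ == "__main__":
    for ts, action, node in restart_events(SAMPLE_LINES):
        print(ts, action, node)
```

Listing the events this way makes it easier to line up each OzoneManager restart against the timestamp of the client-side "Could not determine or connect to OM Leader" exception.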

[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster

2020-06-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3379:

Target Version/s: 0.6.0
Labels: MiniOzoneChaosCluster TriagePending  (was: MiniOzoneChaosCluster)
