[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster
[ https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-3379:
--
    Target Version/s: 0.7.0 (was: 0.6.0)

> Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster
>
>
>                 Key: HDDS-3379
>                 URL: https://issues.apache.org/jira/browse/HDDS-3379
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster, TriagePending
>
> Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.
> This happens after the following restart events.
> {code}
> ➜ chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
> 2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
> 2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
> 2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
> 2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
> 2020-04-11 21:53:22,850 [pool-59-thread-1] INFO failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:53:22,853 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
> 2020-04-11 21:53:22,988 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> 2020-04-11 21:54:22,849 [pool-59-thread-1] INFO failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:54:22,850 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
> 2020-04-11 21:54:22,895 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> ➜ chaos-2020-04-11-21-51-52-IST
> {code}
> This results in the following exception.
> {code}
> 2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
> java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
>     at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>     at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
>     at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
>     at
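The log does not show why the client gave up, and the issue is still labelled TriagePending, so the root cause is not established here. One plausible reading, offered only as an assumption, is that the client used up its failover attempts while the OM HA group was still electing a new leader after the restarts. The minimal Java sketch below illustrates that kind of bounded leader failover. It is not Ozone's actual client code: `OmFailoverSketch`, `OmNodes`, `Reply`, `Outcome` and `simulatedCluster` are hypothetical names, and only the error text is copied from the log.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical sketch of client-side OM leader failover; NOT Ozone's actual client code.
 * Each attempt sends the request to one OM node. On a "not leader" reply the client
 * follows the node's leader hint if it has one, otherwise it moves to the next node
 * in round-robin order. After maxAttempts tries it gives up.
 */
public class OmFailoverSketch {

  /** Hypothetical result of asking one OM node to serve a request. */
  public enum Outcome { SUCCESS, NOT_LEADER, UNREACHABLE }

  /** Hypothetical reply: an outcome plus a leader hint (-1 when the node knows no leader). */
  public static final class Reply {
    final Outcome outcome;
    final int leaderHint;

    public Reply(Outcome outcome, int leaderHint) {
      this.outcome = outcome;
      this.leaderHint = leaderHint;
    }
  }

  /** Hypothetical transport: deliver one request to the node at nodeIndex. */
  public interface OmNodes {
    Reply call(int nodeIndex);
  }

  private final List<String> nodeIds;
  private final OmNodes nodes;
  private final int maxAttempts;
  private int current = 0; // index of the node the client currently treats as leader

  public OmFailoverSketch(List<String> nodeIds, OmNodes nodes, int maxAttempts) {
    this.nodeIds = nodeIds;
    this.nodes = nodes;
    this.maxAttempts = maxAttempts;
  }

  /** Retries across OM nodes until one accepts the request; returns that node's id. */
  public String submit() throws IOException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Reply reply = nodes.call(current);
      if (reply.outcome == Outcome.SUCCESS) {
        return nodeIds.get(current);
      }
      if (reply.outcome == Outcome.NOT_LEADER && reply.leaderHint >= 0) {
        current = reply.leaderHint; // follow the hint straight to the known leader
      } else {
        current = (current + 1) % nodeIds.size(); // unreachable or no hint: try the next node
      }
    }
    // Same text as in the log; in this sketch it only means the retry budget ran out.
    throw new IOException("Could not determine or connect to OM Leader.");
  }

  /**
   * Simulated OM group: for the first electionProbes calls no node knows a leader
   * (an election is in progress); afterwards the node at index leader serves requests
   * and every other node points clients to it.
   */
  public static OmNodes simulatedCluster(int electionProbes, int leader) {
    int[] calls = {0};
    return nodeIndex -> {
      calls[0]++;
      if (calls[0] <= electionProbes) {
        return new Reply(Outcome.NOT_LEADER, -1);
      }
      return nodeIndex == leader
          ? new Reply(Outcome.SUCCESS, -1)
          : new Reply(Outcome.NOT_LEADER, leader);
    };
  }

  public static void main(String[] args) {
    List<String> ids = List.of("omNode-1", "omNode-2", "omNode-3");
    for (int budget : new int[] {10, 3}) {
      OmFailoverSketch client = new OmFailoverSketch(ids, simulatedCluster(4, 1), budget);
      try {
        System.out.println("budget " + budget + ": leader " + client.submit());
      } catch (IOException e) {
        System.out.println("budget " + budget + ": " + e.getMessage());
      }
    }
  }
}
```

With a budget of 10 attempts the simulated client outlasts a four-probe election and settles on omNode-2; with only 3 attempts it fails with the same message the load generator reported. A real client would normally also wait between rounds rather than retrying immediately; that back-off is left out to keep the sketch short.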
[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster
[ https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-3379:
    Target Version/s: 0.6.0
              Labels: MiniOzoneChaosCluster TriagePending (was: MiniOzoneChaosCluster)