[jira] [Updated] (QPID-7192) [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback
[ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-7192:
-----------------------------
    Fix Version/s:     (was: qpid-java-broker-7.1.1)
                       (was: qpid-java-broker-7.0.7)

> [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback
> -----------------------------
>
>                 Key: QPID-7192
>                 URL: https://issues.apache.org/jira/browse/QPID-7192
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, qpid-java-broker-7.0.6
>            Reporter: Alex Rudyy
>            Priority: Major
>         Attachments: QPID-7192-wip.diff
>
>
> Start-up of a BDB HA VHN fails in the following scenario:
> 1) A transaction on the Master node is in the process of committing when the
> replica BDB HA VHNs are stopped, so the transaction is aborted with
> InsufficientReplicasException.
> 2) The replica BDB HA VHNs are restarted and the Replica environment detects
> a transaction requiring rollback from the previous commit. The rollback
> causes a restart of the JE Environment.
> 3) The BDB HA VHN for the impacted replica continues activation and performs
> intruder protection checks while the Environment is still restarting, which
> results in the exception "ConnectionScopedRuntimeException: Environment is
> restarting". The exception puts the BDB HA VHN into the ERRORED state. The JE
> environment rejoins the group successfully, but the BDB HA VHN is not aware
> of it, because the StateChangeListener is not set and the BDB HA VHN does not
> receive notifications about state transitions.
> Manual operator intervention is required to recover from the issue: the BDB
> HA VHN needs to be started again by invoking the state change operation from
> the Web Management Console or the REST API.
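The failure mode in step 3 can be modelled without JE at all. The sketch below is only an illustration of the reported behaviour, not Broker-J code and not the content of QPID-7192-wip.diff: FakeEnvironment, FakeNode and the registerListenerFirst flag are hypothetical names introduced here. It assumes that the environment notifies only listeners that are already registered when its restart completes, and it shows one possible mitigation (registering the listener before the intruder check) purely for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class VhnRestartSketch {
    public enum NodeState { ACTIVATING, ACTIVE, ERRORED }

    /** Hypothetical stand-in for a JE environment that is restarting after a rollback. */
    static class FakeEnvironment {
        private boolean restarting = true;
        private final List<Consumer<String>> listeners = new ArrayList<>();

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        /** Mimics the intruder protection check failing while the restart is in progress. */
        void intruderCheck() {
            if (restarting) {
                throw new IllegalStateException("Environment is restarting");
            }
        }

        /** The environment rejoins the group and notifies whoever is listening. */
        void finishRestart() {
            restarting = false;
            for (Consumer<String> listener : listeners) {
                listener.accept("REPLICA");
            }
        }
    }

    /** Hypothetical stand-in for the virtual host node. */
    static class FakeNode {
        NodeState state = NodeState.ACTIVATING;

        void activate(FakeEnvironment env, boolean registerListenerFirst) {
            if (registerListenerFirst) {
                env.addListener(role -> state = NodeState.ACTIVE);
            }
            try {
                env.intruderCheck();
                if (!registerListenerFirst) {
                    env.addListener(role -> state = NodeState.ACTIVE);
                }
                state = NodeState.ACTIVE;
            } catch (IllegalStateException e) {
                // As in the report: the exception leaves the node ERRORED.
                state = NodeState.ERRORED;
            }
        }
    }

    public static NodeState simulate(boolean registerListenerFirst) {
        FakeEnvironment env = new FakeEnvironment();
        FakeNode node = new FakeNode();
        node.activate(env, registerListenerFirst);
        env.finishRestart(); // environment rejoins the group afterwards
        return node.state;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints ERRORED
        System.out.println(simulate(true));  // prints ACTIVE
    }
}
```

With the listener registered only after a successful check, the node never learns that the environment recovered and stays ERRORED until an operator restarts it, which matches the behaviour described above.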
>
> Here are JE Exceptions reported on Replica side on transaction rollback:
> {noformat}
> BROKER-20 WARN [DETACHED nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s) ahead of the current master. These must be discarded to allow this node to rejoin the group. This condition is normally caused by the use of weak durability options.
> BROKER-20 DEBUG [DETACHED nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Environment restarting due to exception (JE 5.0.104) nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config Node nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config must rollback 3 commits to the earliest point indicated by transaction id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x0, offset 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
> com.sleepycat.je.rep.RollbackException: (JE 5.0.104) nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config Node nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config must rollback 3 commits to the earliest point indicated by transaction id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x0, offset 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
>     at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650) ~[je-5.0.104.jar:5.0.104]
>     at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341) ~[je-5.0.104.jar:5.0.104]
>
[jira] [Updated] (QPID-7192) [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback
[ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-7192:
-----------------------------
    Summary: [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback  (was: [Java Broker] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback)

> [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback
> -----------------------------
>
>                 Key: QPID-7192
>                 URL: https://issues.apache.org/jira/browse/QPID-7192
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, qpid-java-broker-7.0.6
>            Reporter: Alex Rudyy
>            Priority: Major
>             Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1
>
>         Attachments: QPID-7192-wip.diff