[jira] [Updated] (QPID-7192) [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback

2019-02-24 Thread Alex Rudyy (JIRA)


 [ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-7192:
-
Fix Version/s: (was: qpid-java-broker-7.1.1)
   (was: qpid-java-broker-7.0.7)

> [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE 
> environment has locally committed transactions requiring rollback
> -
>
> Key: QPID-7192
> URL: https://issues.apache.org/jira/browse/QPID-7192
> Project: Qpid
> Issue Type: Bug
> Components: Broker-J
> Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, qpid-java-broker-7.0.6
>Reporter: Alex Rudyy
>Priority: Major
> Attachments: QPID-7192-wip.diff
>
>
> Start-up of a BDB HA VHN fails in the following scenario:
> 1) A transaction on the Master node is in the process of committing when the 
> replica BDB HA VHNs are stopped; the transaction is aborted with 
> InsufficientReplicasException.
> 2) The replica BDB HA VHNs are restarted, and the Replica environment detects 
> a transaction requiring rollback from the previous commit. The rollback 
> causes a restart of the JE Environment.
> 3) The BDB HA VHN for the impacted replica continues activation and performs 
> its intruder protection checks while the Environment is restarting, which 
> results in the exception "ConnectionScopedRuntimeException: Environment is 
> restarting". The exception puts the BDB HA VHN into the ERRORED state. The JE 
> environment re-joins the group successfully, but the BDB HA VHN is not aware 
> of it, because the StateChangeListener is not set and the BDB HA VHN does not 
> receive notifications about state transitions.
> Manual operator intervention is required to recover from the issue: the BDB 
> HA VHN needs to be started again by invoking a state change operation from 
> the Web Management Console or REST API.
>  
> Here are the JE exceptions reported on the Replica side on transaction rollback:
> {noformat}
> BROKER-20 WARN  [DETACHED 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] 
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade 
> test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s) 
> ahead of the current master. These must be discarded to allow this node to 
> rejoin the group. This condition is normally caused by the use of weak 
> durability options.
> BROKER-20 DEBUG [DETACHED 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] 
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Environment restarting due to 
> exception (JE 5.0.104) 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  Node 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  must rollback 3 commits to the earliest point indicated by transaction 
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to 
> rejoin the replication group. All existing ReplicatedEnvironment handles must 
> be closed and reinstantiated.  Log files were truncated to file 0x0, offset 
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or 
> abort. Must run recovery by re-opening Environment handles Environment is 
> invalid and must be closed.
> com.sleepycat.je.rep.RollbackException: (JE 5.0.104) 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  Node 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  must rollback 3 commits to the earliest point indicated by transaction 
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to 
> rejoin the replication group. All existing ReplicatedEnvironment handles must 
> be closed and reinstantiated.  Log files were truncated to file 0x0, offset 
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or 
> abort. Must run recovery by re-opening Environment handles Environment is 
> invalid and must be closed.
> at 
> com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650)
>  ~[je-5.0.104.jar:5.0.104]
> at 
> com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341)
>  ~[je-5.0.104.jar:5.0.104]
> 
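
The failure mode in step 3 of the quoted description can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `ToyEnvironment`, `ToyNode`, `NodeState`, `EnvState` and `simulate` are hypothetical names invented here, not Broker-J or JE classes, and the sketch is not the content of the attached QPID-7192-wip.diff. It only shows why a node that has no state change listener registered stays ERRORED after the environment recovers, while a node that registers a listener before its checks can leave that state.

```java
import java.util.function.Consumer;

/** Toy model only: every type here is hypothetical, not a Broker-J or JE class. */
public class RestartAwareNodeSketch {

    public enum NodeState { ACTIVATING, ACTIVE, ERRORED }

    public enum EnvState { RESTARTING, REPLICA, MASTER }

    /** Stand-in for a replicated environment that restarts underneath its owner. */
    static final class ToyEnvironment {
        private EnvState state = EnvState.RESTARTING;
        private Consumer<EnvState> listener; // null models "listener is not set"

        void setStateChangeListener(Consumer<EnvState> newListener) {
            listener = newListener;
        }

        /** Throws while restarting, mimicking "Environment is restarting". */
        void checkForIntruders() {
            if (state == EnvState.RESTARTING) {
                throw new IllegalStateException("Environment is restarting");
            }
        }

        /** Simulates the environment rejoining the group after the rollback. */
        void rejoin(EnvState newState) {
            state = newState;
            if (listener != null) {
                listener.accept(newState);
            }
        }
    }

    /** Stand-in for a virtual host node that activates against the environment. */
    static final class ToyNode {
        private final ToyEnvironment env;
        private final boolean registerListenerFirst;
        private NodeState state = NodeState.ACTIVATING;

        ToyNode(ToyEnvironment env, boolean registerListenerFirst) {
            this.env = env;
            this.registerListenerFirst = registerListenerFirst;
        }

        void activate() {
            if (registerListenerFirst) {
                // Assumption of this sketch: register before any check that can fail.
                env.setStateChangeListener(this::onEnvironmentStateChange);
            }
            try {
                env.checkForIntruders();
                state = NodeState.ACTIVE;
            } catch (IllegalStateException e) {
                state = NodeState.ERRORED;
            }
        }

        private void onEnvironmentStateChange(EnvState newState) {
            if (newState != EnvState.RESTARTING) {
                state = NodeState.ACTIVE; // recover once the environment has rejoined
            }
        }

        NodeState state() {
            return state;
        }
    }

    /** Runs the scenario: activation during restart, then the environment rejoins. */
    public static NodeState simulate(boolean registerListenerFirst) {
        ToyEnvironment env = new ToyEnvironment();
        ToyNode node = new ToyNode(env, registerListenerFirst);
        node.activate();              // fails: the environment is still restarting
        env.rejoin(EnvState.REPLICA); // the environment recovers on its own
        return node.state();
    }

    public static void main(String[] args) {
        System.out.println("without listener: " + simulate(false));
        System.out.println("with listener: " + simulate(true));
    }
}
```

In this model, `simulate(false)` leaves the node in ERRORED, matching the reported behaviour where operator intervention is needed, and `simulate(true)` ends in ACTIVE. Whether registering the listener earlier is the right remedy for the real broker is not established by the issue text.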

[jira] [Updated] (QPID-7192) [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback

2019-02-04 Thread Alex Rudyy (JIRA)


 [ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-7192:
-
Summary: [Broker-J] BDB HA Virtual Host Node does not restart successfully 
if JE environment has locally committed transactions requiring rollback  (was: 
[Java Broker] BDB HA Virtual Host Node does not restart successfully if JE 
environment has locally committed transactions requiring rollback)

> [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE 
> environment has locally committed transactions requiring rollback
> -
>
> Key: QPID-7192
> URL: https://issues.apache.org/jira/browse/QPID-7192
> Project: Qpid
> Issue Type: Bug
> Components: Broker-J
> Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, qpid-java-broker-7.0.6
>Reporter: Alex Rudyy
>Priority: Major
> Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1
>
> Attachments: QPID-7192-wip.diff
>