Hi Bryan

I haven't seen problem reports raised against Broker-J regarding this error.

The error actually originates from the HA feature within Berkeley BDB JE.
There's a useful reply within the following thread that explains some
background about how the maximum clock delta is used.

https://community.oracle.com/thread/1027853

For this problem to appear,  I see three possibilities:

1) The NTP synchronisation between the nodes was somehow ineffective
causing a genuine skew between the nodes
2) The processing BDB JE internal
com.sleepycat.je.rep.stream.Protocol.SNTPResponse was somehow delayed
causing a spurious exception to be reported on the master.  I am wondering
whether an unfortunately timed lengthy GC pause could cause this.
3) There is some kind of previously unknown defect in BDB JE HA itself.

To help you narrow in, I'd suggest changing the logging configuration so
you retain logs for longer so if you see a reoccurrence you have more to
investigate.   I also suggest turning GC logging with timestamps, at the
JVM level.

You could also consider raising je.rep.maxClockDelta.  Broker-J doesn’t
actually change the replica consistency policy settings from their
defaults, which includes the permitted clock delta setting.  A BDB HA node
never actually reads from the replicated environment until the node becomes
master.  This means the concerns described by
https://docs.oracle.com/cd/E17277_02/html/ReplicationGuide/consistency.html
don't actually apply.   (As an aside I note that
 NoConsistencyRequiredPolicy (rather the TimeConsistencyPolicy default)
should serve Broker-J's use case just as well - but this is not something I
have tried or investigated completely).

Hope this helps.

Keith Wall.


On Mon, 16 Apr 2018 at 18:54, Bryan Dixon <br...@bldixon.net> wrote:

> We are using Broker-J 7.0.2 and just ran into a Berkeley HA Time Sync issue
> that I'm wondering if anyone else has run into.  The stackTrace is at the
> end of this post.   We are running on Windows Server 2012 R2 6.3 amd64 and
> our JDK is Oracle Corporation 1.8.0_162-b12.  We have 3 servers as part of
> our HA setup.
>
> This error occurred in our production environment which has been live for
> just a couple of weeks.  We never ran into this in our Test or Dev
> environments that have been running for a few months.   When one of our
> admins checked the clock times of all 3 servers they were completely in
> sync.  Another admin stated that the server clock times are synced with
> NTP.
> Unfortunately our log files rolled off and I don't know exactly when this
> error first occurred because the older log file are gone.
>
> 2018-04-16 04:10:57,039 ERROR [Group-Change-Learner:prodbrok
> er:prodbroker2]
> (o.a.q.s.u.ServerScopedRuntimeException) - Exception on master check
> com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must
> be
> closed, caused by: com.sleepycat.je.EnvironmentFailureException:
> Environment
> invalid because of previous exception: (JE 7.4.5)
> prodbroker2(2):D:\qpidwork\prodbroker2\config Clock delta: 8859 ms.
> between
> Feeder: prodbroker1 and this Replica exceeds max permissible delta: 2000
> ms.
> HANDSHAKE_ERROR: Error during the handshake between two nodes. Some
> validity
> or compatibility check failed, preventing further communication between the
> nodes. Environment is invalid and must be closed. Originally thrown by HA
> thread: UNKNOWN prodbroker2(2) Originally thrown by HA thread: UNKNOWN
> prodbroker2(2)
>         at
> com.sleepycat.je.EnvironmentFailureException.wrapSelf(Enviro
> nmentFailureException.java:228)
>         at
> com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(Environm
> entImpl.java:1766)
>         at
> com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1775)
>         at com.sleepycat.je.Environment.checkOpen(Environment.java:2473)
>         at com.sleepycat.je.DbInternal.checkOpen(DbInternal.java:105)
>         at
> com.sleepycat.je.rep.ReplicatedEnvironment.checkOpen(Replica
> tedEnvironment.java:1052)
>         at
> com.sleepycat.je.rep.ReplicatedEnvironment.getState(Replicat
> edEnvironment.java:764)
>         at
> org.apache.qpid.server.store.berkeleydb.replication.Replicat
> edEnvironmentFacade$RemoteNodeStateLearner.executeDatabasePi
> ngerOnNodeChangesIfMaster(ReplicatedEnvironmentFacade.java:2276)
>         at
> org.apache.qpid.server.store.berkeleydb.replication.Replicat
> edEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEn
> vironmentFacade.java:2042)
>         at
> org.apache.qpid.server.store.berkeleydb.replication.Replicat
> edEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEn
> vironmentFacade.java:2012)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
> tureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
> tureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> Executor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
> lExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.sleepycat.je.EnvironmentFailureException: Environment
> invalid
> because of previous exception: (JE 7.4.5)
> prodbroker2(2):D:\qpidwork\prodbroker2\config Clock delta: 8859 ms.
> between
> Feeder: prodbroker1 and this Replica exceeds max permissible delta: 2000
> ms.
> HANDSHAKE_ERROR: Error during the handshake between two nodes. Some
> validity
> or compatibility check failed, preventing further communication between the
> nodes. Environment is invalid and must be closed. Originally thrown by HA
> thread: UNKNOWN prodbroker2(2) Originally thrown by HA thread: UNKNOWN
> prodbroker2(2)
>         at
> com.sleepycat.je.rep.stream.ReplicaFeederHandshake.checkCloc
> kSkew(ReplicaFeederHandshake.java:424)
>         at
> com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(R
> eplicaFeederHandshake.java:261)
>         at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Repli
> ca.java:691)
>         at
> com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInterna
> l(Replica.java:474)
>         at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replic
> a.java:409)
>         at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1873)
>
>
>
> --
> Sent from: http://qpid.2158936.n2.nabble.com/Apache-Qpid-users-f2158936
> .html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> For additional commands, e-mail: users-h...@qpid.apache.org
>
>

Reply via email to