Hi. Yesterday, our ZooKeeper instances failed to recover and we saw this error in the logs:

2021-12-14 03:01:43,296 [myid:2] - ERROR [main:QuorumPeer@1139] - Unable to load database on disk
java.io.IOException: Unreasonable length = 2186882
    at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:768)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1093)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1078)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2021-12-14 03:01:43,304 [myid:2] - INFO [main:AbstractConnector@380] - Stopped ServerConnector@7526515b{HTTP/1.1,[http/1.1]}{0.0.0.0:9141}
2021-12-14 03:01:43,305 [myid:2] - INFO [main:ContextHandler@1016] - Stopped o.e.j.s.ServletContextHandler@2f953efd{/,null,UNAVAILABLE}
2021-12-14 03:01:43,306 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1140)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1078)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
Caused by: java.io.IOException: Unreasonable length = 2186882
    at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:768)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1093)
    ... 4 more

The hard disk was at 50% capacity when this failure occurred. Any ideas what would cause this kind of unrecoverable error?

Joe Marrero
Senior Software Engineer