I got to the bottom of it, this CRC error happens when the preAllocSize of the transaction logs is smaller than the size of the ZK nodes saved.
I tried with preAllocSize=8mb , saving a couple nodes of 15mb is enough to reproduce the error. On Friday, August 14, 2015, Benjamin Jaton <[email protected]> wrote: > OK, so those transactions are probably noise then. > > I changed the LogFormatter to not stop when it detects CRC errors so that > I could see what nodes were in error. > It turns out that those errors appear on larger nodes (>5MB). > We do override jute.maxbuffer to more than the default 1MB. > > Could there be some unexpected behavior with larger nodes? > > On Fri, Aug 14, 2015 at 11:40 AM, Flavio Junqueira < > [email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > >> Note that the zxid is different in each of the lines. >> >> -Flavio >> >> > On 14 Aug 2015, at 19:27, Benjamin Jaton <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> > >> > I see a lot of those: >> > >> > *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba004 zxid >> > 0x78b5a setData '/path/to/node,(....),60221* >> > immediately followed by >> > *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba00a zxid >> > 0x78b5b error -110* >> > >> > Is -110 a NodeExists exception? A setData should not be in error if the >> > node already exist no? >> > >> > >> > On Fri, Aug 14, 2015 at 9:50 AM, Benjamin Jaton < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> > wrote: >> > >> >> This is what I have when I run the LogFormatter on the last tlog: >> >> >> >> *Exception in thread "main" java.io.IOException: CRC doesn't match >> >> 1863394799 vs 480060806* >> >> >> >> Do you have any idea on what could cause this to happen? >> >> >> >> On Thu, Aug 13, 2015 at 4:16 PM, Flavio Junqueira <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >> >> >>> Hi Benjamin, >> >>> >> >>> Have you tried LogFormatter to check the file? I guess it won't work >> if >> >>> your file is really corrupt, but it might be worth a shot. >> Unfortunately, >> >>> I'm not aware of a way around it other than deleting the file and and >> >>> losing at least part of the transactions in the log. >> >>> >> >>> -Flavio >> >>> >> >>>> On 13 Aug 2015, at 17:27, Benjamin Jaton <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> >>> wrote: >> >>>> >> >>>> Hello, >> >>>> >> >>>> Does anybody know how those "CRC check failed" errors can occur? >> >>>> >> >>>> Apparently the data has been corrupted, what would be the most likely >> >>>> scenario for this to happen? >> >>>> >> >>>> java.io.IOException: CRC check failed >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:612) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157) >> >>>> >> >>>> at >> >>> >> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) >> >>>> >> >>>> at >> >>>> >> >>> >> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) >> >>>> >> >>>> >> >>>> As a result my ZK can't be started, is it recoverable? >> >>>> >> >>>> Thanks, >> >>>> Ben >> >>>> >> >>>> ---------- Forwarded message ---------- >> >>>> From: Apache Jenkins Server <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> >>>> Date: Sat, Apr 26, 2014 at 11:35 PM >> >>>> Subject: ZooKeeper_branch33_solaris - Build # 867 - Still Failing >> >>>> To: [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> >>>> >> >>>> >> >>>> See https://builds.apache.org/job/ZooKeeper_branch33_solaris/867/ >> >>>> >> >>>> >> >>> >> ################################################################################### >> >>>> ########################## LAST 60 LINES OF THE CONSOLE >> >>>> ########################### >> >>>> [...truncated 301 lines...] >> >>>> [junit] java.io.IOException: CRC check failed >> >>>> [junit] at >> >>>> >> >>> >> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:611) >> >>>> [junit] at >> >>>> org.apache.zookeeper.server.CRCTest.testChecksums(CRCTest.java:165) >> >>>> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >> >>>> Method) >> >>>> [junit] at >> >>>> >> >>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> >>>> [junit] at >> >>>> >> >>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> >>>> [junit] at java.lang.reflect.Method.invoke(Method.java:597) >> >>>> [junit] at junit.framework.TestCase.runTest(TestCase.java:168) >> >>>> [junit] at junit.framework.TestCase.runBare(TestCase.java:134) >> >>>> [junit] at >> >>> junit.framework.TestResult$1.protect(TestResult.java:110) >> >>>> [junit] at >> >>>> junit.framework.TestResult.runProtected(TestResult.java:128) >> >>>> [junit] at junit.framework.TestResult.run(TestResult.java:113) >> >>>> [junit] at junit.framework.TestCase.run(TestCase.java:124) >> >>>> [junit] at >> junit.framework.TestSuite.runTest(TestSuite.java:232) >> >>>> [junit] at junit.framework.TestSuite.run(TestSuite.java:227) >> >>>> [junit] at >> >>>> >> >>> >> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) >> >>>> [junit] at >> >>>> junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766) >> >>>> [junit] 2014-04-27 06:35:01,089 - INFO [main:CRCTest@68] - >> FINISHED >> >>>> testChecksums >> >>>> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 54.234 >> >>> sec >> >>>> [junit] java.io.FileNotFoundException: >> >>>> >> >>> >> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/junitvmwatcher1383325470.properties >> >>>> (No such file or directory) >> >>>> [junit] at java.io.FileInputStream.open(Native Method) >> >>>> [junit] at >> >>> java.io.FileInputStream.<init>(FileInputStream.java:120) >> >>>> [junit] at java.io.FileReader.<init>(FileReader.java:55) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1028) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:817) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeOrQueue(JUnitTask.java:1657) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:764) >> >>>> [junit] at >> >>>> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) >> >>>> [junit] at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown >> >>>> Source) >> >>>> [junit] at >> >>>> >> >>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> >>>> [junit] at java.lang.reflect.Method.invoke(Method.java:597) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105) >> >>>> [junit] at org.apache.tools.ant.Task.perform(Task.java:348) >> >>>> [junit] at org.apache.tools.ant.Target.execute(Target.java:357) >> >>>> [junit] at >> >>> org.apache.tools.ant.Target.performTasks(Target.java:385) >> >>>> [junit] at >> >>>> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329) >> >>>> [junit] at >> >>>> org.apache.tools.ant.Project.executeTarget(Project.java:1298) >> >>>> [junit] at >> >>>> >> >>> >> org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) >> >>>> [junit] at >> >>>> org.apache.tools.ant.Project.executeTargets(Project.java:1181) >> >>>> [junit] at org.apache.tools.ant.Main.runBuild(Main.java:698) >> >>>> [junit] at org.apache.tools.ant.Main.startAnt(Main.java:199) >> >>>> [junit] at >> >>>> org.apache.tools.ant.launch.Launcher.run(Launcher.java:257) >> >>>> [junit] at >> >>>> org.apache.tools.ant.launch.Launcher.main(Launcher.java:104) >> >>>> [junit] Running org.apache.zookeeper.server.DataTreeUnitTest >> >>>> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec >> >>>> >> >>>> BUILD FAILED >> >>>> >> >>> >> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build.xml:814: >> >>>> Process fork failed. >> >>>> >> >>>> Total time: 2 minutes 54 seconds >> >>>> Build step 'Invoke Ant' marked build as failure >> >>>> [locks-and-latches] Releasing all the locks >> >>>> [locks-and-latches] All the locks released >> >>>> Recording test results >> >>>> Email was triggered for: Failure >> >>>> Sending email for trigger: Failure >> >>>> >> >>>> >> >>>> >> >>>> >> >>> >> ################################################################################### >> >>>> ############################## FAILED TESTS (if any) >> >>>> ############################## >> >>>> 1 tests failed. >> >>>> FAILED: org.apache.zookeeper.server.DataTreeUnitTest.unknown >> >>>> >> >>>> Error Message: >> >>>> Forked Java VM exited abnormally. Please note the time in the report >> >>> does >> >>>> not reflect the time until the VM exit. >> >>>> >> >>>> Stack Trace: >> >>>> junit.framework.AssertionFailedError: Forked Java VM exited >> abnormally. >> >>>> Please note the time in the report does not reflect the time until >> the >> >>> VM >> >>>> exit. >> >>> >> >>> >> >> >> >> >
