Good catch! It sounds like there is a bug when the data in the writing of the txn log. I wonder if this happens only when the znodes being written or if it hits when there isn't enough space in the pre-allocated space. Do you want to create a jira for us to fix it?
-Flavio > On 15 Aug 2015, at 06:27, Benjamin Jaton <[email protected]> wrote: > > I got to the bottom of it, this CRC error happens when the preAllocSize of > the transaction logs is smaller than the size of the ZK nodes saved. > > I tried with preAllocSize=8mb , saving a couple nodes of 15mb is enough to > reproduce the error. > > On Friday, August 14, 2015, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > >> OK, so those transactions are probably noise then. >> >> I changed the LogFormatter to not stop when it detects CRC errors so that >> I could see what nodes were in error. >> It turns out that those errors appear on larger nodes (>5MB). >> We do override jute.maxbuffer to more than the default 1MB. >> >> Could there be some unexpected behavior with larger nodes? >> >> On Fri, Aug 14, 2015 at 11:40 AM, Flavio Junqueira < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected] >> <mailto:[email protected]>');>> wrote: >> >>> Note that the zxid is different in each of the lines. >>> >>> -Flavio >>> >>>> On 14 Aug 2015, at 19:27, Benjamin Jaton <[email protected] >>>> <mailto:[email protected]> >>> <javascript:_e(%7B%7D,'cvml','[email protected] >>> <mailto:[email protected]>');>> wrote: >>>> >>>> I see a lot of those: >>>> >>>> *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba004 zxid >>>> 0x78b5a setData '/path/to/node,(....),60221* >>>> immediately followed by >>>> *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba00a zxid >>>> 0x78b5b error -110* >>>> >>>> Is -110 a NodeExists exception? A setData should not be in error if the >>>> node already exist no? >>>> >>>> >>>> On Fri, Aug 14, 2015 at 9:50 AM, Benjamin Jaton < >>> [email protected] <mailto:[email protected]> >>> <javascript:_e(%7B%7D,'cvml','[email protected] >>> <mailto:[email protected]>');>> >>>> wrote: >>>> >>>>> This is what I have when I run the LogFormatter on the last tlog: >>>>> >>>>> *Exception in thread "main" java.io.IOException: CRC doesn't match >>>>> 1863394799 vs 480060806* >>>>> >>>>> Do you have any idea on what could cause this to happen? >>>>> >>>>> On Thu, Aug 13, 2015 at 4:16 PM, Flavio Junqueira <[email protected] >>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>> >>>>>> Hi Benjamin, >>>>>> >>>>>> Have you tried LogFormatter to check the file? I guess it won't work >>> if >>>>>> your file is really corrupt, but it might be worth a shot. >>> Unfortunately, >>>>>> I'm not aware of a way around it other than deleting the file and and >>>>>> losing at least part of the transactions in the log. >>>>>> >>>>>> -Flavio >>>>>> >>>>>>> On 13 Aug 2015, at 17:27, Benjamin Jaton <[email protected] >>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>> wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Does anybody know how those "CRC check failed" errors can occur? >>>>>>> >>>>>>> Apparently the data has been corrupted, what would be the most likely >>>>>>> scenario for this to happen? >>>>>>> >>>>>>> java.io.IOException: CRC check failed >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:612) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157) >>>>>>> >>>>>>> at >>>>>> >>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) >>>>>>> >>>>>>> at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) >>>>>>> >>>>>>> >>>>>>> As a result my ZK can't be started, is it recoverable? >>>>>>> >>>>>>> Thanks, >>>>>>> Ben >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> From: Apache Jenkins Server <[email protected] >>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>> Date: Sat, Apr 26, 2014 at 11:35 PM >>>>>>> Subject: ZooKeeper_branch33_solaris - Build # 867 - Still Failing >>>>>>> To: [email protected] >>> <javascript:_e(%7B%7D,'cvml','[email protected] >>> <mailto:[email protected]>');> >>>>>>> >>>>>>> >>>>>>> See https://builds.apache.org/job/ZooKeeper_branch33_solaris/867/ >>>>>>> >>>>>>> >>>>>> >>> ################################################################################### >>>>>>> ########################## LAST 60 LINES OF THE CONSOLE >>>>>>> ########################### >>>>>>> [...truncated 301 lines...] >>>>>>> [junit] java.io.IOException: CRC check failed >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:611) >>>>>>> [junit] at >>>>>>> org.apache.zookeeper.server.CRCTest.testChecksums(CRCTest.java:165) >>>>>>> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >>>>>>> Method) >>>>>>> [junit] at >>>>>>> >>>>>> >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>>>>>> [junit] at >>>>>>> >>>>>> >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>>> [junit] at java.lang.reflect.Method.invoke(Method.java:597) >>>>>>> [junit] at junit.framework.TestCase.runTest(TestCase.java:168) >>>>>>> [junit] at junit.framework.TestCase.runBare(TestCase.java:134) >>>>>>> [junit] at >>>>>> junit.framework.TestResult$1.protect(TestResult.java:110) >>>>>>> [junit] at >>>>>>> junit.framework.TestResult.runProtected(TestResult.java:128) >>>>>>> [junit] at junit.framework.TestResult.run(TestResult.java:113) >>>>>>> [junit] at junit.framework.TestCase.run(TestCase.java:124) >>>>>>> [junit] at >>> junit.framework.TestSuite.runTest(TestSuite.java:232) >>>>>>> [junit] at junit.framework.TestSuite.run(TestSuite.java:227) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) >>>>>>> [junit] at >>>>>>> junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766) >>>>>>> [junit] 2014-04-27 06:35:01,089 - INFO [main:CRCTest@68] - >>> FINISHED >>>>>>> testChecksums >>>>>>> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 54.234 >>>>>> sec >>>>>>> [junit] java.io.FileNotFoundException: >>>>>>> >>>>>> >>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/junitvmwatcher1383325470.properties >>>>>>> (No such file or directory) >>>>>>> [junit] at java.io.FileInputStream.open(Native Method) >>>>>>> [junit] at >>>>>> java.io.FileInputStream.<init>(FileInputStream.java:120) >>>>>>> [junit] at java.io.FileReader.<init>(FileReader.java:55) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1028) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:817) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeOrQueue(JUnitTask.java:1657) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:764) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) >>>>>>> [junit] at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown >>>>>>> Source) >>>>>>> [junit] at >>>>>>> >>>>>> >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>>> [junit] at java.lang.reflect.Method.invoke(Method.java:597) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105) >>>>>>> [junit] at org.apache.tools.ant.Task.perform(Task.java:348) >>>>>>> [junit] at org.apache.tools.ant.Target.execute(Target.java:357) >>>>>>> [junit] at >>>>>> org.apache.tools.ant.Target.performTasks(Target.java:385) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.Project.executeTarget(Project.java:1298) >>>>>>> [junit] at >>>>>>> >>>>>> >>> org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.Project.executeTargets(Project.java:1181) >>>>>>> [junit] at org.apache.tools.ant.Main.runBuild(Main.java:698) >>>>>>> [junit] at org.apache.tools.ant.Main.startAnt(Main.java:199) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.launch.Launcher.run(Launcher.java:257) >>>>>>> [junit] at >>>>>>> org.apache.tools.ant.launch.Launcher.main(Launcher.java:104) >>>>>>> [junit] Running org.apache.zookeeper.server.DataTreeUnitTest >>>>>>> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec >>>>>>> >>>>>>> BUILD FAILED >>>>>>> >>>>>> >>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build.xml:814: >>>>>>> Process fork failed. >>>>>>> >>>>>>> Total time: 2 minutes 54 seconds >>>>>>> Build step 'Invoke Ant' marked build as failure >>>>>>> [locks-and-latches] Releasing all the locks >>>>>>> [locks-and-latches] All the locks released >>>>>>> Recording test results >>>>>>> Email was triggered for: Failure >>>>>>> Sending email for trigger: Failure >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>> ################################################################################### >>>>>>> ############################## FAILED TESTS (if any) >>>>>>> ############################## >>>>>>> 1 tests failed. >>>>>>> FAILED: org.apache.zookeeper.server.DataTreeUnitTest.unknown >>>>>>> >>>>>>> Error Message: >>>>>>> Forked Java VM exited abnormally. Please note the time in the report >>>>>> does >>>>>>> not reflect the time until the VM exit. >>>>>>> >>>>>>> Stack Trace: >>>>>>> junit.framework.AssertionFailedError: Forked Java VM exited >>> abnormally. >>>>>>> Please note the time in the report does not reflect the time until >>> the >>>>>> VM >>>>>>> exit.
