Hi all,

After much investigation we have a simple, reproducible test case that illustrates the corruption using only the JackRabbit command line and JackRabbit types. The full test case is given below.
Before getting into the detail: is there anything platform-specific about UUID values? Our Redhat platform has a real problem with the UUID "a55f3f6b-a909-4e8d-b65a-93002ced0920".

We know:
* It appears to be related to the "importxml" process and certain UUID values.
* The problem only occurs on our Redhat Enterprise Linux 4 platform, not on Windows.
* We're using Java 1.5.0_12 and MySQL 5.0.45.
* It's not related to the command line API, as the same occurs when run programmatically from our application.
* The problem originally occurred with one of our bespoke types, but changing them to nt:folder still causes the problem. See attached.

### We create an empty test workspace
[not connected] > startjackrabbit /var/acme/repository.xml /var/acme/repository
[not logged in] > login
[/] > createworkspace mmp659_test
[/] > logout

### We log into the workspace and import the attached file
[not logged in] > login admin admin -workspace mmp659_test
[/] > importxml /tmp/Security_nt.xml
[/] > save

### We prove that folder "1" has been created
[/] > cd Security
[/Security] > cd AclObjectIdentities
[/Security/AclObjectIdentities] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
1                              nt:folder       true      false     false
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false
total elapsed time: 2 ms.
[/Security/AclObjectIdentities] >
[/Security/AclObjectIdentities] > exit

### We restart the repository and show that folder "1" was not successfully created and AclObjectIdentities is now corrupt
[not connected] > startjackrabbit /var/acme/repository.xml /var/acme/repository
[not logged in] > login admin admin -workspace mmp659_test
[/] > cd Security
[/Security] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
AclObjectIdentities            nt:folder       true      false     false
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false
total
[/Security] > cd AclObjectIdentities
[/Security/AclObjectIdentities] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false
total
[/Security/AclObjectIdentities] > cd ..
display stack trace? [y/n]n
[/Security] > removeitem AclObjectIdentities
an exception occurred
exception: javax.jcr.ItemNotFoundException
message: a55f3f6b-a909-4e8d-b65a-93002ced0920

Regards,
Shaun

-----Original Message-----
From: sbarriba [mailto:[EMAIL PROTECTED]
Sent: 17 August 2007 15:15
To: [email protected]
Subject: RE: Node corruption using Jackrabbit 1.3.1?

Hi Stefan et al,

We no longer think it's a 1.3.1 issue, as we've run the consistency checker on other environments which were managed using 1.3 and found a couple of consistency problems reported.

We've got a 'semi-reproducible' test case which causes corruption.

Test case:
- clustering disabled
- a single model 2 JackRabbit resource
- one node running only

On start-up the application automatically seeds a particular workspace by importing an existing sysview XML file. The import works up until the point we get an "invalidated item" message (shown below).

......
17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager - created item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager - caching item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager - invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager - removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl - /home/miqsys/Security/AclObjectIdentities/1: unable to update item.
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager - invalidated item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
.....

Any node attempting to read the offending node gets:

17 Aug 2007 14:41:43,559 DEBUG org.apache.jackrabbit.core.HierarchyManagerImpl - failed to resolve name of a55f3f6b-a909-4e8d-b65a-93002ced0920

Any node attempting to update the offending node gets:

unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920: a55f3f6b-a909-4e8d-b65a-93002ced0920
    at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)

Any clues on what causes the "invalidated item" message?

Regards,
Shaun.

-----Original Message-----
From: Stefan Guggisberg [mailto:[EMAIL PROTECTED]
Sent: 17 August 2007 14:04
To: [email protected]
Subject: Re: Node corruption using Jackrabbit 1.3.1?

hi shaun,

are you sure that this is a 1.3.1 specific issue?
i remember an earlier post where you described the same problem,
but apparently you weren't using 1.3.1:
http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem-fails-tf4169086.html

On 8/17/07, sbarriba <[EMAIL PROTECTED]> wrote:
> Hi Stefan et al,
> Further update on this, plus some answers to your questions.
>
> The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1 of
> the issues. However, although the log reports the remaining issue has been
> fixed each time, this message appears after repeated restarts :(
>
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 1000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child {http://www.acme.co.uk/xmlns/contentmodel}components with id d3c09b52-d3be-4d3c-8807-b7827d337973
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing 1 inconsistent bundle(s)...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2505/0 bundles.
>
> Is the consistency checker the only way to fix up these problems, or is
> there any way we can 'open the hood' to investigate further?

only by getting your hands really dirty and by delving deep into the code...

>
> Stefan wrote:
> "did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?"
>
> What tools would you recommend using to review the corrupt nodes? We
> currently only use the command contrib project.
>
> Reproducing this scenario is proving really difficult.
> The original corruption occurred when a user was creating a particularly
> complex node object which included the creation, deletion and re-ordering
> of various same-name-siblings. After multiple hours of attempts we are yet
> to reproduce the event. Frustrating, but we know it's occurred at least twice.
>
> "furthermore, could you please share some details about your
> config/deployment?"
>
> Sure.
> - JackRabbit 1.3.1
> - MySQL Bundle Persistence Manager
> - Clustered across 2 nodes - only 1 node is read-write, the other is
>   read-only to the repos
> - Spring used to provide a JackRabbit JCRSessionInHttpSession pattern
>   for the editors who are using a web-based UI.

i am not familiar with this. how is the repository instance accessed/created?
can you rule out the possibility that a 3rd r/w non-cluster aware instance
is created?

> - MySQL 5.0.45
> - Tomcat 5.0.30
> - Sun JDK 1.5
> - Redhat Enterprise Linux
>
> All suggestions welcome.

hmm, just a few random guesses.... could be
- a bundle db pm-related issue
- a clustering- or clustering-config related issue
- an issue caused by multiple r/w jackrabbit instances accessing the same db
- a jr core issue

since this is a rather sophisticated setup it's not gonna be easy to
investigate. however, we'd definitely need more information
about the operations that lead to the corrupt state.

btw: please feel free to create a jira issue.

cheers
stefan

> Regards,
> Shaun
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:[EMAIL PROTECTED]
> Sent: 17 August 2007 11:26
> To: [email protected]
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> On 8/16/07, sbarriba <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > We upgraded to JackRabbit 1.3.1 a few days ago.
> >
> > We have since seen a couple of occasions where we've been able to get the
> > repository in an indeterminate state.
> > The following output shows the state of a node which has an ordered
> > child node property called acme:components, e.g.
> >
> > [miq:FooBar] > nt:base
> >   orderable
> >   + acme:components (acme:Component) multiple COPY
> >
> > We have an instance of FooBar where acme:components[5] has disappeared??
> >
> > e.g.
> >
> > name                           type            node      new       modified
> > ------------------------------ --------------- --------- --------- ---------
> > acme:components                acme:Section    true      false     false
> > acme:components[2]             acme:Text       true      false     false
> > acme:components[3]             acme:Text       true      false     false
> > acme:components[4]             acme:Text       true      false     false
> > acme:components[6]             acme:Section    true      false     false
> > acme:components[7]             acme:Section    true      false     false
> > jcr:created                    Date            false     false     false
> > jcr:primaryType                Name            false     false     false
> > jcr:uuid                       String          false     false     false
> >
> > I presume this could happen if the deletion of the child node succeeded but
> > the saving of the parent FooBar node failed for some reason?
>
> that shouldn't be possible since the changelog of a save operation is stored
> atomically. if an error occurs during processing of the change log all
> previous changes are rolled back.
>
> > Surely this is a state that should never happen?
>
> absolutely, and the problem you're describing is very alarming indeed!
>
> did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?
>
> furthermore, could you please share some details about your
> config/deployment?
>
> the only possible explanation i can currently come up with is
> that there are multiple jackrabbit instances accessing the same
> database...
>
> cheers
> stefan
>
> > Regards,
> >
> > Shaun
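A gap like the missing acme:components[5] in the listing quoted above can be spotted mechanically once the `ls` output has been captured as text. The following is a minimal sketch in Python, not part of Jackrabbit or its command line client; the function name `find_sns_gaps` is hypothetical, and it assumes the five-column `ls` layout shown in this thread, with an unindexed name counting as index 1.

```python
import re
from collections import defaultdict

# First column of an `ls` row: an item name with an optional [index] suffix.
ITEM_RE = re.compile(r"(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\])?")


def find_sns_gaps(listing):
    """Return {name: [missing indices]} for same-name siblings in an ls listing."""
    seen = defaultdict(set)
    for line in listing.splitlines():
        columns = line.split()
        # Keep only rows whose third column ("node") is "true"; this also
        # skips the header row, the dashed separator and property rows.
        if len(columns) != 5 or columns[2] != "true":
            continue
        match = ITEM_RE.fullmatch(columns[0])
        if match is None:
            continue
        # A name without an explicit index is treated as index 1.
        seen[match.group("name")].add(int(match.group("index") or 1))

    gaps = {}
    for name, indices in seen.items():
        missing = sorted(set(range(1, max(indices) + 1)) - indices)
        if missing:
            gaps[name] = missing
    return gaps


# Rows copied from the listing in this thread.
LISTING = """\
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
acme:components                acme:Section    true      false     false
acme:components[2]             acme:Text       true      false     false
acme:components[3]             acme:Text       true      false     false
acme:components[4]             acme:Text       true      false     false
acme:components[6]             acme:Section    true      false     false
acme:components[7]             acme:Section    true      false     false
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false
jcr:uuid                       String          false     false     false
"""

print(find_sns_gaps(LISTING))
```

This only reports gaps visible in a listing; it cannot say how the gap arose, and it cannot detect a missing sibling at the end of the sequence.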
<?xml version="1.0" encoding="UTF-8"?>
<sv:node xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
         xmlns:jcr="http://www.jcp.org/jcr/1.0"
         xmlns:rep="internal"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:sv="http://www.jcp.org/jcr/sv/1.0"
         xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
         xmlns:fn="http://www.w3.org/2004/10/xpath-functions"
         sv:name="Security">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>nt:folder</sv:value>
  </sv:property>
  <sv:property sv:name="jcr:uuid" sv:type="String">
    <sv:value>d638eacc-deb6-41fd-8868-804c8ecefffd</sv:value>
  </sv:property>
  <sv:property sv:name="jcr:created" sv:type="Date">
    <sv:value>2007-04-24T07:03:41.678+01:00</sv:value>
  </sv:property>
  <sv:node sv:name="AclObjectIdentities">
    <sv:property sv:name="jcr:primaryType" sv:type="Name">
      <sv:value>nt:folder</sv:value>
    </sv:property>
    <sv:property sv:name="jcr:uuid" sv:type="String">
      <sv:value>28126c3e-36a0-471d-9cdc-5ac423bac9c5</sv:value>
    </sv:property>
    <sv:property sv:name="jcr:created" sv:type="Date">
      <sv:value>2007-04-24T07:03:41.693+01:00</sv:value>
    </sv:property>
    <sv:node sv:name="1">
      <sv:property sv:name="jcr:primaryType" sv:type="Name">
        <sv:value>nt:folder</sv:value>
      </sv:property>
      <sv:property sv:name="jcr:uuid" sv:type="String">
        <sv:value>a55f3f6b-a909-4e8d-b65a-93002ced0920</sv:value>
      </sv:property>
      <sv:property sv:name="jcr:created" sv:type="Date">
        <sv:value>2007-04-24T07:03:41.693+01:00</sv:value>
      </sv:property>
    </sv:node>
  </sv:node>
</sv:node>
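On the question of whether the UUID values themselves are unusual, one quick offline check is to walk the system-view file and parse every jcr:uuid with a standard UUID parser. The sketch below uses only the Python standard library and never touches Jackrabbit or MySQL; `list_uuids` is a hypothetical helper name, and `SAMPLE` is an abridged copy of the attachment above (jcr:uuid properties only, XML declaration omitted).

```python
import uuid
import xml.etree.ElementTree as ET

# Namespace of the JCR system view, as declared in the attachment.
SV = "{http://www.jcp.org/jcr/sv/1.0}"


def list_uuids(sysview_xml):
    """Return [(path, uuid string)] for every sv:node carrying a jcr:uuid property."""
    root = ET.fromstring(sysview_xml)

    def walk(node, parent_path):
        path = parent_path + "/" + node.get(SV + "name")
        for prop in node.findall(SV + "property"):
            if prop.get(SV + "name") == "jcr:uuid":
                yield path, prop.findtext(SV + "value")
        for child in node.findall(SV + "node"):
            yield from walk(child, path)

    return list(walk(root, ""))


# Abridged copy of the attachment: only the jcr:uuid properties are kept.
SAMPLE = """\
<sv:node xmlns:sv="http://www.jcp.org/jcr/sv/1.0" sv:name="Security">
  <sv:property sv:name="jcr:uuid" sv:type="String">
    <sv:value>d638eacc-deb6-41fd-8868-804c8ecefffd</sv:value>
  </sv:property>
  <sv:node sv:name="AclObjectIdentities">
    <sv:property sv:name="jcr:uuid" sv:type="String">
      <sv:value>28126c3e-36a0-471d-9cdc-5ac423bac9c5</sv:value>
    </sv:property>
    <sv:node sv:name="1">
      <sv:property sv:name="jcr:uuid" sv:type="String">
        <sv:value>a55f3f6b-a909-4e8d-b65a-93002ced0920</sv:value>
      </sv:property>
    </sv:node>
  </sv:node>
</sv:node>
"""

entries = list_uuids(SAMPLE)
for path, value in entries:
    parsed = uuid.UUID(value)  # raises ValueError if the value is malformed
    print(path, parsed, "version", parsed.version)

# Duplicate UUIDs within one import file would be worth knowing about too.
values = [value for _, value in entries]
print("all unique:", len(values) == len(set(values)))
```

If every value parses and no duplicates are reported, the UUIDs are at least syntactically ordinary. That says nothing about how the repository or the database handles them on a particular platform, so it narrows the search rather than explaining the corruption.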
