Hi Stefan et al,
We no longer think it's a 1.3.1 issue as we've run the consistency checker
on other environments which were managed using 1.3 and found a couple of
consistency problems reported.
We've got a 'semi-reproducible' test case which causes corruption.
Test case:
- clustering disabled
- a single model 2 JackRabbit resource
- only one node running
On start-up the application is automatically seeding a particular workspace
by importing an existing sysview XML file.
The import works up until the point we get an "invalidated item" message
(shown below).
......
17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager - created item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager - caching item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager - invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager - removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl - /home/miqsys/Security/AclObjectIdentities/1: unable to update item.
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager - invalidated item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
.....
Any attempt to read the offending node gets:
17 Aug 2007 14:41:43,559 DEBUG org.apache.jackrabbit.core.HierarchyManagerImpl - failed to resolve name of a55f3f6b-a909-4e8d-b65a-93002ced0920
Any attempt to update the offending node gets:
unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920: a55f3f6b-a909-4e8d-b65a-93002ced0920
at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)
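In case it helps anyone digging through similar DEBUG output: log excerpts that pass through mail are often hard-wrapped, so a plain text search can miss parts of an entry. Below is a minimal sketch of a filter that first re-joins continuation lines and then keeps only the entries mentioning a given item id. The class and method names are hypothetical, and it assumes every entry starts with a timestamp in the format shown in this mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for reading wrapped log excerpts; not part of Jackrabbit.
class LogEntryFilter {

    // Assumption: an entry starts with a timestamp such as "17 Aug 2007 14:41:43,195".
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{1,2} \\w{3} \\d{4} \\d{2}:\\d{2}:\\d{2},\\d{3} .*");

    /** Re-joins wrapped lines into entries and returns those containing itemId. */
    static List<String> entriesMentioning(List<String> rawLines, String itemId) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (ENTRY_START.matcher(line).matches()) {
                // A new timestamp closes the previous entry.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line.trim());
            } else if (current != null) {
                // Continuation line: joined with a single space, so a token
                // that was split mid-word by the wrap stays split.
                current.append(' ').append(line.trim());
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }

        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            if (entry.contains(itemId)) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

Calling `LogEntryFilter.entriesMentioning(lines, "a55f3f6b-a909-4e8d-b65a-93002ced0920")` on the excerpt above would return only the entries that mention that id, each on a single line.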
Any clues on what causes the "invalidated item" message?
Regards,
Shaun.
-----Original Message-----
From: Stefan Guggisberg [mailto:[EMAIL PROTECTED]
Sent: 17 August 2007 14:04
To: [email protected]
Subject: Re: Node corruption using Jackrabbit 1.3.1?
hi shaun,
are you sure that this is a 1.3.1 specific issue?
i remember an earlier post where you described the same problem,
but apparently you weren't using 1.3.1:
http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem-fails-tf4169086.html
On 8/17/07, sbarriba <[EMAIL PROTECTED]> wrote:
> Hi Stefan et al,
> Further update on this, plus some answers to your questions.
>
> The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1 of
> the issues. However, although the log reports that the remaining issue has
> been fixed each time, the same message appears again after every restart :(
>
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 1000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child {http://www.acme.co.uk/xmlns/contentmodel}components with id d3c09b52-d3be-4d3c-8807-b7827d337973
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing 1 inconsistent bundle(s)...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2505/0 bundles.
>
> Is the consistency checker the only way to fix up these problems, or is
> there any way we can 'open the hood' to investigate further?
only by getting your hands really dirty and by delving deep into the code...
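for orientation: conceptually, the "references unexistent child" line in the log above reports a parent that lists a child id for which no node exists. a rough java sketch of that kind of check follows. all names are hypothetical, this is not the actual BundleDbPersistenceManager code, and it assumes every existing node (leaves included) has its own entry in the map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the BundleDbPersistenceManager code.
class ChildReferenceCheck {

    /**
     * Returns one message per child id that a parent lists but that has
     * no entry of its own in the store.
     *
     * @param childrenByParent node id -> ids of the children it references;
     *                         assumed to contain a key for every existing node
     */
    static List<String> findDanglingChildren(Map<String, List<String>> childrenByParent) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : childrenByParent.entrySet()) {
            for (String childId : entry.getValue()) {
                // A child id without its own entry is a dangling reference.
                if (!childrenByParent.containsKey(childId)) {
                    problems.add(entry.getKey() + " references missing child " + childId);
                }
            }
        }
        return problems;
    }
}
```

a check like this only detects the dangling reference. it says nothing about how the parent and child got out of sync in the first place, which is the part that needs investigating.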
>
> Stefan wrote:
> "did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?"
>
> What tools would you recommend using to review the corrupt nodes? We
> currently only use the command contrib project.
>
> Reproducing this scenario is proving really difficult. The original
> corruption occurred when a user was creating a particularly complex node
> object which included the creation, deletion and re-ordering of various
> same-name-siblings. After many hours of attempts we have yet to reproduce
> the event. Frustrating, but we know it's occurred at least twice.
>
> "furthermore, could you please share some details about your
> config/deployment?"
>
> Sure.
> - JackRabbit 1.3.1
> - MySql Bundle Persistence Manager
> - Clustered across 2 nodes - only 1 node is read-write, the other is
> read-only to the repos
> - Spring used to provide a JackRabbit JCRSessionInHttpSession pattern
> for the editors who are using a web-based UI.
i am not familiar with this. how is the repository instance
accessed/created?
can you rule out the possibility that a 3rd r/w non-cluster aware instance
is created?
> - MySql 5.0.45
> - Tomcat 5.0.30
> - Sun JDK 1.5
> - Redhat Enterprise Linux
>
> All suggestions welcome.
hmm, just a few random guesses....
could be
- a bundle db pm-related issue
- a clustering- or clustering-config related issue
- an issue caused by multiple r/w jackrabbit instances
accessing the same db
- a jr core issue
since this is a rather sophisticated setup it's not gonna be easy to
investigate.
however, we'd definitely need more information about the operations that
lead to the corrupt state.
btw: please feel free to create a jira issue.
cheers
stefan
> Regards,
> Shaun
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:[EMAIL PROTECTED]
> Sent: 17 August 2007 11:26
> To: [email protected]
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> On 8/16/07, sbarriba <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > We upgraded to JackRabbit 1.3.1 a few days ago.
> >
> > We have since seen a couple of occasions where we've been able to get the
> > repository in an indeterminate state. The following output shows the state
> > of a node which has an ordered child node property called acme:components
> > e.g.
> >
> >
> >
> > [miq:FooBar] > nt:base
> >
> > orderable
> >
> > + acme:components (acme:Component) multiple COPY
> >
> >
> >
> > We have an instance of FooBar where acme:components[5] has disappeared??
> >
> > e.g.
> >
> >
> >
> > name                 type          node   new    modified
> > -------------------  ------------  -----  -----  --------
> > acme:components      acme:Section  true   false  false
> > acme:components[2]   acme:Text     true   false  false
> > acme:components[3]   acme:Text     true   false  false
> > acme:components[4]   acme:Text     true   false  false
> > acme:components[6]   acme:Section  true   false  false
> > acme:components[7]   acme:Section  true   false  false
> > jcr:created          Date          false  false  false
> > jcr:primaryType      Name          false  false  false
> > jcr:uuid             String        false  false  false
> >
> >
> >
> > I presume this could happen if the deletion of the child node succeeded
> > but the saving of the parent FooBar node failed for some reason?
>
> that shouldn't be possible since the changelog of a save operation is stored
> atomically. if an error occurs during processing of the change log all
> previous changes are rolled back.
>
> >
> >
> >
> > Surely this is a state that should never happen?
>
> absolutely, and the problem you're describing is very alarming indeed!
>
> did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?
>
> furthermore, could you please share some details about your
> config/deployment?
>
> the only possible explanation i can currently come up with is
> that there are multiple jackrabbit instances accessing the same
> database...
>
> cheers
> stefan
>
>
> >
> >
> >
> > Regards,
> >
> > Shaun
> >
> >
> >
> >
>
>