RE: Node corruption using Jackrabbit 1.3.1?

sbarriba Fri, 17 Aug 2007 07:15:16 -0700

Hi Stefan et al,
We no longer think it's a 1.3.1 issue as we've run the consistency checker
on other environments which were managed using 1.3 and found a couple of
consistency problems reported.


We've got a 'semi-reproducible' test case that causes corruption.
Test case:
 - clustering disabled
 - a single model 2 JackRabbit resource
 - one node running only

On start-up the application is automatically seeding a particular workspace
by importing an existing sysview XML file.
The import works up until the point we get an "invalidated item" message
(shown below).

......
17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager  - created item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager  - caching item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  - invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  - removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl  - /home/miqsys/Security/AclObjectIdentities/1: unable to update item.
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager  - invalidated item f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/contentmodel}objectIdentity
.....

Any node attempting to read the offending node gets:
17 Aug 2007 14:41:43,559 DEBUG org.apache.jackrabbit.core.HierarchyManagerImpl  - failed to resolve name of a55f3f6b-a909-4e8d-b65a-93002ced0920

Any node attempting to update the offending node gets:

unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920:
a55f3f6b-a909-4e8d-b65a-93002ced0920
        at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)

Any clues on what causes the "invalidated item" message?
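
For anyone triaging a similar log, here is a minimal sketch (not Jackrabbit code) that lists node ids which are first reported as "invalidated item" and later as "failed to resolve name of". The class and method names are hypothetical, the message wording is taken only from the DEBUG excerpt above, and the input is assumed to be log lines that have already been unwrapped to one entry per line.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackrabbit: finds node ids that the log
// reports as "invalidated item" and later as "failed to resolve name of".
final class InvalidatedItemScan {
    // Plain UUID layout (8-4-4-4-12 hex digits), as seen in the excerpt.
    private static final String UUID =
        "([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})";
    // Node ids only: the negative lookahead skips property ids of the
    // form uuid/{namespace}name.
    private static final Pattern INVALIDATED =
        Pattern.compile("invalidated item " + UUID + "(?!/)");
    private static final Pattern UNRESOLVED =
        Pattern.compile("failed to resolve name of " + UUID);

    // Assumes each list element is one complete (unwrapped) log entry.
    static List<String> suspectNodeIds(List<String> unwrappedLogLines) {
        Set<String> invalidated = new LinkedHashSet<>();
        Set<String> suspects = new LinkedHashSet<>();
        for (String line : unwrappedLogLines) {
            Matcher m = INVALIDATED.matcher(line);
            if (m.find()) {
                invalidated.add(m.group(1));
                continue;
            }
            m = UNRESOLVED.matcher(line);
            if (m.find() && invalidated.contains(m.group(1))) {
                suspects.add(m.group(1));
            }
        }
        return new ArrayList<>(suspects);
    }
}
```

Feeding it the lines above should single out a55f3f6b-a909-4e8d-b65a-93002ced0920, which gives a concrete id to look up in the persistence store.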

Regards,
Shaun.





-----Original Message-----
From: Stefan Guggisberg [mailto:[EMAIL PROTECTED] 
Sent: 17 August 2007 14:04
To: [email protected]
Subject: Re: Node corruption using Jackrabbit 1.3.1?

hi shaun,

are you sure that this is a 1.3.1 specific issue?

i remember an earlier post where you described the same problem,
but apparently you weren't using 1.3.1:
http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem-fails-tf4169086.html

On 8/17/07, sbarriba <[EMAIL PROTECTED]> wrote:
> Hi Stefan et al,
> Further update on this, plus some answers to your questions.
>
> The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1 of
> the issues. However although the log reports the remaining issue has been
> fixed each time, this message appears after repeated restarts :(
>
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 1000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child {http://www.acme.co.uk/xmlns/contentmodel}components with id d3c09b52-d3be-4d3c-8807-b7827d337973
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing 1 inconsistent bundle(s)...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager - acme: checked 2505/0 bundles.
>
> Is the consistency checker the only way to fix up these problems, or is
> there any way we can 'open the hood' to investigate further?

only by getting your hands really dirty and by delving deep into the code...
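
short of that, one cheap way to confirm that the same dangling reference really does come back after every restart is to count the checker's reports across the collected logs. a minimal sketch, assuming the message layout of the excerpt quoted above and log lines already unwrapped to one entry per line; the class and method names are hypothetical and not part of jackrabbit:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackrabbit: counts how often the
// consistency checker reports the same dangling child reference.
final class DanglingChildReport {
    // Message layout copied from the log excerpt quoted in this thread.
    private static final Pattern DANGLING = Pattern.compile(
        "NodeState (\\S+) references unexistent child (\\{[^}]*\\}\\S+) with id (\\S+)");

    // Key: "parentId -> childId (childName)"; value: number of reports seen.
    static Map<String, Integer> countReports(List<String> unwrappedLogLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : unwrappedLogLines) {
            Matcher m = DANGLING.matcher(line);
            if (m.find()) {
                String key = m.group(1) + " -> " + m.group(3)
                    + " (" + m.group(2) + ")";
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

a count that keeps growing for one parent/child pair would suggest the fix is not being persisted, rather than the corruption being re-created by new writes.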

>
> Stefan wrote:
> "did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?"
>
> What tools would you recommend using to review the corrupt nodes? We
> currently use only the command contrib. project.
>
> Reproducing this scenario is proving really difficult. The original
> corruption occurred when a user was creating a particularly complex node
> object which included the creation, deletion and re-ordering of various
> same-name-siblings. After many hours of attempts we have yet to reproduce
> the event. Frustrating, but we know it's occurred at least twice.
>
> "furthermore, could you please share some details about your
> config/deployment?"
>
> Sure.
>  - JackRabbit 1.3.1
>         - MySql Bundle Persistence Manager
>         - Clustered across 2 nodes - only 1 node is read-write, the other is
>           read-only to the repos
>         - Spring used to provide a JackRabbit JCRSessionInHttpSession pattern
>           for the editors who are using a web-based UI.

i am not familiar with this. how is the repository instance accessed/created?
can you rule out the possibility that a 3rd r/w non-cluster aware instance is
created?

>  - MySql 5.0.45
>  - Tomcat 5.0.30
>  - Sun JDK 1.5
>  - Redhat Enterprise Linux
>
> All suggestions welcome.

hmm, just a few random guesses....

could be
- a bundle db pm-related issue
- a clustering- or clustering-config related issue
- an issue caused by multiple r/w jackrabbit instances
  accessing the same db
- a jr core issue

since this is a rather sophisticated setup it's not gonna be easy to investigate.
however, we'd definitely need more information about the operations that lead
to the corrupt state.

btw: please feel free to create a jira issue.

cheers
stefan

> Regards,
> Shaun
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:[EMAIL PROTECTED]
> Sent: 17 August 2007 11:26
> To: [email protected]
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> On 8/16/07, sbarriba <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > We upgraded to JackRabbit 1.3.1 a few days ago.
> >
> > We have since seen a couple of occasions where we've been able to get the
> > repository in an indeterminate state. The following output shows the state
> > of a node which has an ordered child node property called acme:components
> > e.g.
> >
> >
> >
> > [miq:FooBar] > nt:base
> >                orderable
> >                + acme:components (acme:Component) multiple COPY
> >
> >
> >
> > We have an instance of FooBar where acme:components[5] has disappeared??
> >
> > e.g.
> >
> >
> >
> > name                           type            node      new       modified
> > ------------------------------ --------------- --------- --------- ---------
> > acme:components                acme:Section    true      false     false
> > acme:components[2]             acme:Text       true      false     false
> > acme:components[3]             acme:Text       true      false     false
> > acme:components[4]             acme:Text       true      false     false
> > acme:components[6]             acme:Section    true      false     false
> > acme:components[7]             acme:Section    true      false     false
> > jcr:created                    Date            false     false     false
> > jcr:primaryType                Name            false     false     false
> > jcr:uuid                       String          false     false     false
> >
> >
> >
> > I presume this could happen if the deletion of the child node succeeded but
> > the saving of the parent FooBar node failed for some reason?
>
> that should not be possible since the changelog of a save operation is stored
> atomically. if an error occurs during processing of the change log all
> previous changes are rolled back.
>
> >
> >
> >
> > Surely this is a state that should never happen?
>
> absolutely, and the problem you're describing is very alarming indeed!
>
> did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?
>
> furthermore, could you please share some details about your
> config/deployment?
>
> the only possible explanation i can currently come up with is
> that there are multiple jackrabbit instances accessing the same
> database...
>
> cheers
> stefan
>
>
> >
> >
> >
> > Regards,
> >
> > Shaun
> >
> >
> >
> >
>
>
