Stefan Egli created OAK-2359:
--------------------------------

             Summary: diffImpl is inefficient when there are many split documents
                 Key: OAK-2359
                 URL: https://issues.apache.org/jira/browse/OAK-2359
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: core
    Affects Versions: 1.0.8
         Environment: 1.0.8.r1644758
            Reporter: Stefan Egli
            Priority: Critical


As reported in OAK-2358, revisionGC may not be cleaning up split documents 
properly (at least in 1.0.8.r1644758).

As a side effect, having many garbage revisions makes the diffImpl algorithm 
very slow. Normally it takes only a few millis, but for nodes that have many 
split documents I can see diffImpl take hundreds of millis, sometimes up to a 
few seconds. As a result, observation events are dequeued more slowly than 
they are enqueued, so the observation queue never drains and event handling 
is delayed more and more.
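
To illustrate why the queue never drains, here is a rough sketch of the 
arithmetic. It assumes a constant enqueue rate and a single consumer that 
spends one diffImpl call per event. The class name, method name and all 
numbers are made up for illustration; they are not Oak APIs or measurements 
from the affected system.

```java
public class ObservationBacklog {

    /**
     * Events still queued after the given number of seconds, assuming
     * (hypothetically) a constant enqueue rate and one consumer that needs
     * diffMillis per event.
     */
    static long backlogAfter(long seconds, long enqueuePerSecond, long diffMillis) {
        long enqueued = seconds * enqueuePerSecond;
        // The consumer can finish at most this many events in the interval.
        long dequeued = (seconds * 1000) / diffMillis;
        return Math.max(0, enqueued - dequeued);
    }

    public static void main(String[] args) {
        // A diff of a few millis keeps up with 20 events/s.
        System.out.println(backlogAfter(60, 20, 5));   // prints 0
        // A diff of hundreds of millis falls behind and the backlog grows.
        System.out.println(backlogAfter(60, 20, 500)); // prints 1080
    }
}
```

With these assumed numbers the consumer handles only 2 events per second at 
500 millis per diff, so the backlog grows by 18 events every second.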

Adding some logging showed that diffImpl often reads many split documents, 
which supports the assumption that the diffImpl slowness is a side effect of 
revisionGC not cleaning up revisions. That said, diffImpl should probably 
still be able to run fast, since all the revisions it needs to look at 
should be in the main document, not in split documents.
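
For anyone who wants to take similar measurements, a generic timing wrapper 
along the following lines could be used to flag slow calls. This is only a 
sketch: SlowCallTimer and timed are hypothetical names, the threshold is 
arbitrary, and it is not the logging that was actually added to Oak.

```java
import java.util.function.Supplier;

public class SlowCallTimer {

    /**
     * Runs the given call and reports to stderr when it takes at least
     * thresholdMillis. The label is just free text for the log line.
     */
    static <T> T timed(String label, long thresholdMillis, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis >= thresholdMillis) {
                System.err.println(label + " took " + elapsedMillis + " ms");
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow operation; a real caller would wrap the diff here.
        String result = timed("slow call", 50, () -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        System.out.println(result);
    }
}
```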

I don't have a test case handy for this at the moment, unfortunately. If more 
comes up, I'll add more details here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
