[ 
https://issues.apache.org/jira/browse/OAK-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain updated OAK-3099:
---------------------------
    Fix Version/s: 1.0.18
                   1.3.3
                   1.2.3

> Revision GC fails when split documents with very long paths are present
> -----------------------------------------------------------------------
>
>                 Key: OAK-3099
>                 URL: https://issues.apache.org/jira/browse/OAK-3099
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 1.0.13
>            Reporter: Csaba Varga
>            Assignee: Amit Jain
>            Priority: Minor
>             Fix For: 1.2.3, 1.3.3, 1.0.18
>
>         Attachments: OAK-3099.patch, SplitDocumentGenerator.java
>
>
> My company is using the MongoDB microkernel with Oak, and we've noticed that 
> the daily revision GC is failing with errors like this:
> {code}
> 13.07.2015 13:06:16.261 *ERROR* [pool-7-thread-1-Maintenance 
> Queue(com/adobe/granite/maintenance/job/RevisionCleanupTask)] 
> org.apache.jackrabbit.oak.management.ManagementOperation Revision garbage 
> collection failed
> java.lang.IllegalArgumentException: 
> 13:h113f9d0fe7ac0f87fa06397c37b9ffd4b372eeb1ec93e0818bb4024a32587820
> at 
> org.apache.jackrabbit.oak.plugins.document.Revision.fromString(Revision.java:236)
> at 
> org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:84)
> at 
> org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:56)
> at 
> org.apache.jackrabbit.oak.plugins.document.VersionGCSupport.deleteSplitDocuments(VersionGCSupport.java:53)
> at 
> org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectSplitDocuments(VersionGarbageCollector.java:117)
> at 
> org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
> at 
> org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$2.run(DocumentNodeStoreService.java:511)
> at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:68)
> at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> I've narrowed the issue down to the disconnect(NodeDocument) method of the 
> [SplitDocumentCleanUp 
> class|https://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/SplitDocumentCleanUp.java].
>  The method always tries to extract the path of the node from its ID, but 
> this won't work for documents whose path is very long because those documents 
> will have the hash of their path in the ID.
> I believe this code should fix the issue, but I haven't had a chance to 
> actually try it:
> {code}
>     private void disconnect(NodeDocument splitDoc) {
>         String mainId = Utils.getIdFromPath(splitDoc.getMainPath());
>         NodeDocument doc = store.find(NODES, mainId);
>         if (doc == null) {
>             LOG.warn("Main document {} already removed. Split document is {}",
>                     mainId, splitId);
>             return;
>         }
>         String path = splitDoc.getPath();
>         int slashIdx = path.lastIndexOf('/');
>         int height = Integer.parseInt(path.substring(slashIdx + 1));
>         Revision rev = Revision.fromString(
>                 path.substring(path.lastIndexOf('/', slashIdx - 1) + 1, 
> slashIdx));
>         doc = doc.findPrevReferencingDoc(rev, height);
>         if (doc == null) {
>             LOG.warn("Split document {} not referenced anymore. Main document 
> is {}",
>                     splitId, mainId);
>             return;
>         }
>         // remove reference
>         if (doc.getSplitDocType() == INTERMEDIATE) {
>             disconnectFromIntermediate(doc, rev);
>         } else {
>             markStaleOnMain(doc, rev, height);
>         }
>     }
> {code}
> By using getPath(), the code should automatically use either the ID or the 
> _path property, whichever is right for the document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to