[ 
https://issues.apache.org/jira/browse/OAK-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Varga updated OAK-3099:
-----------------------------
    Attachment: SplitDocumentGenerator.java

Attaching the Java code I've used to generate the problematic split documents. 
It's a simple OSGi component since that was the easiest way to introduce code 
to a "clean" AEM environment, but it may be a good starting point for a unit 
test.

To reproduce the issue, run the generator code and then set your system clock 
ahead by at least 24 hours. By default, Revision GC doesn't try to clean up 
documents younger than 24 hours, so you need to manipulate the clock to 
trigger the issue quickly.

> Revision GC fails when split documents with very long paths are present
> -----------------------------------------------------------------------
>
>                 Key: OAK-3099
>                 URL: https://issues.apache.org/jira/browse/OAK-3099
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 1.0.13
>            Reporter: Csaba Varga
>            Priority: Minor
>         Attachments: SplitDocumentGenerator.java
>
>
> My company is using the MongoDB microkernel with Oak, and we've noticed that 
> the daily revision GC is failing with errors like this:
> {code}
> 13.07.2015 13:06:16.261 *ERROR* [pool-7-thread-1-Maintenance Queue(com/adobe/granite/maintenance/job/RevisionCleanupTask)] org.apache.jackrabbit.oak.management.ManagementOperation Revision garbage collection failed
> java.lang.IllegalArgumentException: 13:h113f9d0fe7ac0f87fa06397c37b9ffd4b372eeb1ec93e0818bb4024a32587820
> at org.apache.jackrabbit.oak.plugins.document.Revision.fromString(Revision.java:236)
> at org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:84)
> at org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:56)
> at org.apache.jackrabbit.oak.plugins.document.VersionGCSupport.deleteSplitDocuments(VersionGCSupport.java:53)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectSplitDocuments(VersionGarbageCollector.java:117)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
> at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$2.run(DocumentNodeStoreService.java:511)
> at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:68)
> at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> I've narrowed the issue down to the disconnect(NodeDocument) method of the 
> [SplitDocumentCleanUp 
> class|https://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/SplitDocumentCleanUp.java].
> The method always tries to extract the path of the node from its ID, but 
> this won't work for documents whose path is very long, because the ID of 
> such a document contains a hash of the path instead of the path itself.
> I believe this code should fix the issue, but I haven't had a chance to 
> actually try it:
> {code}
>     private void disconnect(NodeDocument splitDoc) {
>         // splitId is only used for log messages
>         String splitId = splitDoc.getId();
>         String mainId = Utils.getIdFromPath(splitDoc.getMainPath());
>         NodeDocument doc = store.find(NODES, mainId);
>         if (doc == null) {
>             LOG.warn("Main document {} already removed. Split document is {}",
>                     mainId, splitId);
>             return;
>         }
>         // use the path rather than the ID: the ID of a document with a
>         // very long path contains a hash and cannot be parsed
>         String path = splitDoc.getPath();
>         int slashIdx = path.lastIndexOf('/');
>         // last path segment is the height, the one before it the revision
>         int height = Integer.parseInt(path.substring(slashIdx + 1));
>         Revision rev = Revision.fromString(
>                 path.substring(path.lastIndexOf('/', slashIdx - 1) + 1, slashIdx));
>         doc = doc.findPrevReferencingDoc(rev, height);
>         if (doc == null) {
>             LOG.warn("Split document {} not referenced anymore. Main document is {}",
>                     splitId, mainId);
>             return;
>         }
>         // remove reference
>         if (doc.getSplitDocType() == INTERMEDIATE) {
>             disconnectFromIntermediate(doc, rev);
>         } else {
>             markStaleOnMain(doc, rev, height);
>         }
>     }
> {code}
> By using getPath(), the code should automatically use either the ID or the 
> _path property, whichever is right for the document.
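
To make the difference between parsing the ID and parsing the path concrete, 
here is a minimal, self-contained sketch. It does not use any Oak classes: 
Doc, revisionSegment and heightSegment are hypothetical stand-ins, and the 
example paths and revision strings are made up. The only assumptions taken 
from the description above are that a split document path ends in 
<revision>/<height>, and that a document with a very long path has a hashed 
ID and keeps its real path in a _path property.

```java
class SplitDocPathSketch {

    /** Hypothetical stand-in for a stored document: an ID plus an optional _path value. */
    static final class Doc {
        final String id;
        final String pathProperty; // null when the ID itself carries the path

        Doc(String id, String pathProperty) {
            this.id = id;
            this.pathProperty = pathProperty;
        }

        /** Returns the _path value when present, otherwise the part of the ID after the first ':'. */
        String getPath() {
            if (pathProperty != null) {
                return pathProperty;
            }
            return id.substring(id.indexOf(':') + 1);
        }
    }

    /** Second-to-last path segment, which the proposed fix treats as the revision. */
    static String revisionSegment(String path) {
        int slashIdx = path.lastIndexOf('/');
        return path.substring(path.lastIndexOf('/', slashIdx - 1) + 1, slashIdx);
    }

    /** Last path segment, which the proposed fix treats as the height. */
    static int heightSegment(String path) {
        int slashIdx = path.lastIndexOf('/');
        return Integer.parseInt(path.substring(slashIdx + 1));
    }

    public static void main(String[] args) {
        // Short path: the ID carries the path, so either source would work.
        Doc shortDoc = new Doc("5:p/a/b/r1-0-1/0", null);
        // Long path: the ID (taken from the stack trace) is only a hash.
        Doc longDoc = new Doc(
                "13:h113f9d0fe7ac0f87fa06397c37b9ffd4b372eeb1ec93e0818bb4024a32587820",
                "p/some/very/long/path/r2-0-1/1");

        for (Doc d : new Doc[] {shortDoc, longDoc}) {
            String path = d.getPath();
            System.out.println(d.id + " -> revision segment " + revisionSegment(path)
                    + ", height " + heightSegment(path));
        }
        // The hashed ID contains no '/', so it has no segments to parse.
        System.out.println("slash in hashed ID: " + (longDoc.id.indexOf('/') >= 0));
    }
}
```

Both documents go through the same parsing code; only the source of the path 
differs, which is the behaviour the proposed fix relies on getPath() to 
provide.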



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
