Joerg Hoh created OAK-12116:
-------------------------------

             Summary: Large synonym list delays commits to the repository
                 Key: OAK-12116
                 URL: https://issues.apache.org/jira/browse/OAK-12116
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
    Affects Versions: 1.88.0
            Reporter: Joerg Hoh


I came across a situation, where session.save() was very slow and blocking 
other requests. The threaddump looked like this:

{noformat}
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSynonyms(FSTSynonymFilterFactory.java:142)
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:112)
        at 
org.apache.lucene.analysis.synonym.SynonymFilterFactory.inform(SynonymFilterFactory.java:90)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.init(NodeStateAnalyzerFactory.java:190)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadTokenFilterFactories(NodeStateAnalyzerFactory.java:118)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.composeAnalyzer(NodeStateAnalyzerFactory.java:104)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:98)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:164)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:75)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:67)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.getIndexDefinition(IndexTracker.java:302)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:204)
        at 
org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:76)
        at 
org.apache.jackrabbit.oak.plugins.index.WhiteboardIndexEditorProvider.getIndexEditor(WhiteboardIndexEditorProvider.java:49)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:316)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:171)
        at 
org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.composite.CommitHookEnhancer.processCommit(CommitHookEnhancer.java:55)
        at 
org.apache.jackrabbit.oak.segment.scheduler.Commit.apply(Commit.java:105)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.execute(LockBasedScheduler.java:299)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:270)
        at 
org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:212)
        at 
org.apache.jackrabbit.oak.composite.CompositeNodeStore.merge(CompositeNodeStore.java:149)
        at 
org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:261)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:402)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:557)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:460)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:306)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:457)
{noformat}

This was a consistent pattern visible across multiple threaddumps taken across 
a larger period of time (30+ minutes).

We have identified that one Lucene index (marked with async,nrt) was configured 
with a synonym.txt with 600kb size. That means, parsing that indeed takes time.

The expected behavior is that this file is parsed once in the context of the 
index creation and then cached in the IndexTracker; but given that we have seen 
this behavior across multiple requests over a larger period of time, this 
obviously did not happen, and the index object was constructed on every commit.






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to