Joerg Hoh created OAK-12116:
-------------------------------
Summary: Large synonym list delays commits to the repository
Key: OAK-12116
URL: https://issues.apache.org/jira/browse/OAK-12116
Project: Jackrabbit Oak
Issue Type: Improvement
Components: indexing
Affects Versions: 1.88.0
Reporter: Joerg Hoh
I came across a situation where session.save() was very slow and blocked
other requests. The thread dump looked like this:
{noformat}
at org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSynonyms(FSTSynonymFilterFactory.java:142)
at org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:112)
at org.apache.lucene.analysis.synonym.SynonymFilterFactory.inform(SynonymFilterFactory.java:90)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.init(NodeStateAnalyzerFactory.java:190)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadTokenFilterFactories(NodeStateAnalyzerFactory.java:118)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.composeAnalyzer(NodeStateAnalyzerFactory.java:104)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:98)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:164)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:75)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:67)
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.getIndexDefinition(IndexTracker.java:302)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:204)
at org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:76)
at org.apache.jackrabbit.oak.plugins.index.WhiteboardIndexEditorProvider.getIndexEditor(WhiteboardIndexEditorProvider.java:49)
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:316)
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:171)
at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
at org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
at org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
at org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
at org.apache.jackrabbit.oak.composite.CommitHookEnhancer.processCommit(CommitHookEnhancer.java:55)
at org.apache.jackrabbit.oak.segment.scheduler.Commit.apply(Commit.java:105)
at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.execute(LockBasedScheduler.java:299)
at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:270)
at org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:212)
at org.apache.jackrabbit.oak.composite.CompositeNodeStore.merge(CompositeNodeStore.java:149)
at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:261)
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:402)
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:557)
at org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:460)
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:306)
at org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:457)
{noformat}
This pattern was consistent across multiple thread dumps taken over an
extended period (30+ minutes).
We identified that one Lucene index (marked with async,nrt) was configured
with a synonym.txt of 600 KB in size, so parsing it does take time.
The expected behavior is that this file is parsed once, in the context of the
index creation, and the result is then cached in the IndexTracker. Given that
we saw this stack across multiple requests over an extended period, that
evidently did not happen, and the index object was constructed on every commit.
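
To illustrate the expected "parse once, then reuse" behavior, here is a minimal
sketch of a cache around the expensive parsing step. It is not Oak's actual
implementation: SynonymCacheSketch, parseSynonyms and the definitionVersion
key are hypothetical names, and the parser is a simplified stand-in for the
Lucene synonym loading seen in the stack trace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SynonymCacheSketch {

    /** Counts how often the expensive parse really runs (illustration only). */
    final AtomicInteger parseCount = new AtomicInteger();

    /** Parsed synonyms, keyed by index path plus a (hypothetical) definition version. */
    private final Map<String, Map<String, List<String>>> cache = new ConcurrentHashMap<>();

    /** Simplified stand-in for the costly parsing of a large synonym file. */
    Map<String, List<String>> parseSynonyms(String text) {
        parseCount.incrementAndGet();
        Map<String, List<String>> result = new HashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            List<String> group = new ArrayList<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim();
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            for (String term : group) {
                result.put(term, group); // every term maps to its whole group
            }
        }
        return result;
    }

    /**
     * Returns the parsed synonyms, parsing only when this index path and
     * definition version have not been seen before.
     */
    Map<String, List<String>> get(String indexPath, String definitionVersion, String synonymText) {
        String key = indexPath + "@" + definitionVersion;
        return cache.computeIfAbsent(key, k -> parseSynonyms(synonymText));
    }

    public static void main(String[] args) {
        SynonymCacheSketch cache = new SynonymCacheSketch();
        String synonyms = "fast, quick, rapid\nbig, large\n";
        // Three "commits" against the same definition: only the first one parses.
        for (int i = 0; i < 3; i++) {
            cache.get("/oak:index/example", "v1", synonyms);
        }
        // A changed definition gets a new key and is parsed again.
        cache.get("/oak:index/example", "v2", synonyms);
        System.out.println("parses: " + cache.parseCount.get()); // prints "parses: 2"
    }
}
```

In this sketch the cost of parsing is paid once per definition version instead
of once per commit. Entries for outdated versions are never evicted here; a
real fix would need to drop them when the index definition changes.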