[
https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Parvulescu updated OAK-5499:
-
Attachment: OAK-5499-v2-fix.patch
OAK-5499-v2-demo.patch
Attaching a possible fix. I decided to investigate a different approach: skip
the out-of-band indexing when the base state is {{MISSING_NODE}}, since that is
the only case where the extra traversal is very expensive (see
OAK-5499-v2-fix.patch).
The way it would work: the first index's reindex becomes part of the full
traversal (no longer a dedicated reindex traversal). Any other index
definitions that also need a reindex are picked up along the way and included
in the current traversal (no longer spawning out-of-band reindex traversals).
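To make the idea concrete, here is a minimal toy sketch of a single-pass traversal. It is not the patch and it does not use any Oak classes: {{Node}}, {{Indexer}}, {{traverse}} and {{demo}} are hypothetical names made up for illustration. It assumes every definition found under an {{oak:index}} node needs a reindex, and it does not model the {{MISSING_NODE}} check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Oak classes.
class SinglePassReindexSketch {

    // Minimal in-memory tree node; children keep insertion order.
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();

        Node child(String name) {
            return children.computeIfAbsent(name, k -> new Node());
        }
    }

    // Stand-in for an index editor: records each path it is shown,
    // relative to the node that owns its oak:index definition.
    static final class Indexer {
        final String rootPath;
        final String definitionPath;
        final List<String> seen = new ArrayList<>();

        Indexer(String rootPath, String definitionPath) {
            this.rootPath = rootPath;
            this.definitionPath = definitionPath;
        }

        void visit(String path) {
            String rest = rootPath.equals("/") ? path : path.substring(rootPath.length());
            seen.add(rest.isEmpty() ? "/" : rest);
        }
    }

    final List<Indexer> indexers = new ArrayList<>();
    int nodesVisited = 0;

    static String join(String parent, String name) {
        return parent.equals("/") ? "/" + name : parent + "/" + name;
    }

    // One depth-first pass. Definitions found under a node's oak:index child
    // join the pass at that node instead of triggering a separate traversal.
    void traverse(Node node, String path, List<Indexer> inherited) {
        nodesVisited++;
        List<Indexer> active = new ArrayList<>(inherited);
        Node defs = node.children.get("oak:index");
        if (defs != null) {
            for (String name : defs.children.keySet()) {
                // Assumption: every definition found needs a reindex.
                Indexer indexer = new Indexer(path, join(join(path, "oak:index"), name));
                indexers.add(indexer);
                active.add(indexer);
            }
        }
        for (Indexer indexer : active) {
            indexer.visit(path);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            traverse(e.getValue(), join(path, e.getKey()), active);
        }
    }

    // Builds the small content tree used in this comment's logs and runs the single pass.
    static SinglePassReindexSketch demo() {
        Node root = new Node();
        Node content = root.child("content");
        content.child("childContent").child("c0").child("c1");
        content.child("oak:index").child("foo2Index");
        root.child("oak:index").child("foo1Index");

        SinglePassReindexSketch sketch = new SinglePassReindexSketch();
        sketch.traverse(root, "/", new ArrayList<>());
        return sketch;
    }

    public static void main(String[] args) {
        SinglePassReindexSketch sketch = demo();
        System.out.println("nodes visited: " + sketch.nodesVisited);
        for (Indexer indexer : sketch.indexers) {
            System.out.println(indexer.definitionPath + " saw " + indexer.seen);
        }
    }
}
```

In this toy model each of the nine nodes is visited exactly once. The indexer for {{/oak:index/foo1Index}} is shown all nine paths, and the one for {{/content/oak:index/foo2Index}} is shown the six paths under {{/content}}, relative to {{/content}}.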
Unfortunately this change is a pain to test, so I don't have anything better
than some logs. I also attached a version of the patch that prints these logs,
so anyone can see them locally (see OAK-5499-v2-demo.patch).
To simplify feedback, here's the output without the patch (_IU_ is the
IndexUpdate class, _CNA_ is the childNodeAdded call, _E0_ and _E1_ are the two
indexers):
{noformat}
[IU] Reindexing [/oak:index/foo1Index]
[E0] /
[E0] /content
[E0] /content/childContent
[E0] /content/childContent/c0
[E0] /content/childContent/c0/c1
[E0] /content/oak:index
[E0] /content/oak:index/foo2Index
[E0] /oak:index
[E0] /oak:index/foo1Index
Reindexing done for [/oak:index/foo1Index]
[IU] CNA /
[IU] Reindexing [/content/oak:index/foo2Index]
[E1] /
[E1] /childContent
[E1] /childContent/c0
[E1] /childContent/c0/c1
[E1] /oak:index
[E1] /oak:index/foo2Index
Reindexing done for [/content/oak:index/foo2Index]
[IU] CNA /content
[IU] CNA /content/childContent
[IU] CNA /content/childContent/c0
[IU] CNA /content
[IU] CNA /content/oak:index
[IU] CNA /
[IU] CNA /oak:index
{noformat}
We can see the extra traversals happening, whereas with the patch the reindexing
of both indexes is collapsed into the main traversal thread:
{noformat}
[IU] Reindexing [/oak:index/foo1Index]
[E0] /
[IU] CNA /
[IU] Reindexing [/content/oak:index/foo2Index]
[E1] /
[E0] /content
[IU] CNA /content
[E1] /childContent
[E0] /content/childContent
[IU] CNA /content/childContent
[E1] /childContent/c0
[E0] /content/childContent/c0
[IU] CNA /content/childContent/c0
[E1] /childContent/c0/c1
[E0] /content/childContent/c0/c1
[IU] CNA /content
[E1] /oak:index
[E0] /content/oak:index
[IU] CNA /content/oak:index
[E1] /oak:index/foo2Index
[E0] /content/oak:index/foo2Index
[IU] CNA /
[E0] /oak:index
[IU] CNA /oak:index
[E0] /oak:index/foo1Index
{noformat}
I took special care to preserve the current logging style on reindex, and I
believe I managed to do that, but there may be aspects I missed.
Feedback is very much appreciated!
> IndexUpdate can do multiple traversals of a content tree during initial index
> when there are sub-root indices
>
>
> Key: OAK-5499
> URL: https://issues.apache.org/jira/browse/OAK-5499
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Vikas Saurabh
> Assignee: Vikas Saurabh
> Priority: Minor
> Fix For: 1.8
>
> Attachments: OAK-5499.patch, OAK-5499-v2-demo.patch,
> OAK-5499-v2-fix.patch
>
>
> In case we have index defs such as:
> {noformat}
> /oak:index/foo1Index
> /content
>     /oak:index/foo2Index
> {noformat}
> then the initial indexing process \[0] would traverse the tree under
> {{/content}} twice: once while indexing for the top-level indices, and again
> when it starts to index the newly discovered {{foo2Index}} while traversing
> {{/content/oak:index}}.
> What we can do is this: when the first diff processes {{/content}} and
> discovers a node named {{oak:index}}, it can actively go into that tree, peek
> at the index defs under it, and register them as required. The diff can then
> proceed under {{/content}} while the new indices also receive the diffs
> (avoiding another traversal).
> \[0] first-time indexing, or when {{/:async}} gets deleted or the checkpoint
> for the async index couldn't be retrieved