[
https://issues.apache.org/jira/browse/OAK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-2599:
---------------------------------
Attachment: OAK-2599-1.patch
[proposed patch|^OAK-2599-1.patch] based on the following approach:
Adds support for a {{PathFilter}} which can decide what to do with a given
path based on a set of includes and excludes. It can result in one of
* INCLUDE - The given path needs to be included and hence indexed
* EXCLUDE - The given path and its subtree must be excluded from processing and
hence not indexed
* TRAVERSE - The given path should be traversed but not indexed, because some
path further down might be part of an include and would then be indexed. For
example, if your includes consist of /a/b and /a/c, then for /a the editor must
just traverse and not index, while actual indexing would be done for /a/b and
/a/c only
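The three outcomes above can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patch: the class name {{PathFilterSketch}}, the helper {{isSameOrDescendant}}, the rule that excludes win over includes, and the choice to return EXCLUDE for paths unrelated to any include are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical illustration only; not the PathFilter from the attached patch.
final class PathFilterSketch {
    enum Result { INCLUDE, EXCLUDE, TRAVERSE }

    private final List<String> includedPaths;
    private final List<String> excludedPaths;

    PathFilterSketch(List<String> includedPaths, List<String> excludedPaths) {
        this.includedPaths = includedPaths;
        this.excludedPaths = excludedPaths;
    }

    Result filter(String path) {
        // Assumption: an exclude always wins, and it covers the whole subtree.
        for (String excluded : excludedPaths) {
            if (isSameOrDescendant(excluded, path)) {
                return Result.EXCLUDE;
            }
        }
        // The path is an included path or lies below one: index it.
        for (String included : includedPaths) {
            if (isSameOrDescendant(included, path)) {
                return Result.INCLUDE;
            }
        }
        // The path is an ancestor of an included path: walk through it
        // without indexing so the included subtree can still be reached.
        for (String included : includedPaths) {
            if (isSameOrDescendant(path, included)) {
                return Result.TRAVERSE;
            }
        }
        // Assumption: anything unrelated to the includes is skipped entirely.
        return Result.EXCLUDE;
    }

    // True if 'path' equals 'ancestor' or lies somewhere below it.
    static boolean isSameOrDescendant(String ancestor, String path) {
        if ("/".equals(ancestor)) {
            return path.startsWith("/");
        }
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }
}
```

With includes /a/b and /a/c and no excludes, this sketch traverses / and /a, indexes /a/b and everything below it, and skips an unrelated path such as /d together with its subtree.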
It reads its configuration from the index definition:
{noformat}
/oak:index/foo
- jcr:primaryType = "oak:QueryIndexDefinition"
- includedPaths (string) multiple
- excludedPaths (string) multiple
{noformat}
Where
* {{includedPaths}} - Multi-value property indicating the set of paths which
should be included for indexing.
* {{excludedPaths}} - Multi-value property indicating the set of paths which
should NOT be included for indexing
* Both fields are optional - If neither is provided then the default
[includedPaths: '/' , excludedPaths: ""] is used
* PathFilter has to be used by specific index implementations. For now
LuceneIndexEditor would make use of this
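As a rough sketch of how the defaults described above might be applied when reading the two properties: the class {{PathFilterConfigSketch}} and the use of a plain {{Map}} in place of the index definition node are hypothetical stand-ins, and it is an assumption here that each property falls back to its default independently of the other.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; a Map stands in for the index definition node.
final class PathFilterConfigSketch {
    static final List<String> DEFAULT_INCLUDED = Collections.singletonList("/");
    static final List<String> DEFAULT_EXCLUDED = Collections.emptyList();

    final List<String> includedPaths;
    final List<String> excludedPaths;

    private PathFilterConfigSketch(List<String> includedPaths, List<String> excludedPaths) {
        this.includedPaths = includedPaths;
        this.excludedPaths = excludedPaths;
    }

    // Reads includedPaths / excludedPaths and applies the defaults when a
    // property is missing (assumption: each property defaults on its own).
    static PathFilterConfigSketch from(Map<String, List<String>> definition) {
        List<String> included = definition.get("includedPaths");
        List<String> excluded = definition.get("excludedPaths");
        return new PathFilterConfigSketch(
                included == null || included.isEmpty() ? DEFAULT_INCLUDED : included,
                excluded == null ? DEFAULT_EXCLUDED : excluded);
    }
}
```

An index definition that sets neither property would then behave as before the patch: everything under the root is included and nothing is excluded.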
*Included Paths*
By default the recommended way to control which paths are included is to place
the index definition under the given path itself. For example, if you want only
nodes under {{/content/en}} to be indexed, you can achieve that by creating the
index definition under {{/content/en/oak:index/<index>}}. However, this requires
that queries also make use of path restrictions for such an index to be picked
up.
With this patch one can provide the set of paths to be included via config. For
example, if you create an index definition under /oak:index and just want to
index nodes under /lib and /apps, that is not possible with the previous
approach. It can now be done by providing the set of paths to be indexed, and
then only nodes under those paths would be indexed
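For the /lib and /apps case, the definition might look roughly like this. The index name {{foo}} is reused from the example above, and the way the multi-value property is written out is only illustrative:
{noformat}
/oak:index/foo
- jcr:primaryType = "oak:QueryIndexDefinition"
- includedPaths = ["/lib", "/apps"]
{noformat}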
*Benefits*
* Editor would avoid processing the diff for paths not of interest
* One can exclude paths which a user knows are not of interest to some indexes.
This would reduce the work done when processing writes happening in those paths
[~alexparvulescu] [~tmueller] Can you review the patch?
[~mduerig] This patch is a bit based on the approach used for filtering in event
processing. So if you can also have a look it would be helpful!
> Allow excluding certain paths from getting indexed for particular index
> -----------------------------------------------------------------------
>
> Key: OAK-2599
> URL: https://issues.apache.org/jira/browse/OAK-2599
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Fix For: 1.3.0
>
> Attachments: OAK-2599-1.patch
>
>
> Currently an {{IndexEditor}} gets to index all nodes under the tree where it
> is defined (post OAK-1980). Due to this, the IndexEditor would traverse the
> whole repo (or the subtree, if configured at a non-root path) to perform a
> reindex. Depending on the repo size this process can take quite a bit of
> time. It would be faster if an IndexEditor could exclude certain paths from
> traversal.
> Consider an application like Adobe AEM and an index which only indexes
> dam:Asset, or the default full text index. For a fulltext index it might make
> sense to avoid indexing the versionStore. So if the index editor skips such
> paths then a lot of redundant traversal can be avoided.
> Also see http://markmail.org/thread/4cuuicakagi6av4v