[ 
https://issues.apache.org/jira/browse/OAK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-2599:
---------------------------------
    Attachment: OAK-2599-1.patch

[proposed patch|^OAK-2599-1.patch] based on the following approach

Adds support for a {{PathFilter}} which decides what to do with a given 
path based on a set of includes and excludes. It returns one of
* INCLUDE - The given path needs to be included and hence indexed
* EXCLUDE - The given path and its subtree must be excluded from processing and 
hence not indexed
* TRAVERSE - The given path should be traversed but not indexed, because some path 
further down might be part of an include and would then be indexed. For example, 
if the includes consist of /a/b and /a/c, then for /a the editor must just 
traverse and not index, while actual indexing would be done for /a/b and /a/c 
only
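
The three outcomes can be sketched as below. This is only an illustrative sketch, not the code from the attached patch: the class name {{PathFilterSketch}}, its constructor and the helper {{isSameOrDescendant}} are hypothetical. It also assumes that excludes take precedence over includes and that a path unrelated to any include is treated as EXCLUDE, neither of which is stated above.

```java
import java.util.List;

/** Hypothetical sketch of an include/exclude path filter; not the patch's class. */
final class PathFilterSketch {
    enum Result { INCLUDE, EXCLUDE, TRAVERSE }

    private final List<String> includedPaths;
    private final List<String> excludedPaths;

    PathFilterSketch(List<String> includedPaths, List<String> excludedPaths) {
        // Default when no includes are given: include "/" and exclude nothing.
        this.includedPaths = includedPaths.isEmpty() ? List.of("/") : List.copyOf(includedPaths);
        this.excludedPaths = List.copyOf(excludedPaths);
    }

    Result filter(String path) {
        // Assumption: an exclude wins over any include covering the same path.
        for (String excluded : excludedPaths) {
            if (isSameOrDescendant(excluded, path)) {
                return Result.EXCLUDE;
            }
        }
        // The path is at or below an included path: index it.
        for (String included : includedPaths) {
            if (isSameOrDescendant(included, path)) {
                return Result.INCLUDE;
            }
        }
        // The path is an ancestor of an included path: walk through it without indexing.
        for (String included : includedPaths) {
            if (isSameOrDescendant(path, included)) {
                return Result.TRAVERSE;
            }
        }
        // Assumption: anything unrelated to the includes is skipped together with its subtree.
        return Result.EXCLUDE;
    }

    /** True if path equals ancestor or lies somewhere below it. */
    static boolean isSameOrDescendant(String ancestor, String path) {
        if (ancestor.equals("/") || ancestor.equals(path)) {
            return true;
        }
        return path.startsWith(ancestor + "/");
    }
}
```

With includes /a/b and /a/c and no excludes, this sketch yields TRAVERSE for /a, INCLUDE for /a/b and anything below it, and EXCLUDE for an unrelated path such as /d.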

It reads its configuration from the index definition
{noformat}
/oak:index/foo
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - includedPaths (string) multiple
  - excludedPaths (string) multiple
{noformat}

Where 
* {{includedPaths}} - Multi-valued property indicating the set of paths which 
should be included for indexing
* {{excludedPaths}} - Multi-valued property indicating the set of paths which 
should NOT be included for indexing

* Both properties are optional - If none is provided then the default 
[includedPaths: '/' , excludedPaths: ""] is used
* PathFilter has to be used by specific index implementations. For now 
LuceneIndexEditor would make use of it

*Included Paths*
By default the recommended way to control which paths are included is to place 
the index definition under the given path itself. For example, if you want only 
nodes under {{/content/en}} to be indexed then you can achieve that by 
creating the index definition under {{/content/en/oak:index/<index>}}. However 
this requires that queries also make use of path restrictions for such an index 
to be picked up.

With this patch one can provide the set of paths to be included via config. For 
example, if you create the index definition under /oak:index and want to index 
only nodes under /lib and /apps, that is not possible with the previous 
approach. It can now be done by providing the set of paths to be indexed; only 
nodes under those paths would then be indexed

*Benefits*
* Editor would avoid processing the diff for paths not of interest
* One can exclude paths which a user knows are not of interest to some indexes. 
This would reduce the cost of processing writes happening under those paths
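
The first benefit can be illustrated with a small traversal sketch. It is an assumption-laden toy, not the editor from the patch: {{PruningWalkSketch}}, its in-memory {{Node}} tree and the hard-coded {{decide}} method (includes /lib and /apps, everything else excluded) are all hypothetical. The point it shows is that an EXCLUDE result stops the walk at that node, so nothing below it is ever visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a tree walk that prunes excluded subtrees. */
final class PruningWalkSketch {
    enum Decision { INCLUDE, EXCLUDE, TRAVERSE }

    /** Minimal in-memory tree node standing in for a repository node. */
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();

        /** Returns the named child, creating it if needed, so calls can be chained. */
        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    final List<String> indexed = new ArrayList<>();
    int visited = 0;

    /** Hard-coded decision for this example: includes are /lib and /apps. */
    static Decision decide(String path) {
        if (path.equals("/")) {
            return Decision.TRAVERSE; // ancestor of the includes
        }
        if (path.equals("/lib") || path.startsWith("/lib/")
                || path.equals("/apps") || path.startsWith("/apps/")) {
            return Decision.INCLUDE;
        }
        return Decision.EXCLUDE;
    }

    void walk(String path, Node node) {
        Decision decision = decide(path);
        if (decision == Decision.EXCLUDE) {
            return; // prune: the whole subtree is skipped
        }
        visited++;
        if (decision == Decision.INCLUDE) {
            indexed.add(path);
        }
        for (Map.Entry<String, Node> entry : node.children.entrySet()) {
            String childPath = path.equals("/") ? "/" + entry.getKey() : path + "/" + entry.getKey();
            walk(childPath, entry.getValue());
        }
    }
}
```

For a tree holding /lib/x, /apps and /content/en/page, walking from the root records /apps, /lib and /lib/x as indexed and never descends into /content.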

[~alexparvulescu] [~tmueller] Can you review the patch?

[~mduerig] This patch is partly based on the approach used for filtering in 
event processing. So if you can also have a look it would be helpful!

> Allow excluding certain paths from getting indexed for particular index
> -----------------------------------------------------------------------
>
>                 Key: OAK-2599
>                 URL: https://issues.apache.org/jira/browse/OAK-2599
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core
>            Reporter: Chetan Mehrotra
>             Fix For: 1.3.0
>
>         Attachments: OAK-2599-1.patch
>
>
> Currently an {{IndexEditor}} gets to index all nodes under the tree where it 
> is defined (post OAK-1980).  Due to this IndexEditor would traverse the whole 
> repo (or subtree if configured in non root path) to perform reindex. 
> Depending on the repo size this process can take quite a bit of time. It 
> would be faster if an IndexEditor could exclude certain paths from traversal.
> Consider an application like Adobe AEM and an index which only indexes 
> dam:Asset or the default full text index. For a fulltext index it might make 
> sense to avoid indexing the versionStore. So if the index editor skips such 
> paths then lots of redundant traversal can be avoided. 
> Also see http://markmail.org/thread/4cuuicakagi6av4v



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
