[ 
https://issues.apache.org/jira/browse/OAK-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837941#comment-15837941
 ] 

Thomas Mueller edited comment on OAK-5324 at 1/26/17 9:37 AM:
--------------------------------------------------------------

> But I assume this issue is rather about a way to introduce a new index or 
> update an existing one when the system is online, right? In that case, the 
> branch-less mode is off the table.

I see. I wrote a tool that allows managing indexes (creating, changing, 
reindexing, removing) using a script, for both the regular and the branch-less 
mode now:
http://svn.apache.org/r1780222

> At least for new indexes we could try to improve the branch handling in the 
> DocumentNodeStore.

If that turns out to be much easier, we could probably make reindexing a 
special case of creating a new index. For example, re-index into a new hidden 
child node, ":data_1", ":data_2",..., so that the existing nodes are not 
changed. The pointer to the latest ":data_x" node would only be changed at the 
end, maybe in a separate commit. After that, the old, outdated ":data_(n-1)" 
node could be removed step-by-step using multiple commits, or in one commit 
(which can't conflict).
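
To make the idea above more concrete, here is a rough sketch of the 
generation switch. It is a toy in-memory model, not the Oak API: the dict 
layout, the "active" pointer key and the function name are all made up for 
illustration, and each numbered step stands in for what would be a separate 
commit in a real store.

```python
# Toy model only; none of these names exist in Oak.
# index: dict holding hidden ":data_n" children plus an "active" pointer.
# nodes: dict mapping a path to its properties.

def reindex_generational(index, nodes, prop):
    """Rebuild the index for `prop` into a fresh ":data_n" child, then switch."""
    current = index.get("active")  # e.g. ":data_1", or None for a new index
    n = int(current.rsplit("_", 1)[1]) + 1 if current else 1
    fresh = f":data_{n}"

    # Step 1: build the new generation. The active one is not modified,
    # so readers keep using it while this runs.
    data = {}
    for path, props in nodes.items():
        if prop in props:
            data.setdefault(props[prop], []).append(path)
    index[fresh] = {value: sorted(paths) for value, paths in data.items()}

    # Step 2: switch the pointer (a small, separate commit in a real store).
    index["active"] = fresh

    # Step 3: drop the outdated generation. Nothing references it any more,
    # so this could also be done step-by-step.
    if current:
        del index[current]
    return fresh
```

Calling this twice on the same index would first create ":data_1", then 
build ":data_2" next to it and remove ":data_1" only after the pointer has 
moved.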

Another option might be to split indexing into multiple commits. For example, 
use a "fromPath" .. "toPath" range, and only re-index part of the repository at 
a time.
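
A minimal sketch of the range-based variant, again as a toy in-memory model 
and not the Oak API. It assumes the range is half-open and that paths are 
compared as plain strings; how "fromPath" .. "toPath" would really be 
interpreted is not decided here. The function names and the `boundaries` 
list are hypothetical.

```python
# Toy model only; none of these names exist in Oak.
# index: dict mapping a property value to a sorted list of paths.
# nodes: dict mapping a path to its properties.

def reindex_range(index, nodes, prop, from_path, to_path):
    """Re-index only paths p with from_path <= p < to_path (one 'commit')."""
    # Drop existing entries that fall into the range; they are rebuilt below.
    for value in list(index):
        kept = [p for p in index[value] if not (from_path <= p < to_path)]
        if kept:
            index[value] = kept
        else:
            del index[value]
    # Add fresh entries for the nodes inside the range.
    for path, props in nodes.items():
        if from_path <= path < to_path and prop in props:
            value = props[prop]
            index[value] = sorted(index.get(value, []) + [path])


def reindex_in_batches(index, nodes, prop, boundaries):
    """Walk consecutive boundary pairs; returns the number of 'commits'."""
    commits = 0
    for lo, hi in zip(boundaries, boundaries[1:]):
        reindex_range(index, nodes, prop, lo, hi)
        commits += 1
    return commits
```

With boundaries such as ["/", "/b", "/d"], this would re-index "/" up to 
"/b" in one step and "/b" up to "/d" in the next, so each step only touches 
a part of the repository.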

> Async re-index? Does that disable synchronous index updates while it is 
> re-indexing?

I don't know currently.




was (Author: tmueller):
> But I assume this issue is rather about a way to introduce a new index or 
> update an existing one when the system is online, right? In that case, the 
> branch-less mode is off the table.

I see. I wrote a tool that allows managing indexes (creating, changing, 
reindexing, removing) using a script, for both the regular and the branch-less 
mode now:
http://svn.apache.org/r1780222

> At least for new indexes we could try to improve the branch handling in the 
> DocumentNodeStore.

If that turns out to be much easier, we could probably make reindexing a 
special case of creating a new index. For example, re-index into a new hidden 
child node, ":data_1", ":data_2",..., so that the existing nodes are not 
changed. And only change the pointer to the latest ":data_x" node at the very 
end, in a separate commit.

Another option might be to split indexing into multiple commits. For example, 
use a "fromPath" .. "toPath" range, and only re-index part of the repository at 
a time.

> Async re-index? Does that disable synchronous index updates while it is 
> re-indexing?

I don't know currently.



> Enable property index reindexing via oak-run
> --------------------------------------------
>
>                 Key: OAK-5324
>                 URL: https://issues.apache.org/jira/browse/OAK-5324
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: documentmk, run
>            Reporter: Chetan Mehrotra
>            Assignee: Thomas Mueller
>             Fix For: 1.6, 1.8
>
>
> Currently introducing a new property index or performing a reindex of an 
> existing property index is problematic on DocumentNodeStore. This happens 
> because doing this results in either 
> # Persisted branch - Which is slow at times and has issues related to 
> conflict handling
> # Large in memory branch which increases heap pressure
> To enable this use case we should add some tooling in oak-run where we can 
> use a different approach for achieving the same. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
