[ 
https://issues.apache.org/jira/browse/OAK-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949861#comment-13949861
 ] 

Alex Parvulescu edited comment on OAK-1456 at 3/27/14 8:06 PM:
---------------------------------------------------------------

Attaching an initial patch for feedback.

The idea is that for property indexes that have the 'reindex-async' flag set to 
true, the reindex happens asynchronously when the 'reindex' flag is raised.
The process should work as follows:
 - raising the 'reindex' flag makes the property index editor consider an index 
for a full reindex. If the 'reindex-async' flag is present, the editor simply 
sets the 'async' property on the index (async = 'async-reindex') and otherwise 
ignores it.
 - a new thread (a copy of the 'async' one), dedicated to these special 
properties, runs in the background, picks up the aforementioned index and runs 
a full reindex on it.
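
As a rough illustration of the first step, here is a minimal Java sketch. It is 
not taken from the attached patch: the index definition is modelled as a plain 
map, and the names ReindexDecision, Action and onIndexDefinition are 
hypothetical. Only the property names 'reindex', 'reindex-async' and 'async' 
and the value 'async-reindex' come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the property index editor's decision; not Oak API.
final class ReindexDecision {

    enum Action { NONE, REINDEX_NOW, DEFERRED_TO_ASYNC }

    // Decides what to do with one index definition, modelled here as a map.
    static Action onIndexDefinition(Map<String, Object> indexDef) {
        if (!Boolean.TRUE.equals(indexDef.get("reindex"))) {
            return Action.NONE; // no reindex requested
        }
        if (Boolean.TRUE.equals(indexDef.get("reindex-async"))) {
            // Hand the index over to the dedicated background indexer.
            // Assumption: 'reindex' stays set so that indexer sees the request.
            indexDef.put("async", "async-reindex");
            return Action.DEFERRED_TO_ASYNC;
        }
        return Action.REINDEX_NOW; // regular synchronous full reindex
    }

    public static void main(String[] args) {
        Map<String, Object> def = new HashMap<>();
        def.put("reindex", true);
        def.put("reindex-async", true);
        System.out.println(onIndexDefinition(def) + " " + def.get("async"));
    }
}
```

An index without 'reindex-async' falls through to the regular synchronous 
reindex, so existing index definitions would behave as before.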

The trick here is that this thread waits until it completes a cycle without any 
changes, and only _then_ removes the 'async' property, thus switching the 
property index back to synchronous mode.
So one open issue here is that the switch from async back to sync takes at 
least 2 cycles (10 seconds by the current setting).
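
The switch-back rule can be sketched as follows. This is a simplified model 
under stated assumptions, not the patch itself: AsyncReindexCycle and 
finishCycle are hypothetical names, the index definition is again a plain map, 
and the number of changes indexed in a cycle is passed in by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the end of one background indexing cycle; not Oak API.
final class AsyncReindexCycle {

    // Returns true if this cycle switched the index back to synchronous mode.
    static boolean finishCycle(Map<String, Object> indexDef, long changesIndexed) {
        if (!"async-reindex".equals(indexDef.get("async"))) {
            return false; // not handled by the reindex thread
        }
        if (changesIndexed > 0) {
            return false; // still catching up, stay asynchronous
        }
        // A full cycle saw no changes: drop 'async' so the index is sync again.
        indexDef.remove("async");
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> def = new HashMap<>();
        def.put("async", "async-reindex");
        System.out.println(finishCycle(def, 5000)); // cycle doing the reindex
        System.out.println(finishCycle(def, 0));    // quiet cycle
    }
}
```

In this model the first cycle does the reindex and a later quiet cycle removes 
the property, which is where the minimum of 2 cycles comes from.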

The mechanism works fine; the issues are around installing a second async 
thread for the new channel. I had to make some tweaks to the current async 
indexer, but nothing too disruptive (it still carried some assumptions that 
only one async indexer runs at a time). Also, I had to add a _synchronized_ to 
the _checkpoint_ method in SegmentNodeStore because of a race where the 2 async 
indexers were running at the same time and one of them consistently could not 
create the checkpoint (adding some delay fixed the problem).
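
To show the general effect of that _synchronized_, here is a toy store, not 
the real SegmentNodeStore. ToyCheckpointStore and its counter-based checkpoint 
ids are assumptions made for the example, and the sketch does not reproduce 
the actual race seen in the patch; it only shows two indexer threads being 
serialized on checkpoint creation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy store with a serialized checkpoint method; not the Oak implementation.
final class ToyCheckpointStore {

    private long counter; // guarded by the monitor of 'this'

    // Only one thread at a time may create a checkpoint.
    synchronized String checkpoint(long lifetimeMillis) {
        counter++;
        return "checkpoint-" + counter;
    }

    public static void main(String[] args) throws InterruptedException {
        ToyCheckpointStore store = new ToyCheckpointStore();
        Set<String> ids = ConcurrentHashMap.newKeySet();
        // Two "async indexers" creating checkpoints at the same time.
        Runnable indexer = () -> {
            for (int i = 0; i < 1000; i++) {
                ids.add(store.checkpoint(10_000L));
            }
        };
        Thread first = new Thread(indexer);
        Thread second = new Thread(indexer);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(ids.size() + " distinct checkpoints");
    }
}
```

Without _synchronized_, the two threads could interleave on the counter update 
and end up with the same id, so one of them would effectively lose its 
checkpoint.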

I'm not 100% sure whether switching from async to sync can lose some info 
because of the retry policy of the merge operation that introduces the 
changes.
Also, I think reindexing under a different node is not necessary, as the 
changes should be visible after the commit that contains everything.

[~jukkaz] maybe you can take a quick look at the patch?

> Non-blocking reindexing
> -----------------------
>
>                 Key: OAK-1456
>                 URL: https://issues.apache.org/jira/browse/OAK-1456
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Michael Marth
>            Assignee: Alex Parvulescu
>            Priority: Blocker
>              Labels: production, resilience
>             Fix For: 0.20
>
>         Attachments: OAK-1456.patch
>
>
> For huge Oak repos it will be essential to be able to re-index some or all 
> indexes in a non-blocking way in case they go out of sync (i.e. the repo is 
> still operational while the re-indexing takes place).
> For an asynchronous index this should not be much of a problem. One could 
> drop it and recreate it (as an added benefit it might be nice if the user 
> could simply add a property "reindex" to the index definition node to trigger 
> this).
> For synchronous indexes, I suggest the mechanism creates an asynchronous 
> index behind the scenes first and, once it has caught up:
> * blocks writes (?)
> * removes the existing synchronous index
> * moves asynchronous index in its place and makes it synchronous


