Hello,

During an Adobe-internal Oak coordination call I presented two improvements for
the clustered Oak setup I’m working on: OAK-3865 and OAK-4412. The presentation
is available at [1]; please find a summary of the discussion below.

OAK-3865 optimises the secondary-read strategy. It tracks all the revisions
affected or read by the Oak instance and also fetches _lastRevs from all the
secondary Mongo instances to decide whether it’s safe to use the
“secondaryPreferred” or “nearest” read preference. Chetan suggested we should
use the node document cache to get even better results; he added his comment to
JIRA [2] as well. Julian thinks we should use the find(…, maxAge) method more
often, maybe even exposing maxAge in JCR. The first step, however, is to
finish OAK-3865.
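
To illustrate the kind of check involved, here is a minimal sketch in Java. It
is not code from the patch: the class and method names are hypothetical, and
revisions are reduced to plain timestamps, whereas the real implementation
works with Oak revisions. The idea is only that a secondary read is allowed
when every secondary has already caught up with the newest revision this
instance has seen.

```java
import java.util.List;

// Hypothetical helper, not part of Oak or of the OAK-3865 patch.
public final class SecondaryReadCheck {

    private SecondaryReadCheck() {
    }

    /**
     * @param newestSeenMillis       timestamp of the newest revision this
     *                               instance has read or written (assumption:
     *                               revisions simplified to timestamps)
     * @param secondaryLastRevMillis newest _lastRev timestamp fetched from
     *                               each secondary
     * @return true if every secondary is at least as recent as what this
     *         instance has already seen
     */
    public static boolean isSecondarySafe(long newestSeenMillis,
                                          List<Long> secondaryLastRevMillis) {
        // Without any information about the secondaries, stay on the primary.
        if (secondaryLastRevMillis.isEmpty()) {
            return false;
        }
        for (long lastRev : secondaryLastRevMillis) {
            // One lagging secondary is enough to make the read unsafe,
            // because the driver may route the query to it.
            if (lastRev < newestSeenMillis) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would fall back to the primary whenever this returns false.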

OAK-4412 introduces a new “hybrid” Lucene index that consists of an
asynchronous, shared part and a volatile, local part updated synchronously.
This way we can have an index which is updated immediately (or almost
immediately), like the property index, but without the need for expensive
repository writes.

The main problem here is how we should update the volatile part of the index:
with a commit hook or with an observer. A commit hook makes the changes
visible immediately, but it may also slow down all the commits. On the other
hand, an observer introduces a small delay between the commit and having
the modifications indexed, but it won’t burden the commit process. Stefan’s
idea is to use an observer and modify the query logic so that it can check
whether there are some pending changes. If so, the query can wait until the
recent changes are indexed [3].
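
A rough sketch of that waiting logic in Java follows. Everything in it is an
assumption made for illustration: the class name, the use of a simple sequence
number per commit, and the timeout are all hypothetical and not taken from Oak
or from the JIRA discussion.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical tracker shared by the commit path, the observer and the query.
public final class PendingIndexTracker {

    private long lastCommitted; // sequence number of the newest local commit
    private long lastIndexed;   // newest commit the observer has indexed

    /** Called on the commit path; cheap, does no indexing work. */
    public synchronized long commitHappened() {
        return ++lastCommitted;
    }

    /** Called by the observer once a commit has been added to the local index. */
    public synchronized void indexedUpTo(long sequence) {
        if (sequence > lastIndexed) {
            lastIndexed = sequence;
            notifyAll(); // wake up queries waiting for this commit
        }
    }

    /**
     * Called by the query before reading the local index. Waits until all
     * commits known at call time are indexed, or the timeout expires.
     *
     * @return true if the index caught up, false on timeout or interruption
     */
    public synchronized boolean awaitIndexed(long timeoutMillis) {
        long target = lastCommitted;
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (lastIndexed < target) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            try {
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

If the wait times out, the query could still run against the slightly stale
index; whether that is acceptable is a separate design decision.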

With regard to future work, I’m eager to commit the first patch
(OAK-3865). If the Oak community agrees on the approach and the implementation
(the patch is attached to the JIRA [4]), I’ll merge it next week, on Wednesday.
Chetan’s idea is very good, but I’d like to extract it to a separate issue,
as the patch is quite big already.

For OAK-4412, I’ll try to implement Stefan’s idea of waiting for the
index update at query time and keep you posted.

Best regards,
Tomek

[1] https://issues.apache.org/jira/secure/attachment/12808917/clustered-oak-setup-improvements.pdf
[2] https://issues.apache.org/jira/browse/OAK-3865?focusedCommentId=15320292#comment-15320292
[3] https://issues.apache.org/jira/browse/OAK-4412?focusedCommentId=15320275#comment-15320275
[4] https://issues.apache.org/jira/browse/OAK-3865

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
[email protected]
