[ 
https://issues.apache.org/jira/browse/OAK-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-3865:
-------------------------------
    Attachment: clustered-oak-setup-improvements.pdf

> New strategy to optimize secondary reads
> ----------------------------------------
>
>                 Key: OAK-3865
>                 URL: https://issues.apache.org/jira/browse/OAK-3865
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Tomek Rękawek
>              Labels: performance
>             Fix For: 1.6
>
>         Attachments: OAK-3865.patch, clustered-oak-setup-improvements.pdf, 
> diagram.png
>
>
> *Introduction*
> In the current trunk we'll only read document _D_ from the secondary instance 
> if:
> (1) we have the parent _P_ of document _D_ cached and
> (2) the parent hasn't been modified in 6 hours.
> The OAK-2106 tried to optimise (2) by estimating lag using MongoDB replica 
> stats. It was unreliable, so the second approach was to read the last 
> revisions directly from each Mongo instance. If the modification date of _P_ 
> is before last revisions on all secondary Mongos, then secondary can be used.
> The main problem with this approach is that we still need to have the _P_ to 
> be in cache. I think we need another way to optimise the secondary reading, 
> as right now only about 3% of requests connects to the secondary, which is 
> bad especially for the global-clustering case (Mongo and Oak instances across 
> the globe). The optimisation provided in OAK-2106 doesn't make the things 
> much better and may introduce some consistency issues.
> *Proposal - tldr version*
> Oak will remember the last revision it has ever seen. In the same time, it'll 
> query each secondary Mongo instance, asking what's the available stored root 
> revision. If all secondary instances have a root revision >= last revision 
> seen by a given Oak instance, it's safe to use the secondary read preference.
> *Proposal*
> I had following constraints in mind preparing this:
> 1. Let's assume we have a sequence of commits with revisions _R1_, _R2_ and 
> _R3_ modifying nodes _N1_, _N2_ and _N3_. If we already read the _N1_ from 
> revision _R2_ then reading from a secondary shouldn't result in getting older 
> revision (eg. _R1_).
> 2. If an Oak instance modifies a document, then reading from a secondary 
> shouldn't result in getting the old version (before modification).
> So, let's have two maps:
> * _M1_ the most recent document revision read from the Mongo for each cluster 
> id,
> * _M2_ the oldest last rev value for root document for each cluster id read 
> from all the secondary instances.
> Maintaining _M1_:
> For every read from the Mongo we'll check if the lastRev for some cluster id 
> is newer than _M1_ entry. If so, we'll update _M1_. For all writes we'll add 
> the saved revision id with the current cluster id in _M1_.
> Maintaining _M2_:
> It should be periodically updated. Such mechanism is already prepared in the 
> OAK-2106 patch.
> The method deciding whether we can read from the secondary instance should 
> compare two maps. If all entries in _M2_ are newer than _M1_ it means that 
> the secondary instances contains at least as new repository state as we 
> already accessed and therefore it's safe to read from secondary.
> Regarding the documents modified by the local Oak instance, we should 
> remember all the locally-modified paths and their revisions and use primary 
> Mongo to access them as long as the changes are not replicated to all the 
> secondaries. When the secondaries are up to date with the modification, we 
> can remove it from the local-changes collections.
> Attached image diagram.png presents the idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to