[
https://issues.apache.org/jira/browse/OAK-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tomek Rękawek updated OAK-3865:
-------------------------------
Attachment: (was: OAK-3865.png)
> New strategy to optimize secondary reads
> ----------------------------------------
>
> Key: OAK-3865
> URL: https://issues.apache.org/jira/browse/OAK-3865
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: mongomk
> Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: diagram.png
>
>
> *Introduction*
> In the current trunk we'll only read document _D_ from the secondary instance
> if:
> (1) we have the parent _P_ of document _D_ cached and
> (2) the parent hasn't been modified in 6 hours.
> OAK-2106 tried to optimise (2) by estimating the lag using MongoDB replica
> stats. That was unreliable, so the second approach was to read the last
> revisions directly from each Mongo instance. If the modification date of _P_
> is before the last revisions on all secondary Mongos, then a secondary can be
> used.
> The main problem with this approach is that we still need to have _P_ in the
> cache. I think we need another way to optimise secondary reads, as right now
> only about 3% of requests go to the secondary, which is bad especially for
> the global-clustering case (Mongo and Oak instances across the globe). The
> optimisation provided in OAK-2106 doesn't make things much better and may
> introduce some consistency issues.
> *Proposal*
> I had the following constraints in mind when preparing this:
> 1. Let's assume we have a sequence of commits with revisions _R1_, _R2_ and
> _R3_ modifying nodes _N1_, _N2_ and _N3_. If we have already read _N1_ at
> revision _R2_, then reading from a secondary shouldn't return an older
> revision (e.g. _R1_).
> 2. If an Oak instance modifies a document, then reading from a secondary
> shouldn't return the old version (from before the modification).
> So, let's have two maps:
> * _M1_: the most recent document revision read from Mongo for each cluster
> id,
> * _M2_: the oldest lastRev value of the root document for each cluster id,
> read from all the secondary instances.
> Maintaining _M1_:
> For every read from Mongo we'll check whether the lastRev for some cluster id
> is newer than the _M1_ entry. If so, we'll update _M1_. For all writes we'll
> put the saved revision id under the current cluster id in _M1_.
> Maintaining _M2_:
> It should be updated periodically. Such a mechanism is already prepared in
> the OAK-2106 patch.
> The method deciding whether we can read from the secondary instance should
> compare the two maps. If all entries in _M2_ are newer than the corresponding
> entries in _M1_, it means that the secondary instances contain a repository
> state at least as new as the one we have already accessed, and therefore
> it's safe to read from a secondary.
> !OAK-3865.png!
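
To make the _M1_/_M2_ bookkeeping from the proposal concrete, here is a rough
Java sketch. It is not taken from any patch. The class and method names are
hypothetical, and revisions are simplified to plain long timestamps keyed by
cluster id, which is an assumption (a real Oak revision carries more than a
timestamp). The sketch also assumes that an _M2_ entry equal to the _M1_ entry
counts as safe, and that a cluster id missing from _M2_ counts as unsafe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposal; not Oak's actual API. */
class SecondaryReadSketch {

    // M1: newest revision this instance has read or written, per cluster id.
    private final Map<Integer, Long> m1 = new ConcurrentHashMap<>();

    // M2: oldest root-document lastRev per cluster id across all secondaries.
    private volatile Map<Integer, Long> m2 = Map.of();

    /** Call for each lastRev entry of a document read from Mongo. */
    void onRead(int clusterId, long lastRev) {
        // Keep only the newest value; older reads never move M1 backwards.
        m1.merge(clusterId, lastRev, Long::max);
    }

    /** Call after this instance saves a revision. */
    void onWrite(int localClusterId, long revision) {
        m1.merge(localClusterId, revision, Long::max);
    }

    /** Called by a periodic job with the values gathered from the secondaries. */
    void updateSecondaryRootRevs(Map<Integer, Long> oldestLastRevs) {
        m2 = Map.copyOf(oldestLastRevs);
    }

    /** True if every M1 entry is covered by an M2 entry that is at least as new. */
    boolean canReadFromSecondary() {
        for (Map.Entry<Integer, Long> seen : m1.entrySet()) {
            Long onSecondary = m2.get(seen.getKey());
            if (onSecondary == null || onSecondary < seen.getValue()) {
                return false; // some secondary may still lag behind what we saw
            }
        }
        return true;
    }
}
```

With this shape, a write by the local instance immediately blocks secondary
reads until the next _M2_ refresh shows that all secondaries have caught up,
which is what constraint 2 asks for.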
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)