[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli reassigned OAK-2844:
--------------------------------

    Assignee: Stefan Egli

> Introducing a simple document-based discovery-light service (to circumvent 
> documentMk's eventual consistency delays)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2844
>                 URL: https://issues.apache.org/jira/browse/OAK-2844
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: resilience
>             Fix For: 1.4
>
>         Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, 
> OAK-2844.patch, OAK-2844.v3.patch
>
>
> When running discovery.impl on a mongoMk-backed JCR repository, there is a
> risk of hitting problems such as those described in "SLING-3432
> pseudo-network-partitioning". This happens when a JCR-level heartbeat does
> not reach its peers within the configured heartbeat timeout. The peers then
> treat the affected instance as dead, remove it from the topology and continue
> with the remaining instances, potentially electing a new leader and thus
> running the risk of duplicate leaders. This happens when delays in mongoMk
> grow larger than the (configured) heartbeat timeout. These problems are
> ultimately due to the eventually consistent nature not only of MongoDB but,
> even more so, of mongoMk. The only alternative so far is to increase the
> heartbeat timeout to match the delays that mongoMk is expected or measured to
> produce (under given load/performance scenarios, say).
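To make this failure mode concrete, here is a minimal sketch. It assumes that a peer judges liveness solely by the age of the newest heartbeat it can read, and that each heartbeat becomes readable only after a fixed storage delay. The class, the methods, the heartbeat interval and the timeout value are all hypothetical and are not taken from discovery.impl:

```java
// Hypothetical model of a peer-side liveness check; not discovery.impl's code.
class HeartbeatTimeoutDemo {

    /** Timestamp of the newest heartbeat a peer can read, given the storage delay. */
    static long newestVisibleHeartbeat(long nowMillis, long intervalMillis, long delayMillis) {
        long visibleUpTo = nowMillis - delayMillis;              // writes newer than this are not visible yet
        return (visibleUpTo / intervalMillis) * intervalMillis;  // heartbeats are written at multiples of the interval
    }

    /** The peer treats the instance as alive only if the visible heartbeat is recent enough. */
    static boolean isConsideredAlive(long visibleHeartbeatMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - visibleHeartbeatMillis <= timeoutMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;    // current time on the observing peer (ms)
        long interval = 10_000L;  // assumed heartbeat interval
        long timeout = 30_000L;   // assumed configured heartbeat timeout
        for (long delay : new long[] {2_000L, 45_000L}) {
            long visible = newestVisibleHeartbeat(now, interval, delay);
            System.out.println("delay " + delay + " ms -> considered alive: "
                    + isConsideredAlive(visible, now, timeout));
        }
        // prints:
        // delay 2000 ms -> considered alive: true
        // delay 45000 ms -> considered alive: false
    }
}
```

The instance keeps writing heartbeats every 10 s in both runs. With the 45 s delay, however, its newest visible heartbeat is 50 s old, so the peer declares a live instance dead.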
> Assuming that mongoMk will always carry a risk of some delay, and that a
> reasonable maximum delay (reasonable for the discovery.impl timeout, that is)
> cannot be guaranteed, a better solution is to provide discovery with more
> 'real-time'-like information and/or privileged access to MongoDB.
> Here is a summary of the alternatives proposed so far to circumvent eventual
> consistency:
>  # expose existing (JMX) information about active 'clusterIds' - this has
> been proposed in SLING-4603. Pro: reuses existing functionality. Cons: goes
> via JMX, and binds the exposed functionality as a 'to be maintained' API
>  # expose a plain MongoDB db/collection (via OSGi injection) so that a
> higher-level (Sling) discovery could write heartbeats there directly. Pro:
> heartbeat latency would be minimal (assuming the collection is not sharded).
> Con: the db/collection is potentially exposed to anyone else as well, opening
> it up to unwanted uses
>  # introduce a simple 'discovery-light' API to Oak which solely provides
> information about which instances are active in a cluster; its implementation
> is not exposed. Pros: no need to expose a MongoDB db/collection, and all
> other JMX functionality can remain unchanged. Con: a new API that must be
> maintained
> This ticket is about the third option: a new mongo-based discovery-light
> service introduced to Oak. In short, its functionality is:
>  * it defines a 'local instance id' that is not persisted, i.e. it can change
> on each bundle activation.
>  * it defines a 'view id' that uniquely identifies a particular incarnation 
> of a 'cluster view/state' (which is: a list of active instance ids)
>  * and it defines a list of active instance ids
>  * the above attributes are passed to interested components via a listener
> that can be registered. That listener is called whenever the discovery-light
> service notices that the cluster view has changed.
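A minimal Java sketch of how such a listener-based API could look. All names here (ClusterView, ClusterViewListener, DiscoveryLite, updateActiveInstances) are hypothetical and are not taken from the attached InstanceStateChangeListener.java or the patches. The view id is modeled as a random UUID that is generated only when the set of active instance ids changes:

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;

// Demo entry point; every type below is a hypothetical sketch, not Oak API.
class DiscoveryLiteDemo {
    public static void main(String[] args) {
        DiscoveryLite discovery = new DiscoveryLite();
        discovery.register(view -> System.out.println(
                "view " + view.viewId() + " active=" + view.activeInstanceIds()));
        discovery.updateActiveInstances(Set.of("a", "b"));  // first view -> listener fires
        discovery.updateActiveInstances(Set.of("b", "a"));  // same set -> no event
        discovery.updateActiveInstances(Set.of("a"));       // set changed -> listener fires
    }
}

/** Immutable snapshot of one incarnation of the cluster view. */
final class ClusterView {
    private final String localInstanceId;
    private final String viewId;
    private final Set<String> activeInstanceIds;

    ClusterView(String localInstanceId, String viewId, Set<String> activeInstanceIds) {
        this.localInstanceId = localInstanceId;
        this.viewId = viewId;
        this.activeInstanceIds = activeInstanceIds;
    }

    String localInstanceId() { return localInstanceId; }
    String viewId() { return viewId; }
    Set<String> activeInstanceIds() { return activeInstanceIds; }
}

/** Callback invoked whenever the service notices that the cluster view has changed. */
interface ClusterViewListener {
    void viewChanged(ClusterView view);
}

/** Minimal in-memory stand-in for a discovery-light service. */
class DiscoveryLite {
    // Not persisted: a fresh id per service instance (i.e. per bundle activation).
    private final String localInstanceId = UUID.randomUUID().toString();
    private final List<ClusterViewListener> listeners = new CopyOnWriteArrayList<>();
    private ClusterView current;  // guarded by this

    String localInstanceId() { return localInstanceId; }

    void register(ClusterViewListener listener) { listeners.add(listener); }

    /** Publishes a new view, with a new view id, only if the active set changed. */
    void updateActiveInstances(Set<String> active) {
        Set<String> snapshot = Collections.unmodifiableSet(new TreeSet<>(active));
        ClusterView changed;
        synchronized (this) {
            if (current != null && current.activeInstanceIds().equals(snapshot)) {
                return;  // unchanged: keep the current view id
            }
            current = new ClusterView(localInstanceId, UUID.randomUUID().toString(), snapshot);
            changed = current;
        }
        for (ClusterViewListener listener : listeners) {
            listener.viewChanged(changed);  // notify outside the lock
        }
    }
}
```

Listeners are notified outside the lock so that a slow listener cannot block view updates. A production version would also need to guarantee that views are delivered in order.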
> While the actual implementation could in fact be based on the existing
> {{getActiveClusterNodes()}} and {{getClusterId()}} of the
> {{DocumentNodeStoreMBean}}, the suggestion is not to touch that part, as
> other logic depends on it. Instead, the suggestion is to create a separate,
> dedicated collection ('discovery') in which both the heartbeats and the
> currentView are stored.
> Will attach a suggestion for an initial version of this for review.
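As a rough illustration of the suggested 'discovery' collection, the sketch below models it in memory as a map from instance id to last heartbeat time, plus a stored current view. This stand-in rests on several assumptions: no MongoDB driver is used, and the class names, the 30 s timeout and the counter-based view id are all hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Demo entry point; all names and values are hypothetical.
class DiscoveryCollectionDemo {
    public static void main(String[] args) {
        DiscoveryCollection collection = new DiscoveryCollection(30_000L);  // assumed 30 s timeout
        collection.heartbeat("a", 0L);
        collection.heartbeat("b", 0L);
        long first = collection.refreshView(10_000L);
        System.out.println("viewId=" + first + " active=" + collection.activeInstances());
        // prints: viewId=1 active=[a, b]

        collection.heartbeat("a", 35_000L);  // only "a" keeps sending heartbeats
        long second = collection.refreshView(40_000L);
        System.out.println("viewId=" + second + " active=" + collection.activeInstances());
        // prints: viewId=2 active=[a]
    }
}

/** In-memory stand-in for a dedicated 'discovery' collection (no MongoDB involved). */
class DiscoveryCollection {
    private final long timeoutMillis;
    private final Map<String, Long> heartbeats = new HashMap<>();  // instance id -> last heartbeat (ms)
    private Set<String> currentActive = new TreeSet<>();             // active ids of the stored currentView
    private long currentViewId = 0;                                   // stored view id

    DiscoveryCollection(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Records a heartbeat written by the given instance. */
    synchronized void heartbeat(String instanceId, long nowMillis) {
        heartbeats.put(instanceId, nowMillis);
    }

    /** Recomputes the active set and bumps the view id only if that set changed. */
    synchronized long refreshView(long nowMillis) {
        Set<String> active = new TreeSet<>();
        for (Map.Entry<String, Long> entry : heartbeats.entrySet()) {
            if (nowMillis - entry.getValue() <= timeoutMillis) {
                active.add(entry.getKey());
            }
        }
        if (!active.equals(currentActive)) {
            currentActive = active;
            currentViewId++;
        }
        return currentViewId;
    }

    /** Sorted copy of the active instance ids in the current view. */
    synchronized Set<String> activeInstances() {
        return new TreeSet<>(currentActive);
    }
}
```

In a real shared collection, several instances could recompute the view concurrently. Bumping the view id would then presumably require an atomic conditional update on the stored currentView, rather than the local synchronized counter used here.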



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
