[ https://issues.apache.org/jira/browse/OAK-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925759#comment-16925759 ]

Tomek Rękawek commented on OAK-8613:
------------------------------------

[~frm], [~dulceanu], [~ahanikel], [~ierandra] - this is a skunk works project 
that I've been spending my summer evenings on. It would be great if you could 
take a look at the design or the code and give some feedback. I haven't 
discussed it with anyone yet.

> Azure Segment Store clustering PoC
> ----------------------------------
>
>                 Key: OAK-8613
>                 URL: https://issues.apache.org/jira/browse/OAK-8613
>             Project: Jackrabbit Oak
>          Issue Type: Story
>          Components: segment-azure
>            Reporter: Tomek Rękawek
>            Priority: Major
>         Attachments: OAK-8613.patch, remote node store.png
>
>
> (This description is a work in progress.)
> Azure Segment Store offers a way to read the same segments concurrently in 
> many Oak instances. With a way to coordinate the writes, it's possible to 
> implement a distributed node store based on the SegmentNodeStore. The 
> solution consists of the following elements:
> * a central server that coordinates the writes to the shared repository,
> * a number of clients that can read directly from the shared repository; 
> they also have their own private repositories within the same cloud storage,
> * the shared repository, which represents the current state; it can be read 
> by anyone, but only the central server can write to it,
> * the private repositories; clients can write their own segments there and 
> then pass their references to the server.
> As described above, every client uses two repositories: the shared one (in 
> read-only mode) and the private one (in read-write mode). When a client wants 
> to read the current root, it asks the server for its revision. Then it looks 
> up the segment in the shared repository, reads it and creates a node state.
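> A minimal sketch of that read path in Java, just to illustrate the flow; the 
> names below ({{RevisionCoordinator}}, {{SegmentReader}}) are hypothetical 
> placeholders and the record id is simplified to a String - this is not the 
> API from the attached patch:
> {code:java}
> // Illustrative sketch only: RevisionCoordinator stands for the gRPC stub and
> // SegmentReader for read access to the shared (read-only) segment store.
> import org.apache.jackrabbit.oak.spi.state.NodeState;
> 
> interface RevisionCoordinator {
>     // record id of the current root, as answered by the central server
>     String getHeadRevision();
> }
> 
> interface SegmentReader {
>     // resolves a record id against the shared repository in the cloud storage
>     NodeState readNodeState(String recordId);
> }
> 
> class RemoteReadExample {
>     static NodeState readCurrentRoot(RevisionCoordinator server, SegmentReader sharedRepo) {
>         String head = server.getHeadRevision(); // only the revision id travels over gRPC
>         return sharedRepo.readNodeState(head);  // the segments themselves come from the storage
>     }
> }
> {code}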
> If the client wants to modify the repository, it converts the fetched node 
> state into a node builder. The applied changes are eventually serialized into 
> a new segment in the client's private repository. In order to merge its 
> changes, the client sends two revision ids: that of the base root (fetched 
> from the shared repo) and that of the current root (stored in the private 
> repo). The server checks whether the base root has been updated in the 
> meantime - in that case it requests a rebase. Otherwise, it reads the current 
> root from the private repository, applies the commit changes in the shared 
> repository and updates the journal.
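> The server-side decision could be sketched like this (again an illustration 
> under assumed names, not the code from the patch):
> {code:java}
> // Hypothetical sketch of the merge check described above; all names are
> // illustrative and do not come from the attached patch.
> class MergeCoordinator {
> 
>     // record id of the current root in the shared repository
>     private String sharedHead;
> 
>     MergeCoordinator(String initialHead) {
>         this.sharedHead = initialHead;
>     }
> 
>     // baseRevision: the root the client started from (shared repo)
>     // clientHead:   the client's new root (private repo)
>     // returns the new shared head, or null if the client has to rebase first
>     synchronized String merge(String baseRevision, String clientHead) {
>         if (!sharedHead.equals(baseRevision)) {
>             // the shared root moved on in the meantime: the client has to rebase
>             return null;
>         }
>         // read the client's root from its private repository, write the
>         // resulting segments to the shared repository and advance the journal
>         sharedHead = applyToSharedRepository(clientHead);
>         return sharedHead;
>     }
> 
>     private String applyToSharedRepository(String clientHead) {
>         throw new UnsupportedOperationException("details elided in this sketch");
>     }
> }
> {code}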
> gRPC is used for the communication between the server and the clients, but 
> only for coordination. All the data is actually exchanged via the segment 
> stores.
>  !remote node store.png|width=100%! 
> The attached [^OAK-8613.patch] contains the implementation split into 3 parts:
> * oak-store-remote-commons, which contains the gRPC descriptors of the used 
> services and embeds the required libraries,
> * oak-store-remote-server, which builds an executable JAR that starts the 
> server,
> * oak-store-remote-client, an OSGi bundle that starts a NodeStore connecting 
> to the configured server and Azure Storage.
> There are also some changes in oak-segment-tar: new SPIs that allow reading 
> segments with their revisions (record ids) and expose the revision in the 
> node state.
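> For illustration only, the shape of such an SPI might look roughly like this 
> (hypothetical names, with the record id again simplified to a String; these 
> are not the actual interfaces from the patch):
> {code:java}
> // Hypothetical shape of the revision-aware read SPI mentioned above.
> import org.apache.jackrabbit.oak.spi.state.NodeState;
> 
> interface RevisionedSegmentReader {
>     // resolves a record id to the node state it points to
>     NodeState readNodeState(String recordId);
> }
> 
> interface RevisionAware {
>     // the record id backing this node state, so it can be sent to the server
>     String getRevision();
> }
> {code}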
> gRPC uses Guava 26. I was somehow able to get it running alongside the other 
> Oak bundles, which use Guava 15, but if we want to productize it, we'd need 
> to update Oak's Guava.
> There's a new fixture that tests the implementation. It can be run with:
> {noformat}
> mvn clean install -f oak-it/pom.xml -Dnsfixtures=REMOTE -Dtest=NodeStoreTest 
> -Dtest.opts=-Xmx4g
> {noformat}
> This is a prototype. It lacks tests and important resilience features:
> * unit tests, especially for the discovery lite implementation,
> * server resilience and support for disconnecting clients in a clean way 
> (e.g. unregistering node observers),
> * client resilience, with support for reconnecting,
> * cleaning up resources on both sides on disconnection (e.g. removing the 
> private repository).
> Potential improvements:
> * can we have multiple replicas of the server, in active-passive mode, to 
> increase resilience?



