[
https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962800#comment-16962800
]
Axel Hanikel commented on OAK-7932:
-----------------------------------
Added reference to second implementation to description.
> A distributed segment store for the cloud
> -----------------------------------------
>
> Key: OAK-7932
> URL: https://issues.apache.org/jira/browse/OAK-7932
> Project: Jackrabbit Oak
> Issue Type: Wish
> Components: segment-tar
> Reporter: Axel Hanikel
> Assignee: Axel Hanikel
> Priority: Minor
>
> h1. Outline
> This issue documents some proof-of-concept work for adapting the segment tar
> nodestore to a distributed environment. The main idea is to adopt an
> actor-like model, meaning:
> - Communication between actors (services) is done exclusively via messages.
> - An actor (which could also be a thread) processes one message at a time,
>   avoiding sharing state with other actors as far as possible.
> - Segments are kept in RAM and are written to external storage lazily only
>   for disaster recovery.
> - As RAM is a very limited resource, different actors own their share of
>   the total segment space.
> - An actor can also cache a few segments which it does not own but which
>   it uses often (such as the one containing the root node).
> - The granularity of operating on whole segments may be too coarse, so
>   perhaps reducing the segment size would improve performance.
> - We could even use the segment solely as an addressing component and
>   operate at the record level. That would avoid copying data around when
>   collecting garbage: garbage records would just be evicted from RAM.
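
The actor-like model outlined above could look roughly like the following minimal sketch. It uses only the JDK, not Oak or ZeroMQ. All names (SegmentActor, Message, ownerOf) are hypothetical, and the hash-based partitioning of the segment space is an assumption made for illustration; the issue does not say how ownership is assigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these names exist in Oak.
final class SegmentActor implements Runnable {

    /** The only way to interact with an actor is to send it a message. */
    static final class Message {
        final UUID segmentId;
        final byte[] data;                      // non-null means "write", null means "read"
        final CompletableFuture<byte[]> reply;  // completed by the actor thread

        Message(UUID segmentId, byte[] data) {
            this.segmentId = segmentId;
            this.data = data;
            this.reply = new CompletableFuture<>();
        }
    }

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();

    // Owned segments live in RAM and are touched only by the actor thread,
    // so no locking is needed.
    private final Map<UUID, byte[]> segments = new HashMap<>();

    /** Enqueues a message and returns the future that will carry the reply. */
    CompletableFuture<byte[]> send(Message m) {
        mailbox.add(m);
        return m.reply;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Message m = mailbox.take();     // processes one message at a time
                if (m.data != null) {
                    segments.put(m.segmentId, m.data);
                    m.reply.complete(m.data);
                } else {
                    m.reply.complete(segments.get(m.segmentId)); // null if unknown
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Picks the owning actor for a segment id (assumption: simple hash partitioning). */
    static SegmentActor ownerOf(UUID id, SegmentActor[] actors) {
        return actors[Math.floorMod(id.hashCode(), actors.length)];
    }

    public static void main(String[] args) {
        SegmentActor[] actors = { new SegmentActor(), new SegmentActor() };
        for (SegmentActor a : actors) {
            Thread t = new Thread(a);
            t.setDaemon(true);                  // lets the JVM exit when main returns
            t.start();
        }
        UUID id = UUID.randomUUID();
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        ownerOf(id, actors).send(new Message(id, payload)).join();
        byte[] back = ownerOf(id, actors).send(new Message(id, null)).join();
        System.out.println(new String(back, StandardCharsets.UTF_8)); // prints hello
    }
}
```

Lazy persistence for disaster recovery and caching of non-owned segments are left out here; both would fit as further message types handled in the same loop.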
> h1. Implementation
> The first idea was to use ZeroMQ for communication because it seems to be a
> high-quality and easy-to-use implementation. A major drawback is that the
> library is written in C, and the Java library that provides the JNI bindings
> seems hard to set up and did not work for me. There is a pure Java
> implementation of the ZeroMQ protocol, aptly called jeromq, which seems to
> work well so far, but I don't know about its performance yet.
> There is a very early-stage attempt to use jeromq in the segment store at
> [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on
> the memory segment store and currently just replaces direct function calls
> for reading and writing segments with messages being sent and received.
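
To illustrate what replacing direct calls with messages can mean, here is a rough sketch that is not taken from the linked branch. The interface SimpleSegmentStore, the class MessagingSegmentStore, and the frame layout (one op byte, 16 bytes of UUID, then the payload) are all assumptions. Two in-process queues stand in for a request/reply socket pair so that the example runs with the JDK alone.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical interface; Oak's real segment store API looks different.
interface SimpleSegmentStore {
    void writeSegment(UUID id, byte[] data);
    byte[] readSegment(UUID id);
}

final class MessagingSegmentStore implements SimpleSegmentStore {
    private static final byte READ = 0;
    private static final byte WRITE = 1;

    // Stand-ins for a request/reply socket pair (assumption: no real network).
    private final BlockingQueue<byte[]> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<byte[]> replies = new LinkedBlockingQueue<>();

    MessagingSegmentStore() {
        Thread server = new Thread(this::serve);
        server.setDaemon(true);
        server.start();
    }

    /** Server side: decodes each request frame and answers with one reply frame. */
    private void serve() {
        Map<UUID, byte[]> segments = new HashMap<>();
        try {
            while (true) {
                ByteBuffer req = ByteBuffer.wrap(requests.take());
                byte op = req.get();
                UUID id = new UUID(req.getLong(), req.getLong());
                if (op == WRITE) {
                    byte[] data = new byte[req.remaining()];
                    req.get(data);
                    segments.put(id, data);
                    replies.put(new byte[0]);   // empty frame as acknowledgement
                } else {
                    byte[] data = segments.get(id);
                    // Simplification: a missing segment and an empty one look the same.
                    replies.put(data == null ? new byte[0] : data);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Client side: encodes one request frame and blocks until the reply arrives. */
    private byte[] request(byte op, UUID id, byte[] payload) {
        ByteBuffer frame = ByteBuffer.allocate(1 + 16 + payload.length);
        frame.put(op)
             .putLong(id.getMostSignificantBits())
             .putLong(id.getLeastSignificantBits())
             .put(payload);
        try {
            requests.put(frame.array());
            return replies.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // synchronized keeps each request paired with its own reply.
    @Override
    public synchronized void writeSegment(UUID id, byte[] data) {
        request(WRITE, id, data);
    }

    @Override
    public synchronized byte[] readSegment(UUID id) {
        return request(READ, id, new byte[0]);
    }

    public static void main(String[] args) {
        SimpleSegmentStore store = new MessagingSegmentStore();
        UUID id = UUID.randomUUID();
        store.writeSegment(id, new byte[] {7, 8, 9});
        System.out.println(store.readSegment(id).length); // prints 3
    }
}
```

Callers see the same two methods as they would with a direct in-memory store; only the implementation behind them changes, which matches the drop-in replacement described above.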
> A second implementation, at
> [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq-nodestore], is a
> simple nodestore implementation which is kind of a dual to the segment store
> in the sense that it is on the other end of the compactness spectrum. The
> segment store is very dense and avoids duplication wherever possible. The
> nodestore in this implementation, however, is quite redundant: every
> nodestate gets its own UUID and is saved together with its properties,
> similar to the document node store. This redundancy wastes space, but on the
> other hand garbage collection (not yet implemented) is easier because there
> is no segment that needs to be rewritten to get rid of data that is no
> longer referenced; unreferenced nodes can just be deleted. This
> implementation still has bugs, but being much simpler than the segment
> store, it can eventually be used to experiment with different configurations
> and examine their performance.
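
The point about garbage collection can be sketched as a plain mark-and-sweep over UUID-keyed records. This is not the code from the linked branch: UuidNodeStore and NodeRecord are hypothetical names, and treating a single root as the only live entry point is an assumption (a real store would probably also have to keep older revisions or checkpoints alive).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of a redundant, UUID-per-nodestate store.
final class UuidNodeStore {

    /** One immutable node state: its properties plus child name to child UUID. */
    static final class NodeRecord {
        final Map<String, String> properties;
        final Map<String, UUID> children;

        NodeRecord(Map<String, String> properties, Map<String, UUID> children) {
            this.properties = new HashMap<>(properties);
            this.children = new HashMap<>(children);
        }
    }

    private final Map<UUID, NodeRecord> records = new HashMap<>();

    /** Stores a new node state under a fresh UUID and returns that UUID. */
    UUID put(Map<String, String> properties, Map<String, UUID> children) {
        UUID id = UUID.randomUUID();
        records.put(id, new NodeRecord(properties, children));
        return id;
    }

    NodeRecord get(UUID id) {
        return records.get(id);
    }

    int size() {
        return records.size();
    }

    /** Deletes every record not reachable from root; returns how many were deleted. */
    int collectGarbage(UUID root) {
        Set<UUID> live = new HashSet<>();
        Deque<UUID> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {               // mark phase
            UUID id = todo.pop();
            NodeRecord r = records.get(id);
            if (r != null && live.add(id)) {
                todo.addAll(r.children.values());
            }
        }
        int before = records.size();
        records.keySet().retainAll(live);       // sweep: nothing is rewritten, only dropped
        return before - records.size();
    }

    public static void main(String[] args) {
        UuidNodeStore store = new UuidNodeStore();
        UUID child = store.put(Collections.singletonMap("name", "a"),
                Collections.emptyMap());
        store.put(Collections.singletonMap("rev", "1"),
                Collections.singletonMap("a", child));      // old root, soon unreferenced
        UUID newRoot = store.put(Collections.singletonMap("rev", "2"),
                Collections.singletonMap("a", child));
        System.out.println(store.collectGarbage(newRoot));  // prints 1
    }
}
```

Because each record stands alone, the sweep never has to copy surviving data into a new container, which is the contrast with segment rewriting drawn above.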
>