[ https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Axel Hanikel updated OAK-7932: ------------------------------ Description: h1. Outline This issue documents some proof-of-concept work for adapting the segment tar nodestore to a distributed environment. The main idea is to adopt an actor-like model, meaning: - Communication between actors (services) is done exclusively via messages. - An actor (which could also be a thread) processes one message at a time, avoiding sharing state with other actors as far as possible. - Segments are kept in RAM and are written to external storage lazily only for disaster recovery. - As RAM is a very limited resource, different actors own their share of the total segment space. - An actor can also cache a few segments which it does not own but which it uses often (such as the one containing the root node) - The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment size would improve performance. - We could even use the segment solely as an addressing component and operate at the record level. That would avoid copying data around when collecting garbage: garbage records would just be evicted from RAM. h1. Implementation The first idea was to use ZeroMQ for communication because it seems to be a high-quality and easy to use implementation. A major drawback is that the library is written in C and the Java library which does the JNI stuff seems hard to set up and did not work for me. There is a native Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far, but I don't know about its performance yet. There is an attempt to use jeromq in the segment store in a very very very early stage at [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on the memory segment store and currently just replaces direct function calls for reading and writing segments with messages being sent and received. was: # Outline This issue documents some proof-of-concept work for adapting the segment tar nodestore to a distributed environment. The main idea is to adopt an actor-like model, meaning: - Communication between actors (services) is done exclusively via messages. - An actor (which could also be a thread) processes one message at a time, avoiding sharing state with other actors as far as possible. - Segments are kept in RAM and are written to external storage lazily only for disaster recovery. - As RAM is a very limited resource, different actors own their share of the total segment space. - An actor can also cache a few segments which it does not own but which it uses often (such as the one containing the root node) - The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment size would improve performance. - We could even use the segment solely as an addressing component and operate at the record level. That would avoid copying data around when collecting garbage: garbage records would just be evicted from RAM. # Implementation The first idea was to use ZeroMQ for communication because it seems to be a high-quality and easy to use implementation. A major drawback is that the library is written in C and the Java library which does the JNI stuff seems hard to set up and did not work for me. There is a native Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far, but I don't know about its performance yet. There is an attempt to use jeromq in the segment store in a very very very early stage at <https://github.com/ahanikel/jackrabbit-oak/tree/zeromq> . It is based on the memory segment store and currently just replaces direct function calls for reading and writing segments with messages being sent and received. > A distributed segment store for the cloud > ----------------------------------------- > > Key: OAK-7932 > URL: https://issues.apache.org/jira/browse/OAK-7932 > Project: Jackrabbit Oak > Issue Type: Wish > Components: segment-tar > Reporter: Axel Hanikel > Assignee: Axel Hanikel > Priority: Minor > > h1. Outline > This issue documents some proof-of-concept work for adapting the segment tar > nodestore to a > distributed environment. The main idea is to adopt an actor-like model, > meaning: > - Communication between actors (services) is done exclusively via messages. > - An actor (which could also be a thread) processes one message at a time, > avoiding sharing > state with other actors as far as possible. > - Segments are kept in RAM and are written to external storage lazily only > for disaster recovery. > - As RAM is a very limited resource, different actors own their share of > the total segment space. > - An actor can also cache a few segments which it does not own but which > it uses often (such as > the one containing the root node) > - The granularity of operating on whole segments may be too coarse, so > perhaps reducing the segment > size would improve performance. > - We could even use the segment solely as an addressing component and > operate at the record level. > That would avoid copying data around when collecting garbage: garbage > records would just be > evicted from RAM. > h1. Implementation > The first idea was to use ZeroMQ for communication because it seems to be a > high-quality and > easy to use implementation. A major drawback is that the library is written > in C and the Java > library which does the JNI stuff seems hard to set up and did not work for > me. There is a native > Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems > to work well so far, > but I don't know about its performance yet. > There is an attempt to use jeromq in the segment store in a very very very > early stage at > [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on the > memory segment store > and currently just replaces direct function calls for reading and writing > segments with messages being > sent and received. -- This message was sent by Atlassian JIRA (v7.6.3#76005)