[
https://issues.apache.org/jira/browse/OAK-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcel Reutegger reopened OAK-6921:
-----------------------------------
[~tomek.rekawek], {{GraphLoader}} does not have a license header. Can you
please fix this?
> Support pluggable segment storage
> ---------------------------------
>
> Key: OAK-6921
> URL: https://issues.apache.org/jira/browse/OAK-6921
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: segment-tar
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Priority: Major
> Fix For: 1.9.0, 1.10
>
> Attachments: OAK-6921.patch, current-state.png, new-interfaces.png
>
>
> h3. Rationale
> segment-tar, as the name suggests, stores segments in a set of tar
> archives inside the {{segmentstore}} directory on the local file system. In
> some cases, especially in cloud deployments, it may be useful to store the
> segments outside the local FS: remote storage such as Amazon S3, Azure Blob
> Storage or HDFS may be cheaper than a mounted disk, more scalable, easier
> to provision, etc.
> h3. Storing segments in tar files
> !current-state.png!
> Three classes are responsible for handling tar files in segment-tar:
> TarFiles, TarWriter and TarReader. TarFiles manages the {{segmentstore}}
> directory, scans it for .tar files and creates a TarReader for each one. It
> also creates a single TarWriter object, used to write (and also read) the
> most recent tar file.
> The TarWriter appends segments to the latest tar file and also serializes the
> auxiliary indexes: the segment index, the binary references index and the
> segment graph. It also takes care of synchronization, as we're dealing with a
> mutable data structure: a tar file opened in append mode.
> The TarReader not only reads the segments from the tar file, but is also
> responsible for the revision GC (mark & sweep methods) and for recovering
> data from files which haven't been closed cleanly (e.g. have no index).
> h3. New abstraction layer - SegmentArchiveManager
> !new-interfaces.png!
> In order to store segments somewhere other than tar files, it would be
> possible to create custom implementations of TarFiles, TarWriter and
> TarReader. However, such implementations would duplicate a lot of code not
> strictly related to persistence: mark(), sweep(), synchronization, etc.
> Instead, the attached patch takes a different approach: a new layer of
> abstraction is injected into TarFiles, TarWriter and TarReader. It only
> takes care of segment persistence and knows nothing about synchronization,
> GC, etc., leaving those to the upper layer.
> The new abstraction layer is modelled using 3 new classes:
> SegmentArchiveManager, SegmentArchiveReader and SegmentArchiveWriter. They
> are closely related to the existing Tar* classes and are used by them.
> SegmentArchiveManager provides a set of file system-style methods, like
> open(), create(), delete(), exists(), etc. open() and create() return
> instances of SegmentArchiveReader (SAReader) and SegmentArchiveWriter
> (SAWriter).
> SegmentArchiveReader, besides reading segments, can also load and parse
> the index, graph and binary references. The logic responsible for parsing
> these structures has already been extracted, so it doesn't need to be
> duplicated in the SAReader implementations. Also, SAReader needs to be aware
> of the index, since it contains the segment offsets.
> The SAWriter class allows writing and reading segments and also storing the
> indexes. It isn't thread-safe: it assumes that synchronization is
> already done in the higher layers.
> In the patch, I've moved the tar implementation to the new classes:
> SegmentTarManager, SegmentTarReader and SegmentTarWriter.
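The layering described in this section can be illustrated with a minimal sketch. This is not the code from the attached patch: the interfaces SimpleArchiveManager, SimpleArchiveReader and SimpleArchiveWriter are hypothetical stand-ins for SegmentArchiveManager, SegmentArchiveReader and SegmentArchiveWriter, and their method signatures, the in-memory backend, the SynchronizedAppender wrapper and the archive name are all assumptions made for illustration. It assumes create() hands out a writer and open() a reader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for SegmentArchiveReader; the signature is an assumption.
interface SimpleArchiveReader {
    byte[] readSegment(UUID id) throws IOException; // returns null if the segment is absent
}

// Hypothetical stand-in for SegmentArchiveWriter: persistence only, not thread-safe.
interface SimpleArchiveWriter {
    void writeSegment(UUID id, byte[] data) throws IOException;
    byte[] readSegment(UUID id) throws IOException;
    void close() throws IOException;
}

// Hypothetical stand-in for SegmentArchiveManager with file system-style methods.
interface SimpleArchiveManager {
    SimpleArchiveWriter create(String archiveName) throws IOException;
    SimpleArchiveReader open(String archiveName) throws IOException;
    boolean exists(String archiveName);
    boolean delete(String archiveName);
}

// Illustrative backend that keeps archives in memory instead of in tar files.
final class InMemoryArchiveManager implements SimpleArchiveManager {
    // archive name -> (segment id -> segment bytes); an archive becomes visible once closed
    private final Map<String, Map<UUID, byte[]>> archives = new LinkedHashMap<>();

    public SimpleArchiveWriter create(String archiveName) {
        Map<UUID, byte[]> segments = new LinkedHashMap<>();
        return new SimpleArchiveWriter() {
            public void writeSegment(UUID id, byte[] data) { segments.put(id, data.clone()); }
            public byte[] readSegment(UUID id) {
                byte[] data = segments.get(id);
                return data == null ? null : data.clone();
            }
            public void close() { archives.put(archiveName, segments); }
        };
    }

    public SimpleArchiveReader open(String archiveName) throws IOException {
        Map<UUID, byte[]> segments = archives.get(archiveName);
        if (segments == null) {
            throw new IOException("No such archive: " + archiveName);
        }
        return id -> {
            byte[] data = segments.get(id);
            return data == null ? null : data.clone();
        };
    }

    public boolean exists(String archiveName) { return archives.containsKey(archiveName); }

    public boolean delete(String archiveName) { return archives.remove(archiveName) != null; }
}

// Upper layer (the role TarWriter plays in the description): it owns the synchronization.
final class SynchronizedAppender {
    private final SimpleArchiveWriter writer;

    SynchronizedAppender(SimpleArchiveWriter writer) { this.writer = writer; }

    synchronized void append(UUID id, byte[] data) throws IOException { writer.writeSegment(id, data); }

    synchronized void close() throws IOException { writer.close(); }
}

public class PluggableStorageSketch {
    public static void main(String[] args) throws IOException {
        SimpleArchiveManager manager = new InMemoryArchiveManager();
        UUID id = UUID.randomUUID();

        // Write one segment through the synchronized upper layer, then close the archive.
        SynchronizedAppender appender = new SynchronizedAppender(manager.create("archive-1"));
        appender.append(id, "segment".getBytes(StandardCharsets.UTF_8));
        appender.close();

        // Read the segment back through the reader side of the same backend.
        SimpleArchiveReader reader = manager.open("archive-1");
        System.out.println(new String(reader.readSegment(id), StandardCharsets.UTF_8));
    }
}
```

Swapping InMemoryArchiveManager for a backend that talks to remote storage would leave SynchronizedAppender untouched, which is the point of keeping synchronization and GC out of the persistence layer.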
> h3. Other files
> Apart from the segments, the {{segmentstore}} directory also contains
> the following files:
> * repo.lock
> * journal.log
> * gc.log
> * manifest
> All these files are supported by the new SegmentNodeStorePersistence. Each
> is usually handled through a simple interface (RepositoryLock,
> JournalLogFile, etc.).
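As a rough illustration of the idea of putting the auxiliary files behind simple interfaces, here is a sketch under stated assumptions. The names LockHandle, JournalFile and AuxiliaryFiles are hypothetical and are not the RepositoryLock, JournalLogFile or SegmentNodeStorePersistence types from the patch; only the repo.lock and journal.log counterparts are modelled, and the in-memory backend is purely illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical handle returned when the repository lock is acquired.
interface LockHandle {
    void unlock() throws IOException;
}

// Hypothetical journal abstraction: an append-only list of lines.
interface JournalFile {
    void append(String line) throws IOException;
    List<String> readAll() throws IOException;
}

// Hypothetical entry point grouping the non-segment files of a store.
interface AuxiliaryFiles {
    LockHandle lockRepository() throws IOException; // counterpart of repo.lock
    JournalFile journal();                           // counterpart of journal.log
}

// Illustrative in-memory backend; a remote backend would implement the same interfaces.
final class InMemoryAuxiliaryFiles implements AuxiliaryFiles {
    private final AtomicBoolean locked = new AtomicBoolean();
    private final List<String> journalLines = new ArrayList<>();

    public LockHandle lockRepository() throws IOException {
        // Only one holder at a time, mirroring an exclusive lock file.
        if (!locked.compareAndSet(false, true)) {
            throw new IOException("Repository is already locked");
        }
        return () -> locked.set(false);
    }

    public JournalFile journal() {
        return new JournalFile() {
            public void append(String line) {
                synchronized (journalLines) { journalLines.add(line); }
            }
            public List<String> readAll() {
                synchronized (journalLines) { return new ArrayList<>(journalLines); }
            }
        };
    }
}

public class AuxiliaryFilesSketch {
    public static void main(String[] args) throws IOException {
        AuxiliaryFiles files = new InMemoryAuxiliaryFiles();
        LockHandle lock = files.lockRepository();
        files.journal().append("revision-1");
        System.out.println(files.journal().readAll());
        lock.unlock();
    }
}
```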
> h3. TODO
> * The names and package locations for all the affected classes are subject
> to change: after applying the patch, TarFiles doesn't deal with the .tar
> files anymore; similarly, TarReader and TarWriter delegate the low-level
> file access duties to SegmentArchiveReader and SegmentArchiveWriter. I
> didn't want to change the names yet, to make it easier to understand the
> patch and rebase it onto the trunk changes.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)