[ 
https://issues.apache.org/jira/browse/OAK-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031154#comment-16031154
 ] 

Francesco Mari commented on OAK-4582:
-------------------------------------

The first part of the refactoring is now open for review on GitHub. This is 
what I've done so far:

* {{RawRecordWriter}} and {{RawRecordReader}} have been created to serialize 
records into, and deserialize records (or parts of a record) from, anything 
that implements a very small set of interfaces. These interfaces abstract the 
primitive operations that we need to write records into segments.
* The interfaces required by {{RawRecordReader}} and {{RawRecordWriter}} are 
currently implemented by {{RecordReader}} and {{SegmentBufferWriter}}. The code 
already uses the new serialization/deserialization logic, but the way segments 
are represented hasn't changed yet (see below).
* Many tests have been added to verify the serialization/deserialization of 
records at the finest possible granularity. If a single bit flipped somewhere, 
these tests would pinpoint which kind of record, and which field of that 
record, is affected. Moreover, no file system is required to run these tests.
* {{SegmentReader}} and {{SegmentWriter}} have been introduced. These classes 
represent a read-only and a read-write segment respectively, but they are not 
yet used in production code. Enough tests have been written, though, to show 
that their behaviour is identical to that of the classes that currently 
serialize/deserialize segments.
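To illustrate the idea of serializing records against a minimal set of primitive operations, here is a small Java sketch. All names ({{ByteSink}}, {{ByteSource}}, {{InMemoryBytes}}, {{ToyRawRecordWriter}}, {{ToyRawRecordReader}}) and the toy record layout (a 4-byte big-endian length followed by a UTF-8 payload) are hypothetical; they are not the interfaces or record formats in the pull request.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical primitive write operation; NOT the actual interface in the PR.
interface ByteSink {
    void writeByte(int position, byte value);
}

// Hypothetical primitive read operation; NOT the actual interface in the PR.
interface ByteSource {
    byte readByte(int position);
}

// In-memory implementation of both interfaces, so tests need no file system.
final class InMemoryBytes implements ByteSink, ByteSource {
    private final byte[] data;

    InMemoryBytes(int size) { data = new byte[size]; }

    @Override public void writeByte(int position, byte value) { data[position] = value; }

    @Override public byte readByte(int position) { return data[position]; }

    // Flips a single bit, to simulate corruption in tests.
    void flipBit(int position, int bit) { data[position] ^= (byte) (1 << bit); }
}

// Serializes a toy "string record": 4-byte big-endian length, then UTF-8 bytes.
final class ToyRawRecordWriter {
    static int writeString(ByteSink sink, int position, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int length = bytes.length;
        for (int i = 0; i < 4; i++) {
            sink.writeByte(position + i, (byte) (length >>> (24 - 8 * i)));
        }
        for (int i = 0; i < length; i++) {
            sink.writeByte(position + 4 + i, bytes[i]);
        }
        return 4 + length; // number of bytes written
    }
}

// Deserializes the whole toy record, or just its length field.
final class ToyRawRecordReader {
    static int readLength(ByteSource source, int position) {
        int length = 0;
        for (int i = 0; i < 4; i++) {
            length = (length << 8) | (source.readByte(position + i) & 0xFF);
        }
        return length;
    }

    static String readString(ByteSource source, int position) {
        int length = readLength(source, position);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = source.readByte(position + 4 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the writer and reader only see {{ByteSink}} and {{ByteSource}}, the same code could run over an in-memory buffer in a unit test or over a segment in production. For example, after writing "oak" at position 0, flipping bit 0 of byte 4 makes the payload read as "nak" while the length field still reads as 3, which is the kind of field-level pinpointing described above.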

There is still more to do:

* {{SegmentReader}} and {{SegmentWriter}} need to be used as the lower-level 
abstraction to read and write segments. This should be easy, since most of the 
code uses {{RawRecordWriter}} and {{RawRecordReader}} directly, and these work 
seamlessly on top of {{SegmentReader}} and {{SegmentWriter}}.
* The rest of the production code needs to be modified to use 
{{SegmentAccess}} whenever possible, so that whether a read-only or a 
read-write segment is in use stays hidden. This point requires a bit of 
thinking, and at the moment I'm not sure how to tackle it or how invasive the 
change will be.
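A minimal Java sketch of callers written against a single read-side abstraction, without knowing whether the segment is read-only or read-write. {{ReadableSegment}}, {{ReadOnlySegment}}, {{ReadWriteSegment}} and {{SegmentReads}} are hypothetical names standing in for the role described for {{SegmentAccess}} and for the split proposed in the issue; the real classes may look quite different.

```java
import java.util.Arrays;

// Hypothetical read-side view; stands in for the role assigned to SegmentAccess.
interface ReadableSegment {
    int size();

    byte readByte(int position);
}

// Immutable segment, e.g. one parsed from bytes read from disk.
final class ReadOnlySegment implements ReadableSegment {
    private final byte[] data;

    // Copies the first `length` bytes, so later changes to `data` are not visible.
    ReadOnlySegment(byte[] data, int length) { this.data = Arrays.copyOf(data, length); }

    @Override public int size() { return data.length; }

    @Override public byte readByte(int position) { return data[position]; }
}

// Mutable segment that buffers records until it is full and ready to be flushed.
final class ReadWriteSegment implements ReadableSegment {
    private final byte[] buffer;
    private int length = 0;

    ReadWriteSegment(int capacity) { buffer = new byte[capacity]; }

    boolean hasRoomFor(int n) { return buffer.length - length >= n; }

    void append(byte... bytes) {
        if (!hasRoomFor(bytes.length)) {
            throw new IllegalStateException("segment full");
        }
        System.arraycopy(bytes, 0, buffer, length, bytes.length);
        length += bytes.length;
    }

    @Override public int size() { return length; }

    // Only the bytes written so far are readable, not the whole capacity.
    @Override public byte readByte(int position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return buffer[position];
    }

    // Freezes the written bytes into a read-only segment, e.g. when flushing.
    ReadOnlySegment toReadOnly() { return new ReadOnlySegment(buffer, length); }
}

// Callers depend only on ReadableSegment and cannot tell which implementation they get.
final class SegmentReads {
    static int readInt(ReadableSegment segment, int position) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (segment.readByte(position + i) & 0xFF);
        }
        return value;
    }
}
```

In this sketch only {{ReadWriteSegment}} exposes mutation, and {{toReadOnly()}} freezes its contents, so code that only needs to read, like {{SegmentReads.readInt}}, accepts either implementation unchanged.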

[~mduerig], in the meantime it would be great if you could go through my 
changes. I'll be happy to clarify things either here or on GitHub.

> Split Segment in a read-only and a read-write implementations
> -------------------------------------------------------------
>
>                 Key: OAK-4582
>                 URL: https://issues.apache.org/jira/browse/OAK-4582
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>              Labels: technical_debt
>             Fix For: 1.8
>
>         Attachments: benchmark-01.png, benchmark-01.txt
>
>
> {{Segment}} is central to the working of the Segment Store, but it currently 
> serves two purposes:
> # It is a temporary storage location for the segment currently being 
> written, waiting to become full and be flushed to disk.
> # It is a way to parse serialized segments read from disk.
> To distinguish these two use cases, I suggest promoting {{Segment}} to the 
> status of an interface, and creating two different implementations for 
> read-only and read-write segments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
