[ 
https://issues.apache.org/jira/browse/OAK-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4104:
-------------------------------
    Description: 
We should refactor how records (e.g. node states) are read from segments. 
Currently this logic is scattered and duplicated across various places, all of 
which hard-code certain indexes into a byte buffer (see calls to 
{{Record.getOffset}} for how bad this is). 
The current implementation makes it very hard to maintain the code and evolve 
the segment format. Ideally we should have one place per segment version 
that defines the format as a single source of truth, which is then reused by 
the various parts of the SegmentMK, tooling and tests. 
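
As a rough illustration of the "single source of truth" idea, the sketch below 
keeps all offsets and sizes for one format version in a single enum, so readers 
and writers never hard-code buffer indexes themselves. Everything in it is 
hypothetical: the names {{DemoRecordFieldV1}} and {{DemoRecordReaderV1}}, the 
three fields and their offsets are invented for the example and do not 
reflect the actual segment format or any existing Oak API. 

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical layout of one record type for one format version.
 * Field names, offsets and sizes are made up for illustration only;
 * they are NOT the real Oak segment format.
 */
enum DemoRecordFieldV1 {
    TYPE(0, Byte.BYTES),
    CHILD_COUNT(1, Integer.BYTES),
    TEMPLATE_ID(5, Long.BYTES);

    /** Total size of the record, derived from the last field. */
    static final int RECORD_SIZE = TEMPLATE_ID.offset + TEMPLATE_ID.size;

    final int offset;
    final int size;

    DemoRecordFieldV1(int offset, int size) {
        this.offset = offset;
        this.size = size;
    }

    /** Absolute buffer position of this field for a record at recordStart. */
    int position(int recordStart) {
        return recordStart + offset;
    }
}

/** Reader and writer that rely only on the layout defined above. */
final class DemoRecordReaderV1 {

    private DemoRecordReaderV1() {
    }

    static byte type(ByteBuffer segment, int recordStart) {
        return segment.get(DemoRecordFieldV1.TYPE.position(recordStart));
    }

    static int childCount(ByteBuffer segment, int recordStart) {
        return segment.getInt(DemoRecordFieldV1.CHILD_COUNT.position(recordStart));
    }

    static long templateId(ByteBuffer segment, int recordStart) {
        return segment.getLong(DemoRecordFieldV1.TEMPLATE_ID.position(recordStart));
    }

    /** Writes a record using the same layout, e.g. for tooling and tests. */
    static void write(ByteBuffer segment, int recordStart,
                      byte type, int childCount, long templateId) {
        segment.put(DemoRecordFieldV1.TYPE.position(recordStart), type);
        segment.putInt(DemoRecordFieldV1.CHILD_COUNT.position(recordStart), childCount);
        segment.putLong(DemoRecordFieldV1.TEMPLATE_ID.position(recordStart), templateId);
    }
}
```

With a layout like this, a new format version would get its own enum, and the 
SegmentMK, tooling and tests would all go through the same accessors. 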

We should also evaluate third-party data serialisation libraries, which could 
make our lives easier. The focus should be on ease of use, separation of 
concerns (schema vs. implementation), compactness of format, efficient 
encoding and decoding, and support for schema evolution. Possible candidates 
include [Protocol Buffers|https://developers.google.com/protocol-buffers/] and 
[Apache Avro|http://avro.apache.org/]. 

  was:
We should refactor how records (e.g. node states) are read from segments. 
Currently this logic is scattered and duplicated across various places, all of 
which hard-code certain indexes into a byte buffer (see calls to 
{{Record.getOffset}} for how bad this is). 
The current implementation makes it very hard to maintain the code and evolve 
the segment format. Ideally we should have one place per segment version 
that defines the format as a single source of truth, which is then reused by 
the various parts of the SegmentMK, tooling and tests. 


> Refactor reading records from segments
> --------------------------------------
>
>                 Key: OAK-4104
>                 URL: https://issues.apache.org/jira/browse/OAK-4104
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: segment-tar
>            Reporter: Michael Dürig
>              Labels: technical_debt
>             Fix For: 1.8
>
>
> We should refactor how records (e.g. node states) are read from segments. 
> Currently this logic is scattered and duplicated across various places, all 
> of which hard-code certain indexes into a byte buffer (see calls to 
> {{Record.getOffset}} for how bad this is). 
> The current implementation makes it very hard to maintain the code and evolve 
> the segment format. Ideally we should have one place per segment version 
> that defines the format as a single source of truth, which is then reused by 
> the various parts of the SegmentMK, tooling and tests. 
> We should also evaluate third-party data serialisation libraries, which could 
> make our lives easier. The focus should be on ease of use, separation of 
> concerns (schema vs. implementation), compactness of format, efficient 
> encoding and decoding, and support for schema evolution. Possible candidates 
> include [Protocol Buffers|https://developers.google.com/protocol-buffers/] and 
> [Apache Avro|http://avro.apache.org/]. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
