[
https://issues.apache.org/jira/browse/OAK-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Francesco Mari updated OAK-3107:
--------------------------------
Attachment: OAK-3107-01.patch
The attached patch implements the encoding proposed in the description of the
issue. I'm not sure this is the optimal solution to the problem, but it
shouldn't affect segments written with previous versions of the code, while
still making it possible to cope with large blob IDs.
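The proposed encoding can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class and method names are hypothetical, and it assumes a layout of one marker byte {{11110xxx}} (low three bits unused) followed by a 3-byte record number pointing at the string record that holds the blob ID.

```java
// Hypothetical sketch of the proposed long-blob-ID encoding:
// one marker byte 11110xxx (the three least significant bits unused),
// followed by three bytes holding the record number of the string
// record that contains the blob ID.
public class LongBlobIdEncoding {

    static final int MARKER = 0xF0;      // 11110 000
    static final int MARKER_MASK = 0xF8; // top five bits

    // Encode a record number (must fit in 3 bytes) into a 4-byte value.
    static byte[] encode(int recordNumber) {
        if ((recordNumber & ~0xFFFFFF) != 0) {
            throw new IllegalArgumentException("record number too large");
        }
        return new byte[] {
            (byte) MARKER,
            (byte) (recordNumber >> 16),
            (byte) (recordNumber >> 8),
            (byte) recordNumber
        };
    }

    // A value record starting with 11110... identifies the new encoding,
    // which keeps it distinguishable from the existing small/medium/long
    // value headers.
    static boolean isLongBlobId(byte head) {
        return (head & MARKER_MASK) == MARKER;
    }

    // Recover the record number of the string record from the value.
    static int decode(byte[] value) {
        return ((value[1] & 0xFF) << 16)
             | ((value[2] & 0xFF) << 8)
             |  (value[3] & 0xFF);
    }
}
```

Since the five-bit prefix {{11110}} is not produced by the existing value headers, {{SegmentBlob}} can use it to distinguish the new encoding from blobs written by older versions of the code.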
> SegmentWriter should be able to store blob IDs longer than 4096 bytes
> ---------------------------------------------------------------------
>
> Key: OAK-3107
> URL: https://issues.apache.org/jira/browse/OAK-3107
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Francesco Mari
> Attachments: OAK-3107-01.patch
>
>
> The {{SegmentWriter}} can only store blob IDs up to 4096 bytes long, but
> some implementations of {{BlobStore}} may return blob IDs of 4096 bytes or
> more.
> It should be possible to use a different encoding for long blob IDs. The
> blob ID should be written as a string (using {{SegmentWriter#writeString}}),
> and the resulting record ID embedded into a value record.
> The encoding in this case should be something like the following:
> {noformat}
> 11110 + 3bit + 3byte
> {noformat}
> where the three least significant bits of the first byte are unused, and
> the three trailing bytes store the record ID of the string representing the
> blob ID.
> This new encoding is necessary to maintain backwards compatibility with the
> current way of storing blob IDs and to give a way to {{SegmentBlob}} to
> recognise this new encoding.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)