[
https://issues.apache.org/jira/browse/OAK-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Francesco Mari updated OAK-6915:
--------------------------------
Attachment: OAK-6915-diagnostics-02.patch
I'm not convinced about the third version of the patch and the usefulness of
{{SegmentId#unloaded}}.
In order to gather some data, I improved the diagnostics patch. The patch now
counts how many segments are written, to give a feeling for the number of
unique segments the {{FileStore}} has to work with. Additionally, every
statistic has been split into data and bulk segments. Since {{SegmentCache}}
behaves differently for data and bulk segments, it makes sense to gather
separate numbers.
I ran {{StandbyTestIT#testSyncLoop}} with the diagnostics patch in place
against the following variations.
|trunk|{noformat}TarMK data segment ID allocations: 54
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 99854
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 30
TarMK written bulk segments......: 2{noformat}|
|OAK-6915.patch|{noformat}TarMK data segment ID allocations: 56
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 101649
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 32
TarMK written bulk segments......: 2{noformat}|
|OAK-6915-02.patch|{noformat}TarMK data segment ID allocations: 38
TarMK bulk segment ID allocations: 2
TarMK uncached data segment reads: 33
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 29
TarMK written bulk segments......: 2{noformat}|
Even if we ignore the number of uncached data segment reads, the ratio between
data segment ID allocations and number of written data segments shows something
important: segment IDs are not as perfectly internalized as we thought. This
has two important consequences.
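To make the ratio concrete, here is a small Java sketch that divides the data segment ID allocations by the written data segments from the table above. The class and method names are made up for illustration. Reading a ratio of 1 as "one {{SegmentId}} instance per written segment" is my interpretation of the argument, not something the diagnostics patch reports.

```java
import java.util.Locale;

final class AllocationRatio {

    // Data segment ID allocations per written data segment.
    // Assumption: with perfectly unique IDs this would be close to 1.
    static double ratio(int allocations, int written) {
        return (double) allocations / written;
    }

    public static void main(String[] args) {
        // Figures copied from the table above.
        System.out.printf(Locale.ROOT, "trunk: %.2f%n", ratio(54, 30));
        System.out.printf(Locale.ROOT, "OAK-6915.patch: %.2f%n", ratio(56, 32));
        System.out.printf(Locale.ROOT, "OAK-6915-02.patch: %.2f%n", ratio(38, 29));
    }
}
```

For the numbers above this gives roughly 1.80 for trunk, 1.75 for OAK-6915.patch and 1.31 for OAK-6915-02.patch, so every variation allocates noticeably more data segment IDs than it writes data segments.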
First, regardless of whether we call {{SegmentId#unloaded}}, there will
probably be another {{SegmentId}} out there with the same MSB/LSB that still
holds a reference to the segment. Actually, as speculated in OAK-6919,
{{SegmentCache}} itself might be the culprit of this behaviour.
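The first consequence can be illustrated with a minimal Java sketch. The {{Id}} class below is a hypothetical stand-in, not the real {{SegmentId}}: it only models an identifier with an MSB/LSB pair and a mutable reference to a loaded segment. It assumes that two instances with the same MSB/LSB can exist at the same time, which is what the allocation numbers suggest.

```java
final class DuplicateIdDemo {

    // Hypothetical stand-in for SegmentId: identified by MSB/LSB,
    // holding a mutable reference to the loaded segment data.
    static final class Id {
        final long msb;
        final long lsb;
        private volatile Object segment;

        Id(long msb, long lsb) {
            this.msb = msb;
            this.lsb = lsb;
        }

        void loaded(Object segment) {
            this.segment = segment;
        }

        void unloaded() {
            this.segment = null;
        }

        boolean holdsSegment() {
            return segment != null;
        }
    }

    public static void main(String[] args) {
        Object data = new byte[16];

        // Same MSB/LSB, two distinct instances: the ID is not unique.
        Id first = new Id(1L, 2L);
        Id second = new Id(1L, 2L);
        first.loaded(data);
        second.loaded(data);

        // Clearing one instance does not affect the other.
        first.unloaded();
        System.out.println("first holds segment: " + first.holdsSegment());
        System.out.println("second holds segment: " + second.holdsSegment());
    }
}
```

In this sketch the second instance keeps the segment data reachable after {{unloaded()}} is called on the first, so clearing a single instance does not guarantee that the segment can be collected.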
Second, {{SegmentCache}} should not depend on {{SegmentId}} being perfectly
internalized. It is very easy to break this design assumption, and the
consequences of doing so are usually disastrous. It is, in my opinion, better
to implement a cache that does not rely on this assumption.
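One way to avoid the assumption is sketched below in Java: the cache is keyed by the MSB/LSB values rather than by a particular {{SegmentId}} instance, so two lookups for the same MSB/LSB hit the same entry no matter how many ID instances exist. This is only an illustration, not a proposal for the actual patch. {{ValueKeyedCache}} and {{Key}} are hypothetical names, the map is unbounded (no eviction or weighing), and a plain {{ConcurrentHashMap}} stands in for the Guava cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ValueKeyedCache<V> {

    // Hypothetical key compared by MSB/LSB value, never by object identity.
    static final class Key {
        final long msb;
        final long lsb;

        Key(long msb, long lsb) {
            this.msb = msb;
            this.lsb = lsb;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return msb == other.msb && lsb == other.lsb;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(msb) + Long.hashCode(lsb);
        }
    }

    private final Map<Key, V> entries = new ConcurrentHashMap<>();

    // The loader runs only when no entry exists for this MSB/LSB pair.
    V get(long msb, long lsb, Supplier<V> loader) {
        return entries.computeIfAbsent(new Key(msb, lsb), k -> loader.get());
    }

    public static void main(String[] args) {
        AtomicInteger uncachedReads = new AtomicInteger();
        ValueKeyedCache<String> cache = new ValueKeyedCache<>();
        Supplier<String> loader = () -> {
            uncachedReads.incrementAndGet();
            return "segment";
        };

        // Two lookups with equal MSB/LSB share one cache entry.
        cache.get(1L, 2L, loader);
        cache.get(1L, 2L, loader);
        System.out.println("uncached reads: " + uncachedReads.get());
    }
}
```

With a value-based key, the number of uncached reads depends only on how many distinct MSB/LSB pairs are requested, not on how many {{SegmentId}} instances happen to be allocated for them.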
> Minimize the amount of uncached segment reads
> ---------------------------------------------
>
> Key: OAK-6915
> URL: https://issues.apache.org/jira/browse/OAK-6915
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Francesco Mari
> Assignee: Francesco Mari
> Fix For: 1.8, 1.7.12
>
> Attachments: OAK-6915-01.patch, OAK-6915-02.patch,
> OAK-6915-diagnostics-02.patch, OAK-6915-diagnostics.patch, OAK-6915.patch
>
>
> The current implementation of {{SegmentCache}} should make better use of the
> underlying Guava cache by relying on the cached segments instead of
> unconditionally performing an uncached segment read via the
> {{Callable<Segment>}} passed to {{SegmentCache#getSegment}}.