[
https://issues.apache.org/jira/browse/OAK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024111#comment-15024111
]
Michael Dürig commented on OAK-1828:
------------------------------------
After the overly zealous synchronisation in {{SegmentWriter}} kept causing
deadlocks (OAK-2560, OAK-3179, OAK-3264 to name a few), I went ahead with a
[refactoring|https://github.com/mduerig/jackrabbit-oak/tree/OAK-1828].
Furthermore the work on OAK-3348 should also benefit from this as it opens ways
for segregating commits on different gc generations. I will follow up on
OAK-3348 re. this aspect when I get there.
The gist of the refactoring is [this
commit|https://github.com/mduerig/jackrabbit-oak/commit/e6c71ac2cf3524a6078a67b0bad0da430edfa378].
While superficially there are a lot of lines touched, the approach is quite
simple: instead of using a single shared writer to write segments, I use a pool
of such. This moves coordination from the writer itself to the pool.
Specifically: {{SegmentWriter}} itself does not maintain the state of a segment
any more. Instead it maintains a pool of {{SegmentBuilder}} instances, which
themselves each maintain the state of a segment. {{SegmentBuilder}} instances
are not thread safe themselves. The general patter now is: {{SegmentWriter}}
acquires a {{SegmentBuilder}} from the pool. This requires coordination, but no
"outside" calls are being made while holding a lock. Once granted, the
{{SegmentBuilder}} is owned exclusively by the {{SegmentWriter}} and can be
used without further synchronisation to write any content to the underlying
segment. Once done it is returned to the pool.
(I don't like the name {{SegmentBuilder}} btw. and would prefer
{{SegmentWriter}}. The latter rather writes records and should thus be called
{{RecordWriter}}. However such a renaming would most likely cause a lot of
confusion going forward. Maybe we can just find a better name for
{{SegmentBuilder}}.
My GitHub branch has a couple of [follow up
commits|https://github.com/mduerig/jackrabbit-oak/commits/OAK-1828] to the
above one. This is some fall out from the refactoring, not directly related to
this issue though. Nevertheless I think it improves the code and should be
includes.
[~alexparvulescu], [~frm], please have a look.
> Improved SegmentWriter
> ----------------------
>
> Key: OAK-1828
> URL: https://issues.apache.org/jira/browse/OAK-1828
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: segmentmk
> Reporter: Jukka Zitting
> Assignee: Alex Parvulescu
> Priority: Minor
> Labels: technical_debt
> Fix For: 1.3.12
>
>
> At about 1kLOC and dozens of methods, the SegmentWriter class currently a bit
> too complex for one of the key components of the TarMK. It also uses a
> somewhat non-obvious mix of synchronized and unsynchronized code to
> coordinate multiple concurrent threads that may be writing content at the
> same time. The synchronization blocks are also broader than what really would
> be needed, which in some cases causes unnecessary lock contention in
> concurrent write loads.
> To improve the readability and maintainability of the code, and to increase
> performance of concurrent writes, it would be useful to split part of the
> SegmentWriter functionality to a separate RecordWriter class that would be
> responsible for writing individual records into a segment. The
> SegmentWriter.prepare() method would return a new RecordWriter instance, and
> the higher-level SegmentWriter methods would use the returned instance for
> all the work that's currently guarded in synchronization blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)