[ 
https://issues.apache.org/jira/browse/OAK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024111#comment-15024111
 ] 

Michael Dürig commented on OAK-1828:
------------------------------------

After the overly zealous synchronisation in {{SegmentWriter}} kept causing 
deadlocks (OAK-2560, OAK-3179, OAK-3264 to name a few), I went ahead with a 
[refactoring|https://github.com/mduerig/jackrabbit-oak/tree/OAK-1828]. 
Furthermore the work on OAK-3348 should also benefit from this as it opens ways 
for segregating commits on different gc generations. I will follow up on 
OAK-3348 re. this aspect when I get there. 

The gist of the refactoring is [this 
commit|https://github.com/mduerig/jackrabbit-oak/commit/e6c71ac2cf3524a6078a67b0bad0da430edfa378].
 While superficially there are a lot of lines touched, the approach is quite 
simple: instead of using a single shared writer to write segments, I use a pool 
of such. This moves coordination from the writer itself to the pool. 
Specifically: {{SegmentWriter}} itself does not maintain the state of a segment 
any more. Instead it maintains a pool of {{SegmentBuilder}} instances, which 
themselves each maintain the state of a segment. {{SegmentBuilder}} instances 
are not thread safe themselves. The general patter now is: {{SegmentWriter}} 
acquires a {{SegmentBuilder}} from the pool. This requires coordination, but no 
"outside" calls are being made while holding a lock. Once granted, the 
{{SegmentBuilder}} is owned exclusively by the {{SegmentWriter}} and can be 
used without further synchronisation to write any content to the underlying 
segment. Once done it is returned to the pool. 
(I don't like the name {{SegmentBuilder}} btw. and would prefer 
{{SegmentWriter}}. The latter rather writes records and should thus be called 
{{RecordWriter}}. However such a renaming would most likely cause a lot of 
confusion going forward. Maybe we can just find a better name for 
{{SegmentBuilder}}.

My GitHub branch has a couple of [follow up 
commits|https://github.com/mduerig/jackrabbit-oak/commits/OAK-1828] to the 
above one. This is some fall out from the refactoring, not directly related to 
this issue though. Nevertheless I think it improves the code and should be 
includes. 

[~alexparvulescu], [~frm], please have a look. 



> Improved SegmentWriter
> ----------------------
>
>                 Key: OAK-1828
>                 URL: https://issues.apache.org/jira/browse/OAK-1828
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: segmentmk
>            Reporter: Jukka Zitting
>            Assignee: Alex Parvulescu
>            Priority: Minor
>              Labels: technical_debt
>             Fix For: 1.3.12
>
>
> At about 1kLOC and dozens of methods, the SegmentWriter class currently a bit 
> too complex for one of the key components of the TarMK. It also uses a 
> somewhat non-obvious mix of synchronized and unsynchronized code to 
> coordinate multiple concurrent threads that may be writing content at the 
> same time. The synchronization blocks are also broader than what really would 
> be needed, which in some cases causes unnecessary lock contention in 
> concurrent write loads.
> To improve the readability and maintainability of the code, and to increase 
> performance of concurrent writes, it would be useful to split part of the 
> SegmentWriter functionality to a separate RecordWriter class that would be 
> responsible for writing individual records into a segment. The 
> SegmentWriter.prepare() method would return a new RecordWriter instance, and 
> the higher-level SegmentWriter methods would use the returned instance for 
> all the work that's currently guarded in synchronization blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to