[ https://issues.apache.org/jira/browse/OAK-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dürig updated OAK-4291:
-------------------------------
Attachment: OAK_4291.patch
Attaching [^OAK_4291.patch], which takes a different approach: it makes
{{SegmentBufferWriterPool#flush}} synchronous again, but in a way that should
not lead to the deadlock we have seen previously, because the actual flush is
still done without holding a lock. The approach waits for the currently busy
(borrowed) writers to become available, so flush should block only until all
writes currently in progress have terminated. As long as flush is not called
from *within* a write operation (which seems like a bad idea anyway), there
should be no deadlock.
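
To make the idea easier to review, here is a minimal sketch of the waiting scheme described above. It is not the code from the patch: {{WriterPoolSketch}}, its nested {{Writer}} and all field names are hypothetical stand-ins, and the sketch assumes that only one thread calls flush at a time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WriterPoolSketch {
    /** Hypothetical stand-in for a SegmentBufferWriter; it only counts flushes. */
    public static class Writer {
        private int flushCount;
        public synchronized void flush() { flushCount++; }
        public synchronized int flushCount() { return flushCount; }
    }

    private final Object monitor = new Object();
    private final List<Writer> idle = new ArrayList<>();     // available writers
    private final Set<Writer> borrowed = new HashSet<>();    // writers in use
    private final Set<Writer> disposed = new HashSet<>();    // borrowed writers a flush waits for
    private final List<Writer> retired = new ArrayList<>();  // disposed writers that were returned

    public Writer borrowWriter() {
        synchronized (monitor) {
            Writer w = idle.isEmpty() ? new Writer() : idle.remove(idle.size() - 1);
            borrowed.add(w);
            return w;
        }
    }

    public void returnWriter(Writer w) {
        synchronized (monitor) {
            if (!borrowed.remove(w)) {
                throw new IllegalStateException("writer was not borrowed");
            }
            if (disposed.remove(w)) {
                // A flush is waiting for this writer: hand it over instead of reusing it.
                retired.add(w);
                monitor.notifyAll();
            } else {
                idle.add(w);
            }
        }
    }

    /** Assumption of this sketch: called by one thread at a time, never from within a write. */
    public void flush() throws InterruptedException {
        List<Writer> toFlush = new ArrayList<>();
        synchronized (monitor) {
            toFlush.addAll(idle);
            idle.clear();
            disposed.addAll(borrowed);
            // Block only for writers that were busy when flush started;
            // writers borrowed later are new instances and are not waited for.
            while (!disposed.isEmpty()) {
                monitor.wait();
            }
            toFlush.addAll(retired);
            retired.clear();
        }
        // The actual flush runs without holding the pool monitor.
        for (Writer w : toFlush) {
            w.flush();
        }
    }
}
```

In this sketch a thread that calls flush while still holding a borrowed writer would wait for itself, which is the deadlock case mentioned above.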
[~frm], could you give this one a thorough review with respect to progress,
mutual exclusion, races, liveness etc.? I will then try to come up with some
test coverage.
> FileStore.flush prone to races leading to corruption
> ----------------------------------------------------
>
> Key: OAK-4291
> URL: https://issues.apache.org/jira/browse/OAK-4291
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Priority: Critical
> Labels: resilience
> Fix For: 1.6
>
> Attachments: OAK_4291.patch
>
>
> There is a small window in {{FileStore.flush}} that could lead to data
> corruption: if we crash right after setting the persisted head but before a
> delay-flushed {{SegmentBufferWriter}} instance has flushed (see
> {{SegmentBufferWriterPool.returnWriter()}}), then that data is lost even
> though it might already be referenced from the persisted head.
> We need to come up with a test case for this.
> A possible fix would be to return a future from {{SegmentWriter.flush}} and
> rely on a completion callback. Such a change would most likely also be useful
> for OAK-3690.
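
For illustration, the future-based fix suggested in the description could look roughly like the sketch below. This is only an assumption about its shape, not existing Oak API: {{FutureFlushSketch}}, its nested {{Writer}}, {{completeDelayedFlush}} and the string-valued head are all hypothetical. The point is that the persisted head is only set from the completion callback, once the delayed flush has finished.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class FutureFlushSketch {
    /** Hypothetical writer whose flush completes at some later point. */
    public static class Writer {
        private final CompletableFuture<Void> pendingFlush = new CompletableFuture<>();

        /** Returns a future that completes once the buffered data is written. */
        public CompletableFuture<Void> flush() {
            return pendingFlush;
        }

        /** Simulates the delayed flush finishing. */
        public void completeDelayedFlush() {
            pendingFlush.complete(null);
        }
    }

    // Hypothetical representation of the persisted head as a plain string id.
    private final AtomicReference<String> persistedHead = new AtomicReference<>("r0");
    private final Writer writer;

    public FutureFlushSketch(Writer writer) {
        this.writer = writer;
    }

    public String persistedHead() {
        return persistedHead.get();
    }

    /** Sets the persisted head only after the writer's flush has completed. */
    public CompletableFuture<Void> flush(String newHead) {
        return writer.flush().thenRun(() -> persistedHead.set(newHead));
    }
}
```

With this ordering a crash before the callback runs leaves the old head in place, so the head never references data that has not been flushed.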