[jira] [Updated] (OAK-4291) FileStore.flush prone to races leading to corruption

JIRA Wed, 01 Jun 2016 09:18:21 -0700

     [ 
https://issues.apache.org/jira/browse/OAK-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Dürig updated OAK-4291:
-------------------------------
    Attachment: OAK_4291.patch

Attaching [^OAK_4291.patch] with a different approach that makes 
{{SegmentBufferWriterPool#flush}} synchronous again, but in a way that 
should not lead to the deadlock we have previously seen, as the actual flush 
is still done without holding a lock. The approach waits for the currently 
busy (borrowed) writers to become available, so flush should only block until 
all writes currently in progress terminate. As long as flush is not 
called from *within* a write operation (which seems like a bad idea anyway) 
there should be no deadlock. 
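
To illustrate the idea, here is a minimal sketch. It is not the code from the attached patch: {{SketchWriter}}, {{WriterPoolSketch}}, {{FlushDemo}} and all of their members are hypothetical names, and a plain Java monitor is assumed for the pool lock. Flush collects the idle writers, waits for the borrowed ones to be returned, and only then flushes all of them outside the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a pooled writer; not Oak's SegmentBufferWriter. */
class SketchWriter {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> store; // assumed to be thread-safe

    SketchWriter(List<String> store) { this.store = store; }

    void write(String record) { buffer.append(record); }

    void flush() {
        if (buffer.length() > 0) {
            store.add(buffer.toString());
            buffer.setLength(0);
        }
    }
}

/** Hypothetical pool; structure and names are assumptions, not the patch. */
class WriterPoolSketch {
    private final Object monitor = new Object();
    private final List<SketchWriter> idle = new ArrayList<>();
    private final Set<SketchWriter> borrowed = new HashSet<>();
    private final Set<SketchWriter> disposed = new HashSet<>();
    private final List<String> store;

    WriterPoolSketch(List<String> store) { this.store = store; }

    SketchWriter borrow() {
        synchronized (monitor) {
            SketchWriter w = idle.isEmpty()
                    ? new SketchWriter(store) : idle.remove(idle.size() - 1);
            borrowed.add(w);
            return w;
        }
    }

    void giveBack(SketchWriter w) {
        synchronized (monitor) {
            if (borrowed.remove(w)) {
                idle.add(w);          // no flush pending: writer can be reused
            } else {
                disposed.add(w);      // a flush is waiting for this writer
                monitor.notifyAll();
            }
        }
    }

    void flush() {
        List<SketchWriter> toFlush;
        synchronized (monitor) {
            // Idle writers can be flushed right away; remove them from the pool.
            toFlush = new ArrayList<>(idle);
            idle.clear();
            // Borrowed writers must be returned first; clearing the set makes
            // giveBack() route them to 'disposed' instead of reusing them.
            List<SketchWriter> busy = new ArrayList<>(borrowed);
            borrowed.clear();
            try {
                while (!disposed.containsAll(busy)) {
                    monitor.wait();   // releases the monitor while waiting
                }
            } catch (InterruptedException e) {
                // Sketch only: real code must decide what happens to the
                // writers already collected at this point.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during flush", e);
            }
            disposed.removeAll(busy);
            toFlush.addAll(busy);
        }
        // The actual flush happens without holding the pool lock.
        for (SketchWriter w : toFlush) {
            w.flush();
        }
    }
}

class FlushDemo {
    static List<String> run() {
        List<String> store = Collections.synchronizedList(new ArrayList<>());
        WriterPoolSketch pool = new WriterPoolSketch(store);
        SketchWriter w = pool.borrow();
        w.write("a");
        Thread flusher = new Thread(pool::flush);
        flusher.start();              // cannot complete before w is returned
        w.write("b");                 // the write is still in progress
        pool.giveBack(w);
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return store;
    }
}
```

In {{FlushDemo.run()}} the flush cannot complete before the borrowed writer is returned, so both records end up in the store together. Calling {{flush()}} from a thread that itself still holds a borrowed writer would wait forever in this sketch, which is the "flush from within a write" case mentioned above.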

[~frm], could you give this one a thorough review with respect to progress, 
mutual exclusion, races, liveness etc.? I will then try to come up with some 
test coverage. 

> FileStore.flush prone to races leading to corruption
> ----------------------------------------------------
>
>                 Key: OAK-4291
>                 URL: https://issues.apache.org/jira/browse/OAK-4291
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Critical
>              Labels: resilience
>             Fix For: 1.6
>
>         Attachments: OAK_4291.patch
>
>
> There is a small window in {{FileStore.flush}} that could lead to data 
> corruption: if we crash right after setting the persisted head but before any 
> delay-flushed {{SegmentBufferWriter}} instance flushes (see 
> {{SegmentBufferWriterPool.returnWriter()}}) then that data is lost although 
> it might already be referenced from the persisted head.
> We need to come up with a test case for this. 
> A possible fix would be to return a future from {{SegmentWriter.flush}} and 
> rely on a completion callback. Such a change would most likely also be useful 
> for OAK-3690. 
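
A rough sketch of the future-based idea from the description, assuming {{java.util.concurrent.CompletableFuture}} is used. The class {{FutureFlushSketch}} and its methods are hypothetical and do not reflect the actual {{SegmentWriter}} API. The point is only the ordering: the persisted head is set in the completion callback, so it can never reference data that has not been flushed yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; not the SegmentWriter or FileStore API. */
class FutureFlushSketch {
    private final List<String> log =
            Collections.synchronizedList(new ArrayList<>());
    private final AtomicReference<String> persistedHead =
            new AtomicReference<>("r0");

    /** Stand-in for an asynchronous flush of all pending writers. */
    CompletableFuture<Void> flushWriters() {
        return CompletableFuture.runAsync(() -> log.add("writers flushed"));
    }

    /** Sets the persisted head only once the writer flush has completed. */
    CompletableFuture<Void> flush(String newHead) {
        return flushWriters().thenRun(() -> {
            persistedHead.set(newHead);
            log.add("head persisted: " + newHead);
        });
    }

    String persistedHead() { return persistedHead.get(); }

    static List<String> demo() {
        FutureFlushSketch store = new FutureFlushSketch();
        store.flush("r1").join(); // wait for flush and callback to finish
        return store.log;
    }
}
```

If the process crashes before the callback runs, the head in this sketch still points at the previous revision, which is the property the description asks for.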



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
