[ 
https://issues.apache.org/jira/browse/OAK-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235631#comment-16235631
 ] 

Andrei Dulceanu commented on OAK-6888:
--------------------------------------

[~frm], I was wondering how the current version of the patch solves the general 
case, since the {{TarMK flush}} thread still uses {{tryFlush}}. In 
{{DataStoreTestBase#testSync}} and {{DataStoreTestBase#testSyncBigBlob}} there 
are calls to {{primary.flush()}} before the sync happens. These calls force the 
flush on the primary before that single client sync run. 

My question is: how are we protecting ourselves from scenarios like the one you 
already described, if we still use {{FileStore#tryFlush}} in the {{TarMK 
flush}} thread? Doesn't this defeat the whole purpose of this fix? On the other 
hand, waiting for every flush to succeed (with or without a cold standby 
attached) would have a tremendous impact on performance, right?

/cc [~mduerig]

> Flushing the FileStore might return before data is persisted
> ------------------------------------------------------------
>
>                 Key: OAK-6888
>                 URL: https://issues.apache.org/jira/browse/OAK-6888
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.8, 1.7.11
>
>         Attachments: failure.txt
>
>
> The implementation of {{FileStore#flush}} might return before all the 
> expected data is persisted on disk. 
> The root cause of this behaviour is the implementation of 
> {{TarRevisions#flush}}, which is too lenient when acquiring the lock for the 
> journal file. If a background flush operation is in progress and a user calls 
> {{FileStore#flush}}, that method will immediately return because the lock of 
> the journal file is already owned by the background flush operation. The 
> caller doesn't have the guarantee that everything committed before 
> {{FileStore#flush}} is persisted to disk when the method returns. 
> A fix for this problem might be to create an additional implementation of 
> flush. The current implementation, needed for the background flush thread, 
> will not be exposed to the users of {{FileStore}}. The new implementation of 
> {{TarRevisions#flush}} should have stricter semantics and always guarantee 
> that the persisted head contains everything visible to the user of 
> {{FileStore}} before the flush operation was started.
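
The difference between the lenient and the strict flush described above can be
sketched roughly as follows. This is only an illustration under assumptions,
not the Oak code: the class FlushSketch, its fields, and the helper
startBackgroundFlush are hypothetical names, and "persisting" is reduced to
copying a counter. Only the method names tryFlush and flush are borrowed from
the discussion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the journal handling; these are not the Oak classes.
class FlushSketch {
    private final ReentrantLock journalLock = new ReentrantLock();
    private final AtomicInteger head = new AtomicInteger();          // latest committed revision
    private final AtomicInteger persistedHead = new AtomicInteger(); // latest revision "on disk"

    void commit() { head.incrementAndGet(); }

    int persistedHead() { return persistedHead.get(); }

    // Lenient variant: gives up immediately, persisting nothing, if the lock is taken.
    boolean tryFlush() {
        if (!journalLock.tryLock()) {
            return false;
        }
        try {
            persistedHead.set(head.get());
            return true;
        } finally {
            journalLock.unlock();
        }
    }

    // Strict variant: waits for the lock, so everything committed before the
    // call is persisted by the time the method returns.
    void flush() {
        journalLock.lock();
        try {
            persistedHead.set(head.get());
        } finally {
            journalLock.unlock();
        }
    }

    // Simulates a background flush in progress: another thread holds the journal
    // lock until the returned latch is counted down.
    CountDownLatch startBackgroundFlush() {
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread background = new Thread(() -> {
            journalLock.lock();
            try {
                acquired.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                journalLock.unlock();
            }
        });
        background.setDaemon(true);
        background.start();
        try {
            acquired.await(); // return only once the lock is really held
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock holder", e);
        }
        return release;
    }

    public static void main(String[] args) {
        FlushSketch store = new FlushSketch();
        CountDownLatch release = store.startBackgroundFlush();

        store.commit();
        System.out.println("tryFlush succeeded: " + store.tryFlush());
        System.out.println("persisted after tryFlush: " + store.persistedHead());

        release.countDown(); // the simulated background flush finishes
        store.flush();       // blocks until the lock is free, then persists
        System.out.println("persisted after flush: " + store.persistedHead());
    }
}
```

In this sketch the lenient call returns without persisting the new commit while
the lock is held elsewhere, whereas the strict call blocks until it can persist
it. The cost of the strict variant is that the caller may have to wait for a
flush already in progress.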



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
