[
https://issues.apache.org/jira/browse/OAK-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236061#comment-16236061
]
Andrei Dulceanu commented on OAK-6888:
--------------------------------------
[~frm],
bq. When there are multiple sync cycles, the standby will eventually contain
every change committed on the primary, exactly like before.
I agree.
bq. Later on, when the content of the primary and the standby instance is
compared, the new head state is used instead
(8574c330-29ca-491a-a66e-b5b0d1b6b75e.0000000b). At this time, the background
flush operation is completed and the primary FileStore has a different
persisted head state than the standby.
Thinking more about this, I guess this is only an issue with respect to
testability. Suppose we have a primary and a standby attached to it and *the
sync is running for a limited time/limited no. of iterations*. How can we assess
that after x minutes/cycles everything on the standby is on a par with the
primary? One option would be to call {{primary#flush}} as we are doing now, I
guess, but this might not work for more complicated scenarios (e.g. OAK-6674).
Would it make sense to have some kind of "flush policy" set on the primary
which would allow us to better control {{tryFlush}} vs {{flush}}? It doesn't
need to be exposed publicly, only configurable internally in our tests.
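To make the idea a bit more concrete, here is a rough sketch of what such a policy could look like. Everything in it is an assumption for illustration: {{FlushPolicy}} and {{PolicyFlusher}} are hypothetical names that do not exist in Oak, and the two callbacks merely stand in for a lenient {{tryFlush}} and a strict {{flush}}.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test-only switch; none of these names exist in Oak.
enum FlushPolicy { LENIENT, STRICT }

class PolicyFlusher {
    private final FlushPolicy policy;
    // Stand-in for a lenient flush: returns false when the flush was skipped.
    private final BooleanSupplier tryFlush;
    // Stand-in for a strict flush: returns only after the head is persisted.
    private final Runnable flush;

    PolicyFlusher(FlushPolicy policy, BooleanSupplier tryFlush, Runnable flush) {
        this.policy = policy;
        this.tryFlush = tryFlush;
        this.flush = flush;
    }

    // Returns true only if a flush was actually performed by this call.
    boolean run() {
        if (policy == FlushPolicy.STRICT) {
            flush.run();
            return true;
        }
        return tryFlush.getAsBoolean();
    }
}
```

A test could then construct the primary with {{FlushPolicy.STRICT}} and compare head states right after {{run()}} returns, while the default behaviour stays lenient.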
/cc [~mduerig]
> Flushing the FileStore might return before data is persisted
> ------------------------------------------------------------
>
> Key: OAK-6888
> URL: https://issues.apache.org/jira/browse/OAK-6888
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Francesco Mari
> Assignee: Francesco Mari
> Priority: Major
> Fix For: 1.8, 1.7.11
>
> Attachments: failure.txt
>
>
> The implementation of {{FileStore#flush}} might return before all the
> expected data is persisted on disk.
> The root cause of this behaviour is the implementation of
> {{TarRevisions#flush}}, which is too lenient when acquiring the lock for the
> journal file. If a background flush operation is in progress and a user calls
> {{FileStore#flush}}, that method will immediately return because the lock of
> the journal file is already owned by the background flush operation. The
> caller doesn't have the guarantee that everything committed before
> {{FileStore#flush}} is persisted to disk when the method returns.
> A fix for this problem might be to create an additional implementation of
> flush. The current implementation, needed for the background flush thread,
> will not be exposed to the users of {{FileStore}}. The new implementation of
> {{TarRevisions#flush}} should have stricter semantics and always guarantee
> that the persisted head contains everything visible to the user of
> {{FileStore}} before the flush operation was started.
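
The difference between the lenient and the strict lock acquisition described above can be sketched as follows. This is only an illustration under assumptions: {{JournalSketch}} and its members are hypothetical and are not Oak's {{TarRevisions}}, a {{ReentrantLock}} stands in for the lock on the journal file, and a plain field stands in for the persisted head.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the journal handling; not Oak's TarRevisions.
class JournalSketch {
    private final ReentrantLock journalLock = new ReentrantLock();
    // In-memory head, bumped by every commit.
    private final AtomicInteger head = new AtomicInteger();
    // Last head that was written to "disk".
    private volatile int persistedHead;

    void commit() {
        head.incrementAndGet();
    }

    int persistedHead() {
        return persistedHead;
    }

    // Lenient variant: gives up immediately if another thread holds the lock.
    boolean tryFlush() {
        if (!journalLock.tryLock()) {
            return false; // nothing was persisted by this call
        }
        try {
            persistedHead = head.get();
            return true;
        } finally {
            journalLock.unlock();
        }
    }

    // Strict variant: waits for the lock, then persists the current head.
    void flush() {
        journalLock.lock();
        try {
            persistedHead = head.get();
        } finally {
            journalLock.unlock();
        }
    }
}
```

In this sketch a caller of {{tryFlush}} that loses the race to a concurrent flush gets {{false}} back, and the head it committed may still be unpersisted. The strict {{flush}} blocks until it can write the current head itself.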
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)