[
https://issues.apache.org/jira/browse/OAK-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296232#comment-16296232
]
Chetan Mehrotra commented on OAK-7074:
--------------------------------------
With 1818634, sorting now uses distinct mode to avoid duplicates.
[~catholicon] mentioned in an offline discussion that for duplicates we just need
to ensure NodeStateEntries are unique per path. For a given path it does not
matter which entry is picked. Further, a document may appear more than once in a
cursor traversal in one of the following cases:
# Document was updated - If a document gets updated then it may be moved around
and thus may appear twice in natural order traversal. So while sorting we can
still pick any one, as the NodeState view for the checkpoint revision would be
the same for both Mongo documents.
# Document was moved due to internal design of Mongo - Mongo may move a
document around without an update (say, due to some compaction process).
In that case we are not sure about the consistency guarantee of natural order
traversal, i.e. is it possible that a document may not be reflected in the cursor
result at all if Mongo is in use?
So based on #1 we just need to ensure that sorting removes any duplicates.
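
As a rough illustration of the "unique per path" requirement, here is a minimal Java sketch. It is not the Oak implementation: the class DistinctPathSort, the Entry type (a stand-in for a NodeStateEntry) and the method sortDistinctByPath are hypothetical names, and the sketch sorts in memory with plain String ordering on the path, which may differ from the ordering the real sorter uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sort entries by path and drop duplicate paths.
final class DistinctPathSort {

    /** Hypothetical stand-in for a NodeStateEntry: a path plus its serialized state. */
    static final class Entry {
        final String path;
        final String state;

        Entry(String path, String state) {
            this.path = path;
            this.state = state;
        }
    }

    /**
     * Returns the entries sorted by path, keeping one entry per path.
     * Which duplicate survives is irrelevant here, because all entries for a
     * path are assumed to reflect the same checkpoint revision.
     */
    static List<Entry> sortDistinctByPath(List<Entry> entries) {
        // Copy so the caller's list is left untouched, then sort by path.
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((Entry e) -> e.path));

        // After sorting, duplicates are adjacent, so comparing with the
        // previously kept path is enough to filter them out.
        List<Entry> result = new ArrayList<>();
        String previousPath = null;
        for (Entry e : sorted) {
            if (!e.path.equals(previousPath)) {
                result.add(e);
                previousPath = e.path;
            }
        }
        return result;
    }
}
```

Because duplicates end up next to each other after sorting, the filter only needs to remember the last path it kept. This covers case #1 only; it does nothing for a document that the cursor never returns.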
> Ensure that all Documents are read with document order traversal indexing
> -------------------------------------------------------------------------
>
> Key: OAK-7074
> URL: https://issues.apache.org/jira/browse/OAK-7074
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: mongomk, run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With OAK-6353, support was added for document order traversal indexing. In
> this mode we open a DB cursor and try to read all documents from it using
> document order traversal. Such a cursor may remain open for a long time (2-4
> hrs) and it's possible that documents may get reordered by the Mongo storage
> engine. This raises 2 aspects to think about:
> # Duplicate documents - The same document may appear more than once in the result set
> # Possibly missed document - A document may get moved and so be missed by
> the cursor.
> Both these aspects would need to be handled.