[ 
https://issues.apache.org/jira/browse/OAK-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296232#comment-16296232
 ] 

Chetan Mehrotra commented on OAK-7074:
--------------------------------------

With 1818634, sorting now uses distinct mode to avoid duplicates.

[~catholicon] mentioned in an offline discussion that, for duplicates, we just 
need to ensure NodeStateEntries are unique per path. It does not matter which 
entry is picked for a given path. Further, a document may appear more than once 
in a cursor traversal in one of the following cases:

# Document was updated - If a document gets updated, it may be moved around 
and thus may appear twice in a natural order traversal. While sorting we can 
still pick any one of the copies, as the NodeState view for the checkpoint 
revision would be the same for both Mongo documents. 
# Document was moved due to the internal design of Mongo - Mongo may move a 
document around without an update (say, due to some compaction process). 
In that case we are not sure about the consistency guarantee of natural order 
traversal, i.e. is it possible that a document does not get reflected in the 
cursor result at all if Mongo is in use?

So, based on #1, we just need to ensure that sorting removes any duplicates.
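
A minimal sketch of that idea, not the actual Oak code: sort the entries by 
path and keep exactly one entry per path, on the assumption that either copy 
gives the same NodeState view at the checkpoint. The names DistinctSortSketch, 
Entry and sortDistinct are hypothetical, and plain String ordering of paths is 
an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DistinctSortSketch {

    // Hypothetical stand-in for a NodeStateEntry: a path plus a serialized state.
    public static final class Entry {
        public final String path;
        public final String state;

        public Entry(String path, String state) {
            this.path = path;
            this.state = state;
        }
    }

    // Returns the entries sorted by path with exactly one entry per path.
    // When a path occurs more than once, the first occurrence is kept; which
    // one is kept is assumed not to matter, as discussed above.
    public static List<Entry> sortDistinct(List<Entry> entries) {
        // TreeMap keeps keys sorted (natural String order, an assumption here).
        Map<String, Entry> byPath = new TreeMap<>();
        for (Entry e : entries) {
            byPath.putIfAbsent(e.path, e);
        }
        return new ArrayList<>(byPath.values());
    }
}
```

This in-memory version only illustrates the distinct step. A traversal that 
runs for hours would more likely need the same rule applied while merging 
sorted chunks from disk.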

> Ensure that all Documents are read with document order traversal indexing
> -------------------------------------------------------------------------
>
>                 Key: OAK-7074
>                 URL: https://issues.apache.org/jira/browse/OAK-7074
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> With OAK-6353, support was added for document order traversal indexing. In 
> this mode we open a DB cursor and try to read all documents from it using 
> document order traversal. Such a cursor may remain open for a long time (2-4 
> hrs) and it's possible that documents may get reordered by the Mongo storage 
> engine. This raises 2 aspects to be thought about:
> # Duplicate documents - The same document may appear more than once in the result set 
> # Possibly missed document - It may be possible that a document got 
> moved and was missed by the cursor. 
> Both these aspects would need to be handled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
