Chetan Mehrotra created OAK-1966:
------------------------------------

             Summary: Add Hint for selecting more performant index in 
MongoDocumentStore#query 
                 Key: OAK-1966
                 URL: https://issues.apache.org/jira/browse/OAK-1966
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: mongomk
            Reporter: Chetan Mehrotra
            Assignee: Chetan Mehrotra
             Fix For: 1.1, 1.0.3


In MongoDocumentStore#query we make a call like

bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" 
}, _modified: { $gte: 1405085300 } }).sort({_id:1})

Further we have indexes
* {_id : 1}
* {_modified : -1}

Depending on scenario one of the two index would perform better
* If very few changes have happened in the time interval then {{_modified}} 
would perform better
* In other cases {{_id}} index would perform better

Ideally this should be decided by Mongo Query planner but it seems that at 
times it is not making the right choice. For example getting plan for the query 
with same shape yields following results 

{noformat}
A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849 
nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359 
locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query: 
{$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, 
_modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{ 
_modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664 
nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016 
locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: { 
$query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt: 
"4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: { 
_id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0 
ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42 
locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms 
{noformat}

So Mongo used a BasicCursor, _id index, _modified index in different runs. Now 
lets see whats the difference between time of query and _modified and nscanned

* B - 15626664, 8 days, _modified - Should have used _id index as duration is 
too large 
* C - 701631, 34 sec, _id - Might have used modified index as duration to check 
for is less

Mongo 2.6 uses heuristics to determine which plan to use. As mentioned in 
[SERVER-13866|https://jira.mongodb.org/browse/SERVER-13866]

bq. Some background: to choose a query plan to use for a given query when 
multiple candidate plans exist, the query engine runs each candidate plan and 
then picks the plan that produced the most results during a trial period on a 
subset of the data to be scanned. The winning query plan is then cached, and 
used for subsequent queries of the same shape until the cache entry is 
invalidated (which happens under certain conditions, such as when the data 
distribution in the collection changes sufficiently or when the chosen query 
plan performs consistently much worse than it did during initial selection).

So at times Mongo might make a right guess at times not! So we need to 
determine ways such that right index is used by Mongo to execute a given query



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to