[ 
https://issues.apache.org/jira/browse/OAK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-1966.
----------------------------------

    Resolution: Fixed

> Add Hint for selecting more performant index in MongoDocumentStore#query 
> -------------------------------------------------------------------------
>
>                 Key: OAK-1966
>                 URL: https://issues.apache.org/jira/browse/OAK-1966
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1, 1.0.3
>
>         Attachments: out.csv
>
>
> In MongoDocumentStore#query we make a call like
> bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: 
> "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})
> Further, we have the indexes
> * {_id : 1}
> * {_modified : -1}
> Depending on the scenario, one of the two indexes performs better:
> * If very few changes have happened in the time interval, the {{_modified}} 
> index performs better
> * In other cases the {{_id}} index performs better
> Ideally this should be decided by the Mongo query planner, but at times it 
> does not make the right choice. For example, the logged plans for queries 
> of the same shape show the following:
> {noformat}
> A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849 
> nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359 
> locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
> B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query: 
> {$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, 
> _modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{ 
> _modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664 
> nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016 
> locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
> C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: { 
> $query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt: 
> "4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: { 
> _id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0 
> ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42 
> locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms 
> {noformat}
> So Mongo used a collection scan (BasicCursor), the _modified index, and the 
> _id index in different runs. Now compare, for B and C, nscanned, the gap 
> between the query time and the _modified bound, and the index used:
> * B - 15626664, 8 days, _modified - Should have used the _id index, as the 
> interval is too large
> * C - 701631, 34 sec, _id - Could have used the _modified index, as the 
> interval to check is short
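
The intervals quoted for B and C can be reproduced from the log entries. The following is a minimal sketch, assuming that _modified holds seconds since the Unix epoch (which the logged values suggest); the function name modifiedWindowSeconds is hypothetical and not part of Oak.

```javascript
// Gap in seconds between a query's log timestamp and its _modified lower bound.
// Assumption: _modified is stored as seconds since the Unix epoch.
function modifiedWindowSeconds(logTimestamp, modifiedLowerBound) {
  const queryTimeSeconds = Math.floor(Date.parse(logTimestamp) / 1000);
  return queryTimeSeconds - modifiedLowerBound;
}

// Timestamps and _modified bounds taken from log entries B and C. The UTC
// offset is rewritten from "-0400" to "-04:00" so that Date.parse accepts
// the strings as ISO 8601 date-times.
const windowB = modifiedWindowSeconds("2014-07-10T15:59:19.994-04:00", 1404245090);
const windowC = modifiedWindowSeconds("2014-07-11T15:22:44.579-04:00", 1405106530);

// Express the larger interval in whole days.
const wholeDaysB = Math.floor(windowB / 86400);

console.log({ windowB, wholeDaysB, windowC });
```

With the logged values, B covers 8 whole days and C covers 34 seconds, matching the figures in the list above.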
> Mongo 2.6 uses heuristics to determine which plan to use, as mentioned in 
> [SERVER-13866|https://jira.mongodb.org/browse/SERVER-13866]:
> bq. Some background: to choose a query plan to use for a given query when 
> multiple candidate plans exist, the query engine runs each candidate plan and 
> then picks the plan that produced the most results during a trial period on a 
> subset of the data to be scanned. The winning query plan is then cached, and 
> used for subsequent queries of the same shape until the cache entry is 
> invalidated (which happens under certain conditions, such as when the data 
> distribution in the collection changes sufficiently or when the chosen query 
> plan performs consistently much worse than it did during initial selection).
> So Mongo sometimes picks the right index and sometimes does not. We need a 
> way to ensure that Mongo uses the right index for a given query.
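
One possible way to pick a hint, sketched here purely for illustration, is to compare the size of the _modified window against a cutoff. This is not necessarily what the fix for this issue does. The function chooseIndexHint, the constant MODIFIED_INDEX_MAX_WINDOW_SECONDS, and the 60-second cutoff are all assumptions; the issue gives no threshold.

```javascript
// Hypothetical cutoff: windows shorter than this favor the _modified index.
// The issue does not specify a value; 60 seconds is only a placeholder.
const MODIFIED_INDEX_MAX_WINDOW_SECONDS = 60;

// Returns the key pattern of the index to hint for a query that filters on
// an _id range and on _modified >= modifiedLowerBound.
// Assumption: both arguments are seconds since the Unix epoch.
function chooseIndexHint(
  nowSeconds,
  modifiedLowerBound,
  maxWindowSeconds = MODIFIED_INDEX_MAX_WINDOW_SECONDS
) {
  const windowSeconds = nowSeconds - modifiedLowerBound;
  // A short, non-negative window means few documents can have changed,
  // so scanning the _modified index should touch few entries.
  if (windowSeconds >= 0 && windowSeconds < maxWindowSeconds) {
    return { _modified: -1 };
  }
  // Otherwise fall back to the _id index, which bounds the scan by path range.
  return { _id: 1 };
}

// Values derived from log entries C (34 second window) and B (about 9 days).
console.log(chooseIndexHint(1405106564, 1405106530)); // short window
console.log(chooseIndexHint(1405022359, 1404245090)); // long window
```

In the mongo shell, the returned key pattern could then be passed to the cursor's hint() method, for example db.nodes.find(query).sort({_id: 1}).hint(chooseIndexHint(now, bound)), where query, now, and bound are placeholders.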



--
This message was sent by Atlassian JIRA
(v6.2#6252)
