[
https://issues.apache.org/jira/browse/OAK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra resolved OAK-1966.
----------------------------------
Resolution: Fixed
> Add Hint for selecting more performant index in MongoDocumentStore#query
> -------------------------------------------------------------------------
>
> Key: OAK-1966
> URL: https://issues.apache.org/jira/browse/OAK-1966
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: mongomk
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.1, 1.0.3
>
> Attachments: out.csv
>
>
> In MongoDocumentStore#query we make a call like
> bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt:
> "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})
> Further we have indexes
> * {_id : 1}
> * {_modified : -1}
> Depending on scenario one of the two index would perform better
> * If very few changes have happened in the time interval then {{_modified}}
> would perform better
> * In other cases {{_id}} index would perform better
> Ideally this should be decided by Mongo Query planner but it seems that at
> times it is not making the right choice. For example getting plan for the
> query with same shape yields following results
> {noformat}
> A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849
> nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359
> locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
> B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query:
> {$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" },
> _modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{
> _modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664
> nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016
> locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
> C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: {
> $query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt:
> "4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: {
> _id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0
> ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42
> locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms
> {noformat}
> So Mongo used a BasicCursor, _id index, _modified index in different runs.
> Now lets see whats the difference between time of query and _modified and
> nscanned
> * B - 15626664, 8 days, _modified - Should have used _id index as duration is
> too large
> * C - 701631, 34 sec, _id - Might have used modified index as duration to
> check for is less
> Mongo 2.6 uses heuristics to determine which plan to use. As mentioned in
> [SERVER-13866|https://jira.mongodb.org/browse/SERVER-13866]
> bq. Some background: to choose a query plan to use for a given query when
> multiple candidate plans exist, the query engine runs each candidate plan and
> then picks the plan that produced the most results during a trial period on a
> subset of the data to be scanned. The winning query plan is then cached, and
> used for subsequent queries of the same shape until the cache entry is
> invalidated (which happens under certain conditions, such as when the data
> distribution in the collection changes sufficiently or when the chosen query
> plan performs consistently much worse than it did during initial selection).
> So at times Mongo might make a right guess at times not! So we need to
> determine ways such that right index is used by Mongo to execute a given query
--
This message was sent by Atlassian JIRA
(v6.2#6252)