Chetan Mehrotra created OAK-1966:
------------------------------------
Summary: Add Hint for selecting more performant index in
MongoDocumentStore#query
Key: OAK-1966
URL: https://issues.apache.org/jira/browse/OAK-1966
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Fix For: 1.1, 1.0.3
In MongoDocumentStore#query we issue a query such as:
bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010"
}, _modified: { $gte: 1405085300 } }).sort({_id:1})
The collection also has the following indexes:
* {_id : 1}
* {_modified : -1}
Depending on the scenario, one of the two indexes performs better:
* If very few changes have happened in the time interval, the {{_modified}}
index performs better
* In other cases the {{_id}} index performs better
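The trade-off in the two bullets above could be encoded as a simple client-side heuristic. What follows is only a sketch: the function name {{choose_index_hint}} and the 60 second threshold are hypothetical and not taken from Oak, and a real cut-off would have to be measured.

```python
import time

# Hypothetical cut-off: shorter windows are assumed to contain few changes.
RECENT_WINDOW_SECS = 60


def choose_index_hint(modified_since, now=None):
    """Pick the index key pattern to hint for an _id range query that
    also filters on _modified >= modified_since (seconds since epoch)."""
    if now is None:
        now = int(time.time())
    window = now - modified_since
    if 0 <= window <= RECENT_WINDOW_SECS:
        # Few documents are expected to have changed: use the _modified index.
        return {"_modified": -1}
    # Long (or negative/unknown) interval: bound the scan by the _id range.
    return {"_id": 1}
```

With a fixed threshold like this, a bound 34 seconds in the past would select {{_modified}}, while a bound several days old would select {{_id}}.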
Ideally the Mongo query planner should make this decision, but at times it
does not make the right choice. For example, the logged plans for queries of
the same shape show the following:
{noformat}
A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849
nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359
locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query:
{$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" },
_modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{
_modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664
nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016
locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: {
$query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt:
"4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: {
_id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0
ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42
locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms
{noformat}
So Mongo used a BasicCursor (collection scan), the {{_modified}} index and the
{{_id}} index in different runs. Now let's compare, for runs B and C, nscanned
with the interval between the time of the query and the {{_modified}} bound:
* B - nscanned 15626664, interval 8 days, {{_modified}} index used - should have
used the {{_id}} index, as the interval is too large
* C - nscanned 701631, interval 34 sec, {{_id}} index used - could have used the
{{_modified}} index, as the interval to check is short
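The intervals quoted for B and C can be reproduced from the log lines by subtracting the {{_modified}} bound from the log timestamp. This is a small sketch; the helper name {{modified_window_secs}} is made up for illustration, and it assumes {{_modified}} is stored as seconds since the epoch, as the values in the logs suggest.

```python
from datetime import datetime, timezone

# Timestamp layout used in the mongod log lines quoted above.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def modified_window_secs(log_timestamp, modified_since):
    """Whole seconds between the time the query was logged and the
    lower bound used in the _modified >= ... condition."""
    query_time = datetime.strptime(log_timestamp, LOG_TS_FORMAT)
    bound = datetime.fromtimestamp(modified_since, tz=timezone.utc)
    return int((query_time - bound).total_seconds())


# Run B: just under 9 days, i.e. 8 full days.
window_b = modified_window_secs("2014-07-10T15:59:19.994-0400", 1404245090)
# Run C: 34 seconds.
window_c = modified_window_secs("2014-07-11T15:22:44.579-0400", 1405106530)
```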
Mongo 2.6 uses heuristics to determine which plan to use, as described in
[SERVER-13866|https://jira.mongodb.org/browse/SERVER-13866]:
bq. Some background: to choose a query plan to use for a given query when
multiple candidate plans exist, the query engine runs each candidate plan and
then picks the plan that produced the most results during a trial period on a
subset of the data to be scanned. The winning query plan is then cached, and
used for subsequent queries of the same shape until the cache entry is
invalidated (which happens under certain conditions, such as when the data
distribution in the collection changes sufficiently or when the chosen query
plan performs consistently much worse than it did during initial selection).
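To make the quoted selection rule concrete, here is a toy model of it. It is not MongoDB's implementation: the function {{pick_plan}}, the work-unit lists and the trial budget are all invented for illustration. Each plan is modelled as a list of work units, where {{True}} means that unit produced a result.

```python
from itertools import islice


def pick_plan(candidate_plans, trial_works):
    """Return the name of the plan that produced the most results within
    the first trial_works units of work (toy model of the quoted rule)."""
    produced = {
        name: sum(islice(steps, trial_works))
        for name, steps in candidate_plans.items()
    }
    return max(produced, key=produced.get)


# Both plans return the same two documents overall, but the _modified plan
# finds one early and then examines many non-matching entries.
plans = {
    "_id": [False, False, False, True, True],
    "_modified": [True, False, False, False, False,
                  False, False, False, False, True],
}
winner_short_trial = pick_plan(plans, 3)   # trial covers a small subset
winner_long_trial = pick_plan(plans, 5)    # trial covers the whole _id plan
```

In this made-up data the short trial favours the {{_modified}} plan even though it needs twice the total work, which is the kind of misjudgement the logs above suggest.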
So Mongo sometimes guesses right and sometimes does not. We need a way to
ensure that Mongo uses the right index to execute a given query
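One possible direction, sketched here under assumptions, is to attach an explicit hint to the query, as was done in run C. The helper {{build_hinted_query}} is hypothetical; it only assembles a query document with the same shape as the one logged for run C and does not talk to a server.

```python
def build_hinted_query(lower_id, upper_id, modified_since, hint):
    """Assemble a query document shaped like the one logged for run C:
    an _id range, a _modified lower bound, a sort on _id and an index hint."""
    return {
        "$query": {
            "_id": {"$gt": lower_id, "$lt": upper_id},
            "_modified": {"$gte": modified_since},
        },
        "$orderby": {"_id": 1},
        "$hint": hint,  # index key pattern, e.g. {"_id": 1}
    }


# Values copied from run C in the logs above.
query_c = build_hinted_query(
    "4:/oak:index/uuid/:index/",
    "4:/oak:index/uuid/:index0",
    1405106530,
    {"_id": 1},
)
```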