[ 
https://issues.apache.org/jira/browse/OAK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060593#comment-14060593
 ] 

Chetan Mehrotra commented on OAK-1966:
--------------------------------------

After discussing with [~mreutegg], for now we can go with the following approach:

_Have a configured duration limit, say 5 mins. If currentTime - modified < 
5 mins, use the \_modified index in the hint; otherwise use \_id._
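
A minimal sketch of this rule in Java. The class and method names are hypothetical and not taken from the Oak code base; it also assumes that _modified holds seconds since epoch, as the values in the issue description suggest.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: picks the index to pass as a hint for the query. */
final class IndexHintChooser {
    static final String ID = "_id";
    static final String MODIFIED = "_modified";

    // Configured duration limit, converted to seconds (assumed unit of _modified)
    private final long maxDeltaSecs;

    IndexHintChooser(long limit, TimeUnit unit) {
        this.maxDeltaSecs = unit.toSeconds(limit);
    }

    /**
     * @param nowSecs      current time in seconds since epoch
     * @param modifiedSecs lower bound used in the _modified condition
     * @return name of the index to use in the hint
     */
    String chooseHint(long nowSecs, long modifiedSecs) {
        // Recent lower bound: few documents expected, so _modified is cheaper
        return (nowSecs - modifiedSecs) < maxDeltaSecs ? MODIFIED : ID;
    }
}
```

With a 5 minute limit, query B from the description (lower bound about 8 days old) would get the _id hint, and query C (lower bound 34 sec old) would get the _modified hint.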

Later we can use more informed means to determine this duration, based on:
* Periodic queries to get the count of modifications done in the last 1, 5, 10 
mins. If there are fewer than 10k changes, use _modified for up to that duration
* Monitoring query time, periodically running explain, and making use of that 
data
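
The first option could be sketched as follows. This is only an illustration: the class name, the 10k constant, and the input map are assumptions, and the modification counts are expected to be collected elsewhere (for example by periodic count queries on _modified).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: derives the duration limit from sampled counts. */
final class ModifiedWindowEstimator {
    // Threshold from the comment above: fewer than 10k changes
    static final long MAX_CHANGES = 10_000;

    /**
     * @param countsByWindowSecs number of documents modified within the
     *                           last N seconds, keyed by N
     * @return largest window (in seconds) with fewer than MAX_CHANGES
     *         modifications, or 0 if even the smallest window exceeds it
     */
    static long largestCheapWindow(Map<Long, Long> countsByWindowSecs) {
        long best = 0;
        // Iterate windows in ascending order of size
        for (Map.Entry<Long, Long> e : new TreeMap<>(countsByWindowSecs).entrySet()) {
            if (e.getValue() < MAX_CHANGES) {
                best = e.getKey();
            } else {
                break; // counts can only grow with a larger window
            }
        }
        return best;
    }
}
```

The returned window would then replace the fixed 5 min limit; a result of 0 means the _id index should always be used.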

Further, we should later also add some JMX stats for queries made to Mongo. Key 
stats we can capture:
* Count of the different types of queries we make. In most cases we make 2-4 
types of queries, so specific counts can be captured
* Time taken in query execution alone, ignoring time spent in cursor traversal
* Count of read queries made to primary vs secondary
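
A rough sketch of a holder for these stats. All names are hypothetical. To expose it over JMX it would additionally need an MBean interface and registration with an MBeanServer, which is left out here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical thread-safe collector for the stats listed above. */
final class MongoQueryStats {
    private final Map<String, AtomicLong> countByType = new ConcurrentHashMap<>();
    private final AtomicLong queryTimeMillis = new AtomicLong();
    private final AtomicLong primaryReads = new AtomicLong();
    private final AtomicLong secondaryReads = new AtomicLong();

    /**
     * @param type        caller-defined label for the kind of query
     * @param execMillis  time spent in query execution only,
     *                    excluding cursor traversal
     * @param fromPrimary true if the read went to the primary
     */
    void recordQuery(String type, long execMillis, boolean fromPrimary) {
        countByType.computeIfAbsent(type, k -> new AtomicLong()).incrementAndGet();
        queryTimeMillis.addAndGet(execMillis);
        (fromPrimary ? primaryReads : secondaryReads).incrementAndGet();
    }

    long getCount(String type) {
        AtomicLong c = countByType.get(type);
        return c == null ? 0 : c.get();
    }

    long getQueryTimeMillis() { return queryTimeMillis.get(); }
    long getPrimaryReads() { return primaryReads.get(); }
    long getSecondaryReads() { return secondaryReads.get(); }
}
```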

> Add Hint for selecting more performant index in MongoDocumentStore#query 
> -------------------------------------------------------------------------
>
>                 Key: OAK-1966
>                 URL: https://issues.apache.org/jira/browse/OAK-1966
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1, 1.0.3
>
>
> In MongoDocumentStore#query we make a call like
> bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: 
> "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})
> Further we have indexes
> * {_id : 1}
> * {_modified : -1}
> Depending on the scenario, one of the two indexes would perform better
> * If very few changes have happened in the time interval then {{_modified}} 
> would perform better
> * In other cases {{_id}} index would perform better
> Ideally this should be decided by the Mongo query planner, but it seems that 
> at times it does not make the right choice. For example, getting the plan for 
> queries with the same shape yields the following results 
> {noformat}
> A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849 
> nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359 
> locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
> B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query: 
> {$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, 
> _modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{ 
> _modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664 
> nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016 
> locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
> C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: { 
> $query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt: 
> "4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: { 
> _id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0 
> ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42 
> locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms 
> {noformat}
> So Mongo used a BasicCursor (A), the _modified index (B), and the _id index 
> (C) in different runs. Now let's look at nscanned, the difference between the 
> time of the query and _modified, and the index used
> * B - 15626664, 8 days, _modified - Should have used the _id index as the 
> duration is too large 
> * C - 701631, 34 sec, _id - Could have used the _modified index as the 
> duration to check is small
> Mongo 2.6 uses heuristics to determine which plan to use. As mentioned in 
> [SERVER-13866|https://jira.mongodb.org/browse/SERVER-13866]
> bq. Some background: to choose a query plan to use for a given query when 
> multiple candidate plans exist, the query engine runs each candidate plan and 
> then picks the plan that produced the most results during a trial period on a 
> subset of the data to be scanned. The winning query plan is then cached, and 
> used for subsequent queries of the same shape until the cache entry is 
> invalidated (which happens under certain conditions, such as when the data 
> distribution in the collection changes sufficiently or when the chosen query 
> plan performs consistently much worse than it did during initial selection).
> So at times Mongo makes the right guess and at times it does not. We need to 
> find ways to ensure that Mongo uses the right index to execute a given query



--
This message was sent by Atlassian JIRA
(v6.2#6252)
