[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192596#comment-16192596 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

What I meant (or how I read Chetan's comment) was: since we are not "falling back to untagged indices to answer a query that contains a tag", we should also not pick traversal to answer such queries, as the traversal index is also not tagged. To allow a query to fall back to traversal, maybe we could have a system tag that represents the traversal index.

> Query engine index selection tweaks: shortcut and hint
> ------------------------------------------------------
>
> Key: OAK-937
> URL: https://issues.apache.org/jira/browse/OAK-937
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Alex Deparvu
> Assignee: Thomas Mueller
> Priority: Critical
> Labels: candidate_oak_1_6, performance
> Fix For: 1.8, 1.7.6
>
>
> This issue covers 2 different changes related to the way the QueryEngine selects a query index:
> Firstly there could be a way to end the index selection process early via a known constant value: if an index returns a known value token (like -1000) then the query engine would effectively stop iterating through the existing index impls and use that index directly.
> Secondly it would be nice to be able to specify a desired index (if one is known to perform better) thus skipping the existing selection mechanism (cost calculation and comparison). This could be done via certain query hints [0].
> [0] http://en.wikipedia.org/wiki/Hint_(SQL)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
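The first change proposed in the issue description (ending index selection early when an index reports a known constant cost) could look roughly like the sketch below. This is only an illustration of the idea, not Oak's implementation: the class `ShortcutSelection`, the nested `Candidate` type, and the constant `SHORTCUT_COST` are hypothetical names, and -1000 is simply the example value mentioned in the description.

```java
import java.util.List;

// Hypothetical illustration of the "shortcut" idea; not Oak code.
final class ShortcutSelection {

    // Assumed sentinel: the issue only says "a known value token (like -1000)".
    static final double SHORTCUT_COST = -1000;

    // Hypothetical stand-in for an index together with the cost it reported.
    static final class Candidate {
        final String name;
        final double cost;

        Candidate(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    // Returns the cheapest candidate, or the first one that reports the
    // sentinel cost, in which case the remaining candidates are not examined.
    // Returns null if the list is empty.
    static Candidate select(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.cost == SHORTCUT_COST) {
                return c; // stop iterating and use this index directly
            }
            if (best == null || c.cost < best.cost) {
                best = c;
            }
        }
        return best;
    }
}
```

Without a candidate that reports the sentinel, this degrades to the usual lowest-cost comparison.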
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192583#comment-16192583 ]

Thomas Mueller commented on OAK-937:
------------------------------------

[~catholicon] I'm sorry, I'm afraid I don't understand... What is your use case? The main use case I see for "option(index...)" is: there are multiple fulltext indexes with different aggregation rules, and a query should use one specific index.
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180589#comment-16180589 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

I think the argument was an extension of:

bq. For the case a query uses a tag, but no index has that tag (or: all indexes with this tag doesn't know how to deal with that query)

So, maybe we can have a special tag that allows traversal for the specific query!?
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180505#comment-16180505 ]

Thomas Mueller commented on OAK-937:
------------------------------------

I understand that for unit tests it is problematic that traversal can sometimes be used (especially when asserting that a specific index is used). For that, we have a way to disable traversal during tests: QueryImpl.traversalEnabled / QueryEngineImpl.setTraversalEnabled. This is used in AbstractQueryTest.

> I think if tags are provided then traversal index is not used at all.

You mean, "traversal index should not be used"? Yes, this could be done. But I would prefer to add a hint "option(traversal disabled)", which might be easier to use.

Traversal can be used if the traversal cost is very low (lower than the cost of an index). For example, when traversing just one node (issamenode), or very few nodes (descendant nodes, where the counter index reports a cost lower than that of the index). But no warnings should be logged in this case. If warnings are logged, then we need to investigate and fix that. So, what was the query, and what were the warnings?
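The rule described in the comment above (traversal wins only when it is enabled and its estimated cost is lower than the cost of the best index) can be sketched as follows. This is a simplified model under stated assumptions, not the actual QueryImpl logic: the class `TraversalChoice` and its parameters are hypothetical, and the `traversalEnabled` parameter merely mirrors the role of the QueryEngineImpl.setTraversalEnabled switch mentioned in the comment.

```java
// Hypothetical model of the cost comparison between traversal and an index.
final class TraversalChoice {

    // Returns true if the query would traverse instead of using the index.
    // bestIndexCost:    lowest cost reported by any participating index
    // traversalCost:    estimated cost of traversing (for example, derived
    //                   from the counter index)
    // traversalEnabled: false models a setup where traversal is switched off,
    //                   as done in tests
    static boolean useTraversal(double bestIndexCost, double traversalCost,
                                boolean traversalEnabled) {
        return traversalEnabled && traversalCost < bestIndexCost;
    }
}
```

In this model, a single-node lookup with a traversal cost of 1 beats an index with a cost of 100, unless traversal is disabled.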
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179121#comment-16179121 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

Continuing with the thought that we decided not to consider indices that don't support the explicitly supplied tags: I also feel that traversal should not be allowed for tagged queries.
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178972#comment-16178972 ]

Chetan Mehrotra commented on OAK-937:
-------------------------------------

bq. Currently no way to disable traversal that way, if traversal cost is very low. To disable traversal, disable the counter index, so that traversal cost is very high; but this is global.

Today I struggled to force the use of a specific index in the HybridIndex benchmark, as it somehow decided to use traversal and logged lots of warnings that some index should be defined. I think if tags are provided then traversal index is not used at all.

Maybe this is due to the way it is done currently, where index implementations "opt out" from planning if the tags do not match. Would it not be better if the QueryEngine did not consult indexes where matching tags are not found? Or at least, for traversal: if the plan indicates it is via a tag, then traversal is not used.

[~catholicon] [~tmueller] Thoughts?
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132117#comment-16132117 ]

Thomas Mueller commented on OAK-937:
------------------------------------

> `index prefer tag x` could have some use-case...

Yes, that's true. Right now, cost estimation is very inaccurate. We can improve that, and should spend some time on it. Sure, we are somewhat limited by what Lucene can do efficiently, and it's not clear to me how fast certain operations are. If needed, we could maintain our own simple statistics (e.g. the estimated number of distinct entries for a property). But histograms, and statistics for fulltext queries, are things we shouldn't try to do ourselves.
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132107#comment-16132107 ]

Thomas Mueller commented on OAK-937:
------------------------------------

http://svn.apache.org/r1805407 (trunk). Added tests (forgot them previously), and restricted tag names to strictly a-zA-Z0-9_.
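The character restriction mentioned above (a-zA-Z0-9_) can be checked on the client side before a tag is put into a query. The following is a minimal sketch using a regular expression; the class `TagNames` and the method `isValid` are hypothetical helpers and are not part of Oak, where the restriction is enforced by the query parser.

```java
import java.util.regex.Pattern;

// Hypothetical client-side check for index tag names; not Oak code.
final class TagNames {

    // One or more of: ASCII letters, digits, underscore.
    private static final Pattern VALID = Pattern.compile("[a-zA-Z0-9_]+");

    // Returns true if the tag only uses the documented characters.
    static boolean isValid(String tag) {
        return tag != null && VALID.matcher(tag).matches();
    }
}
```

With this check, "helloWorld" and "fulltext_v2" pass, while "hello-world" and the empty string are rejected.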
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132099#comment-16132099 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

bq. I wouldn't use "-", even thought that works right now (by accident I guess). At some point we might want to change the parser to support + - * / and other operations.

Well, I'd be happy to just document this and not extend the parser to support more characters.

bq. So the current behavior is "fail fast" (well, relatively fast). You propose to not fail the query, but use a different index. I think failing the query is actually better: it indicates something is not as expected. Maybe a typo. Maybe refresh was not set. Maybe forgot to add the index. I assume if one specifies a tag in the query, then the given index(es) are used, and not behind the scenes maybe a different one.

Ack. And I agree that falling back to untagged indices would be too magical. I recant my thought about that fallback.

{quote}
> better chance to win
That would be tweaking the cost in favour of some index(es). That could be done, but in this case I would probably use a different syntax, maybe "option(index prefer tag x)". I suggest we wait implementing this right now.
{quote}

Kind of moot with my recanted thought above :). But maybe {{index prefer tag x}} could have some use case. I agree that it should be tracked separately.
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132091#comment-16132091 ]

Thomas Mueller commented on OAK-937:
------------------------------------

> support both by name and by tag lookup

[~chetanm] that's true. I don't want to document "by name" right now, because I think I want to remove it. There is a relatively large risk that people hardcode index names in the code, and then cannot easily migrate to other indexes (for example, combine two indexes later on, or switch to Solr). The only place where the index name might be needed is the nodetype index, as I found it hard to support tags there (the nodetype index is a "virtual" index and not tied to an index node). The "index name" is (for Lucene and property indexes) the name of the index definition node. You can specify both an index name and a tag, and all indexes where either one matches can be used.

> refresh to true ... relevant for 1.6 onwards

[~catholicon] OK. My plan is to backport to 1.6; I'm not sure yet about earlier Oak versions. The good thing is, even if someone sets refresh in an earlier version, nothing bad will happen (it's just ignored).

> maybe, we should clarify how tags look like

I will document that only a-zA-Z0-9_ should be used. That makes sense. I didn't test with special characters so far; I think the limit is in the parser. I wouldn't use "-", even though that works right now (by accident I guess). At some point we might want to change the parser to support + - * / and other operations.

By the way, internally this is implemented using property restrictions. So you can see this in the index plan as follows:

{noformat}
explain //* option(index tag helloWorld)
...
cost using filter Filter(query= explain select [jcr:path], [jcr:score], * from [nt:base] as a option(index tag [helloWorld]) /* xpath: //* option(index tag helloWorld) */, path=*, property=[:indexTag=[helloWorld]])
{noformat}

> I wonder if no 'tagged' indices could answer the query, then instead of falling down to traversal

This is the case where a query uses a tag, but no index has that tag (or: all indexes with this tag don't know how to deal with that query). Right now, it will use traversal. That means you will get a traversal warning, and the query will probably fail (if it traversed too many nodes, and if configured to fail immediately). So the current behavior is "fail fast" (well, relatively fast). You propose to not fail the query, but use a different index. I think failing the query is actually better: it indicates something is not as expected. Maybe a typo. Maybe refresh was not set. Maybe someone forgot to add the index. I assume that if one specifies a tag in the query, then the given index(es) are used, and not, behind the scenes, maybe a different one.

> better chance to win

That would be tweaking the cost in favour of some index(es). That could be done, but in this case I would probably use a different syntax, maybe "option(index prefer tag x)". I suggest we hold off on implementing this for now.
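The "either one matches" rule from the comment above (a query may carry an index name, a tag, or both, and an index qualifies if at least one of them matches) can be modelled as below. This is a sketch under assumptions rather than the IndexPlanner code: the class `IndexHintMatch`, its method, and its parameters are hypothetical, and the treatment of a query without any hint (every index may participate) is an assumption for illustration.

```java
import java.util.Set;

// Hypothetical model of matching an index against a name hint and a tag hint.
final class IndexHintMatch {

    // indexName: name of the index definition node
    // indexTags: values of the index's multi-valued "tags" property
    // hintName:  index name given in the query, or null if none
    // hintTag:   tag given in the query, or null if none
    static boolean matches(String indexName, Set<String> indexTags,
                           String hintName, String hintTag) {
        if (hintName == null && hintTag == null) {
            return true; // assumed: without a hint, every index may participate
        }
        boolean nameMatches = hintName != null && hintName.equals(indexName);
        boolean tagMatches = hintTag != null && indexTags.contains(hintTag);
        return nameMatches || tagMatches;
    }
}
```

An index named "lucene2" with the tags [x, y] would qualify for the tag "x", for the name "lucene2", or for both together, but not for the tag "z" alone.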
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132049#comment-16132049 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

[~tmueller], a few minor comments for the docs:

bq. For indexes of type lucene, when adding or changing the property tags, you need to also set the property refresh to true (Boolean), so that the change is applied. No indexing is required.

I think that's relevant for 1.6 onwards. Maybe we should specify that.

Maybe we should clarify what tags look like: for example, no special characters barring the hyphen ({{-}}), or other such restrictions.

Also, I wonder if no 'tagged' indices could answer the query, then instead of falling down to traversal, we might still want to let all indices fight it out (I don't mean compare-tagged-first-and-then-run-untagged-comparison... just that maybe, instead of outright disallowing non-participating indices, we change the comparison of cost to give a tag match a better chance to win).
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131978#comment-16131978 ]

Chetan Mehrotra commented on OAK-937:
-------------------------------------

[~tmueller] Just from an initial look at the changes in IndexPlanner, I see we support lookup both by name and by tag. So, to confirm, there are 2 supported ways? Also, should "by name" mean by indexPath?
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131974#comment-16131974 ]

Thomas Mueller commented on OAK-937:
------------------------------------

[~catholicon], [~chetanm], [~teofili] this is now [documented here|http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag]. Could you please review this, especially the syntax and the limitations?

An open question is whether this feature should be backported, and if yes, to which versions. I would probably wait until it is well tested, and then backport to 1.6, and maybe 1.4 (if it's not too much work).
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131813#comment-16131813 ]

Thomas Mueller commented on OAK-937:
------------------------------------

http://svn.apache.org/r1805368 (trunk)

* Add "tags" (multi-valued String property) to the indexes of choice, for example "tags = [x, y]"
* Indexes support multiple tags
* Syntax (both XPath and SQL-2): option(index tag x)
* The query supports one tag only
* The query will only consider the indexes that contain the specified tag (that is, possibly multiple)
* Currently supported for property indexes and Lucene indexes

Limitations:

* Partial support for nodetype index: if a tag is specified, but doesn't support tags itself
* Not supported by Solr indexes, reference index, so they might still return a low cost
* Currently no way to disable traversal that way, if traversal cost is very low. To disable traversal, disable the counter index, so that traversal cost is very high; but this is global.
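The selection rule in the list above (a query with "option(index tag x)" only considers indexes whose "tags" property contains x) might be modelled like this. It is a simplified sketch, not Oak's implementation, where each index opts out during planning: the class `TagFilter`, the map of index names to tags, and the handling of an untagged query are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of "only indexes carrying the query's tag are considered".
final class TagFilter {

    // indexTags: index name -> values of its multi-valued "tags" property
    // queryTag:  the single tag from "option(index tag ...)", or null if the
    //            query has no such option
    // Returns the names of the indexes that remain candidates, in the
    // iteration order of the map.
    static List<String> candidates(Map<String, List<String>> indexTags,
                                   String queryTag) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, List<String>> e : indexTags.entrySet()) {
            if (queryTag == null || e.getValue().contains(queryTag)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Because the query names a tag rather than an index, adding a new index with the same tag enlarges the candidate set without any change to the query.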
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120994#comment-16120994 ]

Chetan Mehrotra commented on OAK-937:
-------------------------------------

bq. For example, each index can have a multi-valued property "tags". Then a query can specify "option(index tag )".

+1. This allows a customer to bind to a specific index, or enables the QE to select from a set of indexes.

[~catholicon] Regarding the aggregates: there are other cases too, like custom synonyms or analyzers configured for the same nodetype. So it's best to do the selection at the index level instead.
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120750#comment-16120750 ]

Vikas Saurabh commented on OAK-937:
-----------------------------------

While I like the idea of providing tag-based index hints (a minor improvement could be to pick a set of tags: "option(index tag ,")); but

bq. The main problem I want to address with this issue is: there are multiple Lucene index configurations, with different aggregation rules.

I think this particular problem might be solved by doing the indirection inside the index definition itself, e.g.

{noformat}
+ /aggregates//
+ useCase1/
- oak:aggregateClassifier = true
+
+ useCase2/
- oak:aggregateClassifier = true
+
+
{noformat}

... and extend the {{contains()}} clause to potentially choose nothing (all aggregates participate) or a subset of classifiers.

The reason I'd want to solve multiple use cases of aggregation/nodeScopeIndex this way is to still hold to the convention that we have one index for a particular type. That, imo, makes people think more about index design and also provides a clearer view right away from the index definitions (yes, the tag approach would also work... but to me humans are worse at indirection than computers).
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116614#comment-16116614 ] Thomas Mueller commented on OAK-937:
> my concern is that it can be easily misused

I fully agree. It is a bit similar to "option(traversal ok)": it's easy to add that to a query, and once it is there, such queries are no longer easily detected as potentially slow. Some other dangerous features we have are "includedPaths" and "excludedPaths" in a Lucene index.

> mark some features as experimental or expert

Sure, we need to do that. But even if we do, there is still a risk.

The main problem I want to address with this issue is: there are multiple Lucene index configurations, with different aggregation rules. And then there is a query that uses "contains(., '')". How can you ensure the right index is used (the one with the aggregation rule you care about)?

The implementation I have so far is experimental, and I didn't document it on purpose, because I don't consider this the final design. It is mainly to allow testing if and how this works. I would like to discuss how to best solve the problem.

In my view, instead of "option(index abc)", which hardcodes exactly _one_ specific index, it's better to allow using a group of indexes. For example, each index can have a multi-valued property "tags". Then a query can specify "option(index tag )". That way, a query can potentially use multiple indexes (those that have the given tag). When adding a new index, queries don't have to be changed; instead the new index needs to define the right tags. This is the approach "All problems in computer science can be solved by another level of indirection".

[~catholicon] [~chetanm] [~teofili] what do you think?
> Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Reporter: Alex Deparvu >Assignee: Thomas Mueller >Priority: Critical > Labels: performance > Fix For: 1.8 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
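A minimal sketch of the tag idea from the comment above, written in Python rather than Oak's Java purely to illustrate the indirection. The `IndexDef` class, the `candidates_for_tag` function, and the index names and tag values are all hypothetical; this is not Oak's API and not the experimental implementation mentioned in the comment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical stand-in for an index definition with a multi-valued "tags" property."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def candidates_for_tag(indexes, tag):
    """Return the indexes that a tag hint in the query would allow.

    Assumption: a query without a tag hint keeps every index as a candidate.
    """
    if tag is None:
        return list(indexes)
    return [ix for ix in indexes if tag in ix.tags]


# Hypothetical index definitions; names and tags are made up for illustration.
indexes = [
    IndexDef("lucenePage", frozenset({"pageSearch"})),
    IndexDef("luceneHierarchy", frozenset({"assetSearch"})),
]
print([ix.name for ix in candidates_for_tag(indexes, "pageSearch")])

# Adding a new index that carries the same tag changes the candidate set
# without any change to the query text.
indexes.append(IndexDef("lucenePageV2", frozenset({"pageSearch"})))
print([ix.name for ix in candidates_for_tag(indexes, "pageSearch")])
```

The point of the sketch is only that the query names a tag while the index definitions decide membership, so the set of usable indexes can grow without editing queries.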
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107080#comment-16107080 ] Tommaso Teofili commented on OAK-937: - I think this is a somewhat sensitive feature; my concern is that it can be easily misused, as it exposes implementation details in a sort of API (the query language syntax). On the other hand, I have seen cases where users would have liked to have such syntax. At the moment, IMHO, we are in a sort of middle ground where users can / have to know how to define and use indexes, but we also suggest not touching certain configurations or reindexing in most cases. Probably we should decide whether users should care about indexes or not, or at least mark some features as experimental or expert so that users are aware that they could shoot themselves in the foot. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Reporter: Alex Deparvu >Assignee: Thomas Mueller >Priority: Critical > Labels: performance > Fix For: 1.8 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104930#comment-16104930 ] Thomas Mueller commented on OAK-937: http://svn.apache.org/r1803272 (trunk) support for "option(index abc)" (missing: tests, documentation, and further options) > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Reporter: Alex Deparvu >Assignee: Thomas Mueller >Priority: Critical > Labels: performance > Fix For: 1.8 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051608#comment-16051608 ] Thomas Mueller commented on OAK-937:
I think it makes sense to implement this, mainly because different fulltext indexes (depending on the aggregate configuration) can return different results. Let's say there is a fulltext index on "Page" and another on "hierarchyNode" (supertype of Page), with a different aggregate definition. For queries on Page, in theory both indexes can be used.

Instead of hardcoding the exact index to be used, I suggest we use "index tags" (multi-value property "tags" in the index definition). Then the query can use "option(index tagged 'x')" to ensure only indexes with this tag are used. (could be one index only, could be multiple). That way the actual index name doesn't need to be hardcoded in the query.

Also, there should be a way to exclude certain indexes, for example using "option(exclude index tagged 'x')" or possibly also "option(exclude index async)" to exclude async index usage.

Another option would be to configure the index to respond / not respond to certain queries. But the risk is that we have to hardcode the actual query, which is not a good idea. And configuring query "shapes" is problematic as well, as queries might change.
> Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Reporter: Alex Deparvu >Assignee: Thomas Mueller > Labels: performance > Fix For: 1.8 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
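The include and exclude options proposed in the comment above can be sketched as a simple filter over index definitions. This is a Python illustration under stated assumptions, not Oak code: the `IndexDef` class, the `filter_indexes` function, its parameter names, and the example index names are hypothetical, and the option strings appear only in comments as they were proposed in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical index definition: a name, a set of tags, and an async flag."""
    name: str
    tags: frozenset = frozenset()
    is_async: bool = False


def filter_indexes(indexes, tagged=None, exclude_tagged=None, exclude_async=False):
    """Keep only the indexes that the (hypothetical) query options allow."""
    kept = []
    for ix in indexes:
        if tagged is not None and tagged not in ix.tags:
            continue  # proposed option(index tagged 'x'): the tag is required
        if exclude_tagged is not None and exclude_tagged in ix.tags:
            continue  # proposed option(exclude index tagged 'x')
        if exclude_async and ix.is_async:
            continue  # proposed option(exclude index async)
        kept.append(ix)
    return kept


# Made-up definitions loosely following the Page / hierarchyNode example.
indexes = [
    IndexDef("pageFulltext", frozenset({"page"}), is_async=True),
    IndexDef("hierarchyNodeFulltext", frozenset({"hierarchy"}), is_async=True),
    IndexDef("titleProperty", frozenset({"page"}), is_async=False),
]

print([ix.name for ix in filter_indexes(indexes, tagged="page")])
print([ix.name for ix in filter_indexes(indexes, exclude_tagged="hierarchy", exclude_async=True)])
```

Whatever survives the filter would still go through the normal cost comparison; the sketch only narrows the candidate set.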
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653545#comment-15653545 ] Thomas Mueller commented on OAK-937: Now that we have OAK-4888, I suggest we wait and see how that works, and depending on that either we close it as won't fix, or fix it later. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Assignee: Thomas Mueller >Priority: Minor > Labels: performance > Fix For: 1.6 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588652#comment-15588652 ] Thomas Mueller commented on OAK-937: An "option(...)" mechanism (similar to MS SQL Server) was introduced in OAK-4888. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Assignee: Thomas Mueller >Priority: Minor > Labels: performance > Fix For: 1.6 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316066#comment-14316066 ] Davide Giannella commented on OAK-937: -- +1 for SQL extensions. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Priority: Minor > Fix For: 1.2 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315775#comment-14315775 ] Chetan Mehrotra commented on OAK-937: - bq. For XPath and SQL, the "hint" could be an extension of the syntax. I would prefer that over a comment, as it allows for stricter parsing. Makes sense. I was suggesting the hint-via-comment route, but it is only useful in the pure SQL case, where we need to write portable SQL queries that can run across multiple databases with different dialects. In the Oak case the query can only run on Oak-based JCR deployments, and therefore code portability is not required. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Priority: Minor > Fix For: 1.2 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315726#comment-14315726 ] Thomas Mueller commented on OAK-937:
I prefer extending the query language, instead of using a new API. For XPath and SQL, the "hint" could be an extension of the syntax. I would prefer that over a comment, as it allows for stricter parsing. Example:
{noformat}
//*[@jcrModified = 1] index jcrModified
select * from [nt:base] index [jcrModified] where [jcrModified] = 1
{noformat}
See also:
Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#i4852
SQLite: https://www.sqlite.org/lang_indexedby.html

If the index is not available, the default should be that the query fails with an exception that says the index is missing. We can also support "optional" indexes (use the given index if it is available), but I would use a different (longer) syntax for that:
{noformat}
//*[@jcrModified = 1] index jcrModified optional
{noformat}
> Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Priority: Minor > Fix For: 1.2 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. 
> [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
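The required-versus-optional behaviour proposed in the comment above (fail when the hinted index is missing, unless the hint is marked optional) can be sketched as follows. This is an illustrative Python model only; `IndexNotFoundError`, `resolve_hinted_index`, and the dictionary of available indexes are hypothetical names, not part of Oak.

```python
class IndexNotFoundError(Exception):
    """Hypothetical error raised when a required hinted index does not exist."""


def resolve_hinted_index(available, hinted_name, optional=False):
    """Resolve an index hint against the available indexes.

    available: dict mapping index name to an index object (any value here).
    Returns the hinted index if present. If it is missing and the hint is
    optional, returns None so the caller can fall back to normal selection.
    Otherwise raises, which is the default proposed in the comment.
    """
    if hinted_name in available:
        return available[hinted_name]
    if optional:
        return None
    raise IndexNotFoundError(f"index '{hinted_name}' is missing")


available = {"jcrModified": "property index on jcrModified"}

print(resolve_hinted_index(available, "jcrModified"))
print(resolve_hinted_index(available, "jcrCreated", optional=True))

try:
    resolve_hinted_index(available, "jcrCreated")
except IndexNotFoundError as e:
    print("query fails:", e)
```

Making the strict form the short one and the lenient form the longer one means a typo in an index name surfaces as an error instead of silently falling back to a slower plan.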
[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint
[ https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314096#comment-14314096 ] Chetan Mehrotra commented on OAK-937: - With support for comments (OAK-2354) we can possibly provide hints via comments. This would allow the user to make use of the feature without depending on an Oak-specific API. > Query engine index selection tweaks: shortcut and hint > -- > > Key: OAK-937 > URL: https://issues.apache.org/jira/browse/OAK-937 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: Alex Parvulescu >Priority: Minor > Fix For: 1.2 > > > This issue covers 2 different changes related to the way the QueryEngine > selects a query index: > Firstly there could be a way to end the index selection process early via a > known constant value: if an index returns a known value token (like -1000) > then the query engine would effectively stop iterating through the existing > index impls and use that index directly. > Secondly it would be nice to be able to specify a desired index (if one is > known to perform better) thus skipping the existing selection mechanism (cost > calculation and comparison). This could be done via certain query hints [0]. > [0] http://en.wikipedia.org/wiki/Hint_(SQL) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
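The first change in the quoted issue description, ending index selection early when an index returns a known constant, can be sketched as a cost loop with a short-circuit. This is a Python illustration under assumptions: `SHORTCUT_COST` takes the example value -1000 from the description, while `select_index` and the (name, cost function) pairs are hypothetical and do not mirror Oak's QueryEngine classes.

```python
SHORTCUT_COST = -1000  # example token value taken from the issue description


def select_index(indexes, query):
    """Pick the cheapest index, stopping early on the shortcut token.

    indexes: iterable of (name, cost_fn) pairs, where cost_fn(query) returns a number.
    Returns (chosen_name, names_evaluated) so the early exit is observable.
    """
    best_name, best_cost = None, float("inf")
    evaluated = []
    for name, cost_fn in indexes:
        cost = cost_fn(query)
        evaluated.append(name)
        if cost == SHORTCUT_COST:
            # Shortcut: stop iterating and use this index directly.
            return name, evaluated
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, evaluated


query = "select * from [nt:base] where [jcrModified] = 1"

with_shortcut = [
    ("a", lambda q: 50.0),
    ("b", lambda q: SHORTCUT_COST),
    ("c", lambda q: 1.0),
]
print(select_index(with_shortcut, query))

without_shortcut = [("a", lambda q: 50.0), ("c", lambda q: 1.0)]
print(select_index(without_shortcut, query))
```

Note the trade-off the sketch makes visible: with the shortcut, index "c" is never asked for its cost, so a cheaper plan can be skipped if an index returns the token too eagerly.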