[ 
https://issues.apache.org/jira/browse/OAK-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-446:
-------------------------------

    Description: 
In order to run queries with multiple conditions efficiently, it is currently 
required to create an index on all of those conditions. For example, the query:

{code}
where lastName = 'x' and firstName = 'y'
{code}

will only run efficiently (assuming there are many nodes with the same lastName 
and many nodes with the same firstName) if there is an index on both lastName 
_and_ firstName. If there are two indexes, one just on lastName and the other 
just on firstName, then one of those indexes is used, but not both.

The problem doesn't only apply to properties, it also applies to node types. So 
a query of the form 

{code}
select * from [acme:Page] where [x] = 'y'
{code}

will use either an index on the node type, or an index on 'x', but not both. It 
seems such queries are quite important in JCR.

To speed up such queries, I suggest we implement a (virtual) 'intersecting 
index' that internally merges the results from multiple (two or more) indexes. 
To do that, the indexes need to have a common property, for example the path.

For example, the first index is on lastName and path, the second index is on 
firstName and path. The intersecting index would then query the first index 
with lastName = 'x', and then query the second index with firstName = 'y' and 
path >= '...' (the value returned by the first index). This would go back and 
forth until a row is found that satisfies both conditions (the intersection 
could be empty of course).

To make this work, index implementations should support path lookup.
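
The back-and-forth lookup described above can be sketched as follows. This is only an illustration under assumptions, not Oak code: the PathCursor interface, its seek method, and the in-memory cursorOver helper are hypothetical names. Each cursor stands for one index that is already restricted to its condition (lastName = 'x' or firstName = 'y') and that can return the smallest matching path at or after a given path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class IntersectingIndexSketch {

    /** Hypothetical stand-in for one index, already restricted to a single condition. */
    interface PathCursor {
        /** Returns the smallest matching path that is >= from, or null if there is none. */
        String seek(String from);
    }

    /** Toy in-memory index: the sorted paths of all nodes matching one condition. */
    static PathCursor cursorOver(String... paths) {
        TreeSet<String> sorted = new TreeSet<>(Arrays.asList(paths));
        return sorted::ceiling;
    }

    /** Alternates between the two cursors until both agree on a path. */
    static List<String> intersect(PathCursor first, PathCursor second) {
        List<String> result = new ArrayList<>();
        String candidate = first.seek("");
        while (candidate != null) {
            String other = second.seek(candidate);
            if (other == null) {
                break; // the second index has no more matches
            }
            if (other.equals(candidate)) {
                result.add(candidate);
                // continue with the smallest string strictly greater than the match
                candidate = first.seek(candidate + (char) 0);
            } else {
                // the second index skipped ahead; let the first index catch up
                candidate = first.seek(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PathCursor lastNameIsX = cursorOver("/a", "/b", "/d", "/f");
        PathCursor firstNameIsY = cursorOver("/b", "/c", "/f", "/g");
        System.out.println(intersect(lastNameIsX, firstNameIsY)); // prints [/b, /f]
    }
}
```

Because each seek jumps directly to the next possible path, entries that only one of the two indexes contains are skipped without being compared one by one.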

To speed up cost calculation for the intersecting index, it may be necessary to 
extend the QueryIndex interface to return the list of property restrictions an 
index supports.
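
One possible shape for such an extension is sketched below. All names here are assumptions for illustration: RestrictionAwareIndex, getSupportedProperties, and the candidates helper are hypothetical and are not part of the existing QueryIndex interface. The idea is that the intersecting index could first narrow down which indexes are worth considering, and only calculate costs for those.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RestrictionAwareIndexSketch {

    /** Hypothetical extension: an index that reports which properties it can restrict on. */
    interface RestrictionAwareIndex {
        String getName();
        Set<String> getSupportedProperties();
    }

    /** Creates a toy index description with a fixed set of supported properties. */
    static RestrictionAwareIndex index(String name, String... properties) {
        Set<String> supported = new HashSet<>(Arrays.asList(properties));
        return new RestrictionAwareIndex() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public Set<String> getSupportedProperties() {
                return supported;
            }
        };
    }

    /** Returns the names of the indexes that support at least one query condition. */
    static List<String> candidates(Set<String> queryProperties,
                                   List<RestrictionAwareIndex> indexes) {
        List<String> names = new ArrayList<>();
        for (RestrictionAwareIndex idx : indexes) {
            if (!Collections.disjoint(idx.getSupportedProperties(), queryProperties)) {
                names.add(idx.getName());
            }
        }
        return names;
    }
}
```

For the query on lastName and firstName, an index on an unrelated property would be filtered out before any cost is calculated for it.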

I don't currently see this as a very high priority because we haven't yet run 
into big performance problems here, and the Lucene index will probably not 
benefit from such a feature. But I would like to keep the issue open so we have 
a plan in case we do run into performance problems.



  was:
In order to run queries with multiple conditions efficiently, it is currently 
required to create an index on all of those conditions. For example, the query:

{code}
where lastName = 'x' and firstName = 'y'
{code}

will only run efficiently (assuming there are many nodes with the same lastName 
and many nodes with the same firstName) if there is an index on both lastName 
_and_ firstName. If there are two indexes, one just lastName and the other on 
just firstName, then one of those index is used, but not both.

The problem doesn't only apply to properties, it also applies to nodeType. So a 
query of the form 

{code}
select * from [acme:Page] where [x] = 'y'
{code}

will use either an index on the node type, or an index on 'x', but not both. It 
seems such queries are quite important in JCR.

To speed up such queries, I suggest we implement a (virtual) 'intersecting 
index' that internally merges the results from multiple (two or more) indexes. 
To do that, the indexes need to have a common property, for example the path.

For example, the first index is on lastName and path, the second index on 
firstName and path. The intersecting index would then first query the first 
index with firstName = 'x', and then query the second index, with lastName = 
'y' and path >= '...' (the value returned by the first index). This would go 
back and forth until a row is found that satisfies both conditions (the 
intersection could be empty of course).

To make this work, index implementations should support path lookup.

To speed up cost calculation for the intersecting index, it might be needed to 
extend the QueryIndex interface to return the list of property restrictions an 
index supports.



       Priority: Minor  (was: Major)
    
> Query: implement an intersecting index
> --------------------------------------
>
>                 Key: OAK-446
>                 URL: https://issues.apache.org/jira/browse/OAK-446
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Minor
>
> In order to run queries with multiple conditions efficiently, it is currently 
> required to create an index on all of those conditions. For example, the 
> query:
> {code}
> where lastName = 'x' and firstName = 'y'
> {code}
> will only run efficiently (assuming there are many nodes with the same 
> lastName and many nodes with the same firstName) if there is an index on both 
> lastName _and_ firstName. If there are two indexes, one just on lastName and 
> the other just on firstName, then one of those indexes is used, but not both.
> The problem doesn't only apply to properties, it also applies to node types. 
> So a query of the form 
> {code}
> select * from [acme:Page] where [x] = 'y'
> {code}
> will use either an index on the node type, or an index on 'x', but not both. 
> It seems such queries are quite important in JCR.
> To speed up such queries, I suggest we implement a (virtual) 'intersecting 
> index' that internally merges the results from multiple (two or more) 
> indexes. To do that, the indexes need to have a common property, for example 
> the path.
> For example, the first index is on lastName and path, the second index is on 
> firstName and path. The intersecting index would then query the first index 
> with lastName = 'x', and then query the second index with firstName = 'y' and 
> path >= '...' (the value returned by the first index). This would go back and 
> forth until a row is found that satisfies both conditions (the intersection 
> could be empty of course).
> To make this work, index implementations should support path lookup.
> To speed up cost calculation for the intersecting index, it may be necessary 
> to extend the QueryIndex interface to return the list of property 
> restrictions an index supports.
> I don't currently see this as a very high priority because we haven't yet run 
> into big performance problems here, and the Lucene index will probably not 
> benefit from such a feature. But I would like to keep the issue open so we 
> have a plan in case we do run into performance problems.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira