[jira] [Updated] (OAK-7379) Lucene Index: per-column selectivity, assume 5 unique entries

2018-11-15 Thread Julian Reschke (JIRA)


[ https://issues.apache.org/jira/browse/OAK-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke updated OAK-7379:

Fix Version/s: 1.9.0

> Lucene Index: per-column selectivity, assume 5 unique entries
>
> Key: OAK-7379
> URL: https://issues.apache.org/jira/browse/OAK-7379
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Labels: candidate_oak_1_8
> Fix For: 1.9.0, 1.10
>
>
> Currently, if a query has a property restriction of the form "property = x",
> and the property is indexed in a Lucene property index, the estimated cost of
> the index is the number of documents indexed for that property. This is a
> very conservative estimate: it assumes all documents have the same value, so
> the cost of that index is relatively high.
> In almost all cases, there are many distinct values for a property. Rarely
> are there only a few values, or a skewed distribution where one value accounts
> for most documents. But in almost all cases there are more than 5 distinct values.
> I think it makes sense to use 5 as the default value. It is still
> conservative (the estimated cost of the index is still high), but much better than now.
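A minimal Java sketch of the arithmetic described above, assuming the new estimate is simply the number of documents indexed for the property divided by the assumed number of distinct values (5). The class `PropertyCostSketch` and its methods are hypothetical and are not Oak's actual implementation; the floor of one matching document for a non-empty property is also an assumption of this sketch.

```java
/**
 * Hypothetical sketch (not Oak's actual code) of the cost estimate discussed
 * in OAK-7379 for a restriction of the form "property = x".
 */
public final class PropertyCostSketch {

    /** Assumed default number of distinct values per property, as proposed in the issue. */
    static final int DEFAULT_UNIQUE_ENTRIES = 5;

    /** Old behaviour: assume every indexed document has the same value. */
    static long conservativeCost(long docsWithProperty) {
        return docsWithProperty;
    }

    /**
     * Proposed behaviour: assume values are spread over a fixed number of
     * distinct entries, so "property = x" matches roughly 1/N of the documents.
     */
    static long perColumnSelectivityCost(long docsWithProperty, int uniqueEntries) {
        if (uniqueEntries <= 0) {
            throw new IllegalArgumentException("uniqueEntries must be positive");
        }
        long estimate = docsWithProperty / uniqueEntries;
        // Assumption of this sketch: a non-empty property matches at least one document.
        return (docsWithProperty > 0) ? Math.max(1L, estimate) : 0L;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        System.out.println("old estimate: " + conservativeCost(docs));
        System.out.println("new estimate: "
                + perColumnSelectivityCost(docs, DEFAULT_UNIQUE_ENTRIES));
    }
}
```

With one million documents indexed for the property, the old estimate is 1,000,000 and the sketched per-column estimate is 200,000: still conservative, but five times lower.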



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7379) Lucene Index: per-column selectivity, assume 5 unique entries

2018-04-11 Thread Thomas Mueller (JIRA)

[ https://issues.apache.org/jira/browse/OAK-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-7379:

Labels: candidate_oak_1_8  (was: )

> Lucene Index: per-column selectivity, assume 5 unique entries
>
> Key: OAK-7379
> URL: https://issues.apache.org/jira/browse/OAK-7379
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Labels: candidate_oak_1_8
> Fix For: 1.10
>
>


[jira] [Updated] (OAK-7379) Lucene Index: per-column selectivity, assume 5 unique entries

2018-03-27 Thread Thomas Mueller (JIRA)

[ https://issues.apache.org/jira/browse/OAK-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-7379:

Fix Version/s: 1.10

> Lucene Index: per-column selectivity, assume 5 unique entries
>
> Key: OAK-7379
> URL: https://issues.apache.org/jira/browse/OAK-7379
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.10
>
>