[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-07-11 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083406#comment-16083406
 ] 

Chetan Mehrotra commented on OAK-5899:
--

Merged to trunk with 1801675

> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Chetan Mehrotra
>Priority: Blocker
>  Labels: candidate_oak_1_6, docs-impacting
> Fix For: 1.8, 1.7.4, 1.6.3
>
> Attachments: OAK-5899-v1.patch
>
>
> At times, we have property definitions which are added to support dense 
> results right out of the index (e.g. {{contains(\*, 'foo') AND 
> \[bar]='baz'}}).
> In such cases, the added property definition "might" not be the best one to 
> answer queries which only have the property restriction (e.g. only 
> {{\[bar]='baz'}}).
> There should be a way for a property definition to declare this. Maybe there 
> are cases of some spectrum too, i.e. not only a boolean usable-or-not, but 
> some kind of scale of how usable it is.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-07-11 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082240#comment-16082240
 ] 

Thomas Mueller commented on OAK-5899:
-

I see costPerEntryFactor was already not used according to the spec before, so 
the patch didn't introduce this. This can be fixed later if needed.

As for "weight" being used like a binary flag, this is also something we can 
change later.

So I'm OK to apply the patch.

> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Chetan Mehrotra
>Priority: Blocker
> Fix For: 1.8, 1.6.3
>
> Attachments: OAK-5899-v1.patch
>
>


[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-07-11 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082219#comment-16082219
 ] 

Thomas Mueller commented on OAK-5899:
-

@Chetan, the definition of costPerEntry in QueryIndex.IndexPlan is:

/**
 * The cost to read one entry from the cursor. The returned value should
 * approximately match the number of disk read operations plus the
 * number of network roundtrips (worst case).
 * 
 * @return the lookup cost per entry, in estimated number of I/O operations
 */

To me it looks like this doesn't match what the Lucene index does:

{noformat}
 //For plan2 as 2 props are indexed its costPerEntry should be less than plan1 which
+//indexes only one prop
+assertThat(plan2.getCostPerEntry(), lessThan(plan1.getCostPerEntry()));
+
{noformat}

I would rather modify estimatedEntryCount:

/**
 * The estimated number of entries in the cursor that is returned by the query method,
 * when using this plan. This value does not have to be accurate.
 * 
 * @return the estimated number of entries
 */
long getEstimatedEntryCount();
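
To make the distinction concrete, here is a minimal sketch. It is not Oak code. It assumes the total plan cost is computed roughly as costPerExecution + costPerEntry * estimatedEntryCount, and it assumes a hypothetical integer "weight" that is applied by dividing the entry estimate. The class PlanCostSketch, its nested Plan type and the withWeight helper are made up for illustration. The point is only that a preferred plan can become cheaper while costPerEntry keeps its documented meaning as an I/O estimate.

```java
// Hypothetical sketch, not part of Oak. Names mirror QueryIndex.IndexPlan
// getters, but the types and the cost formula here are assumptions.
final class PlanCostSketch {

    /** Minimal stand-in for the three cost inputs of an index plan. */
    static final class Plan {
        final double costPerExecution;
        final double costPerEntry;
        final long estimatedEntryCount;

        Plan(double costPerExecution, double costPerEntry, long estimatedEntryCount) {
            this.costPerExecution = costPerExecution;
            this.costPerEntry = costPerEntry;
            this.estimatedEntryCount = estimatedEntryCount;
        }
    }

    /** Assumed formula: fixed cost plus per-entry cost times number of entries. */
    static double totalCost(Plan p) {
        return p.costPerExecution + p.costPerEntry * p.estimatedEntryCount;
    }

    /**
     * Applies a hypothetical weight by shrinking the entry estimate.
     * costPerEntry is left untouched, so it still means "I/O per entry".
     */
    static Plan withWeight(Plan p, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        long scaled = Math.max(1, p.estimatedEntryCount / weight);
        return new Plan(p.costPerExecution, p.costPerEntry, scaled);
    }

    public static void main(String[] args) {
        Plan plan1 = new Plan(1.0, 1.0, 1000);
        Plan plan2 = withWeight(plan1, 10);
        System.out.println(totalCost(plan1)); // prints 1001.0
        System.out.println(totalCost(plan2)); // prints 101.0
    }
}
```

With this approach plan2 wins because it is expected to return fewer entries, not because reading each entry is claimed to need less I/O.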


> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Chetan Mehrotra
>Priority: Blocker
> Fix For: 1.8, 1.6.3
>
> Attachments: OAK-5899-v1.patch
>
>


[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-03-08 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901109#comment-15901109
 ] 

Vikas Saurabh commented on OAK-5899:


[~tmueller], I was recently reading up on the index structure a bit to get 
some handle on OAK-5707. Admittedly, I don't know much about Lucene's low-level 
APIs, and there might be better ways to do it. It seems to me that we should 
be able to get a histogram of num-doc-per-term-per-field in Lucene too. That 
said, I feel that we should set 
selectivity/weight/whatever-name-we-come-up-with when the standard deviation of 
num-doc-per-term-per-field is relatively low (and yes, both "when" and 
"relatively" need to be qualified better).

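The "relatively low standard deviation" check could be sketched as follows. This is only an illustration under stated assumptions: the per-term document counts are passed in as a plain array (how to read them from Lucene is left open here), "relatively" is interpreted as the coefficient of variation (standard deviation divided by mean), and the threshold maxCv is a hypothetical tuning knob. The class TermSpreadSketch and its methods are made up for this example.

```java
// Hypothetical sketch, not part of Oak or Lucene. Input is the number of
// documents per term for one field, obtained by whatever means.
final class TermSpreadSketch {

    /** Arithmetic mean of the per-term document counts (0 for empty input). */
    static double mean(long[] docsPerTerm) {
        if (docsPerTerm.length == 0) {
            return 0;
        }
        double sum = 0;
        for (long count : docsPerTerm) {
            sum += count;
        }
        return sum / docsPerTerm.length;
    }

    /** Population standard deviation of the per-term document counts. */
    static double stdDev(long[] docsPerTerm) {
        if (docsPerTerm.length == 0) {
            return 0;
        }
        double mean = mean(docsPerTerm);
        double sumOfSquares = 0;
        for (long count : docsPerTerm) {
            double diff = count - mean;
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / docsPerTerm.length);
    }

    /**
     * True if the counts are spread evenly enough, i.e. stdDev / mean is at
     * most maxCv. The threshold is an assumption and needs tuning.
     */
    static boolean isEvenlySpread(long[] docsPerTerm, double maxCv) {
        double mean = mean(docsPerTerm);
        if (mean == 0) {
            return false; // no data, so no basis for setting a weight
        }
        return stdDev(docsPerTerm) / mean <= maxCv;
    }

    public static void main(String[] args) {
        long[] even = {100, 100, 100, 100};
        long[] skewed = {1, 1, 1, 997};
        System.out.println(isEvenlySpread(even, 0.5));   // prints true
        System.out.println(isEvenlySpread(skewed, 0.5)); // prints false
    }
}
```

Dividing by the mean keeps the check independent of index size, which is one possible way to pin down "relatively".
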
> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Priority: Minor
> Fix For: 1.8
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-03-07 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899399#comment-15899399
 ] 

Thomas Mueller commented on OAK-5899:
-

Lucene supports field info, and the Lucene index JMX bean allows reading that 
info using getFieldInfo, see OAK-3219. This is a start. I added that to the JMX 
bean so that we can find out how fast it is. It could be the basis for an 
"analyze" tool for Oak.

> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Priority: Minor
> Fix For: 1.8
>
>


[jira] [Commented] (OAK-5899) PropertyDefinitions should allow for some tweakability to declare usefulness

2017-03-07 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899392#comment-15899392
 ] 

Thomas Mueller commented on OAK-5899:
-

> scale of how-usable is it

Yes. Many relational databases do [cost 
estimation|https://en.wikipedia.org/wiki/Query_optimization#Cost_estimation] 
using histograms. Even [SQLite supports 
that|https://www.sqlite.org/compile.html#enable_stat4]. The H2 database uses 
"selectivity" on a [per-column 
basis|http://h2database.com/html/functions.html#selectivity].

I think Lucene doesn't provide that, as it's mainly used for fulltext search, 
and not so much for relational queries. But for our case, just having an 
estimate of the number of entries for a certain property value (cardinality) 
would be very useful. A configuration option would help a lot. An "analyze" 
tool for Oak could update those values at runtime, similar to what the SQL 
command "analyze" does for relational databases 
([Oracle|https://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_4005.htm],
 [PostgreSQL|https://www.postgresql.org/docs/current/static/sql-analyze.html], 
[MySQL|https://dev.mysql.com/doc/refman/5.7/en/analyze-table.html], 
[SQLite|https://www.sqlite.org/lang_analyze.html], 
[H2|http://h2database.com/html/grammar.html#analyze]).
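
As a rough illustration of what such a cardinality setting could feed into, here is a minimal sketch. It assumes values are uniformly distributed, so the expected number of entries for one property value is the total number of indexed entries divided by the number of distinct values. The class CardinalitySketch and the method estimateEntriesPerValue are hypothetical names; neither a configuration option nor an "analyze" tool with this shape is implied to exist in Oak.

```java
// Hypothetical sketch, not part of Oak. Shows the simplest possible use of
// a configured (or analyzed) distinct-value count for one property.
final class CardinalitySketch {

    /**
     * Expected number of entries matching one property value, assuming a
     * uniform distribution. Rounds up so a non-empty index never yields 0.
     */
    static long estimateEntriesPerValue(long totalEntries, long distinctValues) {
        if (totalEntries <= 0 || distinctValues <= 0) {
            return 0;
        }
        return (totalEntries + distinctValues - 1) / distinctValues; // ceiling division
    }

    public static void main(String[] args) {
        // Low-cardinality property: 4 distinct values over one million entries.
        System.out.println(estimateEntriesPerValue(1_000_000, 4));       // prints 250000
        // Nearly unique property: most values occur once or twice.
        System.out.println(estimateEntriesPerValue(1_000_000, 900_000)); // prints 2
    }
}
```

A histogram would refine this by replacing the uniform assumption with per-value counts.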

> PropertyDefinitions should allow for some tweakability to declare usefulness
> 
>
> Key: OAK-5899
> URL: https://issues.apache.org/jira/browse/OAK-5899
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Priority: Minor
> Fix For: 1.8
>
>