[jira] [Commented] (OAK-7125) Build Jackrabbit Oak #1141 failed

2018-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313426#comment-16313426
 ] 

Hudson commented on OAK-7125:
-

The previously failing build is now OK.
 Passed run: [Jackrabbit Oak 
#1143|https://builds.apache.org/job/Jackrabbit%20Oak/1143/] [console 
log|https://builds.apache.org/job/Jackrabbit%20Oak/1143/console]

> Build Jackrabbit Oak #1141 failed
> -
>
> Key: OAK-7125
> URL: https://issues.apache.org/jira/browse/OAK-7125
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>
> No description is provided
> The build Jackrabbit Oak #1141 has failed.
> First failed run: [Jackrabbit Oak 
> #1141|https://builds.apache.org/job/Jackrabbit%20Oak/1141/] [console 
> log|https://builds.apache.org/job/Jackrabbit%20Oak/1141/console]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7091:
---
Issue Type: Task  (was: Technical task)
Parent: (was: OAK-7083)

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-7091:
--

Assignee: (was: Matt Ryan)

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7091:
---
Description: 
When adding a new record to an Oak instance that is using composite data store, 
the blob stream will be read twice before it is stored - once by the composite 
data store (to determine the blob ID) and again by the delegate.  This 
is necessary because if there are multiple writable delegates and one delegate 
already has a matching blob, the composite should call {{addRecord()}} on the 
delegate that has the matching blob, which may not be the highest priority 
delegate.  So we need to know the blob ID in order to select the correct 
writable delegate.

We could add a method to the CompositeDataStoreAware interface wherein the data 
store can be told which blob ID to use (from the composite) so that it doesn't 
have to process the stream again.  Then the composite data store, after having 
read the stream to a temporary file, can pass an input stream from the 
temporary file to the delegate along with the computed blob ID, to avoid 
reading the stream twice.

  was:When adding a new record to an Oak instance that is using composite data 
store, the blob stream will be read twice before it is stored - once by the 
composite data store (to determine the blob ID) and again by the delegate.  We 
could add a method to the CompositeDataStoreAware interface wherein the data 
store can be told which blob ID to use (from the composite) so that it doesn't 
have to process the stream again.  Then the composite data store, after having 
read the stream to a temporary file, can pass an input stream from the 
temporary file to the delegate along with the computed blob ID, to avoid 
reading the stream twice.


> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313415#comment-16313415
 ] 

Matt Ryan commented on OAK-7091:


The first scenario I'm developing for the composite data store supports a 
single read-only delegate and a single writable delegate, so this capability is 
technically not needed until the composite data store supports multiple 
writable delegates.  Instead, for now, the composite data store can just pass the 
stream along to the only writable delegate.

If/when this capability is added to the composite data store, we could also add 
a method to the delegate handler to ask how many writable delegates exist.  If 
there is only one, the composite data store can optimize and avoid computing 
the blob ID, and simply pass the stream along to the only writable delegate.
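As an illustration of the proposal in the description, here is a minimal sketch of what such a delegate-side method could look like. The method name and signature are assumptions for illustration only, not an existing Oak API; {{DataRecord}} and {{DataStoreException}} are the standard Jackrabbit data store types.

{code}
import java.io.InputStream;

import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStoreException;

// Hypothetical sketch only; the name and signature are assumptions. The idea is
// that the composite data store, having already read the stream to a temporary
// file and computed the blob ID, hands both to the selected writable delegate so
// the delegate does not need to read the stream a second time to derive the ID.
public interface CompositeDataStoreAware {

    DataRecord addRecord(InputStream stream, String precomputedBlobId)
            throws DataStoreException;
}
{code}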

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7125) Build Jackrabbit Oak #1141 failed

2018-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313295#comment-16313295
 ] 

Hudson commented on OAK-7125:
-

The previously failing build is now OK.
 Passed run: [Jackrabbit Oak 
#1142|https://builds.apache.org/job/Jackrabbit%20Oak/1142/] [console 
log|https://builds.apache.org/job/Jackrabbit%20Oak/1142/console]

> Build Jackrabbit Oak #1141 failed
> -
>
> Key: OAK-7125
> URL: https://issues.apache.org/jira/browse/OAK-7125
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>
> No description is provided
> The build Jackrabbit Oak #1141 has failed.
> First failed run: [Jackrabbit Oak 
> #1141|https://builds.apache.org/job/Jackrabbit%20Oak/1141/] [console 
> log|https://builds.apache.org/job/Jackrabbit%20Oak/1141/console]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7117) Suppress Tika startup warnings

2018-01-05 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313240#comment-16313240
 ] 

Julian Reschke commented on OAK-7117:
-

The new config properties might actually require Tika 1.17, see 
https://issues.apache.org/jira/browse/TIKA-2490

> Suppress Tika startup warnings
> --
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: lucene
>Affects Versions: 1.8
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7117.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6373) oak-run check should also check checkpoints

2018-01-05 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313198#comment-16313198
 ] 

Francesco Mari commented on OAK-6373:
-

I think the command should behave in the following way.

* The path passed to {{filter}} should always be a content path, i.e. no 
super-root or checkpoint prefix should be specified by the user.
* If {{checkpoints}} is not specified, the command checks the head state.
* If {{checkpoints}} is specified, the command checks the checkpoints but not 
the head state.
* If no arguments are specified for {{checkpoints}}, every checkpoint is 
checked.
* If one or more arguments are specified for {{checkpoints}}, they are the 
checkpoints that should be checked. If one or more of those checkpoints can't 
be found, the tool ignores them and continues with the next one.

The only use case that the points above don't cover is checking the head state 
and the checkpoints with one invocation. Either we ignore this use case, or we 
add an additional {{head}} option that, when used in combination with 
{{checkpoints}}, allows the traversal of both the head state and the specified 
checkpoints. {{head}} doesn't have any effect unless {{checkpoints}} is 
specified. This might be tackled by a future improvement.
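To make the proposed semantics concrete, a few hypothetical invocations are shown below; the exact option syntax is an assumption and may differ from the final implementation.

{noformat}
check --filter /content                        # checks /content in the head state only
check --filter /content --checkpoints          # checks /content in every checkpoint, not the head state
check --filter /content --checkpoints cp1,cp2  # checks /content in cp1 and cp2; unknown checkpoints are skipped
{noformat}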

> oak-run check should also check checkpoints 
> 
>
> Key: OAK-6373
> URL: https://issues.apache.org/jira/browse/OAK-6373
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>  Labels: tooling
> Fix For: 1.8
>
>
> {{oak-run check}} currently does *not* traverse and check the items in the 
> checkpoint. I think we should change this and add an option to traverse all, 
> some or none of the checkpoints. When doing this we need to keep in mind the 
> interaction of this new feature with the {{filter}} option: the paths passed 
> through this option then need to be prefixed with {{/root}}. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7126) make RDBCacheConsistency2Test store-agnostic

2018-01-05 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7126:

Attachment: OAK-7126.diff

Proposed test - [~mreutegg], please review.

(and yes, the patch is currently missing the removal of the old test class)

> make RDBCacheConsistency2Test store-agnostic
> 
>
> Key: OAK-7126
> URL: https://issues.apache.org/jira/browse/OAK-7126
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7126.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7126) make RDBCacheConsistency2Test store-agnostic

2018-01-05 Thread Julian Reschke (JIRA)
Julian Reschke created OAK-7126:
---

 Summary: make RDBCacheConsistency2Test store-agnostic
 Key: OAK-7126
 URL: https://issues.apache.org/jira/browse/OAK-7126
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: documentmk
Reporter: Julian Reschke
Assignee: Julian Reschke
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7125) Build Jackrabbit Oak #1141 failed

2018-01-05 Thread Hudson (JIRA)
Hudson created OAK-7125:
---

 Summary: Build Jackrabbit Oak #1141 failed
 Key: OAK-7125
 URL: https://issues.apache.org/jira/browse/OAK-7125
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: continuous integration
Reporter: Hudson


No description is provided

The build Jackrabbit Oak #1141 has failed.
First failed run: [Jackrabbit Oak 
#1141|https://builds.apache.org/job/Jackrabbit%20Oak/1141/] [console 
log|https://builds.apache.org/job/Jackrabbit%20Oak/1141/console]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7124.
--
Resolution: Fixed

Done with 1820292

> Support MemoryNodeStore with NodeStoreFixtureProvider
> -
>
> Key: OAK-7124
> URL: https://issues.apache.org/jira/browse/OAK-7124
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> At times we need to use oak-run console to just execute some script (like 
> OAK-7122). Currently oak-run console requires access to a working repository. 
> To support such cases we should enable support for using MemoryNodeStore, so 
> the following command can be used:
> {noformat}
> java -jar oak-run-*.jar console memory
> {noformat}
> The memory NodeStore can be used to play with the NodeStore API, or it can 
> just be used to enable the launch of a Groovy script.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7124:


 Summary: Support MemoryNodeStore with NodeStoreFixtureProvider
 Key: OAK-7124
 URL: https://issues.apache.org/jira/browse/OAK-7124
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.15


At times we need to use oak-run console to just execute some script (like 
OAK-7122). Currently oak-run console requires access to a working repository. 
To support such cases we should enable support for using MemoryNodeStore, so 
the following command can be used:

{noformat}
java -jar oak-run-*.jar console memory
{noformat}

The memory NodeStore can be used to play with the NodeStore API, or it can just 
be used to enable the launch of a Groovy script.
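As a rough illustration of the kind of NodeStore API exploration this enables (a sketch for this issue, not code from it; it simply uses the standard in-memory store directly):

{code}
import org.apache.jackrabbit.oak.plugins.memory.MemoryNodeStore;
import org.apache.jackrabbit.oak.spi.commit.CommitInfo;
import org.apache.jackrabbit.oak.spi.commit.EmptyHook;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public class MemoryNodeStoreDemo {

    public static void main(String[] args) throws Exception {
        // In-memory store, no repository on disk required
        NodeStore store = new MemoryNodeStore();

        // Create /demo with a property and merge the change back into the store
        NodeBuilder builder = store.getRoot().builder();
        builder.child("demo").setProperty("greeting", "hello");
        store.merge(builder, EmptyHook.INSTANCE, CommitInfo.EMPTY);

        // Read it back through the NodeStore API
        System.out.println(store.getRoot().getChildNode("demo").getProperty("greeting"));
    }
}
{code}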



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7122.
--
Resolution: Done

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by the 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content in file sorted by path and for each line field name 
> sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-7123) ChildNodeStateProvider does not return all immediate children

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7123.
--
Resolution: Fixed

Done with 1820278

> ChildNodeStateProvider does not return all immediate children
> -
>
> Key: OAK-7123
> URL: https://issues.apache.org/jira/browse/OAK-7123
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Affects Versions: 1.7.14
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Based on the script implemented in OAK-7122 and running it against a test index 
> it was observed that some of the relative fields were not getting indexed. 
> This happens because the ChildNodeStateProvider#children does not handle the 
> immediate children check properly. It would fail for a case like
> {noformat}
> /a
> /a/b
> /a/b/c
> /a/d
> /a/d/e
> {noformat}
> Currently it would only report 'b' as a child of 'a'. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:38 AM:


[~tmueller] so adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even if this is the case, the disjunctive conjunctions are not mutually 
exclusive, leading to inaccurate results as well

1) can easily be fixed by converting the restrictions to NNF before doing the 
optimisation. 2) would also require deduplication between the lucene result 
sets returned from each of the unions. 




was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even if this is the case, the disjunctive conjunctions are not mutually 
exclusive leading to inaccurate result as well

It would require also a deduplication between the lucene results returned from 
each of the unions. 



> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:37 AM:


[~tmueller] so adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even if this is the case, the disjunctive conjunctions are not mutually 
exclusive, leading to inaccurate results as well

It would also require deduplication between the lucene results returned from 
each of the unions. 




was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even if this is the case, the disjunctive conjunctions are not mutually 
exclusive leading to inaccurate result as well



> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph commented on OAK-7109:
---

[~tmueller] so adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even if this is the case, the disjunctive conjunctions are not mutually 
exclusive, leading to inaccurate results as well



> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866
 ] 

Chetan Mehrotra edited comment on OAK-7122 at 1/5/18 10:24 AM:
---

Implemented the script at [1]. Currently it builds up the structure in memory. 
If this proves to be problematic for large indexes, we can look into building the 
structure on the file system.

*Usage*

{code}
java -DindexPath=/path/to/indexing-result/indexes/lucene/data \
-jar oak-run-*.jar \
console /path/to/segmentstore \
":load 
https://raw.githubusercontent.com/chetanmeh/oak-console-scripts/master/src/main/groovy/lucene/luceneIndexDumper.groovy";
{code}

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene


was (Author: chetanm):
Implemented the script at [1]. Currently it build up the structure in memory. 
If this proves to be problamatic for large index can look into building the 
structure on file system

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by the 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content in file sorted by path and for each line field name 
> sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7123) ChildNodeStateProvider does not return all immediate children

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7123:


 Summary: ChildNodeStateProvider does not return all immediate 
children
 Key: OAK-7123
 URL: https://issues.apache.org/jira/browse/OAK-7123
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: run
Affects Versions: 1.7.14
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.15


Based on the script implemented in OAK-7122 and running it against a test index it 
was observed that some of the relative fields were not getting indexed. This 
happens because the ChildNodeStateProvider#children does not handle the 
immediate children check properly. It would fail for a case like

{noformat}
/a
/a/b
/a/b/c
/a/d
/a/d/e
{noformat}

Currently it would only report 'b' as a child of 'a'. 
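The following standalone sketch (illustrative only, not the actual ChildNodeStateProvider code) shows the immediate-children check the example implies: a path is an immediate child of a parent exactly when removing its last segment yields the parent path.

{code}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ImmediateChildren {

    // A path is an immediate child of 'parent' iff stripping its last segment yields 'parent'
    static boolean isImmediateChildOf(String path, String parent) {
        int idx = path.lastIndexOf('/');
        String parentOfPath = idx <= 0 ? "/" : path.substring(0, idx);
        return !path.equals(parent) && parentOfPath.equals(parent);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/a", "/a/b", "/a/b/c", "/a/d", "/a/d/e");
        List<String> childrenOfA = paths.stream()
                .filter(p -> isImmediateChildOf(p, "/a"))
                .collect(Collectors.toList());
        // Expected: [/a/b, /a/d] -- the reported bug only yields /a/b
        System.out.println(childrenOfA);
    }
}
{code}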



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866
 ] 

Chetan Mehrotra commented on OAK-7122:
--

Implemented the script at [1]. Currently it builds up the structure in memory. 
If this proves to be problematic for large indexes, we can look into building the 
structure on the file system.

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by the 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content in file sorted by path and for each line field name 
> sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312793#comment-16312793
 ] 

Thomas Mueller commented on OAK-7109:
-

[~catholicon] OK, I see facets do not exactly match "group by" + "count". So, 
what if we add a feature to aggregate the data from a "select [rep:facet(...)] 
... UNION select [rep:facet(...)] ..." query? I believe aggregating that data 
in the query engine should be possible, as the data format of the facet feature 
is known.

>> What if Lucene doesn't index all the constraints?
> fail such queries

Sounds good to me. I believe right now, if a query uses "select 
[rep:facet(...)]", then only indexes that support that are used. If there is no 
index that supports facets, then the query should fail with an exception (if 
that's not the case yet, we should probably add that). If the Lucene index 
doesn't support some of the conditions, then it shouldn't return an index plan. 
That should solve the problem with "union" queries as well.
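For the example query from the issue description, the union form referred to above would look roughly like this (illustrative only):

{noformat}
select [rep:facet(simple/tags)] from [nt:base] as a
where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
union
select [rep:facet(simple/tags)] from [nt:base] as a
where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{noformat}

Each branch then carries a single path restriction that the index can evaluate, and the query engine would have to aggregate the per-branch facet counts (and deduplicate rows matched by both branches).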

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311966#comment-16311966
 ] 

Julian Reschke edited comment on OAK-7121 at 1/5/18 9:32 AM:
-

trunk: [r1820199|http://svn.apache.org/r1820199]
1.6: [r1820220|http://svn.apache.org/r1820220]
1.4: [r1820264|http://svn.apache.org/r1820264]
1.2: [r1820268|http://svn.apache.org/r1820268]
1.0: [r1820271|http://svn.apache.org/r1820271]



was (Author: reschke):
trunk: [r1820199|http://svn.apache.org/r1820199]
1.6: [r1820220|http://svn.apache.org/r1820220]
1.4: [r1820264|http://svn.apache.org/r1820264]
1.2: [r1820268|http://svn.apache.org/r1820268]


> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20, 1.0.41
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7121:

Fix Version/s: 1.0.41

> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20, 1.0.41
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7121:

Labels:   (was: candidate_oak_1_0)

> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20, 1.0.41
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7122:


 Summary: Implement script to compare lucene indexes logically
 Key: OAK-7122
 URL: https://issues.apache.org/jira/browse/OAK-7122
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8


With Document Traversal based indexing we have implemented a newer indexing 
logic. To validate that the index produced by it is the same as one done by the existing 
indexing flow we need to implement a script which can enable comparing the 
index content logically

This was recently discussed on lucene mailing list [1] and suggestion there was 
it can be done by un-inverting the index. So to enable that we need to 
implement a script which can 

# Open a Lucene index
# Map the Lucene Document to path of node
# For each document determine what all fields are associated with it (stored 
and non stored)
# Dump this content in file sorted by path and for each line field name sorted 
by name

Then such dumps can be generated for old and new index and compared via simple 
text diff

[1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55
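A rough sketch of the core of such a dump for stored fields is shown below. It is illustrative only: it assumes the Lucene 4.x API bundled with Oak at the time, assumes the node path is available in a stored {{:path}} field, and ignores un-inverting, which would be needed to cover non-stored fields.

{code}
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.store.FSDirectory;

public class IndexDumper {

    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File(args[0])))) {
            // path -> sorted "field = value" entries, so two dumps can be compared with a text diff
            Map<String, TreeMap<String, String>> dump = new TreeMap<>();
            for (int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i);
                String path = doc.get(":path"); // assumed stored path field
                if (path == null) {
                    continue;
                }
                TreeMap<String, String> fields = dump.computeIfAbsent(path, k -> new TreeMap<>());
                for (IndexableField field : doc.getFields()) {
                    fields.put(field.name(), String.valueOf(field.stringValue()));
                }
            }
            dump.forEach((path, fields) -> {
                System.out.println(path);
                fields.forEach((name, value) -> System.out.println("  " + name + " = " + value));
            });
        }
    }
}
{code}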



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311966#comment-16311966
 ] 

Julian Reschke edited comment on OAK-7121 at 1/5/18 8:35 AM:
-

trunk: [r1820199|http://svn.apache.org/r1820199]
1.6: [r1820220|http://svn.apache.org/r1820220]
1.4: [r1820264|http://svn.apache.org/r1820264]
1.2: [r1820268|http://svn.apache.org/r1820268]



was (Author: reschke):
trunk: [r1820199|http://svn.apache.org/r1820199]
1.6: [r1820220|http://svn.apache.org/r1820220]
1.4: [r1820264|http://svn.apache.org/r1820264]


> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_0
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7121:

Labels: candidate_oak_1_0  (was: candidate_oak_1_0 candidate_oak_1_2)

> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_0
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest

2018-01-05 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7121:

Fix Version/s: 1.2.28

> DocumentStore testing: allow config of DocumentMK.Builder in 
> AbstractDocumentStoreTest
> --
>
> Key: OAK-7121
> URL: https://issues.apache.org/jira/browse/OAK-7121
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_0
> Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20
>
> Attachments: OAK-7121.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312691#comment-16312691
 ] 

Dirk Rudolph commented on OAK-7109:
---

{quote}
I have a very pessimistic view that we should fail such queries - I mean it's 
better to fail and allow for right index def than giving incorrect results.
{quote}
+1

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687
 ] 

Vikas Saurabh edited comment on OAK-7109 at 1/5/18 8:22 AM:


{quote}
(I know the "group by" and "count" are not currently supported by Oak).
Or are there other aspects I missed?
{quote}
Indeed fundamentally that's what facets do -  provide usually few (not 'all' 
unlike group by) properties and count according to how many documents match the 
query. Lucene's faceting support also does ranges although we don't support 
that yet - e.g. I could facet on "jcr:created" and the categories could turn 
out as "today", "within last week", etc (I'm not completely sure about the 
API... I'm just trying to illustrate that faceted categories can potentially be 
not-the-actually-stored-value).

bq. What do you mean with "scoring"?
The scoring part is an entirely different issue unrelated to facets - e.g. we 
won't (can't??) correctly order documents matching queries such as {{ WHERE 
(CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND 
foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is 
that we can't quite merge scores coming out of separate lucene queries)
But, let's ignore the scoring for this issue.

bq. What if Lucene doesn't index all the constraints?
I have a very pessimistic view that we should fail such queries - I mean it's 
better to fail and allow for right index def than giving incorrect results.


was (Author: catholicon):
{quote}
(I know the "group by" and "count" are not currently supported by Oak).
Or are there other aspects I missed?
{quote}
Indeed fundamentally that's what facets do -  provide usually few (not 'all' 
unlike group by) properties and count according to how many documents match the 
query. Lucene's faceting support also does ranges although we don't support 
that yet - e.g. I could facet of "jcr:created" and the categories could turn 
out as "today", "within last week", etc (I'm not completely sure about the 
API... I'm just trying to illustrate that faceted categories can potentially be 
not-the-actually-stored-value).

bq. What do you mean with "scoring"?
The scoring part is entirely different issue unrelated to facets - e.g. we 
correctly won't (can't??) order documents matching queries such as {{ WHERE 
(CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND 
foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is 
that we can't quite merge scores coming out of separate lucene queries)
But, let's ignore the scoring for this issue.

bq. What if Lucene doesn't index all the constraints?
I have a very pessimistic view that we should fail such queries - I mean it's 
better to fail and allow for right index def than giving incorrect results.

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687
 ] 

Vikas Saurabh commented on OAK-7109:


{quote}
(I know the "group by" and "count" are not currently supported by Oak).
Or are there other aspects I missed?
{quote}
Indeed fundamentally that's what facets do -  provide usually few (not 'all' 
unlike group by) properties and count according to how many documents match the 
query. Lucene's faceting support also does ranges although we don't support 
that yet - e.g. I could facet on "jcr:created" and the categories could turn 
out as "today", "within last week", etc (I'm not completely sure about the 
API... I'm just trying to illustrate that faceted categories can potentially be 
not-the-actually-stored-value).

bq. What do you mean with "scoring"?
The scoring part is an entirely different issue unrelated to facets - e.g. we 
won't (can't??) correctly order documents matching queries such as {{ WHERE 
(CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND 
foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is 
that we can't quite merge scores coming out of separate lucene queries)
But, let's ignore the scoring for this issue.

bq. What if Lucene doesn't index all the constraints?
I have a very pessimistic view that we should fail such queries - I mean it's 
better to fail and allow for right index def than giving incorrect results.

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312675#comment-16312675
 ] 

Thomas Mueller commented on OAK-7109:
-

I don't fully know how facets work. Could you help me a bit with this, please? 
The query
{noformat}
select [rep:facet(simple/tags)] from [nt:base] as a 
where contains(a.[*], 'ipsum') 
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{noformat}

converted to "regular SQL" would be this, right?
{noformat}
select [simple/tags], count(*)
from [nt:base] as a 
where contains(a.[*], 'ipsum') 
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
group by [simple/tags]
{noformat}

(I know the "group by" and "count" are not currently supported by Oak).
Or are there other aspects I missed? What do you mean with "scoring"?

If it's the same, then I guess we might want to support the "group by" and 
"count" features in Oak, or add a custom logic to combine the results of 
{noformat}
select [rep:facet(...)] ... UNION select [rep:facet(...)] ...
{noformat}

> passing all constraints to lucene

What if Lucene doesn't index all the constraints?

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> is expanding exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)