[jira] [Commented] (OAK-7125) Build Jackrabbit Oak #1141 failed
[ https://issues.apache.org/jira/browse/OAK-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313426#comment-16313426 ]

Hudson commented on OAK-7125:
-----------------------------

The previously failing build is now OK.

Passed run: [Jackrabbit Oak #1143|https://builds.apache.org/job/Jackrabbit%20Oak/1143/] [console log|https://builds.apache.org/job/Jackrabbit%20Oak/1143/console]

> Build Jackrabbit Oak #1141 failed
> ---------------------------------
>
> Key: OAK-7125
> URL: https://issues.apache.org/jira/browse/OAK-7125
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: continuous integration
> Reporter: Hudson
>
> No description is provided.
> The build Jackrabbit Oak #1141 has failed.
> First failed run: [Jackrabbit Oak #1141|https://builds.apache.org/job/Jackrabbit%20Oak/1141/] [console log|https://builds.apache.org/job/Jackrabbit%20Oak/1141/console]

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store
[ https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Ryan updated OAK-7091:
---------------------------
Issue Type: Task (was: Technical task)
Parent: (was: OAK-7083)

> Avoid streaming data twice in composite data store
> --------------------------------------------------
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
> Reporter: Matt Ryan
>
> When adding a new record to an Oak instance that is using the composite data store, the blob stream is read twice before it is stored - once by the composite data store (to determine the blob ID) and again by the delegate. This is necessary because if there are multiple writable delegates and one delegate already has a matching blob, the composite should call {{addRecord()}} on the delegate that has the matching blob, which may not be the highest-priority delegate. So we need to know the blob ID in order to select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the data store can be told which blob ID to use (from the composite) so that it doesn't have to process the stream again. Then the composite data store, after having read the stream to a temporary file, can pass an input stream from the temporary file to the delegate along with the computed blob ID, to avoid reading the stream twice.
[jira] [Assigned] (OAK-7091) Avoid streaming data twice in composite data store
[ https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Ryan reassigned OAK-7091:
------------------------------
Assignee: (was: Matt Ryan)

> Avoid streaming data twice in composite data store
> --------------------------------------------------
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store
[ https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Ryan updated OAK-7091:
---------------------------
Description:
When adding a new record to an Oak instance that is using the composite data store, the blob stream is read twice before it is stored - once by the composite data store (to determine the blob ID) and again by the delegate. This is necessary because if there are multiple writable delegates and one delegate already has a matching blob, the composite should call {{addRecord()}} on the delegate that has the matching blob, which may not be the highest-priority delegate. So we need to know the blob ID in order to select the correct writable delegate.

We could add a method to the CompositeDataStoreAware interface wherein the data store can be told which blob ID to use (from the composite) so that it doesn't have to process the stream again. Then the composite data store, after having read the stream to a temporary file, can pass an input stream from the temporary file to the delegate along with the computed blob ID, to avoid reading the stream twice.

was:
When adding a new record to an Oak instance that is using the composite data store, the blob stream is read twice before it is stored - once by the composite data store (to determine the blob ID) and again by the delegate. We could add a method to the CompositeDataStoreAware interface wherein the data store can be told which blob ID to use (from the composite) so that it doesn't have to process the stream again. Then the composite data store, after having read the stream to a temporary file, can pass an input stream from the temporary file to the delegate along with the computed blob ID, to avoid reading the stream twice.

> Avoid streaming data twice in composite data store
> --------------------------------------------------
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Reporter: Matt Ryan
> Assignee: Matt Ryan
[jira] [Commented] (OAK-7091) Avoid streaming data twice in composite data store
[ https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313415#comment-16313415 ]

Matt Ryan commented on OAK-7091:
--------------------------------

The first scenario I'm developing for the composite data store supports a single read-only delegate and a single writable delegate, so this capability is technically not needed until the composite data store supports multiple writable delegates. Instead, for now, the composite data store can just pass the stream along to the only writable delegate.

If/when this capability is added to the composite data store, we could also add a method to the delegate handler to ask how many writable delegates exist. If there is only one, the composite data store can optimize by avoiding the blob ID computation and simply passing the stream along to the only writable delegate.

> Avoid streaming data twice in composite data store
> --------------------------------------------------
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
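The delegate-selection rule described in the issue (use the writable delegate that already holds a matching blob, otherwise fall back to the highest-priority one) can be sketched as follows. This is a minimal illustration, not Oak code: representing each delegate by the set of blob IDs it holds is an assumption made purely for the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the writable-delegate selection rule discussed
// above. Each delegate is modeled as the set of blob IDs it already stores;
// this representation is illustrative, not an actual Oak API.
public class DelegateSelection {

    /** Returns the index of the writable delegate that should receive the record. */
    public static int selectDelegate(List<Set<String>> delegateBlobIds, String blobId) {
        for (int i = 0; i < delegateBlobIds.size(); i++) {
            if (delegateBlobIds.get(i).contains(blobId)) {
                return i; // a delegate already has this blob; may not be index 0
            }
        }
        return 0; // no match: use the highest-priority (first) delegate
    }
}
```

This is also why the blob ID must be computed up front: without it, the loop above has nothing to match against, and the composite could only ever pick the first delegate.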
[jira] [Commented] (OAK-7125) Build Jackrabbit Oak #1141 failed
[ https://issues.apache.org/jira/browse/OAK-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313295#comment-16313295 ]

Hudson commented on OAK-7125:
-----------------------------

The previously failing build is now OK.

Passed run: [Jackrabbit Oak #1142|https://builds.apache.org/job/Jackrabbit%20Oak/1142/] [console log|https://builds.apache.org/job/Jackrabbit%20Oak/1142/console]

> Build Jackrabbit Oak #1141 failed
> ---------------------------------
>
> Key: OAK-7125
> URL: https://issues.apache.org/jira/browse/OAK-7125
[jira] [Commented] (OAK-7117) Suppress Tika startup warnings
[ https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313240#comment-16313240 ]

Julian Reschke commented on OAK-7117:
-------------------------------------

The new config properties might actually require Tika 1.17; see https://issues.apache.org/jira/browse/TIKA-2490

> Suppress Tika startup warnings
> ------------------------------
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: lucene
> Affects Versions: 1.8
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Priority: Minor
> Attachments: OAK-7117.diff
[jira] [Commented] (OAK-6373) oak-run check should also check checkpoints
[ https://issues.apache.org/jira/browse/OAK-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313198#comment-16313198 ]

Francesco Mari commented on OAK-6373:
-------------------------------------

I think the command should behave in the following way.

* The path passed to {{filter}} should always be a content path, i.e. no super-root or checkpoint prefix should be specified by the user.
* If {{checkpoints}} is not specified, the command checks the head state.
* If {{checkpoints}} is specified, the command checks the checkpoints but not the head state.
* If no arguments are specified for {{checkpoints}}, every checkpoint is checked.
* If one or more arguments are specified for {{checkpoints}}, they are the checkpoints that should be checked. If one or more of those checkpoints can't be found, the tool ignores them and continues with the next one.

The only use case that the points above don't cover is checking the head state and the checkpoints with one invocation. Either we ignore this use case, or we add an additional {{head}} option that, when used in combination with {{checkpoints}}, allows the traversal of both the head state and the specified checkpoints. {{head}} doesn't have any effect unless {{checkpoints}} is specified. This might be tackled by a future improvement.

> oak-run check should also check checkpoints
> -------------------------------------------
>
> Key: OAK-6373
> URL: https://issues.apache.org/jira/browse/OAK-6373
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run, segment-tar
> Reporter: Michael Dürig
> Assignee: Andrei Dulceanu
> Labels: tooling
> Fix For: 1.8
>
> {{oak-run check}} does currently *not* traverse and check the items in the checkpoints. I think we should change this and add an option to traverse all, some or none of the checkpoints. When doing this we need to keep in mind the interaction of this new feature with the {{filter}} option: the paths passed through this option then need to be prefixed with {{/root}}.
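The option semantics proposed in the comment above can be made concrete with a small sketch. `CheckTargetSelection` and its method are hypothetical names invented for illustration, not oak-run code; the point is only to pin down which states get traversed for each combination of options.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the proposed semantics: no checkpoints option means
// check only the head state; the option with no arguments means all
// checkpoints; with arguments, only the named ones, silently skipping
// checkpoints that do not exist.
public class CheckTargetSelection {

    public static List<String> targets(boolean checkpointsOption,
                                       List<String> requested,
                                       List<String> existing) {
        List<String> result = new ArrayList<>();
        if (!checkpointsOption) {
            result.add("head");          // default: head state only
            return result;
        }
        if (requested.isEmpty()) {
            result.addAll(existing);     // bare option: every checkpoint
            return result;
        }
        for (String c : requested) {
            if (existing.contains(c)) {  // unknown checkpoints are ignored
                result.add(c);
            }
        }
        return result;
    }
}
```

Note that under this model no combination yields both "head" and checkpoints, which is exactly the gap the proposed {{head}} option would fill.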
[jira] [Updated] (OAK-7126) make RDBCacheConsistency2Test store-agnostic
[ https://issues.apache.org/jira/browse/OAK-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke updated OAK-7126:
--------------------------------
Attachment: OAK-7126.diff

Proposed test - [~mreutegg], please review. (And yes, the patch is currently missing the removal of the old test class.)

> make RDBCacheConsistency2Test store-agnostic
> --------------------------------------------
>
> Key: OAK-7126
> URL: https://issues.apache.org/jira/browse/OAK-7126
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Priority: Minor
> Attachments: OAK-7126.diff
[jira] [Created] (OAK-7126) make RDBCacheConsistency2Test store-agnostic
Julian Reschke created OAK-7126:
--------------------------------

Summary: make RDBCacheConsistency2Test store-agnostic
Key: OAK-7126
URL: https://issues.apache.org/jira/browse/OAK-7126
Project: Jackrabbit Oak
Issue Type: Task
Components: documentmk
Reporter: Julian Reschke
Assignee: Julian Reschke
Priority: Minor
[jira] [Created] (OAK-7125) Build Jackrabbit Oak #1141 failed
Hudson created OAK-7125:
------------------------

Summary: Build Jackrabbit Oak #1141 failed
Key: OAK-7125
URL: https://issues.apache.org/jira/browse/OAK-7125
Project: Jackrabbit Oak
Issue Type: Bug
Components: continuous integration
Reporter: Hudson

No description is provided.

The build Jackrabbit Oak #1141 has failed.
First failed run: [Jackrabbit Oak #1141|https://builds.apache.org/job/Jackrabbit%20Oak/1141/] [console log|https://builds.apache.org/job/Jackrabbit%20Oak/1141/console]
[jira] [Resolved] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider
[ https://issues.apache.org/jira/browse/OAK-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra resolved OAK-7124.
----------------------------------
Resolution: Fixed

Done with 1820292

> Support MemoryNodeStore with NodeStoreFixtureProvider
> -----------------------------------------------------
>
> Key: OAK-7124
> URL: https://issues.apache.org/jira/browse/OAK-7124
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
> At times we need to use the oak-run console just to execute some script (like OAK-7122). Currently the oak-run console requires access to a working repository. To support such cases we should add support for using a MemoryNodeStore, so the following command can be used:
> {noformat}
> java -jar oak-run-*.jar console memory
> {noformat}
> The memory NodeStore can be used to play with the NodeStore API, or just to launch a Groovy script.
[jira] [Created] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider
Chetan Mehrotra created OAK-7124:
---------------------------------

Summary: Support MemoryNodeStore with NodeStoreFixtureProvider
Key: OAK-7124
URL: https://issues.apache.org/jira/browse/OAK-7124
Project: Jackrabbit Oak
Issue Type: Improvement
Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Fix For: 1.8, 1.7.15

At times we need to use the oak-run console just to execute some script (like OAK-7122). Currently the oak-run console requires access to a working repository. To support such cases we should add support for using a MemoryNodeStore, so the following command can be used:

{noformat}
java -jar oak-run-*.jar console memory
{noformat}

The memory NodeStore can be used to play with the NodeStore API, or just to launch a Groovy script.
[jira] [Resolved] (OAK-7122) Implement script to compare lucene indexes logically
[ https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra resolved OAK-7122.
----------------------------------
Resolution: Done

> Implement script to compare lucene indexes logically
> ----------------------------------------------------
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8
>
> With Document Traversal based indexing we have implemented newer indexing logic. To validate that the index it produces is the same as the one produced by the existing indexing flow, we need to implement a script which enables comparing the index content logically.
> This was recently discussed on the Lucene mailing list [1], and the suggestion there was that it can be done by un-inverting the index. To enable that, we need to implement a script which can:
> # Open a Lucene index
> # Map each Lucene Document to the path of its node
> # For each document, determine which fields are associated with it (stored and non-stored)
> # Dump this content to a file sorted by path, with the field names on each line sorted by name
> Then such dumps can be generated for the old and new indexes and compared via a simple text diff.
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55
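Step 4 of the outline above (a path-sorted, field-sorted text dump that makes two indexes diffable) can be sketched as follows. This assumes the index content has already been un-inverted into a path-to-fields map; that map representation, the `IndexDump` name, and the `path|field=value` line format are illustrative choices, not the actual script's format.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the diffable dump format described above: one line per
// (path, field) pair, paths sorted, field names sorted within each path.
// Two such dumps (old index vs. new index) can then be compared with a
// plain text diff.
public class IndexDump {

    public static String dump(Map<String, Map<String, String>> byPath) {
        StringBuilder sb = new StringBuilder();
        // TreeMap copies give the deterministic sorted order the format needs.
        for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(byPath).entrySet()) {
            for (Map.Entry<String, String> f : new TreeMap<>(e.getValue()).entrySet()) {
                sb.append(e.getKey()).append('|')
                  .append(f.getKey()).append('=')
                  .append(f.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

The sorting is what makes the comparison logical rather than physical: two indexes built in different document orders still produce byte-identical dumps if their content matches.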
[jira] [Resolved] (OAK-7123) ChildNodeStateProvider does not return all immediate children
[ https://issues.apache.org/jira/browse/OAK-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra resolved OAK-7123.
----------------------------------
Resolution: Fixed

Done with 1820278

> ChildNodeStateProvider does not return all immediate children
> -------------------------------------------------------------
>
> Key: OAK-7123
> URL: https://issues.apache.org/jira/browse/OAK-7123
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: run
> Affects Versions: 1.7.14
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
> Based on the script implemented in OAK-7122 and running it against a test index, it was observed that some of the relative fields were not getting indexed. This happens because ChildNodeStateProvider#children does not handle the immediate-children check properly. It would fail for a case like:
> {noformat}
> /a
> /a/b
> /a/b/c
> /a/d
> /a/d/e
> {noformat}
> Currently it would only report 'b' as a child of 'a'.
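A correct immediate-children check along the lines described in the issue can be sketched over a sorted path list. This is an illustrative stand-in for the actual ChildNodeStateProvider traversal: the key point is that deeper descendants (like /a/b/c) must be skipped, not treated as the end of the child list.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the immediate-children check discussed above, over a sorted
// list of node paths (illustrative, not the actual Oak code).
public class ImmediateChildren {

    public static List<String> childrenOf(String parent, List<String> sortedPaths) {
        List<String> children = new ArrayList<>();
        String prefix = parent.equals("/") ? "/" : parent + "/";
        for (String p : sortedPaths) {
            if (!p.startsWith(prefix) || p.equals(parent)) {
                continue; // outside the subtree, or the parent itself
            }
            String rest = p.substring(prefix.length());
            if (!rest.contains("/")) {
                children.add(rest); // depth is exactly parent depth + 1
            }
            // deeper descendants such as /a/b/c fall through here and are
            // skipped; the buggy behavior was to stop at the first of these,
            // which is why only 'b' was reported as a child of 'a'
        }
        return children;
    }
}
```

On the example from the issue this returns both 'b' and 'd' for '/a'.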
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:38 AM:
------------------------------------------------------------

[~tmueller] so adding the feature to aggregate the current rep:facet extraction from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment
2) even if that is the case, the disjunctive conjunctions are not mutually exclusive, leading to inaccurate results as well

1) can easily be fixed by converting the restrictions to NNF before doing the optimisation. 2) would also require deduplication between the Lucene result sets returned from each of the unions.

was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment
2) even if that is the case, the disjunctive conjunctions are not mutually exclusive, leading to inaccurate results as well

It would also require deduplication between the Lucene results returned from each of the unions.

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7
> Reporter: Dirk Rudolph
> Labels: facet
> Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch
>
> Complex queries in this case are queries which are passed to Lucene without all of the original constraints. For example, queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene even though the index supports evaluating path constraints.
> As counting the facets happens on the raw result of Lucene, the returned facets are incorrect. For example, having the following content:
> {code}
> /content1/test/foo
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content2/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content3/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents are equal. The actual result is
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.
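The counting difference described in the issue can be reproduced with a small sketch: facets must be counted over the result set after all constraints (here the path restriction) are applied, not over the raw fulltext matches. The `Doc` type and map-based counting below are illustrative stand-ins, not Oak's facet implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of correct facet counting: apply the path restriction that Lucene
// did not evaluate before counting tag occurrences.
public class FacetCount {

    // Illustrative stand-in for a matched document: its path plus the
    // values of the facet dimension (simple/tags).
    public static class Doc {
        final String path;
        final List<String> tags;
        public Doc(String path, List<String> tags) {
            this.path = path;
            this.tags = tags;
        }
    }

    public static Map<String, Integer> facets(List<Doc> matches, List<String> roots) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : matches) {
            boolean inScope = false;
            for (String r : roots) {
                if (d.path.startsWith(r + "/")) {
                    inScope = true;
                    break;
                }
            }
            if (!inScope) {
                continue; // excluded by the unpushed path constraint
            }
            for (String t : d.tags) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With the three example nodes and the roots /content1 and /content2, this yields tag1: 2 and tag2: 2, matching the expected result from the issue; counting without the path filter reproduces the incorrect tag1: 3, tag2: 3.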
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:37 AM:
------------------------------------------------------------

[~tmueller] so adding the feature to aggregate the current rep:facet extraction from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment
2) even if that is the case, the disjunctive conjunctions are not mutually exclusive, leading to inaccurate results as well

It would also require deduplication between the Lucene results returned from each of the unions.

was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment
2) even if that is the case, the disjunctive conjunctions are not mutually exclusive, leading to inaccurate results as well

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph commented on OAK-7109:
-----------------------------------

[~tmueller] so adding the feature to aggregate the current rep:facet extraction from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment
2) even if that is the case, the disjunctive conjunctions are not mutually exclusive, leading to inaccurate results as well

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
[jira] [Comment Edited] (OAK-7122) Implement script to compare lucene indexes logically
[ https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866 ]

Chetan Mehrotra edited comment on OAK-7122 at 1/5/18 10:24 AM:
---------------------------------------------------------------

Implemented the script at [1]. Currently it builds up the structure in memory. If this proves to be problematic for large indexes, we can look into building the structure on the file system.

*Usage*
{code}
java -DindexPath=/path/to/indexing-result/indexes/lucene/data \
  -jar oak-run-*.jar \
  console /path/to/segmentstore \
  ":load https://raw.githubusercontent.com/chetanmeh/oak-console-scripts/master/src/main/groovy/lucene/luceneIndexDumper.groovy"
{code}

[1] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

was (Author: chetanm):
Implemented the script at [1]. Currently it builds up the structure in memory. If this proves to be problematic for large indexes, we can look into building the structure on the file system.

[1] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> ----------------------------------------------------
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
[jira] [Created] (OAK-7123) ChildNodeStateProvider does not return all immediate children
Chetan Mehrotra created OAK-7123:
---------------------------------

Summary: ChildNodeStateProvider does not return all immediate children
Key: OAK-7123
URL: https://issues.apache.org/jira/browse/OAK-7123
Project: Jackrabbit Oak
Issue Type: Bug
Components: run
Affects Versions: 1.7.14
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Fix For: 1.8, 1.7.15

Based on the script implemented in OAK-7122 and running it against a test index, it was observed that some of the relative fields were not getting indexed. This happens because ChildNodeStateProvider#children does not handle the immediate-children check properly. It would fail for a case like:

{noformat}
/a
/a/b
/a/b/c
/a/d
/a/d/e
{noformat}

Currently it would only report 'b' as a child of 'a'.
[jira] [Commented] (OAK-7122) Implement script to compare lucene indexes logically
[ https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866 ]

Chetan Mehrotra commented on OAK-7122:
--------------------------------------

Implemented the script at [1]. Currently it builds up the structure in memory. If this proves to be problematic for large indexes, we can look into building the structure on the file system.

[1] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> ----------------------------------------------------
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312793#comment-16312793 ] Thomas Mueller commented on OAK-7109: - [~catholicon] OK, I see facets do not exactly match "group by" + "count". So, what if we add a feature to aggregate the data from a "select [rep:facet(...)] ... UNION select [rep:facet(...)] ..." query? I believe aggregating that data in the query engine should be possible, as the data format of the facet feature is known. >> What if Lucene doesn't index all the constraints? > fail such queries Sounds good to me. I believe right now, if a query uses "select [rep:facet(...)]", then only indexes that support facets are used. If there is no index that supports facets, then the query should fail with an exception (if that's not the case yet, we should probably add it). If the Lucene index doesn't support some of the conditions, then it shouldn't return an index plan. That should solve the problem with "union" queries as well. > rep:facet returns wrong results for complex queries > --- > > Key: OAK-7109 > URL: https://issues.apache.org/jira/browse/OAK-7109 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene > Affects Versions: 1.6.7 > Reporter: Dirk Rudolph > Labels: facet > Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch > > > Complex queries, in this context, are queries that are passed to Lucene without all of the original constraints. For example, queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a
> where contains(a.[*], 'ipsum')
> and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene even though the index supports evaluating path constraints. > As counting the facets happens on the raw result of Lucene, the returned facets are incorrect. 
For example, having the following content
> {code}
> /content1/test/foo
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content2/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content3/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual result is
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
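The miscount described above can be reproduced with a small sketch (illustrative code, not Oak's facet implementation): counting tags over the raw fulltext matches yields 3 per tag, while counting only over documents that also satisfy the path constraint yields the expected 2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the miscount: facet counts computed over the raw
// fulltext matches differ from counts computed over the matches that also
// satisfy the path constraint.
class FacetCount {
    // doc[0] = path, doc[1..] = tag values; allowedRoots == null means the
    // path constraint was not pushed down (the buggy behaviour).
    static Map<String, Integer> countTags(List<String[]> docs, List<String> allowedRoots) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] doc : docs) {
            String path = doc[0];
            boolean inScope = allowedRoots == null
                    || allowedRoots.stream().anyMatch(path::startsWith);
            if (!inScope) continue;
            for (int i = 1; i < doc.length; i++) {
                counts.merge(doc[i], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```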
[jira] [Comment Edited] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311966#comment-16311966 ] Julian Reschke edited comment on OAK-7121 at 1/5/18 9:32 AM: - trunk: [r1820199|http://svn.apache.org/r1820199] 1.6: [r1820220|http://svn.apache.org/r1820220] 1.4: [r1820264|http://svn.apache.org/r1820264] 1.2: [r1820268|http://svn.apache.org/r1820268] 1.0: [r1820271|http://svn.apache.org/r1820271] was (Author: reschke): trunk: [r1820199|http://svn.apache.org/r1820199] 1.6: [r1820220|http://svn.apache.org/r1820220] 1.4: [r1820264|http://svn.apache.org/r1820264] 1.2: [r1820268|http://svn.apache.org/r1820268] > DocumentStore testing: allow config of DocumentMK.Builder in > AbstractDocumentStoreTest > -- > > Key: OAK-7121 > URL: https://issues.apache.org/jira/browse/OAK-7121 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20, 1.0.41 > > Attachments: OAK-7121.diff > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-7121: Fix Version/s: 1.0.41 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-7121: Labels: (was: candidate_oak_1_0) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-7122) Implement script to compare lucene indexes logically
Chetan Mehrotra created OAK-7122: Summary: Implement script to compare lucene indexes logically Key: OAK-7122 URL: https://issues.apache.org/jira/browse/OAK-7122 Project: Jackrabbit Oak Issue Type: Task Components: run Reporter: Chetan Mehrotra Assignee: Chetan Mehrotra Fix For: 1.8 With Document Traversal based indexing we have implemented a newer indexing logic. To validate that the index produced by it is the same as the one produced by the existing indexing flow, we need to implement a script which enables comparing the index content logically. This was recently discussed on the lucene mailing list [1], and the suggestion there was that it can be done by un-inverting the index. So we need to implement a script which can: # Open a Lucene index # Map each Lucene Document to the path of its node # For each document, determine which fields are associated with it (stored and non-stored) # Dump this content in a file sorted by path, with field names on each line sorted by name Then such dumps can be generated for the old and new index and compared via a simple text diff. [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311966#comment-16311966 ] Julian Reschke edited comment on OAK-7121 at 1/5/18 8:35 AM: - trunk: [r1820199|http://svn.apache.org/r1820199] 1.6: [r1820220|http://svn.apache.org/r1820220] 1.4: [r1820264|http://svn.apache.org/r1820264] 1.2: [r1820268|http://svn.apache.org/r1820268] was (Author: reschke): trunk: [r1820199|http://svn.apache.org/r1820199] 1.6: [r1820220|http://svn.apache.org/r1820220] 1.4: [r1820264|http://svn.apache.org/r1820264] > DocumentStore testing: allow config of DocumentMK.Builder in > AbstractDocumentStoreTest > -- > > Key: OAK-7121 > URL: https://issues.apache.org/jira/browse/OAK-7121 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Labels: candidate_oak_1_0 > Fix For: 1.6.8, 1.8, 1.2.28, 1.7.15, 1.4.20 > > Attachments: OAK-7121.diff > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-7121: Labels: candidate_oak_1_0 (was: candidate_oak_1_0 candidate_oak_1_2) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-7121) DocumentStore testing: allow config of DocumentMK.Builder in AbstractDocumentStoreTest
[ https://issues.apache.org/jira/browse/OAK-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-7121: Fix Version/s: 1.2.28 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312691#comment-16312691 ] Dirk Rudolph commented on OAK-7109: --- {quote} I have a very pessimistic view that we should fail such queries - I mean it's better to fail and allow for right index def than giving incorrect results. {quote} +1 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687 ] Vikas Saurabh edited comment on OAK-7109 at 1/5/18 8:22 AM: {quote} (I know the "group by" and "count" are not currently supported by Oak). Or are there other aspects I missed? {quote} Indeed fundamentally that's what facets do - provide usually few (not 'all' unlike group by) properties and count according to how many documents match the query. Lucene's faceting support also does ranges although we don't support that yet - e.g. I could facet of "jcr:created" and the categories could turn out as "today", "within last week", etc (I'm not completely sure about the API... I'm just trying to illustrate that faceted categories can potentially be not-the-actually-stored-value). bq. What do you mean with "scoring"? The scoring part is entirely different issue unrelated to facets - e.g. we won't (can't??) correctly order documents matching queries such as {{ WHERE (CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is that we can't quite merge scores coming out of separate lucene queries) But, let's ignore the scoring for this issue. bq. What if Lucene doesn't index all the constraints? I have a very pessimistic view that we should fail such queries - I mean it's better to fail and allow for right index def than giving incorrect results. was (Author: catholicon): {quote} (I know the "group by" and "count" are not currently supported by Oak). Or are there other aspects I missed? {quote} Indeed fundamentally that's what facets do - provide usually few (not 'all' unlike group by) properties and count according to how many documents match the query. Lucene's faceting support also does ranges although we don't support that yet - e.g. 
I could facet of "jcr:created" and the categories could turn out as "today", "within last week", etc (I'm not completely sure about the API... I'm just trying to illustrate that faceted categories can potentially be not-the-actually-stored-value). bq. What do you mean with "scoring"? The scoring part is entirely different issue unrelated to facets - e.g. we correctly won't (can't??) order documents matching queries such as {{ WHERE (CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is that we can't quite merge scores coming out of separate lucene queries) But, let's ignore the scoring for this issue. bq. What if Lucene doesn't index all the constraints? I have a very pessimistic view that we should fail such queries - I mean it's better to fail and allow for right index def than giving incorrect results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687 ] Vikas Saurabh commented on OAK-7109: {quote} (I know the "group by" and "count" are not currently supported by Oak). Or are there other aspects I missed? {quote} Indeed fundamentally that's what facets do - provide usually few (not 'all' unlike group by) properties and count according to how many documents match the query. Lucene's faceting support also does ranges although we don't support that yet - e.g. I could facet of "jcr:created" and the categories could turn out as "today", "within last week", etc (I'm not completely sure about the API... I'm just trying to illustrate that faceted categories can potentially be not-the-actually-stored-value). bq. What do you mean with "scoring"? The scoring part is entirely different issue unrelated to facets - e.g. we correctly won't (can't??) order documents matching queries such as {{ WHERE (CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text' AND foo2='bar' AND foo3='bar')}} (foo=bar could be different fulltext clause too... the issue is that we can't quite merge scores coming out of separate lucene queries) But, let's ignore the scoring for this issue. bq. What if Lucene doesn't index all the constraints? I have a very pessimistic view that we should fail such queries - I mean it's better to fail and allow for right index def than giving incorrect results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312675#comment-16312675 ] Thomas Mueller commented on OAK-7109: - I don't fully know how facets work. Could you help me a bit with this, please? The query
{noformat}
select [rep:facet(simple/tags)] from [nt:base] as a
where contains(a.[*], 'ipsum')
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{noformat}
converted to "regular SQL" would be this, right?
{noformat}
select [simple/tags], count(*) from [nt:base] as a
where contains(a.[*], 'ipsum')
and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
group by [simple/tags]
{noformat}
(I know the "group by" and "count" are not currently supported by Oak.) Or are there other aspects I missed? What do you mean with "scoring"? If it's the same, then I guess we might want to support the "group by" and "count" features in Oak, or add custom logic to combine the results of
{noformat}
select [rep:facet(...)] ... UNION select [rep:facet(...)] ...
{noformat}
> passing all constraints to lucene What if Lucene doesn't index all the constraints? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
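Aggregating facet data across the branches of a UNION query, as proposed in this thread, would amount to summing the per-label counts returned by each branch. A minimal sketch (hypothetical code, assuming the branches match disjoint result sets; overlapping branches would need de-duplication first):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of combining rep:facet results from the branches of a
// UNION query: each branch yields label -> count pairs, so the combined
// facet is a per-label sum. Assumes disjoint branch result sets.
class FacetMerge {
    @SafeVarargs
    static Map<String, Integer> merge(Map<String, Integer>... branchFacets) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> facets : branchFacets) {
            // Map.merge sums counts for labels present in several branches
            facets.forEach((label, count) -> merged.merge(label, count, Integer::sum));
        }
        return merged;
    }
}
```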