[jira] [Updated] (OAK-7119) Restrict de-serialization mechanism for older serialized cache map in DataStoreCacheUtils to the classes required

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7119:

Labels: candidate_oak_1_6  (was: )

> Restrict de-serialization mechanism for older serialized cache map in 
> DataStoreCacheUtils to the classes required
> -
>
> Key: OAK-7119
> URL: https://issues.apache.org/jira/browse/OAK-7119
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob-plugins
>Reporter: Amit Jain
>Assignee: Amit Jain
>  Labels: candidate_oak_1_6
> Fix For: 1.8, 1.7.15
>
>
> We could use the class 
> https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/serialization/ValidatingObjectInputStream.html
>  to restrict de-serialization to the required classes and throw errors in 
> case of others.
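
The restriction proposed above can be sketched without any third-party dependency. The example below is a minimal, stdlib-only illustration of the same idea that ValidatingObjectInputStream provides: an ObjectInputStream subclass whose resolveClass rejects every class that is not on an accept list. The names RestrictedDeserialization and AllowListObjectInputStream, as well as the HashMap<String, String> payload, are assumptions made for illustration; this is not the code in DataStoreCacheUtils.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RestrictedDeserialization {

    /** ObjectInputStream that refuses every class whose name is not on the accept list. */
    static final class AllowListObjectInputStream extends ObjectInputStream {
        private final Set<String> accepted;

        AllowListObjectInputStream(InputStream in, String... acceptedClassNames) throws IOException {
            super(in);
            this.accepted = new HashSet<>(Arrays.asList(acceptedClassNames));
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            // Invoked for each class descriptor in the stream, before the class is loaded.
            if (!accepted.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class not accepted for de-serialization");
            }
            return super.resolveClass(desc);
        }
    }

    /** Serializes an object with plain Java serialization (stands in for the old cache file). */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Reads an object back, accepting only the given class names. */
    static Object deserialize(byte[] data, String... acceptedClassNames)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new AllowListObjectInputStream(new ByteArrayInputStream(data), acceptedClassNames)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> cacheMap = new HashMap<>();
        cacheMap.put("some-blob-id", "some-value"); // hypothetical cache entry
        byte[] data = serialize(cacheMap);

        // Accepted: only java.util.HashMap is needed for a String-to-String map.
        System.out.println(deserialize(data, "java.util.HashMap"));

        // Rejected: nothing is on the accept list.
        try {
            deserialize(data);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With commons-io on the classpath, the same effect would presumably be achieved by wrapping the input in a ValidatingObjectInputStream and configuring it through its accept(...) methods instead of hand-writing the subclass.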



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7101) Stale documents in RDBDocumentStore cache

2018-01-03 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309812#comment-16309812
 ] 

Julian Reschke commented on OAK-7101:
-

FWIW, this seems to be a race between the update operation and the query 
operation building the QueryContext.

Reducing the cache size to 64*1024 makes the problem occur more frequently.

> Stale documents in RDBDocumentStore cache
> -
>
> Key: OAK-7101
> URL: https://issues.apache.org/jira/browse/OAK-7101
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: rdbmk
>Affects Versions: 1.0, 1.4.0, 1.6.0, 1.2.0
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
> Fix For: 1.0.40, 1.4.19, 1.6.8, 1.8, 1.2.28, 1.7.15
>
> Attachments: OAK-7101.patch, query-lock.diff
>
>
> Concurrent query and update operations on RDBDocumentStore may result in 
> stale entries in the document cache.
> Potentially related issues are OAK-5387 and OAK-6062.





[jira] [Resolved] (OAK-7120) Use specific MongoDB version on travis-ci

2018-01-03 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger resolved OAK-7120.
---
Resolution: Fixed

> Use specific MongoDB version on travis-ci 
> --
>
> Key: OAK-7120
> URL: https://issues.apache.org/jira/browse/OAK-7120
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: mongomk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
> Fix For: 1.4.19, 1.6.8, 1.2.28
>
>
> As mentioned in OAK-5317 travis-ci sometimes bumps MongoDB versions on their 
> workers. The build file should therefore use a specific version (at least on 
> the branches).





[jira] [Comment Edited] (OAK-7120) Use specific MongoDB version on travis-ci

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309683#comment-16309683
 ] 

Marcel Reutegger edited comment on OAK-7120 at 1/3/18 3:39 PM:
---

I had to switch to {{sudo:required}} because container-based environments do 
not allow downgrading MongoDB.

Done in
- 1.6: http://svn.apache.org/r1819951 and http://svn.apache.org/r1819977
- 1.4: http://svn.apache.org/r1819978 and http://svn.apache.org/r1819983
- 1.2: http://svn.apache.org/r1819979 and http://svn.apache.org/r1819985


was (Author: mreutegg):
I had to switch to {{sudo:required}} because container-based environments do 
not allow downgrading MongoDB.

Done in
- 1.6: http://svn.apache.org/r1819951 and http://svn.apache.org/r1819977






[jira] [Comment Edited] (OAK-7120) Use specific MongoDB version on travis-ci

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309683#comment-16309683
 ] 

Marcel Reutegger edited comment on OAK-7120 at 1/3/18 3:27 PM:
---

I had to switch to {{sudo:required}} because container-based environments do 
not allow downgrading MongoDB.

Done in
- 1.6: http://svn.apache.org/r1819951 and http://svn.apache.org/r1819977


was (Author: mreutegg):
I had to switch to {{sudo:required}} because container-based environments do 
not allow downgrading MongoDB.

Done in
- 1.6: http://svn.apache.org/r1819951






[jira] [Commented] (OAK-7101) Stale documents in RDBDocumentStore cache

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309690#comment-16309690
 ] 

Marcel Reutegger commented on OAK-7101:
---

bq. backport the CacheChangesTracker

I'd say it depends on whether this is even possible without major conflicts 
and whether there is a simpler solution to this problem. For older branches 
like 1.2 and 1.0, my preference is a simple fix that keeps the risk of 
regressions low.






[jira] [Commented] (OAK-7120) Use specific MongoDB version on travis-ci

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309683#comment-16309683
 ] 

Marcel Reutegger commented on OAK-7120:
---

I had to switch to {{sudo:required}} because container-based environments do 
not allow downgrading MongoDB.

Done in
- 1.6: http://svn.apache.org/r1819951






[jira] [Updated] (OAK-7101) Stale documents in RDBDocumentStore cache

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7101:

Attachment: query-lock.diff

Tried locking at the point where the query puts entries into the cache, but that doesn't 
seem to help...






[jira] [Updated] (OAK-7071) PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts

2018-01-03 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7071:
--
Description: 
*PostingsHighlighter* returns for example 
{quote} 
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns for each field a {{String[]}} of 
phrases limited by the max phrases given beforehand. This {{String[]}} is then 
transformed to a String using {{Arrays.toString()}} at 
[LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688]
 causing the value to be wrapped in square brackets.

*Highlighter* returns 
{quote}
my text with any highlighting followed by more text 
{quote}

*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more 
text
{quote}

As the PostingsHighlighter cannot get any custom prefix or suffix, I would 
suggest set  as default for the others as well to prevent any further 
text transformation post extracting the excerpts.
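
The square brackets mentioned above come from the standard behavior of Arrays.toString(). The following is a minimal sketch of that effect next to a bracket-free alternative; the class name ExcerptFormatting and the " ... " separator are assumptions for illustration only and are not taken from LucenePropertyIndex.

```java
import java.util.Arrays;

final class ExcerptFormatting {

    /** Mirrors the conversion described in the issue: the array is rendered with brackets. */
    static String viaArraysToString(String[] phrases) {
        return Arrays.toString(phrases);
    }

    /** One possible bracket-free rendering; the separator is an arbitrary choice. */
    static String viaJoin(String[] phrases) {
        return String.join(" ... ", phrases);
    }

    public static void main(String[] args) {
        String[] phrases = { "my text with any highlighting followed by more text" };
        System.out.println(viaArraysToString(phrases)); // wrapped in [ and ]
        System.out.println(viaJoin(phrases));           // the phrase itself, no brackets
    }
}
```

Whether joining the phrases is the right fix for Oak is a separate question; the sketch only shows where the brackets originate.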


  was:
*PostingsHighlighter* returns for example 
{quote} 
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns for each field a {{String[]}} of 
phrases limited by the max phrases given beforehand. This {{String[]}} is then 
transformed to a String using {{Arrays.toString()}} at 
[LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688]
 causing the value to be wrapped in square brackets.

*Highlighter* returns 
{quote}
my text with any highlighting followed by more text 
{quote}

*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more text 
{quote}

As the PostingsHighlighter cannot get any custom prefix or suffix, I would 
suggest set  as default for the others as well to prevent any further 
text transformation post extracting the excerpts.



> PostingsHighlighter, Highlighter and SimpleExcerptProvider return all 
> different formats for excerpts
> 
>
> Key: OAK-7071
> URL: https://issues.apache.org/jira/browse/OAK-7071
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>  Labels: excerpt
>
> *PostingsHighlighter* returns for example 
> {quote} 
> [my text with any highlighting followed by more text]
> {quote}
> because the PostingsHighlighter itself returns for each field a {{String[]}} 
> of phrases limited by the max phrases given beforehand. This {{String[]}} is then 
> transformed to a String using {{Arrays.toString()}} at 
> [LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688]
>  causing the value to be wrapped in square brackets.
> *Highlighter* returns 
> {quote}
> my text with any highlighting followed by more text 
> {quote}
> *SimpleExcerptProvider* returns
> {quote}
> my text with any highlighting followed by more 
> text
> {quote}
> As the PostingsHighlighter cannot get any custom prefix or suffix, I would 
> suggest set  as default for the others as well to prevent any further 
> text transformation post extracting the excerpts.





[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:47 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting/XOR à la "if a is set to true, b has 
to be in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])

Edit: I think there are 2 issues here: 
1) the OR of the query with both statements 
2) the not with the query containing only the second disjunctive statement. 
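
The loss of constraints described in this comment can be illustrated with a small sketch. It compares the original WHERE clause with the filter that shows up in the plan (propa is not null) on four sample nodes. The class and method names are hypothetical, and the sketch assumes that both properties are set on every node; it does not model how Oak builds its filters.

```java
import java.util.Arrays;
import java.util.List;

final class LostConstraintSketch {

    // The configured set from the example query: [propb] in('foo','bar').
    static final List<String> CONFIGURED = Arrays.asList("foo", "bar");

    /** The full constraint of the example query, evaluated on one node. */
    static boolean originalConstraint(String propa, String propb) {
        boolean inSet = CONFIGURED.contains(propb);
        return ("true".equals(propa) && inSet) || ("false".equals(propa) && !inSet);
    }

    /** What remains in the plan: only "[propa] is not null". */
    static boolean planFilter(String propa) {
        return propa != null;
    }

    public static void main(String[] args) {
        String[][] nodes = {
            { "true", "foo" }, { "true", "baz" }, { "false", "foo" }, { "false", "baz" }
        };
        for (String[] node : nodes) {
            System.out.println("propa=" + node[0] + ", propb=" + node[1]
                    + " original=" + originalConstraint(node[0], node[1])
                    + " plan=" + planFilter(node[0]));
        }
    }
}
```

Two of the four nodes satisfy the original constraint, while all four pass the reduced filter, which is why anything counted on the index result without re-checking propb comes out too high.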


was (Author: diru):
Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting/XOR à la "if a is set to true, b has 
to be in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> expands exponentially, it's only a theoretical solution, nothing for 
> production. 
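
The miscount described above can be reproduced with a small model of the sample content. The sketch below counts tag facets once over every fulltext hit (what the issue says happens on the raw Lucene result) and once after applying the path restriction from the query. The class FacetCountSketch and its helpers are hypothetical and only model the arithmetic, not Oak's facet implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

final class FacetCountSketch {

    /** A node that matched the fulltext term, with its simple/tags values. */
    static final class Doc {
        final String path;
        final List<String> tags;

        Doc(String path, String... tags) {
            this.path = path;
            this.tags = Arrays.asList(tags);
        }
    }

    /** The three nodes from the issue description; all contain "ipsum". */
    static List<Doc> sampleContent() {
        return Arrays.asList(
                new Doc("/content1/test/foo", "tag1", "tag2"),
                new Doc("/content2/test/bar", "tag1", "tag2"),
                new Doc("/content3/test/bar", "tag1", "tag2"));
    }

    /** isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'). */
    static boolean matchesPathRestriction(Doc doc) {
        return doc.path.startsWith("/content1/") || doc.path.startsWith("/content2/");
    }

    /** Counts each tag over the documents accepted by the given filter. */
    static Map<String, Integer> countTags(List<Doc> docs, Predicate<Doc> accept) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            if (accept.test(doc)) {
                for (String tag : doc.tags) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> hits = sampleContent();
        // Facets counted on the raw fulltext hits, path restriction ignored.
        System.out.println("raw:      " + countTags(hits, doc -> true));
        // Facets counted only on hits that satisfy the path restriction.
        System.out.println("filtered: " + countTags(hits, FacetCountSketch::matchesPathRestriction));
    }
}
```

The raw count yields 3 per tag and the filtered count yields 2 per tag, matching the actual and expected results listed in the description.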





[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:14 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting/XOR à la "if a is set to true, b has 
to be in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])


was (Author: diru):
Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])






[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])


was (Author: diru):
Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example (restrictionPropagationTest.patch)






[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example (restrictionPropagationTest.patch)


was (Author: diru):
Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example.






[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:12 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.

As you can see, the query is in DNF, and querying with each of its disjunctive 
statements individually works well. I attached a unit test showing it for this 
specific example.


was (Author: diru):
Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting à la "if a is set to true, b has to be 
in a configured set; if not, b must not be in the configured set." It 
evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The resulting plan doesn't contain any restriction on propb, so in that case facet counting will be 
wrong as well.








[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Attachment: restrictionPropagationTest.patch

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to Lucene 
> without all of their original constraints. For example, queries with multiple 
> path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene 
> even though the index supports evaluating path constraints. 
> As facet counting happens on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents carry the same tags. 
> The actual result is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjuncts. As this expands 
> exponentially, it's only a theoretical solution, nothing for production. 





[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309559#comment-16309559
 ] 

Dirk Rudolph commented on OAK-7109:
---

Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting along the lines of "if a is set to 
true, b must be in a configured set; otherwise, b must not be in the configured 
set." It evaluates to the following plan: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan contains nothing about propb, so facet counting will be wrong in that 
case as well.



> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in this context are queries that are passed to Lucene 
> without all of their original constraints. For example, queries with multiple 
> path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene 
> even though the index supports evaluating path constraints. 
> As facet counting happens on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents carry the same tags. 
> The actual result is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjuncts. As this expands 
> exponentially, it's only a theoretical solution, nothing for production. 





[jira] [Commented] (OAK-7119) Restrict de-serialization mechanism for older serialized cache map in DataStoreCacheUtils to the classes required

2018-01-03 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309505#comment-16309505
 ] 

Amit Jain commented on OAK-7119:


Changed the deserialization code to use the above commons-io wrapper with 
http://svn.apache.org/viewvc?rev=1819950=rev
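
For readers unfamiliar with the technique, the idea of restricting deserialization to an allow-list can be sketched with the JDK alone. This is not the committed change: the fix uses the commons-io wrapper, whereas the sketch below swaps in a hand-written `ObjectInputStream` subclass that overrides `resolveClass`. The class names, the `roundTrip` helper and the assumption that the cache map is a `HashMap<String, String>` are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class RestrictedDeserialization {

    /** ObjectInputStream that refuses to resolve classes outside an allow-list. */
    static final class AllowListInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowListInputStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "not on the allow-list");
            }
            return super.resolveClass(desc);
        }
    }

    /** Serializes the value and reads it back through the restricted stream. */
    static Object roundTrip(Object value, Set<String> allowed) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new AllowListInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), allowed)) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = Collections.singleton("java.util.HashMap");
        Map<String, String> cache = new HashMap<>();          // assumed shape of the map
        cache.put("some-id", "some-value");
        System.out.println(roundTrip(cache, allowed));        // read back normally
        try {
            roundTrip(new ArrayList<String>(), allowed);      // ArrayList is not allowed
        } catch (UncheckedIOException e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
    }
}
```

The design point is the same as in the issue description: anything outside the small set of expected classes fails fast with an error instead of being instantiated.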

> Restrict de-serialization mechanism for older serialized cache map in 
> DataStoreCacheUtils to the classes required
> -
>
> Key: OAK-7119
> URL: https://issues.apache.org/jira/browse/OAK-7119
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob-plugins
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.8, 1.7.15
>
>
> We could use the class 
> https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/serialization/ValidatingObjectInputStream.html
>  to restrict de-serialization to the required classes and throw errors in 
> case of others.





[jira] [Commented] (OAK-7101) Stale documents in RDBDocumentStore cache

2018-01-03 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309463#comment-16309463
 ] 

Julian Reschke commented on OAK-7101:
-

Thanks for the notification, [~mreutegg]. I'll try to have it fail here as well.

Which brings me back to the question whether we should try to backport the 
CacheChangesTracker?

> Stale documents in RDBDocumentStore cache
> -
>
> Key: OAK-7101
> URL: https://issues.apache.org/jira/browse/OAK-7101
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: rdbmk
>Affects Versions: 1.0, 1.4.0, 1.6.0, 1.2.0
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
> Fix For: 1.0.40, 1.4.19, 1.6.8, 1.8, 1.2.28, 1.7.15
>
> Attachments: OAK-7101.patch
>
>
> Concurrent query and update operations on RDBDocumentStore may result in 
> stale entries in the document cache.
> Potentially related issues are OAK-5387 and OAK-6062.





[jira] [Commented] (OAK-7101) Stale documents in RDBDocumentStore cache

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309440#comment-16309440
 ] 

Marcel Reutegger commented on OAK-7101:
---

Please note, the test does not always fail, thus you may have to run it in a 
loop to gain confidence that a fix is good.

A recent 1.2 build on travis showed a failed 
[RDBCacheConsistency2Test|https://travis-ci.org/apache/jackrabbit-oak/builds/322857354].

> Stale documents in RDBDocumentStore cache
> -
>
> Key: OAK-7101
> URL: https://issues.apache.org/jira/browse/OAK-7101
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: rdbmk
>Affects Versions: 1.0, 1.4.0, 1.6.0, 1.2.0
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
> Fix For: 1.0.40, 1.4.19, 1.6.8, 1.8, 1.2.28, 1.7.15
>
> Attachments: OAK-7101.patch
>
>
> Concurrent query and update operations on RDBDocumentStore may result in 
> stale entries in the document cache.
> Potentially related issues are OAK-5387 and OAK-6062.





[jira] [Updated] (OAK-5317) MongoBlobStore creates _id index unnecessarily

2018-01-03 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger updated OAK-5317:
--
Fix Version/s: 1.2.28
   1.4.19
   1.0.40

> MongoBlobStore creates _id index unnecessarily
> --
>
> Key: OAK-5317
> URL: https://issues.apache.org/jira/browse/OAK-5317
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
> Fix For: 1.5.16, 1.6.0, 1.0.40, 1.4.19, 1.2.28
>
>
> In MongoDB each collection automatically has an index on the _id field. 
> MongoBlobStore therefore does not need to create that index.





[jira] [Created] (OAK-7120) Use specific MongoDB version on travis-ci

2018-01-03 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-7120:
-

 Summary: Use specific MongoDB version on travis-ci 
 Key: OAK-7120
 URL: https://issues.apache.org/jira/browse/OAK-7120
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Priority: Minor
 Fix For: 1.4.19, 1.6.8, 1.2.28


As mentioned in OAK-5317 travis-ci sometimes bumps MongoDB versions on their 
workers. The build file should therefore use a specific version (at least on 
the branches).





[jira] [Commented] (OAK-5317) MongoBlobStore creates _id index unnecessarily

2018-01-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309423#comment-16309423
 ] 

Marcel Reutegger commented on OAK-5317:
---

bq. should we not fix .travis.yml 1.4 and 1.2 to match the supported mongo 
versions?

Agreed. Created OAK-7120.

> MongoBlobStore creates _id index unnecessarily
> --
>
> Key: OAK-5317
> URL: https://issues.apache.org/jira/browse/OAK-5317
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
> Fix For: 1.5.16, 1.6.0
>
>
> In MongoDB each collection automatically has an index on the _id field. 
> MongoBlobStore therefore does not need to create that index.





[jira] [Commented] (OAK-7116) Use JMX mode to reindex on SegmentNodeStore without requiring manual steps

2018-01-03 Thread Daniel Hasler (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309406#comment-16309406
 ] 

Daniel Hasler commented on OAK-7116:


[~chetanm] thanks! +1 for a prototype to achieve automation friendliness for 
oak-run reindexing on Tar.

> Use JMX mode to reindex on SegmentNodeStore without requiring manual steps
> --
>
> Key: OAK-7116
> URL: https://issues.apache.org/jira/browse/OAK-7116
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
>
> oak-run indexing for SegmentNodeStore currently requires the following steps 
> when indexing against a repository which is in use [1]
> # Create checkpoint via MBean and pass it as part of cli args
> # Perform actual indexing with read only access to repo
> # Import the index via MBean operation 
> As per the currently documented steps, #1 and #3 are manual. This can 
> potentially be simplified by using JMX operations directly from within 
> oak-run, as oak-run currently needs to run on the same host as the actual 
> application anyway in order to access the SegmentNodeStore
> *JMX Access*
> JMX access can be established in the following ways
> # Application using Oak has JMX remote 
> ## Enabled and same info provided as cli args
> ## Not enabled - In such a case we can rely on 
> ### pid and [local 
> connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
>  or [attach|https://github.com/nickman/jmxlocal]
> ### Or query JMX of all running Java processes and check whether the 
> SegmentNodeStore repo path is the same as the one provided in cli
> # Application using Oak
> *Proposed Approach*
> # Establish the JMX connection
> # Create checkpoint using the JMX connection programmatically
> # Perform indexing with read only access
> # Import the index via JMX access
> With this, indexing support for SegmentNodeStore would be on par with 
> DocumentNodeStore in terms of ease of use
> [1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html
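
As a rough illustration of the "use the JMX connection programmatically" step, the sketch below shows only the generic mechanics of reading an attribute and invoking an operation through an `MBeanServerConnection`. It makes no assumption about the Oak MBean names or operations, which are not given in this thread. As a placeholder it talks to the JDK's own `java.lang:type=Runtime` and `java.lang:type=Memory` MBeans over an in-process connection; a tool such as oak-run would have to obtain its connection to the other process in one of the ways listed above. The class and helper names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxInvokeSketch {

    /** Reads one attribute of the MBean registered under the given name. */
    static Object readAttribute(MBeanServerConnection conn, String name, String attribute) {
        try {
            return conn.getAttribute(new ObjectName(name), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("JMX read failed for " + name, e);
        }
    }

    /** Invokes a no-argument operation on the MBean registered under the given name. */
    static Object invokeNoArg(MBeanServerConnection conn, String name, String operation) {
        try {
            return conn.invoke(new ObjectName(name), operation, new Object[0], new String[0]);
        } catch (Exception e) {
            throw new IllegalStateException("JMX invoke failed for " + name, e);
        }
    }

    public static void main(String[] args) {
        // In-process connection, used here only so that the sketch is runnable.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println("runtime name: "
                + readAttribute(conn, "java.lang:type=Runtime", "Name"));
        // Placeholder operation from the JDK's Memory MXBean, not an Oak operation.
        invokeNoArg(conn, "java.lang:type=Memory", "gc");
    }
}
```

The checkpoint and import steps would follow the same pattern, with the object name and operation of the respective Oak MBean substituted for the placeholders.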





[jira] [Created] (OAK-7119) Restrict de-serialization mechanism for older serialized cache map in DataStoreCacheUtils to the classes required

2018-01-03 Thread Amit Jain (JIRA)
Amit Jain created OAK-7119:
--

 Summary: Restrict de-serialization mechanism for older serialized 
cache map in DataStoreCacheUtils to the classes required
 Key: OAK-7119
 URL: https://issues.apache.org/jira/browse/OAK-7119
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: blob-plugins
Reporter: Amit Jain
Assignee: Amit Jain
 Fix For: 1.8, 1.7.15


We could use the class 
https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/serialization/ValidatingObjectInputStream.html
 to restrict de-serialization to the required classes and throw errors in case 
of others.





[jira] [Resolved] (OAK-7115) Store NodeState json in bytes when storing in in-memory queue

2018-01-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7115.
--
   Resolution: Fixed
Fix Version/s: (was: 1.10)
   1.7.15
   1.8

Done with 1819936. It just stores the JSON as bytes but does not perform any 
compression. With this change the dumping time for 65M node states dropped from 
2.632h to 2.230h, i.e. a saving of 24 mins!
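
The idea of keeping the JSON as bytes rather than as text can be sketched as follows. This is not the committed code; the class and method names are hypothetical, and the choice of UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;

final class JsonBytesSketch {

    /** Encodes the JSON text once, before it is put into the in-memory queue. */
    static byte[] toBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the bytes again when the entry is taken out of the queue. */
    static String toJson(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"jcr:primaryType\":\"nt:base\",\"text\":\"lorem ipsum\"}";
        byte[] stored = toBytes(json);
        System.out.println(json.length() + " chars stored as " + stored.length + " bytes");
        System.out.println(toJson(stored).equals(json));
    }
}
```

For ASCII-only JSON the byte array holds one byte per character. How much that saves compared to holding a String depends on the JVM's internal string representation.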

> Store NodeState json in bytes when storing in in-memory queue
> -
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.15
>
> Attachments: OAK-7115-v1.patch
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as JSON text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow keeping more entries in memory before 
> sorting and saving them to the file





[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309376#comment-16309376
 ] 

Dirk Rudolph commented on OAK-7109:
---

Hi [~catholicon] somehow the mail agent doesn't accept my mailings to oak-dev 
(I'm subscribed and receiving mail but sending doesn't work ... anyway).

I checked the implementation of the optimisation and its result is not in DNF, 
as the optimisation is not done on the negation normal form of the query (so 
not(a or b) is not properly expanded to not(a) and not(b)). For example (based 
on org.apache.jackrabbit.oak.query.SQL2OptimiseQueryTest#optimiseAndOrAnd()):

{code}
given    ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6
     <=> ([a]=1 or [b]=2 or ([c]=3 and [d]<>4 and [e]<>5)) and [x]=6
expected ([a]=1 and [x]=6), ([b]=2 and [x]=6), ([c]=3 and [d]<>4 and [e]<>5 and [x]=6)
actual   ((c = 3) and (not ((d = 4) or (e = 5)))) and (x = 6), (b = 2) and (x = 6), (a = 1) and (x = 6)
{code}
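
The missing step is the conversion to negation normal form, where negations are pushed down to the atoms with De Morgan's laws before the expression is distributed into a DNF. Here is a minimal sketch of that step on a toy expression tree. None of these types exist in Oak; all names are hypothetical. The negated atoms are kept as not(...) rather than rewritten to <> as in the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class NnfSketch {
    interface Expr {}
    static final class Atom implements Expr { final String text; Atom(String t) { text = t; } }
    static final class Not implements Expr { final Expr arg; Not(Expr a) { arg = a; } }
    static final class And implements Expr { final List<Expr> args; And(List<Expr> a) { args = a; } }
    static final class Or implements Expr { final List<Expr> args; Or(List<Expr> a) { args = a; } }

    static Expr atom(String text) { return new Atom(text); }
    static Expr not(Expr e) { return new Not(e); }
    static Expr and(Expr... es) { return new And(Arrays.asList(es)); }
    static Expr or(Expr... es) { return new Or(Arrays.asList(es)); }

    /** Pushes negations down to the atoms using De Morgan's laws. */
    static Expr nnf(Expr e) {
        if (e instanceof Not) {
            Expr a = ((Not) e).arg;
            if (a instanceof Not) return nnf(((Not) a).arg);                 // not(not(x)) = x
            if (a instanceof And) return new Or(map(((And) a).args, x -> nnf(new Not(x))));
            if (a instanceof Or) return new And(map(((Or) a).args, x -> nnf(new Not(x))));
            return e;                                                        // negated atom
        }
        if (e instanceof And) return new And(map(((And) e).args, NnfSketch::nnf));
        if (e instanceof Or) return new Or(map(((Or) e).args, NnfSketch::nnf));
        return e;                                                            // plain atom
    }

    private static List<Expr> map(List<Expr> in, Function<Expr, Expr> f) {
        return in.stream().map(f).collect(Collectors.toList());
    }

    /** Renders the tree with explicit parentheses. */
    static String show(Expr e) {
        if (e instanceof Atom) return ((Atom) e).text;
        if (e instanceof Not) return "not(" + show(((Not) e).arg) + ")";
        if (e instanceof And) {
            return ((And) e).args.stream().map(NnfSketch::show)
                    .collect(Collectors.joining(" and ", "(", ")"));
        }
        return ((Or) e).args.stream().map(NnfSketch::show)
                .collect(Collectors.joining(" or ", "(", ")"));
    }

    public static void main(String[] args) {
        Expr e = and(atom("[c]=3"), not(or(atom("[d]=4"), atom("[e]=5"))));
        System.out.println(show(e));
        System.out.println(show(nnf(e)));
    }
}
```

Only after this step does every disjunct of the DNF consist purely of (possibly negated) atoms, which is what the expected result above assumes.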

And even assuming the alternatives were in DNF and facet counting across unions 
were supported by merging the results of each of the queries given to Lucene, 
the result would still be wrong, as the disjuncts are not mutually exclusive 
(as they would be with xor). So from my perspective there is no way to get 
proper facet counts in that case from the consumer side, and only the options of 

b) filtering the documents based on the filter 
c) passing all constraints to Lucene

would work. 
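
A tiny example of why merging per-disjunct counts overcounts, using made-up data and hypothetical names (no Oak or Lucene API involved): a document that satisfies both disjuncts is counted once per disjunct when the counts are added, but only once in the actual union.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class OverlapOvercount {
    // Hypothetical document with two properties a and b.
    static final class Doc {
        final int a;
        final int b;
        Doc(int a, int b) { this.a = a; this.b = b; }
    }

    static final List<Doc> DOCS = Arrays.asList(
            new Doc(1, 0),   // matches only a = 1
            new Doc(1, 2),   // matches both disjuncts
            new Doc(0, 2));  // matches only b = 2

    static final Predicate<Doc> LEFT = d -> d.a == 1;
    static final Predicate<Doc> RIGHT = d -> d.b == 2;

    static long count(Predicate<Doc> p) {
        return DOCS.stream().filter(p).count();
    }

    /** Adds up the counts of one query per disjunct. */
    static long summedPerDisjunct() {
        return count(LEFT) + count(RIGHT);
    }

    /** Counts the documents matching the original "a = 1 or b = 2" condition. */
    static long unionCount() {
        return count(LEFT.or(RIGHT));
    }

    public static void main(String[] args) {
        System.out.println("sum of per-disjunct counts: " + summedPerDisjunct());
        System.out.println("count on the union:         " + unionCount());
    }
}
```

Here the summed count is 4 while the union contains 3 documents, so the merged facet counts would be off by exactly the overlap.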

Regarding b): from what I can see in the code base, the nodes are not actually 
read; only the permissions on their paths are checked in 
[FilteredSortedSetDocValuesFacetCounts.java#L91|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L91]

I will check further why our specific query doesn't get passed to Lucene 
entirely (or rather, which constraints are not taken into account besides the 
path constraints). Anyway, as a user of the JCR API I would expect a 
RepositoryException (or any other exception) when I try to run a query with 
facet extraction that no index can provide, similar to the exception I get when 
the field I extract facets on is not stored. 


> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in this context are queries that are passed to Lucene 
> without all of their original constraints. For example, queries with multiple 
> path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene 
> even though the index supports evaluating path constraints. 
> As facet counting happens on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>+ tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents carry the same tags. 
> The actual result is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjuncts. As this expands 
> exponentially, it's only a theoretical solution, nothing for production. 





[jira] [Updated] (OAK-7115) Store NodeState json in bytes when storing in in-memory queue

2018-01-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7115:
-
Summary: Store NodeState json in bytes when storing in in-memory queue  
(was: Compress NodeStateEntry when storing in in-memory queue)

> Store NodeState json in bytes when storing in in-memory queue
> -
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-7115-v1.patch
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as JSON text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow keeping more entries in memory before 
> sorting and saving them to the file





[jira] [Updated] (OAK-7117) Suppress Tika startup warnings

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7117:

Attachment: OAK-7117.diff

with test failures, needs more work

> Suppress Tika startup warnings
> --
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: lucene
>Affects Versions: 1.8
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7117.diff
>
>






[jira] [Updated] (OAK-7117) Suppress Tika startup warnings

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7117:

Attachment: (was: OAK-7101.diff)

> Suppress Tika startup warnings
> --
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: lucene
>Affects Versions: 1.8
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>






[jira] [Issue Comment Deleted] (OAK-7117) Suppress Tika startup warnings

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7117:

Comment: was deleted

(was: (still test failures, needs work))

> Suppress Tika startup warnings
> --
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: lucene
>Affects Versions: 1.8
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>






[jira] [Updated] (OAK-7117) Suppress Tika startup warnings

2018-01-03 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7117:

Attachment: OAK-7101.diff

(still test failures, needs work)

> Suppress Tika startup warnings
> --
>
> Key: OAK-7117
> URL: https://issues.apache.org/jira/browse/OAK-7117
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: lucene
>Affects Versions: 1.8
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>



