[jira] [Commented] (OAK-10643) MongoDocumentStore: improve diagnostics for too large docs

2024-02-14 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817488#comment-17817488
 ] 

Julian Reschke commented on OAK-10643:
--

trunk: 
[66b8bef296|https://github.com/apache/jackrabbit-oak/commit/66b8bef296b132e821a26b2486cfa5339393395b]

> MongoDocumentStore: improve diagnostics for too large docs
> --
>
> Key: OAK-10643
> URL: https://issues.apache.org/jira/browse/OAK-10643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> Log or add to exception message (or both):
> - attempted UpdateOp
> - statistics about the document that was too large to be updated (that would 
> require a read from Mongo)
> Later on, we may want to extend this so that higher layers 
> (DocumentNodeStore) can try some kind of recovery.
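
For illustration only, a minimal Java sketch of the kind of diagnostic message the description asks for: the attempted update plus size statistics about the stored document. OversizeDiagnostics, propertySizes and describe are hypothetical names, not Oak API, and the document is modelled as a plain map of string properties, which is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Oak: builds a diagnostic message for a
// document that was rejected as too large.
final class OversizeDiagnostics {

    // MongoDB's maximum BSON document size (16 MiB).
    static final int MAX_DOC_BYTES = 16 * 1024 * 1024;

    // Rough per-property estimate: UTF-8 bytes of the name plus the value.
    static Map<String, Integer> propertySizes(Map<String, String> doc) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            int bytes = e.getKey().getBytes(StandardCharsets.UTF_8).length
                    + e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sizes.put(e.getKey(), bytes);
        }
        return sizes;
    }

    // Combines the attempted update with statistics about the stored document.
    static String describe(String id, String attemptedUpdate, Map<String, String> doc) {
        int total = 0;
        String largest = "(none)";
        int largestBytes = 0;
        for (Map.Entry<String, Integer> e : propertySizes(doc).entrySet()) {
            total += e.getValue();
            if (e.getValue() > largestBytes) {
                largest = e.getKey();
                largestBytes = e.getValue();
            }
        }
        return String.format(Locale.ROOT,
                "document '%s' too large (limit=%d bytes); attemptedUpdate=%s; "
                        + "properties=%d; total=%d bytes; largest=%s (%d bytes)",
                id, MAX_DOC_BYTES, attemptedUpdate, doc.size(), total, largest, largestBytes);
    }
}
```

The same string could go into the log, the exception message, or both. Gathering the statistics requires reading the existing document first, as the description notes.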



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-10643) MongoDocumentStore: improve diagnostics for too large docs

2024-02-14 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10643:
-
Labels: candidate_oak_1_22  (was: )

> MongoDocumentStore: improve diagnostics for too large docs
> --
>
> Key: OAK-10643
> URL: https://issues.apache.org/jira/browse/OAK-10643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> Log or add to exception message (or both):
> - attempted UpdateOp
> - statistics about the document that was too large to be updated (that would 
> require a read from Mongo)
> Later on, we may want to extend this so that higher layers 
> (DocumentNodeStore) can try some kind of recovery.





[jira] [Resolved] (OAK-10643) MongoDocumentStore: improve diagnostics for too large docs

2024-02-14 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-10643.
--
Fix Version/s: 1.62.0
   Resolution: Fixed

> MongoDocumentStore: improve diagnostics for too large docs
> --
>
> Key: OAK-10643
> URL: https://issues.apache.org/jira/browse/OAK-10643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
> Fix For: 1.62.0
>
>
> Log or add to exception message (or both):
> - attempted UpdateOp
> - statistics about the document that was too large to be updated (that would 
> require a read from Mongo)
> Later on, we may want to extend this so that higher layers 
> (DocumentNodeStore) can try some kind of recovery.





[jira] [Resolved] (OAK-10281) Introduce recoveryDelay to ClusterNodeInfo.isRecoveryNeeded

2024-02-14 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved OAK-10281.
---
Resolution: Fixed

* merged [https://github.com/apache/jackrabbit-oak/pull/1288]
 * created OAK-10651 to look into improvements

> Introduce recoveryDelay to ClusterNodeInfo.isRecoveryNeeded
> ---
>
> Key: OAK-10281
> URL: https://issues.apache.org/jira/browse/OAK-10281
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: 1.62.0
>
>
> Oak instances periodically update their leases to signal to peers in the 
> cluster that they are still alive. A lease that has timed out is hence taken 
> as indication that the corresponding oak instance has crashed (and not 
> released the lease). It is also assumed that the corresponding, crashing oak 
> instance does not do any further write operations after the lease timeout - 
> as it would otherwise have been alive and updated its lease, which it did 
> not.
> As already reported elsewhere (eg OAK-10254) there is a case where indeed 
> writes happen later than the lease timeout (aka "late writes"): a writing 
> thread could go past the lease check, then a stop-the-world (eg high JVM 
> GC) could halt the thread for more than the lease timeout (eg 2min), and upon 
> continuation that writing thread could then send the write operation to the 
> DocumentStore.
> One way to mitigate this late-write risk is to delay the recovery. Ie wait 
> with doing the LastRevRecovery for eg 10min after a lease failure. That 
> includes putting the state of the clusterNode back into inactive.
> This ticket is about introducing such a recoveryDelay config parameter.
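
For illustration only, a minimal Java sketch of the delayed-recovery check described above. It is a simplification under assumptions: RecoveryDelaySketch and its parameters are hypothetical, and the real ClusterNodeInfo.isRecoveryNeeded may take more state into account than the lease end time.

```java
// Hypothetical, simplified model of a recovery check with a delay.
final class RecoveryDelaySketch {

    // Recovery is considered needed only once the lease has expired AND the
    // additional recovery delay has elapsed as well. A delay of 0 gives the
    // plain "lease has timed out" check.
    static boolean isRecoveryNeeded(long nowMillis, long leaseEndMillis, long recoveryDelayMillis) {
        return nowMillis > leaseEndMillis + recoveryDelayMillis;
    }
}
```

With a delay of 10 minutes, a write that arrives up to 10 minutes after the lease timeout would still land before recovery starts, which is the late-write window the ticket wants to cover.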





[jira] [Created] (OAK-10651) Improve ClusterNodeInfo.recoveryDelayMillis

2024-02-14 Thread Stefan Egli (Jira)
Stefan Egli created OAK-10651:
-

 Summary: Improve ClusterNodeInfo.recoveryDelayMillis
 Key: OAK-10651
 URL: https://issues.apache.org/jira/browse/OAK-10651
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: documentmk
Reporter: Stefan Egli


In OAK-10281 a static ClusterNodeInfo.recoveryDelayMillis has been introduced. 
While not a drama, preferably we'd have it non static eg bound to some 
config/context or just DocumentNodeStore instead. This ticket is to revisit 
this static in the context of some broader refactoring that eg might also 
include the similarly static clock object. Several ideas were discussed in 
[PR#1288|https://github.com/apache/jackrabbit-oak/pull/1288#issuecomment-1921925331]
 eg [PR#1292|https://github.com/apache/jackrabbit-oak/pull/1292] or 
[PR#1301|https://github.com/apache/jackrabbit-oak/pull/1301] that could serve 
as a basis for future discussions.





[jira] [Comment Edited] (OAK-10648) "IS NULL" (Null Props) Cause Incorrect Query Estimation

2024-02-14 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817235#comment-17817235
 ] 

Thomas Mueller edited comment on OAK-10648 at 2/14/24 3:00 PM:
---

I didn't test this yet, but the following change seems to be necessary:

https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/query/FulltextIndexPlanner.java#L851

{noformat}
oak-search FulltextIndexPlanner

    if (pr.isNotNullRestriction()) {
        // don't use weight for "is not null" restrictions
        weight = 1;
    // ---- missing code start ----
    } else if (pr.isNullRestriction()) {
        // don't use weight for "is null" restrictions
        weight = 1;
    // ---- missing code end ----
    } else {
        if (weight > 1) {
            // for non-equality conditions such as
            // where x > 1, x < 2, x like y,...:
            // use a maximum weight of 3,
            // so assume we read at least 30%
            if (!isEqualityRestriction(pr)) {
                weight = Math.min(3, weight);
            }
        }
    }
{noformat}

We should probably add a feature toggle / system property so that we can switch 
back to the original behavior in case an application relies on it.
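
For illustration only, a minimal Java sketch of how such a toggle could wrap the weight logic. Everything here is hypothetical: the class, the Kind enum and the property name "oak.example.ignoreWeightForIsNull" are made up, and with the toggle off the sketch assumes an "is null" restriction falls into the non-equality branch, which has not been checked against FulltextIndexPlanner.

```java
// Hypothetical stand-alone model of the proposed weight logic with a toggle.
final class NullRestrictionWeight {

    // Made-up property name, NOT an existing Oak setting.
    static final String TOGGLE = "oak.example.ignoreWeightForIsNull";

    enum Kind { NOT_NULL, NULL, EQUALITY, OTHER }

    // Reads the toggle; defaults to the new behavior (true).
    static boolean toggleFromSystemProperty() {
        return Boolean.parseBoolean(System.getProperty(TOGGLE, "true"));
    }

    static int effectiveWeight(Kind kind, int weight, boolean ignoreWeightForIsNull) {
        if (kind == Kind.NOT_NULL) {
            // don't use weight for "is not null" restrictions
            return 1;
        }
        if (kind == Kind.NULL && ignoreWeightForIsNull) {
            // new behavior: don't use weight for "is null" restrictions either
            return 1;
        }
        // Assumption: with the toggle off, "is null" counts as non-equality.
        if (weight > 1 && kind != Kind.EQUALITY) {
            // non-equality conditions: cap the weight at 3
            return Math.min(3, weight);
        }
        return weight;
    }
}
```

Starting the JVM with -Doak.example.ignoreWeightForIsNull=false would then make toggleFromSystemProperty() return false and restore the previous handling of "is null".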


was (Author: tmueller):
I didn't test this yet, but the following change seems to be necessary:

{noformat}
oak-search FulltextIndexPlanner

    if (pr.isNotNullRestriction()) {
        // don't use weight for "is not null" restrictions
        weight = 1;
    // ---- missing code start ----
    } else if (pr.isNullRestriction()) {
        // don't use weight for "is null" restrictions
        weight = 1;
    // ---- missing code end ----
    } else {
        if (weight > 1) {
            // for non-equality conditions such as
            // where x > 1, x < 2, x like y,...:
            // use a maximum weight of 3,
            // so assume we read at least 30%
            if (!isEqualityRestriction(pr)) {
                weight = Math.min(3, weight);
            }
        }
    }
{noformat}

We should probably add a feature toggle / system property so that we can switch 
back to the original behavior in case an application relies on it.

> "IS NULL" (Null Props) Cause Incorrect Query Estimation
> ---
>
> Key: OAK-10648
> URL: https://issues.apache.org/jira/browse/OAK-10648
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Patrique Legault
>Priority: Major
> Attachments: Non Union Query Plan.json, Non Union With Null 
> Check.json, Screenshot 2024-02-13 at 9.30.43 AM.png, Union Query Plan.json, 
> cqTagLucene.json
>
>
> Using null props in a query can cause the query engine to incorrectly 
> estimate the cost of a query plan, which can lead to a traversal and slow 
> query execution.
> If you look at the query plan below, the number of null props documents is 
> quite high, yet the cost for the query is only 19. When we execute the UNION 
> query the cost is 38, which is why it is not selected, when in reality the 
> original cost should be much higher.
> After removing the null check the cost estimation is drastically different 
> and correctly reflects the number of documents in the index.
> Queries:
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
> '%ksb1325bm%') 
> {noformat}
>  
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
> UNION
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
> {noformat}
> Index definition for the "cq:movedTo" property:
> {noformat}
> "cqMovedTo": {
> "notNullCheckEnabled": true,
> "nullCheckEnabled": true,
> "propertyIndex": true,
> "name": "cq:movedTo",
> "type": "String"
> }
> {noformat}





[jira] [Commented] (OAK-10650) MongoDocumentStore.findDocuments can fail with BSON exception

2024-02-14 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817405#comment-17817405
 ] 

Julian Reschke commented on OAK-10650:
--

trunk: 
[f165691e0b|https://github.com/apache/jackrabbit-oak/commit/f165691e0bff0aa7ed5a2650a11dd52630181b20]

> MongoDocumentStore.findDocuments can fail with BSON exception
> -
>
> Key: OAK-10650
> URL: https://issues.apache.org/jira/browse/OAK-10650
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> This can happen in an edge case where the BSON condition exceeds the 16MB 
> limit (see the test for OAK-10642).
> The quick fix is to catch the exception and then use a simplified version of 
> the method that gets the documents one-by-one.
> Mid-term, we may want to refactor this so that we avoid the exception by 
> limiting the size of the BSON condition proactively.
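
For illustration only, a minimal Java sketch of the quick fix described above: try the bulk lookup first and fall back to one-by-one lookups if the query itself is too large. FindWithFallback and QueryTooLargeException are hypothetical stand-ins. The actual exception type thrown by the MongoDB driver and the real signature of findDocuments are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "bulk query with one-by-one fallback".
final class FindWithFallback {

    // Stand-in for the driver exception raised when the query is too large.
    static final class QueryTooLargeException extends RuntimeException {
        QueryTooLargeException(String message) {
            super(message);
        }
    }

    static <T> List<T> find(List<String> ids,
                            Function<List<String>, List<T>> bulkFind,
                            Function<String, T> findOne) {
        try {
            // fast path: a single query with a condition covering all ids
            return bulkFind.apply(ids);
        } catch (QueryTooLargeException e) {
            // slow path: one query per id, skipping ids without a document
            List<T> result = new ArrayList<>();
            for (String id : ids) {
                T doc = findOne.apply(id);
                if (doc != null) {
                    result.add(doc);
                }
            }
            return result;
        }
    }
}
```

The mid-term alternative mentioned above would instead split the id list into batches so that no single condition reaches the limit in the first place.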





[jira] [Resolved] (OAK-10650) MongoDocumentStore.findDocuments can fail with BSON exception

2024-02-14 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-10650.
--
Fix Version/s: 1.62.0
   Resolution: Fixed

> MongoDocumentStore.findDocuments can fail with BSON exception
> -
>
> Key: OAK-10650
> URL: https://issues.apache.org/jira/browse/OAK-10650
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.62.0
>
>
> This can happen in an edge case where the BSON condition exceeds the 16MB 
> limit (see the test for OAK-10642).
> The quick fix is to catch the exception and then use a simplified version of 
> the method that gets the documents one-by-one.
> Mid-term, we may want to refactor this so that we avoid the exception by 
> limiting the size of the BSON condition proactively.





[jira] [Updated] (OAK-10650) MongoDocumentStore.findDocuments can fail with BSON exception

2024-02-14 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10650:
-
Labels: candidate_oak_1_22  (was: )

> MongoDocumentStore.findDocuments can fail with BSON exception
> -
>
> Key: OAK-10650
> URL: https://issues.apache.org/jira/browse/OAK-10650
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> This can happen in an edge case where the BSON condition exceeds the 16MB 
> limit (see the test for OAK-10642).
> The quick fix is to catch the exception and then use a simplified version of 
> the method that gets the documents one-by-one.
> Mid-term, we may want to refactor this so that we avoid the exception by 
> limiting the size of the BSON condition proactively.





[jira] [Created] (OAK-10650) MongoDocumentStore.findDocuments can fail with BSON exception

2024-02-14 Thread Julian Reschke (Jira)
Julian Reschke created OAK-10650:


 Summary: MongoDocumentStore.findDocuments can fail with BSON 
exception
 Key: OAK-10650
 URL: https://issues.apache.org/jira/browse/OAK-10650
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: documentmk
Reporter: Julian Reschke
Assignee: Julian Reschke


This can happen in an edge case where the BSON condition exceeds the 16MB limit 
(see the test for OAK-10642).

The quick fix is to catch the exception and then use a simplified version of 
the method that gets the documents one-by-one.

Mid-term, we may want to refactor this so that we avoid the exception by 
limiting the size of the BSON condition proactively.





[jira] [Comment Edited] (OAK-10641) DocumentStore: improve test coverage for large properties / documents

2024-02-14 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815702#comment-17815702
 ] 

Julian Reschke edited comment on OAK-10641 at 2/14/24 11:34 AM:


trunk: 
[07fbc86a9a|https://github.com/apache/jackrabbit-oak/commit/07fbc86a9a8a241f4542cc9cb79f339a0e899c3a]
 
[ed1274c878|https://github.com/apache/jackrabbit-oak/commit/ed1274c87866eaa7b7ef67bee5027150871fc09c]


was (Author: reschke):
trunk: 
[ed1274c878|https://github.com/apache/jackrabbit-oak/commit/ed1274c87866eaa7b7ef67bee5027150871fc09c]

> DocumentStore: improve test coverage for large properties / documents
> -
>
> Key: OAK-10641
> URL: https://issues.apache.org/jira/browse/OAK-10641
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> In BasicDocumentStore, we already test large string properties upon document 
> creation (but only up to 8MB).
> Add tests for document *updates*, and also for adding large properties for 
> existing docs.
> Note that these tests will always pass, they just exercise the store impl up 
> to the limit and log the results.





[jira] [Updated] (OAK-10635) BundledTypeRegistry's use of shaded Guava problematic when used outside Oak

2024-02-14 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10635:
-
Labels: candidate_oak_1_22  (was: )

> BundledTypeRegistry's use of shaded Guava problematic when used outside Oak
> ---
>
> Key: OAK-10635
> URL: https://issues.apache.org/jira/browse/OAK-10635
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Mark Adamcin
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>
> The oak-shaded-guava bundle exports shaded guava packages with a version that 
> is defined by google to match the version of the upstream artifact. While it 
> is a semantic versioning scheme, it follows the API contract of the entire 
> artifact, and does not distinguish API changes in included packages like 
> .base and .collect at a granular level, which can result in otherwise 
> avoidable OSGi wiring errors when references to guava types leak outside of 
> the greater Oak API boundary, such as when classes are embedded or when guava 
> types are explicitly referenced in signatures outside of oak-shaded-guava.
> oak-commons should endeavor to provide a stable facade API for the simpler 
> parts of the guava library that are referenced at runtime by other oak 
> bundles, such as newHashMap(), ImmutableList.copyOf(), Preconditions.check*, 
> and perhaps Closer. 
> One example I know of where I could benefit from this approach 
> almost immediately is a project where I am embedding 
> BundlingConfigInitializer and BundledTypesRegistry from oak-store-document in 
> a customized repository configuration. When BundledTypesRegistry is embedded, 
> it brings with it imports of ImmutableMap, Maps, and Sets from 
> org.apache.jackrabbit.guava.common.collect. With the recent guava upgrade to 
> 33.0.0 in OAK-10605 in 1.61-SNAPSHOT, the custom repository bundle fails to 
> activate because the previous import-package bounds no longer match: 
> {{org.apache.jackrabbit.guava.common.collect;version=[32.1.3,33).}}
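
For illustration only, a minimal Java sketch of why the import range [32.1.3,33) stops matching once the exported package version becomes 33.0.0. It is a hand-written comparison of major.minor.micro with missing parts treated as 0. It ignores qualifiers and does not use the OSGi framework classes. VersionRangeSketch is a hypothetical name.

```java
// Hypothetical sketch: half-open version range check, "[floor,ceiling)".
final class VersionRangeSketch {

    // Parses "major[.minor[.micro]]"; missing parts default to 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Compares two versions part by part, most significant part first.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // True if floor <= version < ceiling.
    static boolean inHalfOpenRange(String version, String floor, String ceiling) {
        return compare(version, floor) >= 0 && compare(version, ceiling) < 0;
    }
}
```

Here inHalfOpenRange("32.1.3", "32.1.3", "33") holds, while inHalfOpenRange("33.0.0", "32.1.3", "33") does not, which corresponds to the failed bundle activation described above.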





[jira] [Created] (OAK-10649) MemoryDS: add toggle to limit document size

2024-02-14 Thread Julian Reschke (Jira)
Julian Reschke created OAK-10649:


 Summary: MemoryDS: add toggle to limit document size
 Key: OAK-10649
 URL: https://issues.apache.org/jira/browse/OAK-10649
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: documentmk, test
Reporter: Julian Reschke
Assignee: Julian Reschke


To simplify testing related to MongoDB's 16 MB limit.
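
For illustration only, a minimal Java sketch of what such a toggle could look like in an in-memory store. SizeLimitedMemoryStore is a hypothetical class, not the Oak MemoryDocumentStore, and measuring size as the UTF-8 length of a serialized string is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store with an optional document size limit.
final class SizeLimitedMemoryStore {

    private final Map<String, String> docs = new HashMap<>();

    // Maximum serialized size in bytes; a value <= 0 disables the check.
    private final long maxBytes;

    SizeLimitedMemoryStore(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Stores the document, or rejects it if the limit is enabled and exceeded.
    void put(String id, String serializedDoc) {
        long size = serializedDoc.getBytes(StandardCharsets.UTF_8).length;
        if (maxBytes > 0 && size > maxBytes) {
            throw new IllegalArgumentException(
                    "document " + id + " has " + size + " bytes, limit is " + maxBytes);
        }
        docs.put(id, serializedDoc);
    }

    String get(String id) {
        return docs.get(id);
    }
}
```

Constructing the store with 16 * 1024 * 1024 would mimic the MongoDB limit in tests without needing a MongoDB instance.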





[jira] [Updated] (OAK-10648) "IS NULL" (Null Props) Cause Incorrect Query Estimation

2024-02-14 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-10648:
-
Description: 
Using null props in a query can cause the query engine to incorrectly estimate 
the cost of a query plan, which can lead to a traversal and slow query 
execution.

If you look at the query plan below, the number of null props documents is quite 
high, yet the cost for the query is only 19. When we execute the UNION query the 
cost is 38, which is why it is not selected, when in reality the original cost 
should be much higher.

After removing the null check the cost estimation is drastically different and 
correctly reflects the number of documents in the index.

Queries:
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
'%ksb1325bm%') 
{noformat}
 
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
UNION
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
{noformat}

Index definition for the "cq:movedTo" property:

{noformat}
"cqMovedTo": {
"notNullCheckEnabled": true,
"nullCheckEnabled": true,
"propertyIndex": true,
"name": "cq:movedTo",
"type": "String"
}
{noformat}

  was:
Using null props in a query can cause the query engine to incorrectly estimate 
the cost of a query plan, which can lead to a traversal and slow query 
execution.

 

If you look at the query plan below, the number of null props documents is quite 
high, yet the cost for the query is only 19. When we execute the UNION query the 
cost is 38, which is why it is not selected, when in reality the original cost 
should be much higher.

 

After removing the null check the cost estimation is drastically different and 
correctly reflects the number of documents in the index.

Queries:
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
'%ksb1325bm%') 
{noformat}
 
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
UNION
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
{noformat}



> "IS NULL" (Null Props) Cause Incorrect Query Estimation
> ---
>
> Key: OAK-10648
> URL: https://issues.apache.org/jira/browse/OAK-10648
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Patrique Legault
>Priority: Major
> Attachments: Non Union Query Plan.json, Non Union With Null 
> Check.json, Screenshot 2024-02-13 at 9.30.43 AM.png, Union Query Plan.json, 
> cqTagLucene.json
>
>
> Using null props in a query can cause the query engine to incorrectly 
> estimate the cost of a query plan, which can lead to a traversal and slow 
> query execution.
> If you look at the query plan below, the number of null props documents is 
> quite high, yet the cost for the query is only 19. When we execute the UNION 
> query the cost is 38, which is why it is not selected, when in reality the 
> original cost should be much higher.
> After removing the null check the cost estimation is drastically different 
> and correctly reflects the number of documents in the index.
> Queries:
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
> '%ksb1325bm%') 
> {noformat}
>  
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
> UNION
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
> {noformat}
> Index definition for the "cq:movedTo" property:
> {noformat}
> "cqMovedTo": {
> "notNullCheckEnabled": true,
> "nullCheckEnabled": true,
> "propertyIndex": true,
> "name": "cq:movedTo",
> "type": "String"
> }
> {noformat}





[jira] [Updated] (OAK-10648) "IS NULL" (Null Props) Cause Incorrect Query Estimation

2024-02-14 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-10648:
-
Summary: "IS NULL" (Null Props) Cause Incorrect Query Estimation  (was: 
Null Props Cause Incorrect Query Estimation)

> "IS NULL" (Null Props) Cause Incorrect Query Estimation
> ---
>
> Key: OAK-10648
> URL: https://issues.apache.org/jira/browse/OAK-10648
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Patrique Legault
>Priority: Major
> Attachments: Non Union Query Plan.json, Non Union With Null 
> Check.json, Screenshot 2024-02-13 at 9.30.43 AM.png, Union Query Plan.json, 
> cqTagLucene.json
>
>
> Using null props in a query can cause the query engine to incorrectly 
> estimate the cost of a query plan, which can lead to a traversal and slow 
> query execution.
>  
> If you look at the query plan below, the number of null props documents is 
> quite high, yet the cost for the query is only 19. When we execute the UNION 
> query the cost is 38, which is why it is not selected, when in reality the 
> original cost should be much higher.
>  
> After removing the null check the cost estimation is drastically different 
> and correctly reflects the number of documents in the index.
> Queries:
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
> '%ksb1325bm%') 
> {noformat}
>  
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
> UNION
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
> {noformat}





[jira] [Updated] (OAK-10648) Null Props Cause Incorrect Query Estimation

2024-02-14 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-10648:
-
Description: 
Using null props in a query can cause the query engine to incorrectly estimate 
the cost of a query plan, which can lead to a traversal and slow query 
execution.

 

If you look at the query plan below, the number of null props documents is quite 
high, yet the cost for the query is only 19. When we execute the UNION query the 
cost is 38, which is why it is not selected, when in reality the original cost 
should be much higher.

 

After removing the null check the cost estimation is drastically different and 
correctly reflects the number of documents in the index.

Queries:
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
'%ksb1325bm%') 
{noformat}
 
{noformat}
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
UNION
SELECT * FROM [cq:Tag] 
WHERE [cq:movedTo] IS NULL 
AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
{noformat}


  was:
Using null props in a query can cause the query engine to incorrectly estimate 
the cost of a query plan, which can lead to a traversal and slow query 
execution.

 

If you look at the query plan below, the number of null props documents is quite 
high, yet the cost for the query is only 19. When we execute the UNION query the 
cost is 38, which is why it is not selected, when in reality the original cost 
should be much higher.

 

After removing the null check the cost estimation is drastically different and 
correctly reflects the number of documents in the index.


> Null Props Cause Incorrect Query Estimation
> ---
>
> Key: OAK-10648
> URL: https://issues.apache.org/jira/browse/OAK-10648
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Patrique Legault
>Priority: Major
> Attachments: Non Union Query Plan.json, Non Union With Null 
> Check.json, Screenshot 2024-02-13 at 9.30.43 AM.png, Union Query Plan.json, 
> cqTagLucene.json
>
>
> Using null props in a query can cause the query engine to incorrectly 
> estimate the cost of a query plan, which can lead to a traversal and slow 
> query execution.
>  
> If you look at the query plan below, the number of null props documents is 
> quite high, yet the cost for the query is only 19. When we execute the UNION 
> query the cost is 38, which is why it is not selected, when in reality the 
> original cost should be much higher.
>  
> After removing the null check the cost estimation is drastically different 
> and correctly reflects the number of documents in the index.
> Queries:
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND (LOWER([jcr:title.en]) LIKE '%ksb1325bm%' OR LOWER([jcr:title]) LIKE 
> '%ksb1325bm%') 
> {noformat}
>  
> {noformat}
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title.en]) LIKE '%ksb1325bm%' 
> UNION
> SELECT * FROM [cq:Tag] 
> WHERE [cq:movedTo] IS NULL 
> AND LOWER([jcr:title]) LIKE '%ksb1325bm%'
> {noformat}


