[jira] [Resolved] (OAK-10804) Indexing job: optimize check for hidden nodes

2024-05-16 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10804.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Indexing job: optimize check for hidden nodes
> -
>
> Key: OAK-10804
> URL: https://issues.apache.org/jira/browse/OAK-10804
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.64.0
>
>
> While downloading the repository from Mongo, the indexing job has to discard 
> hidden entries. This is being done by a call to 
> {{NodeStateUtils.isHiddenPath()}}. This call is rather expensive, as it 
> creates an iterator over the path segments, which requires creating a new 
> string for each path segment. As the indexing job has to check every entry to 
> verify if it is hidden, this creates a significant overhead.
> The implementation of checking for hidden paths can be replaced by a simple 
> search for {{"/:"}} in the string representing the path, which requires no 
> object allocation and should therefore be much faster.
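
The two approaches described above can be sketched as follows. This is a minimal illustration, not the Oak code: the class and method names are hypothetical, `String.split` stands in for the path-segment iterator, and the sketch assumes absolute paths in which a hidden node is one whose name starts with `:`.

```java
final class HiddenPathCheck {

    // Segment-based check: allocates an array plus one String per path segment.
    static boolean isHiddenBySegments(String path) {
        for (String segment : path.split("/")) {
            if (segment.startsWith(":")) {
                return true;
            }
        }
        return false;
    }

    // Substring search: in an absolute path, a segment starts with ':' exactly
    // when the path contains "/:", so no per-segment objects are needed.
    static boolean isHiddenBySearch(String path) {
        return path.contains("/:");
    }
}
```

Both methods agree on absolute paths; a colon inside a name, as in `/oak:index/foo`, does not make the path hidden under either check.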



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (OAK-10810) Remove redundant call to StringCache.get in Path.fromString()

2024-05-16 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10810.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Remove redundant call to StringCache.get in Path.fromString()
> -
>
> Key: OAK-10810
> URL: https://issues.apache.org/jira/browse/OAK-10810
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.64.0
>
>






[jira] [Created] (OAK-10810) Remove redundant call to StringCache.get in Path.fromString()

2024-05-16 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10810:
-

 Summary: Remove redundant call to StringCache.get in 
Path.fromString()
 Key: OAK-10810
 URL: https://issues.apache.org/jira/browse/OAK-10810
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Commented] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-16 Thread Nuno Santos (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846831#comment-17846831
 ] 

Nuno Santos commented on OAK-10778:
---

Fix here:
[https://github.com/apache/jackrabbit-oak/pull/1463]

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each secondary 
> should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.
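
The ascending/descending partition described above can be sketched as follows. This is an illustration under stated assumptions, not the Oak implementation: an in-memory sorted list stands in for the Mongo collection ordered by `(_modified, _id)`, two plain threads stand in for the two connections, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoEndedDownload {
    private final List<String> sortedDocs; // stands in for documents ordered by (_modified, _id)
    private int low;                       // next index for the ascending reader
    private int high;                      // next index for the descending reader

    TwoEndedDownload(List<String> sortedDocs) {
        this.sortedDocs = sortedDocs;
        this.low = 0;
        this.high = sortedDocs.size() - 1;
    }

    // Next document for the ascending reader, or null once the two readers have met.
    synchronized String nextAscending() {
        return low <= high ? sortedDocs.get(low++) : null;
    }

    // Next document for the descending reader, or null once the two readers have met.
    synchronized String nextDescending() {
        return low <= high ? sortedDocs.get(high--) : null;
    }

    // Runs both readers on their own threads and returns what each one fetched.
    static List<List<String>> downloadAll(List<String> sortedDocs) {
        TwoEndedDownload state = new TwoEndedDownload(sortedDocs);
        List<String> ascending = new ArrayList<>();
        List<String> descending = new ArrayList<>();
        Thread up = new Thread(() -> {
            String doc;
            while ((doc = state.nextAscending()) != null) {
                ascending.add(doc);
            }
        });
        Thread down = new Thread(() -> {
            String doc;
            while ((doc = state.nextDescending()) != null) {
                descending.add(doc);
            }
        });
        up.start();
        down.start();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while downloading", e);
        }
        return List.of(ascending, descending);
    }
}
```

Every document is fetched exactly once regardless of how fast each reader is, because the readers stop as soon as their positions cross.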





[jira] [Created] (OAK-10808) PipelinedMongoConnectionFailureIT should not fail if Mongo is not available

2024-05-16 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10808:
-

 Summary: PipelinedMongoConnectionFailureIT should not fail if 
Mongo is not available
 Key: OAK-10808
 URL: https://issues.apache.org/jira/browse/OAK-10808
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10788) Indexing job downloader: shutdown gracefully all threads in case of failure

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10788.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Indexing job downloader: shutdown gracefully all threads in case of failure
> ---
>
> Key: OAK-10788
> URL: https://issues.apache.org/jira/browse/OAK-10788
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.64.0
>
>
> If the download fails, the threads created by the Pipelined strategy are not 
> all shut down correctly; some of them may be left behind. As they are 
> all daemon threads, they will not prevent the JVM from shutting down. But 
> when they are forcibly terminated at JVM shutdown, they log 
> several exceptions (connections closed abruptly, attempts to access objects 
> that were already closed) that are confusing and distract from the root cause 
> of the problem.
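
One way to stop every worker as soon as any stage fails is sketched below. It is a generic `java.util.concurrent` pattern shown under stated assumptions, not the change made in Oak: the class name, the method name and the representation of stages as `Callable` tasks are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PipelineShutdownSketch {

    // Runs all stages in parallel. When any stage fails, the remaining ones are
    // interrupted. Returns true if every worker thread exited within the timeout.
    static boolean runAndShutdown(List<Callable<Void>> stages, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(stages.size(), runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // daemon threads, as in the issue description
            return thread;
        });
        CompletionService<Void> completed = new ExecutorCompletionService<>(pool);
        for (Callable<Void> stage : stages) {
            completed.submit(stage);
        }
        try {
            // Futures arrive in completion order, so the first failure surfaces promptly.
            for (int i = 0; i < stages.size(); i++) {
                completed.take().get();
            }
        } catch (ExecutionException e) {
            System.err.println("Stage failed, stopping the others: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // interrupts any stage that is still running
        }
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Waiting for termination before returning means that only the root-cause exception is logged; the remaining stages see an interrupt rather than resources being closed under them at JVM exit.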





[jira] [Updated] (OAK-10804) Indexing job: optimize check for hidden nodes

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10804:
--
Description: 
While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
{{NodeStateUtils.isHiddenPath()}}. This call is rather expensive, as it 
creates an iterator over the path segments, which requires creating a new 
string for each path segment. As the indexing job has to check every entry to 
verify if it is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.

  was:
While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
`NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
an iterator over the path segments, which requires creating a new string for 
each path segment. As the indexing job has to check every entry to verify if it 
is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.


> Indexing job: optimize check for hidden nodes
> -
>
> Key: OAK-10804
> URL: https://issues.apache.org/jira/browse/OAK-10804
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> While downloading the repository from Mongo, the indexing job has to discard 
> hidden entries. This is being done by a call to 
> {{NodeStateUtils.isHiddenPath()}}. This call is rather expensive, as it 
> creates an iterator over the path segments, which requires creating a new 
> string for each path segment. As the indexing job has to check every entry to 
> verify if it is hidden, this creates a significant overhead.
> The implementation of checking for hidden paths can be replaced by a simple 
> search for {{"/:"}} in the string representing the path, which requires no 
> object allocation and should therefore be much faster.





[jira] [Updated] (OAK-10804) Indexing job: optimize check for hidden nodes

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10804:
--
Summary: Indexing job: optimize check for hidden nodes  (was: Indexing job: 
optimize check for if a node is hidden)

> Indexing job: optimize check for hidden nodes
> -
>
> Key: OAK-10804
> URL: https://issues.apache.org/jira/browse/OAK-10804
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> While downloading the repository from Mongo, the indexing job has to discard 
> hidden entries. This is being done by a call to 
> `NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
> an iterator over the path segments, which requires creating a new string for 
> each path segment. As the indexing job has to check every entry to verify if 
> it is hidden, this creates a significant overhead.
> The implementation of checking for hidden paths can be replaced by a simple 
> search for {{"/:"}} in the string representing the path, which requires no 
> object allocation and should therefore be much faster.





[jira] [Created] (OAK-10804) Indexing job: optimize check for if a node is hidden

2024-05-15 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10804:
-

 Summary: Indexing job: optimize check for if a node is hidden
 Key: OAK-10804
 URL: https://issues.apache.org/jira/browse/OAK-10804
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
`NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
an iterator over the path segments, which requires creating a new string for 
each path segment. As the indexing job has to check every entry to verify if it 
is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.





[jira] [Resolved] (OAK-10796) Avoid creation of intermediate StringBuilder in JsopBuilder

2024-05-13 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10796.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Avoid creation of intermediate StringBuilder in JsopBuilder
> ---
>
> Key: OAK-10796
> URL: https://issues.apache.org/jira/browse/OAK-10796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: commons
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.64.0
>
>
> Since invokedynamic was added to Java, this code in the JsopBuilder class:
> {code:java}
> StringBuilder buff = new StringBuilder(length + 2);
> return buff.append('\"').append(s).append('\"').toString(); {code}
> can be written more simply and more efficiently like this:
> {code:java}
>  return '\"' + s + '\"';
> {code}
> https://www.baeldung.com/java-string-concatenation-invoke-dynamic
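
The two forms from the description are shown side by side below as a minimal sketch. The class and method names are hypothetical, and like the quoted snippet the methods only wrap the string in quotes without escaping it. The `invokedynamic`-based compilation of `+` applies when compiling for Java 9 or later (JEP 280).

```java
final class JsonQuote {

    // Explicit builder, as in the "before" snippet.
    static String quoteWithBuilder(String s) {
        StringBuilder buff = new StringBuilder(s.length() + 2);
        return buff.append('"').append(s).append('"').toString();
    }

    // Plain concatenation; when targeting Java 9+, javac emits an invokedynamic
    // call instead of generating StringBuilder code at this site.
    static String quoteWithConcat(String s) {
        return '"' + s + '"';
    }
}
```

Both methods return the same string, so the change affects only how the concatenation is compiled.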





[jira] [Resolved] (OAK-10795) Indexing job: eliminate unnecessary intermediate object creation in transform stage

2024-05-13 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10795.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Indexing job: eliminate unnecessary intermediate object creation in transform 
> stage
> ---
>
> Key: OAK-10795
> URL: https://issues.apache.org/jira/browse/OAK-10795
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.64.0
>
>






[jira] [Resolved] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-13 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10778.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each secondary 
> should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.





[jira] [Updated] (OAK-10796) Avoid creation of intermediate StringBuilder in JsopBuilder

2024-05-13 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10796:
--
Summary: Avoid creation of intermediate StringBuilder in JsopBuilder  (was: 
Avoid creation of intermedia Stringbuffer in JsopBuilder)

> Avoid creation of intermediate StringBuilder in JsopBuilder
> ---
>
> Key: OAK-10796
> URL: https://issues.apache.org/jira/browse/OAK-10796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> Since invokedynamic was added to Java, this code in the JsopBuilder class:
> {code:java}
> StringBuilder buff = new StringBuilder(length + 2);
> return buff.append('\"').append(s).append('\"').toString(); {code}
> can be written more simply and more efficiently like this:
> {code:java}
>  return '\"' + s + '\"';
> {code}
> https://www.baeldung.com/java-string-concatenation-invoke-dynamic





[jira] [Updated] (OAK-10796) Avoid creation of intermedia Stringbuffer in JsopBuilder

2024-05-10 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10796:
--
Description: 
Since invokedynamic was added to Java, this code in the JsopBuilder class:
{code:java}
StringBuilder buff = new StringBuilder(length + 2);
return buff.append('\"').append(s).append('\"').toString(); {code}

can be written more simply and more efficiently like this:

{code:java}
 return '\"' + s + '\"';
{code}

https://www.baeldung.com/java-string-concatenation-invoke-dynamic

  was:
Since invokedynamic was added to Java, this code in the JsopBuilder class:
{code:java}
StringBuilder buff = new StringBuilder(length + 2);
return buff.append('\"').append(s).append('\"').toString(); {code}

can be written more simply and more efficiently like this:

{code:java}
 return '\"' + s + '\"';
{code:java}

https://www.baeldung.com/java-string-concatenation-invoke-dynamic


> Avoid creation of intermedia Stringbuffer in JsopBuilder
> 
>
> Key: OAK-10796
> URL: https://issues.apache.org/jira/browse/OAK-10796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> Since invokedynamic was added to Java, this code in the JsopBuilder class:
> {code:java}
> StringBuilder buff = new StringBuilder(length + 2);
> return buff.append('\"').append(s).append('\"').toString(); {code}
> can be written more simply and more efficiently like this:
> {code:java}
>  return '\"' + s + '\"';
> {code}
> https://www.baeldung.com/java-string-concatenation-invoke-dynamic





[jira] [Created] (OAK-10796) Avoid creation of intermedia Stringbuffer in JsopBuilder

2024-05-10 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10796:
-

 Summary: Avoid creation of intermedia Stringbuffer in JsopBuilder
 Key: OAK-10796
 URL: https://issues.apache.org/jira/browse/OAK-10796
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


Since invokedynamic was added to Java, this code in the JsopBuilder class:
{code:java}
StringBuilder buff = new StringBuilder(length + 2);
return buff.append('\"').append(s).append('\"').toString(); {code}

can be written more simply and more efficiently like this:

{code:java}
 return '\"' + s + '\"';
{code}

https://www.baeldung.com/java-string-concatenation-invoke-dynamic





[jira] [Created] (OAK-10795) Indexing job: eliminate unnecessary intermediate object creation in transform stage

2024-05-10 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10795:
-

 Summary: Indexing job: eliminate unnecessary intermediate object 
creation in transform stage
 Key: OAK-10795
 URL: https://issues.apache.org/jira/browse/OAK-10795
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10789) Indexing job: log paths used for inclusion/exclusion for Mongo regex filters in job summary

2024-05-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10789:
-

 Summary: Indexing job: log paths used for inclusion/exclusion for 
Mongo regex filters in job summary
 Key: OAK-10789
 URL: https://issues.apache.org/jira/browse/OAK-10789
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


The paths applied in the regex filter should be logged in the indexing report.





[jira] [Created] (OAK-10788) Indexing job downloader: shutdown gracefully all threads in case of failure

2024-05-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10788:
-

 Summary: Indexing job downloader: shutdown gracefully all threads 
in case of failure
 Key: OAK-10788
 URL: https://issues.apache.org/jira/browse/OAK-10788
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos


If the download fails, the threads created by the Pipelined strategy are not all 
shut down correctly; some of them may be left behind. As they are all 
daemon threads, they will not prevent the JVM from shutting down. But when they 
are forcibly terminated at JVM shutdown, they log several 
exceptions (connections closed abruptly, attempts to access objects that were 
already closed) that are confusing and distract from the root cause of the 
problem.





[jira] [Created] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-04-25 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10778:
-

 Summary: Indexing job: support parallel download from MongoDB with 
two connections in Pipelined strategy
 Key: OAK-10778
 URL: https://issues.apache.org/jira/browse/OAK-10778
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


The current version of the Pipelined download strategy uses a single 
connection/thread to download from MongoDB. We can further increase the 
download speed by using an additional MongoDB connection. A Mongo deployment 
has 1 primary and 2 secondaries, so in principle we could have 1 connection to 
each secondary, effectively doubling the download speed.

There are a few points to observe:
 - Connections should go to different secondaries. If both connections go to 
the same secondary, there's a high chance that they will be limited by what a 
single replica can provide and that they will overload that replica. So each secondary 
should have one and only one connection.

 - How to partition the range of documents to download between two threads. We 
are already downloading from Mongo in order of {{(_modified, _id)}}. A simple 
and effective partition strategy for 2 connections is for one to download in 
ascending and the other in descending order.





[jira] [Resolved] (OAK-10682) [Indexing job] Improve Mongo regex filter to only use positive conditions (no negations)

2024-04-02 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10682.
---
Fix Version/s: 1.62.0
   Resolution: Done

> [Indexing job] Improve Mongo regex filter to only use positive conditions (no 
> negations)
> 
>
> Key: OAK-10682
> URL: https://issues.apache.org/jira/browse/OAK-10682
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
> Environment: The current implementation of filtering excluded paths 
> and custom regex is using a condition like
> {noformat}
> { _id: { $nin: [ /^[0-9]{1,3}:\/content\/dam\/.*$/ ] } } {noformat}
> Mongo cannot evaluate this condition without retrieving the full document, 
> because a value of {{null}} would also match this condition and the index 
> does not contain {{null}} values. Therefore, when the index contains excluded 
> paths, the download will be much slower because Mongo has to retrieve every 
> single document to evaluate the condition.
> As a workaround, we can transform the regex into an equivalent one that matches 
> the complement of the original regex using [negative 
> lookahead|https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex].
>  This allows rewriting the filter condition using only positive conditions, 
> which can be evaluated using only the index.
>Reporter: Nuno Santos
>Priority: Major
>  Labels: indexing
> Fix For: 1.62.0
>
>
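
The negative-lookahead rewrite described in the Environment field above can be sketched with `java.util.regex`, which supports the same lookahead syntax. The second pattern is an assumed example of such a rewrite, not necessarily the expression Oak generates, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class NegativeLookaheadFilter {

    // Matches ids under the excluded subtree; the original filter negates it with $nin.
    static final Pattern EXCLUDED = Pattern.compile("^[0-9]{1,3}:/content/dam/.*$");

    // Positive pattern: matches ids of the form depth:path whose path does NOT
    // start with /content/dam/, so no negation operator is needed.
    static final Pattern NOT_EXCLUDED = Pattern.compile("^[0-9]{1,3}:(?!/content/dam/).*$");

    // True if a document with this id should be downloaded.
    static boolean keep(String id) {
        return NOT_EXCLUDED.matcher(id).matches();
    }
}
```

For ids of the form `depth:path`, `keep` accepts exactly the ids that `EXCLUDED` rejects.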






[jira] [Resolved] (OAK-10681) [indexing job] Support custom filters of paths on Mongo

2024-03-05 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10681.
---
Fix Version/s: 1.62.0
   Resolution: Done

> [indexing job] Support custom filters of paths on Mongo
> ---
>
> Key: OAK-10681
> URL: https://issues.apache.org/jira/browse/OAK-10681
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: indexing
> Fix For: 1.62.0
>
>
> The indexing job often has to download parts of the Oak tree that are not 
> needed for the indexes being built. The index definitions can define 
> included/excludedPaths to control which parts of the tree to index and the 
> indexing job uses this to filter on Mongo the documents that should be sent. 
> But often there are subtrees in Oak that are not needed for indexing but are 
> not excluded by the index definitions, for instance, subtrees with hidden 
> binary data.
> This task is to add a configuration option to Oak to specify a list of 
> subtrees that should not be downloaded in any case, for instance:
> {noformat}
> customExcludedPaths = "/foo;/tmp/bar" {noformat}
>  
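
A minimal sketch of how such a setting could be parsed and applied. The `;` separator comes from the example above; the trimming of entries, the class name and the method names are assumptions made for illustration and do not describe Oak's actual option handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CustomExcludedPaths {

    // Splits a value such as "/foo;/tmp/bar" into its individual subtree roots.
    static List<String> parse(String value) {
        return Arrays.stream(value.split(";"))
                .map(String::trim)
                .filter(path -> !path.isEmpty())
                .collect(Collectors.toList());
    }

    // A path is excluded if it is one of the roots or lies below one of them.
    static boolean isExcluded(String path, List<String> excludedRoots) {
        for (String root : excludedRoots) {
            if (path.equals(root) || path.startsWith(root + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Comparing against `root + "/"` rather than `root` alone keeps a sibling such as `/foobar` from being excluded by `/foo`.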





[jira] [Created] (OAK-10682) [Indexing job] Improve Mongo regex filter to only use positive conditions (no negations)

2024-03-01 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10682:
-

 Summary: [Indexing job] Improve Mongo regex filter to only use 
positive conditions (no negations)
 Key: OAK-10682
 URL: https://issues.apache.org/jira/browse/OAK-10682
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
 Environment: The current implementation of filtering excluded paths 
and custom regex is using a condition like
{noformat}
{ _id: { $nin: [ /^[0-9]{1,3}:\/content\/dam\/.*$/ ] } } {noformat}
Mongo cannot evaluate this condition without retrieving the full document, 
because a value of {{null}} would also match this condition and the index does 
not contain {{null}} values. Therefore, when the index contains excluded paths, 
the download will be much slower because Mongo has to retrieve every single 
document to evaluate the condition.

As a workaround, we can transform the regex into an equivalent one that matches 
the complement of the original regex using [negative 
lookahead|https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex].
 This allows rewriting the filter condition using only positive conditions, 
which can be evaluated using only the index.
Reporter: Nuno Santos








[jira] [Created] (OAK-10681) [indexing job] Support custom filters of paths on Mongo

2024-03-01 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10681:
-

 Summary: [indexing job] Support custom filters of paths on Mongo
 Key: OAK-10681
 URL: https://issues.apache.org/jira/browse/OAK-10681
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


The indexing job often has to download parts of the Oak tree that are not 
needed for the indexes being built. The index definitions can define 
included/excludedPaths to control which parts of the tree to index and the 
indexing job uses this to filter on Mongo the documents that should be sent. 
But often there are subtrees in Oak that are not needed for indexing but are 
not excluded by the index definitions, for instance, subtrees with hidden 
binary data.

This task is to add a configuration option to Oak to specify a list of subtrees 
that should not be downloaded in any case, for instance:
{noformat}
customExcludedPaths = "/foo;/tmp/bar" {noformat}
 





[jira] [Resolved] (OAK-10671) [Indexing job] - Improve Mongo regex query: remove condition on non-indexed _path field to speedup traversal

2024-02-29 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10671.
---
Fix Version/s: 1.62.0
   Resolution: Done

> [Indexing job] - Improve Mongo regex query: remove condition on non-indexed 
> _path field to speedup traversal
> 
>
> Key: OAK-10671
> URL: https://issues.apache.org/jira/browse/OAK-10671
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>
> Regex path filtering currently is implemented with a condition like:
> {noformat}
> _id in [^[0-9]{1,3}:\Q/foo/bar/\E.*$] OR (_id in [^[0-9]{1,3}:h.*$] AND 
> _path in [^\Q/foo/bar/\E.*$])
> {noformat}
> The second condition is necessary to deal with long path documents, whose 
> {{_id}} is a hash instead of the path of the document, and that have an 
> additional {{_path}} property with the full path of the document. The {{_id}} 
> field is part of the index used by the query, but {{_path}} is not indexed. 
> So the performance of this query will be very sensitive to how many times the 
> query condition can be resolved without having to look up the value of 
> {{_path}}, which requires retrieving the full document from the column 
> store. If the condition can be evaluated using only the {{_id}} value, then 
> a document that does not match should not be retrieved from the column 
> store.
> Unfortunately, Mongo does not seem to properly optimize this query and is 
> retrieving the document from the column storage even when {{_id}} does not 
> match the path /foo/bar and the _id is not in the hash format. This leads to 
> very poor performance as both the index and the column store have to be fully 
> read by this query.
> We can instead use the following condition:
> {noformat}
> _id in [^[0-9]{1,3}:\Q/foo/bar/\E.*$ , ^[0-9]{1,3}:h.*$]
> {noformat}
> That is, download the document if the {{_id}} matches the path or if it is a 
> hash. This has the disadvantage that it will download all long path documents 
> from the repository, many of which might not be needed. However, this query 
> condition only uses the {{_id}} field, so it is guaranteed to be evaluated fully 
> using only the data in the index. And the number of long path documents is 
> usually very small (some environments don't have any long path 
> documents at all), so downloading them should not take much time. The indexing 
> job will in any case reapply the path filter locally, to eliminate the long 
> path documents which are not required by the indexing job.
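
The id-only condition and the local re-check can be sketched with `java.util.regex`. The two patterns are taken from the description above; the class name, the method names and the sample ids used to exercise them are hypothetical, and the real filter is sent to Mongo as a query rather than evaluated in Java.

```java
import java.util.regex.Pattern;

final class LongPathFilter {

    // _id of a regular document under /foo/bar/ (format: depth:path).
    static final Pattern ID_UNDER_PATH = Pattern.compile("^[0-9]{1,3}:\\Q/foo/bar/\\E.*$");

    // _id of a long path document, which holds a hash instead of the path.
    static final Pattern ID_IS_HASH = Pattern.compile("^[0-9]{1,3}:h.*$");

    // Condition evaluated from the _id alone: matches the path OR is a hash id.
    static boolean downloaded(String id) {
        return ID_UNDER_PATH.matcher(id).matches() || ID_IS_HASH.matcher(id).matches();
    }

    // Local re-check after download: long path documents are kept only if their
    // _path value (null for regular documents) lies under /foo/bar/.
    static boolean keep(String id, String path) {
        if (ID_IS_HASH.matcher(id).matches()) {
            return path != null && path.startsWith("/foo/bar/");
        }
        return ID_UNDER_PATH.matcher(id).matches();
    }
}
```

The first method over-selects on purpose: every hash id passes, and the second method discards the long path documents that fall outside the included path.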





[jira] [Created] (OAK-10671) [Indexing job] - Improve Mongo regex query: remove condition on non-indexed _path field to speedup traversal

2024-02-26 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10671:
-

 Summary: [Indexing job] - Improve Mongo regex query: remove 
condition on non-indexed _path field to speedup traversal
 Key: OAK-10671
 URL: https://issues.apache.org/jira/browse/OAK-10671
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


Regex path filtering currently is implemented with a condition like:
{noformat}
_id in [^[0-9]{1,3}:\Q/foo/bar/\E.*$] OR (_id in [^[0-9]{1,3}:h.*$] AND 
_path in [^\Q/foo/bar/\E.*$])
{noformat}
The second condition is necessary to deal with long path documents, whose 
{{_id}} is a hash instead of the path of the document, and that have an 
additional {{_path}} property with the full path of the document. The {{_id}} 
field is part of the index used by the query, but {{_path}} is not indexed. So 
the performance of this query will be very sensitive to how many times the query 
condition can be resolved without having to look up the value of {{_path}}, 
which requires retrieving the full document from the column store. If the 
condition can be evaluated using only the {{_id}} value, then a document that 
does not match should not be retrieved from the column store.

Unfortunately, Mongo does not seem to properly optimize this query and is 
retrieving the document from the column storage even when {{_id}} does not 
match the path /foo/bar and the _id is not in the hash format. This leads to 
very poor performance as both the index and the column store have to be fully 
read by this query.

We can instead use the following condition:
{noformat}
_id in [^[0-9]{1,3}:\Q/foo/bar/\E.*$ , ^[0-9]{1,3}:h.*$]
{noformat}
That is, download the document if the _id matches the path or if it is a hash. 
This has the disadvantage that it will download all long path documents from 
the repository, many of which might not be needed. However, this query 
condition only uses the _id field, so it is guaranteed to be evaluated fully 
using only the data in the index. The number of long path documents is 
usually very small (some environments don't have any), so downloading them 
should not take much time. The indexing job will in any case reapply the 
path filter locally, to eliminate the long path documents which 
are not required by the indexing job.
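
The id-only condition above can be sketched with plain java.util.regex. This is a minimal illustration, not the code used by the indexing job: the class and method names are hypothetical, and the Mongo query itself is not shown. It only demonstrates that the download decision can be made from the {{_id}} value alone.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: decides from the _id alone whether a document is downloaded.
final class IdOnlyPathFilterSketch {

    // Ids of long path documents: depth, colon, then 'h' followed by the hash.
    static final Pattern LONG_PATH_ID = Pattern.compile("^[0-9]{1,3}:h.*$");

    // Pattern for ids of descendants of includedPath, e.g. "3:/foo/bar/baz".
    // Pattern.quote wraps the literal path in \Q...\E, as in the condition above.
    static Pattern descendantsOf(String includedPath) {
        return Pattern.compile("^[0-9]{1,3}:" + Pattern.quote(includedPath + "/") + ".*$");
    }

    // True if the document would be downloaded using only the _id value.
    static boolean shouldDownload(String id, List<String> includedPaths) {
        if (LONG_PATH_ID.matcher(id).matches()) {
            return true; // long path document: always downloaded, filtered locally later
        }
        for (String path : includedPaths) {
            if (descendantsOf(path).matcher(id).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> included = List.of("/foo/bar");
        System.out.println(shouldDownload("3:/foo/bar/baz", included));
        System.out.println(shouldDownload("2:/foo/other", included));
        System.out.println(shouldDownload("15:h0123abcd", included));
    }
}
```

Long path documents that pass this check but lie outside the included paths still have to be discarded locally, as described above.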





[jira] [Resolved] (OAK-10637) Indexing job/regex path filtering - when / is the only included path, do not add an explicit filter

2024-02-06 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10637.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Indexing job/regex path filtering - when / is the only included path, do not 
> add an explicit filter
> ---
>
> Key: OAK-10637
> URL: https://issues.apache.org/jira/browse/OAK-10637
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.62.0
>
>






[jira] [Resolved] (OAK-10620) Print summary at the end of the indexing job

2024-02-02 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10620.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Print summary at the end of the indexing job
> 
>
> Key: OAK-10620
> URL: https://issues.apache.org/jira/browse/OAK-10620
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>
> This summary is intended as an easy way to copy and paste all the relevant 
> information from a run for keeping a record. With this summary, there should 
> be no need to grep/search the logs just to get an overview of the job.
> The summary should include:
>  - Coordinates of the environment
>  - Name of indexes that were indexed
>  - Time of the different phases (download, sort, index)
>  - Complete configuration
>  - Version of the indexing job (aem-ethos-tools and Oak)
>  - All the metrics collected during the run of the job





[jira] [Created] (OAK-10637) Indexing job/regex path filtering - when / is the only included path, do not add an explicit filter

2024-02-02 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10637:
-

 Summary: Indexing job/regex path filtering - when / is the only 
included path, do not add an explicit filter
 Key: OAK-10637
 URL: https://issues.apache.org/jira/browse/OAK-10637
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10608) [Indexing job] Improve regex expression used to download from Mongo to make better used of Mongo indexes

2024-01-29 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10608.
---
Fix Version/s: 1.62.0
   Resolution: Done

> [Indexing job] Improve regex expression used to download from Mongo to make 
> better used of Mongo indexes
> 
>
> Key: OAK-10608
> URL: https://issues.apache.org/jira/browse/OAK-10608
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>
> The current regex expression used to filter the included/excluded paths in 
> Mongo has conditions on both the {{_id}} and {{_path}} fields. In most 
> cases, the {{_id}} field contains the path of the node, but when the path is 
> too long, the {{_id}} is replaced by a hash of the path and the full path 
> is added to the document as an additional {{_path}} field. For these cases, 
> the regex expression must also check the {{_path}} field. 
> When running an ordered traversal, we use a Mongo index on {{(_modified, 
> _id)}}. So checks on {{_id}} can be done with just the data retrieved from 
> the index. But for the check on {{_path}}, Mongo needs to read the full 
> document from the column store, which significantly slows down the traversal.
> Currently, if {{_id}} does not match, the regex expression will always check 
> {{_path}}, forcing a retrieval of the document. But we only need to check 
> {{_path}} if the {{_id}} has the form of a long path id, that is, the 
> pattern {{4:h...}}. Otherwise, if the {{_id}} is not a long path id and 
> does not match the regex, we can be sure that the document is not needed. 
> The check that {{_id}} is a hash can be done without retrieving the full 
> document from the column store, so it will be fast. And in the common case, 
> the document is not a long path, so this simple check will avoid retrieving 
> the document from the column store.
> This optimization will have a big impact when the regex expression matches a 
> small fraction of the repository. In the current implementation, Mongo has to 
> traverse both the index and the column store for all possible regex filters. 
> But with the additional check for long paths, Mongo still has to traverse the 
> full index but will only retrieve from the column store the documents that 
> match the filter or the long path documents. And since the index is much 
> smaller than the column store and can more easily be cached, this will 
> significantly improve performance.





[jira] [Created] (OAK-10620) Print summary at the end of the indexing job

2024-01-22 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10620:
-

 Summary: Print summary at the end of the indexing job
 Key: OAK-10620
 URL: https://issues.apache.org/jira/browse/OAK-10620
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


This summary is intended as an easy way to copy and paste all the relevant 
information from a run for keeping a record. With this summary, there should 
be no need to grep/search the logs just to get an overview of the job.

The summary should include:
 - Coordinates of the environment
 - Name of indexes that were indexed
 - Time of the different phases (download, sort, index)
 - Complete configuration
 - Version of the indexing job (aem-ethos-tools and Oak)
 - All the metrics collected during the run of the job





[jira] [Updated] (OAK-10592) [Indexing job] Add a regex filter to exclude matching entries from being downloaded from Mongo

2024-01-18 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10592:
--
Summary: [Indexing job] Add a regex filter to exclude matching entries from 
being downloaded from Mongo  (was: Ignore FVs nodes when downloading from Mongo)

> [Indexing job] Add a regex filter to exclude matching entries from being 
> downloaded from Mongo
> --
>
> Key: OAK-10592
> URL: https://issues.apache.org/jira/browse/OAK-10592
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Resolved] (OAK-10592) [Indexing job] Add a regex filter to exclude matching entries from being downloaded from Mongo

2024-01-18 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10592.
---
Resolution: Done

> [Indexing job] Add a regex filter to exclude matching entries from being 
> downloaded from Mongo
> --
>
> Key: OAK-10592
> URL: https://issues.apache.org/jira/browse/OAK-10592
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Created] (OAK-10608) [Indexing job] Improve regex expression used to download from Mongo to make better used of Mongo indexes

2024-01-16 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10608:
-

 Summary: [Indexing job] Improve regex expression used to download 
from Mongo to make better used of Mongo indexes
 Key: OAK-10608
 URL: https://issues.apache.org/jira/browse/OAK-10608
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


The current regex expression used to filter the included/excluded paths in 
Mongo has conditions on both the {{_id}} and {{_path}} fields. In most cases, 
the {{_id}} field contains the path of the node, but when the path is too 
long, the {{_id}} is replaced by a hash of the path and the full path is 
added to the document as an additional {{_path}} field. For these cases, the 
regex expression must also check the {{_path}} field. 

When running an ordered traversal, we use a Mongo index on {{(_modified, 
_id)}}. So checks on {{_id}} can be done with just the data retrieved from the 
index. But for the check on {{_path}}, Mongo needs to read the full document 
from the column store, which significantly slows down the traversal.

Currently, if {{_id}} does not match, the regex expression will always check 
{{_path}}, forcing a retrieval of the document. But we only need to check 
{{_path}} if the {{_id}} has the form of a long path id, that is, the 
pattern {{4:h...}}. Otherwise, if the {{_id}} is not a long path id and 
does not match the regex, we can be sure that the document is not needed. The 
check that {{_id}} is a hash can be done without retrieving the full document 
from the column store, so it will be fast. And in the common case, the document 
is not a long path, so this simple check will avoid retrieving the document 
from the column store.

This optimization will have a big impact when the regex expression matches a 
small fraction of the repository. In the current implementation, Mongo has to 
traverse both the index and the column store for all possible regex filters. 
But with the additional check for long paths, Mongo still has to traverse the 
full index but will only retrieve from the column store the documents that 
match the filter or the long path documents. And since the index is much 
smaller than the column store and can more easily be cached, this will 
significantly improve performance.
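
The evaluation order described above can be sketched in Java as follows. This is only an illustration of the logic, not the Mongo query or the Oak implementation: the class name, the method and the {{pathLoader}} supplier (which stands in for retrieving the full document) are assumptions made for the example.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical sketch: consult _path only when the _id has the long path form.
final class LongPathCheckSketch {

    // Ids of long path documents: depth, colon, then 'h' followed by the hash.
    static final Pattern LONG_PATH_ID = Pattern.compile("^[0-9]{1,3}:h.*$");

    // idFilter is applied to _id and pathFilter to _path. pathLoader stands in
    // for reading the full document and is invoked only for long path ids.
    static boolean matches(String id, Supplier<String> pathLoader,
                           Pattern idFilter, Pattern pathFilter) {
        if (idFilter.matcher(id).matches()) {
            return true; // decided from the index data alone
        }
        if (!LONG_PATH_ID.matcher(id).matches()) {
            return false; // regular id that does not match: no document retrieval
        }
        String path = pathLoader.get(); // the expensive step
        return path != null && pathFilter.matcher(path).matches();
    }

    public static void main(String[] args) {
        Pattern idFilter = Pattern.compile("^[0-9]{1,3}:\\Q/foo/bar/\\E.*$");
        Pattern pathFilter = Pattern.compile("^\\Q/foo/bar/\\E.*$");
        System.out.println(matches("2:/a/b", () -> null, idFilter, pathFilter));
        System.out.println(matches("12:h0123abcd", () -> "/foo/bar/long", idFilter, pathFilter));
    }
}
```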





[jira] [Resolved] (OAK-10589) Improve regex path filtering to also handle cases where excludedPaths are defined

2024-01-08 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10589.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Improve regex path filtering to also handle cases where excludedPaths are 
> defined
> -
>
> Key: OAK-10589
> URL: https://issues.apache.org/jira/browse/OAK-10589
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>
> Currently, we apply regex path filtering in the following case:
>  * *includedPaths is non-empty and excludedPaths is empty* - use a filter on 
> the Mongo query with every includedPath. 
> But we can apply path filtering on Mongo in more situations:
>  * *{{includedPaths}} empty, {{excludedPaths}} non-empty* - This is the 
> reverse of the situation we currently support, so we can define a Mongo 
> filter with the list of {{excludedPaths}} and negate it.
>  * *both includedPaths and excludedPaths are non-empty* - In this case we 
> can simply ignore the excluded paths and download all included paths. If an 
> excluded path is outside an included path, it will not be downloaded because 
> it will not match the included path filters. If an excluded path is a 
> descendant of an included path, it will be downloaded from Mongo but filtered 
> out in the transform stage before being written to the FlatFileStore.
>  * *includedPaths and excludedPaths are both empty* - In this case we fall 
> back to downloading everything.





[jira] [Created] (OAK-10592) Ignore FVs nodes when downloading from Mongo

2024-01-03 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10592:
-

 Summary: Ignore FVs nodes when downloading from Mongo
 Key: OAK-10592
 URL: https://issues.apache.org/jira/browse/OAK-10592
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10590) Indexing job downloads and creates FFS with full node store if includedPaths is specified as a string instead of array of strings

2024-01-02 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10590.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Indexing job downloads and creates FFS with full node store if includedPaths 
> is specified as a string instead of array of strings
> -
>
> Key: OAK-10590
> URL: https://issues.apache.org/jira/browse/OAK-10590
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>
> The {{includedPaths}} property of an index definition should be an array of 
> strings.
> If it is instead specified as a String, like in this example:
> {noformat}
> "includedPaths": "/a/b",
> {noformat}
> The indexing job defaults to using {{/}} as the value for includedPaths, 
> and therefore downloads the full node store and creates an FFS containing 
> everything except the hidden paths. The logic that handles this case is here:
> [https://github.com/apache/jackrabbit-oak/blob/0b8f4ab2e736c6561ae745a5fe6040a59911eeb3/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/spi/filter/PathFilter.java#L95-L103]
> This will significantly slow down indexing, as it negates any 
> benefit of using regex filtering. And even if regex filtering is not 
> enabled or cannot be used, using / as includedPaths will also result in the 
> FFS containing more nodes than it should, which will once again slow down the 
> indexing job.
> Suggested fix: if includedPaths is a String, treat it as a one-element array 
> and at the same time log a warning.
> Additionally, apply the same fix to other properties in the index definition:
>  * {{excludedPaths}}
>  * {{includedPaths}}
>  * {{queryPaths}}





[jira] [Created] (OAK-10590) Indexing job downloads and creates FFS with full node store if includedPaths is specified as a string instead of array of strings

2023-12-19 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10590:
-

 Summary: Indexing job downloads and creates FFS with full node 
store if includedPaths is specified as a string instead of array of strings
 Key: OAK-10590
 URL: https://issues.apache.org/jira/browse/OAK-10590
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos


The {{includedPaths}} property of an index definition should be an array of 
strings.

If it is instead specified as a String, like in this example:
{noformat}
"includedPaths": "/a/b",
{noformat}
The indexing job defaults to using {{/}} as the value for includedPaths, 
and therefore downloads the full node store and creates an FFS containing 
everything except the hidden paths. The logic that handles this case is here:

[https://github.com/apache/jackrabbit-oak/blob/0b8f4ab2e736c6561ae745a5fe6040a59911eeb3/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/spi/filter/PathFilter.java#L95-L103]

This will significantly slow down indexing, as it negates any benefit 
of using regex filtering. And even if regex filtering is not enabled or 
cannot be used, using / as includedPaths will also result in the FFS containing 
more nodes than it should, which will once again slow down the indexing job.

Suggested fix: if includedPaths is a String, treat it as a one-element array 
and at the same time log a warning.

Additionally, apply the same fix to other properties in the index definition:
 * {{excludedPaths}}
 * {{includedPaths}}
 * {{queryPaths}}
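
The suggested fix could look roughly like the sketch below. It is not the actual PathFilter code: the class and method names are hypothetical, the property value is modelled as a plain Object rather than an Oak PropertyState, and falling back to {{/}} for any other value is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: tolerate a path property given as a single String.
final class PathPropertySketch {

    private static final Logger LOG = Logger.getLogger(PathPropertySketch.class.getName());

    // Returns the paths configured for propertyName (e.g. "includedPaths").
    static List<String> toPathList(String propertyName, Object value) {
        if (value instanceof String) {
            // Misconfiguration: treat the String as a one-element array and warn.
            LOG.warning(propertyName + " should be an array of strings but is a String: " + value);
            return List.of((String) value);
        }
        if (value instanceof String[]) {
            return Arrays.asList((String[]) value);
        }
        // Assumption for this sketch: anything else falls back to the root path.
        return List.of("/");
    }

    public static void main(String[] args) {
        System.out.println(toPathList("includedPaths", "/a/b"));
        System.out.println(toPathList("includedPaths", new String[] {"/a", "/b"}));
    }
}
```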





[jira] [Resolved] (OAK-10571) Names of metrics exported by indexing logic are inconsistent

2023-12-18 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10571.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Names of metrics exported by indexing logic are inconsistent
> 
>
> Key: OAK-10571
> URL: https://issues.apache.org/jira/browse/OAK-10571
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.62.0
>
>
> - Some metrics are fully in snake case, while others are a mix of camel and 
> snake case.
> - The metrics that represent times should all end in {{duration_seconds}}
> - The duration of the Mongo dump phase is missing
> {noformat}
> oak_indexer_full_index_creation_duration_seconds 
> oak_indexer_indexing_duration_seconds 
> oak_indexer_merge_node_store_duration_seconds 
> oak_indexer_pipelined_documentsAccepted 
> oak_indexer_pipelined_documentsDownloaded 
> oak_indexer_pipelined_documentsRejected 
> oak_indexer_pipelined_documentsRejectedEmptyNodeState 
> oak_indexer_pipelined_documentsRejectedSplit 
> oak_indexer_pipelined_documentsTraversed 
> oak_indexer_pipelined_entriesAccepted 
> oak_indexer_pipelined_entriesRejected 
> oak_indexer_pipelined_entriesRejectedHiddenPaths 
> oak_indexer_pipelined_entriesRejectedPathFiltered 
> oak_indexer_pipelined_entriesTraversed 
> oak_indexer_pipelined_extractedEntriesTotalSize 
> oak_indexer_pipelined_mergeSortEagerMergesRuns 
> oak_indexer_pipelined_mergeSortFinalMergeFilesCount 
> oak_indexer_pipelined_mergeSortFinalMergeTime 
> oak_indexer_pipelined_mergeSortIntermediateFilesCount 
> oak_indexer_pipelined_mongoDownloadEnqueueDelayPercentage 
> oak_indexer_import_bring_index_uptodate_duration_seconds 
> oak_indexer_import_import_index_data_duration_seconds 
> oak_indexer_import_release_checkpoint_duration_seconds 
> oak_indexer_import_switch_lane_duration_seconds 
> {noformat}





[jira] [Created] (OAK-10589) Improve regex path filtering to also handle cases where excludedPaths are defined

2023-12-18 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10589:
-

 Summary: Improve regex path filtering to also handle cases where 
excludedPaths are defined
 Key: OAK-10589
 URL: https://issues.apache.org/jira/browse/OAK-10589
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


Currently, we apply regex path filtering in the following case:
 * *includedPaths is non-empty and excludedPaths is empty* - use a filter on 
the Mongo query with every includedPath. 

But we can apply path filtering on Mongo in more situations:
 * *{{includedPaths}} empty, {{excludedPaths}} non-empty* - This is the reverse 
of the situation we currently support, so we can define a Mongo filter with 
the list of {{excludedPaths}} and negate it.
 * *both includedPaths and excludedPaths are non-empty* - In this case we can 
simply ignore the excluded paths and download all included paths. If an 
excluded path is outside an included path, it will not be downloaded because it 
will not match the included path filters. If an excluded path is a descendant 
of an included path, it will be downloaded from Mongo but filtered out in the 
transform stage before being written to the FlatFileStore.
 * *includedPaths and excludedPaths are both empty* - In this case we fall back 
to downloading everything.
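
The four cases above reduce to a small decision function. The following is a sketch of that decision only, with hypothetical names, and does not build the actual Mongo filter.

```java
import java.util.List;

// Hypothetical sketch: which kind of Mongo-side filter applies for a given
// combination of includedPaths and excludedPaths.
final class PathFilterStrategySketch {

    enum MongoFilter { INCLUDE_LIST, NEGATED_EXCLUDE_LIST, DOWNLOAD_ALL }

    static MongoFilter choose(List<String> includedPaths, List<String> excludedPaths) {
        if (!includedPaths.isEmpty()) {
            // Covers "included only" and "included and excluded": excluded paths
            // are ignored here and filtered out later in the transform stage.
            return MongoFilter.INCLUDE_LIST;
        }
        if (!excludedPaths.isEmpty()) {
            // Only excluded paths: filter on the excluded list and negate it.
            return MongoFilter.NEGATED_EXCLUDE_LIST;
        }
        // Both empty: nothing to filter on.
        return MongoFilter.DOWNLOAD_ALL;
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("/content"), List.of("/content/tmp")));
        System.out.println(choose(List.of(), List.of("/var")));
        System.out.println(choose(List.of(), List.of()));
    }
}
```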





[jira] [Resolved] (OAK-10580) Indexing job: improve regex path filtering, support multiple includedPaths

2023-12-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10580.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Indexing job: improve regex path filtering, support multiple includedPaths
> --
>
> Key: OAK-10580
> URL: https://issues.apache.org/jira/browse/OAK-10580
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.62.0
>
>






[jira] [Updated] (OAK-10438) Remove MULTTHREADED_TRAVERSE_WITH_SORT download strategy

2023-12-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10438:
--
Summary: Remove MULTTHREADED_TRAVERSE_WITH_SORT download strategy  (was: 
Remove deprecated download strategies)

> Remove MULTTHREADED_TRAVERSE_WITH_SORT download strategy
> 
>
> Key: OAK-10438
> URL: https://issues.apache.org/jira/browse/OAK-10438
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: Indexing
> Fix For: 1.62.0
>
>






[jira] [Resolved] (OAK-10438) Remove deprecated download strategies

2023-12-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10438.
---
Fix Version/s: 1.62.0
   Resolution: Done

> Remove deprecated download strategies
> -
>
> Key: OAK-10438
> URL: https://issues.apache.org/jira/browse/OAK-10438
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: Indexing
> Fix For: 1.62.0
>
>






[jira] [Created] (OAK-10580) Indexing job: improve regex path filtering, support multiple includedPaths

2023-12-13 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10580:
-

 Summary: Indexing job: improve regex path filtering, support 
multiple includedPaths
 Key: OAK-10580
 URL: https://issues.apache.org/jira/browse/OAK-10580
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10571) Names of metrics exported by indexing logic are inconsistent

2023-11-29 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10571:
-

 Summary: Names of metrics exported by indexing logic are 
inconsistent
 Key: OAK-10571
 URL: https://issues.apache.org/jira/browse/OAK-10571
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


- Some metrics are fully in snake case, while others are a mix of camel and 
snake case.
- The metrics that represent times should all end in {{duration_seconds}}
- The duration of the Mongo dump phase is missing

{noformat}
oak_indexer_full_index_creation_duration_seconds 
oak_indexer_indexing_duration_seconds 
oak_indexer_merge_node_store_duration_seconds 
oak_indexer_pipelined_documentsAccepted 
oak_indexer_pipelined_documentsDownloaded 
oak_indexer_pipelined_documentsRejected 
oak_indexer_pipelined_documentsRejectedEmptyNodeState 
oak_indexer_pipelined_documentsRejectedSplit 
oak_indexer_pipelined_documentsTraversed 
oak_indexer_pipelined_entriesAccepted 
oak_indexer_pipelined_entriesRejected 
oak_indexer_pipelined_entriesRejectedHiddenPaths 
oak_indexer_pipelined_entriesRejectedPathFiltered 
oak_indexer_pipelined_entriesTraversed 
oak_indexer_pipelined_extractedEntriesTotalSize 
oak_indexer_pipelined_mergeSortEagerMergesRuns 
oak_indexer_pipelined_mergeSortFinalMergeFilesCount 
oak_indexer_pipelined_mergeSortFinalMergeTime 
oak_indexer_pipelined_mergeSortIntermediateFilesCount 
oak_indexer_pipelined_mongoDownloadEnqueueDelayPercentage 
oak_indexer_import_bring_index_uptodate_duration_seconds 
oak_indexer_import_import_index_data_duration_seconds 
oak_indexer_import_release_checkpoint_duration_seconds 
oak_indexer_import_switch_lane_duration_seconds 
{noformat}
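
To illustrate the naming inconsistency, the mixed-case names in the list above can be normalised to snake case with a helper like the one below. This is a hypothetical sketch: the issue does not state which final names were chosen, and the helper does not add the {{duration_seconds}} suffix.

```java
// Hypothetical sketch: convert camel-case segments of a metric name to snake case.
final class MetricNameSketch {

    static String toSnakeCase(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word unless one has just been started.
                if (i > 0 && name.charAt(i - 1) != '_') {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("oak_indexer_pipelined_documentsRejectedEmptyNodeState"));
        System.out.println(toSnakeCase("oak_indexer_indexing_duration_seconds"));
    }
}
```

Names that are already in snake case pass through unchanged.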





[jira] [Commented] (OAK-10541) Pipelined strategy: improve memory management of transform stage

2023-11-24 Thread Nuno Santos (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789364#comment-17789364
 ] 

Nuno Santos commented on OAK-10541:
---

Now that the version increase was rolled back, can we resolve this issue?

> Pipelined strategy: improve memory management of transform stage
> 
>
> Key: OAK-10541
> URL: https://issues.apache.org/jira/browse/OAK-10541
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Julian Reschke
>Priority: Major
> Fix For: 1.60.0
>
>






[jira] [Resolved] (OAK-10541) Pipelined strategy: improve memory management of transform stage

2023-11-17 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10541.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Pipelined strategy: improve memory management of transform stage
> 
>
> Key: OAK-10541
> URL: https://issues.apache.org/jira/browse/OAK-10541
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.60.0
>
>






[jira] [Resolved] (OAK-10519) Export metrics from indexing job

2023-11-16 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10519.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Export metrics from indexing job
> 
>
> Key: OAK-10519
> URL: https://issues.apache.org/jira/browse/OAK-10519
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.60.0
>
>






[jira] [Created] (OAK-10554) LastRevRecoveryRandomizedIT test seems to be flaky

2023-11-16 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10554:
-

 Summary: LastRevRecoveryRandomizedIT test seems to be flaky
 Key: OAK-10554
 URL: https://issues.apache.org/jira/browse/OAK-10554
 Project: Jackrabbit Oak
  Issue Type: Bug
Reporter: Nuno Santos


The failure below was observed in a CI run that was preceded and followed 
by other runs where the test did not fail, apparently without any change to the 
code that could have affected this test. 
{noformat}
14:53:23  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.247 s <<< FAILURE! - in org.apache.jackrabbit.oak.plugins.document.LastRevRecoveryRandomizedIT
14:53:23  [ERROR] randomized(org.apache.jackrabbit.oak.plugins.document.LastRevRecoveryRandomizedIT)  Time elapsed: 1.247 s  <<< ERROR!
14:53:23  org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: Configured cluster node id 1 already in use: needs recovery and was unable to perform it myself
14:53:23      at org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.createInstance(ClusterNodeInfo.java:629)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.getInstance(ClusterNodeInfo.java:471)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.<init>(DocumentNodeStore.java:607)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBuilder.build(DocumentNodeStoreBuilder.java:176)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.DocumentMK$Builder.getNodeStore(DocumentMK.java:481)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.LastRevRecoveryRandomizedIT.checkStore(LastRevRecoveryRandomizedIT.java:262)
14:53:23      at org.apache.jackrabbit.oak.plugins.document.LastRevRecoveryRandomizedIT.randomized(LastRevRecoveryRandomizedIT.java:133)
14:53:23      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{noformat}





[jira] [Resolved] (OAK-10538) Pipeline strategy: eliminate unnecessary intermediate copy of entries in transform stage

2023-11-14 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10538.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Pipeline strategy: eliminate unnecessary intermediate copy of entries in 
> transform stage
> 
>
> Key: OAK-10538
> URL: https://issues.apache.org/jira/browse/OAK-10538
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.60.0
>
>






[jira] [Resolved] (OAK-10547) Indexing job fails at the end of reindexing if it took more than 24h to run

2023-11-13 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10547.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Indexing job fails at the end of reindexing if it took more than 24h to run
> ---
>
> Key: OAK-10547
> URL: https://issues.apache.org/jira/browse/OAK-10547
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.60.0
>
>
> {noformat}
> 10:21:04.989 [main] ERROR com.adobe.granite.indexing.tool.Main - Can't perform operation
> java.time.DateTimeException: Invalid value for SecondOfDay (valid values 0 - 86399): 98646
>     at java.base/java.time.temporal.ValueRange.checkValidValue(ValueRange.java:311)
>     at java.base/java.time.temporal.ChronoField.checkValidValue(ChronoField.java:717)
>     at java.base/java.time.LocalTime.ofSecondOfDay(LocalTime.java:380)
>     at org.apache.jackrabbit.oak.plugins.index.FormattingUtils.formatToSeconds(FormattingUtils.java:27)
>     at org.apache.jackrabbit.oak.index.indexer.document.DocumentStoreIndexerBase.reindex(DocumentStoreIndexerBase.java:303)
>     at com.adobe.granite.indexing.tool.ReindexCmd.index(ReindexCmd.java:195)
>     at com.adobe.granite.indexing.tool.ReindexCmd.run(ReindexCmd.java:134)
>     at com.adobe.granite.indexing.tool.Main.main(Main.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-10547) Indexing job fails at the end of reindexing if it took more than 24h to run

2023-11-11 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10547:
--
Description: 
{noformat}
10:21:04.989 [main] ERROR com.adobe.granite.indexing.tool.Main - Can't perform operation
java.time.DateTimeException: Invalid value for SecondOfDay (valid values 0 - 86399): 98646
    at java.base/java.time.temporal.ValueRange.checkValidValue(ValueRange.java:311)
    at java.base/java.time.temporal.ChronoField.checkValidValue(ChronoField.java:717)
    at java.base/java.time.LocalTime.ofSecondOfDay(LocalTime.java:380)
    at org.apache.jackrabbit.oak.plugins.index.FormattingUtils.formatToSeconds(FormattingUtils.java:27)
    at org.apache.jackrabbit.oak.index.indexer.document.DocumentStoreIndexerBase.reindex(DocumentStoreIndexerBase.java:303)
    at com.adobe.granite.indexing.tool.ReindexCmd.index(ReindexCmd.java:195)
    at com.adobe.granite.indexing.tool.ReindexCmd.run(ReindexCmd.java:134)
    at com.adobe.granite.indexing.tool.Main.main(Main.java:112)
{noformat}

> Indexing job fails at the end of reindexing if it took more than 24h to run
> ---
>
> Key: OAK-10547
> URL: https://issues.apache.org/jira/browse/OAK-10547
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>
> {noformat}
> 10:21:04.989 [main] ERROR com.adobe.granite.indexing.tool.Main - Can't 
> perform operation
> java.time.DateTimeException: Invalid value for SecondOfDay (valid values 0 - 
> 86399): 98646
>     at 
> java.base/java.time.temporal.ValueRange.checkValidValue(ValueRange.java:311)
>     at 
> java.base/java.time.temporal.ChronoField.checkValidValue(ChronoField.java:717)
>     at java.base/java.time.LocalTime.ofSecondOfDay(LocalTime.java:380)
>     at 
> org.apache.jackrabbit.oak.plugins.index.FormattingUtils.formatToSeconds(FormattingUtils.java:27)
>     at 
> org.apache.jackrabbit.oak.index.indexer.document.DocumentStoreIndexerBase.reindex(DocumentStoreIndexerBase.java:303)
>     at com.adobe.granite.indexing.tool.ReindexCmd.index(ReindexCmd.java:195)
>     at com.adobe.granite.indexing.tool.ReindexCmd.run(ReindexCmd.java:134)
>     at com.adobe.granite.indexing.tool.Main.main(Main.java:112) 
> {noformat}





[jira] [Created] (OAK-10547) Indexing job fails at the end of reindexing if it took more than 24h to run

2023-11-11 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10547:
-

 Summary: Indexing job fails at the end of reindexing if it took 
more than 24h to run
 Key: OAK-10547
 URL: https://issues.apache.org/jira/browse/OAK-10547
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10541) Pipelined strategy: improve memory management of transform stage

2023-11-08 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10541:
-

 Summary: Pipelined strategy: improve memory management of 
transform stage
 Key: OAK-10541
 URL: https://issues.apache.org/jira/browse/OAK-10541
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10538) Pipeline strategy: eliminate unnecessary intermediate copy of entries in transform stage

2023-11-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10538:
-

 Summary: Pipeline strategy: eliminate unnecessary intermediate 
copy of entries in transform stage
 Key: OAK-10538
 URL: https://issues.apache.org/jira/browse/OAK-10538
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10460) PIPELINED strategy fails with OOME during final merge phase for very large repositories

2023-10-26 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10460.
---
Fix Version/s: 1.60.0
   Resolution: Done

> PIPELINED strategy fails with OOME during final merge phase for very large 
> repositories
> ---
>
> Key: OAK-10460
> URL: https://issues.apache.org/jira/browse/OAK-10460
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.60.0
>
>






[jira] [Created] (OAK-10519) Export metrics from indexing job

2023-10-25 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10519:
-

 Summary: Export metrics from indexing job
 Key: OAK-10519
 URL: https://issues.apache.org/jira/browse/OAK-10519
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10504) Add indexing job total duration log message

2023-10-24 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10504.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Add indexing job total duration log message
> ---
>
> Key: OAK-10504
> URL: https://issues.apache.org/jira/browse/OAK-10504
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.60.0
>
>






[jira] [Resolved] (OAK-10437) Deprecate all download strategies except PIPELINED

2023-10-20 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10437.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Deprecate all download strategies except PIPELINED
> --
>
> Key: OAK-10437
> URL: https://issues.apache.org/jira/browse/OAK-10437
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: Indexing
> Fix For: 1.60.0
>
>
> Deprecate these strategies: STORE_AND_SORT, TRAVERSE_WITH_SORT, 
> MULTITHREADED_TRAVERSE_WITH_SORT.
>  
> When they are used, print a log message saying clearly that they are 
> deprecated for removal and suggest using PIPELINED.





[jira] [Resolved] (OAK-10505) Make PIPELINED the default download strategy in the indexing job

2023-10-20 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10505.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Make PIPELINED the default download strategy in the indexing job
> 
>
> Key: OAK-10505
> URL: https://issues.apache.org/jira/browse/OAK-10505
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.60.0
>
>






[jira] [Created] (OAK-10505) Make PIPELINED the default download strategy in the indexing job

2023-10-18 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10505:
-

 Summary: Make PIPELINED the default download strategy in the 
indexing job
 Key: OAK-10505
 URL: https://issues.apache.org/jira/browse/OAK-10505
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Updated] (OAK-10504) Add indexing job total duration log message

2023-10-18 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10504:
--
Priority: Minor  (was: Major)

> Add indexing job total duration log message
> ---
>
> Key: OAK-10504
> URL: https://issues.apache.org/jira/browse/OAK-10504
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>






[jira] [Created] (OAK-10504) Add indexing job total duration log message

2023-10-18 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10504:
-

 Summary: Add indexing job total duration log message
 Key: OAK-10504
 URL: https://issues.apache.org/jira/browse/OAK-10504
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10491) Indexing: pass a MongoDatabase instance instead of MongoConnection to indexing logic

2023-10-17 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10491.
---
Fix Version/s: 1.60.0
   Resolution: Done

> Indexing: pass a MongoDatabase instance instead of MongoConnection to 
> indexing logic
> 
>
> Key: OAK-10491
> URL: https://issues.apache.org/jira/browse/OAK-10491
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.60.0
>
>
> The pipelined indexing strategy needs access to a MongoDatabase object to 
> register a custom codec that deserializes the responses from Mongo. 
> Previously we passed a MongoConnection object, which contains a reference 
> to the MongoDatabase. However, the indexing job needs nothing from 
> MongoConnection other than the MongoDatabase, and requiring a 
> MongoConnection makes it harder for users of Oak to call this API.
> We can simplify the logic by requiring only a MongoDatabase.





[jira] [Created] (OAK-10491) Indexing: pass a MongoDatabase instance instead of MongoConnection to indexing logic

2023-10-13 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10491:
-

 Summary: Indexing: pass a MongoDatabase instance instead of 
MongoConnection to indexing logic
 Key: OAK-10491
 URL: https://issues.apache.org/jira/browse/OAK-10491
 Project: Jackrabbit Oak
  Issue Type: Improvement
Reporter: Nuno Santos


The pipelined indexing strategy needs access to a MongoDatabase object to 
register a custom codec that deserializes the responses from Mongo. Previously 
we passed a MongoConnection object, which contains a reference to the 
MongoDatabase. However, the indexing job needs nothing from MongoConnection 
other than the MongoDatabase, and requiring a MongoConnection makes it harder 
for users of Oak to call this API.

We can simplify the logic by requiring only a MongoDatabase.





[jira] [Resolved] (OAK-10475) Expose the mongo connection in MongoDocumentNodeStoreBuilderBase

2023-10-10 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10475.
---
Fix Version/s: 1.58.0
   Resolution: Done

> Expose the mongo connection in MongoDocumentNodeStoreBuilderBase
> 
>
> Key: OAK-10475
> URL: https://issues.apache.org/jira/browse/OAK-10475
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.58.0
>
>
> This is a follow up of OAK-10453, which changed the indexing job to use 
> custom codecs when downloading from Mongo. For this, the indexing logic needs 
> access to the Mongo Connection to register the codec. If a library client 
> uses MongoDocumentNodeStoreBuilderBase to build a MongoDocumentStore 
> instance, then it may also need access to the Mongo connection to pass it to 
> the indexing logic.





[jira] [Updated] (OAK-10475) Expose the mongo connection in MongoDocumentNodeStoreBuilderBase

2023-10-09 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10475:
--
Description: This is a follow up of OAK-10453, which changed the indexing 
job to use custom codecs when downloading from Mongo. For this, the indexing 
logic needs access to the Mongo Connection to register the codec. If a client 
uses the MongoDocumentStore  (was: This is a follow up of )

> Expose the mongo connection in MongoDocumentNodeStoreBuilderBase
> 
>
> Key: OAK-10475
> URL: https://issues.apache.org/jira/browse/OAK-10475
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> This is a follow up of OAK-10453, which changed the indexing job to use 
> custom codecs when downloading from Mongo. For this, the indexing logic needs 
> access to the Mongo Connection to register the codec. If a client uses the 
> MongoDocumentStore





[jira] [Updated] (OAK-10475) Expose the mongo connection in MongoDocumentNodeStoreBuilderBase

2023-10-09 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10475:
--
Description: This is a follow up of OAK-10453, which changed the indexing 
job to use custom codecs when downloading from Mongo. For this, the indexing 
logic needs access to the Mongo Connection to register the codec. If a library 
client uses MongoDocumentNodeStoreBuilderBase to build a MongoDocumentStore 
instance, then it may also need access to the Mongo connection to pass it to 
the indexing logic.  (was: This is a follow up of OAK-10453, which changed the 
indexing job to use custom codecs when downloading from Mongo. For this, the 
indexing logic needs access to the Mongo Connection to register the codec. If a 
client uses the MongoDocumentStore)

> Expose the mongo connection in MongoDocumentNodeStoreBuilderBase
> 
>
> Key: OAK-10475
> URL: https://issues.apache.org/jira/browse/OAK-10475
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> This is a follow up of OAK-10453, which changed the indexing job to use 
> custom codecs when downloading from Mongo. For this, the indexing logic needs 
> access to the Mongo Connection to register the codec. If a library client 
> uses MongoDocumentNodeStoreBuilderBase to build a MongoDocumentStore 
> instance, then it may also need access to the Mongo connection to pass it to 
> the indexing logic.





[jira] [Updated] (OAK-10475) Expose the mongo connection in MongoDocumentNodeStoreBuilderBase

2023-10-09 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10475:
--
Description: This is a follow up of 

> Expose the mongo connection in MongoDocumentNodeStoreBuilderBase
> 
>
> Key: OAK-10475
> URL: https://issues.apache.org/jira/browse/OAK-10475
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> This is a follow up of 





[jira] [Created] (OAK-10475) Expose the mongo connection in MongoDocumentNodeStoreBuilderBase

2023-10-09 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10475:
-

 Summary: Expose the mongo connection in 
MongoDocumentNodeStoreBuilderBase
 Key: OAK-10475
 URL: https://issues.apache.org/jira/browse/OAK-10475
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10458) Indexing job: Make LZ4 the default compression algorithm in OAK

2023-10-05 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10458.
---
Fix Version/s: 1.58.0
   Resolution: Done

> Indexing job: Make LZ4 the default compression algorithm in OAK
> ---
>
> Key: OAK-10458
> URL: https://issues.apache.org/jira/browse/OAK-10458
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.58.0
>
>






[jira] [Resolved] (OAK-10453) Pipelined strategy: enforce size limit on memory taken by objects in the queue between download and transform thread

2023-10-05 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10453.
---
Fix Version/s: 1.58.0
   Resolution: Done

> Pipelined strategy: enforce size limit on memory taken by objects in the 
> queue between download and transform thread
> 
>
> Key: OAK-10453
> URL: https://issues.apache.org/jira/browse/OAK-10453
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.58.0
>
>






[jira] [Resolved] (OAK-10458) Indexing job: Make LZ4 the default compression algorithm in OAK

2023-09-29 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10458.
---
Fix Version/s: 1.58.0
   Resolution: Fixed

> Indexing job: Make LZ4 the default compression algorithm in OAK
> ---
>
> Key: OAK-10458
> URL: https://issues.apache.org/jira/browse/OAK-10458
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: Indexing
> Fix For: 1.58.0
>
>






[jira] [Resolved] (OAK-10452) Indexing job/regex filtering: getting ancestors nodes of filtered path incorrectly does a full col scan on Mongo

2023-09-28 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10452.
---
Fix Version/s: 1.58.0
   Resolution: Fixed

> Indexing job/regex filtering: getting ancestors nodes of filtered path 
> incorrectly does a full col scan on Mongo
> 
>
> Key: OAK-10452
> URL: https://issues.apache.org/jira/browse/OAK-10452
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.58.0
>
>
> In the PIPELINED strategy of the indexing job, when regex path filtering is 
> enabled, the job issues two queries to Mongo:
>  * Download the ancestors of the base path (e.g., {{0:/}}, {{1:/p1}}, 
> {{2:/p1/p2}}).
>  * Download all the children of the base path (e.g., {{???:/p1/p2/*}}).
> The first query returns only a few results, so it should use the index on 
> {{_id}}. However, to handle the rare case where the path is long and the 
> {{_id}} field holds a hash instead of the path, the query for the ancestors 
> also searches for matches on the {{_path}} field, which is set when {{_id}} 
> is a hash. The problem is that {{_path}} is not indexed, so the first query 
> falls back to a full collection scan, which is much slower than an index 
> scan for the handful of ancestors. This negates most or even all of the 
> gains of using regex filtering.





[jira] [Created] (OAK-10460) PIPELINED strategy fails with OOME during final merge phase for very large repositories

2023-09-27 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10460:
-

 Summary: PIPELINED strategy fails with OOME during final merge 
phase for very large repositories
 Key: OAK-10460
 URL: https://issues.apache.org/jira/browse/OAK-10460
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10458) Indexing job: Make LZ4 the default compression algorithm in OAK

2023-09-27 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10458:
-

 Summary: Indexing job: Make LZ4 the default compression algorithm 
in OAK
 Key: OAK-10458
 URL: https://issues.apache.org/jira/browse/OAK-10458
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10453) Pipelined strategy: enforce size limit on memory taken by objects in the queue between download and transform thread

2023-09-21 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10453:
-

 Summary: Pipelined strategy: enforce size limit on memory taken by 
objects in the queue between download and transform thread
 Key: OAK-10453
 URL: https://issues.apache.org/jira/browse/OAK-10453
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10452) Indexing job/regex filtering: getting ancestors nodes of filtered path incorrectly does a full col scan on Mongo

2023-09-21 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10452:
-

 Summary: Indexing job/regex filtering: getting ancestors nodes of 
filtered path incorrectly does a full col scan on Mongo
 Key: OAK-10452
 URL: https://issues.apache.org/jira/browse/OAK-10452
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


In the PIPELINED strategy of the indexing job, when regex path filtering is 
enabled, the job issues two queries to Mongo:
 * Download the ancestors of the base path (e.g., {{0:/}}, {{1:/p1}}, 
{{2:/p1/p2}}).
 * Download all the children of the base path (e.g., {{???:/p1/p2/*}}).

The first query returns only a few results, so it should use the index on 
{{_id}}. However, to handle the rare case where the path is long and the 
{{_id}} field holds a hash instead of the path, the query for the ancestors 
also searches for matches on the {{_path}} field, which is set when {{_id}} is 
a hash. The problem is that {{_path}} is not indexed, so the first query falls 
back to a full collection scan, which is much slower than an index scan for 
the handful of ancestors. This negates most or even all of the gains of using 
regex filtering.
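
The ancestor lookup can be served by exact matches on {{_id}}, because the ids of the ancestors are fully determined by the base path. A minimal sketch of computing them follows, assuming the depth:path id format shown in the examples above. The class and method names are hypothetical, and the long-path case where {{_id}} is a hash is deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the ids of the root, every intermediate
// ancestor and the base path itself, in the form "<depth>:<path>".
final class AncestorIdsSketch {
    private AncestorIdsSketch() {
    }

    static List<String> ancestorIds(String basePath) {
        if (!basePath.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + basePath);
        }
        if (basePath.length() > 1 && basePath.endsWith("/")) {
            throw new IllegalArgumentException("Path must not end with '/': " + basePath);
        }
        List<String> ids = new ArrayList<>();
        ids.add("0:/");
        if (basePath.equals("/")) {
            return ids;
        }
        int depth = 0;
        int nextSlash = basePath.indexOf('/', 1);
        while (true) {
            depth++;
            // The prefix up to the next separator, or the whole path at the last segment.
            String prefix = nextSlash < 0 ? basePath : basePath.substring(0, nextSlash);
            ids.add(depth + ":" + prefix);
            if (nextSlash < 0) {
                break;
            }
            nextSlash = basePath.indexOf('/', nextSlash + 1);
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [0:/, 1:/p1, 2:/p1/p2]
        System.out.println(ancestorIds("/p1/p2"));
    }
}
```

A query that matches only this short list of ids can use the {{_id}} index, whereas adding a condition on the unindexed {{_path}} field is what triggers the full scan described above.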





[jira] [Resolved] (OAK-10423) Improve logging of metrics in indexing job

2023-09-08 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10423.
---
Fix Version/s: 1.58.0
   Resolution: Done

> Improve logging of metrics in indexing job
> --
>
> Key: OAK-10423
> URL: https://issues.apache.org/jira/browse/OAK-10423
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: indexing
> Fix For: 1.58.0
>
>
> Improvements:
>  - Report all times in {{hours:minutes:seconds}}. Currently we rely on 
> Guava's {{Stopwatch.toString()}}, which reports times as {{hh.[00-99]}}, 
> that is, with a decimal fraction of an hour instead of the standard 0 to 59 
> minutes. But if a duration is less than 1h, it reports the number of 
> minutes. This is inconsistent and confusing, so we should always use the 
> standard time format.
>  - Report time for index upload to Lucene
> The MT traverse strategy is not reporting times for some phases, but as we 
> are phasing out this strategy, it is not important to improve metrics 
> reporting for it.





[jira] [Created] (OAK-10438) Remove deprecated download strategies

2023-09-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10438:
-

 Summary: Remove deprecated download strategies
 Key: OAK-10438
 URL: https://issues.apache.org/jira/browse/OAK-10438
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10437) Deprecate all download strategies except PIPELINED

2023-09-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10437:
-

 Summary: Deprecate all download strategies except PIPELINED
 Key: OAK-10437
 URL: https://issues.apache.org/jira/browse/OAK-10437
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


Deprecate these strategies: STORE_AND_SORT, TRAVERSE_WITH_SORT, 
MULTITHREADED_TRAVERSE_WITH_SORT.
 
When they are used, print a log message saying clearly that they are deprecated 
for removal and suggest using PIPELINED.
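
One way to produce the warning is to mark the deprecated strategies on the enum itself and build the message in one place. The sketch below is an assumption for illustration: the enum name, the deprecated flag and the message text are hypothetical. Only the four strategy names come from this issue.

```java
import java.util.Optional;

// Hypothetical enum: each strategy records whether it is deprecated,
// so the caller can log a warning when a deprecated one is selected.
enum DownloadStrategySketch {
    PIPELINED(false),
    STORE_AND_SORT(true),
    TRAVERSE_WITH_SORT(true),
    MULTITHREADED_TRAVERSE_WITH_SORT(true);

    private final boolean deprecated;

    DownloadStrategySketch(boolean deprecated) {
        this.deprecated = deprecated;
    }

    // Returns the warning to log, or an empty Optional for a supported strategy.
    Optional<String> deprecationWarning() {
        if (!deprecated) {
            return Optional.empty();
        }
        return Optional.of("Download strategy " + name()
                + " is deprecated for removal. Use PIPELINED instead.");
    }

    public static void main(String[] args) {
        // Prints one warning for each deprecated strategy and nothing for PIPELINED.
        for (DownloadStrategySketch strategy : values()) {
            strategy.deprecationWarning().ifPresent(System.err::println);
        }
    }
}
```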





[jira] [Resolved] (OAK-10358) Indexing job: push filtering of paths to MongoDB

2023-09-07 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10358.
---
Fix Version/s: 1.58.0
   Resolution: Done

> Indexing job: push filtering of paths to MongoDB
> 
>
> Key: OAK-10358
> URL: https://issues.apache.org/jira/browse/OAK-10358
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.58.0
>
>






[jira] [Updated] (OAK-10423) Improve logging of metrics in indexing job

2023-08-30 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10423:
--
Description: 
Improvements:
 - Report all times in {{hours:minutes:seconds}}. Currently we rely on Guava's 
{{Stopwatch.toString()}}, which reports times as {{hh.[00-99]}}, that is, with 
a decimal fraction of an hour instead of the standard 0 to 59 minutes. But if a 
duration is less than 1h, it reports the number of minutes. This is 
inconsistent and confusing, so we should always use the standard time format.

 - Report time for index upload to Lucene

The MT traverse strategy is not reporting times for some phases, but as we are 
phasing out this strategy, it is not important to improve metrics reporting for 
it.

> Improve logging of metrics in indexing job
> --
>
> Key: OAK-10423
> URL: https://issues.apache.org/jira/browse/OAK-10423
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>  Labels: indexing
>
> Improvements:
>  - Report all times in {{hours:minutes:seconds}}. Currently we rely on 
> Guava's {{Stopwatch.toString()}}, which reports times as {{hh.[00-99]}}, 
> that is, with a decimal fraction of an hour instead of the standard 0 to 59 
> minutes. But if a duration is less than 1h, it reports the number of 
> minutes. This is inconsistent and confusing, so we should always use the 
> standard time format.
>  - Report time for index upload to Lucene
> The MT traverse strategy is not reporting times for some phases, but as we 
> are phasing out this strategy, it is not important to improve metrics 
> reporting for it.





[jira] [Created] (OAK-10423) Improve logging of metrics in indexing job

2023-08-30 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10423:
-

 Summary: Improve logging of metrics in indexing job
 Key: OAK-10423
 URL: https://issues.apache.org/jira/browse/OAK-10423
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10375) Binary data in logs related to the haystack property

2023-08-03 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10375:
-

 Summary: Binary data in logs related to the haystack property
 Key: OAK-10375
 URL: https://issues.apache.org/jira/browse/OAK-10375
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Reporter: Nuno Santos


When indexing documents with the {{haystack0}} property, some log messages 
contain the binary data of the property. In the log below, I replaced the 
binary data by {{{}{}}}, but it is usually very long. 

{noformat}
16:30:40.107 [main] ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - could not index 
similarity field for property 
haystack0 =  
and definition 
PropertyDefinition{name='jcr:content/metadata/imageFeatures/haystack0', 
propertyType=0, boost=1.0, isRegexp=false, index=true, stored=false, 
nodeScopeIndex=true, propertyIndex=true, analyzed=false, ordered=false, 
useInSuggest=false, useInSimilarity=true, nullCheckEnabled=false, 
notNullCheckEnabled=false, function=null} 
{noformat}

 





[jira] [Resolved] (OAK-10356) Adjust lower and upper bounds of auto-detected memory limits in PipelinedStrategy

2023-07-19 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10356.
---
Resolution: Done

> Adjust lower and upper bounds of auto-detected memory limits in 
> PipelinedStrategy
> -
>
> Key: OAK-10356
> URL: https://issues.apache.org/jira/browse/OAK-10356
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Created] (OAK-10358) Indexing job: push filtering of paths to MongoDB

2023-07-18 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10358:
-

 Summary: Indexing job: push filtering of paths to MongoDB
 Key: OAK-10358
 URL: https://issues.apache.org/jira/browse/OAK-10358
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Created] (OAK-10356) Adjust lower and upper bounds of auto-detected memory limits in PipelinedStrategy

2023-07-18 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10356:
-

 Summary: Adjust lower and upper bounds of auto-detected memory 
limits in PipelinedStrategy
 Key: OAK-10356
 URL: https://issues.apache.org/jira/browse/OAK-10356
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Commented] (OAK-10350) Update spring-boot dependency to version 2.7.13

2023-07-18 Thread Nuno Santos (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744153#comment-17744153
 ] 

Nuno Santos commented on OAK-10350:
---

Probably because I am not a committer. I answered in the PR.

> Update spring-boot dependency to version 2.7.13
> ---
>
> Key: OAK-10350
> URL: https://issues.apache.org/jira/browse/OAK-10350
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: standalone
>Reporter: Manfred Baedke
>Assignee: Manfred Baedke
>Priority: Minor
> Fix For: 1.56.0
>
>






[jira] [Resolved] (OAK-10294) Indexing job: add new Pipelined Strategy for dumping Mongo contents in preparation for reindexing

2023-06-30 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10294.
---
Resolution: Done

> Indexing job: add new Pipelined Strategy for dumping Mongo contents in 
> preparation for reindexing
> -
>
> Key: OAK-10294
> URL: https://issues.apache.org/jira/browse/OAK-10294
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Updated] (OAK-10294) Indexing job: add new Pipelined Strategy for dumping Mongo contents in preparation for reindexing

2023-06-30 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10294:
--
Summary: Indexing job: add new Pipelined Strategy for dumping Mongo 
contents in preparation for reindexing  (was: Pipelined Mongo dump for indexing 
job)

> Indexing job: add new Pipelined Strategy for dumping Mongo contents in 
> preparation for reindexing
> -
>
> Key: OAK-10294
> URL: https://issues.apache.org/jira/browse/OAK-10294
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Updated] (OAK-10294) Pipelined Mongo dump for indexing job

2023-06-30 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10294:
--
Summary: Pipelined Mongo dump for indexing job  (was: Pipelined Mongo dump 
for indexing job (WIP - Do Not Merge))

> Pipelined Mongo dump for indexing job
> -
>
> Key: OAK-10294
> URL: https://issues.apache.org/jira/browse/OAK-10294
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Created] (OAK-10294) Pipelined Mongo dump for indexing job (WIP - Do Not Merge)

2023-06-12 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10294:
-

 Summary: Pipelined Mongo dump for indexing job (WIP - Do Not Merge)
 Key: OAK-10294
 URL: https://issues.apache.org/jira/browse/OAK-10294
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Resolved] (OAK-10005) Bump Mockito from 3.12.4 to the latest 4.x release

2022-11-30 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10005.
---
Resolution: Duplicate

> Bump Mockito from 3.12.4 to the latest 4.x release
> --
>
> Key: OAK-10005
> URL: https://issues.apache.org/jira/browse/OAK-10005
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> Mockito 3.12.4 does not support running with Java 19.





[jira] [Resolved] (OAK-10004) Bump Elasticsearch Java client from 7.17.6 to 7.17.7

2022-11-21 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10004.
---
Fix Version/s: 1.46.0
   Resolution: Done

> Bump Elasticsearch Java client from 7.17.6 to 7.17.7
> 
>
> Key: OAK-10004
> URL: https://issues.apache.org/jira/browse/OAK-10004
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.46.0
>
>






[jira] [Created] (OAK-10005) Bump Mockito from 3.12.4 to the latest 4.x release

2022-11-17 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10005:
-

 Summary: Bump Mockito from 3.12.4 to the latest 4.x release
 Key: OAK-10005
 URL: https://issues.apache.org/jira/browse/OAK-10005
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


Mockito 3.12.4 does not support running with Java 19.





[jira] [Updated] (OAK-10004) Bump Elasticsearch clients from 7.17.6 to 7.17.7

2022-11-17 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10004:
--
Summary: Bump Elasticsearch clients from 7.17.6 to 7.17.7  (was: Bump 
Elasticsearch clients from 7.17.3 to 7.17.6)

> Bump Elasticsearch clients from 7.17.6 to 7.17.7
> 
>
> Key: OAK-10004
> URL: https://issues.apache.org/jira/browse/OAK-10004
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>






[jira] [Updated] (OAK-10004) Bump Elasticsearch Java client from 7.17.6 to 7.17.7

2022-11-17 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10004:
--
Summary: Bump Elasticsearch Java client from 7.17.6 to 7.17.7  (was: Bump 
Elasticsearch clients from 7.17.6 to 7.17.7)

> Bump Elasticsearch Java client from 7.17.6 to 7.17.7
> 
>
> Key: OAK-10004
> URL: https://issues.apache.org/jira/browse/OAK-10004
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>






[jira] [Created] (OAK-10004) Bump Elasticsearch clients from 7.17.3 to 7.17.6

2022-11-17 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10004:
-

 Summary: Bump Elasticsearch clients from 7.17.3 to 7.17.6
 Key: OAK-10004
 URL: https://issues.apache.org/jira/browse/OAK-10004
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Updated] (OAK-9965) Add support for running unit tests against Elasticsearch 8.4.3

2022-10-07 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-9965:
-
Summary: Add support for running unit tests against Elasticsearch 8.4.3  
(was: Add support for testing with Elastiknn 8.4.3)

> Add support for running unit tests against Elasticsearch 8.4.3
> --
>
> Key: OAK-9965
> URL: https://issues.apache.org/jira/browse/OAK-9965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
>






[jira] [Created] (OAK-9965) Add support for testing with Elastiknn 8.4.3

2022-10-07 Thread Nuno Santos (Jira)
Nuno Santos created OAK-9965:


 Summary: Add support for testing with Elastiknn 8.4.3
 Key: OAK-9965
 URL: https://issues.apache.org/jira/browse/OAK-9965
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos







