[jira] [Created] (OAK-10608) [Indexing job] Improve regex expression used to download from Mongo to make better use of Mongo indexes

2024-01-16 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10608:
-

 Summary: [Indexing job] Improve regex expression used to download 
from Mongo to make better use of Mongo indexes
 Key: OAK-10608
 URL: https://issues.apache.org/jira/browse/OAK-10608
 Project: Jackrabbit Oak
 Issue Type: Improvement
 Components: indexing
 Reporter: Nuno Santos


The current regex used to filter the included/excluded paths in Mongo has 
conditions on both the {{_id}} and {{_path}} fields. In most cases, the 
{{_id}} field contains the path of the node, but when the path is too long, 
the {{_id}} is replaced by a hash of the path and the full path is added to 
the document as an additional {{_path}} field. For these cases, the regex 
must also check the {{_path}} field. 

When running an ordered traversal, we use a Mongo index on {{(_modified, 
_id)}}, so checks on {{_id}} can be done with just the data retrieved from 
the index. But for the check on {{_path}}, Mongo needs to read the full 
document from the column store, which slows down the traversal significantly.

Currently, if {{_id}} does not match, the regex always checks {{_path}}, 
forcing a retrieval of the document. But we only need to check {{_path}} if 
the {{_id}} has the form of a long path id, that is, the pattern {{4:h...}}. 
If the {{_id}} is not a long path id and does not match the regex, we can be 
sure that the document is not needed. The check that {{_id}} is a hash can be 
done without retrieving the full document from the column store, so it will 
be fast. And in the common case, the document is not a long path, so this 
simple check avoids retrieving the document from the column store.
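
As an illustration only, the proposed condition could be sketched as below. 
This is not the Oak implementation. The names {{LONG_PATH_ID}}, 
{{build_filter}} and {{matches}} are hypothetical, and both the long path id 
regex ({{^[0-9]+:h}}) and the {{<depth>:<path>}} shape of a regular {{_id}} 
are assumptions, since the description above only gives {{4:h...}}. Whether 
Mongo evaluates the first two conditions from the index alone depends on its 
query planner.

```python
import re

# Assumption: a long path id looks like "<depth>:h<hash>".
# The issue description only shows the pattern "4:h...".
LONG_PATH_ID = re.compile(r"^[0-9]+:h")


def build_filter(path_prefix: str) -> dict:
    """Build a Mongo-style filter document for one included path (hypothetical helper).

    _path is only consulted when _id has the shape of a long path id.
    """
    id_regex = "^[0-9]+:" + re.escape(path_prefix)   # assumed "<depth>:<path>" ids
    path_regex = "^" + re.escape(path_prefix)
    return {
        "$or": [
            {"_id": {"$regex": id_regex}},
            {"$and": [
                {"_id": {"$regex": LONG_PATH_ID.pattern}},
                {"_path": {"$regex": path_regex}},
            ]},
        ]
    }


def matches(doc: dict, path_prefix: str) -> bool:
    """Pure-Python emulation of the filter, to make the short-circuit explicit."""
    doc_id = doc["_id"]
    if re.search("^[0-9]+:" + re.escape(path_prefix), doc_id):
        return True                      # decided from _id alone
    if not LONG_PATH_ID.search(doc_id):
        return False                     # not a long path id: _path is never read
    # Only long path documents reach this point and need the _path field.
    return re.search("^" + re.escape(path_prefix), doc.get("_path", "")) is not None
```

With this shape, a document such as {{{"_id": "3:/apps/foo/bar"}}} is rejected 
without looking at {{_path}}, while a document with a hashed {{_id}} is 
accepted or rejected based on its {{_path}}.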

This optimization will have a big impact when the regex matches a small 
fraction of the repository. In the current implementation, Mongo has to 
traverse both the index and the column store, whatever the regex filter. 
With the additional check for long paths, Mongo still has to traverse the 
full index, but it will only retrieve from the column store the documents 
that match the filter or that are long path documents. Since the index is 
much smaller than the column store and can be cached more easily, this will 
significantly improve performance.
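
To make the expected effect concrete, here is a toy model that counts how 
many full documents would have to be read with and without the long path 
check. It is a simplified sketch and not a measurement: the ids are made up, 
{{documents_read}} is a hypothetical helper, and the long path id regex is 
the same assumption as in the issue's {{4:h...}} pattern.

```python
import re

# Assumption: a long path id looks like "<depth>:h<hash>".
LONG_PATH_ID = re.compile(r"^[0-9]+:h")


def documents_read(ids, id_regex, long_path_check):
    """Count the documents that must be read in full for a list of index ids."""
    read = 0
    for doc_id in ids:
        matched = re.search(id_regex, doc_id) is not None
        is_long = LONG_PATH_ID.search(doc_id) is not None
        if matched:
            read += 1          # document is part of the result
        elif not long_path_check or is_long:
            read += 1          # _path has to be inspected
        # otherwise the document is rejected from the index entry alone
    return read


# Made-up repository: 10 matching nodes, 5 long path nodes, 985 other nodes.
ids = (
    [f"3:/content/dam/asset{i}" for i in range(10)]
    + [f"4:h{i:08x}" for i in range(5)]
    + [f"3:/apps/site/page{i}" for i in range(985)]
)
id_regex = r"^[0-9]+:/content/dam"

before = documents_read(ids, id_regex, long_path_check=False)
after = documents_read(ids, id_regex, long_path_check=True)
```

In this model every index entry is still visited in both cases. Only the 
number of full document reads changes, from all 1000 documents to the 15 
that either match or have a long path id.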



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OAK-10582) Build Jackrabbit/jackrabbit-oak-trunk #1306 failed

2024-01-16 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807231#comment-17807231
 ] 

Hudson commented on OAK-10582:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk 
#1327|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1327/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1327/console]

> Build Jackrabbit/jackrabbit-oak-trunk #1306 failed
> --
>
> Key: OAK-10582
> URL: https://issues.apache.org/jira/browse/OAK-10582
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: continuous integration
> Reporter: Hudson
> Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk #1306 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk 
> #1306|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1306/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1306/console]





[jira] [Resolved] (OAK-10607) Rename Maven property "java.version"

2024-01-16 Thread Konrad Windszus (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konrad Windszus resolved OAK-10607.
---
Fix Version/s: 1.62.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/jackrabbit-oak/commit/deeccbe0d4b8ff210d9adcf31fbe7c7f0d841dbc.

> Rename Maven property "java.version"
> 
>
> Key: OAK-10607
> URL: https://issues.apache.org/jira/browse/OAK-10607
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: parent
> Reporter: Konrad Windszus
> Assignee: Konrad Windszus
> Priority: Major
> Fix For: 1.62.0
>
>
> Maven exposes all Java system properties as regular Maven properties. 
> Currently the user property {{java.version}} hides the system property of 
> the same name (see 
> https://books.sonatype.com/mvnref-book/reference/resource-filtering-sect-properties.html#resource-filtering-sect-system-properties),
> so it should be renamed to stop hiding the current JDK version exposed 
> via the Java system property.


