[jira] [Created] (OAK-10662) improve Reproducible Builds

2024-02-23 Thread Herve Boutemy (Jira)
Herve Boutemy created OAK-10662:
---

 Summary: improve Reproducible Builds
 Key: OAK-10662
 URL: https://issues.apache.org/jira/browse/OAK-10662
 Project: Jackrabbit Oak
  Issue Type: Improvement
Affects Versions: 1.60.0
Reporter: Herve Boutemy
 Fix For: 1.62.0


Release 1.60.0 is quite good: 143 ok, 11 ko.
There are some easy fixes to do now,
and probably harder ones later.

see
https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/apache/jackrabbit/oak/README.md



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OAK-10654) Build Jackrabbit/jackrabbit-oak-trunk #1363 failed

2024-02-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820157#comment-17820157
 ] 

Hudson commented on OAK-10654:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk 
#1373|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1373/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1373/console]

> Build Jackrabbit/jackrabbit-oak-trunk #1363 failed
> --
>
> Key: OAK-10654
> URL: https://issues.apache.org/jira/browse/OAK-10654
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk #1363 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk 
> #1363|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1363/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1363/console]





[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820143#comment-17820143
 ] 

Julian Reschke commented on OAK-10660:
--

OK, this might be easier than expected. PR soonish.


> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except for a small overhead in processing.





[jira] [Created] (OAK-10661) oak-search-elastic: remove workaround for elastic/elasticsearch-java/issues/404

2024-02-23 Thread Fabrizio Fortino (Jira)
Fabrizio Fortino created OAK-10661:
--

 Summary: oak-search-elastic: remove workaround for 
elastic/elasticsearch-java/issues/404
 Key: OAK-10661
 URL: https://issues.apache.org/jira/browse/OAK-10661
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: search, search-elastic
Reporter: Fabrizio Fortino
Assignee: Fabrizio Fortino


https://github.com/elastic/elasticsearch-java/issues/404





[jira] [Commented] (OAK-10657) MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB limit

2024-02-23 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820072#comment-17820072
 ] 

Manfred Baedke commented on OAK-10657:
--

Feature toggle now implemented, see PR.

> MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB 
> limit
> ---
>
> Key: OAK-10657
> URL: https://issues.apache.org/jira/browse/OAK-10657
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk, mongomk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> To address the 16MB/childorder issue, there are many potential approaches:
> - make GC more aggressive 
> - try to change updates to remove "in-between" changes of ":childOrder" 
> property
> - change the data model of ":childOrder"
> - try to shrink document in DB once size related exception happens
> This ticket is about the last of these options.
> Proposal:
> - improve exception thrown by document store so that it can be acted upon
> - in document store utils add a method that inspects a document and produces 
> UpdateOps suitable to shrink the document
> - DocumentNodeStore commit could catch exception, obtain update ops, apply 
> them, and retry once (this should be dependent on a feature toggle)
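
The proposal quoted above (catch the size exception, obtain shrinking update ops, apply them, retry once behind a feature toggle) could be sketched roughly as follows. This is only an illustration under stated assumptions: `DocumentTooLargeException`, `ToyDocumentStore` and `ShrinkAndRetry` are hypothetical names invented here, a "document" is modelled as a plain list of entries with a maximum count, and none of this is Oak's actual DocumentStore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical exception that names the document that hit the size limit. */
class DocumentTooLargeException extends RuntimeException {
    final String documentId;

    DocumentTooLargeException(String documentId) {
        super("document too large: " + documentId);
        this.documentId = documentId;
    }
}

/** Toy in-memory store (assumption): a document is a list of entries with a maximum count. */
final class ToyDocumentStore {
    private final int maxEntries;
    private final Map<String, List<String>> docs = new HashMap<>();
    int commitAttempts = 0;

    ToyDocumentStore(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void commit(String id, String entry) {
        commitAttempts++;
        List<String> entries = docs.computeIfAbsent(id, k -> new ArrayList<>());
        if (entries.size() >= maxEntries) {
            throw new DocumentTooLargeException(id);
        }
        entries.add(entry);
    }

    /** Produces the removals that would shrink the document: everything but the newest entry. */
    List<String> shrinkOpsFor(String id) {
        List<String> entries = docs.getOrDefault(id, new ArrayList<>());
        return new ArrayList<>(entries.subList(0, Math.max(0, entries.size() - 1)));
    }

    void apply(String id, List<String> removals) {
        List<String> entries = docs.get(id);
        if (entries != null) {
            for (String r : removals) {
                entries.remove(r); // removes the first (oldest) matching entry
            }
        }
    }

    List<String> entries(String id) {
        return new ArrayList<>(docs.getOrDefault(id, new ArrayList<>()));
    }
}

final class ShrinkAndRetry {
    private ShrinkAndRetry() {}

    static void commit(ToyDocumentStore store, String id, String entry, boolean shrinkEnabled) {
        try {
            store.commit(id, entry);
        } catch (DocumentTooLargeException e) {
            if (!shrinkEnabled) {
                throw e; // feature toggle off: keep the old behavior
            }
            store.apply(e.documentId, store.shrinkOpsFor(e.documentId));
            store.commit(id, entry); // retry exactly once; a second failure propagates
        }
    }
}
```

The point of carrying the document id in the exception is that the caller can act on it without parsing an error message; whether the real exception would expose the id this way is an assumption.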





[jira] [Comment Edited] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820045#comment-17820045
 ] 

Julian Reschke edited comment on OAK-10660 at 2/23/24 12:42 PM:


Here:

 
{noformat}
diff --git a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;
 
+    /** Revisions written by us */
+    private final Set revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, {@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
 {noformat}

would be a good place to track what revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the UpdateOps.


was (Author: reschke):
Here:

 
{noformat}
diff --git a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;
 
+    /** Revisions written by us */
+    private final Set revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, {@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
 {noformat}

would be a good place to track what revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the {UpdateOp}s.

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except for a small overhead in processing.





[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820045#comment-17820045
 ] 

Julian Reschke commented on OAK-10660:
--

Here:

 
{noformat}
diff --git a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;
 
+    /** Revisions written by us */
+    private final Set revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, {@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
 {noformat}

would be a good place to track what revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the {UpdateOp}s.
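
One possible way to hand the tracked revisions to the code that builds the update operations is a read-only view, sketched below. This is a guess at a shape, not the design of the eventual PR: `BranchSketch`, `persist`, `writtenRevisions` and `removalOps` are hypothetical names, revisions are plain strings, and the operations are rendered as text rather than as real UpdateOp instances.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BranchSketch {
    /** Revisions written by this branch, in commit order. */
    private final Set<String> revisions = new LinkedHashSet<>();

    /** Runs one branch commit; the revision is recorded only if the commit did not throw. */
    void persist(String revision, Runnable commit) {
        commit.run();
        revisions.add(revision);
    }

    /** Read-only view for whatever creates the update operations. */
    Set<String> writtenRevisions() {
        return Collections.unmodifiableSet(revisions);
    }

    /** Renders one removal per tracked revision for the given property (textual stand-in). */
    static List<String> removalOps(String property, Set<String> revisions) {
        List<String> ops = new ArrayList<>();
        for (String r : revisions) {
            ops.add("REMOVE_MAP_ENTRY " + property + " " + r);
        }
        return ops;
    }
}
```

Exposing an unmodifiable view keeps the branch as the only writer of the set, so the consumer cannot accidentally drop or add revisions.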

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except for a small overhead in processing.





[jira] [Comment Edited] (OAK-10657) MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB limit

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819969#comment-17819969
 ] 

Julian Reschke edited comment on OAK-10657 at 2/23/24 12:39 PM:


Trying to write down an alternate approach...:
 - While persisting the branch commits, we are persisting large :childOrder 
properties repeatedly. In practice, only the last value is needed, so the 
previous ones could be cleaned up.
 - We currently do not keep information about when (revision) and where (_id) 
we have set :childOrder.
 - The "clean" approach would be to maintain a map of _id/revision that tells 
us in which revision we last set :childOrder. That could be used to pair the 
setting of the new value with a removal of the previous one.
 - But we may be able to simplify that: just maintain a list of _all_ revisions 
that changed :childOrder, and any time we need to set a new value for 
:childOrder, nuke the entries for all of these revisions. This would be 
harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
except for a small overhead in processing.

EDIT: opened OAK-10660 to track this


was (Author: reschke):
Trying to write down an alternate approach...:
 - While persisting the branch commits, we are persisting large :childOrder 
properties repeatedly. In practice, only the last value is needed, so the 
previous ones could be cleaned up.
 - We currently do not keep information about when (revision) and where (_id) 
we have set :childOrder.
 - The "clean" approach would be to maintain a map of _id/revision that tells 
us in which revision we last set :childOrder. That could be used to pair the 
setting of the new value with a removal of the previous one.
 - But we may be able to simplify that: just maintain a list of _all_ revisions 
that changed :childOrder, and any time we need to set a new value for 
:childOrder, nuke the entries for all of these revisions. This would be 
harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
except for a small overhead in processing.

> MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB 
> limit
> ---
>
> Key: OAK-10657
> URL: https://issues.apache.org/jira/browse/OAK-10657
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk, mongomk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> To address the 16MB/childorder issue, there are many potential approaches:
> - make GC more aggressive 
> - try to change updates to remove "in-between" changes of ":childOrder" 
> property
> - change the data model of ":childOrder"
> - try to shrink document in DB once size related exception happens
> This ticket is about the last of these options.
> Proposal:
> - improve exception thrown by document store so that it can be acted upon
> - in document store utils add a method that inspects a document and produces 
> UpdateOps suitable to shrink the document
> - DocumentNodeStore commit could catch exception, obtain update ops, apply 
> them, and retry once (this should be dependent on a feature toggle)





[jira] [Created] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)
Julian Reschke created OAK-10660:


 Summary: DocumentNodeStore: avoid repeated commits of :childOrder 
in branch commits
 Key: OAK-10660
 URL: https://issues.apache.org/jira/browse/OAK-10660
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: documentmk
Reporter: Julian Reschke
Assignee: Julian Reschke


- While persisting the branch commits, we are persisting large :childOrder 
properties repeatedly. In practice, only the last value is needed, so the 
previous ones could be cleaned up.
 - We currently do not keep information about when (revision) and where (_id) 
we have set :childOrder.
 - The "clean" approach would be to maintain a map of _id/revision that tells 
us in which revision we last set :childOrder. That could be used to pair the 
setting of the new value with a removal of the previous one.
 - But we may be able to simplify that: just maintain a list of _all_ revisions 
that changed :childOrder, and any time we need to set a new value for 
:childOrder, nuke the entries for all of these revisions. This would be 
harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
except for a small overhead in processing.
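
The simplified variant in the last bullet could look roughly like the sketch below. It is a toy model under stated assumptions: `ChildOrderSketch` is a hypothetical class, revisions and values are plain strings, and each document's :childOrder property is modelled as a map from revision to value. Removing an entry that a document does not have is a no-op here, which mirrors the "harmless extra REMOVE_MAP_ENTRY" argument.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ChildOrderSketch {
    /** _id -> (revision -> :childOrder value). */
    private final Map<String, Map<String, String>> childOrderById = new LinkedHashMap<>();

    /** All revisions in which :childOrder was set, on any document. */
    private final Set<String> childOrderRevisions = new LinkedHashSet<>();

    void setChildOrder(String id, String revision, String value) {
        Map<String, String> entries =
                childOrderById.computeIfAbsent(id, k -> new LinkedHashMap<>());
        // No per-_id bookkeeping: try to remove every tracked revision from this document.
        for (String old : childOrderRevisions) {
            entries.remove(old); // no-op when this document has no entry for that revision
        }
        entries.put(revision, value);
        childOrderRevisions.add(revision);
    }

    Map<String, String> entries(String id) {
        return childOrderById.getOrDefault(id, Map.of());
    }
}
```

Note that the tracked set is never cleared in this sketch, because other documents may still carry entries for the older revisions; the "clean" approach with an _id/revision map would avoid the redundant removals at the cost of more bookkeeping.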





[jira] [Commented] (OAK-10657) MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB limit

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819969#comment-17819969
 ] 

Julian Reschke commented on OAK-10657:
--

Trying to write down an alternate approach...:
 - While persisting the branch commits, we are persisting large :childOrder 
properties repeatedly. In practice, only the last value is needed, so the 
previous ones could be cleaned up.
 - We currently do not keep information about when (revision) and where (_id) 
we have set :childOrder.
 - The "clean" approach would be to maintain a map of _id/revision that tells 
us in which revision we last set :childOrder. That could be used to pair the 
setting of the new value with a removal of the previous one.
 - But we may be able to simplify that: just maintain a list of _all_ revisions 
that changed :childOrder, and any time we need to set a new value for 
:childOrder, nuke the entries for all of these revisions. This would be 
harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
except for a small overhead in processing.

> MongoDocumentStore: shrink in-DB documents after updates fail due to 16MB 
> limit
> ---
>
> Key: OAK-10657
> URL: https://issues.apache.org/jira/browse/OAK-10657
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk, mongomk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> To address the 16MB/childorder issue, there are many potential approaches:
> - make GC more aggressive 
> - try to change updates to remove "in-between" changes of ":childOrder" 
> property
> - change the data model of ":childOrder"
> - try to shrink document in DB once size related exception happens
> This ticket is about the last of these options.
> Proposal:
> - improve exception thrown by document store so that it can be acted upon
> - in document store utils add a method that inspects a document and produces 
> UpdateOps suitable to shrink the document
> - DocumentNodeStore commit could catch exception, obtain update ops, apply 
> them, and retry once (this should be dependant on a feature toggle)


