[jira] [Created] (OAK-10741) Improve logging for detailedGC

2024-04-03 Thread Rishabh Daim (Jira)
Rishabh Daim created OAK-10741:
--

 Summary: Improve logging for detailedGC
 Key: OAK-10741
 URL: https://issues.apache.org/jira/browse/OAK-10741
 Project: Jackrabbit Oak
  Issue Type: Improvement
Reporter: Rishabh Daim
Assignee: Rishabh Daim






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-10740) Collect Orphan nodes deletion metrics

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10740:
---
Labels: DetailedGC  (was: )

> Collect Orphan nodes deletion metrics
> -
>
> Key: OAK-10740
> URL: https://issues.apache.org/jira/browse/OAK-10740
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Created] (OAK-10740) Collect Orphan nodes deletion metrics

2024-04-03 Thread Rishabh Daim (Jira)
Rishabh Daim created OAK-10740:
--

 Summary: Collect Orphan nodes deletion metrics
 Key: OAK-10740
 URL: https://issues.apache.org/jira/browse/OAK-10740
 Project: Jackrabbit Oak
  Issue Type: New Feature
Reporter: Rishabh Daim
Assignee: Rishabh Daim








[jira] [Updated] (OAK-10740) Collect Orphan nodes deletion metrics

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10740:
---
Component/s: documentmk

> Collect Orphan nodes deletion metrics
> -
>
> Key: OAK-10740
> URL: https://issues.apache.org/jira/browse/OAK-10740
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Updated] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Node Store

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10739:
--
Labels: DetailedGC  (was: )

> Provide Support for Detailed Garbage Collection in Document Node Store
> --
>
> Key: OAK-10739
> URL: https://issues.apache.org/jira/browse/OAK-10739
> Project: Jackrabbit Oak
>  Issue Type: Epic
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We need to provide support for collecting and removing all garbage from the 
> DocumentNodeStore.
> At the time of creating this epic, garbage includes orphaned nodes, deleted 
> properties, unmerged branch commits, and old revisions.
>  
> This list can be updated in case a new type of garbage is found.





[jira] [Resolved] (OAK-10736) Collect DetailedGC Stats for DryRun mode

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10736.

Resolution: Fixed

> Collect DetailedGC Stats for DryRun mode
> 
>
> Key: OAK-10736
> URL: https://issues.apache.org/jira/browse/OAK-10736
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Commented] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Nitin Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833657#comment-17833657
 ] 

Nitin Gupta commented on OAK-10733:
---

Reverted the commit here:
[https://github.com/apache/jackrabbit-oak/commit/e220c69ec73f1cf8012d6f702a8eb1d386a4418e]
The failure seems to be due to the isEmpty() check. I will create a new PR.

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some benefit during 
> the reindexing phase.
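
The filtering described in the task can be sketched as follows. This is a minimal Python illustration, not the FlatFileStore code: the helper names `is_hidden` and `visible_properties` are hypothetical, and the sketch assumes the Oak convention that hidden item names start with a colon.

```python
def is_hidden(name: str) -> bool:
    # Assumption: hidden nodes and properties are those whose name starts with ":".
    return name.startswith(":")


def visible_properties(props: dict) -> dict:
    # Keep only the properties that a visible-content consumer would see.
    return {name: value for name, value in props.items() if not is_hidden(name)}


entry = {"jcr:primaryType": "nt:unstructured", ":childOrder": ["a", "b"], "title": "x"}
filtered = visible_properties(entry)
```

Dropping such entries before they are written would shrink each line of the flat file; whether that yields a measurable reindexing benefit is exactly what the task sets out to check.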





[jira] [Reopened] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke reopened OAK-10733:
--

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
> Fix For: 1.62.0
>
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some benefit during 
> the reindexing phase.





[jira] [Updated] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10733:
-
Fix Version/s: (was: 1.62.0)

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some benefit during 
> the reindexing phase.





[jira] [Commented] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833650#comment-17833650
 ] 

Julian Reschke commented on OAK-10733:
--

This might cause an IT failure:


> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
> Fix For: 1.62.0
>
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some benefit during 
> the reindexing phase.





[jira] [Comment Edited] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833650#comment-17833650
 ] 

Julian Reschke edited comment on OAK-10733 at 4/3/24 3:33 PM:
--

This might cause an IT failure:

https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/org.apache.jackrabbit$oak-run/1430/testReport/junit/org.apache.jackrabbit.oak.index/DocumentStoreIndexerIT/bundling/



was (Author: reschke):
This might cause an IT failure:


> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
> Fix For: 1.62.0
>
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some benefit during 
> the reindexing phase.





[jira] [Resolved] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved OAK-8646.
--
Resolution: Done

+1, marking done then

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Comment Edited] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833626#comment-17833626
 ] 

Julian Reschke edited comment on OAK-8646 at 4/3/24 2:58 PM:
-

Yep. (In theory we could also define a version string for the feature, but I 
don't think that's needed)


was (Author: reschke):
Yep.

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Commented] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833626#comment-17833626
 ] 

Julian Reschke commented on OAK-8646:
-

Yep.

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Commented] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833623#comment-17833623
 ] 

Stefan Egli commented on OAK-8646:
--

[~reschke], as we're using a feature branch I would suggest we can skip the fix 
version for these?

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Updated] (OAK-10193) Garbage collect deleted properties

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10193:
--
Component/s: documentmk

> Garbage collect deleted properties
> --
>
> Key: OAK-10193
> URL: https://issues.apache.org/jira/browse/OAK-10193
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10378) Add metrics for detailed GC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10378:
--
Component/s: documentmk

> Add metrics for detailed GC
> ---
>
> Key: OAK-10378
> URL: https://issues.apache.org/jira/browse/OAK-10378
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We need to provide support for collecting metrics for all the 
> deletions/updates done as part of detailedGC cycles.





[jira] [Updated] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10676:
--
Component/s: documentmk

> Consider late-writes while removing deleted properties during detailedGC
> 
>
> Key: OAK-10676
> URL: https://issues.apache.org/jira/browse/OAK-10676
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We need to take into account the late-writes or inconsistent revisions while 
> removing deleted properties.
>  
> For example, if the property is null in the latest revision but that revision is 
> itself invalid, uncommitted, or broken, we might need to skip removal of such 
> properties.
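
The check described above might look roughly like this. It is a hedged Python sketch only: the function name `can_remove_deleted_property`, the list-of-tuples revision history, and the `committed` set are hypothetical stand-ins for illustration, not DocumentNodeStore APIs.

```python
from typing import Optional, Sequence, Set, Tuple


def can_remove_deleted_property(
    history: Sequence[Tuple[str, Optional[str]]],
    committed: Set[str],
) -> bool:
    """history: (revision id, value) pairs ordered oldest to newest; None means deleted."""
    if not history:
        return False
    latest_revision, latest_value = history[-1]
    if latest_value is not None:
        # The property still has a value in its latest revision: not garbage.
        return False
    # Only trust the deletion if the revision that wrote it is known to be committed.
    return latest_revision in committed


history = [("r1", "hello"), ("r2", None)]
```

With `committed = {"r1", "r2"}` the property is removable; if `r2` is a late or uncommitted write, the sketch skips removal, which is the conservative behaviour the issue asks for.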





[jira] [Updated] (OAK-10382) oak-run support for flatfile

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10382:
--
Component/s: documentmk

> oak-run support for flatfile
> 
>
> Key: OAK-10382
> URL: https://issues.apache.org/jira/browse/OAK-10382
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk, oak-run
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
> Fix For: 1.60.0
>
>
> As a follow-up of OAK-10347 we need a wrapper of the SimpleFlatFileUtil - 
> plus (potentially) a full-gc command which runs a full round of detail gc (in 
> DocumentNodeStore that is)





[jira] [Updated] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10710:
--
Component/s: oak-run

> Reset detailedGC settings after running the detailedGC cycle
> 
>
> Key: OAK-10710
> URL: https://issues.apache.org/jira/browse/OAK-10710
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: oak-run
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10736) Collect DetailedGC Stats for DryRun mode

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10736:
--
Component/s: documentmk

> Collect DetailedGC Stats for DryRun mode
> 
>
> Key: OAK-10736
> URL: https://issues.apache.org/jira/browse/OAK-10736
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10676:
--
Labels: DetailedGC  (was: )

> Consider late-writes while removing deleted properties during detailedGC
> 
>
> Key: OAK-10676
> URL: https://issues.apache.org/jira/browse/OAK-10676
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We need to take into account the late-writes or inconsistent revisions while 
> removing deleted properties.
>  
> For example, if the property is null in the latest revision but that revision is 
> itself invalid, uncommitted, or broken, we might need to skip removal of such 
> properties.





[jira] [Updated] (OAK-10370) Dry-run mode for full GC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10370:
--
Labels: DetailedGC  (was: )

> Dry-run mode for full GC
> 
>
> Key: OAK-10370
> URL: https://issues.apache.org/jira/browse/OAK-10370
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Ankita Agarwal
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> For detailed GC (OAK-10199), a dry-run mode is required in which nothing is 
> deleted; garbage such as orphaned branch commits and deleted properties is only listed.





[jira] [Updated] (OAK-10632) Make Embedded DetailedGC Configurable for dryRun mode

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10632:
--
Labels: DetailedGC  (was: )

> Make Embedded DetailedGC Configurable for dryRun mode
> -
>
> Key: OAK-10632
> URL: https://issues.apache.org/jira/browse/OAK-10632
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We have introduced embedded verification of detailedGC in both normal & 
> dryRun mode.
> We need to make embedded verification configurable in dryRun mode.





[jira] [Updated] (OAK-10193) Garbage collect deleted properties

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10193:
--
Labels: DetailedGC  (was: )

> Garbage collect deleted properties
> --
>
> Key: OAK-10193
> URL: https://issues.apache.org/jira/browse/OAK-10193
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-8646:
-
Labels: DetailedGC  (was: )

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Updated] (OAK-10199) Skeleton of an additional, extendable "detail" garbage collector based on only "_modified"

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10199:
--
Labels: DetailedGC  (was: )

> Skeleton of an additional, extendable "detail" garbage collector based on 
> only "_modified"
> --
>
> Key: OAK-10199
> URL: https://issues.apache.org/jira/browse/OAK-10199
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> DocumentNodeStore's revision garbage collector currently doesn't clean up 
> 100% of garbage. Several of those gaps have so far been identified, including:
> * OAK-8646 : "Clean up changes from orphaned branch commits"
> * OAK-10193 : "Garbage collect deleted properties"
> The common aspect of the above is that cleaning up that garbage on 
> an existing repository requires a full scan of the entire repository 
> to find and delete such garbage.
> The current working title for this is "detail gc"
> The ticket here is about creating a skeleton of a garbage collector that the 
> above, individual garbage types can then "hook into".
> There are two parts of the cleanup:
> * an initial, full repository scan
> * an iterative, continuous scan (eg after the above full scan has completed)
> The full repository scan is optional - one could decide to leave the garbage 
> and not worry about it (but enable the continuous scan and thus clean up 
> documents that are changed in the future lazily).
> While the two parts could in theory be based on a different query, it _can_ 
> also be done on the same query.
> One suggested query is to go through all documents where "_modified" is 
> between the previous gc run and an increment, but older than the 
> 'versionGcMaxAgeInSecs' (24h by default) - plus eg taking checkpoints into 
> account.
> A full repository scan is then characterized by setting this "previous gc 
> run" pointer to zero.
> In particular for the full repository scan it is necessary for the gc to run 
> in reasonably small batches - and apply a voluntary throttle, to avoid system 
> overload.
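
The "_modified" window described above can be sketched as follows. This is a minimal Python sketch, not Oak's implementation: the function name `gc_windows`, its parameters, and the choice to cap the upper bound at the oldest checkpoint are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Window:
    from_ms: int  # inclusive lower bound on "_modified"
    to_ms: int    # exclusive upper bound on "_modified"


def gc_windows(
    previous_run_ms: int,
    now_ms: int,
    max_age_secs: int = 24 * 60 * 60,  # mirrors the 24h default mentioned above
    increment_ms: int = 60_000,        # hypothetical batch width
    oldest_checkpoint_ms: Optional[int] = None,
) -> Iterator[Window]:
    # Never look at documents modified more recently than the max age.
    upper = now_ms - max_age_secs * 1000
    # Assumption: "taking checkpoints into account" means not passing the oldest one.
    if oldest_checkpoint_ms is not None:
        upper = min(upper, oldest_checkpoint_ms)
    start = previous_run_ms
    while start < upper:
        end = min(start + increment_ms, upper)
        yield Window(start, end)
        start = end


# A full repository scan starts the "previous gc run" pointer at zero.
full_scan = list(gc_windows(0, 100_000_000, increment_ms=5_000_000))
```

A caller would process one window at a time and pause between windows to apply the voluntary throttle; the continuous scan uses the same function with the pointer left where the last run stopped.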





[jira] [Updated] (OAK-10378) Add metrics for detailed GC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10378:
--
Labels: DetailedGC  (was: )

> Add metrics for detailed GC
> ---
>
> Key: OAK-10378
> URL: https://issues.apache.org/jira/browse/OAK-10378
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>
> We need to provide support for collecting metrics for all the 
> deletions/updates done as part of detailedGC cycles.





[jira] [Updated] (OAK-10633) Make Embedded DetailedGC Configurable in detailedGC

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10633:
--
Labels: DetailedGC  (was: )

> Make Embedded DetailedGC Configurable in detailedGC
> ---
>
> Key: OAK-10633
> URL: https://issues.apache.org/jira/browse/OAK-10633
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10535) Clean up old revisions in a document

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10535:
--
Labels: DetailedGC  (was: )

> Clean up old revisions in a document
> 
>
> Key: OAK-10535
> URL: https://issues.apache.org/jira/browse/OAK-10535
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: José Andrés Cordero Benítez
>Assignee: José Andrés Cordero Benítez
>Priority: Minor
>  Labels: DetailedGC
>
> Introduce a way to safely detect and delete old revisions in a document. This 
> could be useful to clean up documents that sometimes grow above the supported 
> size in MongoDB (16MB).
> It could also be integrated into the detailed GC.





[jira] [Updated] (OAK-10659) Remove orphaned nodes/documents

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10659:
--
Labels: DetailedGC  (was: )

> Remove orphaned nodes/documents
> ---
>
> Key: OAK-10659
> URL: https://issues.apache.org/jira/browse/OAK-10659
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
> Fix For: 1.62.0
>
>
> As part of DetailedGC (also see OAK-10199) we also need to clean up documents 
> that (for some reason) have become orphaned. Orphaned nodes are nodes without 
> a parent, i.e. they fulfill two criteria:
> * they cannot be traversed to - the traversed state would be null / 
> non-existent
> * but reading the node via getNodeAtRevision would properly resolve to an 
> existing node
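
The two criteria can be illustrated with a small sketch. It is written in Python under simplifying assumptions: the set `existing` stands in for "documents that resolve to an existing node when read directly", and `is_orphaned` is a hypothetical helper, not an Oak method.

```python
from typing import Set


def parent_of(path: str) -> str:
    # "/a/b" -> "/a", "/a" -> "/"
    return path.rsplit("/", 1)[0] or "/"


def is_orphaned(path: str, existing: Set[str]) -> bool:
    # Criterion 2: reading the node directly must resolve to an existing node.
    if path == "/" or path not in existing:
        return False
    # Criterion 1: the node cannot be traversed to, i.e. some ancestor is missing.
    current = parent_of(path)
    while True:
        if current not in existing:
            return True
        if current == "/":
            return False
        current = parent_of(current)


existing = {"/", "/a", "/a/b", "/x/y"}
```

Here `/x/y` is orphaned because `/x` is missing, while `/a/b` is reachable from the root and therefore not garbage.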





[jira] [Updated] (OAK-10689) Extend oak-run revisions command with "detail" garbage collection

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10689:
--
Labels: DetailedGC  (was: )

> Extend oak-run revisions command with "detail" garbage collection
> -
>
> Key: OAK-10689
> URL: https://issues.apache.org/jira/browse/OAK-10689
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: oak-run
>Reporter: José Andrés Cordero Benítez
>Assignee: José Andrés Cordero Benítez
>Priority: Minor
>  Labels: DetailedGC
>
> Extend the oak-run revisions command to perform a detailed cleanup on a given 
> document.





[jira] [Updated] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10710:
--
Labels: DetailedGC  (was: )

> Reset detailedGC settings after running the detailedGC cycle
> 
>
> Key: OAK-10710
> URL: https://issues.apache.org/jira/browse/OAK-10710
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10736) Collect DetailedGC Stats for DryRun mode

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10736:
--
Labels: DetailedGC  (was: )

> Collect DetailedGC Stats for DryRun mode
> 
>
> Key: OAK-10736
> URL: https://issues.apache.org/jira/browse/OAK-10736
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Resolved] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10676.

Resolution: Fixed

> Consider late-writes while removing deleted properties during detailedGC
> 
>
> Key: OAK-10676
> URL: https://issues.apache.org/jira/browse/OAK-10676
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to take into account the late-writes or inconsistent revisions while 
> removing deleted properties.
>  
> For example, if the property is null in the latest revision but that revision is 
> itself invalid, uncommitted, or broken, we might need to skip removal of such 
> properties.





[jira] [Resolved] (OAK-10632) Make Embedded DetailedGC Configurable for dryRun mode

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10632.

Resolution: Fixed

> Make Embedded DetailedGC Configurable for dryRun mode
> -
>
> Key: OAK-10632
> URL: https://issues.apache.org/jira/browse/OAK-10632
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We have introduced embedded verification of detailedGC in both normal & 
> dryRun mode.
> We need to make embedded verification configurable in dryRun mode.





[jira] [Resolved] (OAK-10199) Skeleton of an additional, extendable "detail" garbage collector based on only "_modified"

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10199.

Resolution: Fixed

> Skeleton of an additional, extendable "detail" garbage collector based on 
> only "_modified"
> --
>
> Key: OAK-10199
> URL: https://issues.apache.org/jira/browse/OAK-10199
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Rishabh Daim
>Priority: Major
>
> DocumentNodeStore's revision garbage collector currently doesn't clean up 
> 100% of garbage. Several of those gaps have so far been identified, including:
> * OAK-8646 : "Clean up changes from orphaned branch commits"
> * OAK-10193 : "Garbage collect deleted properties"
> The common aspect of the above is that cleaning up that garbage on an 
> existing repository requires a full scan of the entire repository to find 
> and delete such garbage.
> The current working title for this is "detail gc".
> The ticket here is about creating a skeleton of a garbage collector that the 
> individual garbage types above can then "hook into".
> There are two parts of the cleanup:
> * an initial, full repository scan
> * an iterative, continuous scan (e.g. after the above full scan has completed)
> The full repository scan is optional - one could decide to leave the garbage 
> and not worry about it (but enable the continuous scan and thus lazily clean 
> up documents that are changed in the future).
> While the two parts could in theory be based on different queries, they _can_ 
> also be done with the same query.
> One suggested query is to go through all documents where "_modified" is 
> between the previous gc run and an increment, but older than 
> 'versionGcMaxAgeInSecs' (24h by default) - plus e.g. taking checkpoints into 
> account.
> A full repository scan is then characterized by setting this "previous gc 
> run" pointer to zero.
> In particular for the full repository scan, the gc must run in reasonably 
> small batches and apply a voluntary throttle to avoid system overload.
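
The suggested "_modified" window can be sketched as follows. This is only an
illustration under stated assumptions: documents are modelled as a map from id
to "_modified" timestamp (in seconds), checkpoints are ignored, and the names
ModifiedWindowScan, windowEnd and nextBatch are hypothetical, not Oak APIs.
Throttling would be a pause between successive batches and is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "_modified" window query; not Oak's implementation.
final class ModifiedWindowScan {

    /** Upper bound of the scan window: never newer than now - maxAgeSecs. */
    static long windowEnd(long previousRun, long increment, long now, long maxAgeSecs) {
        return Math.min(previousRun + increment, now - maxAgeSecs);
    }

    /** Collects ids whose "_modified" lies in [from, to), at most batchSize of them. */
    static List<String> nextBatch(Map<String, Long> modifiedById, long from, long to, int batchSize) {
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Long> e : modifiedById.entrySet()) {
            long modified = e.getValue();
            if (modified >= from && modified < to) {
                batch.add(e.getKey());
                if (batch.size() == batchSize) {
                    break; // stop early so each round stays small
                }
            }
        }
        return batch;
    }
}
```

Setting previousRun to zero corresponds to the full repository scan; later
runs start from wherever the previous window ended.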





[jira] [Resolved] (OAK-10633) Make Embedded DetailedGC Configurable in detailedGC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10633.

Resolution: Fixed

> Make Embedded DetailedGC Configurable in detailedGC
> ---
>
> Key: OAK-10633
> URL: https://issues.apache.org/jira/browse/OAK-10633
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Resolved] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10710.

Resolution: Fixed

> Reset detailedGC settings after running the detailedGC cycle
> 
>
> Key: OAK-10710
> URL: https://issues.apache.org/jira/browse/OAK-10710
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Resolved] (OAK-10378) Add metrics for detailed GC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10378.

Resolution: Fixed

> Add metrics for detailed GC
> ---
>
> Key: OAK-10378
> URL: https://issues.apache.org/jira/browse/OAK-10378
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to provide support for collecting metrics for all deletions and 
> updates done as part of detailedGC cycles.





[jira] [Resolved] (OAK-10370) Dry-run mode for full GC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10370.

Resolution: Fixed

> Dry-run mode for full GC
> 
>
> Key: OAK-10370
> URL: https://issues.apache.org/jira/browse/OAK-10370
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Ankita Agarwal
>Assignee: Rishabh Daim
>Priority: Major
>
> For detailed GC (OAK-10199), a dry-run mode is required in which nothing is 
> deleted; garbage such as orphaned branch commits and deleted properties is 
> only listed.
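
A rough sketch of what such a dry-run switch might look like, assuming the
store is modelled as a plain set of document ids. DryRunGc and collect are
hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a dry-run flag; not Oak's implementation.
final class DryRunGc {

    /**
     * Reports every garbage candidate that is present in the store.
     * Candidates are deleted only when dryRun is false.
     */
    static List<String> collect(Set<String> store, List<String> garbage, boolean dryRun) {
        List<String> reported = new ArrayList<>();
        for (String id : garbage) {
            if (store.contains(id)) {
                reported.add(id);
                if (!dryRun) {
                    store.remove(id);
                }
            }
        }
        return reported;
    }
}
```

Both modes return the same report, so a dry run can be compared against a real
run before anything is deleted.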





[jira] [Updated] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10710:
---
Epic Link: OAK-10739

> Reset detailedGC settings after running the detailedGC cycle
> 
>
> Key: OAK-10710
> URL: https://issues.apache.org/jira/browse/OAK-10710
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Updated] (OAK-10382) oak-run support for flatfile

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10382:
--
Epic Link: OAK-10739

> oak-run support for flatfile
> 
>
> Key: OAK-10382
> URL: https://issues.apache.org/jira/browse/OAK-10382
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: oak-run
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
> Fix For: 1.60.0
>
>
> As a follow-up of OAK-10347 we need a wrapper of the SimpleFlatFileUtil - 
> plus (potentially) a full-gc command which runs a full round of detail gc (in 
> DocumentNodeStore that is)





[jira] [Updated] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10676:
---
Epic Link: OAK-10739

> Consider late-writes while removing deleted properties during detailedGC
> 
>
> Key: OAK-10676
> URL: https://issues.apache.org/jira/browse/OAK-10676
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to take into account the late-writes or inconsistent revisions while 
> removing deleted properties.
>  
> For example, if the property is null in the latest revision but that revision 
> is itself invalid, not committed, or broken, we might need to skip removal of 
> such properties.





[jira] [Updated] (OAK-10659) Remove orphaned nodes/documents

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10659:
---
Epic Link: OAK-10739

> Remove orphaned nodes/documents
> ---
>
> Key: OAK-10659
> URL: https://issues.apache.org/jira/browse/OAK-10659
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: 1.62.0
>
>
> As part of DetailedGC (also see OAK-10199) we also need to clean up documents 
> that (for some reason) have become orphaned. Orphaned nodes are nodes without 
> a parent, i.e. they fulfill two criteria:
> * they cannot be traversed to - the traversed state would be null / 
> non-existent
> * but reading the node via getNodeAtRevision would properly resolve to an 
> existing node
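
The two criteria can be sketched as below. This assumes a simplified model in
which "resolves via getNodeAtRevision" is represented by membership in a set
of existing paths, and "can be traversed to" means every ancestor up to the
root is in that set. OrphanCheck and its methods are hypothetical names.

```java
import java.util.Set;

// Hypothetical sketch of the orphan criteria; not Oak's implementation.
final class OrphanCheck {

    /** Parent of "/a/b" is "/a", parent of "/a" is "/", the root has none. */
    static String parentPath(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    /** Traversable: the node and every ancestor up to the root exist. */
    static boolean isTraversable(String path, Set<String> existing) {
        for (String p = path; p != null; p = parentPath(p)) {
            if (!existing.contains(p)) {
                return false;
            }
        }
        return true;
    }

    /** Orphan: the document itself resolves, but it cannot be reached from the root. */
    static boolean isOrphan(String path, Set<String> existing) {
        return existing.contains(path) && !isTraversable(path, existing);
    }
}
```

For example, with existing paths "/", "/a", "/a/b" and "/x/y", only "/x/y" is
an orphan, because "/x" is missing.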





[jira] [Resolved] (OAK-10193) Garbage collect deleted properties

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim resolved OAK-10193.

Resolution: Fixed

> Garbage collect deleted properties
> --
>
> Key: OAK-10193
> URL: https://issues.apache.org/jira/browse/OAK-10193
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Updated] (OAK-10586) DetailedGC hardening

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10586:
--
Epic Link: OAK-10739

> DetailedGC hardening
> 
>
> Key: OAK-10586
> URL: https://issues.apache.org/jira/browse/OAK-10586
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> Umbrella ticket for hardening of the {{DetailedGC/OAK-10199}} branch, to avoid 
> creating too many tickets.





[jira] [Updated] (OAK-10583) repeat detailedGC also if provided scope not fully processed

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10583:
--
Epic Link: OAK-10739

> repeat detailedGC also if provided scope not fully processed
> 
>
> Key: OAK-10583
> URL: https://issues.apache.org/jira/browse/OAK-10583
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
>  Labels: DetailedGC
>
> Currently {{needrepeat}} is not set if the provided (detailedGC) scope is 
> "complete", i.e. reaches the oldest checkpoint or now - maxTimeMillis.
> However, in particular for the initial detailedGC run, the 
> PROGRESS_BATCH_SIZE will likely be hit and thus prevent the full scope from 
> being scanned.
> A repetition of GC will continue from where the batch-interrupted previous 
> run left off; however, {{needrepeat}} is not correctly set in this case.
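
One way to picture the intended behaviour is the small decision function
below. It is a sketch under assumptions: the exact semantics of
PROGRESS_BATCH_SIZE in Oak are not given in this ticket, so the batch limit is
modelled as a plain document count, and RepeatDecision is a hypothetical name.

```java
// Hypothetical sketch of the repeat decision; not Oak's implementation.
final class RepeatDecision {

    /**
     * A repeat is needed if the scope does not yet reach the target, or if the
     * run stopped early because the batch limit was hit.
     */
    static boolean needRepeat(boolean scopeIsComplete, int docsProcessed, int progressBatchSize) {
        boolean batchInterrupted = docsProcessed >= progressBatchSize;
        return !scopeIsComplete || batchInterrupted;
    }
}
```

The second condition is the case the ticket describes: a "complete" scope that
was nevertheless cut short by the batch limit.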





[jira] [Updated] (OAK-10570) oak-run support for fullgc

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10570:
--
Epic Link: OAK-10739

> oak-run support for fullgc
> --
>
> Key: OAK-10570
> URL: https://issues.apache.org/jira/browse/OAK-10570
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk, oak-run
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
> Fix For: 1.62.0
>
>
> As a follow-up of OAK-10347 we need a full-gc command which runs a full round 
> of detail gc (in DocumentNodeStore that is).
> (split-off from OAK-10382)





[jira] [Updated] (OAK-10688) Keep only traversed state, remove all other revisions

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10688:
--
Epic Link: OAK-10739

> Keep only traversed state, remove all other revisions
> -
>
> Key: OAK-10688
> URL: https://issues.apache.org/jira/browse/OAK-10688
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> As a slightly different algorithm to OAK-10535, this ticket suggests 
> calculating the traversedState of a node, then keeping only those revisions 
> needed for that traversedState and removing all others. The main difference is 
> an inversion of logic: instead of analysing for each revision whether it must 
> be kept or not, this first derives the revisions that must be "kept" from the 
> traversedState, then deletes all others.
> This mechanism applies to all (normal and bundled) properties as well as some 
> DocumentNodeStore internal ones, such as "_deleted".
> Below is a list of assumptions to back this:
> * DetailedGC runs only up to the older of the oldest checkpoint and 
> maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is 
> guaranteed to have only 1 revision (per property) that must be kept - as it 
> is guaranteed to not have modifications (revisions) younger than any 
> checkpoint or maxRevisionAge (24h)
> * To find out which revision(s) must be kept, the node tree is traversed from 
> root (based on current head revision) to the target document.
> * Given the first bullet (that we're only looking at nodes that have only 1 
> revision (each, per property) to keep), this traversed node state thus 
> contains the values of those.
> * Hence, for each property key of the traversed state, the corresponding 
> "commit revision" in the document-local map must be calculated. 
> That local map entry must be kept - all others can be deleted.
> * Note that this also cleans up overwritten branch commits of the same branch 
> (as only the last, relevant one is kept)
> As a result of the above, certain other entries can be deleted, namely:
> * any "_commitRoot" entry no longer referenced by the local document
> * any "_bc" entry no longer referenced by the local document
> Independent of the traversedState and the outcome of the cleanup, what can 
> also be removed is:
> * any "_revisions" entry older than the current sweepRev
> However, "_revisions" entries that might not be referenced by the local 
> document and are younger than the sweepRev must still be kept, as they might 
> be referenced by child documents (through their "_commitRoot" pointing to the 
> current document). Without checking for children and double-checking the 
> actual use, there could as a result still be some garbage "_revisions" 
> entries left.
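
The inversion of logic can be illustrated with the sketch below. It rests on
strong simplifications that are assumptions, not Oak behaviour: each
property's local map is modelled as revision timestamp -> value, and the
"commit revision" for a traversed value is approximated as the newest local
entry whose value equals it. KeepTraversedState and prune are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch of "keep only what the traversed state needs".
final class KeepTraversedState {

    /**
     * For every property in the traversed state, keeps the single local map
     * entry that backs its value; every other entry (and every property that
     * is absent from the traversed state) is dropped.
     */
    static Map<String, TreeMap<Long, String>> prune(
            Map<String, TreeMap<Long, String>> localMaps,
            Map<String, String> traversedState) {
        Map<String, TreeMap<Long, String>> kept = new HashMap<>();
        for (Map.Entry<String, String> prop : traversedState.entrySet()) {
            TreeMap<Long, String> revisions = localMaps.get(prop.getKey());
            if (revisions == null) {
                continue;
            }
            // Walk from newest to oldest and keep the first matching entry.
            for (Map.Entry<Long, String> e : revisions.descendingMap().entrySet()) {
                if (Objects.equals(e.getValue(), prop.getValue())) {
                    TreeMap<Long, String> single = new TreeMap<>();
                    single.put(e.getKey(), e.getValue());
                    kept.put(prop.getKey(), single);
                    break;
                }
            }
        }
        return kept;
    }
}
```

The point of the design is that nothing is decided per revision: the set to
keep is derived once from the traversed state, and everything else is garbage.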





[jira] [Updated] (OAK-10714) DGC : enable embedded verification for tests by default

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10714:
--
Epic Link: OAK-10739

> DGC : enable embedded verification for tests by default
> ---
>
> Key: OAK-10714
> URL: https://issues.apache.org/jira/browse/OAK-10714
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> We should enable embedded verification for DetailedGC for tests by default. 
> (It is already enabled by default via DocumentNodeStoreService, but tests 
> don't always use that)





[jira] [Updated] (OAK-10584) Checkpoints.getOldestRevisionToKeep shouldn't fail if called read-only

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10584:
--
Labels:   (was: DetailedGC)

> Checkpoints.getOldestRevisionToKeep shouldn't fail if called read-only
> 
>
> Key: OAK-10584
> URL: https://issues.apache.org/jira/browse/OAK-10584
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Affects Versions: 1.60.0
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: 1.62.0
>
>
> The exception below could occur and should be avoided:
> {noformat}
>   java.lang.UnsupportedOperationException: Method - findAndUpdate. Params: 
> [settings, key: checkpoint update {data.r17cea1494d3-0-1=REMOVE_MAP_ENTRY 
> null}]
>   at 
> org.apache.jackrabbit.oak.plugins.document.util.ReadOnlyDocumentStoreWrapperFactory$1.invoke(ReadOnlyDocumentStoreWrapperFactory.java:38)
>   at com.sun.proxy.$Proxy0.findAndUpdate(Unknown Source)
>   at 
> org.apache.jackrabbit.oak.plugins.document.Checkpoints.getOldestRevisionToKeep(Checkpoints.java:149)
>   at 
> org.apache.jackrabbit.oak.plugins.document.VersionGCRecommendations.<init>(VersionGCRecommendations.java:181)
> {noformat}





[jira] [Updated] (OAK-10727) log revisionDetailedGcType

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10727:
--
Epic Link: OAK-10739

> log revisionDetailedGcType
> --
>
> Key: OAK-10727
> URL: https://issues.apache.org/jira/browse/OAK-10727
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10632) Make Embedded DetailedGC Configurable for dryRun mode

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10632:
---
Epic Link: OAK-10739

> Make Embedded DetailedGC Configurable for dryRun mode
> -
>
> Key: OAK-10632
> URL: https://issues.apache.org/jira/browse/OAK-10632
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We have introduced embedded verification of detailedGC in both normal & 
> dryRun mode.
> We need to make embedded verification configurable in dryRun mode.





[jira] [Updated] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType (also for VersionGarbageCollectorIT)

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10726:
--
Epic Link: OAK-10739

> Fix BranchCommitGCTest and make it parameterized by gcType (also for 
> VersionGarbageCollectorIT)
> ---
>
> Key: OAK-10726
> URL: https://issues.apache.org/jira/browse/OAK-10726
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10715) embedded verification should use traversed nodeState

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10715:
--
Epic Link: OAK-10739

> embedded verification should use traversed nodeState
> 
>
> Key: OAK-10715
> URL: https://issues.apache.org/jira/browse/OAK-10715
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> Currently DetailedGC's embedded verification uses the headRevision in 
> getNodeAtRevision. It should use the lastRevision of the traversed nodeState 
> instead.





[jira] [Updated] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10724:
--
Epic Link: OAK-10739

> Introduce detailed gc mode that only deletes orphan nodes and deleted 
> properties
> 
>
> Key: OAK-10724
> URL: https://issues.apache.org/jira/browse/OAK-10724
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10736) Collect DetailedGC Stats for DryRun mode

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10736:
---
Epic Link: OAK-10739

> Collect DetailedGC Stats for DryRun mode
> 
>
> Key: OAK-10736
> URL: https://issues.apache.org/jira/browse/OAK-10736
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Updated] (OAK-10728) embedded verification fails if id is from long path

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10728:
--
Epic Link: OAK-10739

> embedded verification fails if id is from long path
> ---
>
> Key: OAK-10728
> URL: https://issues.apache.org/jira/browse/OAK-10728
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>






[jira] [Updated] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep

2024-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-10734:
--
Epic Link: OAK-10739

> DetailedGC must keep entries in "_revisions" for non branch commits, unless 
> older than sweep
> 
>
> Key: OAK-10734
> URL: https://issues.apache.org/jira/browse/OAK-10734
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> Entries in "_revisions" (for non-root documents) could be referenced by 
> children in case of non-branch commits. They must thus be kept, unless they 
> are older than the sweep revision.





[jira] [Updated] (OAK-10633) Make Embedded DetailedGC Configurable in detailedGC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10633:
---
Epic Link: OAK-10739

> Make Embedded DetailedGC Configurable in detailedGC
> ---
>
> Key: OAK-10633
> URL: https://issues.apache.org/jira/browse/OAK-10633
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Updated] (OAK-10378) Add metrics for detailed GC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10378:
---
Epic Link: OAK-10739

> Add metrics for detailed GC
> ---
>
> Key: OAK-10378
> URL: https://issues.apache.org/jira/browse/OAK-10378
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to provide support for collecting metrics for all deletions and 
> updates done as part of detailedGC cycles.





[jira] [Updated] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Node Store

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10739:
---
Description: 
We need to provide the support to collect & remove the full garbage for 
DocumentNodeStore.

At the time of creating this epic garbage includes orphaned nodes, deleted 
properties, unmerged branch commits, and old revisions.

 

This list can be updated in case a new type of garbage is found.

  was:
We need to provide the support to collect & remove the full garbage for 
DocumentNodeStore.

At the time of creating this epic garbage includes orphaned nodes, deleted 
properties, unmerged branch commits, and old revisions.

 

This list can be updated in case new type of garbage is found.


> Provide Support for Detailed Garbage Collection in Document Node Store
> --
>
> Key: OAK-10739
> URL: https://issues.apache.org/jira/browse/OAK-10739
> Project: Jackrabbit Oak
>  Issue Type: Epic
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to provide the support to collect & remove the full garbage for 
> DocumentNodeStore.
> At the time of creating this epic garbage includes orphaned nodes, deleted 
> properties, unmerged branch commits, and old revisions.
>  
> This list can be updated in case a new type of garbage is found.





[jira] [Updated] (OAK-10597) embedded verification for detailedGC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10597:
---
Epic Link: OAK-10739

> embedded verification for detailedGC
> 
>
> Key: OAK-10597
> URL: https://issues.apache.org/jira/browse/OAK-10597
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: DetailedGC
>
> Introduce an option for detailedGC that triggers an embedded verification. 
> That is, it compares the NodeState before and after garbage removal.
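
The idea of comparing the state before and after garbage removal can be
sketched as follows, assuming the visible node state is modelled as a plain
property map. EmbeddedVerification and gcWithVerification are hypothetical
names and do not reflect the option's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a before/after comparison; not Oak's implementation.
final class EmbeddedVerification {

    /**
     * Snapshots the visible state, runs the garbage removal, re-reads the
     * state and reports whether the two are equal.
     */
    static boolean gcWithVerification(Supplier<Map<String, String>> readState, Runnable removeGarbage) {
        // Copy the map so later mutations cannot alter the snapshot.
        Map<String, String> before = new HashMap<>(readState.get());
        removeGarbage.run();
        Map<String, String> after = readState.get();
        return before.equals(after);
    }
}
```

A correct garbage removal changes only what is stored, not what is visible, so
the comparison is expected to succeed; a failure signals that live data was
touched.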





[jira] [Updated] (OAK-10535) Clean up old revisions in a document

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10535:
---
Epic Link: OAK-10739

> Clean up old revisions in a document
> 
>
> Key: OAK-10535
> URL: https://issues.apache.org/jira/browse/OAK-10535
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: José Andrés Cordero Benítez
>Assignee: José Andrés Cordero Benítez
>Priority: Minor
>
> Introduce a way to safely detect and delete old revisions in a document. This 
> could be useful to clean up documents that sometimes grow beyond the supported 
> size in MongoDB (16MB).
> It could also be integrated into the detailed GC.





[jira] [Updated] (OAK-10370) Dry-run mode for full GC

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10370:
---
Epic Link: OAK-10739

> Dry-run mode for full GC
> 
>
> Key: OAK-10370
> URL: https://issues.apache.org/jira/browse/OAK-10370
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Ankita Agarwal
>Assignee: Rishabh Daim
>Priority: Major
>
> For detailed GC (OAK-10199), a dry-run mode is required in which nothing is 
> deleted; garbage such as orphaned branch commits and deleted properties is 
> only listed.





[jira] [Updated] (OAK-10689) Extend oak-run revisions command with "detail" garbage collection

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10689:
---
Epic Link: OAK-10739

> Extend oak-run revisions command with "detail" garbage collection
> -
>
> Key: OAK-10689
> URL: https://issues.apache.org/jira/browse/OAK-10689
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: oak-run
>Reporter: José Andrés Cordero Benítez
>Assignee: José Andrés Cordero Benítez
>Priority: Minor
>
> Extend the oak-run revisions command to perform a detailed cleanup on a given 
> document.





[jira] [Updated] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Node Store

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10739:
---
Summary: Provide Support for Detailed Garbage Collection in Document Node 
Store  (was: Provide Support for Detailed Garbage Collection in Document Store)

> Provide Support for Detailed Garbage Collection in Document Node Store
> --
>
> Key: OAK-10739
> URL: https://issues.apache.org/jira/browse/OAK-10739
> Project: Jackrabbit Oak
>  Issue Type: Epic
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to provide the support to collect & remove the full garbage for 
> DocumentStore.
> At the time of creating this epic garbage includes orphaned nodes, deleted 
> properties, unmerged branch commits, and old revisions.
>  
> This list can be updated in case new type of garbage is found.





[jira] [Updated] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Node Store

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10739:
---
Description: 
We need to provide the support to collect & remove the full garbage for 
DocumentNodeStore.

At the time of creating this epic garbage includes orphaned nodes, deleted 
properties, unmerged branch commits, and old revisions.

 

This list can be updated in case new type of garbage is found.

  was:
We need to provide the support to collect & remove the full garbage for 
DocumentStore.

At the time of creating this epic garbage includes orphaned nodes, deleted 
properties, unmerged branch commits, and old revisions.

 

This list can be updated in case new type of garbage is found.


> Provide Support for Detailed Garbage Collection in Document Node Store
> --
>
> Key: OAK-10739
> URL: https://issues.apache.org/jira/browse/OAK-10739
> Project: Jackrabbit Oak
>  Issue Type: Epic
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>
> We need to provide the support to collect & remove the full garbage for 
> DocumentNodeStore.
> At the time of creating this epic garbage includes orphaned nodes, deleted 
> properties, unmerged branch commits, and old revisions.
>  
> This list can be updated in case new type of garbage is found.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-10199) Skeleton of an additional, extendable "detail" garbage collector based on only "_modified"

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10199:
---
Epic Link: OAK-10739

> Skeleton of an additional, extendable "detail" garbage collector based on 
> only "_modified"
> --
>
> Key: OAK-10199
> URL: https://issues.apache.org/jira/browse/OAK-10199
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Rishabh Daim
>Priority: Major
>
> DocumentNodeStore's revision garbage collector currently doesn't clean up 
> 100% of garbage. Several of those gaps have so far been identified, including:
> * OAK-8646 : "Clean up changes from orphaned branch commits"
> * OAK-10193 : "Garbage collect deleted properties"
> The common aspect of the above is that cleaning up that garbage on an 
> existing repository requires a full scan of the entire repository to find 
> and delete such garbage.
> The current working title for this is "detail gc"
> The ticket here is about creating a skeleton of a garbage collector that the 
> above, individual garbage types can then "hook into".
> There are two parts of the cleanup:
> * an initial, full repository scan
> * an iterative, continuous scan (eg after the above full scan has completed)
> The full repository scan is optional - one could decide to leave the garbage 
> and not worry about it (but enable the continuous scan and thus clean up 
> documents that are changed in the future lazily).
> While the two parts could in theory be based on a different query, it _can_ 
> also be done on the same query.
> One suggested query is to go through all documents where "_modified" is 
> between the previous gc run and an increment, but older than the 
> 'versionGcMaxAgeInSecs' (24h by default) - plus eg taking checkpoints into 
> account.
> A full repository scan is then characterized by setting this "previous gc 
> run" pointer to zero.
> In particular for the full repository scan it is necessary for the gc to run 
> in reasonably small batches - and apply a voluntary throttle, to avoid system 
> overload.
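
The suggested query can be sketched as follows. This is only an illustration under stated assumptions, not Oak's implementation: documents are modelled as an in-memory list, "_modified" is taken to be a timestamp in seconds, checkpoints are ignored, and the names DetailGcSketch, Doc and scan are hypothetical. The scan walks from the "previous gc run" pointer in windows of one increment, stops at now minus the maximum age, and sleeps between windows as a voluntary throttle.

```java
import java.util.ArrayList;
import java.util.List;

class DetailGcSketch {

    /** Hypothetical stand-in for a stored document: an id plus its "_modified" value in seconds. */
    static final class Doc {
        final String id;
        final long modified;

        Doc(String id, long modified) {
            this.id = id;
            this.modified = modified;
        }
    }

    /**
     * Walks the store in windows of "increment" seconds, starting at the
     * "previous gc run" pointer and stopping at nowSecs - maxAgeSecs.
     * Returns the ids visited, grouped per non-empty window.
     */
    static List<List<String>> scan(List<Doc> store, long previousRun, long increment,
                                   long nowSecs, long maxAgeSecs, long throttleMillis) {
        if (increment <= 0) {
            throw new IllegalArgumentException("increment must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        long limit = nowSecs - maxAgeSecs; // documents modified at or after this are too young
        long from = previousRun;
        while (from < limit) {
            long to = Math.min(from + increment, limit);
            List<String> batch = new ArrayList<>();
            for (Doc d : store) {
                // half-open window [from, to) so that no document is visited twice
                if (d.modified >= from && d.modified < to) {
                    batch.add(d.id);
                }
            }
            if (!batch.isEmpty()) {
                batches.add(batch); // a real collector would inspect these documents for garbage
            }
            from = to;
            try {
                Thread.sleep(throttleMillis); // voluntary throttle between windows
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Doc> store = List.of(new Doc("a", 10), new Doc("b", 20),
                                  new Doc("c", 35), new Doc("d", 90));
        // previousRun = 0 corresponds to a full repository scan
        System.out.println(scan(store, 0, 30, 100, 20, 0));
    }
}
```

Passing 0 as the pointer gives the full repository scan; passing the end of the last completed window gives the continuous scan with the same query.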





[jira] [Updated] (OAK-8646) Clean up changes from orphaned branch commits

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-8646:
--
Epic Link: OAK-10739

> Clean up changes from orphaned branch commits
> -
>
> Key: OAK-8646
> URL: https://issues.apache.org/jira/browse/OAK-8646
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Rishabh Daim
>Priority: Major
>
> The Revision Garbage Collector currently does not clean up changes from 
> orphaned branch commits. Those are branch commits that have not been merged 
> but are still present on documents in the DocumentStore.





[jira] [Updated] (OAK-10193) Garbage collect deleted properties

2024-04-03 Thread Rishabh Daim (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishabh Daim updated OAK-10193:
---
Epic Link: OAK-10739

> Garbage collect deleted properties
> --
>
> Key: OAK-10193
> URL: https://issues.apache.org/jira/browse/OAK-10193
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Rishabh Daim
>Assignee: Rishabh Daim
>Priority: Major
>






[jira] [Created] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Store

2024-04-03 Thread Rishabh Daim (Jira)
Rishabh Daim created OAK-10739:
--

 Summary: Provide Support for Detailed Garbage Collection in 
Document Store
 Key: OAK-10739
 URL: https://issues.apache.org/jira/browse/OAK-10739
 Project: Jackrabbit Oak
  Issue Type: Epic
Reporter: Rishabh Daim
Assignee: Rishabh Daim


We need to provide the support to collect & remove the full garbage for 
DocumentStore.

At the time of creating this epic garbage includes orphaned nodes, deleted 
properties, unmerged branch commits, and old revisions.

 

This list can be updated in case a new type of garbage is found.





[jira] [Resolved] (OAK-10375) Binary data in logs related to the haystack property

2024-04-03 Thread Fabrizio Fortino (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabrizio Fortino resolved OAK-10375.

Fix Version/s: 1.62.0
   Resolution: Fixed

> Binary data in logs related to the haystack property
> 
>
> Key: OAK-10375
> URL: https://issues.apache.org/jira/browse/OAK-10375
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Fabrizio Fortino
>Priority: Major
> Fix For: 1.62.0
>
>
> When indexing documents with the {{haystack0}} property, some log messages 
> contain the binary data of the property. In the log below, I replaced the 
> binary data by {{{}{}}}, but it is usually very 
> long. 
> {noformat}
> 16:30:40.107 [main] ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - could not index 
> similarity field for property 
> haystack0 =  
> and definition 
> PropertyDefinition\{name='jcr:content/metadata/imageFeatures/haystack0', 
> propertyType=0, boost=1.0, isRegexp=false, index=true, stored=false, 
> nodeScopeIndex=true, propertyIndex=true, analyzed=false, ordered=false, 
> useInSuggest=false, useInSimilarity=true, nullCheckEnabled=false, 
> notNullCheckEnabled=false, function=null} 
> {noformat}
>  





[jira] [Assigned] (OAK-10375) Binary data in logs related to the haystack property

2024-04-03 Thread Fabrizio Fortino (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabrizio Fortino reassigned OAK-10375:
--

Assignee: Fabrizio Fortino

> Binary data in logs related to the haystack property
> 
>
> Key: OAK-10375
> URL: https://issues.apache.org/jira/browse/OAK-10375
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Fabrizio Fortino
>Priority: Major
>
> When indexing documents with the {{haystack0}} property, some log messages 
> contain the binary data of the property. In the log below, I replaced the 
> binary data by {{{}{}}}, but it is usually very 
> long. 
> {noformat}
> 16:30:40.107 [main] ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - could not index 
> similarity field for property 
> haystack0 =  
> and definition 
> PropertyDefinition\{name='jcr:content/metadata/imageFeatures/haystack0', 
> propertyType=0, boost=1.0, isRegexp=false, index=true, stored=false, 
> nodeScopeIndex=true, propertyIndex=true, analyzed=false, ordered=false, 
> useInSuggest=false, useInSimilarity=true, nullCheckEnabled=false, 
> notNullCheckEnabled=false, function=null} 
> {noformat}
>  





[jira] [Resolved] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Nitin Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitin Gupta resolved OAK-10733.
---
Fix Version/s: 1.62.0
   Resolution: Fixed

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
> Fix For: 1.62.0
>
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some 
> benefit during the reindexing phase.
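
The filtering described above can be sketched as follows. This is a simplified illustration, not the committed change: properties are modelled as a plain map, it is assumed that hidden items are those whose name starts with a colon, and the names HiddenPropertyFilterSketch, isHidden and visibleProperties are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HiddenPropertyFilterSketch {

    /** Assumption: a hidden item is one whose name starts with ':'. */
    static boolean isHidden(String name) {
        return !name.isEmpty() && name.charAt(0) == ':';
    }

    /** Returns a copy of the properties without the hidden ones, keeping insertion order. */
    static Map<String, String> visibleProperties(Map<String, String> properties) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (!isHidden(e.getKey())) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:unstructured");
        props.put(":childOrder", "[a, b]");
        props.put("title", "hello");
        // Only the two visible properties would be written to the flat file
        System.out.println(visibleProperties(props).keySet());
    }
}
```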





[jira] [Commented] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Nitin Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833537#comment-17833537
 ] 

Nitin Gupta commented on OAK-10733:
---

trunk : 
[https://github.com/apache/jackrabbit-oak/commit/2b27df56b9901fe107bcad6aed03c402234f590a]
 

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some 
> benefit during the reindexing phase.





[jira] [Commented] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833528#comment-17833528
 ] 

Julian Reschke commented on OAK-10733:
--

Is this bug resolved with the PR being merged?

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some 
> benefit during the reindexing phase.





[jira] [Updated] (OAK-10733) Filter out hidden properties from content in FlatFileStore

2024-04-03 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10733:
-
Component/s: run-commons

> Filter out hidden properties from content in FlatFileStore
> --
>
> Key: OAK-10733
> URL: https://issues.apache.org/jira/browse/OAK-10733
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run-commons
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently we ignore/filter out hidden nodes while building the FFS but not 
> the hidden properties.
> We however ignore any changes to hidden properties (using the VisibleEditor) 
> during async indexing cycles, so it makes little sense to have these in the 
> FFS.
>  
> This task is to see if these can be removed, and if doing so gives some 
> benefit during the reindexing phase.





[jira] [Commented] (OAK-10730) Log MongoException previously swallowed

2024-04-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833500#comment-17833500
 ] 

Stefan Egli commented on OAK-10730:
---

Suggestion created at https://github.com/apache/jackrabbit-oak/pull/1399

> Log MongoException previously swallowed
> ---
>
> Key: OAK-10730
> URL: https://issues.apache.org/jira/browse/OAK-10730
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk
>Reporter: Stefan Egli
>Assignee: Julian Reschke
>Priority: Major
>
> In 
> [MongoDocumentStore.create|https://github.com/apache/jackrabbit-oak/blob/2e996d78f0a565b17287af5691f2c1be7d2e925d/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java#L1754-L1756]
>  a MongoException is silently swallowed.
> This code is quite ancient - it was created in svn revision 
> [1451586|https://svn.apache.org/viewvc?view=revision&revision=1451586] - we 
> might thus want to be careful not to cause noise in a case where this 
> swallowing was legitimate.
> I would thus suggest logging this at debug or info level.
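
The suggestion can be sketched as follows. This is a hypothetical illustration rather than the change in the pull request: it uses java.util.logging (where FINE roughly corresponds to debug) instead of the project's own logging setup, a generic RuntimeException stands in for MongoException, and the boolean-returning create method and the class name SwallowedExceptionSketch are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SwallowedExceptionSketch {

    private static final Logger LOG =
            Logger.getLogger(SwallowedExceptionSketch.class.getName());

    /**
     * Runs the insert and reports success as a boolean. The exception is still
     * not propagated, but it is now logged at a low level instead of vanishing.
     */
    static boolean create(Runnable insert) {
        try {
            insert.run();
            return true;
        } catch (RuntimeException e) {
            // FINE keeps the default log output quiet in case the swallowing was legitimate
            LOG.log(Level.FINE, "create failed, returning false", e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(create(() -> { }));
        System.out.println(create(() -> { throw new IllegalStateException("duplicate key"); }));
    }
}
```

Logging at a low level keeps the caller's behaviour unchanged while making the failure visible to anyone who enables that level.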





[jira] [Commented] (OAK-10708) DocumentNodeStore: error-log failures to update the journal

2024-04-03 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833495#comment-17833495
 ] 

Julian Reschke commented on OAK-10708:
--

trunk: 
[5d163c8398|https://github.com/apache/jackrabbit-oak/commit/5d163c83989271fbcaebc6c36270a0aed64d992a]

> DocumentNodeStore: error-log failures to update the journal 
> 
>
> Key: OAK-10708
> URL: https://issues.apache.org/jira/browse/OAK-10708
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>






[jira] [Updated] (OAK-10708) DocumentNodeStore: error-log failures to update the journal

2024-04-03 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10708:
-
Labels: candidate_oak_1_22  (was: )

> DocumentNodeStore: error-log failures to update the journal 
> 
>
> Key: OAK-10708
> URL: https://issues.apache.org/jira/browse/OAK-10708
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_22
> Fix For: 1.62.0
>
>






[jira] [Resolved] (OAK-10708) DocumentNodeStore: error-log failures to update the journal

2024-04-03 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-10708.
--
Fix Version/s: 1.62.0
   (was: 1.64.0)
   Resolution: Fixed

> DocumentNodeStore: error-log failures to update the journal 
> 
>
> Key: OAK-10708
> URL: https://issues.apache.org/jira/browse/OAK-10708
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.62.0
>
>






[jira] [Resolved] (OAK-10738) Add default values to user-sync configuration section

2024-04-03 Thread Angela Schreiber (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angela Schreiber resolved OAK-10738.

Fix Version/s: 1.62.0
   Resolution: Fixed

pushed the fix and will try to deploy the site later today.

> Add default values to user-sync configuration section 
> --
>
> Key: OAK-10738
> URL: https://issues.apache.org/jira/browse/OAK-10738
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Minor
> Fix For: 1.62.0
>
>
> the documentation for the external user sync feature does not list the 
> default values of the configuration options at 
> [https://jackrabbit.apache.org/oak/docs/security/authentication/external/defaultusersync.html#Configuration]
> in addition the table is missing the {{enableRFC7613UsercaseMappedProfile}} 
> option both for users and groups.





[jira] [Updated] (OAK-10738) Add default values to user-sync configuration section

2024-04-03 Thread Angela Schreiber (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angela Schreiber updated OAK-10738:
---
Description: 
the documentation for the external user sync feature does not list the default 
values of the configuration options at 
[https://jackrabbit.apache.org/oak/docs/security/authentication/external/defaultusersync.html#Configuration]

in addition the table is missing the {{enableRFC7613UsercaseMappedProfile}} 
option both for users and groups.

  was:the documentation for the external user sync feature does not list the 
default values of the configuration options at 
https://jackrabbit.apache.org/oak/docs/security/authentication/external/defaultusersync.html#Configuration


> Add default values to user-sync configuration section 
> --
>
> Key: OAK-10738
> URL: https://issues.apache.org/jira/browse/OAK-10738
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Minor
>
> the documentation for the external user sync feature does not list the 
> default values of the configuration options at 
> [https://jackrabbit.apache.org/oak/docs/security/authentication/external/defaultusersync.html#Configuration]
> in addition the table is missing the {{enableRFC7613UsercaseMappedProfile}} 
> option both for users and groups.





[jira] [Created] (OAK-10738) Add default values to user-sync configuration section

2024-04-03 Thread Angela Schreiber (Jira)
Angela Schreiber created OAK-10738:
--

 Summary: Add default values to user-sync configuration section 
 Key: OAK-10738
 URL: https://issues.apache.org/jira/browse/OAK-10738
 Project: Jackrabbit Oak
  Issue Type: Documentation
  Components: doc
Reporter: Angela Schreiber
Assignee: Angela Schreiber


the documentation for the external user sync feature does not list the default 
values of the configuration options at 
https://jackrabbit.apache.org/oak/docs/security/authentication/external/defaultusersync.html#Configuration





[jira] [Created] (OAK-10737) weird link in README.md

2024-04-03 Thread Julian Reschke (Jira)
Julian Reschke created OAK-10737:


 Summary: weird link in README.md
 Key: OAK-10737
 URL: https://issues.apache.org/jira/browse/OAK-10737
 Project: Jackrabbit Oak
  Issue Type: Task
Reporter: Julian Reschke


oak-core for some reason links to oak-api's README.


