[ 
https://issues.apache.org/jira/browse/OAK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529188#comment-15529188
 ] 

Marcel Reutegger commented on OAK-4826:
---------------------------------------

The problem is that the failing test interleaves two async index update calls 
in a specific way and expects this to succeed. The first index update is 
triggered and creates a checkpoint C1, but before it proceeds with the actual 
index update and persists C1 as the reference checkpoint, another index update 
is triggered and completes with a checkpoint C2. The second update then 
considers C1 orphaned because the creation timestamp of C1 is older than that 
of C2. I'm not sure if this is really a valid test case. Shouldn't C1 be 
protected by the lease mechanism?

I will change the cleanup implementation in any case to protect against races 
when a checkpoint is created and later persisted as a reference checkpoint 
within the lease time frame.
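
As a minimal sketch of such a guard (not the actual change): a checkpoint 
younger than the lease time frame is never treated as orphaned, because the 
index update that created it may not yet have persisted it as its reference 
checkpoint. The class LeaseGuard, the method mayRemove and all parameter names 
are hypothetical and not part of the Oak API.

```java
public final class LeaseGuard {

    private LeaseGuard() {
    }

    /**
     * Hypothetical helper, not Oak API. Returns true only if the checkpoint
     * is older than the lease time frame, i.e. an update that created it
     * has had a full lease period to persist it as a reference checkpoint.
     *
     * @param createdMillis creation time of the checkpoint
     * @param nowMillis     current time
     * @param leaseMillis   assumed length of the async indexing lease
     */
    public static boolean mayRemove(long createdMillis, long nowMillis, long leaseMillis) {
        return nowMillis - createdMillis > leaseMillis;
    }
}
```

With this check in front of the cleanup, C1 from the interleaving above would 
be kept as long as it is younger than the lease time frame, even though C2 is 
newer.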

> Auto removal of orphaned checkpoints
> ------------------------------------
>
>                 Key: OAK-4826
>                 URL: https://issues.apache.org/jira/browse/OAK-4826
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Chetan Mehrotra
>            Assignee: Marcel Reutegger
>              Labels: candidate_oak_1_4
>             Fix For: 1.6, 1.5.11
>
>         Attachments: OAK-4826.patch, OAK-4826.patch, OAK-4826.patch
>
>
> Currently, if orphaned checkpoints are present in a running system, they 
> prevent the revision gc (compaction for segment) from being effective. 
> So far the practice has been to use the {{oak-run checkpoints rm-unreferenced}} 
> command to clean them up manually. This was left manual because it was not 
> possible to determine whether a given checkpoint is in use or not. 
> rm-unreferenced works on the basis that checkpoints are only made by 
> AsyncIndexUpdate and hence can check whether a checkpoint is in use by cross 
> checking with the {{:async}} state. Doing it in auto mode is risky as the 
> {{checkpoint}} api can be used by any module.
> With OAK-2314 we also record some metadata like {{creator}} and {{name}}. 
> This can be used for auto cleanup. For example, in a running system the 
> following checkpoints are listed:
> {noformat}
> Mon Sep 19 18:02:09 EDT 2016  Sun Jun 16 18:02:09 EDT 2019    
> r15744787d0a-1-1        
>  
> creator=AsyncIndexUpdate
> name=fulltext-async
> thread=sling-default-4070-Registered Service.653
>  
> Mon Sep 19 18:02:09 EDT 2016  Sun Jun 16 18:02:09 EDT 2019    
> r15744787d0a-0-1        
>  
> creator=AsyncIndexUpdate
> name=async
> thread=sling-default-4072-Registered Service.656
>  
> Fri Aug 19 18:57:33 EDT 2016  Thu May 16 18:57:33 EDT 2019    
> r156a50612e1-1-1        
>  
> creator=AsyncIndexUpdate
> name=async
> thread=sling-default-10-Registered Service.654
>  
> Wed Aug 10 12:13:20 EDT 2016  Tue May 07 12:25:52 EDT 2019    
> r156753ac38d-0-1        
>  
> creator=AsyncIndexUpdate
> name=async
> thread=sling-default-6041-Registered Service.1966
> {noformat}
> As can be seen, the last 2 checkpoints are orphaned and would prevent 
> revision gc. For auto mode we can use the following heuristic:
> # List all current checkpoints
> # Only keep the latest checkpoint for a given {{creator}} and {{name}} combo. 
> Other entries for the same pair which are older, i.e. have an earlier 
> creation time, can be considered orphaned and deleted
> This logic can be implemented in 
> {{org.apache.jackrabbit.oak.checkpoint.Checkpoints}} and can be invoked by 
> the Revision GC logic (both in DocumentNodeStore and SegmentNodeStore) to 
> determine the base revision to keep
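
The heuristic quoted above can be sketched roughly as follows. This is an 
illustration under assumptions, not the code in the attached patches: the 
class OrphanedCheckpoints, the value type Info and the method findOrphaned are 
hypothetical names, and skipping checkpoints without creator/name metadata is 
an assumption made here to stay safe for checkpoints created by other modules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class OrphanedCheckpoints {

    private OrphanedCheckpoints() {
    }

    /** Hypothetical value type; fields mirror the metadata in the listing. */
    public static final class Info {
        final String id;
        final long created;
        final String creator;
        final String name;

        public Info(String id, long created, String creator, String name) {
            this.id = id;
            this.created = created;
            this.creator = creator;
            this.name = name;
        }
    }

    // Key for the creator/name pair; newline is assumed not to occur in either.
    private static String key(Info cp) {
        return cp.creator + "\n" + cp.name;
    }

    /** Returns the ids of all checkpoints that are not the latest of their pair. */
    public static List<String> findOrphaned(List<Info> checkpoints) {
        // Step 1: remember the latest checkpoint for each creator/name pair.
        Map<String, Info> latest = new HashMap<>();
        for (Info cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue; // assumption: no metadata means never auto-removed
            }
            latest.merge(key(cp), cp, (a, b) -> a.created >= b.created ? a : b);
        }
        // Step 2: every other checkpoint of the same pair is considered orphaned.
        List<String> orphaned = new ArrayList<>();
        for (Info cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue;
            }
            if (latest.get(key(cp)) != cp) {
                orphaned.add(cp.id);
            }
        }
        return orphaned;
    }
}
```

Applied to the four checkpoints in the listing, this keeps r15744787d0a-1-1 
(fulltext-async) and r15744787d0a-0-1 (async) and reports the two older async 
checkpoints as orphaned.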



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
