[ 
https://issues.apache.org/jira/browse/OAK-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Varga updated OAK-7209:
-----------------------------
    Affects Version/s: 1.10
                       1.8.2

> Race condition can resurrect blobs during blob GC
> -------------------------------------------------
>
>                 Key: OAK-7209
>                 URL: https://issues.apache.org/jira/browse/OAK-7209
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-plugins
>    Affects Versions: 1.6.5, 1.10, 1.8.2
>            Reporter: Csaba Varga
>            Assignee: Amit Jain
>            Priority: Minor
>
> A race condition exists between the scheduled blob ID publishing process and 
> the GC process that can resurrect the blobs being deleted by the GC. This is 
> how it can happen:
>  # MarkSweepGarbageCollector.collectGarbage() starts running.
>  # As part of the preparation for sweeping, BlobIdTracker.globalMerge() is 
> called, which merges all blob ID records from the blob store into the local 
> tracker.
>  # Sweeping begins deleting files.
>  # BlobIdTracker.snapshot() gets called by the scheduler. It pushes all blob 
> ID records that were collected and merged in step 2 back into the blob store, 
> then deletes the local copies.
>  # Sweeping completes and tries to remove the successfully deleted blobs from 
> the tracker. Step 4 already deleted those records from the local files, so 
> nothing gets removed.
> The end result is that all blobs removed during the GC run are still 
> considered alive, which causes warnings when later GC runs try to remove them 
> again. 
> The risk is higher the longer the sweep runs, but it can happen during a 
> short but badly timed GC run as well. (We've found it during a GC run that 
> took more than 11 hours to complete.)
> I can see two ways to approach this:
>  # Suspend the execution of BlobIdTracker.snapshot() while Blob GC is in 
> progress. This requires adding new methods to the BlobTracker interface to 
> allow suspending and resuming snapshotting of the tracker.
>  # Have the two overloads of BlobIdTracker.remove() do a globalMerge() before 
> trying to remove anything. This ensures that even if a snapshot() call 
> happened during the GC run, all IDs are "pulled back" into the local tracker 
> and can be removed successfully.
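
The interleaving described above, and the effect of option 2, can be sketched with a toy model. This is only an illustration under stated assumptions, not the Oak implementation: ToyTracker, its two in-memory sets, runGc() and removeAfterMerge() are hypothetical names, and the sketch assumes that globalMerge() moves records out of the blob store while snapshot() moves them back.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the reported race; these are NOT the real Oak classes. */
class BlobTrackerRaceSketch {

    /** Hypothetical stand-in for BlobIdTracker: local records plus records in the blob store. */
    static class ToyTracker {
        final Set<String> local = new TreeSet<>();
        final Set<String> blobStore = new TreeSet<>();

        /** Assumption: pulls every record from the blob store into the local tracker. */
        void globalMerge() {
            local.addAll(blobStore);
            blobStore.clear();
        }

        /** Pushes the local records into the blob store, then deletes the local copies. */
        void snapshot() {
            blobStore.addAll(local);
            local.clear();
        }

        /** Behaviour described in the report: only local records can be removed. */
        void remove(Set<String> deleted) {
            local.removeAll(deleted);
        }

        /** Option 2 (hypothetical name): merge first, so published IDs are pulled back. */
        void removeAfterMerge(Set<String> deleted) {
            globalMerge();
            local.removeAll(deleted);
        }

        /** Every ID the tracker still considers alive, wherever it is stored. */
        Set<String> trackedIds() {
            Set<String> all = new TreeSet<>(local);
            all.addAll(blobStore);
            return all;
        }
    }

    /** Replays steps 2 to 5 of the report and returns the IDs still tracked afterwards. */
    static Set<String> runGc(boolean mergeBeforeRemove) {
        ToyTracker tracker = new ToyTracker();
        tracker.blobStore.addAll(Set.of("a", "b", "c"));

        tracker.globalMerge();                   // step 2: preparation for sweeping
        Set<String> deleted = Set.of("a", "b");  // step 3: the sweep deletes these blobs
        tracker.snapshot();                      // step 4: scheduler fires mid-sweep
        if (mergeBeforeRemove) {
            tracker.removeAfterMerge(deleted);   // step 5 with option 2
        } else {
            tracker.remove(deleted);             // step 5 as reported
        }
        return tracker.trackedIds();
    }

    public static void main(String[] args) {
        System.out.println("plain remove():       " + runGc(false));
        System.out.println("merge-first remove(): " + runGc(true));
    }
}
```

In this model the plain remove() leaves "a" and "b" tracked even though the sweep deleted them, while the merge-first variant leaves only "c". Option 1 would instead prevent step 4 from running at all while the GC is in progress.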



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
