[ https://issues.apache.org/jira/browse/OAK-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Jain reassigned OAK-7209:
--
Assignee: Amit Jain
> Race condition can resurrect blobs during blob GC
> -
>
> Key: OAK-7209
> URL: https://issues.apache.org/jira/browse/OAK-7209
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: blob-plugins
> Affects Versions: 1.6.5
> Reporter: Csaba Varga
> Assignee: Amit Jain
> Priority: Minor
>
> A race condition exists between the scheduled blob ID publishing process and
> the GC process that can resurrect the blobs being deleted by the GC. This is
> how it can happen:
> # MarkSweepGarbageCollector.collectGarbage() starts running.
> # As part of the preparation for sweeping, BlobIdTracker.globalMerge() is
> called, which merges all blob ID records from the blob store into the local
> tracker.
> # Sweeping begins deleting files.
> # BlobIdTracker.snapshot() gets called by the scheduler. It pushes all blob
> ID records that were collected and merged in step 2 back into the blob store,
> then deletes the local copies.
> # Sweeping completes and tries to remove the successfully deleted blobs from
> the tracker. Step 4 already deleted those records from the local files, so
> nothing gets removed.
> The end result is that all blobs removed during the GC run are still
> considered alive, which causes warnings when later GC runs try to remove them again.
> The risk is higher the longer the sweep runs, but it can happen during a
> short but badly timed GC run as well. (We've found it during a GC run that
> took more than 11 hours to complete.)
> I can see two ways to approach this:
> # Suspend the execution of BlobIdTracker.snapshot() while Blob GC is in
> progress. This requires adding new methods to the BlobTracker interface to
> allow suspending and resuming snapshotting of the tracker.
> # Have the two overloads of BlobIdTracker.remove() do a globalMerge() before
> trying to remove anything. This ensures that even if a snapshot() call
> happened during the GC run, all IDs are "pulled back" into the local tracker
> and can be removed successfully.
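
The interleaving described in steps 1-5, and the effect of the second proposal, can be illustrated with a small single-threaded replay. This is only a toy sketch, not Oak code: the ToyTracker class, its fields, and the blob IDs are hypothetical, and it assumes that globalMerge() moves records from the blob store into the local tracker while snapshot() moves them back. The real BlobIdTracker works on files and may differ in detail.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy replay of the race; a simplified model, NOT Oak's BlobIdTracker. */
final class BlobGcRaceSketch {

    /** Hypothetical stand-in: an ID record lives either locally or in the blob store. */
    static class ToyTracker {
        final Set<String> local = new TreeSet<>(); // records in the local tracker
        final Set<String> store = new TreeSet<>(); // records kept in the blob store
        final boolean mergeBeforeRemove;           // true = behaviour of proposal 2

        ToyTracker(boolean mergeBeforeRemove) {
            this.mergeBeforeRemove = mergeBeforeRemove;
        }

        // Step 2: pull all ID records from the store into the local tracker.
        void globalMerge() {
            local.addAll(store);
            store.clear();
        }

        // Step 4: push local records to the store, then drop the local copies.
        void snapshot() {
            store.addAll(local);
            local.clear();
        }

        // Step 5: forget the IDs of blobs that the sweep deleted.
        void remove(Set<String> deleted) {
            if (mergeBeforeRemove) {
                globalMerge(); // pull back anything a snapshot pushed out meanwhile
            }
            local.removeAll(deleted);
        }

        // Every ID still considered alive, wherever it is recorded.
        Set<String> trackedIds() {
            Set<String> all = new TreeSet<>(local);
            all.addAll(store);
            return all;
        }
    }

    /** Replays the problematic ordering and returns the IDs still tracked afterwards. */
    static Set<String> replay(boolean mergeBeforeRemove) {
        ToyTracker tracker = new ToyTracker(mergeBeforeRemove);
        tracker.store.addAll(Set.of("live-1", "garbage-1", "garbage-2"));

        tracker.globalMerge();                                   // step 2
        Set<String> deleted = Set.of("garbage-1", "garbage-2");  // step 3: sweep deletes these
        tracker.snapshot();                                      // step 4: scheduler fires mid-sweep
        tracker.remove(deleted);                                 // step 5
        return tracker.trackedIds();
    }

    public static void main(String[] args) {
        System.out.println("remove() as described:  " + replay(false));
        System.out.println("merge before remove():  " + replay(true));
    }
}
```

In this model, replay(false) still reports the two deleted IDs as tracked, while replay(true) leaves only live-1. The first proposal is not modelled here; it would instead keep step 4 from running at all while the GC is in progress.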
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)