Scott Yuan created OAK-11817:
--------------------------------

             Summary: Add configurable strict blob verification to 
RemoteBlobProcessor to prevent missing blob files in Cold Standby
                 Key: OAK-11817
                 URL: https://issues.apache.org/jira/browse/OAK-11817
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: segment-tar
    Affects Versions: 1.22.22
         Environment: RHEL 9 + JDK 11 + Apache Sling 1.22
            Reporter: Scott Yuan


With a properly configured TarMK cold standby (e.g., Adobe Experience Manager 
6.5.23) that uses Apache Jackrabbit Oak segment-tar with an external 
BlobStore, the standby occasionally ends up with {*}randomly missing blobs 
under heavy load{*}. After investigating the Apache Jackrabbit Oak source 
code, it appears that the current logic in _RemoteBlobProcessor.java_ assumes 
that if a _SegmentBlob_ has a _blobId_ and _blob.getReference()_ is not null, 
then the blob is physically present and readable.

However, in real-world scenarios, especially with eventual or out-of-band blob 
synchronization (e.g., via rsync), this assumption can be incorrect. The blob 
may be:
 * Not yet copied
 * Deleted by GC
 * Corrupted or unreadable

This leads to runtime errors when the standby node tries to read missing blobs 
that were assumed present.

+*Proposal:*+
Introduce a new *OSGi configuration property* in _StandbyStoreService_ called 
_strictBlobVerify_. When enabled:
 * {{RemoteBlobProcessor}} will *attempt to open and read a few bytes* from the 
blob to verify it is {*}physically present and readable{*}.

 * If the check fails, the blob is {*}re-fetched from the primary node{*}.
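The "open and read a few bytes" check could look roughly like the sketch below. It is only an illustration: _BlobProbe_ and _isReadable_ are hypothetical names, not existing Oak classes, and the stream opener is assumed to be something like a call to _Blob.getNewStream()_ on the standby's local blob. An empty blob (first read returns -1) is treated as readable here, which is a design assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Oak. */
public final class BlobProbe {

    private BlobProbe() {
    }

    /**
     * Returns true if the stream can be opened and a single read succeeds.
     * Any exception while opening, reading, or closing counts as "not readable".
     */
    public static boolean isReadable(Callable<InputStream> opener) {
        try (InputStream in = opener.call()) {
            if (in == null) {
                return false;
            }
            in.read(); // one byte is enough to force the backing file to be opened
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A blob whose bytes are available locally.
        System.out.println(isReadable(() -> new ByteArrayInputStream(new byte[] {1, 2, 3})));
        // A blob whose reference resolves but whose file has not been copied yet.
        System.out.println(isReadable(() -> {
            throw new FileNotFoundException("blob file missing");
        }));
    }
}
```

Reading a single byte keeps the cost of the probe low for large binaries, although it cannot detect corruption further into the file.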

This adds a safeguard against false positives that arise from checking only 
that a reference exists, and makes Cold Standby more robust in environments 
with non-instantaneous blob synchronization. Administrators can toggle strict 
blob verification depending on their setup (e.g. dev vs production).

+*Implementation Plan:*+
 # Add _strictBlobVerify_ to _StandbyStoreServiceConfiguration_
 # Read this flag in _StandbyStoreService_
 # Pass it to _RemoteBlobProcessor_
 # In {_}RemoteBlobProcessor.shouldFetchBinary(){_}, verify that the 
referenced blob is readable when _strictBlobVerify_ is enabled.
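To make the plan concrete, here is a minimal sketch of how the flag might change the fetch decision. It does not reproduce the actual Oak code: _StrictFetchDecision_ and _BlobView_ are hypothetical stand-ins for _RemoteBlobProcessor_ and _SegmentBlob_, the readability check is injected as a predicate, and treating a null _blobId_ as an inline binary is an assumption.

```java
import java.util.function.Predicate;

/** Illustrative sketch only; names are hypothetical and not taken from Oak. */
public final class StrictFetchDecision {

    /** Stand-in for the two SegmentBlob accessors mentioned in this issue. */
    public interface BlobView {
        String getBlobId();     // assumed null for inline binaries
        String getReference();  // assumed null when the blob store cannot resolve it
    }

    private final boolean strictBlobVerify;
    private final Predicate<BlobView> readable;

    public StrictFetchDecision(boolean strictBlobVerify, Predicate<BlobView> readable) {
        this.strictBlobVerify = strictBlobVerify;
        this.readable = readable;
    }

    public boolean shouldFetchBinary(BlobView blob) {
        if (blob.getBlobId() == null) {
            return false; // inline binary, nothing to fetch
        }
        if (blob.getReference() == null) {
            return true; // unknown to the local blob store, fetch it
        }
        if (!strictBlobVerify) {
            return false; // existing behavior: a non-null reference is trusted
        }
        return !readable.test(blob); // strict mode: fetch again if the bytes cannot be read
    }

    /** Small factory used only by the demo below. */
    public static BlobView blob(String id, String reference) {
        return new BlobView() {
            @Override
            public String getBlobId() {
                return id;
            }

            @Override
            public String getReference() {
                return reference;
            }
        };
    }

    public static void main(String[] args) {
        BlobView stale = blob("id-1", "ref-1");      // reference exists
        Predicate<BlobView> fileMissing = b -> false; // but the file cannot be read
        System.out.println(new StrictFetchDecision(false, fileMissing).shouldFetchBinary(stale)); // false
        System.out.println(new StrictFetchDecision(true, fileMissing).shouldFetchBinary(stale));  // true
    }
}
```

With the flag off, the decision is the same as the reference-only check described above, so existing deployments would see no change in behavior.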

+*Benefits:*+
 * Improves reliability of Cold Standby in environments with delayed or 
out-of-band blob sync

 * Prevents silent corruption or missing blobs

 * Configurable to preserve existing behavior for users who don’t need it

 

Example Configuration:
{noformat}
# org.apache.jackrabbit.oak.plugins.segment.standby.store.StandbyStoreService.cfg
strictBlobVerify=true{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
