[
https://issues.apache.org/jira/browse/OAK-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Reschke updated OAK-12166:
---------------------------------
Description:
MODE 1 of RDBVersionGC was introduced as part of an earlier VersionGC
refactoring that split the garbage collection implementation into a legacy
compatibility mode and newer, optimized execution paths (see, for example,
related RDBVersionGC tickets such as OAK-8932, which introduced MODE=1 as a
fallback for compatibility and database-specific constraints).
While this ensured backward compatibility and stability across different RDBMS
setups, it introduced a performance regression in MODE 1: the implementation
did not use the SD (split document type) columns even when the schema provided
them, so it ran unnecessary full scans of split documents instead of
restricting queries through indexed metadata.
This ticket fixes the regression: MODE 1 now detects whether the SD columns
are present and, if so, includes them in its query conditions. This aligns its
behavior with the optimized GC path while preserving compatibility with older
schemas that lack those columns.
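
To illustrate the intended behavior, here is a minimal sketch of building a
query that adds SD conditions only when the columns exist. It is not the actual
RDBVersionGC code: the class, the method names, the base condition "ID > ?" and
the way column names are passed in are all hypothetical, and the column names
SDTYPE and SDMAXREVTIME are assumed here rather than taken from this ticket.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the actual RDBVersionGC implementation.
final class SplitDocQuerySketch {

    // Assumed names of the split document (SD) metadata columns.
    static final String SD_TYPE = "SDTYPE";
    static final String SD_MAX_REV_TIME = "SDMAXREVTIME";

    private SplitDocQuerySketch() {
    }

    // Returns true only if both SD columns are present (case-insensitive).
    // In a real setup the column names could come from JDBC metadata.
    static boolean hasSdColumns(Set<String> columnNames) {
        boolean hasType = false;
        boolean hasMaxRevTime = false;
        for (String name : columnNames) {
            String upper = name.toUpperCase(Locale.ROOT);
            if (upper.equals(SD_TYPE)) {
                hasType = true;
            } else if (upper.equals(SD_MAX_REV_TIME)) {
                hasMaxRevTime = true;
            }
        }
        return hasType && hasMaxRevTime;
    }

    // Builds a parameterized select. Without SD columns the legacy query is
    // returned unchanged; with them, the scan is restricted by SD type and
    // by the maximum revision time.
    static String buildSelect(String table, Set<String> columnNames, int sdTypeCount) {
        StringBuilder sql = new StringBuilder("select ID from ")
                .append(table)
                .append(" where ID > ?");
        if (sdTypeCount > 0 && hasSdColumns(columnNames)) {
            sql.append(" and ").append(SD_TYPE).append(" in (");
            for (int i = 0; i < sdTypeCount; i++) {
                sql.append(i == 0 ? "?" : ", ?");
            }
            sql.append(") and ").append(SD_MAX_REV_TIME).append(" < ?");
        }
        return sql.toString();
    }
}
```

With an older schema the sketch falls back to the plain query, so callers must
still filter split documents after reading them; with the SD columns present,
the database can do that filtering itself.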
> RDBVersionGC MODE 1 (old method) should leverage SD columns when available
> --------------------------------------------------------------------------
>
> Key: OAK-12166
> URL: https://issues.apache.org/jira/browse/OAK-12166
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: rdbmk
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Priority: Major
> Labels: candidate_oak_1_22
>
> MODE 1 of RDBVersionGC was introduced as part of an earlier VersionGC
> refactoring that split the garbage collection implementation into a legacy
> compatibility mode and newer, optimized execution paths (see, for example,
> related RDBVersionGC tickets such as OAK-8932, which introduced MODE=1 as a
> fallback for compatibility and database-specific constraints).
> While this ensured backward compatibility and stability across different
> RDBMS setups, it introduced a performance regression in MODE 1: the
> implementation did not use the SD (split document type) columns even when
> the schema provided them, so it ran unnecessary full scans of split
> documents instead of restricting queries through indexed metadata.
> This ticket fixes the regression: MODE 1 now detects whether the SD columns
> are present and, if so, includes them in its query conditions. This aligns
> its behavior with the optimized GC path while preserving compatibility with
> older schemas that lack those columns.