[
https://issues.apache.org/jira/browse/OAK-12212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joerg Hoh reassigned OAK-12212:
-------------------------------
Assignee: Joerg Hoh
> Drifts in PersistentDiskCache.cacheSize counter
> -----------------------------------------------
>
> Key: OAK-12212
> URL: https://issues.apache.org/jira/browse/OAK-12212
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: segment-azure
> Affects Versions: 2.0.0
> Reporter: Joerg Hoh
> Assignee: Joerg Hoh
> Priority: Major
>
> h2. Observation
> A heap dump of a long-running instance shows:
> * PersistentDiskCache.maxCacheSizeBytes ≈ 20 GiB (matches the configured
> value)
> * AbstractPersistentCache.cacheSize (an AtomicLong, inherited) ≈ 80 GiB —
> roughly 4× the configured maximum
> The actual cache directory on disk stays at or below the configured limit;
> only the in-memory counter has run away.
> h2. Root cause
> {{PersistentDiskCache.writeSegment(...)}} adds {{fileSize}} to the in-memory
> {{cacheSize}} on every invocation that reaches the write body, but the
> corresponding file on disk is replaced — not added — when the same segment id
> is written more than once. The {{writesPending}} guard inside {{writeSegment}}
> only prevents concurrently running tasks for the same id; it does not prevent
> sequentially submitted tasks. On POSIX file systems, {{Files.move(...,
> ATOMIC_MOVE)}} maps to rename(2) and silently replaces the destination, so
> the second (and subsequent) writes leave the directory unchanged in size
> while still incrementing the counter.
> The eviction loop ({{cleanUpInternal}}) walks the directory and subtracts the
> actual length of each deleted file once. The "phantom" bytes contributed by
> redundant writes are therefore never repaid and accumulate monotonically over
> the lifetime of the JVM.
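
The accounting described above can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX file system where an atomic move replaces an existing destination. The class CounterDriftDemo and all of its methods are hypothetical stand-ins and not the Oak implementation; only the two behaviours taken from the description are modelled: every write adds the file size to the counter, and eviction subtracts the actual length of each deleted file once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the accounting described in this issue; not Oak code.
final class CounterDriftDemo {

    // Plays the role of AbstractPersistentCache.cacheSize.
    final AtomicLong cacheSize = new AtomicLong();
    private final Path dir;

    CounterDriftDemo(Path dir) {
        this.dir = dir;
    }

    // Write to a temp file, move it over the target, then add the size unconditionally.
    void writeSegment(String segmentId, byte[] data) throws IOException {
        Path tmp = Files.createTempFile(dir, segmentId, ".part");
        Files.write(tmp, data);
        // Assumption: POSIX semantics, so an existing target is silently replaced.
        Files.move(tmp, dir.resolve(segmentId), StandardCopyOption.ATOMIC_MOVE);
        cacheSize.addAndGet(data.length);
    }

    // Snapshot of the directory contents, taken before any deletion.
    private List<Path> listFiles() throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                result.add(f);
            }
        }
        return result;
    }

    long bytesOnDisk() throws IOException {
        long total = 0;
        for (Path f : listFiles()) {
            total += Files.size(f);
        }
        return total;
    }

    // Eviction as described: subtract each deleted file's actual length once.
    void evictAll() throws IOException {
        for (Path f : listFiles()) {
            long length = Files.size(f);
            Files.delete(f);
            cacheSize.addAndGet(-length);
        }
    }

    // Returns {counter before eviction, bytes on disk before eviction, counter after eviction}.
    static long[] run(int writes, int segmentBytes) {
        try {
            Path dir = Files.createTempDirectory("drift-demo");
            CounterDriftDemo cache = new CounterDriftDemo(dir);
            for (int i = 0; i < writes; i++) {
                cache.writeSegment("segment-0001", new byte[segmentBytes]);
            }
            long counterBefore = cache.cacheSize.get();
            long diskBefore = cache.bytesOnDisk();
            cache.evictAll();
            Files.delete(dir);
            return new long[] {counterBefore, diskBefore, cache.cacheSize.get()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = run(4, 1024);
        System.out.println("counter=" + r[0] + " disk=" + r[1] + " afterEviction=" + r[2]);
    }
}
```

Under the stated POSIX assumption, four sequential writes of a 1024-byte segment leave 1024 bytes on disk while the counter reads 4096. Evicting the single file repays only 1024, so 3072 phantom bytes remain on the counter with an empty directory.
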
> In addition, two smaller contributing factors keep the drift unidirectional
> (upward):
> * {{cacheSize}} is initialized to 0 and is never reconciled against the existing
> cache directory at startup; it relies entirely on incremental accounting
> being correct.
> * The error branch of {{writeSegment}} deletes {{segmentFile}} on any
> {{Files.move}} failure but does not decrement the counter for whatever
> contribution that file previously made.
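
To illustrate what reconciling the counter against the cache directory could look like, here is a minimal sketch. It is an assumption about one possible approach, not a fix proposed in this issue and not Oak code: the class CacheSizeReconciler and its methods are hypothetical, and the sketch assumes a flat cache directory and that no writer or eviction task is running while the directory is scanned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; names and structure are illustrative only.
final class CacheSizeReconciler {

    // Sums the lengths of the regular files directly inside cacheDir.
    static long sizeOnDisk(Path cacheDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    total += Files.size(f);
                }
            }
        }
        return total;
    }

    // Overwrites the counter with the measured size and returns that size.
    // Assumption: called while nothing else modifies the directory or the counter.
    static long reconcile(AtomicLong cacheSize, Path cacheDir) {
        try {
            long actual = sizeOnDisk(cacheDir);
            cacheSize.set(actual);
            return actual;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small demonstration: a counter that has drifted is reset to the real total.
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("reconcile-demo");
            Path a = Files.write(dir.resolve("a"), new byte[100]);
            Path b = Files.write(dir.resolve("b"), new byte[200]);
            AtomicLong cacheSize = new AtomicLong(1_000_000L); // drifted value
            reconcile(cacheSize, dir);
            Files.delete(a);
            Files.delete(b);
            Files.delete(dir);
            return cacheSize.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the demonstration the directory holds 100 + 200 bytes, so the counter ends at 300 regardless of its drifted starting value. A scan like this corrects accumulated drift only at the moment it runs; it does not address the redundant increments themselves.
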
> h2. Triggering workloads
> Any workload that produces multiple writes for the same segment id over time:
> concurrent cache misses on the same segment (e.g. compaction, online GC,
> indexing, mass traversal, standby replication, warm-up after restart). The
> probability per workload determines the rate at which the counter diverges;
> instances that run for weeks or months will drift by tens of GiB regardless
> of how the workload looks at any given moment.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)