This is an automated email from the ASF dual-hosted git repository.

miroslav pushed a commit to branch issue/OAK-12099_segment
in repository https://gitbox.apache.org/repos/asf/jackrabbit-oak.git
commit a1a08f0511437542f8285b6f093e312d87912b30
Author: smiroslav <[email protected]>
AuthorDate: Wed Feb 18 11:14:55 2026 +0100

    OAK-12099 AGENTS.md for segment node store modules
---
 oak-segment-azure/AGENTS.md  |  96 ++++++++++++++++++++++
 oak-segment-remote/AGENTS.md |  64 +++++++++++++++
 oak-segment-tar/AGENTS.md    | 185 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 345 insertions(+)

diff --git a/oak-segment-azure/AGENTS.md b/oak-segment-azure/AGENTS.md
new file mode 100644
index 0000000000..da7b7d4967
--- /dev/null
+++ b/oak-segment-azure/AGENTS.md
@@ -0,0 +1,96 @@
+# AGENTS.md — oak-segment-azure
+
+## Module Overview
+
+Azure Blob Storage backend for Oak's segment node store. Implements the
+`SegmentNodeStorePersistence` SPI (defined in oak-segment-tar) and extends the abstract
+reader/writer from oak-segment-remote.
+
+Two SDK variants coexist:
+- **Modern (v12)** — `org.apache.jackrabbit.oak.segment.azure` package, uses `com.azure.storage.blob`
+- **Legacy (v8)** — `org.apache.jackrabbit.oak.segment.azure.v8` package, uses `com.microsoft.azure.storage`
+
+SDK selection: system property `segment.azure.v12.enabled` (default: v8 for backward compatibility).
+
+## Key Classes
+
+| Class | Role |
+|-------|------|
+| `AzurePersistence` | `SegmentNodeStorePersistence` implementation (v12). Creates archive manager, journal, lock |
+| `AzurePersistenceV8` | Same, using legacy v8 SDK |
+| `AzureSegmentStoreService` | OSGi component that routes to v12 or v8 based on system property |
+| `AzurePersistenceManager` | Factory: creates `AzurePersistence` from OSGi config (handles auth methods) |
+| `AzureArchiveManager` | `SegmentArchiveManager` — manages segment archives as blob collections |
+| `AzureSegmentArchiveReader` | Extends `AbstractRemoteSegmentArchiveReader` (from oak-segment-remote) |
+| `AzureSegmentArchiveWriter` | Extends `AbstractRemoteSegmentArchiveWriter`, with retry and `WriteAccessController` |
+| `AzureRepositoryLock` | Distributed lock via Azure blob lease with background renewal thread |
+| `AzureJournalFile` | Journal stored as append blobs, rotated at configurable line limit |
+| `Configuration` | OSGi metatype config (account, container, auth credentials, etc.) |
+
+## Authentication Methods
+
+Configured via OSGi properties (in `Configuration`), resolved in `AzurePersistenceManager`:
+
+1. **Connection URL** — full connection string (takes precedence)
+2. **Access Key** — `accountName` + `accessKey`
+3. **SAS Token** — `sharedAccessSignature`
+4. **Service Principal** — `clientId` + `clientSecret` + `tenantId`
+
+Environment variables: `AZURE_ACCOUNT_NAME`, `AZURE_SECRET_KEY`, `AZURE_TENANT_ID`,
+`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
+
+## Repository Lock
+
+`AzureRepositoryLock` uses an Azure blob lease on a `repo.lock` blob:
+- Acquires lease (default 60s duration), renews every 5s in a daemon thread
+- Blocks writes via `WriteAccessController` after 20s of renewal failures
+- Lease auto-expires on process crash (no manual cleanup)
+- See `oak.segment.azure.lock.*` system properties below
+
+## System Properties
+
+| Property | Default | Purpose |
+|----------|---------|---------|
+| `segment.azure.v12.enabled` | false | Use modern v12 SDK instead of legacy v8 |
+| `segment.azure.v12.http.verbose.enabled` | false | Verbose HTTP request logging |
+| `segment.retry.policy.type` | "fixed" | Retry policy type |
+| `segment.azure.retry.attempts` | 5 | Retry count for Azure operations |
+| `segment.timeout.execution` | 30 | Read timeout in seconds |
+| `segment.azure.batch.copy.size` | 1000 | Batch size for archive copy operations |
+| `oak.segment.azure.lock.timeout` | 0 | Lock acquisition timeout (0 = fail immediately) |
+| `oak.segment.azure.lock.leaseDurationInSec` | 60 | Blob lease duration |
+| `oak.segment.azure.lock.leaseRenewalIntervalInSec` | 5 | Lease renewal frequency |
+| `oak.segment.azure.lock.blockWritesAfterInSec` | 20 | Grace period before blocking writes on renewal failure |
+| `oak.segment.azure.lock.leaseRenewalTimeoutInMs` | 5000 | Timeout for individual lease renewal calls |
+| `org.apache.jackrabbit.oak.segment.azure.journal.lines` | 40000 | Max lines per journal blob before rotation |
+| `azure.segment.archive.writer.retries.max` | 16 | Max retries for segment upload |
+| `azure.segment.archive.writer.retries.intervalMs` | 5000 | Retry interval for segment upload |
+
+## OSGi Configuration
+
+PID: `org.apache.jackrabbit.oak.segment.azure.AzureSegmentStoreService`
+
+Key properties: `accountName`, `containerName` (default "oak"), `accessKey`,
+`rootPath` (default "/oak"), `connectionURL`, `sharedAccessSignature`, `blobEndpoint`,
+`clientId`, `clientSecret`, `tenantId`, `role`, `enableSecondaryLocation`
+
+## OSGi Bundle
+
+No packages are exported — this module is a leaf bundle. It imports the SPIs from
+oak-segment-tar and oak-segment-remote, and embeds Azure SDK dependencies.
+
+## CLI Tools (`tool` package)
+
+- `AzureCheck` — integrity check against Azure-backed segment store
+- `AzureCompact` — offline compaction
+- `SegmentCopy` — copy segments between stores
+- `SegmentStoreMigrator` — migrate between storage backends
+
+## Testing
+
+Tests run against **Azurite** (Azure Storage emulator) via TestContainers:
+- `AzuriteDockerRule` — JUnit rule that starts `mcr.microsoft.com/azure-storage/azurite:3.31.0`
+- Tests exist in parallel for both v12 and v8 implementations
+- Integration tests (IT suffix) require Docker for Azurite
+
+The `start-azurite.sh` script is deprecated — tests now use TestContainers directly.
\ No newline at end of file
diff --git a/oak-segment-remote/AGENTS.md b/oak-segment-remote/AGENTS.md
new file mode 100644
index 0000000000..e3ee99e29a
--- /dev/null
+++ b/oak-segment-remote/AGENTS.md
@@ -0,0 +1,64 @@
+# AGENTS.md — oak-segment-remote
+
+## Module Overview
+
+Shared base module for cloud segment store backends (oak-segment-azure, oak-segment-aws).
+Provides abstract archive reader/writer implementations, an async write queue, persistent
+caching (disk and Redis), and write access control.
+
+This module does **not** implement `SegmentNodeStorePersistence` itself — it provides the
+building blocks that cloud-specific modules extend.
+
+## Key Classes
+
+| Class | Purpose |
+|-------|---------|
+| `AbstractRemoteSegmentArchiveReader` | Template for cloud segment readers. Subclasses implement `doReadSegmentToBuffer()` and `doReadDataFile()` |
+| `AbstractRemoteSegmentArchiveWriter` | Template for cloud segment writers with optional async queue. Subclasses implement `doWriteArchiveEntry()` and `doWriteDataFile()` |
+| `WriteAccessController` | Thread-safe gate for write operations. `disableWriting()` blocks all threads calling `checkWritingAllowed()` until `enableWriting()` is called. Used by repository lock implementations to pause writes during lease renewal failures |
+| `RemoteSegmentArchiveEntry` | `SegmentArchiveEntry` implementation carrying UUID, position, length, and generation |
+| `RemoteUtilities` | Segment file naming (`{position}.{uuid}`), archive indexing, off-heap buffer allocation |
+| `RemoteBlobMetadata` | Serialization of segment metadata to/from blob storage metadata headers |
+
+## Async Write Queue (`queue` package)
+
+`SegmentWriteQueue` provides concurrent segment uploads:
+- Thread pool size: `oak.segment.remote.threads` (default 5) + 1 emergency retry thread
+- Queue capacity: `oak.segment.remote.queue.size` (default 20)
+- Failed writes are retried in a dedicated recovery loop
+- Queue is flushed and closed when the archive writer closes
+
+## Persistent Cache (`persistentcache` package)
+
+Two cache implementations for reducing cloud read latency:
+
+| Implementation | Backend | Key config |
+|----------------|---------|------------|
+| `PersistentDiskCache` | Local filesystem with LRU eviction | `diskCacheDirectory`, `diskCacheMaxSizeMB` (default 512) |
+| `PersistentRedisCache` | Redis via Jedis connection pool | `redisCacheHost`, `redisCachePort`, `redisCacheExpireSeconds` (default 2 days) |
+
+OSGi PID: `org.apache.jackrabbit.oak.segment.remote.RemotePersistentCacheService`
+
+## System Properties
+
+| Property | Default | Purpose |
+|----------|---------|---------|
+| `access.off.heap` | false | Use direct (off-heap) ByteBuffers for segment data |
+| `oak.segment.remote.threads` | 5 | Write queue worker thread count |
+| `oak.segment.remote.queue.size` | 20 | Write queue capacity |
+
+## OSGi Exports
+
+All packages are exported (used by oak-segment-azure and oak-segment-aws):
+```
+org.apache.jackrabbit.oak.segment.remote
+org.apache.jackrabbit.oak.segment.remote.persistentcache
+org.apache.jackrabbit.oak.segment.remote.queue
+```
+
+## Testing
+
+- `WriteAccessControllerTest` — write gating behavior
+- `SegmentWriteQueueTest` — async queue threading and retry logic
+- `PersistentDiskCacheTest` — disk cache with LRU eviction
+- `PersistentRedisCacheTest` — Redis cache (uses embedded Redis)
\ No newline at end of file
diff --git a/oak-segment-tar/AGENTS.md b/oak-segment-tar/AGENTS.md
new file mode 100644
index 0000000000..fab344276b
--- /dev/null
+++ b/oak-segment-tar/AGENTS.md
@@ -0,0 +1,185 @@
+# AGENTS.md — oak-segment-tar
+
+## Module Overview
+
+Immutable segment-based content storage for Oak, using TAR files as the default
+persistence format. This is the default NodeStore for single-instance deployments.
+
+This module has two roles:
+1. **SPI definition** — the `spi` packages define the storage abstraction that cloud
+   backends (oak-segment-azure, oak-segment-aws) implement
+2. **TAR implementation** — the concrete FileStore/TarMK storage engine
+
+Docs: `oak-doc/src/site/markdown/nodestore/segment/overview.md`,
+`oak-doc/src/site/markdown/nodestore/segmentmk.md`
+
+## Key Concepts
+
+### Segments
+Immutable byte containers (max 256 KiB). Two kinds:
+- **Data segments** — contain node records, templates, and references to other segments
+- **Bulk segments** — contain raw binary data (strings, blobs)
+
+Segments are identified by a 128-bit UUID (`SegmentId`). Records within a segment are
+addressed by `RecordId` (segment + offset). Records are 4-byte aligned.
+
+### Record Types (`RecordType`)
+`NODE`, `TEMPLATE`, `LEAF`, `BRANCH`, `BUCKET`, `LIST`, `VALUE`, `BLOCK`, `BLOB_ID`.
+Templates act as "hidden classes" — they encode the structure (property names, types,
+child node layout) of a node and are shared across nodes with the same shape.
+
+### TAR Files
+Segments are packed into TAR archives. Each TAR file contains:
+- Segment entries (variable size, up to 256 KiB each)
+- A binary index at the end for O(log n) segment lookup
+- Default max TAR file size: 256 MB
+
+### Generations and Garbage Collection
+Each compaction cycle creates a new **GC generation**. Segments carry their generation
+number. Old generations are reclaimed during cleanup. By default, the last 2 generations
+are retained (`RETAINED_GENERATIONS_DEFAULT = 2`).
+
+### Copy-on-Write / MVCC
+Segments are never modified after creation. Content changes create new segments along
+the modified path. Concurrent readers see a consistent snapshot; a single writer commits
+atomically by updating the HEAD revision in the journal.
+
+## Package Layout
+
+| Package | Role |
+|---------|------|
+| `segment.spi.persistence` | **Public SPI** — `SegmentNodeStorePersistence`, `RepositoryLock`, `SegmentArchiveManager`, `JournalFile`, `GCJournalFile`, `ManifestFile`. Cloud backends implement these. |
+| `segment.spi.monitor` | **Public SPI** — `IOMonitor`, `FileStoreMonitor`, `RemoteStoreMonitor` |
+| `segment.spi.persistence.split` | **Public SPI** — split persistence (different backends per subtree) |
+| `segment` | Core model — `Segment`, `SegmentId`, `RecordId`, `RecordType`, `SegmentNodeState`, `SegmentNodeStore`, `SegmentReader`, `SegmentWriter`, `SegmentTracker` |
+| `segment.file` | `FileStore`, `FileStoreBuilder`, `ReadOnlyFileStore`, `GarbageCollector`, GC/compaction strategies |
+| `segment.file.tar` | TAR file I/O — `TarFiles`, `TarReader`, `TarWriter`, `TarRevisions`, `TarPersistence` |
+| `segment.file.tar.index` | TAR index structures (v1/v2) |
+| `segment.compaction` | GC configuration — `SegmentGCOptions` (GCType, CompactorType), `SegmentRevisionGCMBean` |
+| `segment.data` | Segment binary format — `SegmentData`, `SegmentDataV13` |
+| `segment.standby` | Master–slave replication over Netty (codec, client, server, JMX) |
+| `segment.scheduler` | Commit coordination — `LockBasedScheduler` |
+| `segment.tool` | CLI commands — `Check`, `Compact`, `Backup`, `Restore`, debug tools |
+| `segment.osgi` | OSGi service components — `TarPersistenceService`, `SplitPersistenceService` |
+| `backup` | Backup/restore — `FileStoreBackup`, `FileStoreRestore` |
+| `segment.memory` | In-memory segment store (for testing) |
+
+## OSGi Exports
+
+Only the `spi` packages are exported — everything else is internal:
+```
+org.apache.jackrabbit.oak.segment.spi
+org.apache.jackrabbit.oak.segment.spi.monitor
+org.apache.jackrabbit.oak.segment.spi.persistence
+org.apache.jackrabbit.oak.segment.spi.persistence.split
+org.apache.jackrabbit.oak.segment.spi.persistence.persistentcache
+```
+
+Changes to exported packages trigger OSGi baseline checks. Changes to internal packages
+do not affect downstream bundles.
+
+## FileStore Lifecycle
+
+### Opening (`FileStore` constructor via `FileStoreBuilder.build()`)
+1. Acquires exclusive `RepositoryLock` via `persistence.lockRepository()`
+2. Checks/updates manifest (store version)
+3. Creates `SegmentWriter` for system writes
+4. Initializes `TarFiles` (loads existing TAR archives)
+5. Creates `GarbageCollector` with configured strategies
+
+### Closing (`FileStore.close()`)
+1. Stops the background scheduler
+2. Flushes pending writes
+3. Releases closeables in order: `repositoryLock` → `tarFiles` → `revisions`
+4. Forces GC and reaps pending file deletions
+
+### Key builder options (`FileStoreBuilder`)
+- `withMaxFileSize(int MB)` — TAR file size limit (default 256 MB)
+- `withMemoryMapping(boolean)` — memory-mapped I/O (default true on 64-bit JVMs)
+- `withSegmentCacheSize(int MB)` — segment cache (default 256 MB)
+- `withGCOptions(SegmentGCOptions)` — GC/compaction configuration
+- `withBlobStore(BlobStore)` — external blob store (null = inline blobs)
+
+## Garbage Collection and Compaction
+
+### GC Types (`SegmentGCOptions.GCType`)
+- `FULL` — compacts the entire HEAD state
+- `TAIL` — compacts only the diff since the last compaction
+
+### Compactor Types (`SegmentGCOptions.CompactorType`)
+- `CLASSIC_COMPACTOR` — simple single-threaded compaction
+- `CHECKPOINT_COMPACTOR` — checkpoint-aware diff compaction
+- `PARALLEL_COMPACTOR` — multithreaded (default)
+
+### Strategy Chain
+```
+GarbageCollectionStrategy — orchestrates the full GC cycle
+  ├─ EstimationStrategy — estimates whether compaction is worthwhile
+  ├─ CompactionStrategy — executes the compaction
+  │    ├─ FullCompactionStrategy
+  │    ├─ TailCompactionStrategy
+  │    └─ FallbackCompactionStrategy (tries primary, falls back to secondary)
+  └─ CleanupStrategy — removes unreferenced old-generation segments
+```
+
+Two GC strategy implementations:
+- `DefaultGarbageCollectionStrategy` — standard (estimate → compact → cleanup)
+- `CleanupFirstGarbageCollectionStrategy` — cleanup before compaction
+
+### GC Defaults
+| Parameter | Default |
+|-----------|---------|
+| Retry count | 5 |
+| Force timeout | 60 seconds |
+| Retained generations | 2 |
+| Size delta estimation | 1 GB |
+| Memory threshold | 15% |
+| Compaction concurrency | 1 |
+
+## Persistence SPI
+
+To implement a custom storage backend (as oak-segment-azure and oak-segment-aws do),
+implement `SegmentNodeStorePersistence`:
+
+| Method | Purpose |
+|--------|---------|
+| `createArchiveManager(...)` | Factory for `SegmentArchiveManager` — manages segment archive reading/writing |
+| `segmentFilesExist()` | Check if the store has existing segments |
+| `getJournalFile()` | `JournalFile` — revision journal (append-only log of HEAD record IDs) |
+| `getGCJournalFile()` | `GCJournalFile` — GC history log |
+| `getManifestFile()` | `ManifestFile` — store version metadata |
+| `lockRepository()` | `RepositoryLock` — exclusive lock preventing concurrent access |
+
+The `RepositoryLock` contract requires that the lock is released automatically if the
+process crashes (no manual cleanup). The TAR implementation uses file-system locks;
+cloud implementations use blob leases (Azure) or DynamoDB locks (AWS).
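+
+The methods above can be sketched as a skeleton (an illustration only: the class name
+`MyCloudPersistence` is hypothetical, method bodies are elided, and the
+`createArchiveManager(...)` parameter list is abbreviated; the interfaces in
+`segment.spi.persistence` are the authoritative signatures):
+
+```java
+public class MyCloudPersistence implements SegmentNodeStorePersistence {
+
+    @Override
+    public SegmentArchiveManager createArchiveManager(/* mmap flag, monitors, ... */) {
+        // Return a SegmentArchiveManager that lists/opens/deletes segment
+        // archives in the backing store (blob container, bucket, ...)
+    }
+
+    @Override
+    public boolean segmentFilesExist() { /* any archives present? */ }
+
+    @Override
+    public JournalFile getJournalFile() { /* append-only log of HEAD record IDs */ }
+
+    @Override
+    public GCJournalFile getGCJournalFile() { /* GC history log */ }
+
+    @Override
+    public ManifestFile getManifestFile() { /* store version metadata */ }
+
+    @Override
+    public RepositoryLock lockRepository() {
+        // Contract: must auto-release if the process crashes,
+        // e.g. an expiring lease kept alive by a renewal thread
+    }
+}
+```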
+
+## Important System Properties
+
+| Property | Default | Purpose |
+|----------|---------|---------|
+| `access.off.heap` | false | Off-heap segment access |
+| `gc.classic` | false | Use synchronized classic GC strategy |
+| `oak.gc.backoff` | — | Minimum interval between GC runs (ms) |
+| `oak.segment.compaction.gcSizeDeltaEstimation` | 1 GB | Minimum size delta to trigger compaction |
+| `oak.segmentNodeStore.commitFairLock` | true | Fair locking for commit serialization |
+| `oak.checkpoints.lockWaitTime` | 10s | Checkpoint lock acquisition timeout |
+
+## Testing
+
+- **JUnit 4** with Mockito 5.x
+- `TemporaryFileStore` — JUnit rule that creates an ephemeral FileStore for a test
+  (`src/test/java/.../segment/test/TemporaryFileStore.java`)
+- `SegmentTarFixture` — `NodeStoreFixture` implementation for cross-backend test suites
+- In-memory segment store (`segment.memory` package) for fast unit tests
+- Standby tests use `NetworkErrorProxy` for simulating network failures
+- Integration tests (IT suffix) run with `-PintegrationTesting`
+
+## Common Pitfalls
+
+- Changing anything in `segment.spi.*` packages affects cloud backends
+  (oak-segment-azure, oak-segment-aws) — rebuild with `-pl oak-segment-tar -amd`
+- Memory-mapping is enabled by default on 64-bit JVMs; some tests may behave differently
+  on 32-bit environments
+- GC tests can be slow — use `-Dtest=ClassName` to run specific ones
+- The standby subsystem embeds Netty — be aware of optional import resolution in OSGi
\ No newline at end of file
