sachinAtMS opened a new pull request, #4648:
URL: https://github.com/apache/cassandra/pull/4648

   ## Motivation
   
   When running Cassandra in managed infrastructure environments (e.g., Azure 
VMSS with Medusa backup tooling), a common deployment pattern uses a single 
RAID volume (e.g., 4TB) shared between Cassandra data files and backup 
snapshots. During the backup upload window, the live SSTables **and** the snapshot hardlinks share the same filesystem. A hardlink costs no extra space when it is created, but it keeps the underlying data blocks allocated after compaction deletes the original SSTable. If live data exceeds roughly 50% of the disk, live data plus these pinned snapshot blocks can fill the volume entirely, causing Cassandra to halt writes or crash.
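
A back-of-the-envelope sketch of the 50% threshold follows. It assumes the worst case, where compaction rewrites every SSTable the snapshot references before the snapshot is cleared. `SnapshotBudget` and its methods are hypothetical illustration names, not part of Cassandra.

```java
// Hypothetical illustration, not Cassandra code.
// Worst case: compaction rewrites every SSTable the snapshot references,
// so the snapshot ends up pinning a full second copy of the live data.
final class SnapshotBudget {
    private SnapshotBudget() {}

    /** Live data plus the pre-compaction blocks still held by snapshot hardlinks. */
    static double worstCaseUsageTb(double liveDataTb) {
        return 2 * liveDataTb;
    }

    /** True if the worst-case footprint still fits on the disk. */
    static boolean fits(double liveDataTb, double diskTb) {
        return worstCaseUsageTb(liveDataTb) <= diskTb;
    }

    public static void main(String[] args) {
        double diskTb = 4.0;
        for (double liveTb : new double[] {1.5, 2.0, 2.5}) {
            System.out.printf("live=%.1f TB, worst case=%.1f TB, fits=%b%n",
                    liveTb, worstCaseUsageTb(liveTb), fits(liveTb, diskTb));
        }
    }
}
```

With the 4TB disk from the scenario below, 2.5TB of live data gives a worst case of 5TB, which does not fit.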
   
   The root cause is that Cassandra's current snapshot mechanism **always** 
creates snapshots as hardlinks under `data_file_directories`, making it 
impossible to redirect snapshot storage to a separate mount/volume without 
modifying application code.
   
   ### Real-World Scenario
   - **Setup**: Azure VMSS nodes, 4TB RAID0 disk, Cassandra data + Medusa 
backup snapshots on the same mount
   - **Problem**: With 2.5TB of live data, the space held by a backup snapshot can push total usage past the 4TB capacity, triggering disk-full conditions
   - **Why hardlinks aren't enough**: While hardlinks don't duplicate data blocks, backup tools like Medusa subsequently upload from the snapshot directory. During this window, compaction can remove the original SSTables, leaving the hardlinked snapshot files as the sole owners of those data blocks. In the worst case this doubles disk usage until the upload completes and the snapshot is cleared
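
The mechanism in the last bullet can be reproduced with plain `java.nio.file`:

1. Create a hardlink, which is what a snapshot does today.
2. Delete the original, which is what compaction later does to an obsolete SSTable.
3. Observe that the link still holds the full file.

This is a standalone sketch. `HardlinkPinDemo` is a hypothetical name, and the file names only imitate SSTable components.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Standalone illustration: a hardlink keeps file data alive after the
// original name is deleted, so the snapshot becomes the sole owner.
final class HardlinkPinDemo {
    private HardlinkPinDemo() {}

    /** Returns how many bytes are still reachable through the link once the original is gone. */
    static long bytesHeldByLinkAfterDelete(int payloadBytes) {
        try {
            Path dir = Files.createTempDirectory("pin-demo");
            try {
                Path live = dir.resolve("nb-1-big-Data.db");
                Path snapshot = dir.resolve("snapshot-nb-1-big-Data.db");
                Files.write(live, new byte[payloadBytes]);
                Files.createLink(snapshot, live); // snapshot: no new data blocks yet
                Files.delete(live);               // compaction drops the obsolete SSTable
                return Files.size(snapshot);      // data is still on disk, owned by the link
            } finally {
                deleteRecursively(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesHeldByLinkAfterDelete(1 << 20) + " bytes still held by the link");
    }
}
```

On a filesystem that supports hardlinks, running `main` prints `1048576 bytes still held by the link`.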
   
   ## Solution
   
   This patch introduces a new optional `snapshot_directory` configuration 
parameter in `cassandra.yaml`. When set:
   
   1. **Snapshots are written as file copies** to the configured directory 
instead of hardlinks under `data_file_directories`
   2. **Directory structure is mirrored**: 
`<snapshot_directory>/<keyspace>/<table-id>/snapshots/<tag>/`
   3. **All snapshot operations work transparently**: `nodetool snapshot`, 
`nodetool listsnapshots`, `nodetool clearsnapshot`
   4. **Fully backward-compatible**: When `snapshot_directory` is not set, 
behavior is identical to current Cassandra (hardlinks under data dirs)
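
Below is a minimal sketch of how the base directory might be chosen and the mirrored layout resolved. It assumes `snapshot_directory` is surfaced as an `Optional<Path>`. `SnapshotPaths` and its methods are hypothetical. The PR's own helpers are `getSnapshotBaseDirectory()` and `getSnapshotDirectory()` in `Directories.java`, and their signatures are not reproduced here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not the patch's code: resolves where a table's
// snapshot lands, with or without snapshot_directory configured.
final class SnapshotPaths {
    private SnapshotPaths() {}

    /** snapshot_directory when set, otherwise the data directory (current behaviour). */
    static Path snapshotBase(Path dataDirectory, Optional<Path> snapshotDirectory) {
        return snapshotDirectory.orElse(dataDirectory);
    }

    /** Mirrors <base>/<keyspace>/<table dir>/snapshots/<tag>/ under whichever base was chosen. */
    static Path snapshotDir(Path base, String keyspace, String tableDirName, String tag) {
        return base.resolve(keyspace).resolve(tableDirName).resolve("snapshots").resolve(tag);
    }

    public static void main(String[] args) {
        Path data = Paths.get("/var/lib/cassandra/data");
        Optional<Path> configured = Optional.of(Paths.get("/mnt/backup-disk/cassandra-snapshots"));
        // "my_table-0123abcd" is a made-up stand-in for the real <table>-<id> directory name.
        String table = "my_table-0123abcd";
        System.out.println(snapshotDir(snapshotBase(data, configured), "my_keyspace", table, "daily_backup_20260303"));
        System.out.println(snapshotDir(snapshotBase(data, Optional.empty()), "my_keyspace", table, "daily_backup_20260303"));
    }
}
```

Because only the base changes, the relative layout under it is identical in both modes. That identical layout is what lets list and clear operations reuse the same path logic.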
   
   ### Trade-off: Copies vs Hardlinks
   Using file copies instead of hardlinks means:
   - **Pro**: Snapshot data lives on a separate mount — data disk is never at 
risk of filling due to snapshots
   - **Pro**: The separate snapshot mount can be sized independently (e.g., 
cheaper/slower storage)
   - **Con**: Copies use additional I/O and take longer than hardlink creation
   - **Con**: Copies consume real disk space on the snapshot mount
   
   This trade-off is acceptable for the target use case where disk safety is 
more critical than snapshot speed.
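
The trade-off comes down to which `java.nio.file` call writes each snapshot file. The sketch below shows that branch under stated assumptions:

- The `useCopy` flag stands in for `DatabaseDescriptor.hasSnapshotDirectory()`.
- `SnapshotFileWriter` and its methods are hypothetical.

`Files.createLink` only works within a single filesystem, which is why a separate mount forces a copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical sketch of the copy-vs-hardlink branch; not the patch's code.
final class SnapshotFileWriter {
    private SnapshotFileWriter() {}

    static Path snapshotFile(Path source, Path targetDir, boolean useCopy) {
        try {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(source.getFileName());
            if (useCopy) {
                // Separate mount: real I/O and real space, nothing shared with the data disk.
                Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
            } else {
                // Same filesystem only: near-instant, shares data blocks with the live file.
                Files.createLink(target, source);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Snapshots one file both ways and checks that both results carry the source bytes. */
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("snap-writer");
            Path source = Files.write(root.resolve("nb-1-big-Data.db"), new byte[] {1, 2, 3});
            Path copied = snapshotFile(source, root.resolve("copy"), true);
            Path linked = snapshotFile(source, root.resolve("link"), false);
            byte[] expected = Files.readAllBytes(source);
            return Arrays.equals(expected, Files.readAllBytes(copied))
                    && Arrays.equals(expected, Files.readAllBytes(linked));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```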
   
   ## Configuration
   
   Add to `cassandra.yaml`:
   
   ```yaml
   # Optional: Redirect snapshot storage to a separate mount point.
   # When set, snapshots use file copies instead of hardlinks.
   # The directory must exist or be creatable by the Cassandra process.
   snapshot_directory: /mnt/backup-disk/cassandra-snapshots
   ```
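
Here is one way the documented rule ("must exist or be creatable") could be enforced at startup, sketched with `java.nio.file`. `SnapshotDirectoryValidator` is a hypothetical name. The PR's actual check lives in `DatabaseDescriptor.java` and may differ in detail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical startup check for snapshot_directory; not the patch's code.
final class SnapshotDirectoryValidator {
    private SnapshotDirectoryValidator() {}

    static Path validate(Path configured) {
        try {
            Files.createDirectories(configured); // no-op when the directory already exists
        } catch (IOException e) {
            throw new IllegalStateException("Cannot create snapshot_directory " + configured, e);
        }
        if (!Files.isWritable(configured)) {
            throw new IllegalStateException("snapshot_directory is not writable: " + configured);
        }
        return configured;
    }

    /** Accepts a creatable nested path and rejects one whose parent is a regular file. */
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("snapdir-check");
            Path nested = validate(root.resolve("a").resolve("b"));
            Path blocker = Files.createFile(root.resolve("not-a-dir"));
            try {
                validate(blocker.resolve("snapshots"));
                return false; // should have been rejected
            } catch (IllegalStateException expected) {
                return Files.isDirectory(nested);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```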
   
   ## Usage Examples
   
   ### Taking a snapshot (unchanged CLI, new storage location)
   ```bash
   # Take a full snapshot of a keyspace
   $ nodetool snapshot -t daily_backup_20260303 my_keyspace
   Requested creating snapshot(s) for [my_keyspace] with snapshot name [daily_backup_20260303] and target [all tables]
   
   # Files are now at:
   #   /mnt/backup-disk/cassandra-snapshots/my_keyspace/my_table-<id>/snapshots/daily_backup_20260303/
   # NOT at:
   #   /var/lib/cassandra/data/my_keyspace/my_table-<id>/snapshots/daily_backup_20260303/
   ```
   
   ### Listing snapshots
   ```bash
   $ nodetool listsnapshots
   Snapshot Details:
   Snapshot name          Keyspace name  Column family name  True size  Size on disk  Creation time
   daily_backup_20260303  my_keyspace    my_table            6.05 KiB   6.05 KiB      2026-03-03T10:00:00.000Z
   ```
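
Discovery on the separate mount amounts to walking the mirrored layout. The sketch below is hypothetical and is not `SnapshotLoader`'s real logic. It groups files under `<base>/<keyspace>/<table dir>/snapshots/<tag>/` and sums their sizes. With copies, every byte counted is real space on the snapshot mount. That is consistent with the `True size` and `Size on disk` columns matching in the listing above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical discovery sketch: "<table dir>/<tag>" -> total bytes of that snapshot.
final class SnapshotLister {
    private SnapshotLister() {}

    static Map<String, Long> sizesByTableAndTag(Path base, String keyspace) {
        Map<String, Long> sizes = new TreeMap<>();
        Path ksDir = base.resolve(keyspace);
        if (!Files.isDirectory(ksDir)) return sizes;
        try (DirectoryStream<Path> tables = Files.newDirectoryStream(ksDir, Files::isDirectory)) {
            for (Path table : tables) {
                Path snapshots = table.resolve("snapshots");
                if (!Files.isDirectory(snapshots)) continue;
                try (DirectoryStream<Path> tags = Files.newDirectoryStream(snapshots, Files::isDirectory)) {
                    for (Path tag : tags) {
                        sizes.put(table.getFileName() + "/" + tag.getFileName(), totalBytes(tag));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    private static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> p.toFile().length()).sum();
        }
    }

    /** Builds a throwaway layout with two tags and lists it. */
    static Map<String, Long> demo() {
        try {
            Path base = Files.createTempDirectory("snap-list");
            Path snaps = base.resolve("test_ks").resolve("test_table-01").resolve("snapshots");
            Path a = Files.createDirectories(snaps.resolve("final_test"));
            Path b = Files.createDirectories(snaps.resolve("clear_test"));
            Files.write(a.resolve("nb-1-big-Data.db"), new byte[10]);
            Files.write(a.resolve("manifest.json"), new byte[5]);
            Files.write(b.resolve("nb-1-big-Data.db"), new byte[3]);
            return sizesByTableAndTag(base, "test_ks");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`SnapshotLister.demo()` returns `{test_table-01/clear_test=3, test_table-01/final_test=15}`.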
   
   ### Clearing a snapshot
   ```bash
   $ nodetool clearsnapshot -t daily_backup_20260303 my_keyspace
   Requested clearing snapshot(s) for [my_keyspace] with snapshot name [daily_backup_20260303]
   
   # Files removed from /mnt/backup-disk/cassandra-snapshots/...
   ```
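
Clearing a tag on the separate mount is a recursive delete of each `<table dir>/snapshots/<tag>/` directory under the keyspace. The sketch below is hypothetical; the PR routes this through the 4-arg `Directories.clearSnapshot()` overload, whose internals are not shown here. It deletes children before parents and leaves other tags untouched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: remove one snapshot tag from every table of a keyspace.
final class SnapshotCleaner {
    private SnapshotCleaner() {}

    /** Returns the number of regular files deleted. */
    static int clearSnapshot(Path base, String keyspace, String tag) {
        Path ksDir = base.resolve(keyspace);
        if (!Files.isDirectory(ksDir)) return 0;
        int deleted = 0;
        try (DirectoryStream<Path> tables = Files.newDirectoryStream(ksDir, Files::isDirectory)) {
            for (Path table : tables) {
                Path tagDir = table.resolve("snapshots").resolve(tag);
                if (Files.isDirectory(tagDir)) deleted += deleteRecursively(tagDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    private static int deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList()); // children first
        }
        int files = 0;
        for (Path p : paths) {
            if (Files.isRegularFile(p)) files++;
            Files.delete(p);
        }
        return files;
    }

    /** Clears "clear_test" and reports whether only that tag disappeared. */
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("snap-clear");
            Path snaps = base.resolve("test_ks").resolve("test_table-01").resolve("snapshots");
            Path keep = Files.createDirectories(snaps.resolve("final_test"));
            Path drop = Files.createDirectories(snaps.resolve("clear_test"));
            Files.write(keep.resolve("nb-1-big-Data.db"), new byte[1]);
            Files.write(drop.resolve("nb-1-big-Data.db"), new byte[1]);
            Files.write(drop.resolve("manifest.json"), new byte[1]);
            int removed = clearSnapshot(base, "test_ks", "clear_test");
            return removed == 2 && !Files.exists(drop) && Files.exists(keep.resolve("nb-1-big-Data.db"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```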
   
   ## Changes
   
   ### Files Modified (8 files, ~300 lines added)
   
   | File | Change |
   |------|--------|
   | `Config.java` | New `snapshot_directory` field with Javadoc |
   | `DatabaseDescriptor.java` | Startup validation (directory creation/permissions) + `getSnapshotDirectory()` / `hasSnapshotDirectory()` accessor methods |
   | `Directories.java` | Core snapshot path redirection: `getSnapshotBaseDirectory()`, `getSnapshotSearchPaths()` helpers; modified `getSnapshotDirectory()`, `listAllSnapshots()`, `listSnapshotDirsByTag()`, `snapshotExists()`, `trueSnapshotsSize()`; new 4-arg `clearSnapshot()` overload |
   | `ColumnFamilyStore.java` | Modified `snapshotWithoutMemtable()` to use `Files.copy()` when configured; new `copySSTableToDirectory()` helper; `clearSnapshot()` passes `checkSnapshotDirectory=true` |
   | `SSTableReader.java` | New `copyToDirectory()` method (alternative to `createLinks()`) |
   | `SnapshotLoader.java` | `addSnapshotDirectory()` helper to include `snapshot_directory` in snapshot discovery paths |
   | `SnapshotManager.java` | Use default `SnapshotLoader()` constructor (which now includes `snapshot_directory`) |
   | `Keyspace.java` | `clearSnapshot()` passes `checkSnapshotDirectory=true` to ensure snapshot files on the separate mount are cleaned up |
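
The table lists `SSTableReader.copyToDirectory()` as the copy-based counterpart of `createLinks()`. The sketch below illustrates the idea without claiming that is how the method is written: it copies every component file sharing one SSTable's descriptor prefix (e.g. `nb-1-big-`) into the target snapshot directory. `SSTableComponentCopier` and the prefix-matching approach are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: copy all components of one SSTable (same descriptor prefix).
final class SSTableComponentCopier {
    private SSTableComponentCopier() {}

    /** Returns the sorted file names that were copied. */
    static List<String> copyComponents(Path tableDir, String descriptorPrefix, Path targetDir) {
        List<String> copied = new ArrayList<>();
        try {
            Files.createDirectories(targetDir);
            try (DirectoryStream<Path> components = Files.newDirectoryStream(tableDir,
                    p -> Files.isRegularFile(p) && p.getFileName().toString().startsWith(descriptorPrefix))) {
                for (Path component : components) {
                    Files.copy(component, targetDir.resolve(component.getFileName()),
                            StandardCopyOption.COPY_ATTRIBUTES);
                    copied.add(component.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(copied);
        return copied;
    }

    /** Copies generation 1 only; generation 2 must be left behind. */
    static List<String> demo() {
        try {
            Path table = Files.createTempDirectory("sstable-copy");
            for (String name : new String[] {"nb-1-big-Data.db", "nb-1-big-Index.db", "nb-2-big-Data.db"}) {
                Files.write(table.resolve(name), new byte[] {42});
            }
            return copyComponents(table, "nb-1-big-", table.resolve("snapshots").resolve("test_snap"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`SSTableComponentCopier.demo()` returns `[nb-1-big-Data.db, nb-1-big-Index.db]`.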
   
   ### Design Principles
   - **Minimal invasion**: Only snapshot-related code paths are modified; no 
changes to compaction, streaming, repair, or normal read/write paths
   - **Feature-flagged**: All new behavior is gated behind 
`DatabaseDescriptor.hasSnapshotDirectory()`; when not configured, every code 
path falls through to the original logic
   - **Consistent directory structure**: The snapshot directory mirrors the 
`<keyspace>/<table-id>/snapshots/<tag>/` hierarchy from data directories
   
   ## Test Evidence
   
   ### Test Environment
   - Apache Cassandra 4.1.11-SNAPSHOT (branch: `cassandra-4.1`)
   - OpenJDK 11.0.27 on Ubuntu 20.04 (WSL2)
   - Configuration:
     ```yaml
     data_file_directories:
         - /tmp/cassandra-data
     snapshot_directory: /tmp/cassandra-snapshots
     commitlog_directory: /tmp/cassandra-commitlog
     ```
   
   ### Test 1: Startup — `snapshot_directory` recognized
   ```
   INFO  [main] DatabaseDescriptor.java:694 - Snapshot directory configured: /tmp/cassandra-snapshots.
     Snapshots will use file copies instead of hardlinks.
   ```
   **PASS**: Configuration is validated and logged at startup.
   
   ### Test 2: Snapshot creation — files stored on separate mount
   ```bash
   $ nodetool snapshot -t test_snap test_ks
   
   # Verify files are in snapshot_directory, NOT in data_directory:
   $ find /tmp/cassandra-snapshots/test_ks -type f | head -5
   /tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Data.db
   /tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Index.db
   /tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Filter.db
   /tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/schema.cql
   /tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/manifest.json
   
   $ find /tmp/cassandra-data/test_ks -path '*/snapshots/*' -type f
   # (empty — no snapshot files in data directory)
   ```
   **PASS**: Snapshot files created in `snapshot_directory`, data directory is 
clean.
   
   ### Test 3: Files are copies, not hardlinks
   ```bash
   # Compare inodes between data file and snapshot file:
   $ stat /tmp/cassandra-data/test_ks/test_table-.../nb-1-big-Data.db | grep Inode
     Inode: 264531
   
   $ stat /tmp/cassandra-snapshots/test_ks/test_table-.../snapshots/test_snap/nb-1-big-Data.db | grep Inode
     Inode: 264601
   ```
   **PASS**: Different inodes confirm these are independent file copies, not 
hardlinks.
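
The same inode comparison can be made from Java through `BasicFileAttributes.fileKey()`. On Unix-like systems this typically identifies a file by device and inode. The JDK documents that `fileKey()` may return `null`, so the sketch falls back to `Files.isSameFile`. `FileIdentity` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: do two paths name the same underlying file (e.g. a hardlink)?
final class FileIdentity {
    private FileIdentity() {}

    static boolean sameUnderlyingFile(Path a, Path b) {
        try {
            Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
            Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
            if (keyA != null && keyB != null) {
                return keyA.equals(keyB); // (device, inode) on Unix-like systems
            }
            return Files.isSameFile(a, b); // fallback when no file key is available
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {linkSharesFile, copySharesFile}; expected {true, false}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("inode-check");
            Path data = Files.write(dir.resolve("nb-1-big-Data.db"), new byte[] {7});
            Path link = Files.createLink(dir.resolve("link-Data.db"), data);
            Path copy = Files.copy(data, dir.resolve("copy-Data.db"));
            return new boolean[] {sameUnderlyingFile(data, link), sameUnderlyingFile(data, copy)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```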
   
   ### Test 4: `nodetool listsnapshots` discovers snapshots on separate mount
   ```bash
   $ nodetool listsnapshots
   Snapshot Details:
   Snapshot name  Keyspace name  Column family name  True size  Size on disk  Creation time
   final_test     test_ks        test_table          6.05 KiB   6.05 KiB      2026-03-02T17:06:08.221Z
   ```
   **PASS**: Snapshots in `snapshot_directory` are correctly discovered and 
listed.
   
   ### Test 5: `nodetool clearsnapshot` removes files from separate mount
   ```bash
   # Before:
   $ find /tmp/cassandra-snapshots/test_ks -path '*/clear_test/*' -type f | wc -l
   10
   
   # Clear:
   $ nodetool clearsnapshot -t clear_test test_ks
   
   # After:
   $ find /tmp/cassandra-snapshots/test_ks -path '*/clear_test/*' -type f | wc -l
   0
   
   # Other snapshots remain intact:
   $ find /tmp/cassandra-snapshots/test_ks -path '*/final_test/*' -type f | wc -l
   10
   ```
   **PASS**: Snapshot files deleted from separate mount; other snapshots 
unaffected.
   
   ### Test 6: `nodetool listsnapshots` reflects cleared snapshot
   ```bash
   $ nodetool listsnapshots
   Snapshot Details:
   Snapshot name  Keyspace name  Column family name  True size  Size on disk  Creation time
   final_test     test_ks        test_table          6.05 KiB   6.05 KiB      2026-03-02T17:06:08.221Z
   
   # Only final_test remains — clear_test is gone.
   ```
   **PASS**: Cleared snapshot no longer appears in listing.
   
   ### Test 7: Backward compatibility (no `snapshot_directory` set)
   When `snapshot_directory` is not configured in `cassandra.yaml`:
   - `DatabaseDescriptor.hasSnapshotDirectory()` returns `false`
   - All code paths fall through to original hardlink-based behavior
   - No changes to snapshot directory structure under `data_file_directories`
   
   **PASS**: Feature is fully opt-in with zero impact on existing deployments.
   
   ## Build Verification
   ```
   $ ant jar
   BUILD SUCCESSFUL
   Total time: 21 seconds
   ```
   
   ## Compatibility
   - **Backward-compatible**: Feature is opt-in; default behavior (no 
`snapshot_directory` configured) is identical to upstream
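   - **YAML compatibility**: Older Cassandra versions do not recognize `snapshot_directory` and refuse to start when `cassandra.yaml` contains an unrecognized property, so the setting must be removed before downgrading a node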
   - **No schema changes**: No new system tables or schema modifications