sachinAtMS opened a new pull request, #4648:
URL: https://github.com/apache/cassandra/pull/4648
## Motivation
When running Cassandra in managed infrastructure environments (e.g., Azure
VMSS with Medusa backup tooling), a common deployment pattern uses a single
RAID volume (e.g., 4TB) shared between Cassandra data files and backup
snapshots. During the backup upload window, both the live SSTable data **and**
the snapshot copies (which are hardlinks occupying logical space on the same
filesystem) compete for the same disk. If total data exceeds 50% of the disk,
the snapshot hardlinks plus live data can fill the volume entirely, causing
Cassandra to halt writes or crash.
The root cause is that Cassandra's current snapshot mechanism **always**
creates snapshots as hardlinks under `data_file_directories`, making it
impossible to redirect snapshot storage to a separate mount/volume without
modifying application code.
### Real-World Scenario
- **Setup**: Azure VMSS nodes, 4TB RAID0 disk, Cassandra data + Medusa
backup snapshots on the same mount
- **Problem**: With 2.5TB of live data, taking a backup snapshot pins the
current SSTables; as compaction rewrites data during the upload window, the
pinned snapshot blocks plus the newly written SSTables can exceed the 4TB
capacity, triggering disk-full conditions
- **Why hardlinks aren't enough**: While hardlinks don't duplicate data
blocks, backup tools like Medusa subsequently upload from the snapshot
directory. During this window, compaction can remove the original SSTables,
causing the hardlinked snapshot files to become the sole owners of those data
blocks — effectively doubling disk usage until the upload completes and the
snapshot is cleared
## Solution
This patch introduces a new optional `snapshot_directory` configuration
parameter in `cassandra.yaml`. When set:
1. **Snapshots are written as file copies** to the configured directory
instead of hardlinks under `data_file_directories`
2. **Directory structure is mirrored**:
`<snapshot_directory>/<keyspace>/<table-id>/snapshots/<tag>/`
3. **All snapshot operations work transparently**: `nodetool snapshot`,
`nodetool listsnapshots`, `nodetool clearsnapshot`
4. **Fully backward-compatible**: When `snapshot_directory` is not set,
behavior is identical to current Cassandra (hardlinks under data dirs)
### Trade-off: Copies vs Hardlinks
Using file copies instead of hardlinks means:
- **Pro**: Snapshot data lives on a separate mount — data disk is never at
risk of filling due to snapshots
- **Pro**: The separate snapshot mount can be sized independently (e.g.,
cheaper/slower storage)
- **Con**: Copies use additional I/O and take longer than hardlink creation
- **Con**: Copies consume real disk space on the snapshot mount
This trade-off is acceptable for the target use case where disk safety is
more critical than snapshot speed.
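The copy-vs-hardlink branch can be sketched in plain Java NIO. This is an illustrative sketch only: `snapshotFile` and its parameters are hypothetical names, not the actual patch code, which lives in `ColumnFamilyStore`/`SSTableReader`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SnapshotCopySketch {
    // snapshotDir == null models "snapshot_directory not set" in cassandra.yaml
    static Path snapshotFile(Path sstable, Path dataSnapshotDir, Path snapshotDir)
            throws IOException {
        Path targetDir = (snapshotDir != null) ? snapshotDir : dataSnapshotDir;
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(sstable.getFileName());
        if (snapshotDir != null) {
            // Separate mount: hardlinks cannot cross filesystems, so a real
            // copy is required (this is the extra I/O cost noted above).
            Files.copy(sstable, target, StandardCopyOption.COPY_ATTRIBUTES);
        } else {
            // Default behavior: hardlink inside data_file_directories.
            Files.createLink(target, sstable);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempDirectory("data");
        Path snap = Files.createTempDirectory("snap");
        Path sstable = Files.writeString(data.resolve("nb-1-big-Data.db"), "demo");
        Path copied = snapshotFile(sstable, data.resolve("snapshots"), snap);
        // A copy is an independent file, so it is not the same file as the source.
        System.out.println(Files.isSameFile(sstable, copied)); // prints "false"
    }
}
```

The `Files.isSameFile` check is the same idea as the inode comparison in the test evidence below: a hardlink would report `true`, a copy reports `false`.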
## Configuration
Add to `cassandra.yaml`:
```yaml
# Optional: Redirect snapshot storage to a separate mount point.
# When set, snapshots use file copies instead of hardlinks.
# The directory must exist or be creatable by the Cassandra process.
snapshot_directory: /mnt/backup-disk/cassandra-snapshots
```
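The "must exist or be creatable" rule from the comment above can be sketched as a small startup check. `validateSnapshotDirectory` is a hypothetical name for illustration, not the real `DatabaseDescriptor` method:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnapshotDirValidation {
    static Path validateSnapshotDirectory(String configured) throws IOException {
        if (configured == null)
            return null; // feature not enabled; fall through to hardlink behavior
        Path dir = Paths.get(configured);
        Files.createDirectories(dir); // must exist or be creatable by the process
        if (!Files.isWritable(dir))
            throw new IOException("snapshot_directory is not writable: " + dir);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("snapshots");
        System.out.println(validateSnapshotDirectory(tmp.toString()) != null); // prints "true"
        System.out.println(validateSnapshotDirectory(null) == null);           // prints "true"
    }
}
```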
## Usage Examples
### Taking a snapshot (unchanged CLI, new storage location)
```bash
# Take a full snapshot of a keyspace
$ nodetool snapshot -t daily_backup_20260303 my_keyspace
Requested creating snapshot(s) for [my_keyspace] with snapshot name [daily_backup_20260303] and target [all tables]
# Files are now at:
#   /mnt/backup-disk/cassandra-snapshots/my_keyspace/my_table-<id>/snapshots/daily_backup_20260303/
# NOT at:
#   /var/lib/cassandra/data/my_keyspace/my_table-<id>/snapshots/daily_backup_20260303/
```
### Listing snapshots
```bash
$ nodetool listsnapshots
Snapshot Details:
Snapshot name          Keyspace name  Column family name  True size  Size on disk  Creation time
daily_backup_20260303  my_keyspace    my_table            6.05 KiB   6.05 KiB      2026-03-03T10:00:00.000Z
```
### Clearing a snapshot
```bash
$ nodetool clearsnapshot -t daily_backup_20260303 my_keyspace
Requested clearing snapshot(s) for [my_keyspace] with snapshot name [daily_backup_20260303]
# Files removed from /mnt/backup-disk/cassandra-snapshots/...
```
## Changes
### Files Modified (8 files, ~300 lines added)
| File | Change |
|------|--------|
| `Config.java` | New `snapshot_directory` field with Javadoc |
| `DatabaseDescriptor.java` | Startup validation (directory creation/permissions) + `getSnapshotDirectory()` / `hasSnapshotDirectory()` accessor methods |
| `Directories.java` | Core snapshot path redirection: `getSnapshotBaseDirectory()`, `getSnapshotSearchPaths()` helpers; modified `getSnapshotDirectory()`, `listAllSnapshots()`, `listSnapshotDirsByTag()`, `snapshotExists()`, `trueSnapshotsSize()`; new 4-arg `clearSnapshot()` overload |
| `ColumnFamilyStore.java` | Modified `snapshotWithoutMemtable()` to use `Files.copy()` when configured; new `copySSTableToDirectory()` helper; `clearSnapshot()` passes `checkSnapshotDirectory=true` |
| `SSTableReader.java` | New `copyToDirectory()` method (alternative to `createLinks()`) |
| `SnapshotLoader.java` | `addSnapshotDirectory()` helper to include `snapshot_directory` in snapshot discovery paths |
| `SnapshotManager.java` | Use default `SnapshotLoader()` constructor (which now includes `snapshot_directory`) |
| `Keyspace.java` | `clearSnapshot()` passes `checkSnapshotDirectory=true` to ensure snapshot files on the separate mount are cleaned up |
### Design Principles
- **Minimal invasion**: Only snapshot-related code paths are modified; no
changes to compaction, streaming, repair, or normal read/write paths
- **Feature-flagged**: All new behavior is gated behind
`DatabaseDescriptor.hasSnapshotDirectory()`; when not configured, every code
path falls through to the original logic
- **Consistent directory structure**: The snapshot directory mirrors the
`<keyspace>/<table-id>/snapshots/<tag>/` hierarchy from data directories
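The mirrored `<keyspace>/<table-id>/snapshots/<tag>/` layout reduces to a chain of `Path.resolve` calls. `snapshotDir` below is an illustrative helper, not the patch's actual `Directories` API:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnapshotLayoutSketch {
    // Mirror the data-directory hierarchy under the configured snapshot base.
    static Path snapshotDir(Path base, String keyspace, String tableWithId, String tag) {
        return base.resolve(keyspace)
                   .resolve(tableWithId)
                   .resolve("snapshots")
                   .resolve(tag);
    }

    public static void main(String[] args) {
        Path p = snapshotDir(Paths.get("/mnt/backup-disk/cassandra-snapshots"),
                             "my_keyspace", "my_table-9b3fce30", "daily_backup_20260303");
        // On Unix prints:
        // /mnt/backup-disk/cassandra-snapshots/my_keyspace/my_table-9b3fce30/snapshots/daily_backup_20260303
        System.out.println(p);
    }
}
```

Keeping the hierarchy identical means discovery code (e.g., snapshot listing) can walk either base directory with the same traversal logic.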
## Test Evidence
### Test Environment
- Apache Cassandra 4.1.11-SNAPSHOT (branch: `cassandra-4.1`)
- OpenJDK 11.0.27 on Ubuntu 20.04 (WSL2)
- Configuration:
```yaml
data_file_directories:
- /tmp/cassandra-data
snapshot_directory: /tmp/cassandra-snapshots
commitlog_directory: /tmp/cassandra-commitlog
```
### Test 1: Startup — `snapshot_directory` recognized
```
INFO  [main] DatabaseDescriptor.java:694 - Snapshot directory configured: /tmp/cassandra-snapshots. Snapshots will use file copies instead of hardlinks.
```
**PASS**: Configuration is validated and logged at startup.
### Test 2: Snapshot creation — files stored on separate mount
```bash
$ nodetool snapshot -t test_snap test_ks
# Verify files are in snapshot_directory, NOT in data_directory:
$ find /tmp/cassandra-snapshots/test_ks -type f | head -5
/tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Data.db
/tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Index.db
/tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/nb-1-big-Filter.db
/tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/schema.cql
/tmp/cassandra-snapshots/test_ks/test_table-9b3fce30.../snapshots/test_snap/manifest.json
$ find /tmp/cassandra-data/test_ks -path '*/snapshots/*' -type f
# (empty — no snapshot files in data directory)
```
**PASS**: Snapshot files created in `snapshot_directory`, data directory is
clean.
### Test 3: Files are copies, not hardlinks
```bash
# Compare inodes between data file and snapshot file:
$ stat /tmp/cassandra-data/test_ks/test_table-.../nb-1-big-Data.db | grep Inode
Inode: 264531
$ stat /tmp/cassandra-snapshots/test_ks/test_table-.../snapshots/test_snap/nb-1-big-Data.db | grep Inode
Inode: 264601
```
**PASS**: Different inodes confirm these are independent file copies, not
hardlinks.
### Test 4: `nodetool listsnapshots` discovers snapshots on separate mount
```bash
$ nodetool listsnapshots
Snapshot Details:
Snapshot name  Keyspace name  Column family name  True size  Size on disk  Creation time
final_test     test_ks        test_table          6.05 KiB   6.05 KiB      2026-03-02T17:06:08.221Z
```
**PASS**: Snapshots in `snapshot_directory` are correctly discovered and
listed.
### Test 5: `nodetool clearsnapshot` removes files from separate mount
```bash
# Before:
$ find /tmp/cassandra-snapshots/test_ks -path '*/clear_test/*' -type f | wc -l
10
# Clear:
$ nodetool clearsnapshot -t clear_test test_ks
# After:
$ find /tmp/cassandra-snapshots/test_ks -path '*/clear_test/*' -type f | wc -l
0
# Other snapshots remain intact:
$ find /tmp/cassandra-snapshots/test_ks -path '*/final_test/*' -type f | wc -l
10
```
**PASS**: Snapshot files deleted from separate mount; other snapshots
unaffected.
### Test 6: `nodetool listsnapshots` reflects cleared snapshot
```bash
$ nodetool listsnapshots
Snapshot Details:
Snapshot name  Keyspace name  Column family name  True size  Size on disk  Creation time
final_test     test_ks        test_table          6.05 KiB   6.05 KiB      2026-03-02T17:06:08.221Z
# Only final_test remains — clear_test is gone.
```
**PASS**: Cleared snapshot no longer appears in listing.
### Test 7: Backward compatibility (no `snapshot_directory` set)
When `snapshot_directory` is not configured in `cassandra.yaml`:
- `DatabaseDescriptor.hasSnapshotDirectory()` returns `false`
- All code paths fall through to original hardlink-based behavior
- No changes to snapshot directory structure under `data_file_directories`
**PASS**: Feature is fully opt-in with zero impact on existing deployments.
## Build Verification
```
$ ant jar
BUILD SUCCESSFUL
Total time: 21 seconds
```
## Compatibility
- **Backward-compatible**: Feature is opt-in; default behavior (no
`snapshot_directory` configured) is identical to upstream
- **YAML-compatible**: New field is ignored by older Cassandra versions that
don't recognize it
- **No schema changes**: No new system tables or schema modifications
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]