Hi,

We added a bunch of new nodes to a cluster (2.1.13) and everything went
fine, except that the number of pending compactions is staying quite
high on a subset of the new nodes. Over the past 3 days, the pending
compactions have never dropped below ~130 on those nodes, with peaks of
~200. On the other nodes they fluctuate between 0 and ~20 as expected, which
has been our norm for a long time.

We are quite paranoid about pending compactions because in the past such a
high number caused a lot of data to be pulled into memory during some reads,
which triggered a chain reaction of full GCs that brought down our
cluster, so we try to monitor them closely.
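
(For what it's worth, the monitoring itself is nothing fancy; we
periodically poll the stock nodetool output on every node, along these
lines:)

$ nodetool compactionstats | grep 'pending tasks'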

Some data points that should let the situation speak for itself:

- We use LCS for all our column families
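
(all of them are declared with plain LCS; the statement below is
illustrative of how we set it, not a paste from our actual schema:)

$ cqlsh -e "ALTER TABLE draios.message_data1 WITH compaction = {'class': 'LeveledCompactionStrategy'};"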

- The cluster is operating absolutely fine and seems healthy, and every
node is handling pretty much the same load in terms of reads and writes.
Also, the nodes with higher pending compactions don't seem to be
performing any worse than the others

- The pending compactions don't go down even after leaving the compaction
throughput set to unlimited for a very long time, as shown below
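
(concretely, with the stock nodetool command, where 0 means unthrottled:)

$ nodetool setcompactionthroughput 0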

- This is the typical output of compactionstats and tpstats:

$ nodetool compactionstats
pending tasks: 137
   compaction type   keyspace            table     completed         total      unit   progress
        Compaction     draios   message_data60    6111208394    6939536890     bytes     88.06%
        Compaction     draios    message_data1   26473390790   37243294809     bytes     71.08%
Active compaction remaining time :        n/a

$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         1         0      111766844         0                 0
RequestResponseStage              0         0      244259493         0                 0
MutationStage                     0         0      163268653         0                 0
ReadRepairStage                   0         0        8933323         0                 0
GossipStage                       0         0         363003         0                 0
CacheCleanupExecutor              0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              2         0                 0
Sampler                           0         0              0         0                 0
ValidationExecutor                0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MiscStage                         0         0              0         0                 0
MemtableFlushWriter               0         0          32644         0                 0
MemtableReclaimMemory             0         0          32644         0                 0
PendingRangeCalculator            0         0            527         0                 0
MemtablePostFlush                 0         0          36565         0                 0
CompactionExecutor                2        70         108621         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0             10         0                 0
Native-Transport-Requests         6         0      188996929         0             79122

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

- If I do a nodetool drain on such nodes and then wait for a while, the
number of pending compactions stays high even though no compactions are
being executed anymore and the node is completely idle:
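
(the sequence is literally just the stock drain followed by waiting:)

$ nodetool drain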

$ nodetool compactionstats
pending tasks: 128

- It's also interesting to notice that the compaction in the previous
example is trying to compact ~37 GB, which is essentially the whole size of
the message_data1 column family as reported by cfstats (note also the
per-level SSTable counts below, where a value like 43/4 means the level
holds 43 SSTables against a target of 4):

$ nodetool cfstats -H draios.message_data1
Keyspace: draios
Read Count: 208168
Read Latency: 2.4791508685292647 ms.
Write Count: 502529
Write Latency: 0.20701542000561163 ms.
Pending Flushes: 0
Table: message_data1
SSTable count: 261
SSTables in each level: [43/4, 92/10, 125/100, 0, 0, 0, 0, 0, 0]
Space used (live): 36.98 GB
Space used (total): 36.98 GB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 36.21 MB
SSTable Compression Ratio: 0.15461126176169512
Number of keys (estimate): 101025
Memtable cell count: 229344
Memtable data size: 82.4 MB
Memtable off heap memory used: 0 bytes
Memtable switch count: 83
Local read count: 208225
Local read latency: 2.479 ms
Local write count: 502581
Local write latency: 0.208 ms
Pending flushes: 0
Bloom filter false positives: 11497
Bloom filter false ratio: 0.04307
Bloom filter space used: 94.97 KB
Bloom filter off heap memory used: 92.93 KB
Index summary off heap memory used: 57.88 KB
Compression metadata off heap memory used: 36.06 MB
Compacted partition minimum bytes: 447 bytes
Compacted partition maximum bytes: 34.48 MB
Compacted partition mean bytes: 1.51 MB
Average live cells per slice (last five minutes): 26.269698643294515
Maximum live cells per slice (last five minutes): 100.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

- There are no warnings or errors in the log, even after a clean restart
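
(checked with a plain grep over system.log; the path below is just the
default package location, adjust as needed:)

$ grep -E '^(WARN|ERROR)' /var/log/cassandra/system.log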

- Restarting the node doesn't seem to have any effect on the number of
pending compactions
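
(by restarting I mean a full stop/start of the Cassandra process; assuming
the packaged service scripts, something like:)

$ sudo service cassandra restart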

Any help would be greatly appreciated.

Thank you for reading
