Hi, thanks for the detailed information; it is useful.

> SSTables in each level: [43/4, 92/10, 125/100, 0, 0, 0, 0, 0, 0]

Indeed, it looks like compaction is not keeping up on those nodes.

What hardware do you use? Can you see it running at its limits (CPU or disk
I/O)? Are there any errors in the system logs, and are the disks doing fine?
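For example, something along these lines can help (the log path assumes a
package install under /var/log/cassandra, adjust to your environment):

$ iostat -x 5     # disk utilization and await over a few intervals
$ top             # is the CPU saturated by compaction threads?
$ grep -iE "error|exception" /var/log/cassandra/system.log | tail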

How is the concurrent_compactors setting configured? It looks like you are
using 2, judging from the 'nodetool compactionstats' output. Could you
increase it, or is it already set to the number of cores on the machine?
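If you want to try a higher value, it lives in cassandra.yaml and a change
there takes effect after a restart (the path and the value 4 below are just
examples, adjust to your install and core count):

$ grep -n concurrent_compactors /etc/cassandra/cassandra.yaml
# then set it there (e.g. "concurrent_compactors: 4", uncommenting if
# needed) and restart the node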

What sstable size (sstable_size_in_mb) do you use for LCS? Using a small
size might produce too many sstables.
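You can check the current value and change it with cqlsh, for instance
(table name taken from your cfstats output; 160 MB is the default and is
only meant as an example here):

$ cqlsh -e "DESCRIBE TABLE draios.message_data1;" | grep -A 2 compaction
$ cqlsh -e "ALTER TABLE draios.message_data1 WITH compaction =
    {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"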

Do those nodes have more sstables than the other ones?
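A quick way to compare is to run this on each node and line up the numbers
(table name again taken from your cfstats output):

$ nodetool cfstats draios.message_data1 | grep -E "SSTable count|each level"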

Are compactions progressing or getting stuck at some point?
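Watching the completed bytes for a few minutes should tell: if the
'completed' column keeps moving, they are progressing; if it sits still,
they are stuck. For example:

$ watch -d -n 30 nodetool compactionstats -H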

> It's also interesting to notice how the compaction in the previous example
> is trying to compact ~37 GB, which is essentially the whole size of the
> column family message_data1 as reported by cfstats


Weird indeed for LCS, I don't have a clue about that one. A small tip
though: since 2.1 you can use 'nodetool compactionstats -H' to get values
in a human-readable format :-).

Good luck,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-21 20:50 GMT+01:00 Gianluca Borello <gianl...@sysdig.com>:

> Hi,
>
> We added a bunch of new nodes to a cluster (2.1.13) and everything went
> fine, except for the number of pending compactions, which is staying quite
> high on a subset of the new nodes. Over the past 3 days, the pending
> compactions have never been less than ~130 on such nodes, with peaks of
> ~200. On the other nodes, they correctly fluctuate between 0 and ~20, which
> has been our norm for a long time.
>
> We are quite paranoid about pending compactions because in the past such a
> high number caused a lot of data to be brought into memory during some
> reads, which triggered a chain reaction of full GCs that brought down our
> cluster, so we try to monitor them closely.
>
> Some data points that should let the situation speak for itself:
>
> - We use LCS for all our column families
>
> - The cluster is operating absolutely fine and seems healthy, and every
> node is handling pretty much the same load in terms of reads and writes.
> Also, the nodes with higher pending compactions don't seem to be performing
> any worse than the others
>
> - The pending compactions don't go down even when setting the compaction
> throughput to unlimited for a very long time
>
> - This is the typical output of compactionstats and tpstats:
>
> $ nodetool compactionstats
> pending tasks: 137
>    compaction type   keyspace            table      completed          total    unit   progress
>         Compaction     draios   message_data60     6111208394     6939536890   bytes     88.06%
>         Compaction     draios    message_data1    26473390790    37243294809   bytes     71.08%
> Active compaction remaining time :        n/a
>
> $ nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> CounterMutationStage              0         0              0         0                 0
> ReadStage                         1         0      111766844         0                 0
> RequestResponseStage              0         0      244259493         0                 0
> MutationStage                     0         0      163268653         0                 0
> ReadRepairStage                   0         0        8933323         0                 0
> GossipStage                       0         0         363003         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0              2         0                 0
> Sampler                           0         0              0         0                 0
> ValidationExecutor                0         0              0         0                 0
> CommitLogArchiver                 0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> MemtableFlushWriter               0         0          32644         0                 0
> MemtableReclaimMemory             0         0          32644         0                 0
> PendingRangeCalculator            0         0            527         0                 0
> MemtablePostFlush                 0         0          36565         0                 0
> CompactionExecutor                2        70         108621         0                 0
> InternalResponseStage             0         0              0         0                 0
> HintedHandoff                     0         0             10         0                 0
> Native-Transport-Requests         6         0      188996929         0             79122
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                         0
> MUTATION                     0
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
>
> - If I do a nodetool drain on such nodes and then wait for a while, the
> number of pending compactions stays high even though no compactions are
> being executed anymore and the node is completely idle:
>
> $ nodetool compactionstats
> pending tasks: 128
>
> - It's also interesting to notice how the compaction in the previous
> example is trying to compact ~37 GB, which is essentially the whole size of
> the column family message_data1 as reported by cfstats:
>
> $ nodetool cfstats -H draios.message_data1
> Keyspace: draios
> Read Count: 208168
> Read Latency: 2.4791508685292647 ms.
> Write Count: 502529
> Write Latency: 0.20701542000561163 ms.
> Pending Flushes: 0
> Table: message_data1
> SSTable count: 261
> SSTables in each level: [43/4, 92/10, 125/100, 0, 0, 0, 0, 0, 0]
> Space used (live): 36.98 GB
> Space used (total): 36.98 GB
> Space used by snapshots (total): 0 bytes
> Off heap memory used (total): 36.21 MB
> SSTable Compression Ratio: 0.15461126176169512
> Number of keys (estimate): 101025
> Memtable cell count: 229344
> Memtable data size: 82.4 MB
> Memtable off heap memory used: 0 bytes
> Memtable switch count: 83
> Local read count: 208225
> Local read latency: 2.479 ms
> Local write count: 502581
> Local write latency: 0.208 ms
> Pending flushes: 0
> Bloom filter false positives: 11497
> Bloom filter false ratio: 0.04307
> Bloom filter space used: 94.97 KB
> Bloom filter off heap memory used: 92.93 KB
> Index summary off heap memory used: 57.88 KB
> Compression metadata off heap memory used: 36.06 MB
> Compacted partition minimum bytes: 447 bytes
> Compacted partition maximum bytes: 34.48 MB
> Compacted partition mean bytes: 1.51 MB
> Average live cells per slice (last five minutes): 26.269698643294515
> Maximum live cells per slice (last five minutes): 100.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
> - There are no warnings or errors in the log, even after a clean restart
>
> - Restarting the node doesn't seem to have any effect on the number of
> pending compactions
>
> Any help would be very appreciated.
>
> Thank you for reading
>
>
