Hi,
we're on Cassandra 3.11.2 and are having some issues with repairs.
They take ages to complete, and some time ago incremental repair
stopped working: SSTables are no longer being marked as repaired,
even though the repair reports success.

Running a full or incremental repair does not make any difference.
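
For anyone who wants to check the same symptom on their own nodes, here
is a minimal Python sketch. It assumes the sstablemetadata tool that
ships with Cassandra is on the PATH and that its output contains a
"Repaired at:" line, where 0 means the SSTable is unrepaired. The data
directory and the helper names (repaired_at, count_repaired) are
hypothetical, not something from our setup.

```python
import re
import subprocess
from pathlib import Path

# Matches the "Repaired at: <millis>" line printed by sstablemetadata.
REPAIRED_AT = re.compile(r"^Repaired at:\s*(\d+)", re.MULTILINE)


def repaired_at(metadata_text):
    """Return the 'Repaired at' value from sstablemetadata output, or None."""
    match = REPAIRED_AT.search(metadata_text)
    return int(match.group(1)) if match else None


def count_repaired(data_dir, tool="sstablemetadata"):
    """Count (repaired, unrepaired) SSTables in one table directory.

    data_dir is an assumed path such as <data_file_directory>/<keyspace>/<table-id>.
    """
    repaired = unrepaired = 0
    for data_file in sorted(Path(data_dir).glob("*-Data.db")):
        output = subprocess.run(
            [tool, str(data_file)], capture_output=True, text=True, check=True
        ).stdout
        # A missing line or a value of 0 is treated as unrepaired.
        if repaired_at(output):
            repaired += 1
        else:
            unrepaired += 1
    return repaired, unrepaired
```

Calling count_repaired() on the table's data directory before and after
a repair would show whether the unrepaired count actually goes down.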

Here's the log of a typical repair (I omitted a lot of 'Maximum memory
usage' messages):

INFO  [Repair-Task-12] 2018-06-04 06:29:50,396 RepairRunnable.java:139
- Starting repair command #11 (af1aefc0-67c0-11e8-b07c-c365701888e8),
repairing keyspace prod with repair options (parallelism: parallel,
primary range: false, incremental: true, job threads: 1,
ColumnFamilies: [event], dataCenters: [DC1], hosts: [], # of ranges:
1280, pull repair: false)
INFO  [Repair-Task-12] 2018-06-04 06:29:51,497 RepairSession.java:228
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] new session: will
sync /14.0.53.234, /14.0.52.115 on range [...] for asm_log.[event]
INFO  [Repair#11:1] 2018-06-04 06:29:51,776 RepairJob.java:169 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Requesting merkle trees
for event (to [/14.0.52.115, /14.0.53.234])
INFO  [ValidationExecutor:10] 2018-06-04 06:31:13,859
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-06-04 06:32:01,385
NoSpamLogger.java:94 - Out of 14 commit log syncs over the past
134.02s with average duration of 34.90ms, 2 have exceeded the
configured commit interval by an average of 60.66ms
...
INFO  [ValidationExecutor:10] 2018-06-04 13:31:19,011
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
INFO  [AntiEntropyStage:1] 2018-06-04 13:37:17,357
RepairSession.java:180 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
from /14.0.52.115
INFO  [ValidationExecutor:10] 2018-06-04 13:46:19,281
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
INFO  [IndexSummaryManager:1] 2018-06-04 13:57:18,772
IndexSummaryRedistribution.java:76 - Redistributing index summaries
INFO  [AntiEntropyStage:1] 2018-06-04 13:58:21,971
RepairSession.java:180 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
from /14.0.53.234
INFO  [RepairJobTask:4] 2018-06-04 13:58:39,780 SyncTask.java:73 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Endpoints /14.0.52.115
and /14.0.53.234 have 15406 range(s) out of sync for event
INFO  [RepairJobTask:4] 2018-06-04 13:58:39,781 LocalSyncTask.java:71
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Performing streaming
repair of 15406 ranges with /14.0.52.115
INFO  [RepairJobTask:4] 2018-06-04 13:59:49,075
StreamResultFuture.java:90 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Executing streaming plan for
Repair
INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,076
StreamSession.java:266 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Starting streaming to
/14.0.52.115
INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,089
StreamCoordinator.java:264 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8, ID#0] Beginning stream session
with /14.0.52.115
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:01:14,423
StreamResultFuture.java:173 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8 ID#0] Prepare completed.
Receiving 321 files(6.238GiB), sending 318 files(6.209GiB)
WARN  [Service Thread] 2018-06-04 14:12:15,578 GCInspector.java:282 -
ConcurrentMarkSweep GC in 4095ms.  CMS Old Gen: 4086661264 ->
1107272664; Par Eden Space: 503316480 -> 0; Par Survivor Space:
21541464 -> 0
...
WARN  [GossipTasks:1] 2018-06-04 14:12:15,677 FailureDetector.java:288
- Not marking nodes down due to local pause of 5123793157 > 5000000000
INFO  [ScheduledTasks:1] 2018-06-04 14:12:20,611 NoSpamLogger.java:91
- Some operations were slow, details available at debug level
(debug.log)
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,188
StreamResultFuture.java:187 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Session with /14.0.52.115 is
complete
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190
StreamResultFuture.java:219 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] All sessions completed
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190
LocalSyncTask.java:121 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Sync complete using session
afc2ef90-67c0-11e8-b07c-c365701888e8 between /14.0.52.115 and
/14.0.53.234 on event
INFO  [RepairJobTask:5] 2018-06-04 14:14:29,191 RepairJob.java:143 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] event is fully synced
INFO  [RepairJobTask:5] 2018-06-04 14:14:29,193 RepairSession.java:270
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Session completed
successfully
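
To put a number on "ages", the following Python sketch computes how long
each phase took, using only timestamps copied from the log above: the
merkle tree request, the arrival of the second merkle tree, and the end
of streaming. The variable names are made up for illustration.

```python
from datetime import datetime


def parse_ts(text):
    """Parse a Cassandra log timestamp such as '2018-06-04 06:29:51,776'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


# Timestamps taken from the log lines above.
trees_requested = parse_ts("2018-06-04 06:29:51,776")  # Requesting merkle trees
trees_received = parse_ts("2018-06-04 13:58:21,971")   # second merkle tree received
streams_done = parse_ts("2018-06-04 14:14:29,190")     # All sessions completed

validation = trees_received - trees_requested
streaming = streams_done - trees_received

print("validation:", validation)  # validation: 7:28:30.195000
print("streaming: ", streaming)   # streaming:  0:16:07.219000
```

So for this session almost all of the time goes into building the merkle
trees (validation), not into streaming.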



Tablestats:
               SSTable count: 714
               Space used (live): 489416489322
               Space used (total): 489416489322
               Space used by snapshots (total): 0
               Off heap memory used (total): 311166982
               SSTable Compression Ratio: 0.28468835622230676
               Number of partitions (estimate): 22511321
               Memtable cell count: 6635
               Memtable data size: 3236775
               Memtable off heap memory used: 0
               Memtable switch count: 1649
               ...
               Percent repaired: 85.62
               Bloom filter false positives: 36849
               Bloom filter false ratio: 0.00024
               Bloom filter space used: 87092784
               Bloom filter off heap memory used: 87061568
               Index summary off heap memory used: 15474806
               Compression metadata off heap memory used: 208630608
               Compacted partition minimum bytes: 61
               Compacted partition maximum bytes: 107964792
               Compacted partition mean bytes: 35262
               Average live cells per slice (last five minutes): 301.46281597264624
               Maximum live cells per slice (last five minutes): 3311
               Average tombstones per slice (last five minutes): 1.0
               Maximum tombstones per slice (last five minutes): 1
               Dropped Mutations: 0

Histograms:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             2.00          24.60       3379.39            3311         310
75%             2.00          29.52       8409.01           51012        5722
95%             2.00          35.43      30130.99          105778        9887
98%             2.00          42.51      43388.63          263210       35425
99%             2.00          42.51      62479.63          263210       35425
Min             0.00           9.89        182.79              61           0
Max             3.00         263.21     155469.30       107964792    10090808



Thanks for any help!

Martin
