It did look like there were repairs running at the time. The LiveSSTableCount for the entire node is about 2200 tables; for the keyspace that was being repaired it's just 150.
We run Cassandra 3.11.6, so we should be unaffected by CASSANDRA-14096.
We use http://cassandra-reaper.io/ for the repairs.

On Sat, 1 Aug 2020 at 01:49, Erick Ramirez <erick.rami...@datastax.com> wrote:

> I don't have specific experience relating to InstanceTidier but when I
> saw this, I immediately thought of repairs blowing up the heap. 40K
> instances indicates to me that you have thousands of SSTables -- are they
> tiny (like 1MB or less)? Otherwise, are they dense nodes (~1TB or more)?
>
> How do you run repairs? I'm wondering if it's possible that there are
> multiple repairs running in parallel like a cron job kicking in while the
> previous repair is still running.
>
> You didn't specify your C* version but my guess is that it's pre-3.11.5.
> FWIW the repair issue I'm referring to is CASSANDRA-14096 [1].
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-14096
>