I have already run into the same problem (hundreds of thousands of SSTables) with Cassandra 2.1.2. It seems to appear when an incremental repair runs while there is medium-to-high insert load on the cluster. The repair goes into a bad state and starts creating far more SSTables than it should (even when there should be nothing to repair).
On 10 February 2015 at 15:46, Eric Stevens <migh...@gmail.com> wrote:

> This kind of recovery is definitely not my strong point, so feedback on this approach would certainly be welcome.
>
> As I understand it, if you really want to keep that data, you ought to be able to mv it out of the way to get your node online, then move those files back in a few thousand at a time, nodetool refresh OpsCenter rollups60 && nodetool compact OpsCenter rollups60; rinse and repeat. This should let you incrementally restore the data in that keyspace without putting so many sstables in there that it OOMs your cluster again.
>
> On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink <clohfin...@gmail.com> wrote:
>
>> Yeah... probably just 2.1.2 things and not compactions. You still probably want to do something about the 1.6 million files, though. It may be worth just mv/rm'ing the 60 sec rollup data unless you are really attached to it.
>>
>> Chris
>>
>> On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson <pgn...@gmail.com> wrote:
>>
>>> I was having trouble with snapshots failing while trying to repair that table (http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I have a repair running on it now, and it seems to be going successfully this time. I am going to wait for that to finish, then try a manual nodetool compact. If that goes successfully, would it be safe to chalk the lack of compaction on this table in the past up to 2.1.2 problems?
>>>
>>> ~ Paul Nickerson
>>>
>>> On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink <clohfin...@gmail.com> wrote:
>>>
>>>> Your cluster is probably having issues with compactions (with STCS you should never have this many). I would probably punt with OpsCenter/rollups60: turn the node off and move all of the sstables off to a different directory for backup (or just rm them if you really don't care about 1-minute metrics), then turn the server back on.
>>>> Once you get your cluster running again, go back and investigate why compactions stopped. My guess is you hit an exception in the past that killed your CompactionExecutor, and things just built up slowly until you got to this point.
>>>>
>>>> Chris
>>>>
>>>> On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson <pgn...@gmail.com> wrote:
>>>>
>>>>> Thank you Rob. I tried a 12 GiB heap size, and it still crashed out. There are 1,617,289 files under OpsCenter/rollups60.
>>>>>
>>>>> Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I was able to start up Cassandra OK with the default heap size formula.
>>>>>
>>>>> Now my cluster is running multiple versions of Cassandra. I think I will downgrade the rest to 2.1.1.
>>>>>
>>>>> ~ Paul Nickerson
>>>>>
>>>>> On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>
>>>>>> On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson <pgn...@gmail.com> wrote:
>>>>>>
>>>>>>> I am getting an out of memory error when I try to start Cassandra on one of my nodes. Cassandra will run for a minute and then exit without outputting any error in the log file. It is happening while SSTableReader is opening a couple hundred thousand things.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> Does anyone know how I might get Cassandra on this node running again? I'm not very familiar with correctly tuning Java memory parameters, and I'm not sure that's the right solution in this case anyway.
>>>>>>
>>>>>> Try running 2.1.1, and/or increasing heap size beyond 8 GB.
>>>>>>
>>>>>> Are there actually that many SSTables on disk?
>>>>>>
>>>>>> =Rob
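For anyone finding this thread later: Eric's batch-restore procedure (move the sstables aside, bring the node up, then feed files back a few thousand at a time with nodetool refresh + compact between batches) can be sketched as a shell loop. This is only a sketch of the batching half, demonstrated here against throwaway temp directories with fake files; the real `$SRC` would be your backup directory and `$DST` the live `.../data/OpsCenter/rollups60` directory, and you would uncomment the nodetool lines. All paths, the batch size, and the variable names are my assumptions, not something from the thread verbatim.

```shell
#!/bin/sh
# Sketch of the incremental restore loop from this thread.
# Assumption: sstable filenames contain no whitespace (true for Cassandra).
set -eu

SRC=$(mktemp -d)   # stands in for the backup dir the sstables were mv'd to
DST=$(mktemp -d)   # stands in for /var/lib/cassandra/data/OpsCenter/rollups60
BATCH=4            # in the real procedure, "a few thousand at a time"

# Fake a pile of sstable files for demonstration purposes.
for i in $(seq 1 10); do touch "$SRC/sstable-$i.db"; done

while [ -n "$(ls -A "$SRC")" ]; do
    # Move the next batch back into the live table directory.
    for f in $(ls "$SRC" | head -n "$BATCH"); do
        mv "$SRC/$f" "$DST/"
    done
    # In the real procedure, load and then compact each batch:
    # nodetool refresh OpsCenter rollups60
    # nodetool compact OpsCenter rollups60
    echo "batch moved; $(ls "$DST" | wc -l) files restored so far"
done
```

The point of the refresh/compact pair after every batch is to keep the live sstable count bounded: each compaction folds the newly loaded batch down before the next batch lands, so the node never again sees hundreds of thousands of open files at once.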
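On the heap-size side of the thread: in Cassandra 2.1 the automatic heap formula can be overridden in conf/cassandra-env.sh. A minimal sketch, using the 12G figure Paul tried (note it still OOMed for him with ~1.6 million sstables, so treat a bigger heap as a stopgap while you fix the sstable count, not as the fix itself). Cassandra requires that MAX_HEAP_SIZE and HEAP_NEWSIZE be set together or not at all; the HEAP_NEWSIZE value below is an illustrative assumption, not a recommendation from the thread.

```shell
# conf/cassandra-env.sh -- override the automatic heap calculation.
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="1200M"   # young gen; must be set whenever MAX_HEAP_SIZE is
```

A quick way to see whether a table has the runaway-sstable problem before it takes the node down is to count data files directly, e.g. `find /var/lib/cassandra/data/OpsCenter/rollups60 -name '*-Data.db' | wc -l` (path assumed; adjust to your data_file_directories setting).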