The "official" recommendation would be 100MB, but it's hard to give a precise answer. Keeping it under the GB seems like a good target. A few patches are pushing the limits of partition sizes so we may soon be more comfortable with big partitions.
Cheers

On Thu, Oct 27, 2016 at 21:28, Vincent Rischmann <m...@vrischmann.me> wrote:

> Yeah, that particular table is badly designed. I intend to fix it when
> the roadmap allows us to do it :)
> What is the recommended maximum partition size?
>
> Thanks for all the information.
>
> On Thu, Oct 27, 2016, at 08:14 PM, Alexander Dejanovski wrote:
>
> 3.3GB is already too high, and it's surely not good for compaction
> performance. I know changing a data model is no easy thing to do, but you
> should try to do something here.
>
> Anticompaction is a special type of compaction, and if an sstable is
> being anticompacted, any attempt to run a validation compaction on it
> will fail, telling you that an sstable cannot be part of two repair
> sessions at the same time. Incremental repair must therefore be run one
> node at a time, waiting for anticompactions to end before moving from one
> node to the other.
>
> Be mindful of running incremental repair on a regular basis once you've
> started, as you'll have two separate pools of sstables (repaired and
> unrepaired) that won't get compacted together, which could be a problem
> if you want tombstones to be purged efficiently.
>
> Cheers,
>
> On Thu, Oct 27, 2016 at 17:57, Vincent Rischmann <m...@vrischmann.me> wrote:
>
> Ok, I think we'll give incremental repairs a try on a limited number of
> CFs first, and if it goes well we'll progressively switch more CFs to
> incremental.
>
> I'm not sure I understand the problem with anticompaction and validation
> running concurrently. As far as I can tell, right now when a CF is
> repaired (either via Reaper or via nodetool) there may be compactions
> running at the same time. In fact, it happens very often. Is that a
> problem?
>
> As for big partitions, the biggest one we have is around 3.3GB. The next
> biggest are around 500MB and less.
>
> On Thu, Oct 27, 2016, at 05:37 PM, Alexander Dejanovski wrote:
>
> Oh right, that's what they advise :)
> I'd say that you should skip the full repair phase in the migration
> procedure, as that will obviously fail, and just mark all sstables as
> repaired (skip steps 1, 2 and 6).
> Anyway, you can't do better, so take a leap of faith there.
>
> Intensity is already very low, and 10000 segments is a whole lot for 9
> nodes; you should not need that many.
>
> You can definitely pick which CFs you'll run incremental repair on, and
> still run full repair on the rest.
> If you pick our Reaper fork, watch out for schema changes that add
> incremental repair fields. I do not advise running incremental repair
> without it, otherwise you might have issues with anticompaction and
> validation compactions running concurrently from time to time.
>
> One last thing: can you check whether you have particularly big
> partitions in the CFs that fail to get repaired? You can run nodetool
> cfhistograms to check that.
>
> Cheers,
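To make the "one node at a time" sequencing described above concrete, here is a minimal sketch of one round, with placeholder keyspace/table names (note that in 2.1, incremental repair also requires the parallel option):

    # On node 1 only: incremental, parallel repair of one table
    nodetool repair -par -inc my_keyspace my_table

    # Wait for anticompactions to finish before moving on
    nodetool compactionstats

Only repeat the repair on the next node once compactionstats no longer shows anticompaction tasks.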
> > But more annoyingly, since 2 to 3 weeks as I said, it looks like runs > don't progress after some time. Every time I restart reaper, it starts to > repair correctly again, up until it gets stuck. I have no idea why that > happens now, but it means I have to baby sit reaper, and it's becoming > annoying. > > Thanks for the suggestion about incremental repairs. It would probably be > a good thing but it's a little challenging to setup I think. Right now > running a full repair of all keyspaces (via nodetool repair) is going to > take a lot of time, probably like 5 days or more. We were never able to run > one to completion. I'm not sure it's a good idea to disable autocompaction > for that long. > > But maybe I'm wrong. Is it possible to use incremental repairs on some > column family only ? > > > On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote: > > Hi Vincent, > > most people handle repair with : > - pain (by hand running nodetool commands) > - cassandra range repair : > https://github.com/BrianGallew/cassandra_range_repair > - Spotify Reaper > - and OpsCenter repair service for DSE users > > Reaper is a good option I think and you should stick to it. If it cannot > do the job here then no other tool will. > > You have several options from here : > > - Try to break up your repair table by table and see which ones > actually get stuck > - Check your logs for any repair/streaming error > - Avoid repairing everything : > - you may have expendable tables > - you may have TTLed only tables with no deletes, accessed with > QUORUM CL only > - You can try to relieve repair pressure in Reaper by lowering > repair intensity (on the tables that get stuck) > - You can try adding steps to your repair process by putting a higher > segment count in reaper (on the tables that get stuck) > - And lastly, you can turn to incremental repair. As you're familiar > with Reaper already, you might want to take a look at our Reaper fork that > handles incremental repair : > https://github.com/thelastpickle/cassandra-reaper > If you go down that way, make sure you first mark all sstables as > repaired before you run your first incremental repair, otherwise you'll end > up in anticompaction hell (bad bad place) : > > https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html > Even if people say that's not necessary anymore, it'll save you from a > very bad first experience with incremental repair. > Furthermore, make sure you run repair daily after your first inc > repair run, in order to work on small sized repairs. > > > Cheers, > > > On Thu, Oct 27, 2016 at 4:27 PM Vincent Rischmann <m...@vrischmann.me> > wrote: > > > Hi, > > we have two Cassandra 2.1.15 clusters at work and are having some trouble > with repairs. > > Each cluster has 9 nodes, and the amount of data is not gigantic but some > column families have 300+Gb of data. > We tried to use `nodetool repair` for these tables but at the time we > tested it, it made the whole cluster load too much and it impacted our > production apps. > > Next we saw https://github.com/spotify/cassandra-reaper , tried it and > had some success until recently. Since 2 to 3 weeks it never completes a > repair run, deadlocking itself somehow. > > I know DSE includes a repair service but I'm wondering how do other > Cassandra users manage repairs ? > > Vincent. 
> > -- > ----------------- > Alexander Dejanovski > France > @alexanderdeja > > Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com > > > -- > ----------------- > Alexander Dejanovski > France > @alexanderdeja > > Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com > > > -- > ----------------- > Alexander Dejanovski > France > @alexanderdeja > > Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com > > > -- ----------------- Alexander Dejanovski France @alexanderdeja Consultant Apache Cassandra Consulting http://www.thelastpickle.com