The "official" recommendation would be 100MB, but it's hard to give a
precise answer.
Keeping it under 1GB seems like a good target.
A few patches are pushing the limits of partition sizes so we may soon be
more comfortable with big partitions.
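
To make those two thresholds concrete, here is a minimal Python sketch that labels a partition size against them. The function name, the labels and the choice of binary units (1MB = 1024**2 bytes) are assumptions made for illustration only; nothing here is part of Cassandra or its tooling.

```python
# Thresholds taken from the message above; names and labels are illustrative only.
RECOMMENDED_MAX_BYTES = 100 * 1024**2  # ~100MB, the "official" recommendation
PRACTICAL_MAX_BYTES = 1024**3          # ~1GB, the practical target


def classify_partition(size_bytes: int) -> str:
    """Return a rough label for a partition of the given size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= RECOMMENDED_MAX_BYTES:
        return "ok"
    if size_bytes <= PRACTICAL_MAX_BYTES:
        return "large"
    return "too large"


if __name__ == "__main__":
    # Sizes mentioned in this thread: roughly 500MB and 3.3GB.
    for size in (500 * 1024**2, int(3.3 * 1024**3)):
        print(size, classify_partition(size))
```

The sizes fed to such a check would have to come from your own measurements, for example the partition size figures reported by nodetool cfhistograms.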


On Thu, Oct 27, 2016 at 21:28, Vincent Rischmann <> wrote:

> Yeah that particular table is badly designed, I intend to fix it, when the
> roadmap allows us to do it :)
> What is the recommended maximum partition size ?
> Thanks for all the information.
> On Thu, Oct 27, 2016, at 08:14 PM, Alexander Dejanovski wrote:
> 3.3GB is already too high, and it surely doesn't help compactions perform
> well. I know changing a data model is no easy thing to do, but you should
> try to do something here.
> Anticompaction is a special type of compaction and if an sstable is being
> anticompacted, then any attempt to run validation compaction on it will
> fail, telling you that you cannot have an sstable being part of 2 repair
> sessions at the same time, so incremental repair must be run one node at a
> time, waiting for anticompactions to end before moving from one node to the
> other.
> Be mindful to keep running incremental repair on a regular basis once you
> have started, as you'll have two separate pools of sstables (repaired and
> unrepaired) that won't get compacted together, which could be a problem if
> you want tombstones to be purged efficiently.
> Cheers,
> On Thu, Oct 27, 2016 at 17:57, Vincent Rischmann <> wrote:
> Ok, I think we'll give incremental repairs a try on a limited number of
> CFs first and then if it goes well we'll progressively switch more CFs to
> incremental.
> I'm not sure I understand the problem with anticompaction and validation
> running concurrently. As far as I can tell, right now when a CF is repaired
> (either via reaper, or via nodetool) there may be compactions running at
> the same time. In fact, it happens very often. Is it a problem ?
> As for big partitions, the biggest one we have is around 3.3GB. Some
> smaller ones are around 500MB or less.
> On Thu, Oct 27, 2016, at 05:37 PM, Alexander Dejanovski wrote:
> Oh right, that's what they advise :)
> I'd say that you should skip the full repair phase in the migration
> procedure as that will obviously fail, and just mark all sstables as
> repaired (skip 1, 2 and 6).
> Anyway you can't do better, so take a leap of faith there.
> Intensity is already very low and 10000 segments is a whole lot for 9
> nodes, you should not need that many.
> You can definitely pick which CF you'll run incremental repair on, and
> still run full repair on the rest.
> If you pick our Reaper fork, watch out for the schema changes that add
> incremental repair fields. I don't advise running incremental repair
> without it, otherwise you might have issues with anticompaction and
> validation compactions running concurrently from time to time.
> One last thing : can you check if you have particularly big partitions in
> the CFs that fail to get repaired ? You can run nodetool cfhistograms to
> check that.
> Cheers,
> On Thu, Oct 27, 2016 at 5:24 PM Vincent Rischmann <>
> wrote:
> Thanks for the response.
> We do break up repairs between tables, we also tried our best to have no
> overlap between repair runs. Each repair has 10000 segments (purely
> arbitrary number, seemed to help at the time). Some runs have an intensity
> of 0.4, some have as low as 0.05.
> Still, sometimes one particular app (which does a lot of read/modify/write
> batches in quorum) gets slowed down to the point we have to stop the repair
> run.
> But more annoyingly, for the past 2 to 3 weeks as I said, it looks like runs
> don't progress after some time. Every time I restart reaper, it starts to
> repair correctly again, up until it gets stuck. I have no idea why that
> happens now, but it means I have to babysit reaper, and it's becoming
> annoying.
> Thanks for the suggestion about incremental repairs. It would probably be
> a good thing but it's a little challenging to set up I think. Right now
> running a full repair of all keyspaces (via nodetool repair) is going to
> take a lot of time, probably like 5 days or more. We were never able to run
> one to completion. I'm not sure it's a good idea to disable autocompaction
> for that long.
> But maybe I'm wrong. Is it possible to use incremental repairs on only
> some column families ?
> On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote:
> Hi Vincent,
> most people handle repair with :
> - pain (by hand running nodetool commands)
> - cassandra range repair :
> - Spotify Reaper
> - and OpsCenter repair service for DSE users
> Reaper is a good option I think and you should stick to it. If it cannot
> do the job here then no other tool will.
> You have several options from here :
>    - Try to break up your repair table by table and see which ones
>    actually get stuck
>    - Check your logs for any repair/streaming error
>    - Avoid repairing everything :
>       - you may have expendable tables
>       - you may have TTLed only tables with no deletes, accessed with
>       QUORUM CL only
>    - You can try to relieve repair pressure in Reaper by lowering
>    repair intensity (on the tables that get stuck)
>    - You can try adding steps to your repair process by putting a higher
>    segment count in reaper (on the tables that get stuck)
>    - And lastly, you can turn to incremental repair. As you're familiar
>    with Reaper already, you might want to take a look at our Reaper fork that
>    handles incremental repair :
>    If you go down that way, make sure you first mark all sstables as
>    repaired before you run your first incremental repair, otherwise you'll end
>    up in anticompaction hell (bad bad place) :
>    Even if people say that's not necessary anymore, it'll save you from a
>    very bad first experience with incremental repair.
>    Furthermore, make sure you run repair daily after your first inc
>    repair run, in order to work on small sized repairs.
> Cheers,
> On Thu, Oct 27, 2016 at 4:27 PM Vincent Rischmann <>
> wrote:
> Hi,
> we have two Cassandra 2.1.15 clusters at work and are having some trouble
> with repairs.
> Each cluster has 9 nodes, and the amount of data is not gigantic but some
> column families have 300+GB of data.
> We tried to use `nodetool repair` for these tables but at the time we
> tested it, it made the whole cluster load too much and it impacted our
> production apps.
> Next we saw , tried it and
> had some success until recently. For the past 2 to 3 weeks it has never
> completed a repair run, deadlocking itself somehow.
> I know DSE includes a repair service but I'm wondering how other
> Cassandra users manage repairs.
> Vincent.
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
> Consultant
> Apache Cassandra Consulting
> --
Alexander Dejanovski

Apache Cassandra Consulting
