Re: very slow repair
On Thu, Jun 13, 2019 at 2:09 PM Léo FERLIN SUTTON wrote:

>> Last, but not least: are you using the default number of vnodes, 256? The
>> overhead of a large number of vnodes (times the number of nodes) can be
>> quite significant. We've seen major improvements in repair runtime after
>> switching from 256 to 16 vnodes on Cassandra version 3.0.
>
> Is there a recommended procedure to switch the amount of vnodes?

Yes. One should deploy a new virtual DC with the desired configuration and
rebuild from the original one, then decommission the old virtual DC.

With the smaller number of vnodes you should use the
allocate_tokens_for_keyspace configuration parameter to ensure uniform load
distribution. The caveat is that nodes allocate tokens before they
bootstrap, so the very first nodes will not have the keyspace information
available. This can be worked around, though it is not trivial.

See this thread for our past experience:
https://lists.apache.org/thread.html/396f2d20397c36b9cff88a0c2c5523154d420ece24a4dafc9fde3d1f@%3Cuser.cassandra.apache.org%3E

--
Alex
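[Editor's note: a minimal sketch of the migration Alex describes, assuming a new virtual DC named DC3 and a keyspace named my_ks — both names are illustrative, and the first-nodes caveat from the linked thread still applies:]

```shell
# cassandra.yaml on the new DC3 nodes, set before their first start:
#   num_tokens: 16
#   allocate_tokens_for_keyspace: my_ks

# Extend replication to the new DC (run once, via cqlsh), e.g.:
#   ALTER KEYSPACE my_ks WITH replication =
#     {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC3': 2};

# On each DC3 node, stream the existing data from the original DC:
nodetool rebuild -- DC1

# After clients have been pointed at DC3, on each old node in turn:
nodetool decommission
```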
Re: very slow repair
> Last, but not least: are you using the default number of vnodes, 256? The
> overhead of a large number of vnodes (times the number of nodes) can be
> quite significant. We've seen major improvements in repair runtime after
> switching from 256 to 16 vnodes on Cassandra version 3.0.

Is there a recommended procedure to switch the amount of vnodes?

Regards,
Leo

On Thu, Jun 13, 2019 at 12:06 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Thu, Jun 13, 2019 at 10:36 AM R. T. wrote:
>
>> Well, actually by running cfstats I can see that the totaldiskspaceused
>> is about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off
>> for a while; that's why there is a difference in space.
>>
>> I am using Cassandra 3.0.6, and
>> my stream_throughput_outbound_megabits_per_sec is the default setting, so
>> according to my version it is 200 Mbps (25 MB/s).
>
> And the other setting: compaction_throughput_mb_per_sec? It is also
> highly relevant for repair performance, as streamed-in files need to be
> compacted with the existing files on the nodes. In our experience, a
> change in the compaction throughput limit is almost linearly reflected in
> the repair run time.
>
> The default 16 MB/s is too limiting for any production-grade setup, I
> believe. We go as high as 90 MB/s on AWS EBS gp2 data volumes. But don't
> take it as gospel; I'd suggest you start increasing the setting (e.g. by
> doubling it) and observe how it affects repair performance (and client
> latencies).
>
> Have you tried with "parallel" instead of "DC parallel" mode? The latter
> one is really poorly named and actually means something else, as neatly
> highlighted in this SO answer: https://dba.stackexchange.com/a/175028
>
> Last, but not least: are you using the default number of vnodes, 256? The
> overhead of a large number of vnodes (times the number of nodes) can be
> quite significant. We've seen major improvements in repair runtime after
> switching from 256 to 16 vnodes on Cassandra version 3.0.
>
> Cheers,
> --
> Alex
Re: very slow repair
On Thu, Jun 13, 2019 at 10:36 AM R. T. wrote:

> Well, actually by running cfstats I can see that the totaldiskspaceused
> is about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off
> for a while; that's why there is a difference in space.
>
> I am using Cassandra 3.0.6, and
> my stream_throughput_outbound_megabits_per_sec is the default setting, so
> according to my version it is 200 Mbps (25 MB/s).

And the other setting: compaction_throughput_mb_per_sec? It is also highly
relevant for repair performance, as streamed-in files need to be compacted
with the existing files on the nodes. In our experience, a change in the
compaction throughput limit is almost linearly reflected in the repair run
time.

The default 16 MB/s is too limiting for any production-grade setup, I
believe. We go as high as 90 MB/s on AWS EBS gp2 data volumes. But don't
take it as gospel; I'd suggest you start increasing the setting (e.g. by
doubling it) and observe how it affects repair performance (and client
latencies).

Have you tried with "parallel" instead of "DC parallel" mode? The latter
one is really poorly named and actually means something else, as neatly
highlighted in this SO answer: https://dba.stackexchange.com/a/175028

Last, but not least: are you using the default number of vnodes, 256? The
overhead of a large number of vnodes (times the number of nodes) can be
quite significant. We've seen major improvements in repair runtime after
switching from 256 to 16 vnodes on Cassandra version 3.0.

Cheers,
--
Alex
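[Editor's note: the "double it and observe" experiment Alex suggests can be run live, without a restart — nodetool changes revert to the cassandra.yaml values on restart; the numbers below are just the first doubling step, not a recommendation:]

```shell
# Current compaction limit (MB/s)
nodetool getcompactionthroughput

# Double the default 16 MB/s to 32 MB/s on each node, then watch repair
# progress and client latencies before doubling again
nodetool setcompactionthroughput 32

# Stream throughput can be raised the same way (value is in megabits/s)
nodetool setstreamthroughput 400
```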
Re: very slow repair
Hi,

Thank you for your reply.

Well, actually by running cfstats I can see that the totaldiskspaceused is
about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off for a
while; that's why there is a difference in space.

I am using Cassandra 3.0.6, and my
stream_throughput_outbound_megabits_per_sec is the default setting, so
according to my version it is 200 Mbps (25 MB/s).

Cheers

‐‐‐ Original Message ‐‐‐
On Thursday, June 13, 2019 6:04 AM, Laxmikant Upadhyay wrote:

> Few queries:
>
> 1. What is the Cassandra version?
> 2. Is the size of the table 4 TB per node?
> 3. What is the value of compaction_throughput_mb_per_sec and
>    stream_throughput_outbound_megabits_per_sec?
>
> On Thu, Jun 13, 2019 at 5:06 AM R. T. wrote:
>
>> Hi,
>>
>> I am trying to run a repair for the first time on a specific column
>> family in a specific keyspace, and it seems to be going super slow.
>>
>> I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a
>> non-incremental, DC-parallel one. This column family is around 4 TB and
>> it is written to heavily (compared with other CFs), so since it is going
>> to take 2 months (according to the ETA in Reaper), does that mean that
>> when this repair finishes the entropy will be high again in this CF?
>>
>> How can I speed up the process? Is there any way to diagnose
>> bottlenecks?
>>
>> Thank you,
>>
>> W
>
> --
> regards,
> Laxmikant Upadhyay
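[Editor's note: the figures quoted above are consistent — the yaml setting is in megabits per second, so dividing by 8 gives megabytes per second:]

```shell
# stream_throughput_outbound_megabits_per_sec uses megabits/s;
# the 3.0.x default of 200 Mbit/s divided by 8 bits-per-byte is 25 MB/s.
echo "$((200 / 8)) MB/s"
```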
Re: very slow repair
Few queries:

1. What is the Cassandra version?
2. Is the size of the table 4 TB per node?
3. What is the value of compaction_throughput_mb_per_sec and
   stream_throughput_outbound_megabits_per_sec?

On Thu, Jun 13, 2019 at 5:06 AM R. T. wrote:

> Hi,
>
> I am trying to run a repair for the first time on a specific column
> family in a specific keyspace, and it seems to be going super slow.
>
> I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a
> non-incremental, DC-parallel one. This column family is around 4 TB and
> it is written to heavily (compared with other CFs), so since it is going
> to take 2 months (according to the ETA in Reaper), does that mean that
> when this repair finishes the entropy will be high again in this CF?
>
> How can I speed up the process? Is there any way to diagnose bottlenecks?
>
> Thank you,
>
> W

--
regards,
Laxmikant Upadhyay
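[Editor's note: the three questions above can be answered from the shell; the keyspace/table names and the yaml path are placeholders:]

```shell
# 1. Cassandra version
nodetool version

# 2. Per-table disk usage; cfstats is the pre-2.2 name for tablestats
nodetool tablestats my_keyspace.my_table

# 3. Throughput settings, from the config file or the running node
grep -E 'compaction_throughput_mb_per_sec|stream_throughput_outbound_megabits_per_sec' \
    /etc/cassandra/cassandra.yaml
nodetool getcompactionthroughput
nodetool getstreamthroughput
```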
very slow repair
Hi,

I am trying to run a repair for the first time on a specific column family
in a specific keyspace, and it seems to be going super slow.

I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a
non-incremental, DC-parallel one. This column family is around 4 TB and it
is written to heavily (compared with other CFs), so since it is going to
take 2 months (according to the ETA in Reaper), does that mean that when
this repair finishes the entropy will be high again in this CF?

How can I speed up the process? Is there any way to diagnose bottlenecks?

Thank you,

W
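[Editor's note: for the bottleneck question, a few standard nodetool views show where a slow repair is spending its time; the interpretation hints in the comments are editorial, not from the thread:]

```shell
# Are validation and compaction tasks piling up behind the throughput limit?
nodetool compactionstats

# Is streaming between the DCs active, and how much data is still pending?
nodetool netstats

# Are repair-related thread pools (ValidationExecutor, AntiEntropyStage)
# showing pending or blocked tasks?
nodetool tpstats
```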