Hi Leena,
Do you have a firewall between the two DCs? If so, the connection
resets can be caused by Cassandra trying to use a TCP connection that the
firewall has already closed. Please make sure you set a high connection
timeout on the firewall. Also, make sure your servers are not overloaded.
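To illustrate the firewall point, here is a minimal Python sketch that compares the kernel's TCP keepalive time with a firewall idle timeout. It assumes Linux hosts and that keepalive is enabled on the inter-node sockets. The function names and the 3600-second firewall timeout are hypothetical, so substitute the value configured on your own firewall.

```python
from pathlib import Path

# Linux exposes the idle time before the first keepalive probe here (seconds).
KEEPALIVE_TIME_PATH = "/proc/sys/net/ipv4/tcp_keepalive_time"


def read_keepalive_time(path=KEEPALIVE_TIME_PATH):
    """Read the kernel's TCP keepalive idle time, in seconds."""
    return int(Path(path).read_text().strip())


def keepalive_beats_firewall(keepalive_time_s, firewall_idle_timeout_s):
    """Return True if the first keepalive probe goes out before the
    firewall would consider the idle connection dead and drop it."""
    return keepalive_time_s < firewall_idle_timeout_s


# Hypothetical firewall idle timeout of one hour; replace with your own.
FIREWALL_IDLE_TIMEOUT_S = 3600

# The Linux default keepalive time is 7200 s, which would lose to this firewall.
print(keepalive_beats_firewall(7200, FIREWALL_IDLE_TIMEOUT_S))  # False
# A much shorter keepalive time keeps the connection visibly active.
print(keepalive_beats_firewall(60, FIREWALL_IDLE_TIMEOUT_S))    # True
```

On a real node you would pass read_keepalive_time() as the first argument. If the check returns False, either raise the firewall timeout or lower the keepalive time.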
Thank you for the update.
The repair fails with the error 'Failed Creating merkle tree' but does not give
any additional details.
With -pr running on all DC nodes, we see a peer connection reset error, which
then results in a hung repair process even though the TCP connection settings
look
Don't run -pr repairs when using incremental repair; you'll just end up with
loads of anti-compactions.
On 12 October 2016 at 19:11, Harikrishnan Pillai
wrote:
> In my experience, DC-local repair, node by node, with the
> -pr and -par options is best. Full repair increased
In my experience, DC-local repair, node by node, with the
-pr and -par options is best. Full repair increased the number of SSTables
a lot, and it takes days to compact them back. Another
easy option for repair is to use a Spark job: read all data with
consistency ALL and increase the read repair chance to
100%, or use Netflix
Hi Leena,
The first thing you should be concerned about is: why doesn't the repair -pr
operation complete?
The second question is: which repair option is best?
One probable cause of stuck repairs: the firewall between the DCs is closing
TCP connections and Cassandra is trying to use such
Agree.
However, if we go from a world where repairs don’t run (or run very unreliably,
so C* can’t mark the SSTables as repaired anyway) to a world where repairs run
more reliably (Spark / Tickler approach) – the impact on tombstone removal
doesn’t become any worse (because SSTables aren’t
Note that the Tickler approach doesn’t mark SSTables as repaired (it’s a simpler
version of mutation-based repair in a sense), so Cassandra has no way to prove
that the data has been repaired.
With tickets like https://issues.apache.org/jira/browse/CASSANDRA-6434, this
has implications on
The default repair process doesn't usually work at scale, unfortunately.
Depending on your data size, you have the following options.
Netflix Tickler: https://github.com/ckalantzis/cassTickler (Read at CL.ALL via
CQL continuously :: Python)
Spotify Reaper:
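
For the Tickler-style option listed above (reading at CL.ALL via CQL), a rough Python sketch of the idea follows. It is not the cassTickler code. It assumes the Murmur3 partitioner, a single-column partition key, and the DataStax Python driver (cassandra-driver). The function names and the split count are hypothetical.

```python
# Murmur3Partitioner token bounds.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_ring(n_splits):
    """Split the full token range into n_splits contiguous, inclusive
    (start, end) sub-ranges so each read stays small."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(n_splits):
        # The last range absorbs any remainder from the integer division.
        end = MAX_TOKEN if i == n_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def tickle_query(keyspace, table, pk_column):
    """Build the token-range read; %s placeholders are filled by the driver."""
    return (
        f"SELECT {pk_column} FROM {keyspace}.{table} "
        f"WHERE token({pk_column}) >= %s AND token({pk_column}) <= %s"
    )


def tickle(session, keyspace, table, pk_column, n_splits=1024):
    """Read every token sub-range at CL.ALL so all replicas are consulted.
    `session` is an already connected cassandra-driver Session."""
    # Imported here so the helpers above work without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    stmt = SimpleStatement(
        tickle_query(keyspace, table, pk_column),
        consistency_level=ConsistencyLevel.ALL,
    )
    for start, end in split_ring(n_splits):
        for _ in session.execute(stmt, (start, end)):
            pass  # Rows are discarded; only the read itself matters.
```

A read at CL.ALL fails if any replica is down, so a real job would need retries and throttling. As noted earlier in the thread, this kind of read does not mark SSTables as repaired.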