Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-15 Thread Anuj Wadehra
Hi Leena, Do you have a firewall between the two DCs? If so, the connection reset can be caused by Cassandra trying to use a TCP connection that the firewall has already closed. Please make sure that you set a high connection timeout on the firewall. Also, make sure your servers are not overloaded.
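The firewall point above can be illustrated with a small sketch. This is not Cassandra configuration; it is a generic Python example of the mechanism, assuming a Linux-like host. The helper names (`first_probe_beats_firewall`, `enable_keepalive`) and the numeric values are hypothetical. The idea is that if TCP keepalive probes start before the firewall's idle timeout expires, an otherwise idle connection is less likely to be dropped silently.

```python
import socket


def first_probe_beats_firewall(keepalive_idle_s, firewall_idle_timeout_s):
    """Return True if the first keepalive probe is sent before the
    firewall would consider the connection idle and drop it."""
    return keepalive_idle_s < firewall_idle_timeout_s


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive for one socket (illustrative values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below are platform-specific,
    # so only set them where Python exposes them (e.g. Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Hypothetical firewall idle timeout of one hour.
    print(first_probe_beats_firewall(60, 3600))
    s.close()
```

On a real cluster the equivalent knobs would normally be set at the operating-system level rather than per socket, and the right values depend on the firewall's actual idle timeout.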

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-14 Thread Leena Ghatpande
Thank you for the update. The repair fails with the error 'Failed Creating merkle tree' but does not give any additional details. With -pr running on all DC nodes, we see a peer connection reset error, which then results in a hung repair process even though the TCP connection settings look

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-13 Thread kurt Greaves
Don't do -pr repairs when using incremental repair; you'll just end up with loads of anti-compactions.
On 12 October 2016 at 19:11, Harikrishnan Pillai wrote:
> In my experience dc local repair node by node with Pr and par options is best. Full repair increased

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Harikrishnan Pillai
In my experience, DC-local repair, node by node, with the -pr and -par options is best. Full repair increased SSTables a lot, and it takes days to compact them back. Another easy option for repair is to use a Spark job: read all data with consistency ALL and increase read repair chance to 100%, or use Netflix

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anuj Wadehra
Hi Leena, The first thing you should be concerned about is: why doesn't the repair -pr operation complete? Second comes the question: which repair option is best? One probable cause of stuck repairs is that the firewall between DCs is closing TCP connections and Cassandra is trying to use such

RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
Agree. However, if we go from a world where repairs don't run (or run very unreliably, so C* can't mark the SSTables as repaired anyway) to a world where repairs run more reliably (Spark / Tickler approach), the impact on tombstone removal doesn't become any worse (because SSTables aren't

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Jeff Jirsa
Note that the Tickler approach doesn't mark SSTables as repaired (it's a simpler version of mutation-based repair, in a sense), so Cassandra has no way to prove that the data has been repaired. With tickets like https://issues.apache.org/jira/browse/CASSANDRA-6434, this has implications on

RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
The default repair process doesn't usually work at scale, unfortunately. Depending on your data size, you have the following options. Netflix Tickler: https://github.com/ckalantzis/cassTickler (Read at CL.ALL via CQL continuously :: Python) Spotify Reaper: