Re: Repair scheduling tools

2018-04-16 Thread Blake Eggleston
This thread is mainly focused on how repairs are scheduled, not implementation details of how the repairs themselves work. On 4/16/18, 11:07 AM, "Carl Mueller" wrote: So reading ( https://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1)...

Re: Repair scheduling tools

2018-04-16 Thread Carl Mueller
So reading ( https://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1)... anticompaction problems from repair seem related to the fact that the sstables for a repair range can have data that isn't in the repaired data range, so we then have an sstable with the repaired data (I'm ...

Re: Repair scheduling tools

2018-04-16 Thread Carl Mueller
Is the fundamental nature of sstable fragmentation the big wrench here? I've been trying to imagine aids like an offline repair resolver or a gradual node replacement/regenerator process that could serve as a backstop/insurance for compaction and repair problems. After all, some of the "we don't

Re: Repair scheduling tools

2018-04-12 Thread Rahul Singh
Schedule scheme looks good. I believe in-process and sidecar can both coexist. As an admin I would love to be able to run one or the other or none. Thank you for taking a lead and producing a plan that can actually be executed.
-- Rahul Singh rahul.si...@anant.us Anant Corporation
On Apr 12,

Re: Repair scheduling tools

2018-04-12 Thread Joseph Lynch
Given the feedback here and on the ticket, I've written up a proposal for a repair sidecar tool in the ticket's design document. If there are no major concerns we're going to start working

Re: Repair scheduling tools

2018-04-12 Thread Joseph Lynch
> I personally would rather see improvements to reaper and supporting reaper so the repair tool improvements aren't tied to Cassandra releases. If we get to a place where the repair tools are stable then figuring out how to bundle for the best install makes sense to me.
I view the

Re: Repair scheduling tools

2018-04-10 Thread Elliott Sims
My two cents as a (relatively small) user. I'm coming at this from the ops/user side, so my apologies if some of these don't make sense based on a more detailed understanding of the codebase: Repair is definitely a major missing piece of Cassandra. Integrated would be easier, but a sidecar

Re: Repair scheduling tools

2018-04-08 Thread Jeff Beck
I personally would rather see improvements to reaper and supporting reaper so the repair tool improvements aren't tied to Cassandra releases. If we get to a place where the repair tools are stable then figuring out how to bundle for the best install makes sense to me. If we add things that will

Re: Repair scheduling tools

2018-04-05 Thread kurt greaves
Vnodes are related, and because we made them the default lots of people are using them. Repairing a cluster with vnodes is a catastrophe (even a small one is often problematic), but we have to deal with it if we build in repair scheduling. Repair scheduling is very important and we should definitely

Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
I think a takeaway here is that we can't assume a level of operational maturity will coincide automatically with scale. To make our core features robust, we have to account for less-experienced users. A lot of folks on this thread have *really* strong ops and OpsViz stories. Let's not forget that

Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
Off the top of my head I can remember clusters with 600 or 700 nodes with 256 tokens. Not the best situation, but it’s real. 256 has been the default for better or worse.
On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch wrote:
> We see this in larger clusters regularly.

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> We see this in larger clusters regularly. Usually folks have just 'grown into it' because it was the default.
I could understand a few dozen nodes with 256 vnodes, but hundreds is surprising. I have a whitepaper draft lying around showing how vnodes decrease availability in large clusters

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
Sorry sent early. To explain further, the scheduler is entirely decentralized in the proposed design, and no node holds all the information you're talking about in heap at once (in fact no one node would ever hold that information). Each node is responsible only for tokens that they are "primary"
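A rough illustration of the "each node only handles its primary tokens" idea above. This is only a sketch: it assumes "primary" means the usual Cassandra primary range (the range ending at a token the node owns), and the ring layout, node names, and `primary_ranges` helper are all hypothetical, not taken from the design document.

```python
def primary_ranges(ring, node):
    """Return the (start, end] token ranges whose end token is owned by `node`.

    `ring` is a list of (token, owner) pairs. This is a hypothetical
    stand-in for whatever ring metadata the real scheduler would read.
    """
    ordered = sorted(ring)
    ranges = []
    for i, (token, owner) in enumerate(ordered):
        if owner == node:
            # Index -1 wraps around, so the lowest token's range starts
            # at the highest token on the ring.
            previous_token = ordered[i - 1][0]
            ranges.append((previous_token, token))
    return ranges


# Toy ring: node "a" holds two tokens, "b" and "c" one each.
ring = [(0, "a"), (25, "b"), (50, "c"), (75, "a")]

for node in ("a", "b", "c"):
    print(node, primary_ranges(ring, node))
```

Since every range ends at exactly one token, each range is claimed by exactly one node, so no node has to hold the whole cluster's repair plan.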

Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
> Somewhat beside the point, I wasn't aware there were any 100 node + clusters running with vnodes, if my math is correct they would be excessively vulnerable to outages with that many vnodes and that many nodes. Most of the large clusters I've heard of (100 nodes plus) are running with

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> I wouldn't trivialize it, scheduling can end up dealing with more than a single repair. If there are 1000 keyspace/tables, with 400 nodes and 256 vnodes on each, that's a lot of repairs to plan out and keep track of and can easily cause heap allocation spikes if opted in.
>
> Chris
The
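For a sense of scale, the numbers in the quoted message can be multiplied out. This is a back-of-the-envelope sketch only: it assumes every table is repaired separately for every vnode range and ignores replication factor and any grouping of ranges, so it is an upper bound on bookkeeping rather than a measurement. The function name is made up for illustration.

```python
def repair_units(tables, nodes, vnodes_per_node):
    """Count (table, token range) pairs, cluster-wide and per node.

    Assumes one repair unit per table per vnode range (an assumption
    for illustration, not a statement about how repair is planned).
    """
    token_ranges = nodes * vnodes_per_node
    cluster_units = token_ranges * tables
    per_node_units = vnodes_per_node * tables
    return token_ranges, cluster_units, per_node_units


token_ranges, cluster_units, per_node_units = repair_units(
    tables=1000, nodes=400, vnodes_per_node=256
)
print(token_ranges)    # 102400
print(cluster_units)   # 102400000
print(per_node_units)  # 256000
```

The gap between the cluster-wide and per-node figures is what the centralized versus decentralized argument in this part of the thread turns on.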

Re: Repair scheduling tools

2018-04-05 Thread Chris Lohfink
> I do have a hard time buying that an opt-in repair *scheduling* is going to cause heap problems or impact the daemon significantly; the scheduler literally reads a few bytes out of a Cassandra table and makes a function call or two, and then sleeps for 2 minutes.
I wouldn't trivialize

Re: Repair scheduling tools

2018-04-05 Thread Rahul Singh
> users can opt-in.
> Regarding the risk, yes there will be problems at the beginning but i

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> Regarding the risk, yes there will be problems at the beginning but in the long run, users will appreciate that repair works out of the box, just like

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
> there a side car process?
>
> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli <kohlisank...@gmail.com> wrote:

Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
> o run C*.
>
> Can we have a side car process which we can add to Apache Cassandra offering and we can put this repair there? I am also fine putting it in

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
> e with running repairs successfully in production, and seeing the success of distributed scheduled repair here at Netflix, I strongly believe that adding

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> successfully in production, and seeing the success of distributed scheduled repair here at Netflix, I strongly believe that adding this to Cassandra would be a

Re: Repair scheduling tools

2018-04-04 Thread Ben Bromhead
> ssandra would be a great addition to the database. I am hoping, we as a community will make it easy for teams to operate and run Cassandra by enhancing the core product, and

Re: Repair scheduling tools

2018-04-04 Thread Jon Haddad
> the database without external tooling. We can have an experimental flag for the feature and only teams who are confident with the service can enable them, while others can fall back to default repa

Re: Repair scheduling tools

2018-04-04 Thread Rahul Singh
> a Tangirala*
> Engineering Manager CDE
> *(408) 438-3156 - mobile*
>
> On Tue,

Re: Repair scheduling tools

2018-04-04 Thread Dor Laor
> (default: false)
>
> Then users can use the built in auto repair function that would be created or continue to handle it as now. Default behavior would be "false" so nothing changes on its own

Re: Repair scheduling tools

2018-04-04 Thread Dinesh Joshi
> nothing changes on its own. Just wondering why not have that option? It might accelerate progress as others have already suggested.
>
> Kenneth Brotman
>
> -----Original Message-----

Re: Repair scheduling tools

2018-04-03 Thread Qingcun Zhou
> ult behavior would be "false" so nothing changes on its own. Just wondering why not have that option? It might accelerate progress as others have already suggested.
>
> Kenneth Brotman
>
> --

Re: Repair scheduling tools

2018-04-03 Thread sankalp kohli
> ers have already suggested.
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Nate McCall [mailto:zznat...@gmail.com]
> Sent: Tuesday, April 03, 2018 1:37 PM
> To: dev
> Subject: Re: Repair scheduling tools
>
> Th

Re: Repair scheduling tools

2018-04-03 Thread Roopa Tangirala
> ress as others have already suggested.
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Nate McCall [mailto:zznat...@gmail.com]
> Sent: Tuesday, April 03, 2018 1:37 PM
> To: dev
> Subject: Re: Repair scheduling tools
>
> This document does a really good job of

RE: Repair scheduling tools

2018-04-03 Thread Kenneth Brotman
dering why not have that option? It might accelerate progress as others have already suggested.
Kenneth Brotman
-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Tuesday, April 03, 2018 1:37 PM
To: dev
Subject: Re: Repair scheduling tools
This document does a reall

Re: Repair scheduling tools

2018-04-03 Thread Rahul Singh
Agree on including in the distribution but I think repair can live independently and be run / configured separately.
-- Rahul Singh rahul.si...@anant.us Anant Corporation
On Apr 3, 2018, 4:37 PM -0400, Nate McCall wrote:
> This document does a really good job of listing

Re: Repair scheduling tools

2018-04-03 Thread Nate McCall
This document does a really good job of listing out some of the issues of coordinating scheduling repair. Regardless of which camp you fall into, it is certainly worth a read.
On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch wrote:
> I just want to say I think it would be

Re: Repair scheduling tools

2018-04-03 Thread Joseph Lynch
I just want to say I think it would be great for our users if we moved repair scheduling into Cassandra itself. The team here at Netflix has opened the ticket and has written a detailed design document

Re: Repair scheduling tools

2018-04-03 Thread Carl Mueller
LastPickle's reaper should be the starting point of any discussion on repair scheduling.
On Tue, Apr 3, 2018 at 12:48 PM, Blake Eggleston wrote:
> Hi dev@,
>
> The question of the best way to schedule repairs came up on CASSANDRA-14346, and I thought it would be good