Re: Side Car New Repo vs not

2018-08-20 Thread dinesh.jo...@yahoo.com.INVALID
An option is to create a mono repo with Cassandra and SideCar as modules that 
could be built independently. This would keep source for both artifacts in the 
same repo and have their own release cadences. That said, I don't have any 
strong opinions at this point. We can try going with a separate repo and 
reevaluate it if it doesn't work out.
Dinesh 

On Monday, August 20, 2018, 9:21:33 PM PDT, Blake Eggleston 
 wrote:  
 
 If the sidecar is going to be on a different release cadence, or support 
interacting with mixed mode clusters, then it should definitely be in a 
separate repo. I don’t even know how branching and merging would work in a repo 
that supports 2 separate release targets and/or mixed mode compatibility, but 
I’m pretty sure it would be a mess.

As a cluster management tool, mixed mode is probably going to be a goal at some 
point. As a new project, it will benefit from not being tied to the C* release 
cycle (which would probably delay any sidecar release until whenever 4.1 is 
cut).


On August 20, 2018 at 3:22:54 PM, Joseph Lynch (joe.e.ly...@gmail.com) wrote:

I think that the pros of incubating the sidecar in tree as a tool first  
outweigh the alternatives at this point in time. Rough tradeoffs that I see:  

Unique pros of in tree sidecar:  
* Faster iteration speed in general. For example when we need to add a new  
JMX endpoint that the sidecar needs, or change something from JMX to a  
virtual table (e.g. for repair, or monitoring) we can do all changes  
including tests as one commit within the main repository and don't have to  
commit to main repo, sidecar repo, and dtest repo (juggling version  
compatibility along the way).  
* We can in the future more easily move serious background functionality  
like compaction or repair itself (not repair scheduling, actual repairing)  
into the sidecar with a single atomic commit, we don't have to do two phase  
commits where we add some IPC mechanism to allow us to support it in both,  
then turn it on in the sidecar, then turn it off in the server, etc...  
* I think that the verification is much easier (sounds like Jonathan  
disagreed on the other thread, I could certainly be wrong), and we don't  
have to worry about testing matrices to assure that the sidecar works with  
various versions as the version of the sidecar that is released with that  
version of Cassandra is the only one we have to certify works. If people  
want to pull in new versions or maintain backports they can do that at  
their discretion/testing.  
* We can iterate and prove value before committing to a choice. Since it  
will be a separate artifact from the start we can always move the artifact  
to a separate repo later (but moving the other way is harder).  
* Users will get the sidecar "for free" when they install the daemon, they  
don't need to take affirmative action to e.g. be able to restart their  
cluster, run repair, or back their data up; it just comes out of the box  
for free.  

Unique pros of a separate repository sidecar:  
* We can use a more modern build system like gradle instead of ant  
* Merging changes is less "scary" I guess (I feel like if you're not  
touching the daemon this is already true but I could see this being less  
worrisome for some).  
* Releasing a separate artifact is somewhat easier from a separate repo  
(especially if we have gradle which makes e.g. building debs and rpms  
trivial).  
* We could backport to previous versions without getting into arguments  
about bug fixes vs features.  
* Committers could be different from the main repo, which ... may be a  
useful thing  

Non unique pros of a sidecar (could be achieved in the main repo or in a  
separate repo):  
* A separate build artifact .jar/.deb/.rpm that can be installed  
separately. It's slightly easier with a separate repo but certainly not out  
of reach within a single repo (indeed the current patch already creates a  
separate jar, and we could create a separate .deb reasonably easily).  
Personally I think having a separate .deb/.rpm is premature at this point  
(for companies that really want it they can build their own packages using  
the .jars), but I think it really is a distracting issue from where the  
patch should go as we can always choose to remove experimental .jar files  
that the main daemon doesn't touch.  
* A separate process lifecycle. No matter where the sidecar goes, we get  
the benefit of restarting it being less dangerous for availability than  
restarting the main daemon.  

That all being said, these are strong opinions weakly held and I would  
rather get something actually committed so that we can prove value one way  
or the other and am therefore, of course, happy to put sidecar patches  
wherever someone can review and commit it.  

-Joey  

On Mon, Aug 20, 2018 at 1:52 PM sankalp kohli   
wrote:  

> Hi,  
> I am starting a new thread to get consensus on where the side car  
> should be contributed.  
>  
> Pleas

Re: Side Car New Repo vs not

2018-08-20 Thread Blake Eggleston
If the sidecar is going to be on a different release cadence, or support 
interacting with mixed mode clusters, then it should definitely be in a 
separate repo. I don’t even know how branching and merging would work in a repo 
that supports 2 separate release targets and/or mixed mode compatibility, but 
I’m pretty sure it would be a mess.

As a cluster management tool, mixed mode is probably going to be a goal at some 
point. As a new project, it will benefit from not being tied to the C* release 
cycle (which would probably delay any sidecar release until whenever 4.1 is 
cut).


On August 20, 2018 at 3:22:54 PM, Joseph Lynch (joe.e.ly...@gmail.com) wrote:

I think that the pros of incubating the sidecar in tree as a tool first  
outweigh the alternatives at this point in time. Rough tradeoffs that I see:  

Unique pros of in tree sidecar:  
* Faster iteration speed in general. For example when we need to add a new  
JMX endpoint that the sidecar needs, or change something from JMX to a  
virtual table (e.g. for repair, or monitoring) we can do all changes  
including tests as one commit within the main repository and don't have to  
commit to main repo, sidecar repo, and dtest repo (juggling version  
compatibility along the way).  
* We can in the future more easily move serious background functionality  
like compaction or repair itself (not repair scheduling, actual repairing)  
into the sidecar with a single atomic commit, we don't have to do two phase  
commits where we add some IPC mechanism to allow us to support it in both,  
then turn it on in the sidecar, then turn it off in the server, etc...  
* I think that the verification is much easier (sounds like Jonathan  
disagreed on the other thread, I could certainly be wrong), and we don't  
have to worry about testing matrices to assure that the sidecar works with  
various versions as the version of the sidecar that is released with that  
version of Cassandra is the only one we have to certify works. If people  
want to pull in new versions or maintain backports they can do that at  
their discretion/testing.  
* We can iterate and prove value before committing to a choice. Since it  
will be a separate artifact from the start we can always move the artifact  
to a separate repo later (but moving the other way is harder).  
* Users will get the sidecar "for free" when they install the daemon, they  
don't need to take affirmative action to e.g. be able to restart their  
cluster, run repair, or back their data up; it just comes out of the box  
for free.  

Unique pros of a separate repository sidecar:  
* We can use a more modern build system like gradle instead of ant  
* Merging changes is less "scary" I guess (I feel like if you're not  
touching the daemon this is already true but I could see this being less  
worrisome for some).  
* Releasing a separate artifact is somewhat easier from a separate repo  
(especially if we have gradle which makes e.g. building debs and rpms  
trivial).  
* We could backport to previous versions without getting into arguments  
about bug fixes vs features.  
* Committers could be different from the main repo, which ... may be a  
useful thing  

Non unique pros of a sidecar (could be achieved in the main repo or in a  
separate repo):  
* A separate build artifact .jar/.deb/.rpm that can be installed  
separately. It's slightly easier with a separate repo but certainly not out  
of reach within a single repo (indeed the current patch already creates a  
separate jar, and we could create a separate .deb reasonably easily).  
Personally I think having a separate .deb/.rpm is premature at this point  
(for companies that really want it they can build their own packages using  
the .jars), but I think it really is a distracting issue from where the  
patch should go as we can always choose to remove experimental .jar files  
that the main daemon doesn't touch.  
* A separate process lifecycle. No matter where the sidecar goes, we get  
the benefit of restarting it being less dangerous for availability than  
restarting the main daemon.  

That all being said, these are strong opinions weakly held and I would  
rather get something actually committed so that we can prove value one way  
or the other and am therefore, of course, happy to put sidecar patches  
wherever someone can review and commit it.  

-Joey  

On Mon, Aug 20, 2018 at 1:52 PM sankalp kohli   
wrote:  

> Hi,  
> I am starting a new thread to get consensus on where the side car  
> should be contributed.  
>  
> Please send your responses with pro/cons of each approach or any other  
> approach. Please be clear which approach you will pick while still giving  
> pros/cons of both approaches.  
>  
> Thanks.  
> Sankalp  
>  


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Jeff Jirsa
On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala
 wrote:

> contributions should be evaluated based on the merit of code and their
> value add to the whole offering. I hope it does not matter whether that
> contribution comes from PMC member or a person who is not a committer.


I hope this goes without saying.


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Roopa Tangirala
+1 to everything that Joey articulated, with emphasis on the fact that
contributions should be evaluated based on the merit of the code and their
value add to the whole offering. I hope it does not matter whether that
contribution comes from a PMC member or a person who is not a committer. I
would like the process to be such that it encourages new members to be
a part of the community and not shy away from contributing code
on the assumption that their contributions are valued differently than those
of committers or PMC members. It would be sad to see contributions decrease
if we go down that path.

*Regards,*

*Roopa Tangirala*

Engineering Manager CDE

*(408) 438-3156 - mobile*






On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch  wrote:

> > We are looking to contribute Reaper to the Cassandra project.
> >
> Just to clarify are you proposing contributing Reaper as a project via
> donation or you are planning on contributing the features of Reaper as
> patches to Cassandra? If the former how far along are you on the donation
> process? If the latter, when do you think you would have patches ready for
> consideration / review?
>
>
> > Looking at the patch it's very similar in its base design already, but
> > Reaper does have a lot more to offer. We have all been working hard to
> move
> > it to also being a side-car so it can be contributed. This raises a
> number
> > of relevant questions to this thread: would we then accept both works in
> > the Cassandra project, and what burden would it put on the current PMC to
> > maintain both works.
> >
> I would hope that we would collaborate on merging the best parts of all
> into the official Cassandra sidecar, taking the always on, shared nothing,
> highly available system that we've contributed a patchset for and adding in
> many of the repair features (e.g. schedules, a nice web UI) that Reaper
> has.
>
>
> > I share Stefan's concern that consensus had not been met around a
> > side-car, and that it was somehow default accepted before a patch landed.
>
>
> I feel this is not correct or fair. The sidecar and repair discussions have
> been anything _but_ "default accepted". The timeline of consensus building
> involving the management sidecar and repair scheduling plans:
>
> Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to
> come up with design goals for a repair scheduler that could work at Netflix
> scale.
>
> ~Feb 2017: Netflix believes that the fundamental design gaps prevented us
> from using Reaper as it relies heavily on remote JMX connections and
> central coordination.
>
> Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available
> and distributed repair scheduling sidecar/tool. He is encouraged by
> multiple committers to build repair scheduling into the daemon itself and
> not as a sidecar so the database is truly eventually consistent.
>
> ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at
> NGCC, Vinay and myself prototype the distributed repair scheduler within
> Priam and roll it out at Netflix scale.
>
> Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page
> design document for adding repair scheduling to the daemon itself and open
> the design up for feedback from the community. We get feedback from Alex,
> Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals
> to contribute Reaper at this point. We hear the consensus that the
> community would prefer repair scheduling in a separate distributed sidecar
> rather than in the daemon itself and we re-work the design to match this
> consensus, re-aligning with our original proposal at NGCC.
>
> Apr 2018: Blake brings the discussion of repair scheduling to the dev list
> (
>
> https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E
> ).
> Many community members give positive feedback that we should solve it as
> part of Cassandra and there is still no mention of contributing Reaper at
> this point. The last message is my attempted summary giving context on how
> we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and
> ship them with Cassandra.
>
> Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document
> for gathering feedback on a general management sidecar. Sankalp and Dinesh
> encourage Vinay and myself to kickstart that sidecar using the repair
> scheduler patch
>
> Apr 2018: Dinesh reaches out to the dev list (
>
> https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E
> )
> about the general management process to gain further feedback. All feedback
> remains positive as it is a potential place for multiple community members
> to contribute their various sidecar functionality.
>
> May-Jul 2018: Vinay and I work on creating a basic sidecar for running the
> repair scheduler based on the feedback from the community in
> CASSANDRA-

Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread dinesh.jo...@yahoo.com.INVALID
On Mon, Aug 20, 2018 at 4:23 AM Mick Semb Wever  wrote:
>
> We are looking to contribute Reaper to the Cassandra project.
>
> Looking at the patch it's very similar in its base design already, but
> Reaper does have a lot more to offer. We have all been working hard to move
> it to also being a side-car so it can be contributed. This raises a number
> of relevant questions to this thread: would we then accept both works in
> the Cassandra project, and what burden would it put on the current PMC to
> maintain both works.
I think the comparison is not fair. The patch that has landed is new and is the 
beginning of a sidecar within Cassandra. It would be unfair to compare its 
features with Reaper which has been around for some time now. 
I proposed a management process (not exactly a sidecar) for Cassandra about 4 
months ago. Had you guys indicated interest in contributing Reaper, we would 
not be discussing two separate implementations. Don't get me wrong, I'm happy 
that we're talking about this right now.
> This seems at odds when we're already struggling to keep up with the
> incoming patches/contributions, and there could be other git repos in the
> project we will need to support in the future too. 
I think this is a great problem to have for a project. This is a sign that the 
pool of contributors is greater than reviewers / committers. I personally have 
been volunteering my time reviewing tickets, fixing flaky tests and generally 
helping out. I definitely think we need more people actively contributing.
> The Reaper project has worked hard in building both its user and
> contributor base. And I would have thought these, including having the
> contributor base overlap with the C* PMC, were prerequisites before moving
> a larger body of work into the project (separate git repo or not). I guess
You're talking about donating a body of code i.e. Reaper which is different 
from building a new feature from scratch.
> There's been little effort in evaluating these two bodies of work, one
> which is largely unknown to us, and my concern is how we would fairly
> support both going into the future?
I don't think we should have two separate implementations as part of the 
project. It would be best if we could have a single sidecar that could have 
features from Reaper as well as the proposed patch.
Thanks,
Dinesh  

Re: Side Car New Repo vs not

2018-08-20 Thread Joseph Lynch
I think that the pros of incubating the sidecar in tree as a tool first
outweigh the alternatives at this point in time. Rough tradeoffs that I see:

Unique pros of in tree sidecar:
* Faster iteration speed in general. For example when we need to add a new
JMX endpoint that the sidecar needs, or change something from JMX to a
virtual table (e.g. for repair, or monitoring) we can do all changes
including tests as one commit within the main repository and don't have to
commit to main repo, sidecar repo, and dtest repo (juggling version
compatibility along the way).
* We can in the future more easily move serious background functionality
like compaction or repair itself (not repair scheduling, actual repairing)
into the sidecar with a single atomic commit, we don't have to do two phase
commits where we add some IPC mechanism to allow us to support it in both,
then turn it on in the sidecar, then turn it off in the server, etc...
* I think that the verification is much easier (sounds like Jonathan
disagreed on the other thread, I could certainly be wrong), and we don't
have to worry about testing matrices to assure that the sidecar works with
various versions as the version of the sidecar that is released with that
version of Cassandra is the only one we have to certify works. If people
want to pull in new versions or maintain backports they can do that at
their discretion/testing.
* We can iterate and prove value before committing to a choice. Since it
will be a separate artifact from the start we can always move the artifact
to a separate repo later (but moving the other way is harder).
* Users will get the sidecar "for free" when they install the daemon, they
don't need to take affirmative action to e.g. be able to restart their
cluster, run repair, or back their data up; it just comes out of the box
for free.

Unique pros of a separate repository sidecar:
* We can use a more modern build system like gradle instead of ant
* Merging changes is less "scary" I guess (I feel like if you're not
touching the daemon this is already true but I could see this being less
worrisome for some).
* Releasing a separate artifact is somewhat easier from a separate repo
(especially if we have gradle which makes e.g. building debs and rpms
trivial).
* We could backport to previous versions without getting into arguments
about bug fixes vs features.
* Committers could be different from the main repo, which ... may be a
useful thing

Non unique pros of a sidecar (could be achieved in the main repo or in a
separate repo):
* A separate build artifact .jar/.deb/.rpm that can be installed
separately. It's slightly easier with a separate repo but certainly not out
of reach within a single repo (indeed the current patch already creates a
separate jar, and we could create a separate .deb reasonably easily).
Personally I think having a separate .deb/.rpm is premature at this point
(for companies that really want it they can build their own packages using
the .jars), but I think it really is a distracting issue from where the
patch should go as we can always choose to remove experimental .jar files
that the main daemon doesn't touch.
* A separate process lifecycle. No matter where the sidecar goes, we get
the benefit of restarting it being less dangerous for availability than
restarting the main daemon.

That all being said, these are strong opinions weakly held and I would
rather get something actually committed so that we can prove value one way
or the other and am therefore, of course, happy to put sidecar patches
wherever someone can review and commit it.

-Joey

On Mon, Aug 20, 2018 at 1:52 PM sankalp kohli 
wrote:

> Hi,
> I am starting a new thread to get consensus on where the side car
> should be contributed.
>
> Please send your responses with pro/cons of each approach or any other
> approach. Please be clear which approach you will pick while still giving
> pros/cons of both approaches.
>
> Thanks.
> Sankalp
>


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Joseph Lynch
> We are looking to contribute Reaper to the Cassandra project.
>
Just to clarify are you proposing contributing Reaper as a project via
donation or you are planning on contributing the features of Reaper as
patches to Cassandra? If the former how far along are you on the donation
process? If the latter, when do you think you would have patches ready for
consideration / review?


> Looking at the patch it's very similar in its base design already, but
> Reaper does have a lot more to offer. We have all been working hard to move
> it to also being a side-car so it can be contributed. This raises a number
> of relevant questions to this thread: would we then accept both works in
> the Cassandra project, and what burden would it put on the current PMC to
> maintain both works.
>
I would hope that we would collaborate on merging the best parts of all
into the official Cassandra sidecar, taking the always on, shared nothing,
highly available system that we've contributed a patchset for and adding in
many of the repair features (e.g. schedules, a nice web UI) that Reaper has.


> I share Stefan's concern that consensus had not been met around a
> side-car, and that it was somehow default accepted before a patch landed.


I feel this is not correct or fair. The sidecar and repair discussions have
been anything _but_ "default accepted". The timeline of consensus building
involving the management sidecar and repair scheduling plans:

Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to
come up with design goals for a repair scheduler that could work at Netflix
scale.

~Feb 2017: Netflix believes that the fundamental design gaps prevented us
from using Reaper as it relies heavily on remote JMX connections and
central coordination.

Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available
and distributed repair scheduling sidecar/tool. He is encouraged by
multiple committers to build repair scheduling into the daemon itself and
not as a sidecar so the database is truly eventually consistent.

~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at
NGCC, Vinay and myself prototype the distributed repair scheduler within
Priam and roll it out at Netflix scale.

Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page
design document for adding repair scheduling to the daemon itself and open
the design up for feedback from the community. We get feedback from Alex,
Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals
to contribute Reaper at this point. We hear the consensus that the
community would prefer repair scheduling in a separate distributed sidecar
rather than in the daemon itself and we re-work the design to match this
consensus, re-aligning with our original proposal at NGCC.

Apr 2018: Blake brings the discussion of repair scheduling to the dev list (
https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E).
Many community members give positive feedback that we should solve it as
part of Cassandra and there is still no mention of contributing Reaper at
this point. The last message is my attempted summary giving context on how
we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and
ship them with Cassandra.

Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document
for gathering feedback on a general management sidecar. Sankalp and Dinesh
encourage Vinay and myself to kickstart that sidecar using the repair
scheduler patch

Apr 2018: Dinesh reaches out to the dev list (
https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E)
about the general management process to gain further feedback. All feedback
remains positive as it is a potential place for multiple community members
to contribute their various sidecar functionality.

May-Jul 2018: Vinay and I work on creating a basic sidecar for running the
repair scheduler based on the feedback from the community in
CASSANDRA-14346 and CASSANDRA-14395

Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this,
nobody objects

Jul 2018: Sankalp asks on the dev list whether anyone has feature Jiras that
need review before 4.0. I mention again that we've nearly got the
basic sidecar and repair scheduling work done and will need help with
review. No one responds.

Aug 2018: We submit a patch that brings a basic distributed sidecar and
robust distributed repair to Cassandra itself. Dinesh mentions that he will
try to review. Now folks appear concerned about it being in tree and
suggest that it should maybe go in a different repo altogether. I don't think
we have consensus on the repo choice yet.

> This seems at odds when we're already struggling to keep up with the
> incoming patches/contributions, and there could be other git repos in the
> project we will need to support in the future too. But 

Side Car New Repo vs not

2018-08-20 Thread sankalp kohli
Hi,
I am starting a new thread to get consensus on where the side car
should be contributed.

Please send your responses with pro/cons of each approach or any other
approach. Please be clear which approach you will pick while still giving
pros/cons of both approaches.

Thanks.
Sankalp


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread sankalp kohli
Hi,
Here is how I think we can make progress:
1. Consensus has been reached that we need a side car, as this thread is
months old and I do not see any objections. I bumped it again and it is good
to see it being active.
2. There is no consensus on new repo vs not. I will start a thread on it
and let's discuss it.
3. We now have 2 implementations for running repair via side car. This is
actually awesome to see for the community. We should work on a JIRA (like we
did for virtual table) to get the best out of both implementations.

Thanks,
Sankalp



On Mon, Aug 20, 2018 at 4:23 AM Mick Semb Wever  wrote:

>
> On Fri, 17 Aug 2018, at 14:27, Sankalp Kohli wrote:
> > I am bumping this thread because patch has landed for this with repair
> > functionality.
>
>
> We are looking to contribute Reaper to the Cassandra project.
>
> Looking at the patch it's very similar in its base design already, but
> Reaper does have a lot more to offer. We have all been working hard to move
> it to also being a side-car so it can be contributed. This raises a number
> of relevant questions to this thread: would we then accept both works in
> the Cassandra project, and what burden would it put on the current PMC to
> maintain both works.
>
> I share Stefan's concern that consensus had not been met around a
> side-car, and that it was somehow default accepted before a patch landed.
> This seems at odds when we're already struggling to keep up with the
> incoming patches/contributions, and there could be other git repos in the
> project we will need to support in the future too. But I'm also curious
> about the whole "Community over Code" angle to this, how do we encourage
> multiple external works to collaborate together building value in both the
> technical and community.
>
> The Reaper project has worked hard in building both its user and
> contributor base. And I would have thought these, including having the
> contributor base overlap with the C* PMC, were prerequisites before moving
> a larger body of work into the project (separate git repo or not). I guess
> this isn't so much "Community over Code", but it illustrates a concern
> regarding abandoned code when there's no existing track record of
> maintaining it as OSS, as opposed to expecting an existing "show, don't
> tell" culture. Reaper for example has stronger indicators for ongoing
> support and an existing OSS user base: today C* committers having
> contributed to Reaper are Jon, Stefan, Nate, and myself, amongst the 40
> contributors in total. And we've been making steps to involve it more into
> the C* community (eg users ML), without being too presumptuous. On the
> technical side: Reaper supports (or can easily) all the concerns that the
> proposal here raises: distributed nodetool commands, centralising jmx
> interfacing, scheduling ops (repairs, snapshots, compactions, cleanups,
> etc), monitoring and diagnostics, etc etc. It's designed so that it can be
> a single instance, instance-per-datacenter, or side-car (per process). When
> there are multiple instances in a datacenter you get HA. You have a choice
> of different storage backends (memory, postgres, c*). You can ofc use a
> separate C* cluster as a backend so to separate infrastructure data from
> production data. And it's got a UI for C* Diagnostics already (which
> imposes a different jmx interface of polling for events rather than
> subscribing to jmx notifications which we know is problematic, thanks to
> Stefan). Anyway, that's my plug for Reaper :-)
>
> There's been little effort in evaluating these two bodies of work, one
> which is largely unknown to us, and my concern is how we would fairly
> support both going into the future?
>
> Another option would be that this side-car patch first exists as a github
> project for a period of time, on par to how Reaper has been. This will help
> evaluate its use and to first build up its contributors. This makes it
> easier for the C* PMC to choose which projects it would want to formally
> maintain, and to do so based on factors beyond merits of the technical. We
> may even see it converge (or collaborate more) with Reaper, a win for
> everyone.
>
> regards,
> Mick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Mick Semb Wever


On Fri, 17 Aug 2018, at 14:27, Sankalp Kohli wrote:
> I am bumping this thread because patch has landed for this with repair 
> functionality. 


We are looking to contribute Reaper to the Cassandra project. 

Looking at the patch it's very similar in its base design already, but Reaper 
does have a lot more to offer. We have all been working hard to move it to also 
being a side-car so it can be contributed. This raises a number of relevant 
questions to this thread: would we then accept both works in the Cassandra 
project, and what burden would it put on the current PMC to maintain both works.

I share Stefan's concern that consensus had not been met around a side-car, and 
that it was somehow default accepted before a patch landed. This seems at odds 
when we're already struggling to keep up with the incoming 
patches/contributions, and there could be other git repos in the project we 
will need to support in the future too. But I'm also curious about the whole 
"Community over Code" angle to this, how do we encourage multiple external 
works to collaborate together building value in both the technical and 
community.  
 
The Reaper project has worked hard in building both its user and contributor 
base. And I would have thought these, including having the contributor base 
overlap with the C* PMC, were prerequisites before moving a larger body of work 
into the project (separate git repo or not). I guess this isn't so much 
"Community over Code", but it illustrates a concern regarding abandoned code 
when there's no existing track record of maintaining it as OSS, as opposed to 
expecting an existing "show, don't tell" culture. Reaper for example has 
stronger indicators for ongoing support and an existing OSS user base: today C* 
committers having contributed to Reaper are Jon, Stefan, Nate, and myself, 
amongst the 40 contributors in total. And we've been making steps to involve it 
more into the C* community (eg users ML), without being too presumptuous. On 
the technical side: Reaper supports (or can easily) all the concerns that the 
proposal here raises: distributed nodetool commands, centralising jmx 
interfacing, scheduling ops (repairs, snapshots, compactions, cleanups, etc), 
monitoring and diagnostics, etc etc. It's designed so that it can be a single 
instance, instance-per-datacenter, or side-car (per process). When there are 
multiple instances in a datacenter you get HA. You have a choice of different 
storage backends (memory, postgres, c*). You can ofc use a separate C* cluster 
as a backend so to separate infrastructure data from production data. And it's 
got a UI for C* Diagnostics already (which imposes a different jmx interface 
of polling for events rather than subscribing to jmx notifications which we 
know is problematic, thanks to Stefan). Anyway, that's my plug for Reaper :-)

There's been little effort in evaluating these two bodies of work, one which is 
largely unknown to us, and my concern is how we would fairly support both going 
into the future? 

Another option would be that this side-car patch first exists as a github 
project for a period of time, on par to how Reaper has been. This will help 
evaluate its use and to first build up its contributors. This makes it easier 
for the C* PMC to choose which projects it would want to formally maintain, and 
to do so based on factors beyond merits of the technical. We may even see it 
converge (or collaborate more) with Reaper, a win for everyone.

regards,
Mick
