Re: Reaper as cassandra-admin

2018-08-27 Thread Dinesh Joshi
> On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
> We're hoping to get some feedback on our side if that's something people
> are interested in.  We've gone back and forth privately on our own
> preferences, hopes, dreams, etc, but I feel like a public discussion would
> be healthy at this point.  Does anyone share the view of using Reaper as a
> starting point?  What concerns do people have?


I have briefly looked at the Reaper codebase but have yet to analyze it closely 
enough to form a real, meaningful opinion. 

My main concern with starting from an existing codebase is that it comes with 
tech debt. This is not specific to Reaper but applies to any codebase that is 
imported as a whole. It means future developers and patches have to work within 
the confines of decisions that were already made. Practically speaking, once a 
codebase is established there is inertia against making architectural changes, 
and we're left dealing with the technical debt.

As it stands I am not against the idea of using Reaper's features, and I would 
very much like to use mature code that has been tested. I would, however, like 
to propose merging it piecemeal into the codebase. This will give the community 
a chance to review what is going in and possibly change some of the design 
decisions upfront. It will also avoid a situation where we have to make many 
breaking changes in the initial versions due to refactoring.

I would also like it if we could compare and contrast the functionality with 
Priam or any other interesting sidecars that folks may want to call out. In 
fact it would be great if we could bring in the best functionality from 
multiple implementations.

Dinesh



Re: Side Car New Repo vs not

2018-08-27 Thread Sankalp Kohli
Thanks everyone for the feedback. Looks like we will go with a separate repo, as 
that is what the majority of people prefer. 

Also note that we can always change this approach later as we build the 
sidecar. 

> On Aug 24, 2018, at 07:00, Eric Evans  wrote:
> 
>> On Thu, Aug 23, 2018 at 3:01 PM sankalp kohli  wrote:
>> 
>> Separate repo is in a majority so far. Please reply to this thread with
>> your responses.
> 
> I think it makes sense for the code, project, and workflows to be
> (de|loosely)-coupled, so the repo should be as well.
> 
> +1 for a separate repository
> 
>> On Tue, Aug 21, 2018 at 4:34 PM Rahul Singh 
>> wrote:
>> 
>>> +1 for separate repo. Especially on git. Maybe make it a submodule.
>>> 
>>> Rahul
>>> On Aug 21, 2018, 3:33 PM -0500, Stefan Podkowinski ,
>>> wrote:
 I'm also currently -1 on the in-tree option.
 
 In addition to what Aleksey mentioned, I also don't see how we could
 make this work with the current build and release process. Our scripts
 [0] for creating releases (tarballs and native packages) would need
 significant work to add support for an independent side-car. Our ant
 based build process is also not a great start for adding new tasks, let
 alone integrating other tool chains for web components for a potential
 UI.
 
 [0] https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git
 
 
> On 21.08.18 19:20, Aleksey Yeshchenko wrote:
> Sure, allow me to elaborate - at least a little bit. But before I do,
> just let me note that this wasn’t a veto -1, just a shorthand for “I don’t
> like this option”.
> 
> It would be nice to have sidecar and C* version and release cycles fully
> decoupled. I know it *can* be done when in-tree, but the way we vote on
> releases with tags off current branches would have to change somehow.
> Probably painfully. It would be nice to be able to easily enforce freezes,
> like the upcoming one, on the whole C* repo, while allowing feature
> development on the sidecar. It would be nice to not have sidecar commits in
> emails from commits@ mailing list. It would be nice to not have C* CI
> trigger necessarily on sidecar commits. Groups of people working on the two
> repos will mostly be different too, so what’s the point in sharing the repo?
> 
> Having an extra repo with its own set of branches is cheap and easy - we
> already do that with dtests. I like cleanly separated things when coupling
> is avoidable. As such I would prefer the sidecar to live in a separate new
> repo, while still being part of the C* project.
> 
> —
> AY
> 
> On 21 August 2018 at 17:06:39, sankalp kohli (kohlisank...@gmail.com) wrote:
> 
> Hi Aleksey,
> Can you please elaborate on the reasons for your -1? This
> way we can make progress towards any one approach.
> Thanks,
> Sankalp
> 
> On Tue, Aug 21, 2018 at 8:39 AM Aleksey Yeshchenko 
> wrote:
> 
>> FWIW I’m strongly -1 on the in-tree approach, and would much prefer a
>> separate repo, dtest-style.
>> 
>> —
>> AY
>> 
>> On 21 August 2018 at 16:36:02, Jeremiah D Jordan (jeremiah.jor...@gmail.com) wrote:
>> 
>> I think the following is a very big plus of it being in tree:
>>>  * Faster iteration speed in general. For example when we need to add a
>>> new JMX endpoint that the sidecar needs, or change something from JMX to
>>> a virtual table (e.g. for repair, or monitoring) we can do all changes
>>> including tests as one commit within the main repository and don't have
>>> to commit to main repo, sidecar repo,
>> 
>> I also don’t see a reason why the sidecar being in tree means it would
>> not work in a mixed version cluster. The nodes themselves must work in a
>> mixed version cluster during a rolling upgrade, and I would expect any
>> management side car to operate in the same manner, in tree or not.
>> 
>> This tool will be pretty tightly coupled with the server, and as someone
>> with experience developing such tightly coupled tools, it is *much*
>> easier to make sure you don’t accidentally break them if they are in
>> tree. How many times has someone updated some JMX interface, updated
>> nodetool, and then moved on? Breaking all the external tools not in
>> tree, without realizing it. The above point about being able to modify
>> interfaces and the side car in the same commit is huge in terms of
>> making sure someone doesn’t inadvertently break the side car while
>> fixing something else.
>> 
>> -Jeremiah
>> 
>> 
>>> On Aug 21, 2018, at 10:28 AM, Jonathan Haddad  wrote:
>>> 
>>> Strongly agree with Blake. In my mind supporting multiple versions is
>>> mandatory. As I've stated before, we already do it with Reaper, 

Re: Reaper as cassandra-admin

2018-08-27 Thread Rahul Singh
I’d be interested in contributing as well. I’ve been working on a skew review / 
diagnostics tool which feeds off of cfstats/tbstats data (from TXT output to 
CSV to conditionally formatted Excel) and am starting to store data in C* and 
wrap a React-based grid on it.

I have backlogged forking the Reaper core / UI (API / front end). It has a lot 
of potential — specifically if the API / Services / UI could be modularized and 
leverage IoC to add functionality via configuration, not code.

There are a lot of good conventions in both open source and commercial projects 
out there for web-based administration tools. The most successful ones do the 
basics related to their tool well and leave the rest to other systems.

The pitfall I don’t want the valuable talent in this group to fall into is 
reinventing the wheel on things that other tools do well, instead of focusing 
on what Admins / Architects / Developers need. E.g. if Prometheus and Grafana 
are good for stats, keep them - just make them easier to facilitate or compose 
in Docker.

Another example: I had ideas including a data browser / interactive query 
interface — but Redash and Zeppelin do a good job for the time being, and no 
matter how much time I spent on it I probably wouldn’t make a better one.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Aug 27, 2018, 9:22 PM -0400, Mick Semb Wever , wrote:
>
> > Is there a roadmap or release schedule, so we can get an idea of what
> > the Reaper devs have planned for it?
>
>
> Hi Murukesh,
> there's no roadmap per se, as it's open-source and it's the contributions as 
> they come that make it.
>
> What I know that's in progress or been discussed is:
> - more thorough upgrade tests,
> - support for diagnostic events (C* 4.0),
> - more task/operations: compactions, cleanups, sstableupgrades, etc etc,
> - more metrics (better visualisations, for example see the newly added 
> streaming),
> - making the scheduler repair-agnostic (so any task/operation can be 
> scheduled), and
> - making task/operations not based on jmx calls (preparing for non-jmx type 
> tasks).
>
> regards,
> Mick
>


Re: Reaper as cassandra-admin

2018-08-27 Thread Mick Semb Wever


> Can you get all of the contributors cleared?
> What’s the architecture? Is it centralized? Is there a sidecar?


Working on it Jeff. Contributors are close to cleared. Copyright is either 
Spotify's or Stefan's, both of whom have CLAs in place with the ASF.
Licenses of all npm dependencies are good. Still gotta audit the Java deps.

Architecture docs need to be fleshed out, especially to address the 
side-car/management ticket discussions/design.

Reaper is flexible in its design: you can run one or multiple instances. A lot 
of work has gone into moving it towards an eventually-consistent, at-least-once 
design. We feel the hard work towards the side-car model is trialled and 
battle-tested, but we do need to re-add the pinning of connections and repair 
segments to localhost.

There will be questions about some of Reaper's flexibility: for example, will 
we still want to support the memory and Postgres storage backends (neither of 
which supports distributed/side-car installations)?
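
For reference, the backend in question is selected via the storageType key in 
Reaper's YAML config. A minimal sketch, assuming the key names used in recent 
Reaper releases (values are illustrative):

    # cassandra-reaper.yaml (only storage-related keys shown)
    storageType: cassandra          # alternatives include: memory, postgres
    cassandra:
      clusterName: reaper
      contactPoints: ["127.0.0.1"]
      keyspace: reaper_db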


> As an aside, it’s frustrating that y’all would sit on this for months…

You're quite right Jeff, and I do owe an apology to those who worked on the 
ticket so far. I've been a bit caught off-guard by it all. I'm really hoping we 
can start to converge work and ideas from this point on.

regards,
Mick




Nodetool refresh v/s sstableloader

2018-08-27 Thread Rajath Subramanyam
Hi Cassandra users, Cassandra dev,

When recovering using SSTables from a snapshot, I want to know the key 
differences between using:
1. nodetool refresh, and
2. sstableloader

Does nodetool refresh have restrictions that need to be met?
Does nodetool refresh work even if there is a change in the topology
between the source cluster and the destination cluster? Does it work if the
token ranges don't match between the source cluster and the destination
cluster? Does it work when an old SSTable in the snapshot has a dropped
column that is not part of the current schema?
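
For context, the two invocations being compared look roughly like this (paths, 
hosts, and the table directory suffix are illustrative):

    # Option 1: place the snapshot files into the table's data directory on a
    # node, then load them in place without a restart. This only picks up
    # files already copied locally; it does not re-distribute data.
    cp /backups/snap1/ks1/tbl1/* /var/lib/cassandra/data/ks1/tbl1-<table-id>/
    nodetool refresh ks1 tbl1

    # Option 2: stream the snapshot through the cluster; rows are sent to the
    # nodes that own them, so it copes with topology/token-range changes.
    sstableloader -d 10.0.0.1,10.0.0.2 /backups/snap1/ks1/tbl1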

I appreciate any help in advance.

Thanks,
Rajath

Rajath Subramanyam


Re: Reaper as cassandra-admin

2018-08-27 Thread Mick Semb Wever


> Is there a roadmap or release schedule, so we can get an idea of what
> the Reaper devs have planned for it?


Hi Murukesh,
 there's no roadmap per se, as it's open-source and it's the contributions as 
they come that make it. 

What I know that's in progress or been discussed is:
 - more thorough upgrade tests,
 - support for diagnostic events (C* 4.0),
 - more task/operations: compactions, cleanups, sstableupgrades, etc etc,
 - more metrics (better visualisations, for example see the newly added 
streaming),
 - making the scheduler repair-agnostic (so any task/operation can be 
scheduled), and
 - making task/operations not based on jmx calls (preparing for non-jmx type 
tasks).

regards,
Mick




Re: Reaper as cassandra-admin

2018-08-27 Thread Jeff Jirsa
As an aside, it’s frustrating that y’all would sit on this for months (first 
e-mail was April); you folks have enough people who know the process to know 
that communicating early and often helps avoid duplicating (expensive) work. 

The best tech needs to go in, and we need to leave ourselves the ability to 
meet the goals of the original proposal (and then some). The Reaper UI is nice. 
I wish you’d talked to the other group of folks to combine efforts in April; 
we’d be much further ahead. 

-- 
Jeff Jirsa


> On Aug 27, 2018, at 6:02 PM, Jeff Jirsa  wrote:
> 
> Can you get all of the contributors cleared?
> What’s the architecture? Is it centralized? Is there a sidecar?
> 
> 
>> On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
>> 
>> Hey folks,
>> 
>> Mick brought this up in the sidecar thread, but I wanted to have a clear /
>> separate discussion about what we're thinking with regard to contributing
>> Reaper to the C* project.  In my mind, starting with Reaper is a great way
>> of having an admin right now, that we know works well at the kind of scale
>> we need.  We've worked with a lot of companies putting Reaper in prod (at
>> least 50), running on several hundred clusters.  The codebase has evolved
>> as a direct result of production usage, and we feel it would be great to
>> pair it with the 4.0 release.  There was a LOT of work done on the repair
>> logic to make things work across every supported version of Cassandra, with
>> a great deal of documentation as well.
>> 
>> In case folks aren't aware, in addition to one-off and scheduled repairs,
>> Reaper also does cluster-wide snapshots, exposes thread pool stats, and
>> visualizes streaming (in trunk).
>> 
>> We're hoping to get some feedback on our side if that's something people
>> are interested in.  We've gone back and forth privately on our own
>> preferences, hopes, dreams, etc, but I feel like a public discussion would
>> be healthy at this point.  Does anyone share the view of using Reaper as a
>> starting point?  What concerns do people have?
>> -- 
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade




Re: Reaper as cassandra-admin

2018-08-27 Thread Jonathan Haddad
I don't believe #1 should be an issue; Mick has been reaching out.

Alex and Mick are putting together some architecture documentation, so I won't
step on their toes.  Currently you can run Reaper as a single instance that
connects to your entire cluster, or as multiple instances in HA mode, and we're
finishing up the rework of the code to run it as a sidecar.

On Mon, Aug 27, 2018 at 6:02 PM Jeff Jirsa  wrote:

> Can you get all of the contributors cleared?
> What’s the architecture? Is it centralized? Is there a sidecar?
>
>
> > On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
> >
> > Hey folks,
> >
> > Mick brought this up in the sidecar thread, but I wanted to have a clear
> /
> > separate discussion about what we're thinking with regard to contributing
> > Reaper to the C* project.  In my mind, starting with Reaper is a great
> way
> > of having an admin right now, that we know works well at the kind of
> scale
> > we need.  We've worked with a lot of companies putting Reaper in prod (at
> > least 50), running on several hundred clusters.  The codebase has evolved
> > as a direct result of production usage, and we feel it would be great to
> > pair it with the 4.0 release.  There was a LOT of work done on the repair
> > logic to make things work across every supported version of Cassandra,
> with
> > a great deal of documentation as well.
> >
> > In case folks aren't aware, in addition to one-off and scheduled repairs,
> > Reaper also does cluster-wide snapshots, exposes thread pool stats, and
> > visualizes streaming (in trunk).
> >
> > We're hoping to get some feedback on our side if that's something people
> > are interested in.  We've gone back and forth privately on our own
> > preferences, hopes, dreams, etc, but I feel like a public discussion
> would
> > be healthy at this point.  Does anyone share the view of using Reaper as
> a
> > starting point?  What concerns do people have?
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Reaper as cassandra-admin

2018-08-27 Thread Murukesh Mohanan
Is there a roadmap or release schedule, so we can get an idea of what
the Reaper devs have planned for it?


Yours,
Murukesh Mohanan

On Tue, 28 Aug 2018 at 10:02, Jeff Jirsa  wrote:
>
> Can you get all of the contributors cleared?
> What’s the architecture? Is it centralized? Is there a sidecar?
>
>
> > On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
> >
> > Hey folks,
> >
> > Mick brought this up in the sidecar thread, but I wanted to have a clear /
> > separate discussion about what we're thinking with regard to contributing
> > Reaper to the C* project.  In my mind, starting with Reaper is a great way
> > of having an admin right now, that we know works well at the kind of scale
> > we need.  We've worked with a lot of companies putting Reaper in prod (at
> > least 50), running on several hundred clusters.  The codebase has evolved
> > as a direct result of production usage, and we feel it would be great to
> > pair it with the 4.0 release.  There was a LOT of work done on the repair
> > logic to make things work across every supported version of Cassandra, with
> > a great deal of documentation as well.
> >
> > In case folks aren't aware, in addition to one-off and scheduled repairs,
> > Reaper also does cluster-wide snapshots, exposes thread pool stats, and
> > visualizes streaming (in trunk).
> >
> > We're hoping to get some feedback on our side if that's something people
> > are interested in.  We've gone back and forth privately on our own
> > preferences, hopes, dreams, etc, but I feel like a public discussion would
> > be healthy at this point.  Does anyone share the view of using Reaper as a
> > starting point?  What concerns do people have?
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>




Re: Reaper as cassandra-admin

2018-08-27 Thread Jeff Jirsa
Can you get all of the contributors cleared?
What’s the architecture? Is it centralized? Is there a sidecar?


> On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
> 
> Hey folks,
> 
> Mick brought this up in the sidecar thread, but I wanted to have a clear /
> separate discussion about what we're thinking with regard to contributing
> Reaper to the C* project.  In my mind, starting with Reaper is a great way
> of having an admin right now, that we know works well at the kind of scale
> we need.  We've worked with a lot of companies putting Reaper in prod (at
> least 50), running on several hundred clusters.  The codebase has evolved
> as a direct result of production usage, and we feel it would be great to
> pair it with the 4.0 release.  There was a LOT of work done on the repair
> logic to make things work across every supported version of Cassandra, with
> a great deal of documentation as well.
> 
> In case folks aren't aware, in addition to one-off and scheduled repairs,
> Reaper also does cluster-wide snapshots, exposes thread pool stats, and
> visualizes streaming (in trunk).
> 
> We're hoping to get some feedback on our side if that's something people
> are interested in.  We've gone back and forth privately on our own
> preferences, hopes, dreams, etc, but I feel like a public discussion would
> be healthy at this point.  Does anyone share the view of using Reaper as a
> starting point?  What concerns do people have?
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade




Reaper as cassandra-admin

2018-08-27 Thread Jonathan Haddad
Hey folks,

Mick brought this up in the sidecar thread, but I wanted to have a clear /
separate discussion about what we're thinking with regard to contributing
Reaper to the C* project.  In my mind, starting with Reaper is a great way
of having an admin right now, that we know works well at the kind of scale
we need.  We've worked with a lot of companies putting Reaper in prod (at
least 50), running on several hundred clusters.  The codebase has evolved
as a direct result of production usage, and we feel it would be great to
pair it with the 4.0 release.  There was a LOT of work done on the repair
logic to make things work across every supported version of Cassandra, with
a great deal of documentation as well.

In case folks aren't aware, in addition to one-off and scheduled repairs,
Reaper also does cluster-wide snapshots, exposes thread pool stats, and
visualizes streaming (in trunk).

We're hoping to get some feedback on our side if that's something people
are interested in.  We've gone back and forth privately on our own
preferences, hopes, dreams, etc, but I feel like a public discussion would
be healthy at this point.  Does anyone share the view of using Reaper as a
starting point?  What concerns do people have?
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Transient Replication 4.0 status update

2018-08-27 Thread Ariel Weisberg
Hi all,

I wanted to give everyone an update on how development of Transient Replication 
is going and where we will be as of 9/1. Blake Eggleston, Alex Petrov, Benedict 
Elliott Smith, and I have been working to get TR implemented for 4.0. Up to now 
we have avoided merging anything related to TR to trunk because we weren't 100% 
sure we were going to make the 9/1 deadline, and even minimal TR functionality 
requires significant changes (see 14405).

We focused on getting a minimal set of deployable functionality working, and 
want to avoid overselling what's going to work in the first version. The 
feature is marked explicitly as experimental and has to be enabled via a 
feature flag in cassandra.yaml. The expected audience for TR in 4.0 is more 
experienced users who are ready to tackle deploying experimental functionality. 
As it is deployed by experienced users, and as we gain more confidence in it 
and remove caveats, the number of users it is appropriate for will expand.
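
For illustration, opting in would look roughly like this. This is a sketch 
assuming the enable_transient_replication flag in the 4.0 cassandra.yaml and 
the total/transient replication factor notation; the keyspace and DC names are 
illustrative:

    # cassandra.yaml -- transient replication is off by default
    enable_transient_replication: true

    -- CQL: 5 replicas total, 2 of which are transient
    CREATE KEYSPACE ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '5/2'};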

For 4.0 it looks like we will be able to merge TR with support for normal reads 
and writes, but without monotonic reads. Monotonic reads require blocking read 
repair, and blocking read repair with TR requires further changes that aren't 
feasible by 9/1.

Future TR support would look something like:

4.0.next:
* vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)

4.next:
* Monotonic reads (https://issues.apache.org/jira/browse/CASSANDRA-14665)
* LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
* Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
* Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)

Possibly never:
* Materialized views
 
Probably never:
* Secondary indexes

The most difficult changes to support Transient Replication should be behind 
us. LWT, batch log, and counters shouldn't be that hard to make transient 
replication aware. Monotonic reads require some changes to the read path, but 
are at least conceptually not that hard to support. I am confident that by 
4.next TR will have fewer tradeoffs.

If you want to take a peek, the current feature branch is 
https://github.com/aweisberg/cassandra/tree/14409-7, although we will be moving 
to 14409-8 to rebase onto trunk.

Regards,
Ariel
