RE: Cassandra Needs to Grow Up by Version Five!

2018-02-23 Thread Kenneth Brotman
A sincere thank you to everyone who replied.  I will do the heavy lifting on the docs 
for a while, do my Slender Cassandra reference project, and then I’ll try to find 
one or two areas where I can contribute code to get going on that.  

I'll have a few JIRA's started by the end of the workday.

Kenneth Brotman





Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 9:50 AM, Eric Plowe  wrote:

> Cassandra, hard to use? I disagree completely. With that said, there are
> definitely deficiencies in certain parts of the documentation, but nothing
> that is a show stopper.


True, there are no show-stoppers from the docs side, it's just all those
little things--they add up.

> We’ve been using Cassandra since the sub 1.0 days and have had nothing but
> great things to say about it.
>
> With that said, it’s an open source project; you get from it what you’re
> willing to put in. If you just expect something that installs, asks a
> couple of questions and you’re off to the races, Cassandra might not be for
> you.
>
> If you’re willing to put in the time to understand how Cassandra works,
> and how it fits into your use case, and if it is the right fit for your use
> case, you’ll be more than happy, I bet.
>

We have been using Cassandra since v2.1, for more than 2 years now, and installing
was never a problem.  It does work and allows us to sleep well, which
should not be underappreciated.

The problems begin when you need to do operations.  You never know what
exactly will happen when you start a certain repair command or how the
streaming will happen in case of bootstrap/rebuild, and the docs just
aren't detailed enough, so you have to go the trial-and-error path most of the
time.

Regards,
--
Alex


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Sylvain Lebresne
>
> I have to disagree with people here and point out that just creating
> JIRA's and (trying to) have discussions about these issues will not lead to
> change in any reasonable timeframe, because everyone who could do the work
> has an endless list of bigger fish to fry. I strongly encourage you to get
> involved and write some code, or pay someone to do it, because to put it
> bluntly, it's *very* unlikely your JIRA's will get actioned unless you
> contribute significantly to them yourself.
>

Though I don't truly disagree with the overall point that getting into code
is the surest way to see something you care about make progress, I'd love
for this to not be understood as "we don't care about your idea unless you
bring code". There have been tons of JIRA tickets in the past suggesting
improvements where some contributor said "you know what, that's a good
idea" and implemented it. I've certainly seen it happen numerous times and
trust I did it a lot as well (and sure, it happens disproportionately more
for small improvements than for lets-rewrite-the-whole-database ones, for
obvious reasons hopefully).

So if you have a relatively concrete idea for an improvement, I'd say,
please, share it. Don't get me wrong though, please do your homework first
and take a few minutes googling/JIRA searching to see if that hasn't been
discussed already; don't assume your time is more valuable than that of other
contributors. It's rude to assume so (I'd say in general, but even more so
because it's free-as-in-beer software).

That said, and to paraphrase what others have said, one should always come
to this with a few understandings:
- For all that people may like your idea and have the time to help it get
in, there is no guarantee here. And yes, more often than not, contributors
already have a list of things they want to fix and only a finite amount of
time for contributions, so the bar for your idea to make it onto some other
contributor's "list" is probably high. And remember that behavioral science
strongly suggests that you thinking your ideas are obviously the most
important ones likely involves a fair amount of bias. That's why
contributing the code yourself, if possible, definitely helps a lot.
- A distributed database is not exactly a simple piece of software. In particular,
Cassandra makes the choice to be fully distributed, which is a clear
trade-off: it gives it very interesting properties (scalability, fault
tolerance, ...) almost for free, but it makes some things quite a bit more
challenging. My point being, some things may look like easy problems to
solve on the surface, but are in fact more complex than they appear (which
in turn means solving them takes much more time than it seems, and we get
back to contribution time/effort not being infinite). So it's imo a good idea
to seek first to understand why things are a certain way rather than assume
that contributors don't care.
- Cassandra is not perfect, no software is, but don't assume contributors
are not aware of the weaknesses. We are for the most part. So if those
weaknesses are still there, it's generally (there are of course exceptions)
due to some combination of 1) a lack of time, 2) the difficulties of
solving those weaknesses (without creating new, worse ones) and 3) some
actually well-thought-out trade-off (we accept that weakness as the price for
other strengths). As such, if you come simply pointing out deficiencies, you
may feel like you are pointing out things nobody knows, but chances are, you
aren't. You're probably just reminding contributors how frustrating it is
they don't have time to solve everything. Pointing out deficiencies is ok, but
unless you take the time to offer some constructive steps to improve as
well, it's often useless to be honest.

--
Sylvain


RE: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Jacques-Henri Berthemet
Hi Kenneth,

As a Cassandra user I value usability, but since it's a database I value 
consistency and performance even more. If you want usability and documentation 
you can use DataStax DSE; after all, that's where they add value on top of 
Cassandra. Since DataStax actually paid devs to work on Cassandra internals, it's 
understandable that they kept some parts (usability) for their own product. We 
all notice that when you google for some CQL commands you always end up on the 
DataStax site; it would be great if that was not the case, but changing it would 
take a lot of time.

Also, as a manager you're not supposed to fight with devs but to allocate 
tasks/time. If you have to choose between enhancing documentation and fixing 
this bad race condition that corrupts data, I hope you'd choose the latter.

As for filing Jiras, if you create one like "I want a UI to set up TLS" it 
would be the kind of Jira nobody would implement: it takes a lot of time, 
touches security and may not be that useful in the end.

Last point on usability for Cassandra: as an end user it's very difficult to 
see the progress on it, but since I'm using Cassandra internals for my custom 
secondary index I can tell you that there was a huge rework between Cassandra 
2.2 and 3.x. PartitionIterators are a very elegant solution and are really 
helpful in my case, great work guys :)
--
Jacques-Henri Berthemet

-----Original Message-----
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 21, 2018 11:54 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Hi Akash,

I get the part about outside work which is why in replying to Jeff Jirsa I was 
suggesting the big companies could justify taking it on easy enough and you 
know actually pay the people who would be working at it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone and quick.  How do you approach programming if you aren't trying to 
make things easy?

Kenneth Brotman

-----Original Message-----
From: Akash Gangil [mailto:akashg1...@gmail.com]
Sent: Wednesday, February 21, 2018 2:24 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

I would second Jon in the arguments he made. Contributing outside work is 
draining and really requires a lot of commitment. If someone requires features 
around usability etc, just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < 
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning and I'm right Jon.  
> Please be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRA's, then seem to suggest that I'd be 
> lucky if anyone looked at it and did anything. That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in 
> the next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying or at least so you understand I think what I'm doing is 
> saving your gig and mine.  I really like a lot of people in this group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will 
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and 
> like you I am driven by real important projects that I feel compelled 
> to work on for the good of others.  I don't have time for people to 
> hand hold a database and I can't get stuck with my projects on the wrong 
> stuff.
>
> Kenneth Brotman
>
>
> -----Original Message-----
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to 
> explain.  There’s a bunch of us who either get paid by someone or 
> volunteer on our free time.  The folks that get paid, (yay!) usually 
> take direction on what the priorities are, and work on projects that 
> directly affect our jobs.  That means that someone needs to care 
> enough about the features you want to work on them, if you’re not going to do 
> it yourself.
>
> Now as others have said already, please put your list of demands in 
> JIRA, if someone i

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Eric Plowe
Cassandra, hard to use? I disagree completely. With that said, there are
definitely deficiencies in certain parts of the documentation, but nothing
that is a show stopper. We’ve been using Cassandra since the sub 1.0 days
and have had nothing but great things to say about it.

With that said, it’s an open source project; you get from it what you’re
willing to put in. If you just expect something that installs, asks a
couple of questions and you’re off to the races, Cassandra might not be for
you.

If you’re willing to put in the time to understand how Cassandra works, and
how it fits into your use case, and if it is the right fit for your use
case, you’ll be more than happy, I bet.

If there are things that are lacking, that you can’t find a workaround
for, submit a PR! That’s the beauty of open source projects.

On Thu, Feb 22, 2018 at 2:55 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Feb 21, 2018 at 7:54 PM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>>
>>
>> However, I think the shots at Cassandra are generally unfair. When I
>> started working with it, the DataStax documentation was some of the best
>> documentation I had seen on any project, especially an open source one.
>>
>
> Oh, don't get me started on documentation, especially the DataStax one.  I
> come from Postgres.  In comparison, Cassandra documentation is mostly
> non-existent (and this is just a way to avoid listing other uncomfortable
> epithets).
>
> Not sure if I would be able to submit patches to improve that, however,
> since most of the time it would require me to already know the answer to my
> questions when the doc is incomplete.
>
> The move from DataStax to Apache.org for docs is actually good, IMO, since
> the docs were maintained very poorly and there was no real leverage to
> influence that.
>
> Cheers,
> --
> Alex
>
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Oleksandr Shulgin
On Wed, Feb 21, 2018 at 7:54 PM, Durity, Sean R  wrote:

>
>
> However, I think the shots at Cassandra are generally unfair. When I
> started working with it, the DataStax documentation was some of the best
> documentation I had seen on any project, especially an open source one.
>

Oh, don't get me started on documentation, especially the DataStax one.  I
come from Postgres.  In comparison, Cassandra documentation is mostly
non-existent (and this is just a way to avoid listing other uncomfortable
epithets).

Not sure if I would be able to submit patches to improve that, however,
since most of the time it would require me to already know the answer to my
questions when the doc is incomplete.

The move from DataStax to Apache.org for docs is actually good, IMO, since
the docs were maintained very poorly and there was no real leverage to
influence that.

Cheers,
--
Alex


RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Jeff,

I already addressed everything you said.  Boy! Would I like to bring up the 
out-of-date articles on the web that trip people up and the lousy documentation on 
the Apache website, but I can’t because a lot of folks don’t know me or why I’m 
saying these things.

I will be making another post that I hope clarifies what’s going on with me.  
After that I will either be a freakishly valuable asset to this community or I 
will be a freakishly valuable asset to another community.

You sure have a funny way of reining in people that are used to helping out.  
You sure misjudged me.  Wow.

Kenneth Brotman

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Wednesday, February 21, 2018 3:12 PM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> 
wrote:

Hi Akash,

I get the part about outside work which is why in replying to Jeff Jirsa I was 
suggesting the big companies could justify taking it on easy enough and you 
know actually pay the people who would be working at it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone and quick.  How do you approach programming if you aren't trying to 
make things easy?

There's no aversion to usability; you're assuming things that just aren't true. 
Nobody's against usability, we've just prioritized other things HIGHER. We make 
those decisions in part by looking at open JIRAs and determining what's asked 
for the most, what members of the community have contributed, and then balance 
that against what we ourselves care about. You're making a statement that it 
should be the top priority for the next release, with no JIRA, no history of 
contributing (and indeed, no real clear sign that you even understand the full 
extent of the database), no sign that you're willing to do the work yourself, 
and making a ton of assumptions about the level of effort and ROI.

I would love for Cassandra to be easier to use, I'm sure everyone does. There's 
a dozen features I'd love to add if I had infinite budget and infinite 
manpower. But what you're asking for is A LOT of effort and / or A LOT of 
money, and you're assuming someone's going to step up and foot the bill, but 
there's no real reason to believe that's the case.

In the meantime, everyone's spending hours replying to this thread that is 0% 
actionable. We would all have been objectively better off had everyone ignored 
this thread and just spent 10 minutes writing some section of the docs. So the 
next time I get the urge to reply, I'm just going to do that instead.



Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread kurt greaves
>
> Instead of saying "Make X better" you can quantify "Here's how we can make
> X better" in a jira and the conversation will continue with interested
> parties (opening jiras is free!). Being combative and insulting the project on
> the mailing list may help vent some frustrations but it is counterproductive
> and makes people defensive.

Yep. In the Cassandra project you'll have a very hard time convincing
someone else (under someone else's pay) to work on what you want even if you
approach it in the right way. Being assertive/aggressive is sure to remove
all chances entirely.
OSS for such large projects as Cassandra only works if we have a variety of
perspectives all working on the project together, as it's not very feasible
for volunteers to get into the C* project on their own time (nor will it
ever be). At the moment we don't have enough different perspectives working
on the project and the only way to improve that is to get involved (preferably
writing some code).

I have to disagree with people here and point out that just creating JIRA's
and (trying to) have discussions about these issues will not lead to change
in any reasonable timeframe, because everyone who could do the work has an
endless list of bigger fish to fry. I strongly encourage you to get
involved and write some code, or pay someone to do it, because to put it
bluntly, it's *very* unlikely your JIRA's will get actioned unless you
contribute significantly to them yourself.

Of course there are other ways to contribute as well, but by far the
most effective would be to contribute fixes; the next most effective would
be to contribute documentation and help users on the mailing list. Your
Slender Cassandra project is a great example of this, because despite C*
being hard to administer, it would give a lot of users examples to work
off. If people can get it working properly with the right advice, usability
is not such a big issue.


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Chris Lohfink
Instead of saying "Make X better" you can quantify "Here's how we can make X 
better" in a jira and the conversation will continue with interested parties 
(opening jiras is free!). Being combative and insulting the project on the mailing 
list may help vent some frustrations but it is counterproductive and makes 
people defensive.

People are not averse to usability, quite the opposite actually. People do tend 
to be averse to conversations opened up with "cassandra is an idiot" with no 
clear definition of how to make it better or what a better solution would look 
like though. Note however that saying "make backups better" or "look at 
marketing literature for these guys" is hard for an engineer or architect to 
break down into actionable items. Coming up with cool ideas on how to do something 
will more likely hook a developer into working on it than trying to shame the 
community with a sales pitch from another DB's sales guy.

Chris

> On Feb 21, 2018, at 4:53 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
> wrote:
> 
> Hi Akash,
> 
> I get the part about outside work which is why in replying to Jeff Jirsa I 
> was suggesting the big companies could justify taking it on easy enough and 
> you know actually pay the people who would be working at it so those people 
> could have a life.
> 
> The part I don't get is the aversion to usability.  Isn't that what you think 
> about when you are coding?  "Am I making this thing I'm building easy to 
> use?"  If you were programming for me, we would be constantly talking about 
> what we are building and how we can make things easier for users.  If I had 
> to fight with a developer, architect or engineer about usability all the 
> time, they would be gone and quick.  How do you approach programming if you 
> aren't trying to make things easy?
> 
> Kenneth Brotman
> 
> -----Original Message-----
> From: Akash Gangil [mailto:akashg1...@gmail.com] 
> Sent: Wednesday, February 21, 2018 2:24 PM
> To: d...@cassandra.apache.org
> Cc: user@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> I would second Jon in the arguments he made. Contributing outside work is 
> draining and really requires a lot of commitment. If someone requires 
> features around usability etc, just pay for it, period.
> 
> On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < 
> kenbrot...@yahoo.com.invalid> wrote:
> 
>> Jon,
>> 
>> Very sorry that you don't see the value of the time I'm taking for this.
>> I don't have demands; I do have a stern warning and I'm right Jon.  
>> Please be very careful not to mischaracterize my words, Jon.
>> 
>> You suggest I put things in JIRA's, then seem to suggest that I'd be 
>> lucky if anyone looked at it and did anything. That's what I figured too.
>> 
>> I don't appreciate the hostility.  You will understand more fully in 
>> the next post where I'm coming from.  Try to keep the conversation civilized.
>> I'm trying or at least so you understand I think what I'm doing is 
>> saving your gig and mine.  I really like a lot of people in this group.
>> 
>> I've come to a preliminary assessment on things.  Soon the cloud will 
>> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and 
>> like you I am driven by real important projects that I feel compelled 
>> to work on for the good of others.  I don't have time for people to 
>> hand hold a database and I can't get stuck with my projects on the wrong 
>> stuff.
>> 
>> Kenneth Brotman
>> 
>> 
>> -----Original Message-----
>> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
>> Haddad
>> Sent: Wednesday, February 21, 2018 12:44 PM
>> To: user@cassandra.apache.org
>> Cc: d...@cassandra.apache.org
>> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>> 
>> Ken,
>> 
>> Maybe it’s not clear how open source projects work, so let me try to 
>> explain.  There’s a bunch of us who either get paid by someone or 
>> volunteer on our free time.  The folks that get paid, (yay!) usually 
>> take direction on what the priorities are, and work on projects that 
>> directly affect our jobs.  That means that someone needs to care 
>> enough about the features you want to work on them, if you’re not going to 
>> do it yourself.
>> 
>> Now as others have said already, please put your list of demands in 
>> JIRA, if someone is interested, they will work on it.  You may need to 
>> contribute a little more than you’ve done already, be prepared to get 
>> involved if you actually want to to see something get done.  Perhap

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jason Brown
Hi all,

I'd like to deescalate a bit here.

Since this is an Apache and an OSS project, contributions come in many
forms: code, speaking/advocacy, documentation, support, project management,
and so on. None of these things come for free.

Ken, I appreciate you bringing up these usability topics; they are certainly
valid concerns. You've mentioned you are working on a posting of some sort
that I think will amount to an enumerated list of the topics/issues you
feel need addressing. Some may be simple changes, some may be more
invasive, some we can consider implementing, some not. I look forward to a
positive discussion.

I think what would be best would be for you to complete that list and work
with the community, in a *positive and constructive manner*, towards
getting it done. That is certainly contributing, and contributing in a big
way: project management. Working with the community is going to be the most
beneficial path for everyone.

Ken, if you feel like you'd like some help getting such an initiative
going, and contributing substantively to it (not necessarily in terms of
code) please feel free to reach out to me directly (jasedbr...@gmail.com).

Hoping this leads somewhere positive, that benefits everyone,

-Jason



On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Hi Akash,
>
> I get the part about outside work which is why in replying to Jeff Jirsa I
> was suggesting the big companies could justify taking it on easy enough and
> you know actually pay the people who would be working at it so those people
> could have a life.
>
> The part I don't get is the aversion to usability.  Isn't that what you
> think about when you are coding?  "Am I making this thing I'm building easy
> to use?"  If you were programming for me, we would be constantly talking
> about what we are building and how we can make things easier for users.  If
> I had to fight with a developer, architect or engineer about usability all
> the time, they would be gone and quick.  How do you approach programming if you
> aren't trying to make things easy?
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Akash Gangil [mailto:akashg1...@gmail.com]
> Sent: Wednesday, February 21, 2018 2:24 PM
> To: d...@cassandra.apache.org
> Cc: user@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> I would second Jon in the arguments he made. Contributing outside work is
> draining and really requires a lot of commitment. If someone requires
> features around usability etc, just pay for it, period.
>
> On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> > Jon,
> >
> > Very sorry that you don't see the value of the time I'm taking for this.
> > I don't have demands; I do have a stern warning and I'm right Jon.
> > Please be very careful not to mischaracterize my words, Jon.
> >
> > You suggest I put things in JIRA's, then seem to suggest that I'd be
> > lucky if anyone looked at it and did anything. That's what I figured too.
> >
> > I don't appreciate the hostility.  You will understand more fully in
> > the next post where I'm coming from.  Try to keep the conversation
> civilized.
> > I'm trying or at least so you understand I think what I'm doing is
> > saving your gig and mine.  I really like a lot of people in this group.
> >
> > I've come to a preliminary assessment on things.  Soon the cloud will
> > clear or I'll be gone.  Don't worry.  I'm a very peaceful person and
> > like you I am driven by real important projects that I feel compelled
> > to work on for the good of others.  I don't have time for people to
> > hand hold a database and I can't get stuck with my projects on the wrong
> stuff.
> >
> > Kenneth Brotman
> >
> >
> > -----Original Message-----
> > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon
> > Haddad
> > Sent: Wednesday, February 21, 2018 12:44 PM
> > To: user@cassandra.apache.org
> > Cc: d...@cassandra.apache.org
> > Subject: Re: Cassandra Needs to Grow Up by Version Five!
> >
> > Ken,
> >
> > Maybe it’s not clear how open source projects work, so let me try to
> > explain.  There’s a bunch of us who either get paid by someone or
> > volunteer on our free time.  The folks that get paid, (yay!) usually
> > take direction on what the priorities are, and work on projects that
> > directly affect our jobs.  That means that someone needs to care
> > enough about the features you want to work on them, if you’re not going
> to do it yourself.
> >
> > Now as others have said already, please put your list of demands in

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jeff Jirsa
On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Hi Akash,
>
> I get the part about outside work which is why in replying to Jeff Jirsa I
> was suggesting the big companies could justify taking it on easy enough and
> you know actually pay the people who would be working at it so those people
> could have a life.
>
> The part I don't get is the aversion to usability.  Isn't that what you
> think about when you are coding?  "Am I making this thing I'm building easy
> to use?"  If you were programming for me, we would be constantly talking
> about what we are building and how we can make things easier for users.  If
> I had to fight with a developer, architect or engineer about usability all
> the time, they would be gone and quick.  How do you approach programming if you
> aren't trying to make things easy?
>


There's no aversion to usability; you're assuming things that just aren't
true. Nobody's against usability, we've just prioritized other things
HIGHER. We make those decisions in part by looking at open JIRAs and
determining what's asked for the most, what members of the community have
contributed, and then balance that against what we ourselves care about.
You're making a statement that it should be the top priority for the next
release, with no JIRA, no history of contributing (and indeed, no real
clear sign that you even understand the full extent of the database), no
sign that you're willing to do the work yourself, and making a ton of
assumptions about the level of effort and ROI.

I would love for Cassandra to be easier to use, I'm sure everyone does.
There's a dozen features I'd love to add if I had infinite budget and
infinite manpower. But what you're asking for is A LOT of effort and / or A
LOT of money, and you're assuming someone's going to step up and foot the
bill, but there's no real reason to believe that's the case.

In the meantime, everyone's spending hours replying to this thread that is
0% actionable. We would all have been objectively better off had everyone
ignored this thread and just spent 10 minutes writing some section of the
docs. So the next time I get the urge to reply, I'm just going to do that
instead.


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Brandon Williams
The only progress from this point is what Jon said: enumerate and detail
your issues in jira tickets.

On Wed, Feb 21, 2018 at 4:53 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Hi Akash,
>
> I get the part about outside work which is why in replying to Jeff Jirsa I
> was suggesting the big companies could justify taking it on easy enough and
> you know actually pay the people who would be working at it so those people
> could have a life.
>
> The part I don't get is the aversion to usability.  Isn't that what you
> think about when you are coding?  "Am I making this thing I'm building easy
> to use?"  If you were programming for me, we would be constantly talking
> about what we are building and how we can make things easier for users.  If
> I had to fight with a developer, architect or engineer about usability all
> the time, they would be gone and quick.  How do you approach programming if you
> aren't trying to make things easy?
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Akash Gangil [mailto:akashg1...@gmail.com]
> Sent: Wednesday, February 21, 2018 2:24 PM
> To: d...@cassandra.apache.org
> Cc: user@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> I would second Jon in the arguments he made. Contributing outside work is
> draining and really requires a lot of commitment. If someone requires
> features around usability etc, just pay for it, period.
>
> On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> > Jon,
> >
> > Very sorry that you don't see the value of the time I'm taking for this.
> > I don't have demands; I do have a stern warning and I'm right Jon.
> > Please be very careful not to mischaracterize my words, Jon.
> >
> > You suggest I put things in JIRA's, then seem to suggest that I'd be
> > lucky if anyone looked at it and did anything. That's what I figured too.
> >
> > I don't appreciate the hostility.  You will understand more fully in
> > the next post where I'm coming from.  Try to keep the conversation
> > civilized.
> > I'm trying or at least so you understand I think what I'm doing is
> > saving your gig and mine.  I really like a lot of people in this group.
> >
> > I've come to a preliminary assessment on things.  Soon the cloud will
> > clear or I'll be gone.  Don't worry.  I'm a very peaceful person and
> > like you I am driven by real important projects that I feel compelled
> > to work on for the good of others.  I don't have time for people to
> > hand hold a database and I can't get stuck with my projects on the wrong
> > stuff.
> >
> > Kenneth Brotman
> >
> >
> > -----Original Message-
> > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon
> > Haddad
> > Sent: Wednesday, February 21, 2018 12:44 PM
> > To: user@cassandra.apache.org
> > Cc: d...@cassandra.apache.org
> > Subject: Re: Cassandra Needs to Grow Up by Version Five!
> >
> > Ken,
> >
> > Maybe it’s not clear how open source projects work, so let me try to
> > explain.  There’s a bunch of us who either get paid by someone or
> > volunteer in our free time.  The folks that get paid (yay!) usually
> > take direction on what the priorities are, and work on projects that
> > directly affect our jobs.  That means that someone needs to care
> > enough about the features you want to work on them, if you’re not going
> > to do it yourself.
> >
> > Now as others have said already, please put your list of demands in
> > JIRA, if someone is interested, they will work on it.  You may need to
> > contribute a little more than you’ve done already, be prepared to get
> > involved if you actually want to see something get done.  Perhaps
> > learning a little more about Cassandra’s internals and the people
> > involved will reveal some of the design decisions and priorities of the
> > project.
> >
> > Third, you seem to be a little obsessed with market share.  While
> > market share is fun to talk about, *most* of us that are working on
> > and contributing to Cassandra do so because it does actually solve a
> > problem we have, and solves it reasonably well.  If some magic open
> > source DB appears out of nowhere and does everything you want
> > Cassandra to, and is bug free, keeps your data consistent,
> > automatically does backups, comes with really nice cert management, ad
> > hoc querying, amazing materialized views that are perfect, no caveats
> > to secondary indexes, and somehow still gives you linear scalability
> > without any

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Hi Akash,

I get the part about outside work, which is why in replying to Jeff Jirsa I was
suggesting the big companies could justify taking it on easily enough and, you
know, actually pay the people who would be working on it so those people could
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone and quick.  How do you approach programming if you aren't trying
to make things easy?

Kenneth Brotman

-Original Message-
From: Akash Gangil [mailto:akashg1...@gmail.com] 
Sent: Wednesday, February 21, 2018 2:24 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

I would second Jon in the arguments he made. Contributing outside work is 
draining and really requires a lot of commitment. If someone requires features 
around usability etc, just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < 
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning and I'm right Jon.  
> Please be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRA's, then seem to suggest that I'd be 
> lucky if anyone looked at it and did anything. That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in 
> the next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying or at least so you understand I think what I'm doing is 
> saving your gig and mine.  I really like a lot of people in this group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will 
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and 
> like you I am driven by real important projects that I feel compelled 
> to work on for the good of others.  I don't have time for people to 
> hand hold a database and I can't get stuck with my projects on the wrong 
> stuff.
>
> Kenneth Brotman
>
>
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to 
> explain.  There’s a bunch of us who either get paid by someone or 
> volunteer in our free time.  The folks that get paid (yay!) usually
> take direction on what the priorities are, and work on projects that 
> directly affect our jobs.  That means that someone needs to care 
> enough about the features you want to work on them, if you’re not going to do 
> it yourself.
>
> Now as others have said already, please put your list of demands in 
> JIRA, if someone is interested, they will work on it.  You may need to 
> contribute a little more than you’ve done already, be prepared to get 
> involved if you actually want to see something get done.  Perhaps
> learning a little more about Cassandra’s internals and the people 
> involved will reveal some of the design decisions and priorities of the 
> project.
>
> Third, you seem to be a little obsessed with market share.  While 
> market share is fun to talk about, *most* of us that are working on 
> and contributing to Cassandra do so because it does actually solve a 
> problem we have, and solves it reasonably well.  If some magic open 
> source DB appears out of nowhere and does everything you want
> Cassandra to, and is bug free, keeps your data consistent, 
> automatically does backups, comes with really nice cert management, ad 
> hoc querying, amazing materialized views that are perfect, no caveats 
> to secondary indexes, and somehow still gives you linear scalability 
> without any mental overhead whatsoever then sure, people might start 
> using it.  And that’s actually OK, because if that happens we’ll all 
> be incredibly pumped out of our minds because we won’t have to work as 
> hard.  If on the slim chance that doesn’t manifest, those of us that 
> use Cassandra and are part of the community will keep working on the 
> things we care about, iterating, and improving things.  Maybe someone will 
> even take a look at your JIRA issues.
>
> Further filling the mailing list with your grievances will likely not 
> help you progress tow

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Akash Gangil
I would second Jon in the arguments he made. Contributing outside work is
draining and really requires a lot of commitment. If someone requires
features around usability etc, just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning and I'm right Jon.  Please
> be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRA's, then seem to suggest that I'd be lucky
> if anyone looked at it and did anything. That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in the
> next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying or at least so you understand I think what I'm doing is saving
> your gig and mine.  I really like a lot of people in this group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and like
> you I am driven by real important projects that I feel compelled to work on
> for the good of others.  I don't have time for people to hand hold a
> database and I can't get stuck with my projects on the wrong stuff.
>
> Kenneth Brotman
>
>
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to
> explain.  There’s a bunch of us who either get paid by someone or volunteer
> in our free time.  The folks that get paid (yay!) usually take direction
> on what the priorities are, and work on projects that directly affect our
> jobs.  That means that someone needs to care enough about the features you
> want to work on them, if you’re not going to do it yourself.
>
> Now as others have said already, please put your list of demands in JIRA,
> if someone is interested, they will work on it.  You may need to contribute
> a little more than you’ve done already, be prepared to get involved if you
> actually want to see something get done.  Perhaps learning a little more
> about Cassandra’s internals and the people involved will reveal some of the
> design decisions and priorities of the project.
>
> Third, you seem to be a little obsessed with market share.  While market
> share is fun to talk about, *most* of us that are working on and
> contributing to Cassandra do so because it does actually solve a problem we
> have, and solves it reasonably well.  If some magic open source DB appears
> out of nowhere and does everything you want Cassandra to, and is bug free,
> keeps your data consistent, automatically does backups, comes with really
> nice cert management, ad hoc querying, amazing materialized views that are
> perfect, no caveats to secondary indexes, and somehow still gives you
> linear scalability without any mental overhead whatsoever then sure, people
> might start using it.  And that’s actually OK, because if that happens
> we’ll all be incredibly pumped out of our minds because we won’t have to
> work as hard.  If on the slim chance that doesn’t manifest, those of us
> that use Cassandra and are part of the community will keep working on the
> things we care about, iterating, and improving things.  Maybe someone will
> even take a look at your JIRA issues.
>
> Further filling the mailing list with your grievances will likely not help
> you progress towards your goal of a Cassandra that’s easier to use, so I
> encourage you to try to be a little more productive and try to help rather
> than just complain, which is not constructive.  I did a quick search for
> your name on the mailing list, and I’ve seen very little from you, so to
> everyone who’s been around for a while and trying to help you, it looks
> like you’re just some random dude asking for people to work for free on the
> things you’re asking for, without offering anything back in return.
>
> Jon
>
>
> > On Feb 21, 2018, at 11:56 AM, Kenneth Brotman
> > <kenbrot...@yahoo.com.INVALID> wrote:
> >
> > Josh,
> >
> > To say nothing is indifference.  If you care about your community,
> > sometimes don't you have to bring up a subject even though you know it's
> > also temporarily adding some discomfort?
> >
> > As to opening a JIRA, I've got a very specific topic to try in mind
> > now.  An easy one I'll work on and then announce.  Someone else will have
> > to do the coding.  A

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Jon,

Very sorry that you don't see the value of the time I'm taking for this.  I 
don't have demands; I do have a stern warning and I'm right Jon.  Please be 
very careful not to mischaracterize my words, Jon.

You suggest I put things in JIRA's, then seem to suggest that I'd be lucky if 
anyone looked at it and did anything. That's what I figured too.  

I don't appreciate the hostility.  You will understand more fully in the next 
post where I'm coming from.  Try to keep the conversation civilized.  I'm 
trying or at least so you understand I think what I'm doing is saving your gig 
and mine.  I really like a lot of people in this group.

I've come to a preliminary assessment on things.  Soon the cloud will clear or 
I'll be gone.  Don't worry.  I'm a very peaceful person and like you I am 
driven by real important projects that I feel compelled to work on for the good 
of others.  I don't have time for people to hand hold a database and I can't 
get stuck with my projects on the wrong stuff.  

Kenneth Brotman


-Original Message-
From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Wednesday, February 21, 2018 12:44 PM
To: user@cassandra.apache.org
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Ken,

Maybe it’s not clear how open source projects work, so let me try to explain.  
There’s a bunch of us who either get paid by someone or volunteer in our free
time.  The folks that get paid, (yay!) usually take direction on what the 
priorities are, and work on projects that directly affect our jobs.  That means 
that someone needs to care enough about the features you want to work on them, 
if you’re not going to do it yourself. 

Now as others have said already, please put your list of demands in JIRA, if 
someone is interested, they will work on it.  You may need to contribute a 
little more than you’ve done already, be prepared to get involved if you 
actually want to see something get done.  Perhaps learning a little more
about Cassandra’s internals and the people involved will reveal some of the 
design decisions and priorities of the project.  

Third, you seem to be a little obsessed with market share.  While market share 
is fun to talk about, *most* of us that are working on and contributing to 
Cassandra do so because it does actually solve a problem we have, and solves it 
reasonably well.  If some magic open source DB appears out of nowhere and does
everything you want Cassandra to, and is bug free, keeps your data consistent, 
automatically does backups, comes with really nice cert management, ad hoc 
querying, amazing materialized views that are perfect, no caveats to secondary 
indexes, and somehow still gives you linear scalability without any mental 
overhead whatsoever then sure, people might start using it.  And that’s 
actually OK, because if that happens we’ll all be incredibly pumped out of our 
minds because we won’t have to work as hard.  If on the slim chance that 
doesn’t manifest, those of us that use Cassandra and are part of the community 
will keep working on the things we care about, iterating, and improving things. 
 Maybe someone will even take a look at your JIRA issues.  

Further filling the mailing list with your grievances will likely not help you 
progress towards your goal of a Cassandra that’s easier to use, so I encourage 
you to try to be a little more productive and try to help rather than just 
complain, which is not constructive.  I did a quick search for your name on the 
mailing list, and I’ve seen very little from you, so to everyone who’s been
around for a while and trying to help you it looks like you’re just some random 
dude asking for people to work for free on the things you’re asking for, 
without offering anything back in return.

Jon


> On Feb 21, 2018, at 11:56 AM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
> wrote:
> 
> Josh,
> 
> To say nothing is indifference.  If you care about your community, sometimes 
> don't you have to bring up a subject even though you know it's also 
> temporarily adding some discomfort?  
> 
> As to opening a JIRA, I've got a very specific topic to try in mind now.  An 
> easy one I'll work on and then announce.  Someone else will have to do the 
> coding.  A year from now I would probably just knock it out to make sure it's 
> as easy as I expect it to be but to be honest, as I've been saying, I'm not 
> set up to do that right now.  I've barely looked at any Cassandra code; for 
> one; everyone on this list probably codes more than I do, secondly; and 
> lastly, it's a good one for someone that wants an easy one to start with: 
> vNodes.  I've already seen too many people seeking assistance with the vNode 
> setting.
> 
> And you can expect as others have been mentioning that there should be 
> similar ones on compaction, repair and backup. 
> 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jon Haddad
Ken,

Maybe it’s not clear how open source projects work, so let me try to explain.  
There’s a bunch of us who either get paid by someone or volunteer in our free
time.  The folks that get paid, (yay!) usually take direction on what the 
priorities are, and work on projects that directly affect our jobs.  That means 
that someone needs to care enough about the features you want to work on them, 
if you’re not going to do it yourself. 

Now as others have said already, please put your list of demands in JIRA, if 
someone is interested, they will work on it.  You may need to contribute a 
little more than you’ve done already, be prepared to get involved if you 
actually want to see something get done.  Perhaps learning a little more
about Cassandra’s internals and the people involved will reveal some of the 
design decisions and priorities of the project.  

Third, you seem to be a little obsessed with market share.  While market share 
is fun to talk about, *most* of us that are working on and contributing to 
Cassandra do so because it does actually solve a problem we have, and solves it 
reasonably well.  If some magic open source DB appears out of nowhere and does
everything you want Cassandra to, and is bug free, keeps your data consistent, 
automatically does backups, comes with really nice cert management, ad hoc 
querying, amazing materialized views that are perfect, no caveats to secondary 
indexes, and somehow still gives you linear scalability without any mental 
overhead whatsoever then sure, people might start using it.  And that’s 
actually OK, because if that happens we’ll all be incredibly pumped out of our 
minds because we won’t have to work as hard.  If on the slim chance that 
doesn’t manifest, those of us that use Cassandra and are part of the community 
will keep working on the things we care about, iterating, and improving things. 
 Maybe someone will even take a look at your JIRA issues.  

Further filling the mailing list with your grievances will likely not help you 
progress towards your goal of a Cassandra that’s easier to use, so I encourage 
you to try to be a little more productive and try to help rather than just 
complain, which is not constructive.  I did a quick search for your name on the 
mailing list, and I’ve seen very little from you, so to everyone who’s been
around for a while and trying to help you it looks like you’re just some random 
dude asking for people to work for free on the things you’re asking for, 
without offering anything back in return.

Jon


> On Feb 21, 2018, at 11:56 AM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
> wrote:
> 
> Josh, 
> 
> To say nothing is indifference.  If you care about your community, sometimes 
> don't you have to bring up a subject even though you know it's also 
> temporarily adding some discomfort?  
> 
> As to opening a JIRA, I've got a very specific topic to try in mind now.  An 
> easy one I'll work on and then announce.  Someone else will have to do the 
> coding.  A year from now I would probably just knock it out to make sure it's 
> as easy as I expect it to be but to be honest, as I've been saying, I'm not 
> set up to do that right now.  I've barely looked at any Cassandra code; for 
> one; everyone on this list probably codes more than I do, secondly; and 
> lastly, it's a good one for someone that wants an easy one to start with: 
> vNodes.  I've already seen too many people seeking assistance with the vNode 
> setting.
> 
> And you can expect as others have been mentioning that there should be 
> similar ones on compaction, repair and backup. 
> 
> Microsoft knows poor usability gives them an easy market to take over. And 
> they make it easy to switch.
> 
> Beginning at 4:17 in the video, it says the following:
> 
>   "You don't need to worry about replica sets, quorum or read repair.  
> You can focus on writing correct application logic."
> 
> At 4:42, it says:
>   "Hopefully this gives you a quick idea of how seamlessly you can bring 
> your existing Cassandra applications to Azure Cosmos DB.  No code changes are 
> required.  It works with your favorite Cassandra tools and drivers including 
> for example native Cassandra driver for Spark. And it takes seconds to get 
> going, and it's elastically and globally scalable."
> 
> More to come,
> 
> Kenneth Brotman
> 
> -Original Message-
> From: Josh McKenzie [mailto:jmcken...@apache.org] 
> Sent: Wednesday, February 21, 2018 8:28 AM
> To: d...@cassandra.apache.org
> Cc: User
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> There's a disheartening amount of "here's where Cassandra is bad, and here's 
> what it needs to do for me for free" happening in this thread.
> 
> This is open-source software. Everyone is *strongly encour

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Durity, Sean R
It is instructive to listen to the concerns of new and existing users in order 
to improve a product like Cassandra, but I think the schoolyard taunt model
isn’t the most effective.

In my experience with open and closed source databases, there are always things 
that could be improved. Many have a historical base in how the product evolved 
over time. A newcomer sees those as rough edges right away. In other cases, the 
database creators have often widened their scope to try and solve every data 
problem. This creates the complexity of too many configuration options, etc. 
Even the best RDBMS (Informix!) battled these kinds of issues.

Cassandra, though, introduced another angle of difficulty. In trying to relate 
to RDBMS users (pun intended), it often borrowed terminology to make it seem 
familiar. But they don’t work the same way or even solve the same problems. The 
classic example is secondary indexes. For RDBMS, they are very useful; for 
Cassandra, they are anathema (except for very narrow cases).

However, I think the shots at Cassandra are generally unfair. When I started 
working with it, the DataStax documentation was some of the best documentation 
I had seen on any project, especially an open source one. (If anything the 
cooling off between Apache Cassandra and DataStax may be the most serious 
misstep so far…) The more I learned about how Cassandra worked, the more I 
marveled at the clever combination of intricate solutions (gossip, merkle 
trees, compaction strategies, etc.) to solve specific data problems. This is a 
great product! It has given me lots of sleep-filled nights over the last 4+ 
years. My customers love it, once I explain what it should be used for (and 
what it shouldn’t). I applaud the contributors, whether coders or users. Thank 
you!

Finally, a note on backup. Backing up a distributed system is tough, but 
restores are even more complex (if you want no down-time, no extra disk space, 
point-in-time recovery, etc). If you want to investigate why it is a tough 
problem for Cassandra, go look at RecoverX from Datos IO. They have solved many 
of the problems, but it isn’t an easy task. You could ask people to try and 
recreate all that, or just point them to a working solution. If backup and 
recovery is required (and I would argue it isn’t always required), it is 
probably worth paying for.


Sean Durity
From: Josh McKenzie [mailto:jmcken...@apache.org]
Sent: Wednesday, February 21, 2018 11:28 AM
To: d...@cassandra.apache.org
Cc: User <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Cassandra Needs to Grow Up by Version Five!

There's a disheartening amount of "here's where Cassandra is bad, and here's 
what it needs to do for me for free" happening in this thread.

This is open-source software. Everyone is *strongly encouraged* to submit a 
patch to move the needle on *any* of these things being complained about in 
this thread.

For the Apache Way <https://www.apache.org/foundation/governance/> to work,
people need to step up and meaningfully contribute to a project to
scratch their own itch instead of just waiting for a random 
corporation-subsidized engineer to happen to have interests that align with 
them and contribute that to the project.

Beating a dead horse for things everyone on the project knows are serious pain 
points is not productive.

On Wed, Feb 21, 2018 at 5:45 AM, Oleksandr Shulgin
<oleksandr.shul...@zalando.de> wrote:
On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

>
> >> Cluster wide management should be a big theme in any next major release.
> >>
> >Na. Stability and testing should be a big theme in the next major release.
> >
>
> Double Na on that one Jeff.  I think you have a concern there about the
> need to test sufficiently to ensure the stability of the next major
> release.  That makes perfect sense, for every release, especially the
> major ones.  Continuous improvement is not a phase of development, for
> example.  CI should be in everything, in every phase.  Stability and
> testing are a part of every release, not just one.  A major release should be a
> nice step from the previous major release though.
>

I guess what Jeff refers to is the tick-tock release cycle experiment,
which has proven to be a complete disaster by popular opinion.

There's also the "materialized views" feature which failed to materialize
in the end (pun intended) and had to be declared experimental retroactively.

Another prominent example is incremental repair which was introduced as the
default option in 2.2 and now is

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
So before buying any marketing claims from Microsoft or whoever, maybe
you should try to use it extensively?

And talking about backup, have a look at DynamoDB:
http://i68.tinypic.com/n1b6yr.jpg

From my POV, if a multi-billion-dollar company like Amazon doesn't get it right
or can't make it easy for the end user (without involving unwieldy Hadoop
machinery:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html),
what Cassandra offers in terms of backup and restore is more than satisfactory.




On Wed, Feb 21, 2018 at 8:56 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

>  Josh,
>
> To say nothing is indifference.  If you care about your community,
> sometimes don't you have to bring up a subject even though you know it's
> also temporarily adding some discomfort?
>
> As to opening a JIRA, I've got a very specific topic to try in mind now.
> An easy one I'll work on and then announce.  Someone else will have to do
> the coding.  A year from now I would probably just knock it out to make
> sure it's as easy as I expect it to be but to be honest, as I've been
> saying, I'm not set up to do that right now.  I've barely looked at any
> Cassandra code; for one; everyone on this list probably codes more than I
> do, secondly; and lastly, it's a good one for someone that wants an easy
> one to start with: vNodes.  I've already seen too many people seeking
> assistance with the vNode setting.
>
> And you can expect as others have been mentioning that there should be
> similar ones on compaction, repair and backup.
>
> Microsoft knows poor usability gives them an easy market to take over. And
> they make it easy to switch.
>
> Beginning at 4:17 in the video, it says the following:
>
> "You don't need to worry about replica sets, quorum or read
> repair.  You can focus on writing correct application logic."
>
> At 4:42, it says:
> "Hopefully this gives you a quick idea of how seamlessly you can
> bring your existing Cassandra applications to Azure Cosmos DB.  No code
> changes are required.  It works with your favorite Cassandra tools and
> drivers including for example native Cassandra driver for Spark. And it
> takes seconds to get going, and it's elastically and globally scalable."
>
> More to come,
>
> Kenneth Brotman
>
> -Original Message-
> From: Josh McKenzie [mailto:jmcken...@apache.org]
> Sent: Wednesday, February 21, 2018 8:28 AM
> To: d...@cassandra.apache.org
> Cc: User
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> There's a disheartening amount of "here's where Cassandra is bad, and
> here's what it needs to do for me for free" happening in this thread.
>
> This is open-source software. Everyone is *strongly encouraged* to submit
> a patch to move the needle on *any* of these things being complained about
> in this thread.
>
> For the Apache Way <https://www.apache.org/foundation/governance/> to
> work, people need to step up and meaningfully contribute to a project to
> scratch their own itch instead of just waiting for a random
> corporation-subsidized engineer to happen to have interests that align with
> them and contribute that to the project.
>
> Beating a dead horse for things everyone on the project knows are serious
> pain points is not productive.
>
> On Wed, Feb 21, 2018 at 5:45 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> > On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman <
> > kenbrot...@yahoo.com.invalid> wrote:
> >
> > >
> > > > >> Cluster wide management should be a big theme in any next major
> > > > >> release.
> > > > >>
> > > > >Na. Stability and testing should be a big theme in the next major
> > > > >release.
> > > >
> > >
> > > Double Na on that one Jeff.  I think you have a concern there about
> > > the need to test sufficiently to ensure the stability of the next
> > > major release.  That makes perfect sense.- for every release,
> > > especially the major ones.  Continuous improvement is not a phase of
> > > development for example.  CI should be in everything, in every
> > > phase.  Stability and testing a part of every release not just one.
> > > > A major release should be a
> > > > nice step from the previous major release though.
> > >
> >
> > I guess what Jeff refers to is the tick-tock release cycle experiment,
> > which has proven to be a complete disaster by popular opinion.
> >
> > There's also the "materialized views" feature which failed to
> > materialize in the end (pun intended) and had to be declar

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Josh,

To say nothing is indifference.  If you care about your community, don't you
sometimes have to bring up a subject even though you know it will also
temporarily add some discomfort?

As to opening a JIRA, I have a very specific topic in mind to try now.  An
easy one I'll work on and then announce.  Someone else will have to do the
coding.  A year from now I would probably just knock it out myself to make sure
it's as easy as I expect it to be, but to be honest, as I've been saying, I'm
not set up to do that right now.  For one, I've barely looked at any Cassandra
code; secondly, everyone on this list probably codes more than I do; and
lastly, it's a good one for someone who wants an easy one to start with:
vNodes.  I've already seen too many people seeking assistance with the vNode
setting.

And you can expect, as others have been mentioning, that there should be
similar ones on compaction, repair and backup.

Microsoft knows poor usability gives them an easy market to take over. And they 
make it easy to switch.

Beginning at 4:17 in the video, it says the following:

"You don't need to worry about replica sets, quorum or read repair.  
You can focus on writing correct application logic."

At 4:42, it says:
"Hopefully this gives you a quick idea of how seamlessly you can bring 
your existing Cassandra applications to Azure Cosmos DB.  No code changes are 
required.  It works with your favorite Cassandra tools and drivers including 
for example native Cassandra driver for Spark. And it takes seconds to get 
going, and it's elastically and globally scalable."

More to come,

Kenneth Brotman

-----Original Message-----
From: Josh McKenzie [mailto:jmcken...@apache.org] 
Sent: Wednesday, February 21, 2018 8:28 AM
To: d...@cassandra.apache.org
Cc: User
Subject: Re: Cassandra Needs to Grow Up by Version Five!

There's a disheartening amount of "here's where Cassandra is bad, and here's 
what it needs to do for me for free" happening in this thread.

This is open-source software. Everyone is *strongly encouraged* to submit a 
patch to move the needle on *any* of these things being complained about in 
this thread.

For the Apache Way <https://www.apache.org/foundation/governance/> to work, 
people need to step up and meaningfully contribute to a project to scratch 
their own itch instead of just waiting for a random corporation-subsidized 
engineer to happen to have interests that align with them and contribute that 
to the project.

Beating a dead horse for things everyone on the project knows are serious pain 
points is not productive.

On Wed, Feb 21, 2018 at 5:45 AM, Oleksandr Shulgin < 
oleksandr.shul...@zalando.de> wrote:

> On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman < 
> kenbrot...@yahoo.com.invalid> wrote:
>
> >
> > >> Cluster wide management should be a big theme in any next major
> release.
> > >>
> > >Na. Stability and testing should be a big theme in the next major
> release.
> > >
> >
> > Double Na on that one Jeff.  I think you have a concern there about 
> > the need to test sufficiently to ensure the stability of the next 
> > major release.  That makes perfect sense.- for every release, 
> > especially the major ones.  Continuous improvement is not a phase of 
> > development for example.  CI should be in everything, in every 
> > phase.  Stability and testing a part of every release not just one.  
> > A major release should be
> a
> > nice step from the previous major release though.
> >
>
> I guess what Jeff refers to is the tick-tock release cycle experiment, 
> which has proven to be a complete disaster by popular opinion.
>
> There's also the "materialized views" feature which failed to 
> materialize in the end (pun intended) and had to be declared 
> experimental retroactively.
>
> Another prominent example is incremental repair which was introduced 
> as the default option in 2.2 and now is not recommended to use because 
> of so many corner cases where it can fail.  So again experimental as an 
> afterthought.
>
> Not to mention that even if you are aware of the default incremental 
> and go with full repair instead, you're still up for a sad surprise:
> anti-compaction will be triggered despite the "full" repair.  Because 
> anti-compaction is only disabled in case of sub-range repair (don't 
> ask why), so you need to use something advanced like Reaper if you 
> want to avoid that.  I don't think you'll ever find this in the documentation.
>
> Honestly, for an eventually-consistent system like Cassandra 
> anti-entropy repair is one of the most important pieces to get right.  
> And Cassandra fails really badly on that one: the feature is not 
> really well d

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Josh McKenzie
There's a disheartening amount of "here's where Cassandra is bad, and
here's what it needs to do for me for free" happening in this thread.

This is open-source software. Everyone is *strongly encouraged* to submit a
patch to move the needle on *any* of these things being complained about in
this thread.

For the Apache Way <https://www.apache.org/foundation/governance/> to work,
people need to step up and meaningfully contribute to a project to scratch
their own itch instead of just waiting for a random corporation-subsidized
engineer to happen to have interests that align with them and contribute
that to the project.

Beating a dead horse for things everyone on the project knows are serious
pain points is not productive.

On Wed, Feb 21, 2018 at 5:45 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> >
> > >> Cluster wide management should be a big theme in any next major
> release.
> > >>
> > >Na. Stability and testing should be a big theme in the next major
> release.
> > >
> >
> > Double Na on that one Jeff.  I think you have a concern there about the
> > need to test sufficiently to ensure the stability of the next major
> > release.  That makes perfect sense.- for every release, especially the
> > major ones.  Continuous improvement is not a phase of development for
> > example.  CI should be in everything, in every phase.  Stability and
> > testing a part of every release not just one.  A major release should be
> a
> > nice step from the previous major release though.
> >
>
> I guess what Jeff refers to is the tick-tock release cycle experiment,
> which has proven to be a complete disaster by popular opinion.
>
> There's also the "materialized views" feature which failed to materialize
> in the end (pun intended) and had to be declared experimental
> retroactively.
>
> Another prominent example is incremental repair which was introduced as the
> default option in 2.2 and now is not recommended to use because of so many
> corner cases where it can fail.  So again experimental as an afterthought.
>
> Not to mention that even if you are aware of the default incremental and go
> with full repair instead, you're still up for a sad surprise:
> anti-compaction will be triggered despite the "full" repair.  Because
> anti-compaction is only disabled in case of sub-range repair (don't ask
> why), so you need to use something advanced like Reaper if you want to
> avoid that.  I don't think you'll ever find this in the documentation.
>
> Honestly, for an eventually-consistent system like Cassandra anti-entropy
> repair is one of the most important pieces to get right.  And Cassandra
> fails really badly on that one: the feature is not really well designed,
> poorly implemented and under-documented.
>
> In a summary, IMO, Cassandra is a poor implementation of some good ideas.
> It is a collection of hacks, not features.  They sometimes play together
> accidentally, and rarely by design.
>
> Regards,
> --
> Alex
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Oleksandr Shulgin
On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

>
> >> Cluster wide management should be a big theme in any next major release.
> >>
> >Na. Stability and testing should be a big theme in the next major release.
> >
>
> Double Na on that one Jeff.  I think you have a concern there about the
> need to test sufficiently to ensure the stability of the next major
> release.  That makes perfect sense.- for every release, especially the
> major ones.  Continuous improvement is not a phase of development for
> example.  CI should be in everything, in every phase.  Stability and
> testing a part of every release not just one.  A major release should be a
> nice step from the previous major release though.
>

I guess what Jeff refers to is the tick-tock release cycle experiment,
which has proven to be a complete disaster by popular opinion.

There's also the "materialized views" feature which failed to materialize
in the end (pun intended) and had to be declared experimental retroactively.

Another prominent example is incremental repair, which was introduced as the
default option in 2.2 and is now not recommended because of the many corner
cases where it can fail.  So again, experimental as an afterthought.

Not to mention that even if you are aware of the incremental default and go
with full repair instead, you're still in for a sad surprise:
anti-compaction will be triggered despite the "full" repair.  Anti-compaction
is only disabled in the case of sub-range repair (don't ask why), so you need
to use something advanced like Reaper if you want to avoid that.  I don't
think you'll ever find this in the documentation.
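
To illustrate the sub-range approach mentioned above, here is a minimal
Python sketch that splits one token range into smaller pieces and builds a
full-repair command for each piece.  The helper names, the example tokens and
the keyspace name are hypothetical.  It also assumes the starting range is one
the node actually owns (a tool like Reaper derives those from the cluster's
ring information), and the nodetool flags used (-full, -st, -et) should be
checked against the Cassandra version in use.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous sub-ranges.

    Assumes start < end, i.e. the range does not wrap around the ring.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if start >= end:
        raise ValueError("expected a non-wrapping range with start < end")
    span = end - start
    # Integer arithmetic keeps the boundaries exact even for 64-bit tokens.
    bounds = [start + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, parts):
    """Build one full sub-range repair command line per sub-range."""
    return [
        f"nodetool repair -full -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Hypothetical owned range and keyspace, purely for illustration.
    for cmd in repair_commands("my_keyspace", 0, 1000, 4):
        print(cmd)
```

The sketch only prints the commands; running them one at a time and checking
each exit status is left to whatever scheduler or script wraps it.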

Honestly, for an eventually-consistent system like Cassandra anti-entropy
repair is one of the most important pieces to get right.  And Cassandra
fails really badly on that one: the feature is not really well designed,
poorly implemented and under-documented.

In summary, IMO, Cassandra is a poor implementation of some good ideas.
It is a collection of hacks, not features.  They sometimes play together
accidentally, and rarely by design.

Regards,
--
Alex


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Ben Slater
I’ve been biting my tongue because I don’t normally like to directly plug
our service on the mailing list, but if you’re going to compare Cassandra to
a fully managed service from Microsoft then you really should check out
Instaclustr (www.instaclustr.com), and you’ll find that we take care of many
of the issues you have raised in just the same way that Microsoft does
with CosmosDB (i.e. hiding them behind our managed service tooling).

Cheers
Ben

On Wed, 21 Feb 2018 at 19:22 DuyHai Doan <doanduy...@gmail.com> wrote:

> For UI and interactive data exploration there is already the Cassandra
> interpreter for Apache Zeppelin that is more than decent for the job
>
> On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> But what does this video really show? That Microsoft managed to run
>> Cassandra as a SaaS product with nice UI?
>> Google did that years ago with BigTable and Amazon with DynamoDB.
>>
>> I agree that we need more tools, but not so much for querying (although
>> that would also help a bit), but just in general the project feels
>> unapproachable right now.
>> Besides the excellent DataStax documentation there is little best
>> practice knowledge about how to operate and provision Cassandra clusters.
>> Having some recipes for Chef, Puppet or Ansible that show the most common
>> settings (or some Cloudfoundry/GCP Templates or Helm Charts) would be
>> really useful.
>> Also a list of all the projects that Cassandra goes well with (like TLP
>> Reaper and and Netflix's Priam etc..)
>>
>> greetings Daniel
>>
>> On Wed, 21 Feb 2018 at 07:23 Kenneth Brotman <kenbrot...@yahoo.com.invalid>
>> wrote:
>>
>>> If you watch this video through you'll see why usability is so
>>> important.  You can't ignore usability issues.
>>>
>>> Cassandra does not exist in a vacuum.  The competitors are world class.
>>>
>>> The video is on the New Cassandra API for Azure Cosmos DB:
>>> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>>>
>>> Kenneth Brotman
>>>
>>> -----Original Message-----
>>> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
>>> Sent: Tuesday, February 20, 2018 1:28 AM
>>> To: user@cassandra.apache.org; James Briggs
>>> Cc: d...@cassandra.apache.org
>>> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>>>
>>> Hi,
>>>
>>> I have to add my own two cents here as the main thing that keeps me from
>>> really running Cassandra is the amount of pain running it incurs.
>>> Not so much because it's actually painful but because the tools are so
>>> different and the documentation and best practices are scattered across a
>>> dozen outdated DataStax articles and this mailing list etc.. We've been
>>> hesitant (although our use case is perfect for using Cassandra) to deploy
>>> Cassandra to any critical systems as even after a year of running it we
>>> still don't have the operational experience to confidently run critical
>>> systems with it.
>>>
>>> Simple things like a foolproof / safe cluster-wide S3 Backup (like
>>> Elasticsearch has it) would for example solve a TON of issues for new
>>> people. I don't need it auto-scheduled or something, but having to
>>> configure cron jobs across the whole cluster is a pain in the ass for small
>>> teams.
>>> To be honest, even the way snapshots are done right now is already super
>>> painful. Every other system I operated so far will just create one backup
>>> folder I can export, in C* the Backup is scattered across a bunch of
>>> different Keyspace folders etc.. needless to say that it took a while until
>>> I trusted my backup scripts fully.
>>>
>>> And especially for a Database I believe Backup/Restore needs to be a
>>> non-issue that's documented front and center. If not smaller teams just
>>> don't have the resources to dedicate to learning and building the tools
>>> around it.
>>>
>>> Now that the team is getting larger we could spare the resources to
>>> operate these things, but switching from a well-understood RDBMs schema to
>>> Cassandra is now incredibly hard and will probably take years.
>>>
>>> greetings Daniel
>>>
>>> On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com
>>> .invalid>
>>> wrote:
>>>
>>> > Kenneth:
>>> >
>>> > What you said is not wrong.
>>> >
>>&g

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
For UI and interactive data exploration there is already the Cassandra
interpreter for Apache Zeppelin, which is more than decent for the job.

On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> But what does this video really show? That Microsoft managed to run
> Cassandra as a SaaS product with nice UI?
> Google did that years ago with BigTable and Amazon with DynamoDB.
>
> I agree that we need more tools, but not so much for querying (although
> that would also help a bit), but just in general the project feels
> unapproachable right now.
> Besides the excellent DataStax documentation there is little best practice
> knowledge about how to operate and provision Cassandra clusters.
> Having some recipes for Chef, Puppet or Ansible that show the most common
> settings (or some Cloudfoundry/GCP Templates or Helm Charts) would be
> really useful.
> Also a list of all the projects that Cassandra goes well with (like TLP
> Reaper and and Netflix's Priam etc..)
>
> greetings Daniel
>
> On Wed, 21 Feb 2018 at 07:23 Kenneth Brotman <kenbrot...@yahoo.com.invalid>
> wrote:
>
>> If you watch this video through you'll see why usability is so
>> important.  You can't ignore usability issues.
>>
>> Cassandra does not exist in a vacuum.  The competitors are world class.
>>
>> The video is on the New Cassandra API for Azure Cosmos DB:
>> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>>
>> Kenneth Brotman
>>
>> -Original Message-
>> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
>> Sent: Tuesday, February 20, 2018 1:28 AM
>> To: user@cassandra.apache.org; James Briggs
>> Cc: d...@cassandra.apache.org
>> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>>
>> Hi,
>>
>> I have to add my own two cents here as the main thing that keeps me from
>> really running Cassandra is the amount of pain running it incurs.
>> Not so much because it's actually painful but because the tools are so
>> different and the documentation and best practices are scattered across a
>> dozen outdated DataStax articles and this mailing list etc.. We've been
>> hesitant (although our use case is perfect for using Cassandra) to deploy
>> Cassandra to any critical systems as even after a year of running it we
>> still don't have the operational experience to confidently run critical
>> systems with it.
>>
>> Simple things like a foolproof / safe cluster-wide S3 Backup (like
>> Elasticsearch has it) would for example solve a TON of issues for new
>> people. I don't need it auto-scheduled or something, but having to
>> configure cron jobs across the whole cluster is a pain in the ass for small
>> teams.
>> To be honest, even the way snapshots are done right now is already super
>> painful. Every other system I operated so far will just create one backup
>> folder I can export, in C* the Backup is scattered across a bunch of
>> different Keyspace folders etc.. needless to say that it took a while until
>> I trusted my backup scripts fully.
>>
>> And especially for a Database I believe Backup/Restore needs to be a
>> non-issue that's documented front and center. If not smaller teams just
>> don't have the resources to dedicate to learning and building the tools
>> around it.
>>
>> Now that the team is getting larger we could spare the resources to
>> operate these things, but switching from a well-understood RDBMs schema to
>> Cassandra is now incredibly hard and will probably take years.
>>
>> greetings Daniel
>>
>> On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.
>> invalid>
>> wrote:
>>
>> > Kenneth:
>> >
>> > What you said is not wrong.
>> >
>> > Vertica and Riak are examples of distributed databases that don't
>> > require hand-holding.
>> >
>> > Cassandra is for Java-programmer DIYers, or more often Datastax
>> > clients, at this point.
>> > Thanks, James.
>> >
>> > --
>> > *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
>> > *To:* user@cassandra.apache.org
>> > *Cc:* d...@cassandra.apache.org
>> > *Sent:* Monday, February 19, 2018 4:56 PM
>> >
>> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>> >
>> > Jeff, you helped me figure out what I was missing.  It just took me a
>> > day to digest what you wrote.  I’m coming over from another type of
>> > engineering.  I didn’t know and it’s not really 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Daniel Hölbling-Inzko
But what does this video really show? That Microsoft managed to run
Cassandra as a SaaS product with nice UI?
Google did that years ago with BigTable and Amazon with DynamoDB.

I agree that we need more tools, but not so much for querying (although
that would also help a bit), but just in general the project feels
unapproachable right now.
Besides the excellent DataStax documentation there is little best practice
knowledge about how to operate and provision Cassandra clusters.
Having some recipes for Chef, Puppet or Ansible that show the most common
settings (or some Cloudfoundry/GCP Templates or Helm Charts) would be
really useful.
Also useful would be a list of all the projects that Cassandra goes well with
(like TLP Reaper, Netflix's Priam, etc.).

greetings Daniel

On Wed, 21 Feb 2018 at 07:23 Kenneth Brotman <kenbrot...@yahoo.com.invalid>
wrote:

> If you watch this video through you'll see why usability is so important.
> You can't ignore usability issues.
>
> Cassandra does not exist in a vacuum.  The competitors are world class.
>
> The video is on the New Cassandra API for Azure Cosmos DB:
> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>
> Kenneth Brotman
>
> -Original Message-
> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
> Sent: Tuesday, February 20, 2018 1:28 AM
> To: user@cassandra.apache.org; James Briggs
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Hi,
>
> I have to add my own two cents here as the main thing that keeps me from
> really running Cassandra is the amount of pain running it incurs.
> Not so much because it's actually painful but because the tools are so
> different and the documentation and best practices are scattered across a
> dozen outdated DataStax articles and this mailing list etc.. We've been
> hesitant (although our use case is perfect for using Cassandra) to deploy
> Cassandra to any critical systems as even after a year of running it we
> still don't have the operational experience to confidently run critical
> systems with it.
>
> Simple things like a foolproof / safe cluster-wide S3 Backup (like
> Elasticsearch has it) would for example solve a TON of issues for new
> people. I don't need it auto-scheduled or something, but having to
> configure cron jobs across the whole cluster is a pain in the ass for small
> teams.
> To be honest, even the way snapshots are done right now is already super
> painful. Every other system I operated so far will just create one backup
> folder I can export, in C* the Backup is scattered across a bunch of
> different Keyspace folders etc.. needless to say that it took a while until
> I trusted my backup scripts fully.
>
> And especially for a Database I believe Backup/Restore needs to be a
> non-issue that's documented front and center. If not smaller teams just
> don't have the resources to dedicate to learning and building the tools
> around it.
>
> Now that the team is getting larger we could spare the resources to
> operate these things, but switching from a well-understood RDBMs schema to
> Cassandra is now incredibly hard and will probably take years.
>
> greetings Daniel
>
> On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid>
> wrote:
>
> > Kenneth:
> >
> > What you said is not wrong.
> >
> > Vertica and Riak are examples of distributed databases that don't
> > require hand-holding.
> >
> > Cassandra is for Java-programmer DIYers, or more often Datastax
> > clients, at this point.
> > Thanks, James.
> >
> > ----------
> > *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> > *To:* user@cassandra.apache.org
> > *Cc:* d...@cassandra.apache.org
> > *Sent:* Monday, February 19, 2018 4:56 PM
> >
> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
> >
> > Jeff, you helped me figure out what I was missing.  It just took me a
> > day to digest what you wrote.  I’m coming over from another type of
> > engineering.  I didn’t know and it’s not really documented.  Cassandra
> > runs in a data center.  Now days that means the nodes are going to be
> > in managed containers, Docker containers, managed by Kerbernetes,
> > Meso or something, and for that reason anyone operating Cassandra in a
> > real world setting would not encounter the issues I raised in the way I
> described.
> >
> > Shouldn’t the architectural diagrams people reference indicate that in
> > some way?  That would have help me.
> >
> > Kenneth Brotman
> >
> > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> > *Sent:* Monday, February 19, 2018

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Prasenjit Sarkar
Jeff,

I don't think you can push the topic of usability back to developers by
asking them to open JIRAs. It is up to the technical leaders of the
Cassandra community to take the initiative in this regard. We can argue
back and forth on the dynamics of open source projects, but the usability
concerns of Cassandra are a reality that cannot be ignored.

Prasenjit

PS My views, not those of my employer

On Tue, Feb 20, 2018 at 10:22 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> If you watch this video through you'll see why usability is so important.
> You can't ignore usability issues.
>
> Cassandra does not exist in a vacuum.  The competitors are world class.
>
> The video is on the New Cassandra API for Azure Cosmos DB:
> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>
> Kenneth Brotman
>
> -Original Message-
> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
> Sent: Tuesday, February 20, 2018 1:28 AM
> To: user@cassandra.apache.org; James Briggs
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Hi,
>
> I have to add my own two cents here as the main thing that keeps me from
> really running Cassandra is the amount of pain running it incurs.
> Not so much because it's actually painful but because the tools are so
> different and the documentation and best practices are scattered across a
> dozen outdated DataStax articles and this mailing list etc.. We've been
> hesitant (although our use case is perfect for using Cassandra) to deploy
> Cassandra to any critical systems as even after a year of running it we
> still don't have the operational experience to confidently run critical
> systems with it.
>
> Simple things like a foolproof / safe cluster-wide S3 Backup (like
> Elasticsearch has it) would for example solve a TON of issues for new
> people. I don't need it auto-scheduled or something, but having to
> configure cron jobs across the whole cluster is a pain in the ass for small
> teams.
> To be honest, even the way snapshots are done right now is already super
> painful. Every other system I operated so far will just create one backup
> folder I can export, in C* the Backup is scattered across a bunch of
> different Keyspace folders etc.. needless to say that it took a while until
> I trusted my backup scripts fully.
>
> And especially for a Database I believe Backup/Restore needs to be a
> non-issue that's documented front and center. If not smaller teams just
> don't have the resources to dedicate to learning and building the tools
> around it.
>
> Now that the team is getting larger we could spare the resources to
> operate these things, but switching from a well-understood RDBMs schema to
> Cassandra is now incredibly hard and will probably take years.
>
> greetings Daniel
>
> On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid>
> wrote:
>
> > Kenneth:
> >
> > What you said is not wrong.
> >
> > Vertica and Riak are examples of distributed databases that don't
> > require hand-holding.
> >
> > Cassandra is for Java-programmer DIYers, or more often Datastax
> > clients, at this point.
> > Thanks, James.
> >
> > ----------
> > *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> > *To:* user@cassandra.apache.org
> > *Cc:* d...@cassandra.apache.org
> > *Sent:* Monday, February 19, 2018 4:56 PM
> >
> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
> >
> > Jeff, you helped me figure out what I was missing.  It just took me a
> > day to digest what you wrote.  I’m coming over from another type of
> > engineering.  I didn’t know and it’s not really documented.  Cassandra
> > runs in a data center.  Now days that means the nodes are going to be
> > in managed containers, Docker containers, managed by Kerbernetes,
> > Meso or something, and for that reason anyone operating Cassandra in a
> > real world setting would not encounter the issues I raised in the way I
> described.
> >
> > Shouldn’t the architectural diagrams people reference indicate that in
> > some way?  That would have help me.
> >
> > Kenneth Brotman
> >
> > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> > *Sent:* Monday, February 19, 2018 10:43 AM
> > *To:* 'user@cassandra.apache.org'
> > *Cc:* 'd...@cassandra.apache.org'
> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
> >
> > Well said.  Very fair.  I wouldn’t mind hearing from others still
> > You’re a good guy!
> >
> > Kenneth Brotman
> >
> > *From:* Jeff Jirsa [mailto:jji...@g

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Kenneth Brotman
If you watch this video through you'll see why usability is so important.  You 
can't ignore usability issues.  

Cassandra does not exist in a vacuum.  The competitors are world class.  

The video is on the New Cassandra API for Azure Cosmos DB:
https://www.youtube.com/watch?v=1Sf4McGN1AQ

Kenneth Brotman

-----Original Message-----
From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com] 
Sent: Tuesday, February 20, 2018 1:28 AM
To: user@cassandra.apache.org; James Briggs
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Hi,

I have to add my own two cents here as the main thing that keeps me from really
running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so
different and the documentation and best practices are scattered across a dozen
outdated DataStax articles, this mailing list, etc. We've been hesitant
(although our use case is perfect for using Cassandra) to deploy Cassandra to
any critical systems, as even after a year of running it we still don't have
the operational experience to confidently run critical systems with it.

Simple things like a foolproof / safe cluster-wide S3 Backup (like 
Elasticsearch has it) would for example solve a TON of issues for new people. I 
don't need it auto-scheduled or something, but having to configure cron jobs 
across the whole cluster is a pain in the ass for small teams.
To be honest, even the way snapshots are done right now is already super
painful. Every other system I have operated so far will just create one backup
folder I can export; in C* the backup is scattered across a bunch of different
keyspace folders, etc. Needless to say, it took a while until I trusted my
backup scripts fully.
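
As a rough illustration of the scattering described above, the following
Python sketch gathers one snapshot's per-table directories into a single
backup folder.  It assumes the usual on-disk layout of
<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/ and that the snapshot was
already taken (for example with nodetool snapshot -t <tag>).  The function
names and all paths are hypothetical, and it covers a single node only, not a
cluster-wide backup.

```python
from pathlib import Path
import shutil


def find_snapshot_dirs(data_dir, tag):
    """Return every per-table snapshot directory for `tag` under `data_dir`."""
    root = Path(data_dir)
    # One level for the keyspace, one for the table directory.
    return sorted(p for p in root.glob(f"*/*/snapshots/{tag}") if p.is_dir())


def gather_snapshot(data_dir, tag, dest):
    """Copy one snapshot into dest/<keyspace>/<table-dir>/ and list the copies."""
    dest_root = Path(dest)
    copied = []
    for snap in find_snapshot_dirs(data_dir, tag):
        table_dir = snap.parent.parent          # .../<keyspace>/<table>-<id>
        keyspace = table_dir.parent.name
        target = dest_root / keyspace / table_dir.name
        # copytree creates missing parents and fails if target already exists.
        shutil.copytree(snap, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical paths, purely for illustration.
    for path in gather_snapshot("/var/lib/cassandra/data", "nightly", "/backup/nightly"):
        print(path)
```

The resulting folder can then be exported as one unit; restoring from it and
coordinating the same tag across all nodes are separate steps this sketch does
not attempt.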

And especially for a database I believe backup/restore needs to be a non-issue
that's documented front and center. If not, smaller teams just don't have the
resources to dedicate to learning and building the tools around it.

Now that the team is getting larger we could spare the resources to operate
these things, but switching from a well-understood RDBMS schema to Cassandra is
now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid>
wrote:

> Kenneth:
>
> What you said is not wrong.
>
> Vertica and Riak are examples of distributed databases that don't 
> require hand-holding.
>
> Cassandra is for Java-programmer DIYers, or more often Datastax 
> clients, at this point.
> Thanks, James.
>
> --
> *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> *To:* user@cassandra.apache.org
> *Cc:* d...@cassandra.apache.org
> *Sent:* Monday, February 19, 2018 4:56 PM
>
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Jeff, you helped me figure out what I was missing.  It just took me a 
> day to digest what you wrote.  I’m coming over from another type of 
> engineering.  I didn’t know and it’s not really documented.  Cassandra 
> runs in a data center.  Now days that means the nodes are going to be 
> in managed containers, Docker containers, managed by Kerbernetes,  
> Meso or something, and for that reason anyone operating Cassandra in a 
> real world setting would not encounter the issues I raised in the way I 
> described.
>
> Shouldn’t the architectural diagrams people reference indicate that in 
> some way?  That would have help me.
>
> Kenneth Brotman
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Monday, February 19, 2018 10:43 AM
> *To:* 'user@cassandra.apache.org'
> *Cc:* 'd...@cassandra.apache.org'
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Well said.  Very fair.  I wouldn’t mind hearing from others still  
> You’re a good guy!
>
> Kenneth Brotman
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com <jji...@gmail.com>]
> *Sent:* Monday, February 19, 2018 9:10 AM
> *To:* cassandra
> *Cc:* Cassandra DEV
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> There's a lot of things below I disagree with, but it's ok. I 
> convinced myself not to nit-pick every point.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of 
> Stefan's work with cert management
>
> Beyond that, I encourage you to do what Michael suggested: open JIRAs 
> for things you care strongly about, work on them if you have time. 
> Sometime this year we'll schedule a NGCC (Next Generation Cassandra 
> Conference) where we talk about future project work and direction, I 
> encourage you to attend if you're able (I encourage anyone who cares 
> about the direction of Cassandra to attend, it'll probably be either 
> free or very low cost, just to cover a venue an

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Russell Bateman
I ask Cassandra to be a database that is high-performance, highly 
scalable with no single point of failure. Anything "cool" that's added 
beyond must be added only as a separate, optional ring around Cassandra 
and must not get in the way of my usage.


Yes, I would like some help with some of what's listed here, but you 
should understand that most shops adopting Cassandra are already going 
to have DevOps/database management personnel, expertise, methods, 
protocols and, in some instances, tools already in place. Even the small 
shop I work in has guys saddled with taking care of Cassandra (I'm a 
developer and not one of these guys) and seem not to share these 
concerns because they've already got it covered (like the specific YAML 
configuration complaint).


If there were an option or two I'd like to see, one would be the ability 
to duplicate data centers exactly (as part of what we stipulate when 
creating our KEYSPACE), but this is probably something I want because of 
what we were doing up until or what we wanted when we adopted Cassandra 
for our future product direction. I would also like to see an option in 
Cassandra configuration for absolutely locking out access to certain 
commands (like DROP TABLE, DROP INDEX and DELETE).


From my point of view as a developer, I've had to do many of these 
things also for MongoDB, PostgreSQL, MySQL and other databases over my 
career.


I'm not criticizing these concerns and suggestions. I'm just pointing 
out that, in my opinion, not everything said here is in the realm of, 
"duh, Cassandra needs to grow up."


There's so much right about Cassandra, from the great, unequaled 
technology to the very liberal licensing model without which I could not 
be here.


Russ Bateman


On 02/18/2018 10:39 PM, Kenneth Brotman wrote:


Cassandra feels like an unfinished program to me.  The problem is not 
that it’s open source or cutting edge.  It’s an open source cutting 
edge program that lacks some of its basic functionality.  We are all 
stuck addressing fundamental mechanical tasks for Cassandra because 
the basic code that would do that part has not been contributed yet.


Ease of use issues need to be given much more attention.  For an 
administrator, the ease of use of Cassandra is very poor.


Furthermore, currently Cassandra is an idiot.  We have to do 
everything for Cassandra. Contrast that with the fact that we are in 
the dawn of artificial intelligence.


Software exists to automate tasks for humans, not mechanize humans to 
administer tasks for a database.  I’m an engineering type.  My job is 
to apply science and technology to solve real world problems.  And 
that’s where I need an organization’s I.T. talent to focus; not in 
crank starting an unfinished database.


For example, I should be able to go to any node, replace the 
Cassandra.yaml file and have a prompt on the display ask me if I want 
to update all the yaml files across the cluster.  I shouldn’t have to 
manually modify yaml files on each node or have to create a script for 
some third party automation tool to do it.


I should not have to turn off service, clear directories, restart 
service in coordination with the other nodes.  It’s already a computer 
system.  It can do those things on its own.


How about read repair.  First there is something wrong with the name.  
Maybe it should be called Consistency Repair.  An administrator 
shouldn’t have to do anything.  It should be a behavior of Cassandra 
that is programmed in. It should consider the GC setting of each node, 
calculate how often it has to run repair, when it should run it so all 
the nodes aren’t trying at the same time and when other circumstances 
indicate it should also run it.
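
As a rough illustration of the scheduling idea described above (not how 
Cassandra or any existing tool actually does it), here is a minimal Python 
sketch. It assumes the usual rule of thumb that every node should be repaired 
at least once within gc_grace_seconds, and it simply spreads the node start 
times evenly over one repair cycle. The function name, the safety_factor 
parameter and the node names are hypothetical.

```python
from datetime import timedelta


def staggered_repair_schedule(nodes, gc_grace_seconds, safety_factor=0.5):
    """Return {node: start offset} so repairs are evenly spaced within one cycle.

    gc_grace_seconds is a per-table setting; pass the smallest value in use.
    safety_factor (hypothetical knob) shortens the cycle to leave headroom.
    """
    if not nodes:
        return {}
    cycle = timedelta(seconds=gc_grace_seconds * safety_factor)
    slot = cycle / len(nodes)  # time between two consecutive node starts
    return {node: slot * i for i, node in enumerate(nodes)}


# 864000 seconds (10 days) is the default gc_grace_seconds.
schedule = staggered_repair_schedule(["node1", "node2", "node3"], 864000)
for node, offset in schedule.items():
    print(node, "starts", offset, "into each repair cycle")
```

This only decides when each node would start; running the repair itself, and 
reacting to "other circumstances", would still be left to the operator or to 
a separate tool.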


Certificate management should be automated.

Cluster wide management should be a big theme in any next major 
release. What is a major release?  How many major releases could a 
program have before all the coding for basic stuff like installation, 
configuration and maintenance is included!


Finish the basic coding of Cassandra, make it easy to use for 
administrators, make it smart, add cluster wide management.  Keep 
Cassandra competitive or it will soon be the old Model T we all 
remember fondly.


I ask the Committee to compile a list of all such items, make a plan, 
and commit to including the completed and tested code as part of major 
release 5.0.  I further ask that release 4.0 not be delayed and then 
there be an unusually short skip to version 5.0.


Kenneth Brotman





Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Carl Mueller
I think what is really necessary is providing table-level recipes for
storing data. We need a lot of real world examples and the resulting
schema, compaction strategies, and tunings that were performed for them.
Right now I don't see such crucial cookbook data in the project.

AI is a bit ridiculous, we'd need to AI a collection of big data systems,
and then cassandra would need to have an entirely separate AI system
ingesting ALL THE DATA that comes into the already Big Data system in order
to... what... what would we have the AI do? restructure schemas? Switch
compaction strategies? Add/subtract nodes? Increase/decrease RF? Those are
all insane things to allocate to AI approaches which are not transparent to
the factors and processing that led to the conclusions. The best we could
hope for are recommendations.

On Tue, Feb 20, 2018 at 5:39 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com>
wrote:

> Agree with you, Daniel, regarding gaps in documentation.
>
>
> ---
>
> At the same time I disagree with the folks who are complaining in this
> thread that some functionality, like 'advanced backup', is missing out
> of the box.
>
> We all live in a time when literally tons of open-source tools
> (automation, monitoring) and languages are available, and there are also
> some really powerful SaaS solutions on the market which support C*
> (Datadog, for instance).
>
>
> For example, while C* provides basic building blocks for anti-entropy
> repairs [I mean basic usage of 'nodetool repair' is not suitable for
> large production clusters], Reaper (many thanks to Spotify and
> TheLastPickle!) which uses this basic functionality solves the  task very
> well for real-world C* setups.
>
>
> If something is missing or could be improved in your opinion: we're in the
> era of open source. Create your own tool, let's say for C* backup automation
> using EBS snapshots, and upload it to GitHub.
>
>
> C* is a DB-engine, not a fully-automated self-contained suite.
> End-users are able to work on automation of routine [3rd party projects],
> meanwhile C* contributors may focus on core functionality.
>
> --
>
> Going back to the documentation topic: as far as I understand, DataStax is
> no longer the main C* contributor and is focused on its own C*-based
> proprietary software [somebody correct me if I'm wrong].
>
> This has led us to a situation where development of C* is progressing (as
> far as I understand, work is done mainly by some large C* users having
> enough resources to contribute to the C* project to get the features they
> need), but there is no single company which has taken over keeping the
> C* documentation / Wiki up to date.
>
> Honestly, even DataStax's documentation is too concise and is missing a
> lot of important details.
>
> [BTW, I've just taken a look at https://cassandra.apache.org/doc/latest/
> and it doesn't look that 'bad': despite the TODOs it contains a lot of
> valuable information]
>
>
> So, I feel the C* Community has to join efforts on enriching the existing
> documentation / resurrecting the Wiki [where howtos and information about
> 3rd party automations and integrations etc. can be placed].
>
> By the Community I mean all of us including myself.
>
>
>
> Regards,
>
> Kyrill
> --
> *From:* Daniel Hölbling-Inzko <daniel.hoelbling-in...@bitmovin.com>
> *Sent:* Tuesday, February 20, 2018 11:28:13 AM
> *To:* user@cassandra.apache.org; James Briggs
>
> *Cc:* d...@cassandra.apache.org
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> Hi,
>
> I have to add my own two cents here as the main thing that keeps me from
> really running Cassandra is the amount of pain running it incurs.
> Not so much because it's actually painful but because the tools are so
> different and the documentation and best practices are scattered across a
> dozen outdated DataStax articles and this mailing list etc.. We've been
> hesitant (although our use case is perfect for using Cassandra) to deploy
> Cassandra to any critical systems as even after a year of running it we
> still don't have the operational experience to confidently run critical
> systems with it.
>
> Simple things like a foolproof / safe cluster-wide S3 Backup (like
> Elasticsearch has it) would for example solve a TON of issues for new
> people. I don't need it auto-scheduled or something, but having to
> configure cron jobs across the whole cluster is a pain in the ass for small
> teams.
> To be honest, even the way snapshots are done right now is already super
> painful. Every other system I operated so far will just create one backup
> folder I can export, in C* the Backup is scattered across a bunch of
> different Keyspace folders etc.. needless to say

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Kyrylo Lebediev
Agree with you, Daniel, regarding gaps in documentation.


---

At the same time I disagree with the folks who are complaining in this thread 
that some functionality, like 'advanced backup', is missing out of the box.

We all live in a time when literally tons of open-source tools (automation, 
monitoring) and languages are available, and there are also some really 
powerful SaaS solutions on the market which support C* (Datadog, for instance).


For example, while C* provides basic building blocks for anti-entropy repairs 
[I mean basic usage of 'nodetool repair' is not suitable for large production 
clusters], Reaper (many thanks to Spotify and TheLastPickle!) which uses this 
basic functionality solves the  task very well for real-world C* setups.


If something is missing or could be improved in your opinion: we're in the era 
of open source. Create your own tool, let's say for C* backup automation using 
EBS snapshots, and upload it to GitHub.


C* is a DB-engine, not a fully-automated self-contained suite.
End-users are able to work on automation of routine [3rd party projects], 
meanwhile C* contributors may focus on core functionality.

--

Going back to the documentation topic: as far as I understand, DataStax is no 
longer the main C* contributor and is focused on its own C*-based proprietary 
software [somebody correct me if I'm wrong].

This has led us to a situation where development of C* is progressing (as far 
as I understand, work is done mainly by some large C* users having enough 
resources to contribute to the C* project to get the features they need), but 
there is no single company which has taken over keeping the C* 
documentation / Wiki up to date.

Honestly, even DataStax's documentation is too concise and is missing a lot 
of important details.

[BTW, I've just taken a look at https://cassandra.apache.org/doc/latest/ and it 
doesn't look that 'bad': despite the TODOs it contains a lot of valuable 
information]


So, I feel the C* Community has to join efforts on enriching the existing 
documentation / resurrecting the Wiki [where howtos and information about 
3rd party automations and integrations etc. can be placed].

By the Community I mean all of us including myself.



Regards,

Kyrill


From: Daniel Hölbling-Inzko <daniel.hoelbling-in...@bitmovin.com>
Sent: Tuesday, February 20, 2018 11:28:13 AM
To: user@cassandra.apache.org; James Briggs
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Hi,

I have to add my own two cents here as the main thing that keeps me from really 
running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so 
different and the documentation and best practices are scattered across a dozen 
outdated DataStax articles and this mailing list etc.. We've been hesitant 
(although our use case is perfect for using Cassandra) to deploy Cassandra to 
any critical systems as even after a year of running it we still don't have the 
operational experience to confidently run critical systems with it.

Simple things like a foolproof / safe cluster-wide S3 Backup (like 
Elasticsearch has it) would for example solve a TON of issues for new people. I 
don't need it auto-scheduled or something, but having to configure cron jobs 
across the whole cluster is a pain in the ass for small teams.
To be honest, even the way snapshots are done right now is already super 
painful. Every other system I have operated so far will just create one backup 
folder I can export; in C* the backup is scattered across a bunch of different 
keyspace folders etc. Needless to say, it took a while until I trusted my 
backup scripts fully.

And especially for a database, I believe backup/restore needs to be a non-issue 
that's documented front and center. If not, smaller teams just don't have the 
resources to dedicate to learning and building the tools around it.

Now that the team is getting larger we could spare the resources to operate 
these things, but switching from a well-understood RDBMS schema to Cassandra is 
now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid> 
wrote:
Kenneth:

What you said is not wrong.

Vertica and Riak are examples of distributed databases that don't require 
hand-holding.

Cassandra is for Java-programmer DIYers, or more often Datastax clients, at 
this point.
Thanks, James.


From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Cc: d...@cassandra.apache.org<mailto:d...@cassandra.apache.org>
Sent: Monday, February 19, 2018 4:56 PM

Subject: RE: Cassandra Needs to Grow Up by Version Five!

Jeff, you helped me figure out what I was missing.  It just took me a day to 
digest what you wrote.  I’m coming over from another type of enginee

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Daniel Hölbling-Inzko
Hi,

I have to add my own two cents here as the main thing that keeps me from
really running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so
different and the documentation and best practices are scattered across a
dozen outdated DataStax articles and this mailing list etc.. We've been
hesitant (although our use case is perfect for using Cassandra) to deploy
Cassandra to any critical systems as even after a year of running it we
still don't have the operational experience to confidently run critical
systems with it.

Simple things like a foolproof / safe cluster-wide S3 Backup (like
Elasticsearch has it) would for example solve a TON of issues for new
people. I don't need it auto-scheduled or something, but having to
configure cron jobs across the whole cluster is a pain in the ass for small
teams.
To be honest, even the way snapshots are done right now is already super
painful. Every other system I have operated so far will just create one
backup folder I can export; in C* the backup is scattered across a bunch of
different keyspace folders etc. Needless to say, it took a while until
I trusted my backup scripts fully.
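
To illustrate the "scattered across keyspace folders" point, here is a minimal
Python sketch that collects every directory belonging to one snapshot tag on a
single node, so they can be exported together. It assumes the default on-disk
layout <data_dir>/<keyspace>/<table>/snapshots/<tag>; the function name, the
example path and the tag are hypothetical placeholders, not part of any
Cassandra tooling.

```python
from pathlib import Path


def find_snapshot_dirs(data_dir, tag):
    """List every <keyspace>/<table>/snapshots/<tag> directory under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # One snapshot tag produces one directory per table, spread over all keyspaces.
    return sorted(p for p in root.glob(f"*/*/snapshots/{tag}") if p.is_dir())


if __name__ == "__main__":
    # Hypothetical data directory and tag; adjust both for a real node.
    for path in find_snapshot_dirs("/var/lib/cassandra/data", "nightly"):
        print(path)
```

This covers one node only; a cluster-wide backup would still have to run it on
every node and ship the results somewhere, which is exactly the part that
hurts for small teams.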

And especially for a database, I believe backup/restore needs to be a
non-issue that's documented front and center. If not, smaller teams just
don't have the resources to dedicate to learning and building the tools
around it.

Now that the team is getting larger we could spare the resources to operate
these things, but switching from a well-understood RDBMS schema to
Cassandra is now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid>
wrote:

> Kenneth:
>
> What you said is not wrong.
>
> Vertica and Riak are examples of distributed databases that don't require
> hand-holding.
>
> Cassandra is for Java-programmer DIYers, or more often Datastax clients,
> at this point.
> Thanks, James.
>
> --
> *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> *To:* user@cassandra.apache.org
> *Cc:* d...@cassandra.apache.org
> *Sent:* Monday, February 19, 2018 4:56 PM
>
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Jeff, you helped me figure out what I was missing.  It just took me a day
> to digest what you wrote.  I’m coming over from another type of
> engineering.  I didn’t know and it’s not really documented.  Cassandra runs
> in a data center.  Nowadays that means the nodes are going to be in managed
> containers, Docker containers, managed by Kubernetes,  Mesos or something,
> and for that reason anyone operating Cassandra in a real world setting
> would not encounter the issues I raised in the way I described.
>
> Shouldn’t the architectural diagrams people reference indicate that in
> some way?  That would have helped me.
>
> Kenneth Brotman
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Monday, February 19, 2018 10:43 AM
> *To:* 'user@cassandra.apache.org'
> *Cc:* 'd...@cassandra.apache.org'
> *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>
> Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re
> a good guy!
>
> Kenneth Brotman
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com <jji...@gmail.com>]
> *Sent:* Monday, February 19, 2018 9:10 AM
> *To:* cassandra
> *Cc:* Cassandra DEV
> *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>
> There's a lot of things below I disagree with, but it's ok. I convinced
> myself not to nit-pick every point.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of
> Stefan's work with cert management
>
> Beyond that, I encourage you to do what Michael suggested: open JIRAs for
> things you care strongly about, work on them if you have time. Sometime
> this year we'll schedule a NGCC (Next Generation Cassandra Conference)
> where we talk about future project work and direction, I encourage you to
> attend if you're able (I encourage anyone who cares about the direction of
> Cassandra to attend, it'll probably be either free or very low cost, just to
> cover a venue and some food). If nothing else, you'll meet some of the
> teams who are working on the project, and learn why they've selected the
> projects on which they're working. You'll have an opportunity to pitch your
> vision, and maybe you can talk some folks into helping out.
>
> - Jeff
>
>
>
>
> On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
> Comments inline
>
> >-----Original Message-----
> >From: Jeff Jirsa [mailto:jji...@gmail.com]
> >Sent: Sunday, February 18, 2018 10:58 PM
> >To: user@cassandra.apache.org

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread James Briggs
Kenneth:
What you said is not wrong.

Vertica and Riak are examples of distributed databases that don't require 
hand-holding.

Cassandra is for Java-programmer DIYers, or more often Datastax clients, at 
this point.
Thanks, James.

  From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
 To: user@cassandra.apache.org 
Cc: d...@cassandra.apache.org
 Sent: Monday, February 19, 2018 4:56 PM
 Subject: RE: Cassandra Needs to Grow Up by Version Five!
   
Jeff, you helped me figure out what I was missing.  It just took me a day to 
digest what you wrote.  I’m coming over from another type of engineering.  I 
didn’t know and it’s not really documented.  Cassandra runs in a data center.  
Nowadays that means the nodes are going to be in managed containers, Docker 
containers, managed by Kubernetes,  Mesos or something, and for that reason 
anyone operating Cassandra in a real world setting would not encounter the 
issues I raised in the way I described.

Shouldn’t the architectural diagrams people reference indicate that in some 
way?  That would have helped me.

Kenneth Brotman

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, February 19, 2018 10:43 AM
To: 'user@cassandra.apache.org'
Cc: 'd...@cassandra.apache.org'
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re a 
good guy!

Kenneth Brotman

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Monday, February 19, 2018 9:10 AM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

There's a lot of things below I disagree with, but it's ok. I convinced myself 
not to nit-pick every point.

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's work 
with cert management

Beyond that, I encourage you to do what Michael suggested: open JIRAs for 
things you care strongly about, work on them if you have time. Sometime this 
year we'll schedule a NGCC (Next Generation Cassandra Conference) where we talk 
about future project work and direction, I encourage you to attend if you're 
able (I encourage anyone who cares about the direction of Cassandra to attend, 
it'll probably be either free or very low cost, just to cover a venue and some 
food). If nothing else, you'll meet some of the teams who are working on the 
project, and learn why they've selected the projects on which they're working. 
You'll have an opportunity to pitch your vision, and maybe you can talk some 
folks into helping out.

- Jeff

On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> 
wrote:

Comments inline

>-----Original Message-----
>From: Jeff Jirsa [mailto:jji...@gmail.com]
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
>> wrote:
>>
> >Cassandra feels like an unfinished program to me. The problem is not that 
> >it’s open source or cutting edge.  It’s an open source cutting edge program 
> >that lacks some of its basic functionality.  We are all stuck addressing 
> >fundamental mechanical tasks for Cassandra because the basic code that would 
> >do that part has not been contributed yet.
>>
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to recon

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Kenneth Brotman
Jeff, you helped me figure out what I was missing.  It just took me a day to 
digest what you wrote.  I’m coming over from another type of engineering.  I 
didn’t know and it’s not really documented.  Cassandra runs in a data center.  
Nowadays that means the nodes are going to be in managed containers, Docker 
containers, managed by Kubernetes,  Mesos or something, and for that reason 
anyone operating Cassandra in a real world setting would not encounter the 
issues I raised in the way I described.

 

Shouldn’t the architectural diagrams people reference indicate that in some 
way?  That would have helped me.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, February 19, 2018 10:43 AM
To: 'user@cassandra.apache.org'
Cc: 'd...@cassandra.apache.org'
Subject: RE: Cassandra Needs to Grow Up by Version Five!

 

Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re a 
good guy!

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Monday, February 19, 2018 9:10 AM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

 

There's a lot of things below I disagree with, but it's ok. I convinced myself 
not to nit-pick every point.

 

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's work 
with cert management

 

Beyond that, I encourage you to do what Michael suggested: open JIRAs for 
things you care strongly about, work on them if you have time. Sometime this 
year we'll schedule a NGCC (Next Generation Cassandra Conference) where we talk 
about future project work and direction, I encourage you to attend if you're 
able (I encourage anyone who cares about the direction of Cassandra to attend, 
it'll probably be either free or very low cost, just to cover a venue and some 
food). If nothing else, you'll meet some of the teams who are working on the 
project, and learn why they've selected the projects on which they're working. 
You'll have an opportunity to pitch your vision, and maybe you can talk some 
folks into helping out. 

 

- Jeff

 

 

 

 

On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> 
wrote:

Comments inline

>-----Original Message-----
>From: Jeff Jirsa [mailto:jji...@gmail.com]
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
>> wrote:
>>
> >Cassandra feels like an unfinished program to me. The problem is not that 
> >it’s open source or cutting edge.  It’s an open source cutting edge program 
> >that lacks some of its basic functionality.  We are all stuck addressing 
> >fundamental mechanical tasks for Cassandra because the basic code that would 
> >do that part has not been contributed yet.
>>
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to reconsider some of 
>the scope.
>
>Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
>know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
>the database have its opinions and let third party tools fill in the gaps.
>

I can appreciate the desire to stay in scope.  I believe usability is king.  
When users have to learn the database, then learn what they have to automate, 
then learn an automation tool and then use the automation tool to do something 
that is as fundamental as the fundamental tasks I described, then something is 
missing from the database itself that is adversely affecting usability - and 
that is very bad.  Where those big companies need to calculate the ROI is in 
the cost of acquiring or training the next group of users.  Consider how steep 
the learning curve is for new users.  Consider the business case for improving 
ease of use.

>2) Cassandra is, by definition, a database for large scale problems. Most of 
>the companies working on/with it tend to be big companies. Big companies often 
>have pre-existing automation that solved the stuff you consider fundamental 
>tasks, so there’s probably nobody actively working on the solved problems that 
>you may consider missing features - for many people they’re already solved.
>

I could be wrong but it sounds like a lot of the code work is done, and if the 
companies would take the time to contribute more code, then the rest of the 
code needed could be generated easily.

>3) It’s not nearly as basic as you think it 

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Kenneth Brotman
Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re a 
good guy!

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Monday, February 19, 2018 9:10 AM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

 

There's a lot of things below I disagree with, but it's ok. I convinced myself 
not to nit-pick every point.

 

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's work 
with cert management

 

Beyond that, I encourage you to do what Michael suggested: open JIRAs for 
things you care strongly about, work on them if you have time. Sometime this 
year we'll schedule a NGCC (Next Generation Cassandra Conference) where we talk 
about future project work and direction, I encourage you to attend if you're 
able (I encourage anyone who cares about the direction of Cassandra to attend, 
it'll probably be either free or very low cost, just to cover a venue and some 
food). If nothing else, you'll meet some of the teams who are working on the 
project, and learn why they've selected the projects on which they're working. 
You'll have an opportunity to pitch your vision, and maybe you can talk some 
folks into helping out. 

 

- Jeff

 

 

 

 

On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> 
wrote:

Comments inline

>-----Original Message-----
>From: Jeff Jirsa [mailto:jji...@gmail.com]
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
>> wrote:
>>
> >Cassandra feels like an unfinished program to me. The problem is not that 
> >it’s open source or cutting edge.  It’s an open source cutting edge program 
> >that lacks some of its basic functionality.  We are all stuck addressing 
> >fundamental mechanical tasks for Cassandra because the basic code that would 
> >do that part has not been contributed yet.
>>
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to reconsider some of 
>the scope.
>
>Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
>know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
>the database have its opinions and let third party tools fill in the gaps.
>

I can appreciate the desire to stay in scope.  I believe usability is the King. 
 When users have to learn the database, then learn what they have to automate, 
then learn an automation tool and then use the automation tool to do something 
that is as fundamental as the fundamental tasks I described, then something is 
missing from the database itself that is adversely affecting usability - and 
that is very bad.  Where those big companies need to calculate the ROI is in 
the cost of acquiring or training the next group of users.  Consider how steep 
the learning curve is for new users.  Consider the business case for improving 
ease of use.

>2) Cassandra is, by definition, a database for large scale problems. Most of 
>the companies working on/with it tend to be big companies. Big companies often 
>have pre-existing automation that solved the stuff you consider fundamental 
>tasks, so there’s probably nobody actively working on the solved problems that 
>you may consider missing features - for many people they’re already solved.
>

I could be wrong but it sounds like a lot of the code work is done, and if the 
companies would take the time to contribute more code, then the rest of the 
code needed could be generated easily.

>3) It’s not nearly as basic as you think it is. Datastax seemingly had a 
>multi-person team on opscenter, and while it was better than anything else 
>around last time I used it (before it stopped supporting the OSS version), it 
>left a lot to be desired. It’s probably 2-3 engineers working for a month  to 
>have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and 
>I can think of about 10 JIRAs I’d rather see that time be spent on first.

How about 6-9 engineers working 12 months a year on it then.  I'm not kidding.  
For a big company with revenues in the tens of billions or more, and a heavy 
use of Cassandra nodes, it's easy to make a case for having a full time person 
or more that involved.  They aren't paying for using the open source code that 
is Cassandra.  Let's see what would the licensing fees be for a big company if
the costs were like

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Jeff Jirsa
There's a lot of things below I disagree with, but it's ok. I convinced
myself not to nit-pick every point.

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's
work with cert management

Beyond that, I encourage you to do what Michael suggested: open JIRAs for
things you care strongly about, work on them if you have time. Sometime
this year we'll schedule a NGCC (Next Generation Cassandra Conference)
where we talk about future project work and direction, I encourage you to
attend if you're able (I encourage anyone who cares about the direction of
Cassandra to attend, it'll probably be either free or very low cost, just to
cover a venue and some food). If nothing else, you'll meet some of the
teams who are working on the project, and learn why they've selected the
projects on which they're working. You'll have an opportunity to pitch your
vision, and maybe you can talk some folks into helping out.

- Jeff




On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Comments inline
>
> >-Original Message-
> >From: Jeff Jirsa [mailto:jji...@gmail.com]
> >Sent: Sunday, February 18, 2018 10:58 PM
> >To: user@cassandra.apache.org
> >Cc: d...@cassandra.apache.org
> >Subject: Re: Cassandra Needs to Grow Up by Version Five!
> >
> >Comments inline
> >
> >
> >> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman
> <kenbrot...@yahoo.com.INVALID> wrote:
> >>
> > >Cassandra feels like an unfinished program to me. The problem is not
> that it’s open source or cutting edge.  It’s an open source cutting edge
> program that lacks some of its basic functionality.  We are all stuck
> addressing fundamental mechanical tasks for Cassandra because the basic
> code that would do that part has not been contributed yet.
> >>
> >There’s probably 2-3 reasons why here:
> >
> >1) Historically the pmc has tried to keep the scope of the project very
> narrow. It’s a database. We don’t ship drivers. We don’t ship developer
> tools. We don’t ship fancy UIs. We ship a database. I think for the most
> part the narrow vision has been for the best, but maybe it’s time to
> reconsider some of the scope.
> >
> >Postgres will autovacuum to prevent wraparound (hopefully),  but everyone
> I know running Postgres uses flexible-freeze in cron - sometimes it’s ok to
> let the database have its opinions and let third party tools fill in the
> gaps.
> >
>
> I can appreciate the desire to stay in scope.  I believe usability is the
> King.  When users have to learn the database, then learn what they have to
> automate, then learn an automation tool and then use the automation tool to
> do something that is as fundamental as the fundamental tasks I described,
> then something is missing from the database itself that is adversely
> affecting usability - and that is very bad.  Where those big companies need
> to calculate the ROI is in the cost of acquiring or training the next group
> of users.  Consider how steep the learning curve is for new users.
> Consider the business case for improving ease of use.
>
> >2) Cassandra is, by definition, a database for large scale problems. Most
> of the companies working on/with it tend to be big companies. Big companies
> often have pre-existing automation that solved the stuff you consider
> fundamental tasks, so there’s probably nobody actively working on the
> solved problems that you may consider missing features - for many people
> they’re already solved.
> >
>
> I could be wrong but it sounds like a lot of the code work is done, and if
> the companies would take the time to contribute more code, then the rest of
> the code needed could be generated easily.
>
> >3) It’s not nearly as basic as you think it is. Datastax seemingly had a
> multi-person team on opscenter, and while it was better than anything else
> around last time I used it (before it stopped supporting the OSS version),
> it left a lot to be desired. It’s probably 2-3 engineers working for a
> month  to have any sort of meaningful, reliable, mostly trivial
> cluster-managing UI, and I can think of about 10 JIRAs I’d rather see that
> time be spent on first.
>
> How about 6-9 engineers working 12 months a year on it then.  I'm not
> kidding.  For a big company with revenues in the tens of billions or more,
> and a heavy use of Cassandra nodes, it's easy to make a case for having a
> full time person or more that involved.  They aren't paying for using the
> open source code that is Cassandra.  Let's see what would the licensing
> fees be for a big company if the costs were like Microsoft or Oracle would
> charge for their enterprise level relational database?   What's the
> co

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Kenneth Brotman
Comments inline

>-Original Message-
>From: Jeff Jirsa [mailto:jji...@gmail.com] 
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline 
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
>> wrote:
>>
> >Cassandra feels like an unfinished program to me. The problem is not that 
> >it’s open source or cutting edge.  It’s an open source cutting edge program 
> >that lacks some of its basic functionality.  We are all stuck addressing 
> >fundamental mechanical tasks for Cassandra because the basic code that would 
> >do that part has not been contributed yet.
>> 
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to reconsider some of 
>the scope. 
>
>Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
>know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
>the database have its opinions and let third party tools fill in the gaps.
>

I can appreciate the desire to stay in scope.  I believe usability is the King. 
 When users have to learn the database, then learn what they have to automate, 
then learn an automation tool and then use the automation tool to do something 
that is as fundamental as the fundamental tasks I described, then something is 
missing from the database itself that is adversely affecting usability - and 
that is very bad.  Where those big companies need to calculate the ROI is in 
the cost of acquiring or training the next group of users.  Consider how steep 
the learning curve is for new users.  Consider the business case for improving 
ease of use. 

>2) Cassandra is, by definition, a database for large scale problems. Most of 
>the companies working on/with it tend to be big companies. Big companies often 
>have pre-existing automation that solved the stuff you consider fundamental 
>tasks, so there’s probably nobody actively working on the solved problems that 
>you may consider missing features - for many people they’re already solved.
>

I could be wrong but it sounds like a lot of the code work is done, and if the 
companies would take the time to contribute more code, then the rest of the 
code needed could be generated easily.

>3) It’s not nearly as basic as you think it is. Datastax seemingly had a 
>multi-person team on opscenter, and while it was better than anything else 
>around last time I used it (before it stopped supporting the OSS version), it 
>left a lot to be desired. It’s probably 2-3 engineers working for a month  to 
>have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and 
>I can think of about 10 JIRAs I’d rather see that time be spent on first. 

How about 6-9 engineers working 12 months a year on it then.  I'm not kidding.  
For a big company with revenues in the tens of billions or more, and a heavy 
use of Cassandra nodes, it's easy to make a case for having a full time person 
or more that involved.  They aren't paying for using the open source code that 
is Cassandra.  Let's see what would the licensing fees be for a big company if
the costs were like Microsoft or Oracle would charge for their enterprise
level relational database?   What's the contribution of one or two people in
comparison?

>> Ease of use issues need to be given much more attention.  For an 
>> administrator, the ease of use of Cassandra is very poor. 
>>
>>Furthermore, currently Cassandra is an idiot.  We have to do everything for 
>>Cassandra. Contrast that with the fact that we are in the dawn of artificial 
>>intelligence.
>> 
>
>And for everything you think is obvious, there’s a 50% chance someone else
>will have already solved it differently, and your obvious new solution will be
>seen as an inconvenient assumption and complexity they won’t appreciate. Open 
>source projects get to walk a fine line of trying to be useful without making 
>too many assumptions, being “too” opinionated, or overstepping bounds. We may 
>be too conservative, but it’s very easy to go too far in the opposite 
>direction. 
>

I appreciate that but when such concerns result in inaction instead of 
resolution that is no good.

>> Software exists to automate tasks for humans, not mechanize humans to 
>> administer tasks for a database.  I’m an engineering type.  My job is to 
>> apply science and technology to solve real world problems.  And th

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Jeff Jirsa
Comments inline 


> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
> wrote:
> 
> Cassandra feels like an unfinished program to me. The problem is not that 
> it’s open source or cutting edge.  It’s an open source cutting edge program 
> that lacks some of its basic functionality.  We are all stuck addressing 
> fundamental mechanical tasks for Cassandra because the basic code that would 
> do that part has not been contributed yet.
> 
There’s probably 2-3 reasons why here:

1) Historically the pmc has tried to keep the scope of the project very narrow. 
It’s a database. We don’t ship drivers. We don’t ship developer tools. We don’t 
ship fancy UIs. We ship a database. I think for the most part the narrow vision 
has been for the best, but maybe it’s time to reconsider some of the scope. 

Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
the database have its opinions and let third party tools fill in the gaps.

2) Cassandra is, by definition, a database for large scale problems. Most of 
the companies working on/with it tend to be big companies. Big companies often 
have pre-existing automation that solved the stuff you consider fundamental 
tasks, so there’s probably nobody actively working on the solved problems that 
you may consider missing features - for many people they’re already solved.

3) It’s not nearly as basic as you think it is. Datastax seemingly had a 
multi-person team on opscenter, and while it was better than anything else 
around last time I used it (before it stopped supporting the OSS version), it 
left a lot to be desired. It’s probably 2-3 engineers working for a month  to 
have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and 
I can think of about 10 JIRAs I’d rather see that time be spent on first. 

> Ease of use issues need to be given much more attention.  For an 
> administrator, the ease of use of Cassandra is very poor. 
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for 
> Cassandra. Contrast that with the fact that we are in the dawn of artificial 
> intelligence.
> 

And for everything you think is obvious, there’s a 50% chance someone else will
have already solved it differently, and your obvious new solution will be seen as
an inconvenient assumption and complexity they won’t appreciate. Open source 
projects get to walk a fine line of trying to be useful without making too many 
assumptions, being “too” opinionated, or overstepping bounds. We may be too 
conservative, but it’s very easy to go too far in the opposite direction. 

> Software exists to automate tasks for humans, not mechanize humans to 
> administer tasks for a database.  I’m an engineering type.  My job is to 
> apply science and technology to solve real world problems.  And that’s where 
> I need an organization’s I.T. talent to focus; not in crank starting an 
> unfinished database.
> 

And that’s why nobody’s done it - we all have bigger problems we’re being paid 
to solve, and nobody’s felt it necessary. Because it’s not necessary, it’s 
nice, but not required.

> For example, I should be able to go to any node, replace the cassandra.yaml
> file and have a prompt on the display ask me if I want to update all the yaml 
> files across the cluster.  I shouldn’t have to manually modify yaml files on 
> each node or have to create a script for some third party automation tool to 
> do it. 
> 
I don’t see this ever happening.  Your config management already pushes files 
around your infrastructure, Cassandra doesn’t need to do it. 

> I should not have to turn off service, clear directories, restart service in 
> coordination with the other nodes.  It’s already a computer system.  It can 
> do those things on its own.
> 

The only time you should be doing this is when you’re wiping nodes from failed 
bootstrap, and that stopped being required in 2.2.

> How about read repair.  First there is something wrong with the name.  Maybe 
> it should be called Consistency Repair.  An administrator shouldn’t have to 
> do anything.  It should be a behavior of Cassandra that is programmed in. It 
> should consider the GC setting of each node, calculate how often it has to 
> run repair, when it should run it so all the nodes aren’t trying at the same 
> time and when other circumstances indicate it should also run it.
> 
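For what it's worth, the scheduling arithmetic described in the quoted
paragraph can be sketched in a few lines. This is only an illustration, not
anything Cassandra or Reaper ships. It assumes "GC setting" means the table
option gc_grace_seconds (default 864000 seconds, i.e. 10 days); the function
name stagger_repairs, the node names and the safety_factor parameter are made
up for the example. It only computes start times; actually running repair and
agreeing on who owns the schedule is left to an external scheduler.

```python
from datetime import datetime, timedelta


def stagger_repairs(nodes, gc_grace_seconds, start, safety_factor=0.5):
    """Spread one repair per node evenly over a repair cycle.

    The cycle is gc_grace_seconds * safety_factor, so every node is repaired
    well inside the gc_grace window. safety_factor is an assumption of this
    sketch, not a Cassandra setting.
    """
    if not nodes:
        raise ValueError("need at least one node")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    cycle = timedelta(seconds=gc_grace_seconds * safety_factor)
    slot = cycle / len(nodes)  # gap between consecutive repair starts
    schedule = {node: start + i * slot for i, node in enumerate(nodes)}
    return schedule, cycle


if __name__ == "__main__":
    plan, cycle = stagger_repairs(
        ["node1", "node2", "node3", "node4"],  # hypothetical node names
        gc_grace_seconds=864000,               # Cassandra's default, 10 days
        start=datetime(2018, 2, 19),
    )
    for node, when in plan.items():
        print(node, when.isoformat())
    print("repeat every", cycle)
```

With the default gc_grace_seconds and four nodes, this gives each node its own
30-hour slot and repeats the whole cycle every 5 days, so no two nodes start
repair at the same time.
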
There’s a good argument to be made that something like Reaper should be shipped 
with Cassandra. There’s another good argument that most tools like this end up 
needing some sort of leader election for scheduling and that goes against a lot 
of the fundamental assumptions in Cassandra (all nodes are equal, etc) - 
solving that problem is probably at least part of why you haven’t seen them 
built into the db. “Leader election is easy” you’ll say, and I’ll laugh and 
tell you about users I know who have DCs go