Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-08 Thread Blake Eggleston
+1

> On Feb 6, 2023, at 8:15 AM, Sam Tunnicliffe  wrote:
> 
> Hi everyone,
> 
> I would like to start a vote on this CEP.
> 
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> 
> Discussion:
> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
> 
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding vetoes.
> 
> Thanks,
> Sam



Re: [VOTE] CEP-34: mTLS based client and internode authenticators

2023-07-21 Thread Blake Eggleston
+1

> On Jul 21, 2023, at 9:57 AM, Jyothsna Konisa  wrote:
> 
> Hi Everyone!
> 
> I would like to start a vote thread for CEP-34.
> 
> Proposal: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34%3A+mTLS+based+client+and+internode+authenticators
> JIRA   : 
> https://issues.apache.org/jira/browse/CASSANDRA-18554
> Draft Implementation : https://github.com/apache/cassandra/pull/2372
> Discussion : 
> https://lists.apache.org/thread/pnfg65r76rbbs70hwhsz94ds6yo2042f
> 
> The vote will be open for 72 hours. A vote passes if there are at least 3 
> binding +1s and no binding vetoes.
> 
> Thanks,
> Jyothsna Konisa.



Re: [VOTE] Release dtest-api 0.0.16

2023-08-19 Thread Blake Eggleston
+1

> On Aug 17, 2023, at 12:37 AM, Alex Petrov  wrote:
> +1
> 
> On Thu, Aug 17, 2023, at 4:46 AM, Brandon Williams wrote:
> +1
> 
> Kind Regards,
> Brandon
> 
> On Wed, Aug 16, 2023 at 4:34 PM Dinesh Joshi  wrote:
> 
> Proposing the test build of in-jvm dtest API 0.0.16 for release.
> 
> Repository:
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
> 
> Candidate SHA:
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/1ba6ef93d0721741b5f6d6d72cba3da03fe78438
> tagged with 0.0.16
> 
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1307/org/apache/cassandra/dtest-api/0.0.16/
> 
> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
> 
> Changes since last release:
> 
> * CASSANDRA-18727 - JMXUtil.getJmxConnector should retry connection attempts
> 
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.

Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-27 Thread Blake Eggleston
Looking at the ticket, I’d say Jon’s concern is legitimate. The segfaults Jon 
is seeing are probably caused by paxos V2 when combined with off-heap memtables, 
for the reason Benedict suggests in the JIRA. This problem will continue to 
exist in 5.0. Unfortunately, it looks like the patch posted is not enough to 
address the issue; a proper fix will need to be a bit more involved.

While this is not a regression, I think Jon’s point about trie memtables 
increasing usage of off-heap memtables is a good one, and anyway we shouldn’t 
be doing major releases with known process-crashing bugs.

So I’m voting -1 on this release and will work with Jon and Benedict to get 
this fixed.

Thanks,

Blake


> On Jun 26, 2024, at 6:47 AM, Josh McKenzie  wrote:
> 
> Blake or Benedict - can either of you speak to Jon's concerns around 
> CASSANDRA-19668?
> 
> On Wed, Jun 26, 2024, at 12:18 AM, Jeff Jirsa wrote:
>> 
>> +1
>> 
>> 
>> 
>>> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
>>> 
>>> 
>>> Proposing the test build of Cassandra 5.0-rc1 for release.
>>> 
>>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>>> Maven Artifacts: 
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>> 
>>> The Source and Build Artifacts, and the Debian and RPM packages and 
>>> repositories, are available here: 
>>> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>>> tested the build is invited to vote. Votes by PMC members are considered 
>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>> 
>>> [1]: CHANGES.txt: 
>>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
>>> [2]: NEWS.txt: 
>>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt



Re: In need of reviewers

2018-05-11 Thread Blake Eggleston
I'll spend a day or two working through some of these next week.

On 5/11/18, 3:44 AM, "kurt greaves"  wrote:

We've got a bunch of tickets that are either in need of review or just a
bit of feedback. Would be very grateful for any help here :).

Bugs:
https://issues.apache.org/jira/browse/CASSANDRA-14365
https://issues.apache.org/jira/browse/CASSANDRA-14204
https://issues.apache.org/jira/browse/CASSANDRA-14162
https://issues.apache.org/jira/browse/CASSANDRA-14126
https://issues.apache.org/jira/browse/CASSANDRA-14365
https://issues.apache.org/jira/browse/CASSANDRA-14099
https://issues.apache.org/jira/browse/CASSANDRA-14073
https://issues.apache.org/jira/browse/CASSANDRA-14063
https://issues.apache.org/jira/browse/CASSANDRA-14056
https://issues.apache.org/jira/browse/CASSANDRA-14054
https://issues.apache.org/jira/browse/CASSANDRA-14013
https://issues.apache.org/jira/browse/CASSANDRA-13841
https://issues.apache.org/jira/browse/CASSANDRA-13698

Improvements:
https://issues.apache.org/jira/browse/CASSANDRA-14309
https://issues.apache.org/jira/browse/CASSANDRA-10789
https://issues.apache.org/jira/browse/CASSANDRA-14443
https://issues.apache.org/jira/browse/CASSANDRA-13010
https://issues.apache.org/jira/browse/CASSANDRA-11559
https://issues.apache.org/jira/browse/CASSANDRA-10789
https://issues.apache.org/jira/browse/CASSANDRA-10023
https://issues.apache.org/jira/browse/CASSANDRA-8460

Cheers,
Kurt







Re: Testing 4.0 Post-Freeze

2018-07-10 Thread Blake Eggleston
+1 from me as well. Let's try it out

On 7/10/18, 11:23 AM, "Sam Tunnicliffe"  wrote:

+1 here too

On Tue, 10 Jul 2018 at 18:52, Marcus Eriksson  wrote:

> +1 here as well
>
> On Tue, Jul 10, 2018 at 7:06 PM Aleksey Yeshchenko 
> wrote:
>
> > +1 from me too.
> >
> > —
> > AY
> >
> > On 10 July 2018 at 04:17:26, Mick Semb Wever (m...@apache.org) wrote:
> >
> >
> > > We have done all this for previous releases and we know it has not
> > worked
> > > well. So how giving it one more try is going to help here. Can someone
> > > outline what will change for 4.0 which will make it more successful?
> >
> >
> > I (again) agree with you Sankalp :-)
> >
> > Why not try something new?
> > It's easier to discuss these things more genuinely after trying it out.
> >
> > One of the differences in the branching approaches: to feature-freeze on
> a
> > 4.0 branch or on trunk; is who it is that has to then merge and work 
with
> > multiple branches.
> >
> > Where that small but additional effort is placed I think becomes a 
signal
> > to what the community values most: new features or stability.
> >
> > I think most folk would vote for stability, so why not give this 
approach
> > a go and to learn from it.
> > It also creates an incentive to make the feature-freeze period as short
> as
> > possible, moving us towards an eventual goal of not needing to
> > feature-freeze at all.
> >
> > regards,
> > Mick
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>







Re: [VOTE] Release Apache Cassandra 2.2.13

2018-07-27 Thread Blake Eggleston

+1

On July 26, 2018 at 9:26:48 PM, Marcus Eriksson (krum...@gmail.com) wrote:

+1 

On Fri, Jul 27, 2018 at 12:05 AM kurt greaves  wrote: 

> +1 nb 
> 
> On Fri., 27 Jul. 2018, 00:20 Sam Tunnicliffe,  wrote: 
> 
> > +1 
> > 
> > On 25 July 2018 at 08:17, Michael Shuler  wrote: 
> > 
> > > I propose the following artifacts for release as 2.2.13. 
> > > 
> > > sha1: 3482370df5672c9337a16a8a52baba53b70a4fe8 
> > > Git: 
> > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > shortlog;h=refs/tags/2.2.13-tentative 
> > > Artifacts: 
> > > https://repository.apache.org/content/repositories/ 
> > > orgapachecassandra-1167/org/apache/cassandra/apache-cassandra/2.2.13/ 
> > > Staging repository: 
> > > https://repository.apache.org/content/repositories/ 
> > > orgapachecassandra-1167/ 
> > > 
> > > The Debian and RPM packages are available here: 
> > > http://people.apache.org/~mshuler 
> > > 
> > > The vote will be open for 72 hours (longer if needed). 
> > > 
> > > [1]: CHANGES.txt: 
> > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.13-tentative 
> > > [2]: NEWS.txt: 
> > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.13-tentative 
> > > 
> > > - 
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org 
> > > 
> > > 
> > 
> 


Re: [VOTE] Release Apache Cassandra 3.11.3 (Take 2)

2018-07-27 Thread Blake Eggleston
+1


On July 26, 2018 at 9:27:27 PM, Marcus Eriksson (krum...@gmail.com) wrote:

+1 

On Fri, Jul 27, 2018 at 4:59 AM Vinay Chella  
wrote: 

> +1 nb. 
> 
> Here are the failed tests (circleci 
> <https://circleci.com/gh/vinaykumarchella/workflows/cassandra/tree/3.11.3_tentative>), 
> if anyone is curious about failed tests. 
> 
> dtests-no-vnodes (5 Failed tests) 
> test_short_read - consistency_test.TestConsistency 
> test_describecluster_more_information_three_datacenters - 
> nodetool_test.TestNodetool 
> test_failure_threshold_deletions - paging_test.TestPagingWithDeletions 
> test_closing_connections - thrift_hsha_test.TestThriftHSHA 
> test_mutation_v5 - write_failures_test.TestWriteFailures 
> 
> dtests-with-vnodes (6 failed tests) 
> test_14330 - consistency_test.TestConsistency 
> test_remote_query - cql_test.TestCQLSlowQuery 
> test_describecluster_more_information_three_datacenters - 
> nodetool_test.TestNodetool 
> test_failure_threshold_deletions - paging_test.TestPagingWithDeletions 
> test_closing_connections - thrift_hsha_test.TestThriftHSHA 
> test_mutation_v5 - write_failures_test.TestWriteFailures 
> 
> Regards, 
> Vinay Chella 
> 
> 
> On Thu, Jul 26, 2018 at 3:06 PM kurt greaves  wrote: 
> 
> > +1 nb 
> > 
> > On Fri., 27 Jul. 2018, 00:20 Sam Tunnicliffe,  wrote: 
> > 
> > > +1 
> > > 
> > > On 25 July 2018 at 08:16, Michael Shuler  
> wrote: 
> > > 
> > > > I propose the following artifacts for release as 3.11.3. 
> > > > 
> > > > sha1: 31d5d870f9f5b56391db46ba6cdf9e0882d8a5c0 
> > > > Git: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > shortlog;h=refs/tags/3.11.3-tentative 
> > > > Artifacts: 
> > > > https://repository.apache.org/content/repositories/ 
> > > > orgapachecassandra-1164/org/apache/cassandra/apache-cassandra/3.11.3/ 
> > > > Staging repository: 
> > > > https://repository.apache.org/content/repositories/ 
> > > > orgapachecassandra-1164/ 
> > > > 
> > > > The Debian and RPM packages are available here: 
> > > > http://people.apache.org/~mshuler 
> > > > 
> > > > The vote will be open for 72 hours (longer if needed). 
> > > > 
> > > > [1]: CHANGES.txt: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.3-tentative 
> > > > [2]: NEWS.txt: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.3-tentative 
> > > > 
> > > > - 
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org 
> > > > 
> > > > 
> > > 
> > 
> 


Re: [VOTE] Release Apache Cassandra 3.0.17 (Take 2)

2018-07-27 Thread Blake Eggleston
+1


On July 26, 2018 at 9:27:11 PM, Marcus Eriksson (krum...@gmail.com) wrote:

+1 

On Fri, Jul 27, 2018 at 5:03 AM Vinay Chella  
wrote: 

> +1 nb. 
> 
> Here are the test results. 
> https://circleci.com/gh/vinaykumarchella/cassandra/tree/3.0.17_tentative 
> 
> Most of the failed tests are related to snapshot_test.TestArchiveCommitlog. 
> 
> Regards, 
> Vinay Chella 
> 
> 
> On Thu, Jul 26, 2018 at 3:05 PM kurt greaves  wrote: 
> 
> > +1 nb 
> > 
> > On Fri., 27 Jul. 2018, 00:20 Sam Tunnicliffe,  wrote: 
> > 
> > > +1 
> > > 
> > > On 25 July 2018 at 08:17, Michael Shuler  
> wrote: 
> > > 
> > > > I propose the following artifacts for release as 3.0.17. 
> > > > 
> > > > sha1: d52c7b8c595cc0d06fc3607bf16e3f595f016bb6 
> > > > Git: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > shortlog;h=refs/tags/3.0.17-tentative 
> > > > Artifacts: 
> > > > https://repository.apache.org/content/repositories/ 
> > > > orgapachecassandra-1165/org/apache/cassandra/apache-cassandra/3.0.17/ 
> > > > Staging repository: 
> > > > https://repository.apache.org/content/repositories/ 
> > > > orgapachecassandra-1165/ 
> > > > 
> > > > The Debian and RPM packages are available here: 
> > > > http://people.apache.org/~mshuler 
> > > > 
> > > > The vote will be open for 72 hours (longer if needed). 
> > > > 
> > > > [1]: CHANGES.txt: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.17-tentative 
> > > > [2]: NEWS.txt: 
> > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a= 
> > > > blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.17-tentative 
> > > > 
> > > > - 
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org 
> > > > 
> > > > 
> > > 
> > 
> 


Re: Proposing an Apache Cassandra Management process

2018-08-17 Thread Blake Eggleston
I'd be more in favor of making it a separate project, basically for all the 
reasons listed below. I'm assuming we'd want a management process to work 
across different versions, which will be more awkward if it's in tree. Even if 
that's not the case, keeping it in a different repo at this point will make 
iteration easier than if it were in tree. I'd imagine (or at least hope) that 
validating the management process for release would be less difficult than for 
the main project, so tying them to the Cassandra release cycle seems unnecessarily 
restrictive.


On August 17, 2018 at 12:07:18 AM, Dinesh Joshi 
(dinesh.jo...@yahoo.com.invalid) wrote:

> On Aug 16, 2018, at 9:27 PM, Sankalp Kohli  wrote: 
> 
> I am bumping this thread because patch has landed for this with repair 
> functionality. 
> 
> I have a following proposal for this which I can put in the JIRA or doc 
> 
> 1. We should see if we can keep this in a separate repo like Dtest. 

This would imply a looser coupling between the two. Keeping things in-tree is 
my preferred approach. It makes testing, dependency management and code sharing 
easier. 

> 2. It should have its own release process. 

This means now there would be two releases that need to be managed and 
coordinated. 

> 3. It should have integration tests for different versions of Cassandra it 
> will support. 

Given the lack of test infrastructure - this will be hard especially if you 
have to qualify a matrix of builds. 

Dinesh 
- 
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
For additional commands, e-mail: dev-h...@cassandra.apache.org 



Re: Side Car New Repo vs not

2018-08-20 Thread Blake Eggleston
If the sidecar is going to be on a different release cadence, or support 
interacting with mixed mode clusters, then it should definitely be in a 
separate repo. I don’t even know how branching and merging would work in a repo 
that supports 2 separate release targets and/or mixed mode compatibility, but 
I’m pretty sure it would be a mess.

As a cluster management tool, mixed mode is probably going to be a goal at some 
point. As a new project, it will benefit from not being tied to the C* release 
cycle (which would probably delay any sidecar release until whenever 4.1 is 
cut).


On August 20, 2018 at 3:22:54 PM, Joseph Lynch (joe.e.ly...@gmail.com) wrote:

I think that the pros of incubating the sidecar in tree as a tool first  
outweigh the alternatives at this point of time. Rough tradeoffs that I see:  

Unique pros of in tree sidecar:  
* Faster iteration speed in general. For example when we need to add a new  
JMX endpoint that the sidecar needs, or change something from JMX to a  
virtual table (e.g. for repair, or monitoring) we can do all changes  
including tests as one commit within the main repository and don't have to  
commit to main repo, sidecar repo, and dtest repo (juggling version  
compatibility along the way).  
* We can in the future more easily move serious background functionality  
like compaction or repair itself (not repair scheduling, actual repairing)  
into the sidecar with a single atomic commit, we don't have to do two phase  
commits where we add some IPC mechanism to allow us to support it in both,  
then turn it on in the sidecar, then turn it off in the server, etc...  
* I think that the verification is much easier (sounds like Jonathan  
disagreed on the other thread, I could certainly be wrong), and we don't  
have to worry about testing matrices to assure that the sidecar works with  
various versions as the version of the sidecar that is released with that  
version of Cassandra is the only one we have to certify works. If people  
want to pull in new versions or maintain backports they can do that at  
their discretion/testing.  
* We can iterate and prove value before committing to a choice. Since it  
will be a separate artifact from the start we can always move the artifact  
to a separate repo later (but moving the other way is harder).  
* Users will get the sidecar "for free" when they install the daemon, they  
don't need to take affirmative action to e.g. be able to restart their  
cluster, run repair, or back their data up; it just comes out of the box  
for free.  

Unique pros of a separate repository sidecar:  
* We can use a more modern build system like gradle instead of ant  
* Merging changes is less "scary" I guess (I feel like if you're not  
touching the daemon this is already true but I could see this being less  
worrisome for some).  
* Releasing a separate artifact is somewhat easier from a separate repo  
(especially if we have gradle which makes e.g. building debs and rpms  
trivial).  
* We could backport to previous versions without getting into arguments  
about bug fixes vs features.  
* Committers could be different from the main repo, which ... may be a  
useful thing  

Non unique pros of a sidecar (could be achieved in the main repo or in a  
separate repo):  
* A separate build artifact .jar/.deb/.rpm that can be installed  
separately. It's slightly easier with a separate repo but certainly not out  
of reach within a single repo (indeed the current patch already creates a  
separate jar, and we could create a separate .deb reasonably easily).  
Personally I think having a separate .deb/.rpm is premature at this point  
(for companies that really want it they can build their own packages using  
the .jars), but I think it really is a distracting issue from where the  
patch should go as we can always choose to remove experimental .jar files  
that the main daemon doesn't touch.  
* A separate process lifecycle. No matter where the sidecar goes, we get  
the benefit of restarting it being less dangerous for availability than  
restarting the main daemon.  

That all being said, these are strong opinions weakly held and I would  
rather get something actually committed so that we can prove value one way  
or the other and am therefore, of course, happy to put sidecar patches  
wherever someone can review and commit it.  

-Joey  

On Mon, Aug 20, 2018 at 1:52 PM sankalp kohli   
wrote:  

> Hi,  
> I am starting a new thread to get consensus on where the side car  
> should be contributed.  
>  
> Please send your responses with pro/cons of each approach or any other  
> approach. Please be clear which approach you will pick while still giving  
> pros/cons of both approaches.  
>  
> Thanks.  
> Sankalp  
>  


Re: Reaper as cassandra-admin

2018-08-28 Thread Blake Eggleston
I haven’t settled on a position yet (will have more time to think about things 
after the 9/1 freeze), but I wanted to point out that the argument that 
something new should be written because an existing project has tech debt, and 
we'll do it the right way this time, is a pretty common software engineering 
mistake. The thing you’re replacing usually needs to have some really serious 
problems to make it worth replacing.

I’m sure reaper will bring tech debt with it, but I doubt it's a hopeless mess. 
It would bring a relatively mature project as well as a community of users and 
developers that the other options won’t. It’s probably a lot less work to 
rework whatever shortcomings reaper has, add new-hotness repair schedulers to 
it, and get people to actually use them than it would be to write something 
from scratch and build community confidence in it and get reaper users to 
switch.

On August 28, 2018 at 1:40:59 PM, Roopa Tangirala 
(rtangir...@netflix.com.invalid) wrote:
I share Dinesh's concern too regarding tech debt with existing codebase.  
Its good we have multiple solutions for repairs which have been always  
painful in Cassandra. It would be great to see the community take the best  
pieces from the available solutions and roll it into the fresh side car  
which will help ease Cassandra's maintenance for lot of folks.  

My main concern with starting with an existing codebase is that it comes  
with tech debt. This is not specific to Reaper but to any codebase that is  
imported as a whole. This means future developers and patches have to work  
within the confines of the decisions that were already made. Practically  
speaking once a codebase is established there is inertia in making  
architectural changes and we're left dealing with technical debt.  



*Regards,*  

*Roopa Tangirala*  

Engineering Manager CDE  

*(408) 438-3156 - mobile*  






On Mon, Aug 27, 2018 at 10:49 PM Dinesh Joshi  
 wrote:  

> > On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:  
> > We're hoping to get some feedback on our side if that's something people  
> > are interested in. We've gone back and forth privately on our own  
> > preferences, hopes, dreams, etc, but I feel like a public discussion  
> would  
> > be healthy at this point. Does anyone share the view of using Reaper as  
> a  
> > starting point? What concerns to people have?  
>  
>  
> I have briefly looked at the Reaper codebase but I am yet to analyze it  
> better to have a real, meaningful opinion.  
>  
> My main concern with starting with an existing codebase is that it comes  
> with tech debt. This is not specific to Reaper but to any codebase that is  
> imported as a whole. This means future developers and patches have to work  
> within the confines of the decisions that were already made. Practically  
> speaking once a codebase is established there is inertia in making  
> architectural changes and we're left dealing with technical debt.  
>  
> As it stands I am not against the idea of using Reaper's features and I  
> would very much like using mature code that has been tested. I would  
> however like to propose piece-mealing it into the codebase. This will give  
> the community a chance to review what is going in and possibly change some  
> of the design decisions upfront. This will also avoid a situation where we  
> have to make many breaking changes in the initial versions due to  
> refactoring.  
>  
> I would also like it if we could compare and contrast the functionality  
> with Priam or any other interesting sidecars that folks may want to call  
> out. In fact it would be great if we could bring in the best functionality  
> from multiple implementations.  
>  
> Dinesh  
> -  
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
> For additional commands, e-mail: dev-h...@cassandra.apache.org  
>  
>  


Re: Reaper as cassandra-admin

2018-08-28 Thread Blake Eggleston
> FTR nobody has called Reaper a "hopeless mess".

I didn't mean they did. I just meant that it's generally a bad idea to do a 
rewrite unless the thing being rewritten is a hopeless mess, which reaper 
probably isn't. I realize this isn't technically a rewrite since we're not 
talking about actually rewriting something that's part of the project, but a 
lot of the same reasoning applies to starting work on a new admin tool vs using 
reaper as a starting point. It's not a strictly technical decision either. The 
community of users and developers already established around reaper is also a 
consideration.
On August 28, 2018 at 3:53:02 PM, dinesh.jo...@yahoo.com.INVALID 
(dinesh.jo...@yahoo.com.invalid) wrote:

On Tuesday, August 28, 2018, 2:52:03 PM PDT, Blake Eggleston 
 wrote:  
> I’m sure reaper will bring tech debt with it, but I doubt it's a hopeless 
> mess.   
FTR nobody has called Reaper a "hopeless mess".  
> It would bring a relatively mature project as well as a community of users> 
> and developers that the other options won’t. It’s probably a lot less work to 
> > rework whatever shortcomings reaper has, add new-hotness repair  

You can bring in parts of a relatively mature project that minimize refactoring 
& changes that need to be made once imported. You can also bring in best parts 
of multiples projects without importing entire codebases.  
Dinesh  


On August 28, 2018 at 1:40:59 PM, Roopa Tangirala 
(rtangir...@netflix.com.invalid) wrote:  
I share Dinesh's concern too regarding tech debt with existing codebase.   
Its good we have multiple solutions for repairs which have been always   
painful in Cassandra. It would be great to see the community take the best   
pieces from the available solutions and roll it into the fresh side car   
which will help ease Cassandra's maintenance for lot of folks.   

My main concern with starting with an existing codebase is that it comes   
with tech debt. This is not specific to Reaper but to any codebase that is   
imported as a whole. This means future developers and patches have to work   
within the confines of the decisions that were already made. Practically   
speaking once a codebase is established there is inertia in making   
architectural changes and we're left dealing with technical debt.   



*Regards,*   

*Roopa Tangirala*   

Engineering Manager CDE   

*(408) 438-3156 - mobile*   






On Mon, Aug 27, 2018 at 10:49 PM Dinesh Joshi   
 wrote:   

> > On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:   
> > We're hoping to get some feedback on our side if that's something people   
> > are interested in. We've gone back and forth privately on our own   
> > preferences, hopes, dreams, etc, but I feel like a public discussion   
> would   
> > be healthy at this point. Does anyone share the view of using Reaper as   
> a   
> > starting point? What concerns to people have?   
>   
>   
> I have briefly looked at the Reaper codebase but I am yet to analyze it   
> better to have a real, meaningful opinion.   
>   
> My main concern with starting with an existing codebase is that it comes   
> with tech debt. This is not specific to Reaper but to any codebase that is   
> imported as a whole. This means future developers and patches have to work   
> within the confines of the decisions that were already made. Practically   
> speaking once a codebase is established there is inertia in making   
> architectural changes and we're left dealing with technical debt.   
>   
> As it stands I am not against the idea of using Reaper's features and I   
> would very much like using mature code that has been tested. I would   
> however like to propose piece-mealing it into the codebase. This will give   
> the community a chance to review what is going in and possibly change some   
> of the design decisions upfront. This will also avoid a situation where we   
> have to make many breaking changes in the initial versions due to   
> refactoring.   
>   
> I would also like it if we could compare and contrast the functionality   
> with Priam or any other interesting sidecars that folks may want to call   
> out. In fact it would be great if we could bring in the best functionality   
> from multiple implementations.   
>   
> Dinesh   
> -   
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org   
> For additional commands, e-mail: dev-h...@cassandra.apache.org   
>   
> 

Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Blake Eggleston
I think we should accept the reaper project as is and make that cassandra 
management process 1.0, then integrate the netflix scheduler (and other new 
features) into that.

The ultimate goal would be for the netflix scheduler to become the default 
repair scheduler, but I think using reaper as the starting point makes it 
easier to get there. 

Reaper would bring a prod user base that would realistically take 2-3 years to 
build up with a new project. As an operator, switching to a cassandra 
management process that’s basically a re-brand of an existing and commonly used 
management process isn’t super risky. Asking operators to switch to a new 
process is a much harder sell. 

On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:

How can we continue moving this forward?  

Mick/Jon/TLP folks, is there a path here where we commit the  
Netflix-provided management process, and you augment Reaper to work with it?  
Is there a way we can make a larger umbrella that's modular that can  
support either/both?  
Does anyone believe there's a clear, objective argument that one is  
strictly better than the other? I haven't seen one.  



On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala  
 wrote:  

> +1 to everything that Joey articulated with emphasis on the fact that  
> contributions should be evaluated based on the merit of code and their  
> value add to the whole offering. I hope it does not matter whether that  
> contribution comes from PMC member or a person who is not a committer. I  
> would like the process to be such that it encourages the new members to be  
> a part of the community and not shy away from contributing to the code  
> assuming their contributions are valued differently than committers or PMC  
> members. It would be sad to see the contributions decrease if we go down  
> that path.  
>  
> *Regards,*  
>  
> *Roopa Tangirala*  
>  
> Engineering Manager CDE  
>  
> *(408) 438-3156 - mobile*  
>  
>  
>  
>  
>  
>  
> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch   
> wrote:  
>  
> > > We are looking to contribute Reaper to the Cassandra project.  
> > >  
> > Just to clarify are you proposing contributing Reaper as a project via  
> > donation or you are planning on contributing the features of Reaper as  
> > patches to Cassandra? If the former how far along are you on the donation  
> > process? If the latter, when do you think you would have patches ready  
> for  
> > consideration / review?  
> >  
> >  
> > > Looking at the patch it's very similar in its base design already, but  
> > > Reaper does has a lot more to offer. We have all been working hard to  
> > move  
> > > it to also being a side-car so it can be contributed. This raises a  
> > number  
> > > of relevant questions to this thread: would we then accept both works  
> in  
> > > the Cassandra project, and what burden would it put on the current PMC  
> to  
> > > maintain both works.  
> > >  
> > I would hope that we would collaborate on merging the best parts of all  
> > into the official Cassandra sidecar, taking the always on, shared  
> nothing,  
> > highly available system that we've contributed a patchset for and adding  
> in  
> > many of the repair features (e.g. schedules, a nice web UI) that Reaper  
> > has.  
> >  
> >  
> > > I share Stefan's concern that consensus had not been met around a  
> > > side-car, and that it was somehow default accepted before a patch  
> landed.  
> >  
> >  
> > I feel this is not correct or fair. The sidecar and repair discussions  
> have  
> > been anything _but_ "default accepted". The timeline of consensus  
> building  
> > involving the management sidecar and repair scheduling plans:  
> >  
> > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper  
> to  
> > come up with design goals for a repair scheduler that could work at  
> Netflix  
> > scale.  
> >  
> > ~Feb 2017: Netflix believes that the fundamental design gaps prevented us  
> > from using Reaper as it relies heavily on remote JMX connections and  
> > central coordination.  
> >  
> > Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available  
> > and distributed repair scheduling sidecar/tool. He is encouraged by  
> > multiple committers to build repair scheduling into the daemon itself and  
> > not as a sidecar so the database is truly eventually consistent.  
> >  
> > ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback  
> at  
> > NGCC, Vinay and myself prototype the distributed repair scheduler within  
> > Priam and roll it out at Netflix scale.  
> >  
> > Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page  
> > design document for adding repair scheduling to the daemon itself and  
> open  
> > the design up for feedback from the community. We get feedback from Alex,  
> > Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals  
> > to contribute Reaper at this point. 

Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Blake Eggleston
What’s the benefit of doing it that way vs starting with reaper and integrating 
the netflix scheduler? If reaper was just a really inappropriate choice for the 
cassandra management process, I could see that being a better approach, but I 
don’t think that’s the case.

If our management process isn’t a drop-in replacement for reaper, then reaper 
will continue to exist, which will split the user and developer base between 
the 2 projects. That won't be good for either project.

On September 7, 2018 at 6:12:01 PM, Jeff Jirsa (jji...@gmail.com) wrote:

I’d also like to see the end state you describe: reaper UI wrapping the Netflix 
management process with pluggable scheduling (either as is with reaper now, or 
using the Netflix scheduler), but I don’t think that means we need to start 
with reaper - if personally prefer the opposite direction, starting with 
something small and isolated and layering on top.  

--  
Jeff Jirsa  


> On Sep 7, 2018, at 5:42 PM, Blake Eggleston  wrote:  
>  
> I think we should accept the reaper project as is and make that cassandra 
> management process 1.0, then integrate the netflix scheduler (and other new 
> features) into that.  
>  
> The ultimate goal would be for the netflix scheduler to become the default 
> repair scheduler, but I think using reaper as the starting point makes it 
> easier to get there.  
>  
> Reaper would bring a prod user base that would realistically take 2-3 years 
> to build up with a new project. As an operator, switching to a cassandra 
> management process that’s basically a re-brand of an existing and commonly 
> used management process isn’t super risky. Asking operators to switch to a 
> new process is a much harder sell.  
>  
> On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:  
>  
> How can we continue moving this forward?  
>  
> Mick/Jon/TLP folks, is there a path here where we commit the  
> Netflix-provided management process, and you augment Reaper to work with it?  
> Is there a way we can make a larger umbrella that's modular that can  
> support either/both?  
> Does anyone believe there's a clear, objective argument that one is  
> strictly better than the other? I haven't seen one.  
>  
>  
>  
> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala  
>  wrote:  
>  
>> +1 to everything that Joey articulated with emphasis on the fact that  
>> contributions should be evaluated based on the merit of code and their  
>> value add to the whole offering. I hope it does not matter whether that  
>> contribution comes from PMC member or a person who is not a committer. I  
>> would like the process to be such that it encourages the new members to be  
>> a part of the community and not shy away from contributing to the code  
>> assuming their contributions are valued differently than committers or PMC  
>> members. It would be sad to see the contributions decrease if we go down  
>> that path.  
>>  
>> *Regards,*  
>>  
>> *Roopa Tangirala*  
>>  
>> Engineering Manager CDE  
>>  
>> *(408) 438-3156 - mobile*  
>>  
>>  
>>  
>>  
>>  
>>  
>> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch   
>> wrote:  
>>  
>>>> We are looking to contribute Reaper to the Cassandra project.  
>>>>  
>>> Just to clarify are you proposing contributing Reaper as a project via  
>>> donation or you are planning on contributing the features of Reaper as  
>>> patches to Cassandra? If the former how far along are you on the donation  
>>> process? If the latter, when do you think you would have patches ready  
>> for  
>>> consideration / review?  
>>>  
>>>  
>>>> Looking at the patch it's very similar in its base design already, but  
>>>> Reaper does has a lot more to offer. We have all been working hard to  
>>> move  
>>>> it to also being a side-car so it can be contributed. This raises a  
>>> number  
>>>> of relevant questions to this thread: would we then accept both works  
>> in  
>>>> the Cassandra project, and what burden would it put on the current PMC  
>> to  
>>>> maintain both works.  
>>>>  
>>> I would hope that we would collaborate on merging the best parts of all  
>>> into the official Cassandra sidecar, taking the always on, shared  
>> nothing,  
>>> highly available system that we've contributed a patchset for and adding  
>> in  
>>> many of the repair features (e.g. schedules, a nice web UI) that Reaper  
>>> has.  
>>>  
>>>  
>>>

Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Blake Eggleston
Right, I understand the arguments for starting a new project. I’m not saying 
reaper is, technically speaking, the best place to start. The point I’m trying 
to make is that the non-technical advantages of using an existing project as a 
starting point may outweigh the technical benefits of a clean slate. Whether 
that’s the case or not, it’s not a strictly technical decision, and the 
non-technical advantages of starting with reaper need to be weighed.

On September 7, 2018 at 8:19:50 PM, Jeff Jirsa (jji...@gmail.com) wrote:

The benefit is that it more closely matched the design doc, from 5 months ago, 
which is decidedly not about coordinating repair - it’s about a general purpose 
management tool, where repair is one of many proposed tasks  

https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit
  


By starting with a tool that is built to run repair, you’re sacrificing 
generality and accepting something purpose built for one sub task. It’s an 
important subtask, and it’s a nice tool, but it’s not an implementation of the 
proposal, it’s an alternative that happens to do some of what was proposed.  

--  
Jeff Jirsa  


> On Sep 7, 2018, at 6:53 PM, Blake Eggleston  wrote:  
>  
> What’s the benefit of doing it that way vs starting with reaper and 
> integrating the netflix scheduler? If reaper was just a really inappropriate 
> choice for the cassandra management process, I could see that being a better 
> approach, but I don’t think that’s the case.  
>  
> If our management process isn’t a drop in replacement for reaper, then reaper 
> will continue to exist, which will split the user and developers base between 
> the 2 projects. That won't be good for either project.  
>  
> On September 7, 2018 at 6:12:01 PM, Jeff Jirsa (jji...@gmail.com) wrote:  
>  
> I’d also like to see the end state you describe: reaper UI wrapping the 
> Netflix management process with pluggable scheduling (either as is with 
> reaper now, or using the Netflix scheduler), but I don’t think that means we 
> need to start with reaper - if personally prefer the opposite direction, 
> starting with something small and isolated and layering on top.  
>  
> --  
> Jeff Jirsa  
>  
>  
>> On Sep 7, 2018, at 5:42 PM, Blake Eggleston  wrote:  
>>  
>> I think we should accept the reaper project as is and make that cassandra 
>> management process 1.0, then integrate the netflix scheduler (and other new 
>> features) into that.  
>>  
>> The ultimate goal would be for the netflix scheduler to become the default 
>> repair scheduler, but I think using reaper as the starting point makes it 
>> easier to get there.  
>>  
>> Reaper would bring a prod user base that would realistically take 2-3 years 
>> to build up with a new project. As an operator, switching to a cassandra 
>> management process that’s basically a re-brand of an existing and commonly 
>> used management process isn’t super risky. Asking operators to switch to a 
>> new process is a much harder sell.  
>>  
>> On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:  
>>  
>> How can we continue moving this forward?  
>>  
>> Mick/Jon/TLP folks, is there a path here where we commit the  
>> Netflix-provided management process, and you augment Reaper to work with it? 
>>  
>> Is there a way we can make a larger umbrella that's modular that can  
>> support either/both?  
>> Does anyone believe there's a clear, objective argument that one is  
>> strictly better than the other? I haven't seen one.  
>>  
>>  
>>  
>> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala  
>>  wrote:  
>>  
>>> +1 to everything that Joey articulated with emphasis on the fact that  
>>> contributions should be evaluated based on the merit of code and their  
>>> value add to the whole offering. I hope it does not matter whether that  
>>> contribution comes from PMC member or a person who is not a committer. I  
>>> would like the process to be such that it encourages the new members to be  
>>> a part of the community and not shy away from contributing to the code  
>>> assuming their contributions are valued differently than committers or PMC  
>>> members. It would be sad to see the contributions decrease if we go down  
>>> that path.  
>>>  
>>> *Regards,*  
>>>  
>>> *Roopa Tangirala*  
>>>  
>>> Engineering Manager CDE  
>>>  
>>> *(408) 438-3156 - mobile*  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch   

Re: Proposing an Apache Cassandra Management process

2018-09-12 Thread Blake Eggleston
Reading through the history Sankalp posted (I think it was originally posted by 
Joey?), I think part of the problem we’re having here is that we’re trying to 
solve at least 3 problems with a single solution. Also, I don’t think everyone 
has the same goals in mind. The issues we’re trying to solve are:


Repair scheduling - the original proposal was for an in-process distributed 
scheduler to make Cassandra eventually consistent without relying on external 
tools.

Sidecar - proposed as a helper co-process to make stuff like cluster-wide 
nodetool command execution, health checks, etc. easier. I don’t think the 
original proposal mentioned repair.

An OpsCenter-like management application with a UI, which seems to have made its 
way into the mix at some point.

These are all intended to make Cassandra easier to operate, but they’re really 
separate features. It would be more productive to focus on each one as its own 
feature instead of trying to design a one-size-fits-all, does-everything 
management tool.

On September 12, 2018 at 6:25:11 PM, sankalp kohli (kohlisank...@gmail.com) 
wrote:

Here is a list of open discussion points from the voting thread. I think  
some are already answered but I will still gather these questions here.  

From several people:  
1. Vote is rushed and we need more time for discussion.  

From Sylvain  
2. About the voting process...I think that was addressed by Jeff Jirsa and  
deserves a separate thread as it is not directly related to this thread.  
3. Does the project need a side car.  

From Jonathan Haddad  
4. Are people doing +1 willing to contribute  

From Jonathan Ellis  
5. List of feature set, maturity, maintainer availability from Reaper or  
any other project being donated.  

Mick Semb Wever  
6. We should not vote on these things and instead build consensus.  

Open Questions from this thread  
7. What technical debts we are talking about in Reaper. Can someone give  
concrete examples.  
8. What is the timeline of donating Reaper to Apache Cassandra.  

On Wed, Sep 12, 2018 at 3:49 PM sankalp kohli   
wrote:  

> (Using this thread and not the vote thread intentionally)  
> For folks talking about vote being rushed. I would use the email from  
> Joseph to show this is not rushed. There was no email on this thread for 4  
> months until I pinged.  
>  
>  
> Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to  
> come up with design goals for a repair scheduler that could work at Netflix  
> scale.  
>  
> ~Feb 2017: Netflix believes that the fundamental design gaps prevented us  
> from using Reaper as it relies heavily on remote JMX connections and  
> central coordination.  
>  
> Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available  
> and distributed repair scheduling sidecar/tool. He is encouraged by  
> multiple committers to build repair scheduling into the daemon itself and  
> not as a sidecar so the database is truly eventually consistent.  
>  
> ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at  
> NGCC, Vinay and myself prototype the distributed repair scheduler within  
> Priam and roll it out at Netflix scale.  
>  
> Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page  
> design document for adding repair scheduling to the daemon itself and open  
> the design up for feedback from the community. We get feedback from Alex,  
> Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals  
> to contribute Reaper at this point. We hear the consensus that the  
> community would prefer repair scheduling in a separate distributed sidecar  
> rather than in the daemon itself and we re-work the design to match this  
> consensus, re-aligning with our original proposal at NGCC.  
>  
> Apr 2018: Blake brings the discussion of repair scheduling to the dev list  
> (  
>  
> https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E
>   
> ).  
> Many community members give positive feedback that we should solve it as  
> part of Cassandra and there is still no mention of contributing Reaper at  
> this point. The last message is my attempted summary giving context on how  
> we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and  
> ship them with Cassandra.  
>  
> Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document  
> for gathering feedback on a general management sidecar. Sankalp and Dinesh  
> encourage Vinay and myself to kickstart that sidecar using the repair  
> scheduler patch  
>  
> Apr 2018: Dinesh reaches out to the dev list (  
>  
> https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E
>   
> )  
> about the general management process to gain further feedback. All feedback  
> remains positive as it is a potential place for multiple community members  
> to contr

Re: JIRA Workflow Proposals

2018-12-04 Thread Blake Eggleston
1: A
2: +1
3: +1
4: +1
5: +1
6: +1

> On Dec 4, 2018, at 11:19 AM, Benedict Elliott Smith wrote:
> 
> Sorry, 4. Is inconsistent.  First instance should be.
> 
>> - 4. Priorities: Keep ‘High' priority
> 
> 
>> On 4 Dec 2018, at 19:12, Benedict Elliott Smith wrote:
>> 
>> Ok, so after an initial flurry everyone has lost interest :)
>> 
>> I think we should take a quick poll (not a vote), on people’s positions on 
>> the questions raised so far.  If people could try to take the time to stake 
>> a +1/-1, or A/B, for each item, that would be really great.  This poll will 
>> not be the end of discussions, but will (hopefully) at least draw a line 
>> under the current open questions.
>> 
>> I will start with some verbiage, then summarise with options for everyone to 
>> respond to.  You can scroll to the summary immediately if you like.
>> 
>> - 1. Component: Multi-select or Cascading-select (i.e. only one component 
>> possible per ticket, but neater UX)
>> - 2. Labels: rather than litigate people’s positions, I propose we do the 
>> least controversial thing, which is to simply leave labels intact, and only 
>> supplement them with the new schema information.  We can later revisit if we 
>> decide it’s getting messy.
>> - 3. "First review completed; second review ongoing": I don’t think we need 
>> to complicate the process; if there are two reviews in flight, the first 
>> reviewer can simply comment that they are done when ready, and the second 
>> reviewer can move the status once they are done.  If the first reviewer 
>> wants substantive changes, they can move the status to "Change Request” 
>> before the other reviewer completes, if they like.  Everyone involved can 
>> probably negotiate this fairly well, but we can introduce some specific 
>> guidance on how to conduct yourself here in a follow-up.  
>> - 4. Priorities: Keep ‘High' priority
>> - 5. Mandatory Platform and Feature. Make mandatory by introducing new “All” 
>> and “None” (respectively) options, so always possible to select an option.
>> - 6. Environment field: Remove?
>> 
>> I think this captures everything that has been brought up so far, except for 
>> the suggestion to make "Since Version” a “Version” - but that needs more 
>> discussion, as I don’t think there’s a clear alternative proposal yet.
>> 
>> Summary:
>> 
>> 1: Component. (A) Multi-select; (B) Cascading-select
>> 2: Labels: leave alone +1/-1
>> 3: No workflow changes for first/second review: +1/-1
>> 4: Priorities: Including High +1/-1
>> 5: Mandatory Platform and Feature: +1/-1
>> 6: Remove Environment field: +1/-1
>> 
>> I will begin.
>> 
>> 1: A
>> 2: +1
>> 3: +1
>> 4: +1
>> 5: Don’t mind
>> 6: +1
>> 
>> 
>> 
>> 
>>> On 29 Nov 2018, at 22:04, Scott Andreas wrote:
>>> 
>>> If I read Josh’s reply right, I think the suggestion is to periodically 
>>> review active labels and promote those that are demonstrably useful to 
>>> components (cf. folksonomy -> taxonomy). I 
>>> hadn’t read the reply as indicating that labels should be zero’d out 
>>> periodically. In any case, I agree that reviewing active labels and 
>>> re-evaluating our taxonomy from time to time sounds great; I don’t think 
>>> I’d zero them, though.
>>> 
>>> Responding to a few comments:
>>> 
>>> –––
>>> 
>>> – To Joey’s question about issues languishing in Triage: I like the idea of 
>>> an SLO for the “triage” state. I am happy to commit time and resources to 
>>> triaging newly-reported issues, and to JIRA pruning/gardening in general. I 
>>> spent part of the weekend before last adding components to a few hundred 
>>> open issues and preparing the Confluence reports mentioned in the other 
>>> thread. It was calming. We can also figure out how to rotate / share this 
>>> responsibility.
>>> 
>>> – Labels discussion: If we adopt a more structured component hierarchy to 
>>> treat as our primary method of organization, keep labels around for people 
>>> to use as they’d like (e.g., for custom JQL queries useful to their 
>>> workflows), and periodically promote those that are widely useful, I think 
>>> that sounds like a fine outcome.
>>> 
>>> – On Sankalp’s question of issue reporter / new contributor burden: I 
>>> actually think the pruning of fields on the “new issue form” makes 
>>> reporting issues easier and ensures that information we need is captured. 
>>> Having the triage step will also provide a nice task queue for screening 
>>> bugs, and ensures a contributor’s taken a look + screened appropriately 
>>> (rather than support requests immediately being marked “Critical/Blocker” 

Re: [VOTE] Change Jira Workflow

2018-12-17 Thread Blake Eggleston
+1

> On Dec 17, 2018, at 9:31 AM, jay.zhu...@yahoo.com.INVALID wrote:
> 
> +1
> 
>On Monday, December 17, 2018, 9:10:55 AM PST, Jason Brown 
>  wrote:  
> 
> +1.
> 
> On Mon, Dec 17, 2018 at 7:36 AM Michael Shuler 
> wrote:
> 
>> +1
>> 
>> --
>> Michael
>> 
>> On 12/17/18 9:19 AM, Benedict Elliott Smith wrote:
>>> I propose these changes <
>> https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals>*
>> to the Jira Workflow for the project.  The vote will be open for 72 hours**.
>>> 
>>> I am, of course, +1.
>>> 
>>> * With the addendum of the mailing list discussion <
>> https://lists.apache.org/thread.html/e4668093169aa4ef52f2bea779333f04a0afde8640c9a79a8c86ee74@%3Cdev.cassandra.apache.org%3E>;
>> in case of any conflict arising from a mistake on my part in the wiki, the
>> consensus reached by polling the mailing list will take precedence.
>>> ** I won’t be around to close the vote, as I will be on vacation.
>> Everyone is welcome to ignore the result until I get back in a couple of
>> weeks, or if anybody is eager feel free to close the vote and take some
>> steps towards implementation.
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 





Re: Modeling Time Series data

2019-01-11 Thread Blake Eggleston
This is a question for the user list.

> On Jan 11, 2019, at 1:51 PM, Akash Gangil  wrote:
> 
> Hi,
> 
> I have a data model where the partition key for a lot of tables is based on
> time
> (year, month, day, hour)
> 
> Would this create a hotspot in my cluster, given all the writes/reads would
> go to the same node for a given hour? Or does the cassandra storage engine
> also takes into account the table info like table name, when distributing
> the data?
> 
> If the above model would be a problem, what's the suggested way to solve
> this? Add tablename to partition key?
> 
> 
> 
> -- 
> Akash
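
A minimal CQL sketch of the bucketing approach the question hints at (table and 
column names here are hypothetical, not from the thread): adding a 
high-cardinality component such as a source or sensor id alongside the time 
bucket spreads a given hour's writes across many partitions instead of 
concentrating them on the replicas of a single (hour) partition.

CREATE TABLE events_by_hour (
    source_id text,        -- high-cardinality component, e.g. a sensor/device id
    bucket    text,        -- hour bucket, e.g. '2019-01-11-13' (year-month-day-hour)
    ts        timestamp,
    payload   text,
    PRIMARY KEY ((source_id, bucket), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Writes for a given hour are now spread across one partition per source_id,
-- rather than a single hour-wide partition owned by one replica set. Note that
-- the table name does not influence placement; only the partition key is hashed
-- to a token, so two tables sharing the same partition key values map to the
-- same replicas.

Whether the extra component is needed depends on write volume; for modest 
throughput a pure time bucket may be perfectly fine.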


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Both Java 8 and Java 11 required for producing a tarball

2019-03-13 Thread Blake Eggleston
You may want to wait until CASSANDRA-14607 is finished before starting on 
14712. I think it will end up unwinding most of the stuff requiring building 
with dual jdks (either as part of that ticket or an immediate follow on).

I'm still working on making sure I haven't broken anything, but I'm currently 
able to build with a single jdk (8 or 11) without any problems. I also haven't 
run into any problems running a strictly jdk 8 build in java 11. I have more 
testing to do, but it seems ok so far.

> On Mar 13, 2019, at 4:14 PM, Stefan Miklosovic 
>  wrote:
> 
> Hi,
> 
> how do I assign myself to
> https://issues.apache.org/jira/browse/CASSANDRA-14712 ?
> 
> I read the doco here (1) but I think that workflow does not apply to
> cassandra-builds repo.
> 
> Should I do this first and then notify people? Until then it might happen
> that my time would be wasted as somebody else would start to work on that
> simultaneously.
> 
> (1) https://cassandra.apache.org/doc/latest/development/patches.html#patches
> 
> On Thu, 14 Mar 2019 at 08:46, Jordan West  wrote:
> 
>> A couple related JIRAs for reference:
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-14714
>> https://issues.apache.org/jira/browse/CASSANDRA-14712
>> 
>> On Wed, Mar 6, 2019 at 7:34 PM Michael Shuler 
>> wrote:
>> 
>>> On 3/6/19 7:10 PM, Stefan Miklosovic wrote:
 I am trying to build 4.0 from sources and prior to this I was doing
 
 ant artifacts
 
 in order to get distribution tarball to play with.
 
 If I understand this right, if I do not run Ant with Java 11,
 java.version.8 will be true so it will skip building tarballs.
>>> 
>>> Correct. You'll get a JDK8-only jar, but no full tar artifact set.
>>> 
>>>> 1) Why would one not be able to create a tarball while running on Java 8 only?
>>> 
>>> The build system needs a dual-JDK install to build the artifacts with
>>> support for each/both.
>>> 
 2) What is the current status of Java 11 / Java 8? Is it there just "to
>>> try
 it out if it runs with that" or are there different reasons behind it?
>>> 
>>> JDK8 runtime is the default, JDK11 runtime is optional, but supported.
>>> Here's the JIRA with all the details:
>>> https://issues.apache.org/jira/browse/CASSANDRA-9608
>>> 
>>> I just pushed a WIP branch to do a dual-JDK build via docker, since we
>>> need to work on this, too. (lines may wrap:)
>>> 
>>> git clone -b tar-artifact-build
>>> https://gitbox.apache.org/repos/asf/cassandra-builds.git
>>> 
>>> cd cassandra-builds/
>>> 
>>> docker build -t cass-build-tars -f docker/buster-image.docker docker/
>>> 
>>> docker run --rm -v `pwd`/dist:/dist `docker images -f
>>> label=org.cassandra.buildenv=buster -q` /home/build/build-tars.sh trunk
>>> 
>>> After all that, here's my tar artifacts:
>>> 
>>> (tar-artifact-build)mshuler@hana:~/git/cassandra-builds$ ls -l dist/
>>> total 94328
>>> -rw-r--r-- 1 mshuler mshuler 50385890 Mar  6 21:16
>>> apache-cassandra-4.0-SNAPSHOT-bin.tar.gz
>>> -rw-r--r-- 1 mshuler mshuler 46198947 Mar  6 21:16
>>> apache-cassandra-4.0-SNAPSHOT-src.tar.gz
>>> 
>>> Or you could drop a dual-JDK install on your machine, export the env
>>> vars you found and `ant artifacts` should produce the tars :)
>>> 
>>> --
>>> Kind regards,
>>> Michael
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>> 
> 
> Stefan Miklosovic


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Blake Eggleston
Well said Josh. You’ve pretty much summarized my thoughts on this as well.

+1 to moving forward with this

> On Apr 11, 2019, at 10:15 PM, Joshua McKenzie  wrote:
> 
> As one of the two people that re-wrote all our unit tests to try and help
> Sylvain get 8099 out the door, I think it's inaccurate to compare the scope
> and potential stability impact of this work to the truly sweeping work that
> went into 8099 (not to downplay the scope and extent of this work here).
> 
> TBH, one of the big reasons we tend to drop such large PRs is the fact that
>> Cassandra's code is highly intertwined and it makes it hard to precisely
>> change things. We need to iterate towards interfaces that allow us to
>> iterate quickly and reduce the amount of highly intertwined code. It helps
>> with testing as well. I want us to have a meaningful discussion around it
>> before we drop a big PR.
> 
> This has been a huge issue with our codebase since at least back when I
> first encountered it five years ago. To date, while we have made progress
> on this front, it's been nowhere near sufficient to mitigate the issues in
> the codebase and allow for large, meaningful changes in smaller incremental
> patches or commits. Having yet another discussion around this (there have
> been many, many of them over the years) as a blocker for significant work
> to go into the codebase is an unnecessary and dangerous blocker. Not to say
> we shouldn't formalize a path to try and make incremental progress to
> improve the situation, far from it, but blocking other progress on a
> decade's worth of accumulated hygiene problems isn't going to make the
> community focus on fixing those problems imo, it'll just turn away
> contributions.
> 
> So let me second jd (and many others') opinion here: "it makes sense to get
> it right the first time, rather than applying bandaids to 4.0 and rewriting
> things for 4.next". And fwiw, asking people who have already done a huge
> body of work to reformat that work into a series of commits or to break up
> that work in a fashion that's more to the liking of people not involved in
> either the writing of the patch or reviewing of it doesn't make much sense
> to me. As I am neither an assignee nor reviewer on this contribution, I
> leave it up to the parties involved to do things professionally and with a
> high standard of quality. Admittedly, a large code change merging in like
> this has implications for rebasing on anyone else's work that's in flight,
> but be it one commit merged or 50, or be it one JIRA ticket or ten, the
> end-result is the same; any large contribution in any format will ripple
> outwards and require re-work from others in the community.
> 
> The one thing I *would* strongly argue for is performance benchmarking of
> the new messaging code on a representative sample of different
> general-purpose queries, LWT's, etc, preferably in a 3 node RF=3 cluster,
> plus a healthy suite of jmh micro-benches (assuming they're not already in
> the diff. If they are, disregard / sorry). From speaking with Aleksey
> offline about this work, my understanding is that that's something they
> plan on doing before putting a bow on things.
> 
> In the balance between "fear change because it destabilizes" and "go forth
> blindly into that dark night, rewriting All The Things", I think the
> Cassandra project's willingness to jettison the old and introduce the new
> has served it well in keeping relevant as the years have gone by. I'd hate
> to see that culture of progress get mired in a dogmatic adherence to
> requirements on commit counts, lines of code allowed / expected on a given
> patch set, or any other metrics that might stymie the professional needs of
> some of the heaviest contributors to the project.
> 
> On Wed, Apr 10, 2019 at 5:03 PM Oleksandr Petrov 
> wrote:
> 
>> Sorry to pick only a few points to address, but I think these are ones
>> where I can contribute productively to the discussion.
>> 
>>> In principle, I agree with the technical improvements you
>> mention (backpressure / checksumming / etc). These things should be there.
>> Are they a hard requirement for 4.0?
>> 
>> One thing that comes to mind is protocol versioning and consistency. If
>> changes adding checksumming and handshake do not make it to 4.0, we grow
>> the upgrade matrix and have to put changes to the separate protocol
>> version. I'm not sure how many other internode protocol changes we have
>> planned for 4.next, but this is definitely something we should keep in
>> mind.
>> 
>>> 2. We should not be measuring complexity in LoC with the exception that
>> all 20k lines *do need to be review* (not just the important parts and
>> because code refactoring tools have bugs too) and more lines take more
>> time.
>> 
>> Everything should definitely be reviewed. But with different rigour: one
>> thing is to review byte arithmetic and protocol formats and completely
>> different thing is to verify that Verb moved from one place

Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Blake Eggleston
It seems like one of the main points of contention isn’t so much the 
content of the patch, but more about the amount of review this patch has/will 
receive relative to its perceived risk. If it’s the latter, then I think it 
would be more effective to explain why that’s the case, and what level of 
review would be more appropriate.

I’m personally +0  on requiring additional review. I feel like the 3 people 
involved so far have sufficient expertise, and trust them to be responsible, 
including soliciting additional reviews if they feel they’re needed.

If dev@ does collectively want more eyes on this, I’d suggest we solicit 
reviews from people who are very familiar with the messaging code, and let them 
decide what additional work and documentation they’d need to make a review 
manageable, if any. Everyone has their own review style, and there’s no need to 
ask for a bunch of additional work if it’s not needed.

> On Apr 12, 2019, at 12:46 PM, Jordan West  wrote:
> 
> Since there seems to be an assumption that I haven’t read the code, let me
> clarify: I am working on making time to be a reviewer on this and I have
> already spent a few hours with the patch before I sent any replies, likely
> more than most who are replying here. Again, because I disagree on
> non-technical matters does not mean I haven’t considered the technical. I
> am sharing what I think is necessary for the authors
> to make review higher quality. I will not compromise my review standards on
> a patch like this as I have said already. Telling me to review it to talk
> more about it directly ignores my feedback and requires me to acquiesce all
> of my concerns, which as I said I won’t do as a reviewer.
> 
> And yes I am arguing for changing how the Cassandra community approaches
> large patches. In the same way the freeze changed how we approached major
> releases and the decision to do so has been a net benefit as measured by
> quality and stability. Existing community members have already chimed in in
> support of things like better commit hygiene.
> 
> The past approaches haven’t prioritized quality and stability and it really
> shows. What I and others here are suggesting has worked all over our
> industry and is adopted by companies big (like google as i linked
> previously) and small (like many startups I and others have worked for).
> Everything we want to do: better testing, better review, better code, is
> made easier with better design review, better discussion, and more
> digestible patches among many of the other things suggested in this thread.
> 
> Jordan
> 
> On Fri, Apr 12, 2019 at 12:01 PM Benedict Elliott Smith 
> wrote:
> 
>> I would once again exhort everyone making these kinds of comment to
>> actually read the code, and to comment on Jira.  Preferably with a
>> justification by reference to the code for how or why it would improve the
>> patch.
>> 
>> As far as a design document is concerned, it’s very unclear what is being
>> requested.  We already had plans, as Jordan knows, to produce a wiki page
>> for posterity, and a blog post closer to release.  However, I have never
>> heard of this as a requirement for review, or for commit.  We have so far
>> taken two members of the community through the patch over video chat, and
>> would be more than happy to do the same for others.  So far nobody has had
>> any difficulty getting to grips with its structure.
>> 
>> If the project wants to modify its normal process for putting a patch
>> together, this is a whole different can of worms, and I am strongly -1.
>> I’m not sure what precedent we’re trying to set imposing arbitrary
>> constraints pre-commit for work that has already met the project’s
>> inclusion criteria.
>> 
>> 
>>> On 12 Apr 2019, at 18:58, Pavel Yaskevich  wrote:
>>> 
>>> I haven't actually looked at the code
>> 
>> 
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Time for a new 3.0/3.11 release?

2019-07-01 Thread Blake Eggleston
Hi dev@,

Any objections to doing a new 3.0 and 3.11 release? Both branches have 
accumulated a decent number of changes since their last release, the highlights 
being improved merkle tree footprint, a gossip race, and a handful of 2.1 -> 
3.x upgrade bugs.

Thanks,

Blake


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: fixing paging state for 4.0

2019-09-24 Thread Blake Eggleston
Changing paging state format is kind of a pain since the driver treats it as an 
opaque blob. I'd prefer we went with Sylvain's suggestion to just interpret 
Integer.MAX_VALUE as "no limit", which would be a lot simpler to implement.

> On Sep 24, 2019, at 10:44 AM, Jon Haddad  wrote:
> 
> I'm working with a team who just ran into CASSANDRA-14683 [1], which I
> didn't realize was an issue till now.
> 
> Anyone have an interest in fixing full table pagination?  I'm not sure of
> the full implications of changing the int to a long in the paging stage.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-14683
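
A minimal sketch of that interpretation, where a protocol-level limit of Integer.MAX_VALUE is treated as "no limit" rather than a count that gets decremented into the paging state. Names here are illustrative, not the actual Cassandra code:

// Illustrative names only, not the real Cassandra classes.
final class Limits
{
    static final int UNLIMITED = Integer.MAX_VALUE;

    // Rows still to fetch after `fetched` rows have already been paged.
    // MAX_VALUE is treated as a sentinel and never counted down, so the
    // serialized paging state format does not have to change.
    static int remaining(int requestedLimit, long fetched)
    {
        if (requestedLimit == UNLIMITED)
            return UNLIMITED;
        return (int) Math.max(0L, requestedLimit - fetched);
    }
}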


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Attention to serious bug CASSANDRA-15081

2019-09-24 Thread Blake Eggleston
This looks like a dupe of CASSANDRA-15086, which has been committed and will be 
included in 3.0.19.

> On Sep 11, 2019, at 5:10 PM, Cameron Zemek  wrote:
> 
> Have had multiple customers hit this CASSANDRA-15081 issue now, where
> upgrading from older versions the sstables contain an unknown column (it's
> not present in the dropped_columns in the schema)
> 
> This bug is serious as reads return incorrect results and if you run scrub
> it will drop the row. So hoping to bring it some attention to have the
> issue resolved. Note I have included a patch that I think does not cause
> any regressions elsewhere.
> 
> Regards,
> Cameron


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: fixing paging state for 4.0

2019-09-24 Thread Blake Eggleston
Right, mixed version clusters. The opaque blob isn't versioned, and there isn't 
an opportunity for min version negotiation that you have with the messaging 
service. The result is situations where a client begins a read on one node, and 
attempts to read the next page from a different node over a protocol version 
where the paging state serialization format has changed. This causes an 
exception deserializing the paging state and the read fails.

There are ways around this, but they're not comprehensive (I think), and 
they're much more involved than just interpreting Integer.MAX_VALUE as 
unlimited. The "right" solution would be for the paging state to be 
deserialized/serialized on the client side, but that won't happen in 4.0.
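
To make the failure mode concrete, here is a toy sketch (not the real PagingState code, and the field layout is invented) of what happens when the side that serializes the blob uses a newer layout than the side that parses it:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OpaqueBlobMismatch
{
    // "New" writer: pretends the remaining-rows field was widened to a long.
    static byte[] writeNewFormat(long remaining, int position) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes))
        {
            out.writeLong(remaining);
            out.writeInt(position);
        }
        return bytes.toByteArray();
    }

    // "Old" reader: still expects a 4-byte field, so it misparses the blob
    // (or hits EOF), which is the deserialization failure described above.
    static void readOldFormat(byte[] blob) throws IOException
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob)))
        {
            int remaining = in.readInt();
            int position = in.readInt();
            System.out.println("remaining=" + remaining + " position=" + position);
        }
    }

    public static void main(String[] args) throws IOException
    {
        // prints nonsense values rather than (MAX_VALUE + 10, 42)
        readOldFormat(writeNewFormat(Integer.MAX_VALUE + 10L, 42));
    }
}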

> On Sep 24, 2019, at 1:12 PM, Jon Haddad  wrote:
> 
> What's the pain point?  Is it because of mixed version clusters or is there
> something else that makes it a problem?
> 
> On Tue, Sep 24, 2019 at 11:03 AM Blake Eggleston
>  wrote:
> 
>> Changing paging state format is kind of a pain since the driver treats it
>> as an opaque blob. I'd prefer we went with Sylvain's suggestion to just
>> interpret Integer.MAX_VALUE as "no limit", which would be a lot simpler to
>> implement.
>> 
>>> On Sep 24, 2019, at 10:44 AM, Jon Haddad  wrote:
>>> 
>>> I'm working with a team who just ran into CASSANDRA-14683 [1], which I
>>> didn't realize was an issue till now.
>>> 
>>> Anyone have an interest in fixing full table pagination?  I'm not sure of
>>> the full implications of changing the int to a long in the paging stage.
>>> 
>>> https://issues.apache.org/jira/browse/CASSANDRA-14683
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: fixing paging state for 4.0

2019-09-24 Thread Blake Eggleston
Right, that's the problem with changing the paging state format. It doesn't 
work in mixed mode.

> On Sep 24, 2019, at 4:47 PM, Jeremiah Jordan  wrote:
> 
> Clients do negotiate the protocol version they use when connecting. If the 
> server bumped the protocol version then this larger paging state could be 
> part of the new protocol version. But that doesn’t solve the problem for 
> existing versions.
> 
> The special treatment of Integer.MAX_VALUE can be done back to 3.x and fix 
> the bug in all versions, letting users requests to receive all of their data. 
>  Which realistically is probably what someone who sets the protocol level 
> query limit to Integer.MAX_VALUE is trying to do.
> 
> -Jeremiah
> 
>> On Sep 24, 2019, at 4:09 PM, Blake Eggleston  
>> wrote:
>> 
>> Right, mixed version clusters. The opaque blob isn't versioned, and there 
>> isn't an opportunity for min version negotiation that you have with the 
>> messaging service. The result is situations where a client begins a read on 
>> one node, and attempts to read the next page from a different node over a 
>> protocol version where the paging state serialization format has changed. 
>> This causes an exception deserializing the paging state and the read fails.
>> 
>> There are ways around this, but they're not comprehensive (I think), and 
>> they're much more involved than just interpreting Integer.MAX_VALUE as 
>> unlimited. The "right" solution would be for the paging state to be 
>> deserialized/serialized on the client side, but that won't happen in 4.0.
>> 
>>> On Sep 24, 2019, at 1:12 PM, Jon Haddad  wrote:
>>> 
>>> What's the pain point?  Is it because of mixed version clusters or is there
>>> something else that makes it a problem?
>>> 
>>>> On Tue, Sep 24, 2019 at 11:03 AM Blake Eggleston
>>>>  wrote:
>>>> 
>>>> Changing paging state format is kind of a pain since the driver treats it
>>>> as an opaque blob. I'd prefer we went with Sylvain's suggestion to just
>>>> interpret Integer.MAX_VALUE as "no limit", which would be a lot simpler to
>>>> implement.
>>>> 
>>>>> On Sep 24, 2019, at 10:44 AM, Jon Haddad  wrote:
>>>>> 
>>>>> I'm working with a team who just ran into CASSANDRA-14683 [1], which I
>>>>> didn't realize was an issue till now.
>>>>> 
>>>>> Anyone have an interest in fixing full table pagination?  I'm not sure of
>>>>> the full implications of changing the int to a long in the paging stage.
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14683
>>>>>  
>>>> 
>>>> 
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>> 
>>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: fixing paging state for 4.0

2019-09-24 Thread Blake Eggleston
Yes, but if a client is connected to 2 different nodes, and is using a 
different protocol for each, the paging state formats aren’t going to match if 
it tries to use the paging state from one connection on the other.

> On Sep 24, 2019, at 7:14 PM, J. D. Jordan  wrote:
> 
> It is inherently versioned by the protocol version being used for the 
> connection.
> 
>> On Sep 24, 2019, at 9:06 PM, Jon Haddad  wrote:
>> 
>> The problem is that the payload isn't versioned, because the individual
>> fields aren't really part of the protocol.  I think the long term fix
>> should be to add the fields of the paging state to the protocol itself
>> rather than have it just be some serialized blob.  Then we don't have to
>> deal with separately versioning the paging state.
>> 
>> I think recognizing max int as special number that just means "a lot" is
>> fine for now till we have time to rework it is a reasonable approach.
>> 
>> Jon
>> 
>>> On Tue, Sep 24, 2019 at 6:52 PM J. D. Jordan 
>>> wrote:
>>> 
>>> Are there drivers that try to do mixed protocol version connections?  If
>>> so that would be a mistake on the drivers part if it sent the new paging
>>> state to an old server.  Pretty easily protected against in said driver
>>> when it implements support for the new protocol version.  The payload is
>>> opaque, but that doesn’t mean a driver would send the new payload to an old
>>> server.
>>> 
>>> Many of the drivers I have looked at don’t do mixed version connections.
>>> If they start at a higher version they will not connect to older nodes that
>>> don’t support it. Or they will connect to the newer nodes with the older
>>> protocol version. In either of those cases there is no problem.
>>> 
>>> Protocol changes aside, I would suggest fixing the bug starting back on
>>> 3.x by changing the meaning of MAX. Whether or not the limit is switched to
>>> a var int in a bumped protocol version.
>>> 
>>> -Jeremiah
>>> 
>>> 
>>>> On Sep 24, 2019, at 8:28 PM, Blake Eggleston
>>>  wrote:
>>>> 
>>>> Right, that's the problem with changing the paging state format. It
>>> doesn't work in mixed mode.
>>>> 
>>>>> On Sep 24, 2019, at 4:47 PM, Jeremiah Jordan 
>>> wrote:
>>>>> 
>>>>> Clients do negotiate the protocol version they use when connecting. If
>>> the server bumped the protocol version then this larger paging state could
>>> be part of the new protocol version. But that doesn’t solve the problem for
>>> existing versions.
>>>>> 
>>>>> The special treatment of Integer.MAX_VALUE can be done back to 3.x and
>>> fix the bug in all versions, letting users requests to receive all of their
>>> data.  Which realistically is probably what someone who sets the protocol
>>> level query limit to Integer.MAX_VALUE is trying to do.
>>>>> 
>>>>> -Jeremiah
>>>>> 
>>>>>>> On Sep 24, 2019, at 4:09 PM, Blake Eggleston
>>>  wrote:
>>>>>> 
>>>>>> Right, mixed version clusters. The opaque blob isn't versioned, and
>>> there isn't an opportunity for min version negotiation that you have with
>>> the messaging service. The result is situations where a client begins a
>>> read on one node, and attempts to read the next page from a different node
>>> over a protocol version where the paging state serialization format has
>>> changed. This causes an exception deserializing the paging state and the
>>> read fails.
>>>>>> 
>>>>>> There are ways around this, but they're not comprehensive (I think),
>>> and they're much more involved than just interpreting Integer.MAX_VALUE as
>>> unlimited. The "right" solution would be for the paging state to be
>>> deserialized/serialized on the client side, but that won't happen in 4.0.
>>>>>> 
>>>>>>>> On Sep 24, 2019, at 1:12 PM, Jon Haddad  wrote:
>>>>>>>> 
>>>>>>>> What's the pain point?  Is it because of mixed version clusters or is
>>> there
>>>>>>> something else that makes it a problem?
>>>>>>> 
>>>>>>>> On Tue, Sep 24, 2019 at 11:03 AM Blake Eggleston
>>>>>>>>  wrote:

Re: fixing paging state for 4.0

2019-09-24 Thread Blake Eggleston
Sorry, I misread your earlier email. Yes, there are drivers that do mixed 
protocol versions. Not sure if the 4.0 java driver does, but at least one 
previous version did.

> On Sep 24, 2019, at 7:19 PM, Blake Eggleston  
> wrote:
> 
> Yes, but if a client is connected to 2 different nodes, and is using a 
> different protocol for each, the paging state formats aren’t going to match 
> if it tries to use the paging state from one connection on the other.
> 
>> On Sep 24, 2019, at 7:14 PM, J. D. Jordan  wrote:
>> 
>> It is inherently versioned by the protocol version being used for the 
>> connection.
>> 
>>> On Sep 24, 2019, at 9:06 PM, Jon Haddad  wrote:
>>> 
>>> The problem is that the payload isn't versioned, because the individual
>>> fields aren't really part of the protocol.  I think the long term fix
>>> should be to add the fields of the paging state to the protocol itself
>>> rather than have it just be some serialized blob.  Then we don't have to
>>> deal with separately versioning the paging state.
>>> 
>>> I think recognizing max int as special number that just means "a lot" is
>>> fine for now till we have time to rework it is a reasonable approach.
>>> 
>>> Jon
>>> 
>>>> On Tue, Sep 24, 2019 at 6:52 PM J. D. Jordan 
>>>> wrote:
>>>> 
>>>> Are there drivers that try to do mixed protocol version connections?  If
>>>> so that would be a mistake on the drivers part if it sent the new paging
>>>> state to an old server.  Pretty easily protected against in said driver
>>>> when it implements support for the new protocol version.  The payload is
>>>> opaque, but that doesn’t mean a driver would send the new payload to an old
>>>> server.
>>>> 
>>>> Many of the drivers I have looked at don’t do mixed version connections.
>>>> If they start at a higher version they will not connect to older nodes that
>>>> don’t support it. Or they will connect to the newer nodes with the older
>>>> protocol version. In either of those cases there is no problem.
>>>> 
>>>> Protocol changes aside, I would suggest fixing the bug starting back on
>>>> 3.x by changing the meaning of MAX. Whether or not the limit is switched to
>>>> a var int in a bumped protocol version.
>>>> 
>>>> -Jeremiah
>>>> 
>>>> 
>>>>> On Sep 24, 2019, at 8:28 PM, Blake Eggleston
>>>>  wrote:
>>>>> 
>>>>> Right, that's the problem with changing the paging state format. It
>>>> doesn't work in mixed mode.
>>>>> 
>>>>>> On Sep 24, 2019, at 4:47 PM, Jeremiah Jordan 
>>>> wrote:
>>>>>> 
>>>>>> Clients do negotiate the protocol version they use when connecting. If
>>>> the server bumped the protocol version then this larger paging state could
>>>> be part of the new protocol version. But that doesn’t solve the problem for
>>>> existing versions.
>>>>>> 
>>>>>> The special treatment of Integer.MAX_VALUE can be done back to 3.x and
>>>> fix the bug in all versions, letting users requests to receive all of their
>>>> data.  Which realistically is probably what someone who sets the protocol
>>>> level query limit to Integer.MAX_VALUE is trying to do.
>>>>>> 
>>>>>> -Jeremiah
>>>>>> 
>>>>>>>> On Sep 24, 2019, at 4:09 PM, Blake Eggleston
>>>>  wrote:
>>>>>>> 
>>>>>>> Right, mixed version clusters. The opaque blob isn't versioned, and
>>>> there isn't an opportunity for min version negotiation that you have with
>>>> the messaging service. The result is situations where a client begins a
>>>> read on one node, and attempts to read the next page from a different node
>>>> over a protocol version where the paging state serialization format has
>>>> changed. This causes an exception deserializing the paging state and the
>>>> read fails.
>>>>>>> 
>>>>>>> There are ways around this, but they're not comprehensive (I think),
>>>> and they're much more involved than just interpreting Integer.MAX_VALUE as
>>>> unlimited. The "right" solution would be for the paging state to be
>>>> deserialized/serialized on the client side, but that won't happen in 4.0.

Re: Can we kick off a release?

2019-10-23 Thread Blake Eggleston
Looks like 15193 has been committed. Are we waiting on anything else before 
cutting the next set of releases?

> On Oct 8, 2019, at 1:11 PM, Jon Haddad  wrote:
> 
> I forgot to mention, we should also release alpha2 of 4.0.
> 
> 
> On Tue, Oct 8, 2019 at 1:04 PM Michael Shuler 
> wrote:
> 
>> Thanks Sam, I'm following #15193 and should catch the status change there.
>> 
>> Michael
>> 
>> On Tue, Oct 8, 2019 at 6:17 AM Sam Tunnicliffe  wrote:
>>> 
>>> CASSANDRA-15193 just got +1’d yesterday and would be good to include in
>> the 3.0 and 3.11 releases. If you don’t mind holding off while I add a
>> cqlsh test and merge it, that’d be good.
>>> 
>>> Thanks,
>>> Sam
>>> 
 On 7 Oct 2019, at 22:54, Michael Shuler 
>> wrote:
 
 Will do! I probably won't get this done this evening, so will send out
 the emails tomorrow.
 
 Thanks,
 Michael
 
 On Mon, Oct 7, 2019 at 2:37 PM Jon Haddad  wrote:
> 
> Michael,
> 
> Would you mind kicking off builds and starting a vote thread for the
>> latest
> 2.2, 3.0 and 3.11 builds?
> 
> Much appreciated,
> Jon
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org
 
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 2.2.15

2019-10-25 Thread Blake Eggleston
+1

> On Oct 25, 2019, at 8:57 AM, Jeff Jirsa  wrote:
> 
> +1
> 
> 
> On Fri, Oct 25, 2019 at 7:18 AM Sam Tunnicliffe  wrote:
> 
>> +1
>> 
>>> On 24 Oct 2019, at 18:25, Michael Shuler  wrote:
>>> 
>>> I propose the following artifacts for release as 2.2.15.
>>> 
>>> sha1: 4ee4ceea28a1cb77b283c7ce0135340ddff02086
>>> Git:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.15-tentative
>>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1179/org/apache/cassandra/apache-cassandra/2.2.15/
>>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1179/
>>> 
>>> The Debian and RPM packages are available here:
>> http://people.apache.org/~mshuler
>>> 
>>> The vote will be open for 72 hours (longer if needed).
>>> 
>>> [1]: CHANGES.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.15-tentative
>>> [2]: NEWS.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/2.2.15-tentative
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 3.0.19

2019-10-25 Thread Blake Eggleston
+1

> On Oct 25, 2019, at 8:58 AM, Jeff Jirsa  wrote:
> 
> +1
> 
> 
> On Fri, Oct 25, 2019 at 7:18 AM Sam Tunnicliffe  wrote:
> 
>> +1
>> 
>>> On 24 Oct 2019, at 18:25, Michael Shuler  wrote:
>>> 
>>> I propose the following artifacts for release as 3.0.19.
>>> 
>>> sha1: a81bfd6b7db3a373430b3c4e8f4e930b199796f0
>>> Git:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.19-tentative
>>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1183/org/apache/cassandra/apache-cassandra/3.0.19/
>>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1183/
>>> 
>>> The Debian and RPM packages are available here:
>> http://people.apache.org/~mshuler
>>> 
>>> The vote will be open for 72 hours (longer if needed).
>>> 
>>> [1]: CHANGES.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.19-tentative
>>> [2]: NEWS.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/3.0.19-tentative
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 3.11.5

2019-10-25 Thread Blake Eggleston
+1

> On Oct 25, 2019, at 8:58 AM, Jeff Jirsa  wrote:
> 
> +1
> 
> 
> On Fri, Oct 25, 2019 at 7:18 AM Sam Tunnicliffe  wrote:
> 
>> +1
>> 
>>> On 24 Oct 2019, at 18:26, Michael Shuler  wrote:
>>> 
>>> I propose the following artifacts for release as 3.11.5.
>>> 
>>> sha1: b697af87f8e1b20d22948390d516dba1fbb9eee7
>>> Git:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.5-tentative
>>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1184/org/apache/cassandra/apache-cassandra/3.11.5/
>>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1184/
>>> 
>>> The Debian and RPM packages are available here:
>> http://people.apache.org/~mshuler
>>> 
>>> The vote will be open for 72 hours (longer if needed).
>>> 
>>> [1]: CHANGES.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.5-tentative
>>> [2]: NEWS.txt:
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/3.11.5-tentative
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0-alpha2

2019-10-25 Thread Blake Eggleston
+1

> On Oct 25, 2019, at 8:57 AM, Jeff Jirsa  wrote:
> 
> +1
> 
> 
> On Fri, Oct 25, 2019 at 8:06 AM Jon Haddad  wrote:
> 
>> +1
>> 
>> On Fri, Oct 25, 2019 at 10:18 AM Sam Tunnicliffe  wrote:
>> 
>>> +1
>>> 
 On 24 Oct 2019, at 18:26, Michael Shuler 
>> wrote:
 
 I propose the following artifacts for release as 4.0-alpha2.
 
 sha1: ca928a49c68186bdcd57dea8b10c30991c6a3c55
 Git:
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-alpha2-tentative
 Artifacts:
>>> 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1185/org/apache/cassandra/apache-cassandra/4.0-alpha2/
 Staging repository:
>>> 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1185/
 
 The Debian and RPM packages are available here:
>>> http://people.apache.org/~mshuler
 
 The vote will be open for 72 hours (longer if needed).
 
 [1]: CHANGES.txt:
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-alpha2-tentative
 [2]: NEWS.txt:
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-alpha2-tentative
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org
 
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0-alpha4

2020-04-14 Thread Blake Eggleston
+1

> On Apr 14, 2020, at 5:09 AM, e.dimitr...@gmail.com wrote:
> 
> I also can’t see them. I think it matters to which interface is the link. 
> 
> And +1 from me, thanks! 
> 
>> On 14 Apr 2020, at 7:53, Erick Ramirez  wrote:
>> 
>> 
>>> 
>>> 
 All java8 UTs, jvmdtests and dtests pass
 
>>> https://circleci.com/workflow-run/d7b3f62d-c9ad-43d6-9152-2655e27feccb?signup-404=true
>>> 
>>> 
>>> Is anyone else able to see this^ circleci page?
>>> For me, it never loads, and isn't the first time I've been unable to
>>> see others' circleci results.
>>> 
>> 
>> All that shows up for me is:
>> 
>> Workflows  >>  null  >>  null  >> null
>> 0 jobs in this workflow
>> 
>> 
>> and a spinning widget.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Project governance wiki doc (take 2)

2020-06-22 Thread Blake Eggleston
+1

> On Jun 20, 2020, at 8:12 AM, Joshua McKenzie  wrote:
> 
> Link to doc:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Project+Governance
> 
> Change since previous cancelled vote:
> "A simple majority of this electorate becomes the low-watermark for votes
> in favour necessary to pass a motion, with new PMC members added to the
> calculation."
> 
> This previously read "super majority". We have lowered the low water mark
> to "simple majority" to balance strong consensus against risk of stall due
> to low participation.
> 
> 
>   - Vote will run through 6/24/20
>   - pmc votes considered binding
>   - simple majority of binding participants passes the vote
>   - committer and community votes considered advisory
> 
> Lastly, I propose we take the count of pmc votes in this thread as our
> initial roll call count for electorate numbers and low watermark
> calculation on subsequent votes.
> 
> Thanks again everyone (and specifically Benedict and Jon) for the time and
> collaboration on this.
> 
> ~Josh


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] Future of MVs

2020-06-30 Thread Blake Eggleston
+1 for deprecation and removal (assuming a credible plan to fix them doesn't 
materialize)

> On Jun 30, 2020, at 12:43 PM, Jon Haddad  wrote:
> 
> A couple days ago when writing a separate email I came across this DataStax
> blog post discussing MVs [1].  Imagine my surprise when I noticed the date
> was five years ago...
> 
> While at TLP, I helped numerous customers move off of MVs, mostly because
> they affected stability of clusters in a horrific way.  The most telling
> project involved helping someone create new tables to manage 1GB of data
> because the views performed so poorly they made the cluster unresponsive
> and unusable.  Despite being around for five years, they've seen very
> little improvement that makes them usable for non trivial, non laptop
> workloads.
> 
> Since the original commits, it doesn't look like there's been much work to
> improve them, and they're yet another feature I ended up saying "just don't
> use".  I haven't heard any plans to improve them in any meaningful way -
> either to address their issues with performance or the inability to repair
> them.
> 
> The original contributor of MVs (Carl Yeksigian) seems to have disappeared
> from the project, meaning we have a broken feature without a maintainer,
> and no plans to fix it.
> 
> As we move forward with the 4.0 release, we should consider this an
> opportunity to deprecate materialized views, and remove them in 5.0.  We
> should take this opportunity to learn from the mistake and raise the bar
> for new features to undergo a much more thorough run the wringer before
> merging.
> 
> I'm curious what folks think - am I way off base here?  Am I missing a JIRA
> that can magically fix the issues with performance, availability &
> correctness?
> 
> [1]
> https://www.datastax.com/blog/2015/06/new-cassandra-30-materialized-views
> [2] https://issues.apache.org/jira/browse/CASSANDRA-6477


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Media coordination (was: [VOTE] Release Apache Cassandra 4.0-beta1)

2020-07-20 Thread Blake Eggleston
I don't think Benedict mentioned anything about people's motives or intentions; 
he simply had a concern about how marketing timelines became a factor in a 
release vote without the approval of the PMC. I think this is a reasonable 
concern, and doesn't mean that he's assuming bad intentions. That's my reading 
at least, although maybe I missed something?

> On Jul 20, 2020, at 7:58 AM, Joshua McKenzie  wrote:
> 
>> 
>> If you are criticised, it is often because of the action you took;
> 
> Actually, in this case and many others it's because of people's unfounded
> assumptions about motives, incentives, and actions taken and has little to
> do with reality. Which is the definition of not assuming positive intent.
> 
> On Mon, Jul 20, 2020 at 10:41 AM Benedict Elliott Smith 
> wrote:
> 
>> Thanks Sally, really appreciate your insight.
>> 
>> To respond to the community discourse around this:
>> 
>>> Keep your announcement plans ... private: limit discussions to the PMC
>> 
>> This is all that I was asking and expecting: if somebody is making
>> commitments on behalf of the community (such as that a release can be
>> expected on day X), this should be coordinated with the PMC.  While it
>> seems to transpire that no such commitments were made, had they been made
>> without the knowledge of the PMC this would in my view be problematic.
>> This is not at all like development work, as has been alleged, since that
>> only takes effect after public agreement by the community.
>> 
>> IMO, in general, public engagements should be run past the PMC as a final
>> pre-flight check regardless of any commitment being made, as the PMC should
>> have visibility into these activities and have the opportunity to influence
>> them.
>> 
>>> There has been nothing about this internally at DS
>> 
>> I would ask that you refrain from making such claims, unless you can be
>> certain that you would have been privy to all such internal discussions.
>> 
>>> there's really no reason not to assume best intentions here
>> 
>> This is a recurring taking point, that I wish we would retire except where
>> a clear assumption of bad faith has been made.  If you are criticised, it
>> is often because of the action you took; any intention you had may be
>> irrelevant to the criticism.  In this case, when you act on behalf of the
>> community, your intentions are insufficient: you must have the community's
>> authority to act.
>> 
>> 
>> On 20/07/2020, 14:00, "Sally Khudairi"  wrote:
>> 
>>Hello everyone --Mick pinged me about this; I wanted to respond
>> on-list for efficacy.
>> 
>>We've had dozens of companies successfully help Apache Projects and
>> their communities help spread the word on their projects with their PR and
>> marketing teams. Here are some best practices:
>> 
>>1) Timing. Ensure that the Project has announced the project milestone
>> first to their lists as well as announce@ before any media coverage takes
>> place. If you're planning to time the announcements to take place in
>> tandem, be careful with embargoes, as not everyone is able to honor them.
>> We've been burned in the past with this.
>> 
>>2) Messaging. Keep your announcement plans and draft press releases,
>> etc., private: limit discussions to the PMC. Drafting announcements on
>> public lists, such as user@, whilst inclusive, may inadvertently expose
>> your news prematurely to the press, bloggers, and others before its ready.
>> This can be detrimental to having your news scooped before you actually
>> announce it, or conversely, having the news come out and nobody is
>> interested in covering it as it's been leaking for a while. We've also been
>> burned in the past with this. Synching messaging is also helpful to ensure
>> that the PMC speaks with a unified voice: the worst thing that can happen
>> is having someone say one thing in the media and another member of the PMC
>> saying something else, even if it's their personal opinion. Fragmentation
>> helps no-one. This recently happened with a Project on a rather
>> controversial topic, so the press was excited to see dissent within the
>> community as it gave them more to report about. Keep things co
>> ol: don't be the feature cover of a gossip tabloid.
>> 
>>3) Positioning. It's critical that whomever is speaking on behalf of
>> the Project identify themselves as such. This means that the PMC needs to
>> have a few spokespeople lined up in case of any media queries, and that the
>> spokespeople supporting the project are from different organizations so you
>> can . I cannot stress enough the need to exhibit diversity, even if
>> everyone working on the media/marketing side is from a single organization
>> --the ASF comes down hard on companies that "own" projects: we take
>> vendor-neutrality very seriously. What's worked well with organizations
>> that have pitched the press on behalf of a project is to pitch the project
>> news, have spokespeople from other organizations speak 

Re: Media coordination (was: [VOTE] Release Apache Cassandra 4.0-beta1)

2020-07-20 Thread Blake Eggleston
Characterizing alternate or conflicting points of view as assuming bad 
intentions without justification is both unproductive and unhealthy for the 
project.

> On Jul 20, 2020, at 9:14 AM, Joshua McKenzie  wrote:
> 
> This kind of back and forth isn't productive for the project so I'm not
> taking this discussion further. Just want to call it out here so you or
> others aren't left waiting for a reply.
> 
> We can agree to disagree.
> 
> On Mon, Jul 20, 2020 at 11:59 AM Benedict Elliott Smith 
> wrote:
> 
>> Firstly, that is a very strong claim that in this particular case is
>> disputed by the facts.  You made a very specific claim that the delay was
>> "risking our currently lined up coordination with journalists and other
>> channels". I am not the only person to interpret this as implying
>> coordination with journalists, contingent on a release schedule not agreed
>> by the PMC.  This was based on semantics only; as far as I can tell, no
>> intentions or assumptions have entered into this debate, except on your
>> part.
>> 
>>> Which is the definition of not assuming positive intent.
>> 
>> Secondly, this is not the definition of positive intent.  Positive intent
>> only indicates that you "mean well"
>> 
>> Thirdly, in many recent disputes about governance, you have made a
>> negative claim about my behaviour, or ascribed negative connotations to
>> statements I have made; this is a very thinly veiled example, as I am
>> clearly the object of this criticism.  I think it has reached a point where
>> I can perhaps legitimately claim that you are not assuming positive intent?
>> 
>>> motives, incentives ... little to do with reality
>> 
>> It feels like we should return to this earlier discussion, since you
>> appear to feel it is incomplete?  At the very least you seem to have taken
>> the wrong message from my statements, and it is perhaps negatively
>> colouring our present interactions.
>> 
>> 
>> On 20/07/2020, 15:59, "Joshua McKenzie"  wrote:
>> 
>>> 
>>> If you are criticised, it is often because of the action you took;
>> 
>>Actually, in this case and many others it's because of people's
>> unfounded
>>assumptions about motives, incentives, and actions taken and has
>> little to
>>do with reality. Which is the definition of not assuming positive
>> intent.
>> 
>>On Mon, Jul 20, 2020 at 10:41 AM Benedict Elliott Smith <
>> bened...@apache.org>
>>wrote:
>> 
>>> Thanks Sally, really appreciate your insight.
>>> 
>>> To respond to the community discourse around this:
>>> 
 Keep your announcement plans ... private: limit discussions to the
>> PMC
>>> 
>>> This is all that I was asking and expecting: if somebody is making
>>> commitments on behalf of the community (such as that a release can be
>>> expected on day X), this should be coordinated with the PMC.  While
>> it
>>> seems to transpire that no such commitments were made, had they been
>> made
>>> without the knowledge of the PMC this would in my view be
>> problematic.
>>> This is not at all like development work, as has been alleged, since
>> that
>>> only takes effect after public agreement by the community.
>>> 
>>> IMO, in general, public engagements should be run past the PMC as a
>> final
>>> pre-flight check regardless of any commitment being made, as the PMC
>> should
>>> have visibility into these activities and have the opportunity to
>> influence
>>> them.
>>> 
 There has been nothing about this internally at DS
>>> 
>>> I would ask that you refrain from making such claims, unless you can
>> be
>>> certain that you would have been privy to all such internal
>> discussions.
>>> 
 there's really no reason not to assume best intentions here
>>> 
>>> This is a recurring taking point, that I wish we would retire except
>> where
>>> a clear assumption of bad faith has been made.  If you are
>> criticised, it
>>> is often because of the action you took; any intention you had may be
>>> irrelevant to the criticism.  In this case, when you act on behalf
>> of the
>>> community, your intentions are insufficient: you must have the
>> community's
>>> authority to act.
>>> 
>>> 
>>> On 20/07/2020, 14:00, "Sally Khudairi"  wrote:
>>> 
>>>Hello everyone --Mick pinged me about this; I wanted to respond
>>> on-list for efficacy.
>>> 
>>>We've had dozens of companies successfully help Apache Projects
>> and
>>> their communities help spread the word on their projects with their
>> PR and
>>> marketing teams. Here are some best practices:
>>> 
>>>1) Timing. Ensure that the Project has announced the project
>> milestone
>>> first to their lists as well as announce@ before any media coverage
>> takes
>>> place. If you're planning to time the announcements to take place in
>>> tandem, be careful with embargoes, as not everyone is able to honor
>> them.
>>> We've been burned in the past with this.
>>> 
>>>2) Messaging. Keep your announcement plans and draft press
>> releases,
>>> etc., private: limit disc

Re: [VOTE] Release Apache Cassandra 4.0-beta1 (take2)

2020-07-20 Thread Blake Eggleston
+1

> On Jul 20, 2020, at 9:56 AM, Jon Haddad  wrote:
> 
> +1, thanks Mick for rerolling.
> 
> On Mon, Jul 20, 2020 at 6:42 AM Joshua McKenzie 
> wrote:
> 
>> +1
>> 
>> On Mon, Jul 20, 2020 at 8:51 AM Jake Luciani  wrote:
>> 
>>> +1
>>> 
>>> On Mon, Jul 20, 2020 at 8:08 AM Andrés de la Peña <
>>> a.penya.gar...@gmail.com>
>>> wrote:
>>> 
 +1 (nb)
 
 On Mon, 20 Jul 2020 at 12:58, João Reis 
>> wrote:
 
> +1 (nb)
> 
> The drivers smoke test suite looks good:
> 
> 
> 
 
>>> 
>> https://ci.appveyor.com/project/DataStax/cassandra-drivers-smoke-test/builds/34194004
> 
> Mick Semb Wever  escreveu no dia sábado, 18/07/2020
>>> à(s)
> 00:27:
> 
>> Proposing the test build of Cassandra 4.0-beta1 for release.
>> 
>> sha1: 972da6fcffa87b3a1684362a2bab97db853372d8
>> Git:
>> 
>> 
> 
 
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-beta1-tentative
>> Maven Artifacts:
>> 
>> 
> 
 
>>> 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1211/org/apache/cassandra/cassandra-all/4.0-beta1/
>> 
>> The Source and Build Artifacts, and the Debian and RPM packages and
>> repositories, are available here:
>> https://dist.apache.org/repos/dist/dev/cassandra/4.0-beta1/
>> 
>> The vote will be open for 60 hours (longer if needed). I've taken
>> 12
> hours
>> off the normal 72 hours and this follows closely after the initial
>> 4.0-beta1 vote. Everyone who has tested the build is invited to
>> vote.
> Votes
>> by PMC members are considered binding. A vote passes if there are
>> at
> least
>> three binding +1s and no -1s.
>> 
>> Eventual publishing and announcement of the 4.0-beta1 release will
>> be
>> coordinated, as described in
>> 
>> 
> 
 
>>> 
>> https://lists.apache.org/thread.html/r537fe799e7d5e6d72ac791fdbe9098ef0344c55400c7f68ff65abe51%40%3Cdev.cassandra.apache.org%3E
>> 
>> [1]: CHANGES.txt:
>> 
>> 
> 
 
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-beta1-tentative
>> [2]: NEWS.txt:
>> 
>> 
> 
 
>>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-beta1-tentative
>> 
> 
 
>>> 
>>> 
>>> --
>>> http://twitter.com/tjake
>>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Check in on CASSANDRA-15393

2020-08-27 Thread Blake Eggleston
Hi dev@,

Mick asked that I check in w/ the dev list about CASSANDRA-15393. There's some 
concern regarding the patch and its suitability for inclusion in 4.0-beta.

CASSANDRA-15393 reduces garbage created by compaction and the read paths by 
about 25%. It's part of CASSANDRA-15387, which, including this patch, reduces 
garbage from the read and compaction paths by about 50%. CASSANDRA-15393 does 
this by supporting byte array backed cell and clustering types, which is 
acheived by abstracting the backing type (ByteBuffer/byte[]) from the 
serialization logic. 

To avoid paying the allocation cost of adding a container object, singleton 
"accessor" objects are used to operate on the actual data. See here for an 
example: https://gist.github.com/bdeggleston/52910225b817a8d54353125ca03f521d

Mick and Robert Stupp have raised a few concerns, summarized below:

1. The patch is large (208 files / ~3.5k LOC)
2. Concerns about impact on stability
3. Parameterizing cell/clustering value types in this way makes 
ClassCastExceptions possible.
4. implications of feature freeze

The patch is large, but the vast majority of it is adding type parameters to 
things. The changes here are wide, but not deep. The most complex parts are the 
collection serializers and other places where we're now having to do offset 
bookkeeping. These should be carefully reviewed, but they shouldn't be too 
difficult to verify and I've added some randomized tests to check them against 
a wide range of schemas. I'll also run some diff tests against clusters 
internally.

Parameterizing cell and clustering values does make ClassCastExceptions 
possible, but java's type system guards against this for the most part. 
Regarding the feature freeze, I don't think it applies to performance 
improvements.

Back to the point about stability though: in practice, compaction gc is a major 
contributor to cluster instability. In my experience, about 30% of availability 
issues are gc related. Also, compaction gc tends to be the limiting factor for 
repair, host replacements, and other topology changes, which limits how quickly 
you can recover from other issues. So the patch does add some risk, but I think 
it's a net win for stability.

Thoughts?
-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Check in on CASSANDRA-15393

2020-08-27 Thread Blake Eggleston
Caleb is currently working through his second round of review, and Marcus said 
he's about halfway through his review as of this morning. So I'd expect it to 
be committed within a week or so.

> On Aug 27, 2020, at 5:09 PM, Joshua McKenzie  wrote:
> 
> Is there an ETA on them landing? The later, the more risk to stability of
> GA due to lack of time soaking.
> 
> On Thu, Aug 27, 2020 at 4:01 PM Blake Eggleston
>  wrote:
> 
>> Hi dev@,
>> 
>> Mick asked that I check in w/ the dev list about CASSANDRA-15393. There's
>> some concern regarding the patch and it's suitability for inclusion in
>> 4.0-beta.
>> 
>> CASSANDRA-15393 reduces garbage created by compaction and the read paths
>> by about 25%. It's part of CASSANDRA-15387, which, including this patch,
>> reduces garbage from the read and compaction paths by about 50%.
>> CASSANDRA-15393 does this by supporting byte array backed cell and
>> clustering types, which is achieved by abstracting the backing type
>> (ByteBuffer/byte[]) from the serialization logic.
>> 
>> To avoid paying the allocation cost of adding a container object,
>> singleton "accessor" objects are used to operate on the actual data. See
>> here for an example:
>> https://gist.github.com/bdeggleston/52910225b817a8d54353125ca03f521d
>> 
>> Mick and Robert Stupp have raised a few concerns, summarized below:
>> 
>> 1. The patch is large (208 files / ~3.5k LOC)
>> 2. Concerns about impact on stability
>> 3. Parameterizing cell/clustering value types in this way makes
>> ClassCastExceptions possible.
>> 4. implications of feature freeze
>> 
>> The patch is large, but the vast majority of it is adding type parameters
>> to things. The changes here are wide, but not deep. The most complex parts
>> are the collection serializers and other places where we're now having to
>> do offset bookkeeping. These should be carefully reviewed, but they
>> shouldn't be too difficult to verify and I've added some randomized tests
>> to check them against a wide range of schemas. I'll also run some diff
>> tests against clusters internally.
>> 
>> Parameterizing cell and clustering values does make ClassCastExceptions
>> possible, but java's type system guards against this for the most part.
>> Regarding the feature freeze, I don't think it applies to performance
>> improvements.
>> 
>> Back to the point about stability though: in practice, compaction gc is a
>> major contributor to cluster instability. In my experience, about 30% of
>> availability issues are gc related. Also, compaction gc tends to be the
>> limiting factor for repair, host replacements, and other topology changes,
>> which limits how quickly you can recover from other issues. So the patch
>> does add some risk, but I think it's a net win for stability.
>> 
>> Thoughts?
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 3.11.8

2020-08-28 Thread Blake Eggleston
+1

> On Aug 28, 2020, at 6:37 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 3.11.8 for release.
> 
> sha1: 8b29b698630960a0ebb2c695cc5b21dee4686d09
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.8-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1217/org/apache/cassandra/cassandra-all/3.11.8/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/3.11.8/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.8-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/3.11.8-tentative





Re: [VOTE] Release Apache Cassandra 2.2.18

2020-08-28 Thread Blake Eggleston
+1

> On Aug 28, 2020, at 5:44 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 2.2.18 for release.
> 
> sha1: d4938cf4e488a9ef3ac48164a3e946f16255d721
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.18-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1215/org/apache/cassandra/cassandra-all/2.2.18/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/2.2.18/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.18-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/2.2.18-tentative





Re: [VOTE] Release Apache Cassandra 2.1.22

2020-08-28 Thread Blake Eggleston
+1

> On Aug 28, 2020, at 8:55 AM, Jeff Jirsa  wrote:
> 
> +1
> 
> 
> On Fri, Aug 28, 2020 at 8:42 AM Mick Semb Wever  wrote:
> 
>> Proposing the test build of Cassandra 2.1.22 for release.
>> 
>> sha1: 94e9149c22f6a7772c0015e1b1ef2e2961155c0a
>> Git:
>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.22-tentative
>> Maven Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1214
>> /org/apache/cassandra/cassandra-all/2.1.22/
>> 
>> 
>> The Source and Build Artifacts, and the Debian and RPM packages and
>> repositories, are available here:
>> https://dist.apache.org/repos/dist/dev/cassandra/2.1.22/
>> 
>> The vote will be open for 72 hours (longer if needed). Everyone who has
>> tested the build is invited to vote. Votes by PMC members are considered
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>> 
>> [1]: CHANGES.txt:
>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.1.22-tentative
>> [2]: NEWS.txt:
>> 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/2.1.22-tentative
>> 





Re: [VOTE] Release Apache Cassandra 3.0.22

2020-08-28 Thread Blake Eggleston
+1

> On Aug 28, 2020, at 6:09 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 3.0.22 for release.
> 
> sha1: 45331bb612dc7847efece7e26cdd0b376bd11249
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.22-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1216/org/apache/cassandra/cassandra-all/3.0.22/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/3.0.22/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.22-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/3.0.22-tentative





Re: [VOTE] Release Apache Cassandra 4.0-beta2

2020-08-28 Thread Blake Eggleston
+1

> On Aug 28, 2020, at 7:18 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 4.0-beta2 for release.
> 
> sha1: 56eadf2004399a80f0733041cacf03839832249a
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-beta2-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1218/org/apache/cassandra/cassandra-all/4.0-beta2/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/4.0-beta2/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-beta2-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-beta2-tentative





Re: [DISCUSS] Change style guide to recommend use of @Override

2020-09-02 Thread Blake Eggleston
+1

> On Sep 1, 2020, at 11:27 AM, David Capwell  wrote:
> 
> Currently our style guide recommends to avoid using @Override and updates
> intellij's code style to exclude it by default; I would like to propose we
> change this recommendation to use it and to update intellij's style to
> include it by default.
> 
> @Override is used by javac to enforce that a method is in fact overriding
> from an abstract class or an interface and if this stops being true (such
> as a refactor happens) then a compiler error is thrown; when we default to
> excluding, it makes it harder to detect that a refactor catches all
> implementations and can lead to subtle and hard to track down bugs.
> 
> This proposal is for new code and would not be to go rewrite all code at
> once, but would recommend new code adopt this style, and to pull old code
> forward which is related to changes being made (similar to our stance on
> imports).
> 
> If people are ok with this, I will file a JIRA, update the docs, and
> update intellij's formatting.
> 
> Thanks for your time!
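
A minimal illustration of the refactoring hazard described above, with made-up
class names (nothing from the actual codebase). Without @Override, changing the
parent method's signature leaves the subclass method compiling happily as an
unrelated overload, and callers silently fall back to the parent behaviour:

    // Originally declared as: void handle(String message)
    // Later refactored to take an extra parameter.
    abstract class Handler
    {
        void handle(String message, boolean verbose)
        {
            System.out.println("default handler: " + message);
        }
    }

    class LoggingHandler extends Handler
    {
        // Still compiles after the refactor, but no longer overrides anything.
        // With @Override here, javac would fail the build and flag the mismatch.
        void handle(String message)
        {
            System.out.println("logging handler: " + message);
        }
    }

    public class OverrideExample
    {
        public static void main(String[] args)
        {
            Handler h = new LoggingHandler();
            h.handle("hello", false); // prints "default handler: hello", probably not what was intended
        }
    }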





Re: [VOTE] Accept the Harry donation

2020-09-17 Thread Blake Eggleston
+1

> On Sep 16, 2020, at 2:45 AM, Mick Semb Wever  wrote:
> 
> This vote is about officially accepting the Harry donation from Alex Petrov
> and Benedict Elliott Smith, that was worked on in CASSANDRA-15348.
> 
> The Incubator IP Clearance has been filled out at
> http://incubator.apache.org/ip-clearance/apache-cassandra-harry.html
> 
> This vote is a required part of the IP Clearance process. It follows the
> same voting rules as releases, i.e. from the PMC a minimum of three +1s and
> no -1s.
> 
> Please cast your votes:
>   [ ] +1 Accept the contribution into Cassandra
>   [ ] -1 Do not





Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-20 Thread Blake Eggleston
I’d also prefer #3 over #4

> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith  
> wrote:
> 
> Well, I expressed a preference for #3 over #4, particularly for the 3.x 
> series.  However at this point, I think the lack of a clear project decision 
> means we can punt it back to you and Sylvain to make the final call.
> 
> On 20/11/2020, 16:23, "Benjamin Lerer"  wrote:
> 
>I will try to summarize the discussion to clarify the outcome.
> 
>Mick is in favor of #4
>Summanth is in favor of #4
>Sylvain answer was not clear for me. I understood it like I prefer #3 to #4
>and I am also fine with #1
>Jeff is in favor of #3 and will understand #4
>David is in favor #3 (fix bug and add flag to roll back to old behavior) in
>4.0 and #4 in 3.0 and 3.11
> 
>Do not hesitate to correct me if I misunderstood your answer.
> 
>Based on these answers it seems clear that most people prefer to go for #3
>or #4.
> 
>The choice between #3 (fix correctness opt-in to current behavior) and #4
>(current behavior opt-in to correctness) is a bit less clear specially if
>we consider the 3.X branches or 4.0.
> 
>Does anybody as some idea on how to choose between those 2 choices or some
>extra opinions on #3 versus #4?
> 
> 
> 
> 
> 
> 
>>On Wed, Nov 18, 2020 at 9:45 PM David Capwell  wrote:
>> 
>> I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
>> 
>> About the alternative implementation, I am fine adding it to 3.x and 4.0,
>> but should treat it as a different path disabled by default that you can
>> opt-into, with a plan to opt-in by default "eventually".
>> 
>> On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <
>> bened...@apache.org>
>> wrote:
>> 
>>> Perhaps there might be broader appetite to weigh in on which major
>>> releases we might target for work that fixes the correctness bug without
>>> serious performance regression?
>>> 
>>> i.e., if we were to fix the correctness bug now, introducing a serious
>>> performance regression (either opt-in or opt-out), but were to land work
>>> without this problem for 5.0, would there be appetite to backport this
>> work
>>> to any of 4.0, 3.11 or 3.0?
>>> 
>>> 
>>> On 18/11/2020, 18:31, "Jeff Jirsa"  wrote:
>>> 
>>>This is complicated and relatively few people on earth understand it,
>>> so
>>>having little feedback is mostly expected, unfortunately.
>>> 
>>>My normal emotional response is "correctness is required, opt-in to
>>>performance improvements that sacrifice strict correctness", but I'm
>>> also
>>>sure this is going to surprise people, and would understand / accept
>> #4
>>>(default to current, opt-in to correct).
>>> 
>>> 
>>>On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <
>>> bened...@apache.org>
>>>wrote:
>>> 
 It doesn't seem like there's much enthusiasm for any of the options
 available here...
 
 On 12/11/2020, 14:37, "Benedict Elliott Smith" <
>> bened...@apache.org
 
 wrote:
 
> Is the new implementation a separate, distinctly modularized
>>> new
 body of work
 
It’s primarily a distinct, modularised and new body of work,
>>> however
 there is some shared code that has been modified - namely
>>> PaxosState, in
 which legacy code is maintained but modified for compatibility, and
>>> the
 system.paxos table (which receives a new column, and slightly
>>> modified
 serialization code).  It is conceptually an optimised version of
>> the
 existing algorithm.
 
If there's a chance of being of value to 4.0, I can try to put
>>> up a
 patch next week alongside a high level description of the changes.
 
> But a performance regression is a regression, I'm not
>>> shrugging it
 off.
 
I don't want to give the impression I'm shrugging off the
>>> correctness
 issue either. It's a serious issue to fix, but since all successful
>>> updates
 to the database are linearizable, I think it's likely that many
 applications behave correctly with the present semantics, or at
>> least
 encounter only transient errors. No doubt many also do not, but I
>>> have no
 idea of the ratio.
 
The regression isn't itself a simple issue either - depending
>> on
>>> the
 topology and message latencies it is not difficult to produce
>>> inescapable
 contention, i.e. guaranteed timeouts - that might persist as long
>> as
 clients continue to retry. It could be quite a serious degradation
>> of
 service to impose on our users.
 
I don't pretend to know the correct way to make a decision
>>> balancing
 these considerations, but I am perhaps more concerned about
>> imposing
 service outages than I am temporarily maintaining semantics our
>>> users have
 apparently accepted for years - though I absolutely share your
 embarrassment there.
 
 
On 12/11/2020, 12:41, "Jos

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-23 Thread Blake Eggleston
+1 to correctness, and I like the yaml idea

> On Nov 23, 2020, at 4:20 AM, Paulo Motta  wrote:
> 
> +1 to defaulting for correctness.
> 
> In addition to that, how about making it a mandatory cassandra.yaml
> property defaulting to correctness? This would make upgrades with an old
> cassandra.yaml fail unless an option is explicitly specified, making
> operators aware of the issue and forcing them to make a choice.
> 
>> Em seg., 23 de nov. de 2020 às 07:30, Benjamin Lerer <
>> benjamin.le...@datastax.com> escreveu:
>> 
>> Thank you very much to everybody that provided feedback. It helped a lot to
>> limit our options.
>> 
>> Unfortunately, it seems that some poor soul (me, really!!!) will have to
>> make the final call between #3 and #4.
>> 
>> If I reformulate the question to: Do we default to *correctness *or to
>> *performance*?
>> 
>> I would choose to default to *correctness*.
>> 
>> Of course the situation is more complex than that but it seems that
>> somebody has to make a call and live with it. It seems to me that being
>> blamed for choosing correctness is easier to live with ;-)
>> 
>> Benjamin
>> 
>> PS: I tried to push the choice on Sylvain but he dodged the bullet.
>> 
>> On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith <
>> bened...@apache.org>
>> wrote:
>> 
>>> I think I meant #4 🤷‍♂️
>>> 
>>> On 20/11/2020, 21:11, "Blake Eggleston" 
>>> wrote:
>>> 
>>>I’d also prefer #3 over #4
>>> 
>>>> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>> 
>>>> Well, I expressed a preference for #3 over #4, particularly for
>> the
>>> 3.x series.  However at this point, I think the lack of a clear project
>>> decision means we can punt it back to you and Sylvain to make the final
>>> call.
>>>> 
>>>> On 20/11/2020, 16:23, "Benjamin Lerer" <
>> benjamin.le...@datastax.com>
>>> wrote:
>>>> 
>>>>   I will try to summarize the discussion to clarify the outcome.
>>>> 
>>>>   Mick is in favor of #4
>>>>   Summanth is in favor of #4
>>>>   Sylvain answer was not clear for me. I understood it like I
>>> prefer #3 to #4
>>>>   and I am also fine with #1
>>>>   Jeff is in favor of #3 and will understand #4
>>>>   David is in favor #3 (fix bug and add flag to roll back to old
>>> behavior) in
>>>>   4.0 and #4 in 3.0 and 3.11
>>>> 
>>>>   Do not hesitate to correct me if I misunderstood your answer.
>>>> 
>>>>   Based on these answers it seems clear that most people prefer to
>>> go for #3
>>>>   or #4.
>>>> 
>>>>   The choice between #3 (fix correctness opt-in to current
>>> behavior) and #4
>>>>   (current behavior opt-in to correctness) is a bit less clear
>>> specially if
>>>>   we consider the 3.X branches or 4.0.
>>>> 
>>>>   Does anybody as some idea on how to choose between those 2
>>> choices or some
>>>>   extra opinions on #3 versus #4?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>>   On Wed, Nov 18, 2020 at 9:45 PM David Capwell <
>>> dcapw...@gmail.com> wrote:
>>>>> 
>>>>> I feel that #4 (fix bug and add flag to roll back to old behavior)
>>> is best.
>>>>> 
>>>>> About the alternative implementation, I am fine adding it to 3.x
>>> and 4.0,
>>>>> but should treat it as a different path disabled by default that
>>> you can
>>>>> opt-into, with a plan to opt-in by default "eventually".
>>>>> 
>>>>> On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <
>>>>> bened...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Perhaps there might be broader appetite to weigh in on which
>> major
>>>>>> releases we might target for work that fixes the correctness bug
>>> without
>>>>>> serious performance regression?
>>>>>> 
>>>>>> i.e., if we were to fix the correctness bug now, introducing a
>>> serious
>>>>>> performance regression (either opt-in or opt-out), but were to
>>> land work
>>>>>>

Re: [VOTE] Release dtest-api 0.0.7

2020-12-03 Thread Blake Eggleston
Proposing the test build of in-jvm dtest API 0.0.7 for release.

Repository: https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.7

Candidate SHA: https://github.com/apache/cassandra-in-jvm-dtest-api/commit/d5174b1f44b7d9cb919d4975b4d437041273c09c
tagged with 0.0.7
Artifact: https://repository.apache.org/content/repositories/orgapachecassandra-1225/org/apache/cassandra/dtest-api/0.0.7/

Key signature: 9E66CEC6106D578D0B1EB9BFF1000962B7F6840C

Changes since last release:

  * CASSANDRA-16136: Add Metrics to instance API
  * CASSANDRA-16272: Nodetool assert apis do not include the new
stdout and stderr in the failure message

The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.

-- Alex


Re: [VOTE] Release dtest-api 0.0.7

2020-12-03 Thread Blake Eggleston
+1, sorry for the html barf :)

> On Dec 3, 2020, at 9:53 AM, Blake Eggleston  
> wrote:
> 
> Proposing the test build of in-jvm dtest API 0.0.7 for 
> release.
> 
> Repository: https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.7
> 
> Candidate SHA: https://github.com/apache/cassandra-in-jvm-dtest-api/commit/d5174b1f44b7d9cb919d4975b4d437041273c09c
> tagged with 0.0.7
> Artifact: https://repository.apache.org/content/repositories/orgapachecassandra-1225/org/apache/cassandra/dtest-api/0.0.7/
> 
> Key signature: 9E66CEC6106D578D0B1EB9BFF1000962B7F6840C
> 
> Changes since last release:
> 
>   * CASSANDRA-16136: Add Metrics to instance API
>   * CASSANDRA-16272: Nodetool assert apis do not include the new
> stdout and stderr in the failure message
> 
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.
> 
> -- Alex
> 





Re: [VOTE] Release Apache Cassandra 2.2.0-rc2

2015-07-08 Thread Blake Eggleston
-1. I've found some problems with 2.2 commit log replay in
https://issues.apache.org/jira/browse/CASSANDRA-9749 that could lose data
in some situations.


On Wed, Jul 8, 2015 at 7:19 AM Michael Shuler 
wrote:

> +1 non-binding
>
> On 07/06/2015 01:47 PM, Jake Luciani wrote:
> > I propose the following artifacts for release as 2.2.0-rc2.
> >
> > sha1: ebc50d783505854f04f183297ad3009b9095b07e
> > Git:
> >
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.0-rc2-tentative
> > Artifacts:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1065/org/apache/cassandra/apache-cassandra/2.2.0-rc2/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1065/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~jake
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: http://goo.gl/C1QdHh (CHANGES.txt)
> > [2]: http://goo.gl/NPABEq (NEWS.txt)
> >
>
>


CASSANDRA-9143

2016-08-24 Thread Blake Eggleston
Hi everyone,

I just posted a proposed solution to some issues with incremental repair in 
CASSANDRA-9143. The solution involves non-trivial changes to the way 
incremental repair works, so I’m giving it a shout out on the dev list in the 
spirit of increasing the flow of information here.

Summary of problem:

Anticompaction excludes sstables that have been, or are, compacting. 
Anticompactions can also fail on a single machine due to any number of reasons. 
In either of these scenarios, a potentially large amount of data will be marked 
as unrepaired on one machine that’s marked as repaired on the others. During 
the next incremental repair, this potentially large amount of data will be 
unnecessarily streamed out to the other nodes, because it won’t be in their 
unrepaired data.

Proposed solution:

Add a ‘pending repair’ bucket to the existing repaired and unrepaired sstable 
buckets. We do the anticompaction up front, but put the anticompacted data into 
the pending bucket. From here, the repair proceeds normally against the pending 
sstables, with the streamed sstables also going into the pending buckets. Once 
all nodes have completed streaming, the pending sstables are moved into the 
repaired bucket, or back into unrepaired if there’s a failure.
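
To make the lifecycle a bit more concrete, here's a rough sketch of the state
transitions I have in mind. The names and structure are illustrative only, not
what the actual patch will look like:

    import java.util.UUID;

    enum RepairedState { UNREPAIRED, PENDING, REPAIRED }

    class SSTableRepairStatus
    {
        private RepairedState state = RepairedState.UNREPAIRED;
        private UUID pendingSession = null;

        // Anticompaction happens up front: the sstables move into the pending
        // bucket for the session doing the repair.
        void startIncrementalRepair(UUID sessionId)
        {
            state = RepairedState.PENDING;
            pendingSession = sessionId;
        }

        // All participants finished streaming: promote to repaired.
        void sessionCompleted(UUID sessionId)
        {
            if (state == RepairedState.PENDING && sessionId.equals(pendingSession))
            {
                state = RepairedState.REPAIRED;
                pendingSession = null;
            }
        }

        // Any participant failed: fall back to unrepaired, rather than leaving
        // one node's repaired set out of sync with the others.
        void sessionFailed(UUID sessionId)
        {
            if (state == RepairedState.PENDING && sessionId.equals(pendingSession))
            {
                state = RepairedState.UNREPAIRED;
                pendingSession = null;
            }
        }
    }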

- Blake

Re: CASSANDRA-9143

2016-08-24 Thread Blake Eggleston
Agreed, I’d rather discuss the details on JIRA. It might be nice to send 
another email describing whatever conclusion we come to, after we have 
everything hashed out.


> On Aug 24, 2016, at 4:09 PM, Paulo Motta  wrote:
> 
> Thanks for sharing this! I added some comments/suggestions on the ticket
> for those interested.
> 
> On a side note, it's still not clear if we should do the discussion here on
> the dev-list or just call attention for a particular issue/ticket and then
> continue discussion on JIRA, but I find the latter more appropriate to
> avoid spamming those not interested, and only update here if there are new
> developments in the ticket direction.
> 
> 2016-08-24 18:35 GMT-03:00 Blake Eggleston :
> 
>> Hi everyone,
>> 
>> I just posted a proposed solution to some issues with incremental repair
>> in CASSANDRA-9143. The solution involves non-trivial changes to the way
>> incremental repair works, so I’m giving it a shout out on the dev list in
>> the spirit of increasing the flow of information here.
>> 
>> Summary of problem:
>> 
>> Anticompaction excludes sstables that have been, or are, compacting.
>> Anticompactions can also fail on a single machine due to any number of
>> reasons. In either of these scenarios, a potentially large amount of data
>> will be marked as unrepaired on one machine that’s marked as repaired on
>> the others. During the next incremental repair, this potentially large
>> amount of data will be unnecessarily streamed out to the other nodes,
>> because it won’t be in their unrepaired data.
>> 
>> Proposed solution:
>> 
>> Add a ‘pending repair’ bucket to the existing repaired and unrepaired
>> sstable buckets. We do the anticompaction up front, but put the
>> anticompacted data into the pending bucket. From here, the repair proceeds
>> normally against the pending sstables, with the streamed sstables also
>> going into the pending buckets. Once all nodes have completed streaming,
>> the pending sstables are moved into the repaired bucket, or back into
>> unrepaired if there’s a failure.
>> 
>> - Blake



Re: Proposal - 3.5.1

2016-09-16 Thread Blake Eggleston
 I'm not even sure it's reasonable to 
expect from *any* software, and even less so for an open-source 
project based on volunteering. Not saying it wouldn't be amazing, it 
would, I just don't believe it's realistic.

Postgres does a pretty good job of this. This sort of thinking is a self-fulfilling 
prophecy imo. Clearly, we won’t get to this point right away, but it 
should definitely be a goal.

On September 16, 2016 at 9:04:03 AM, Sylvain Lebresne (sylv...@datastax.com) 
wrote:

On Fri, Sep 16, 2016 at 5:18 PM, Jonathan Haddad  wrote:  

>  
> This is a different mentality from having a "features" branch, where it's  
> implied that at times it's acceptable that it not be stable.  


I absolutely never implied that, though I willingly admit my choice of  
branch  
names may be to blame. I 100% agree that no releases should be done  
without a green test board moving forward and if something was implicit  
in my 'feature' branch proposal, it was that.  

Where we might not be in the same page is that I just don't believe it's  
reasonable to expect the project will get any time soon in a state where  
even a green test board release (with new features) meets the "can be  
confidently put into production". I'm not even sure it's reasonable to  
expect from *any* software, and even less so for an open-source  
project based on volunteering. Not saying it wouldn't be amazing, it  
would, I just don't believe it's realistic. In a way, the reason why I think  
tick-tock doesn't work is *exactly* because it's based on that unrealistic  
assumption.  

Of course, I suppose that's kind of my opinion. I'm sure some will think  
that the "historical trend" of release instability is simply due to a lack  
of  
effort (obviously Cassandra developers don't give a shit about users, that  
must be the simplest explanation).  


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread Blake Eggleston
Introducing all of these in a single release seems pretty risky. I think it 
would be safer to spread these out over a few 4.x releases (as they’re 
finished) and give them time to stabilize before including them in an LTS 
release. The downside would be having to maintain backwards compatibility 
across the 4.x versions, but that seems preferable to delaying the release of 
4.0 to include these, and having another big bang release.

The other problem here is uncertainty about the frequency and length of support 
of so called LTS releases. There was a thread about getting off of tick-tock a 
while ago, but it died without coming to any kind of conclusion. Personally, 
I’d like to see us do (non-dev) releases every 6 months, support them for 1 
year, and critical fixes for 2 years.


On November 18, 2016 at 10:25:31 AM, Jason Brown (jasedbr...@gmail.com) wrote:

Hey all,  

Here's an update on the following items:  

NIO messaging/streaming - first is being reviewed; second is getting close  
to review time  
Gossip 2.0 - TL;DR I don't plan on moving cluster metadata (the current  
"gossip" data) onto the new gossip/membership stack until 5.0, so it's not  
a 4.0 blocker. I'll update #12345 with the details I'm thinking about. I  
still want to start getting this code in, though, or at least in discussion.  
Birch - on track  
#11559 (enhanced node representation) - decided it's *not* something we  
need wrt #7544 storage port configurable per node, so we are punting on  
#11559  
#6246 epaxos - if we're targeting Q1 2017 for 4.0, we probably can't get it  
ready by then  
#7544 storage port configurable per node - on track  

So basically, I've removed two items off that list of blockers for 4.0.  
Hope that helps  

-Jason  



On Fri, Nov 18, 2016 at 9:25 AM, sankalp kohli   
wrote:  

> Hi Nate,  
> Most of the JIRAs in the middle are being rebased or being  
> reviewed and code is already out there. These will make 4.0 a very solid  
> release.  
>  
> Thanks,  
> Sankalp  
>  
> On Thu, Nov 17, 2016 at 5:10 PM, Ben Bromhead  wrote:  
>  
> > We are happy to start testing against completed features. Ideally once  
> > everything is ready for an RC (to catch interaction bugs), but we can do  
> > sooner for features where it make sense and are finished earlier.  
> >  
> > On Thu, 17 Nov 2016 at 16:47 Nate McCall  wrote:  
> >  
> > > To sum up that other thread (I very much appreciate everyone's input,  
> > > btw), here is an aggregate list of large, breaking 4.0 proposed  
> > > changes:  
> > >  
> > > CASSANDRA-9425 Immutable node-local schema  
> > > CASSANDRA-10699 Strongly consistent schema alterations  
> > > --  
> > > CASSANDRA-12229 NIO streaming  
> > > CASSANDRA-8457 NIO messaging  
> > > CASSANDRA-12345 Gossip 2.0  
> > > CASSANDRA-9754 Birch trees  
> > > CASSANDRA-11559 enhanced node representation  
> > > CASSANDRA-6246 epaxos  
> > > CASSANDRA-7544 storage port configurable per node  
> > > --  
> > > CASSANDRA-5 remove thrift support  
> > > CASSANDRA-10857 dropping compact storage  
> > >  
> > > Again, this is the "big things that will probably break stuff" list  
> > > and thus should happen with a major (did I miss anything?). There  
> > > were/are/will be other smaller issues, but we don't really need to  
> > > keep them in front of us for this discussion as they can/will just  
> > > kind of happen w/o necessarily affecting anything else.  
> > >  
> > > That all said, since we are 'doing a software' we need to start  
> > > thinking about the above in balance with resources and time. However,  
> > > a lot of the above items do have a substantial amount of code written  
> > > against them so it's not as daunting as it seems.  
> > >  
> > > What I would like us to discuss is rough timelines and what is needed  
> > > to get these out the door.  
> > >  
> > > One thing that sticks out to me: that big chunk in the middle there is  
> > > coming out of the same shop in Cupertino. I'm nervous about that. Not  
> > > that that ya'll are not capable, I'm solely looking at it from the  
> > > "that is a big list of some pretty hard shit" perspective.  
> > >  
> > > So what else do we need to discuss to get these completed? How and  
> > > where can other folks pitch in?  
> > >  
> > > -Nate  
> > >  
> > --  
> > Ben Bromhead  
> > CTO | Instaclustr   
> > +1 650 284 9692  
> > Managed Cassandra / Spark on AWS, Azure and Softlayer  
> >  
>  


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread Blake Eggleston
> While stability is important if we push back large "core" changes until later 
> we're just setting ourselves up to face the same issues later on

In theory, yes. In practice, when incomplete features are earmarked for a 
certain release, those features are often rushed out, and not always fully 
baked.

In any case, I don’t think it makes sense to spend too much time planning what 
goes into 4.0, and what goes into the next major release with so many release 
strategy related decisions still up in the air. Are we going to ditch 
tick-tock? If so, what will its replacement look like? Specifically, when will 
the next “production” release happen? Without knowing that, it's hard to say if 
something should go in 4.0, or 4.5, or 5.0, or whatever.

The reason I suggested a production release every 6 months is because (in my 
mind) it’s frequent enough that people won’t be tempted to rush features to hit 
a given release, but not so frequent that it’s not practical to support. It 
wouldn’t be the end of the world if some of these tickets didn’t make it into 
4.0, because 4.5 would be fine.

On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com) wrote:

On 18 November 2016 at 18:25, Jason Brown  wrote:  

> #11559 (enhanced node representation) - decided it's *not* something we  
> need wrt #7544 storage port configurable per node, so we are punting on  
>  

#12344 - Forward writes to replacement node with same address during replace  
  
depends on #11559. To be honest I'd say #12344 is pretty important,  
otherwise it makes it difficult to replace nodes without potentially  
requiring client code/configuration changes. It would be nice to get #12344  
in for 4.0. It's marked as an improvement but I'd consider it a bug and  
thus think it could be included in a later minor release.  

Introducing all of these in a single release seems pretty risky. I think it  
> would be safer to spread these out over a few 4.x releases (as they’re  
> finished) and give them time to stabilize before including them in an LTS  
> release. The downside would be having to maintain backwards compatibility  
> across the 4.x versions, but that seems preferable to delaying the release  
> of 4.0 to include these, and having another big bang release.  


I don't think anyone expects 4.0.0 to be stable. It's a major version  
change with lots of new features; in the production world people don't  
normally move to a new major version until it has been out for quite some  
time and several minor releases have passed. Really, most people are only  
migrating to 3.0.x now. While stability is important if we push back large  
"core" changes until later we're just setting ourselves up to face the same  
issues later on. There should be enough uptake on the early releases of 4.0  
from new users to help test and get it to a production-ready state.  


Kurt Greaves  
k...@instaclustr.com  
www.instaclustr.com  


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Blake Eggleston
I think Ed's just using gossip 2.0 as a hypothetical example. His point is that 
we should only commit things when we have a high degree of confidence that they 
work correctly, not with the expectation that they don't.


On November 19, 2016 at 10:52:38 AM, Michael Kjellman 
(mkjell...@internalcircle.com) wrote:

Jason has asked for review and feedback many times. Maybe be constructive and 
review his code instead of just complaining (once again)?  

Sent from my iPhone  

> On Nov 19, 2016, at 1:49 PM, Edward Capriolo  wrote:  
>  
> I would say start with a mindset like 'people will run this in production'  
> not like 'why would you expect this to work'.  
>  
> Now how does this logic affect feature development? Maybe use gossip 2.0 as  
> an example.  
>  
> I will play my given debby downer role. I could imagine 1 or 2 dtests and  
> the logic of 'dont expect it to work' unleash 4.0 onto hordes of nubes with  
> twitter announce of the release let bugs trickle in.  
>  
> One could also do something comprehensive like test on clusters of 2 to  
> 1000 nodes. Test with jepsen to see what happens during partitions, inject  
> things like jvm pauses and account for behaivor. Log convergence times  
> after given events.  
>  
> Take a stand and say look "we engineered and beat the crap out of this  
> feature. I deployed this release feature at my company and eat my dogfood.  
> You are not my crash test dummy."  
>  
>  
>> On Saturday, November 19, 2016, Jeff Jirsa  wrote:  
>>  
>> Any proposal to solve the problem you describe?  
>>  
>> --  
>> Jeff Jirsa  
>>  
>>  
>>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo wrote:  
>>>  
>>> This is especially relevant if people wish to focus on removing things.  
>>>  
>>> For example, gossip 2.0 sounds great, but seems geared toward huge  
>> clusters  
>>> which is not likely a majority of users. For those with a 20 node cluster  
>>> are the indirect benefits woth it?  
>>>  
>>> Also there seems to be a first push to remove things like compact storage  
>>> or thrift. Fine great. But what is the realistic update path for someone.  
>>> If the big players are running 2.1 and maintaining backports, the average  
>>> shop without a dedicated team is going to be stuck saying (great features  
>>> in 4.0 that improve performance, i would probably switch but its not  
>> stable  
>>> and we have that one compact storage cf and who knows what is going to  
>>> happen performance wise when)  
>>>  
>>> We really need to lose this realease wont be stable for 6 minor versions  
>>> concept.  
>>>  
>>> On Saturday, November 19, 2016, Edward Capriolo wrote:  
>>>  
>>>>  
>>>>  
>>>> On Friday, November 18, 2016, Jeff Jirsa wrote:  
>>>>  
>>>>> We should assume that we’re ditching tick/tock. I’ll post a thread on  
>>>>> 4.0-and-beyond here in a few minutes.  
>>>>>  
>>>>> The advantage of a prod release every 6 months is fewer incentive to  
>> push  
>>>>> unfinished work into a release.  
>>>>> The disadvantage of a prod release every 6 months is then we either  
>> have  
>>>>> a very short lifespan per-release, or we have to maintain lots of  
>> active  
>>>>> releases.  
>>>>>  
>>>>> 2.1 has been out for over 2 years, and a lot of people (including us)  
>> are  
>>>>> running it in prod – if we have a release every 6 months, that means  
>> we’d  
>>>>> be supporting 4+ releases at a time, just to keep parity with what we  
>> have  
>>>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+  
>> year  
>>>>> old branches.  
>>>>>  
>>>>>  
>>>>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake Eggleston" wrote:  
>>>>>  
>>>>>>> While stability is important if we push back large "core" changes  
>>>>> until later we're just setting ourselves up to face the same issues  
>> later on  
>>>>>>  
>>>>>> In theory, yes. In practice, when incomplete features are earmarked  
>

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Blake Eggleston
> I'm not sure how the apache team does this. Perhaps individual engineers 
can run some modern version at a company of theirs, altho that seems 
unlikely, but as an Apache org, i just don't see how that happens. 

> To me it seems like the Apache Cassandra infrastructure itself needs to 
stand up a multinode live instance running some 'real-world' example 
that is getting pounded, so that we can stage feature branches to really 
test them. 

Not having access to test hardware as an apache org is a problem, but there’s 
also a lot of room for improvement on the junit testing and testability side of 
things. That’s true for both local and distributed components, but more junit 
coverage of the distributed mechanisms would make not having test hardware suck 
less. With distributed algorithms (like gossip 2.0) one of the limitations of 
testing with live nodes is that you’re often just testing the happy path. 
Reliably and repeatably testing how the system responds to weird edge cases 
involving specific ordering of events across nodes is very difficult to do.

I’d written epaxos with this sort of testing in mind, and was able to do a lot 
of testing of obscure failure scenarios (see 
https://github.com/bdeggleston/cassandra/blob/CASSANDRA-6246-trunk/test/unit/org/apache/cassandra/service/epaxos/integration/EpaxosIntegrationRF3Test.java#L144
 for an example). This doesn’t obviate the need to test on real clusters of 
course, but it does increase confidence that the system will behave correctly 
under load, and reduce the amount of things you’re relying on a loaded test 
cluster to reveal.
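
As a toy illustration of what I mean by deterministically exercising message
orderings in a unit test (completely made up, nothing to do with the actual
epaxos classes): the test owns the delivery order, so specific interleavings,
including the adversarial ones, can be asserted on without a live cluster.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;

    import org.junit.Assert;
    import org.junit.Test;

    public class MessageOrderingTest
    {
        // A "replica" is just last-writer-wins state keyed by timestamp.
        static class Replica
        {
            final Map<String, long[]> data = new HashMap<>(); // key -> {timestamp, value}

            void apply(String key, long timestamp, long value)
            {
                long[] existing = data.get(key);
                if (existing == null || existing[0] < timestamp)
                    data.put(key, new long[]{ timestamp, value });
            }

            long valueOf(String key)
            {
                return data.get(key)[1];
            }
        }

        @Test
        public void convergesRegardlessOfDeliveryOrder()
        {
            Replica r1 = new Replica();
            Replica r2 = new Replica();

            // Two concurrent writes to the same key, delivered to the two
            // replicas in opposite orders -- chosen by the test, not by chance.
            Queue<Runnable> deliveries = new ArrayDeque<>();
            deliveries.add(() -> r1.apply("k", 2, 20));
            deliveries.add(() -> r1.apply("k", 1, 10));
            deliveries.add(() -> r2.apply("k", 1, 10));
            deliveries.add(() -> r2.apply("k", 2, 20));

            while (!deliveries.isEmpty())
                deliveries.poll().run();

            Assert.assertEquals(r1.valueOf("k"), r2.valueOf("k"));
            Assert.assertEquals(20, r1.valueOf("k"));
        }
    }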

On November 20, 2016 at 9:02:55 AM, Dave Brosius (dbros...@mebigfatguy.com) 
wrote:

>> We fully intend to "engineer and test the snot out of" the changes  
we are working on as the whole point of us working on them is so we  
*can* run them in production, at our scale.  

I'm not sure how the apache team does this. Perhaps individual engineers  
can run some modern version at a company of theirs, altho that seems  
unlikely, but as an Apache org, i just don't see how that happens.  

To me it seems like the Apache Cassandra infrastructure itself needs to  
stand up a multinode live instance running some 'real-world' example  
that is getting pounded, so that we can stage feature branches to really  
test them.  

Otherwise we will forever be basing versions on the poor test saps who  
decide they are willing to risk all to upgrade to the cutting edge, and  
why, everyone believes in the adage, don't upgrade until at least .6  

--dave  


On 11/20/2016 09:50 AM, Jason Brown wrote:  
> Hey all,  
>  
> One of the goals on my team, when working on large patches, is to get  
> community feedback on these initiatives before throwing them into prod.  
> This gets us a wider net of feedback (see Sylvain's continuing excellent  
> rounds of feedback to my work on CASSANDRA-8457), as well as making sure we  
> don't go too far off the deep end in terms of straying from the community  
> version. The latter point is crucial because if we make too many  
> incompatible changes to, for example, the internode messaging protocol or  
> the CQL protocol or the sstable file format, and deploy that, it may be  
> very difficult, if not impossible, to rectify with future, in-development  
> versions of cassandra.  
>  
> We fully intend to "engineer and test the snot out of" the changes we are  
> working on as the whole point of us working on them is so we *can* run them  
> in production, at our scale. We aren't expecting others in the community to  
> dog food it for us. There will be a delay between committing something  
> upstream, and us backporting it to a current version we run in production  
> and actually deploying it. However, you can be sure that any bugs we find  
> will be fixed ASAP; we have many users counting on it.  
>  
> Thanks for listening,  
>  
> -Jason  
>  
>  
> On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston   
> wrote:  
>  
>> I think Ed's just using gossip 2.0 as a hypothetical example. His point is  
>> that we should only commit things when we have a high degree of confidence  
>> that they work correctly, not with the expectation that they don't.  
>>  
>>  
>> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (  
>> mkjell...@internalcircle.com) wrote:  
>>  
>> Jason has asked for review and feedback many times. Maybe be constructive  
>> and review his code instead of just complaining (once again)?  
>>  
>> Sent from my iPhone  
>>  
>>> On Nov 19, 2016, at 1:49 PM, Edward Capriolo wrote:  
>>> I would say start with a mindset like 'people will run this in  
>> production'  
>>

Re: Proposals for releases - 4.0 and beyond

2016-11-21 Thread Blake Eggleston
I really like Stefan's Ubuntu model (because of the LTS release), with 
Sylvain's suggestion a close second. Both because I think we should do a 
supported, non-dev release every 6 months, and release bug fixes for them for 
at least a year.


On November 19, 2016 at 10:30:02 AM, Stefan Podkowinski (spo...@gmail.com) 
wrote:

I’d like to suggest an option similar to what Jeremiah described and that  
would basically follow the Ubuntu LTS release model [1], but with shorter  
time periods. The idea would be to do a stable release every 6 months with  
1 year bug fixing support. At the same time, every third stable release  
will serve as a LTS release and will be supported for 2 years.  

Have a look at the following gist for illustration:  
https://gist.github.com/spodkowinski/b9659169c73de3231f99bd17f74f5d1f  

As you can see, although the support periods are relatively long, only 3  
releases must be supported at the same time, which should be comparable to  
what is done now.  

At the same time, we also keep doing monthly releases, but they will only  
serve as a milestone for the next stable release. Call them “dev”, “beta”,  
“testing” or whatever you like. Users will be able to start developing for  
those dev releases and deploy to production with the next standard or LTS  
release, after development is finished. Another option for users would be  
to start a project with a standard release and later settle down on a LTS  
release for maintenance only. It's pretty flexible from a user perspective,  
easy to understand and not too much effort to implement from the  
development side.  

On Sat, Nov 19, 2016 at 12:49 AM, Jeff Jirsa   
wrote:  

> With 3.10 voting in progress (take 3), 3.11 in December/January  
> (probably?), we should solidify the plan for 4.0.  
>  
> I went through the archives and found a number of proposals. We (PMC) also  
> had a very brief chat in private to make sure we hadn’t missed any, and  
> here are the proposals that we’ve seen suggested.  
>  
> Option #1: Jon proposed [1] a feature release every 3 months and bugfixes  
> for 6 months after that.  
> Option #2: Mick proposed [2] bimonthly feature, semver, labelling release  
> with stability/quality during voting, 3 GA branches at a time.  
> Option #3: Sylvain proposed [3] feature / testing / stable branches, Y  
> cadence for releases, X month rotation from feature -> testing -> stable ->  
> EOL (X to be determined). This is similar to an Ubuntu/Debian like release  
> schedule – I asked Sylvain for an example just to make sure I understood  
> it, and I’ve copied that to github at [4].  
> Option #4: Jeremiah proposed [5] keeping monthly cadence, and every 12  
> months break off X.0.Y which becomes LTS (same as 3.0.x now). This  
> explicitly excludes alternating tick/tock feature/bugfix for the monthly  
> cadence on the newest/feature/4.x branch.  
> Option #5: Jason proposed a revision to Jeremiah’s proposal such that  
> releases to the LTS branches are NOT tied to a monthly cadence, but are  
> released “as needed”, and the LTS branches are also “as needed”, not tied  
> to a fixed (annual/semi-annual/etc) schedule.  
>  
> Please use this thread as an opportunity to discuss these proposals or  
> feel free to make your own proposals. I think it makes sense to treat this  
> like a nomination phase of an election – let’s allow at least 72 hours for  
> submitting and discussing proposals, and then we’ll open a vote after that.  
>  
> - Jeff  
>  
> [1]: https://lists.apache.org/thread.html/0b2ca82eb8c1235a4e44a406080729  
> be78fb539e1c0cbca638cfff52@%3Cdev.cassandra.apache.org%3E  
> [2]: https://lists.apache.org/thread.html/674ef1c02997041af4b8950023b07b  
> 2f48bce3b197010ef7d7088662@%3Cdev.cassandra.apache.org%3E  
> [3]: https://lists.apache.org/thread.html/fcc4180b7872be4db86eae12b538ee  
> f34c77dcdb5b13987235c8f2bd@%3Cdev.cassandra.apache.org%3E  
> [4]: https://gist.github.com/jeffjirsa/9bee187246ca045689c52ce9caed47bf  
> [5]: https://lists.apache.org/thread.html/0a3372b2f2b30fbeac04f7d5a214b2  
> 03b18f3d69223e7ec9efb64776@%3Cdev.cassandra.apache.org%3E  
>  
>  
>  
>  
>  


Re: Wrapping up tick-tock

2017-01-10 Thread Blake Eggleston
I agree that 3.10 should be the last tick-tock release, but I also agree with 
Jon that we shouldn't go back to yearly-ish releases.

6 months has come up several times now as a good cadence for feature releases, 
and I think it's a good compromise between the competing interests of long term 
support, regular release of features (to prevent piling on), and effort to 
release. So +1 to 6 month releases.

On January 10, 2017 at 10:14:12 AM, Ariel Weisberg (ar...@weisberg.ws) wrote:

Hi,  

With yearly releases trunk is going to be a mess when it comes time to  
cut a release. Cutting releases is when people start caring whether all  
the things in the release are in a finished state. It's when the state  
of CI finally becomes relevant.  

If we wait a year we are going to accumulate a years worth of unfinished  
stuff in a single release. It's more expensive to context switch back  
and then address those issues. If we put out large unstable releases it  
means time until the features in the release are usable is pushed back  
even further since it takes another 6-12 months for the release to  
stabilize. Features introduced at the beginning of the cycle will have  
to wait 18-24 months before anyone can benefit from them.  

Is the biggest pain point with tick-tock just the elimination of long  
term support releases? What is the pain point around release frequency?  
Right now people should be using 3.0 unless they need a bleeding edge  
feature from 3.X and those people will have to give up something to get  
something.  

Ariel  

On Tue, Jan 10, 2017, at 10:29 AM, Jonathan Haddad wrote:  
> I don't see why it has to be one extreme (yearly) or another (monthly).  
> When you had originally proposed Tick Tock, you wrote:  
>  
> "The primary goal is to improve release quality. Our current major “dot  
> zero” releases require another five or six months to make them stable  
> enough for production. This is directly related to how we pile features  
> in  
> for 9 to 12 months and release all at once. The interactions between the  
> new features are complex and not always obvious. 2.1 was no exception,  
> despite DataStax hiring a full tme test engineering team specifically for  
> Apache Cassandra."  
>  
> I agreed with you at the time that the yearly cycle was too long to be  
> adding features before cutting a release, and still do now. Instead of  
> elastic banding all the way back to a process which wasn't working  
> before,  
> why not try somewhere in the middle? A release every 6 months (with  
> monthly bug fixes for a year) gives:  
>  
> 1. long enough time to stabilize (1 year vs 1 month)  
> 2. not so long things sit around untested forever  
> 3. only 2 releases (current and previous) to do bug fix support at any  
> given time.  
>  
> Jon  
>  
> On Tue, Jan 10, 2017 at 6:56 AM Jonathan Ellis  wrote:  
>  
> > Hi all,  
> >  
> > We’ve had a few threads now about the successes and failures of the  
> > tick-tock release process and what to do to replace it, but they all died  
> > out without reaching a robust consensus.  
> >  
> > In those threads we saw several reasonable options proposed, but from my  
> > perspective they all operated in a kind of theoretical fantasy land of  
> > testing and development resources. In particular, it takes around a  
> > person-week of effort to verify that a release is ready. That is, going  
> > through all the test suites, inspecting and re-running failing tests to see 
> > if there is a product problem or a flaky test.  
> >  
> > (I agree that in a perfect world this wouldn’t be necessary because your  
> > test ci is always green, but see my previous framing of the perfect world  
> > as a fantasy land. It’s also worth noting that this is a common problem  
> > for large OSS projects, not necessarily something to beat ourselves up  
> > over, but in any case, that's our reality right now.)  
> >  
> > I submit that any process that assumes a monthly release cadence is not  
> > realistic from a resourcing standpoint for this validation. Notably, we  
> > have struggled to marshal this for 3.10 for two months now.  
> >  
> > Therefore, I suggest first that we collectively roll up our sleeves to vet  
> > 3.10 as the last tick-tock release. Stick a fork in it, it’s done. No  
> > more tick-tock.  
> >  
> > I further suggest that in place of tick tock we go back to our old model of 
> >  
> > probably bi-monthly. This amortizes the release validation problem over a  
> > longer development period. And of course we remain free to ramp back up to  
> > the more rapid cadence envisioned by the other proposals if we increase our 
> >  
> > that a long validation process becomes unnecessary.  
> >  
> > (While a longer dev period could mean a correspondingly more painful test  
> > valid

Re: [VOTE] 3.X branch feature freeze

2017-01-13 Thread Blake Eggleston
+1


On January 13, 2017 at 12:38:55 PM, Michael Shuler (mich...@pbandjelly.org) 
wrote:

+1 to freeze with this clarified branch situation.  

--  
Michael  

On 01/13/2017 11:53 AM, Aleksey Yeschenko wrote:  
> To elaborate further, under the current consensus there would be no 3.12 
> release.  
>  
> Meaning that there are a few features that already made it to 3.X (3.12) that 
> would  
> either:  
>  
> a) have to be reverted  
> b) have to be discarded together with the remained of the 3.X branch  
>  
> If the vote goes through, I suggest we kill off the 3.X branch, and 
> cherry-pick the bug fixes  
> that made it to 3.X back to the 3.11 branch.  
>  
> 3.11 branch will be the only one remaining.  
>  
> https://github.com/apache/cassandra/blob/cassandra-3.X/CHANGES.txt  
>  
> --  
> AY  
>  
> On 13 January 2017 at 17:21:22, Aleksey Yeschenko (alek...@apache.org) wrote: 
>  
>  
> Hi all!  
>  
> It seems like we have a general consensus on ending tick-tock at 3.11, and 
> moving  
> on to stabilisation-only for 3.11.x series.  
>  
> In light of this, I suggest immediate feature freeze in the 3.X branch.  
>  
> Meaning that only bug fixes go to the 3.11/3.X branch from now on.  
>  
> All new features that haven’t been committed yet should go to trunk only (4.0), 
> if the vote passes.  
>  
> What do you think?  
>  
> Thanks.  
>  
> --  
> AY  
>  



Re: WriteTimeoutException when doing paralel DELETE IF EXISTS

2017-01-23 Thread Blake Eggleston
Hi Jaroslav,

That's pretty much expected behavior for the current LWT implementation, which 
has problems with key contention (the usage pattern you're describing here). 
Typically, you want to avoid having multiple clients doing LWT operations on 
the same partition key at the same time.
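
To illustrate with a rough sketch (Java driver, made-up schema names): every
client running a loop like this against the same partition funnels all of its
LWTs through the same Paxos instance, so under load the rounds keep preempting
each other and eventually time out. Spreading tickets across more partitions,
or giving each partition a single claimer, avoids most of the contention.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TicketClaimer
    {
        public static void main(String[] args)
        {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("tickets_ks"))
            {
                while (true)
                {
                    Row next = session.execute(
                        "SELECT ticket FROM tickets WHERE domain = 'domain1' LIMIT 1").one();
                    if (next == null)
                        break; // no tickets left for this domain

                    // Many clients racing on this statement for the same 'domain1'
                    // partition is exactly the contention case that produces
                    // WriteTimeoutExceptions.
                    ResultSet rs = session.execute(
                        "DELETE FROM tickets WHERE domain = 'domain1' AND ticket = "
                        + next.getLong("ticket") + " IF EXISTS");
                    if (rs.wasApplied())
                    {
                        System.out.println("claimed ticket " + next.getLong("ticket"));
                        break;
                    }
                    // otherwise someone else claimed it first; try the next one
                }
            }
        }
    }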

Thanks,

Blake
On January 20, 2017 at 4:25:05 AM, Jaroslav Kameník (jaros...@kamenik.cz) wrote:

Hi,  

I would like to ask here before posting a new bug. I am trying to make a  
simple system  
for distributing preallocated tickets between concurrent clients using C*  
LWTs.  
It is simply one partition containing tickets for one domain, client reads  
the first one  
and tries to delete it conditionally, success = it owns it, fail = try  
again..  

It works well, but it starts to fail with WTEs under load. So I tried to  
make simple  
test with 16 concurrent threads competing for one row with 1000 cols. It  
was running  
on cluster with 5 C* 3.0.9 with default configuration, replication factor  
3.  

Surprisingly, it failed immediately after few requests. It takes longer  
time with  
less threads, but even 2 clients are enough to crash it.  

I am wondering if it is a problem in Cassandra or normal behaviour or bad  
use of LWT?  

Thanks,  

Jaroslav  


Re: Dropped messages on random nodes.

2017-01-23 Thread Blake Eggleston
Hi Dikang,

Do you have any GC logging or metrics you can correlate with the dropped 
messages? A 13 second pause sounds like a bad GC pause.

Thanks,

Blake


On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote:

Btw, the C* version is 2.2.5, with several backported patches. 

On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu  wrote: 

> Hello there, 
> 
> We have a 100 nodes ish cluster, I find that there are dropped messages on 
> random nodes in the cluster, which caused error spikes and P99 latency 
> spikes as well. 
> 
> I tried to figure out the cause. I do not see any obvious bottleneck in 
> the cluster, the C* nodes still have plenty of cpu idle/disk io. But I do 
> see some suspicious gossip events around that time, not sure if it's 
> related. 
> 
> 2017-01-21_16:43:56.71033 WARN 16:43:56 [GossipTasks:1]: Not marking 
> nodes down due to local pause of 13079498815 > 50 
> 2017-01-21_16:43:56.85532 INFO 16:43:56 [ScheduledTasks:1]: MUTATION 
> messages were dropped in last 5000 ms: 65 for internal timeout and 10895 
> for cross node timeout 
> 2017-01-21_16:43:56.85533 INFO 16:43:56 [ScheduledTasks:1]: READ messages 
> were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross 
> node timeout 
> 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: Pool Name 
> Active Pending Completed Blocked All Time Blocked 
> 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: MutationStage 
> 128 47794 1015525068 0 0 
> 2017-01-21_16:43:56.85535 
> 2017-01-21_16:43:56.85535 INFO 16:43:56 [ScheduledTasks:1]: ReadStage 
> 64 20202 450508940 0 0 
> 
> Any suggestions? 
> 
> Thanks! 
> 
> -- 
> Dikang 
> 
> 


-- 
Dikang 


Re: Showing a new property in DESCRIBE TABLE output

2017-01-24 Thread Blake Eggleston
I haven't seen your implementation, but the likely cause of your problem is 
either that the new parameter isn't being sent over the client protocol, or 
that cqlsh is ignoring it. The cqlsh output of DESCRIBE TABLE seems to be 
generated by the TableMetadata class in the python driver (see the as_cql_query 
method). Dropping a breakpoint in there would probably be a good place to start.
On January 24, 2017 at 7:07:38 AM, Murukesh Mohanan 
(murukesh.moha...@gmail.com) wrote:

I'm having a go at CASSANDRA-13002 ( 
https://issues.apache.org/jira/browse/CASSANDRA-12403), by adding a new 
table property which will override the global slow_query_log_timeout_in_ms 
setting. It works, but I can't get it to show up in cqlsh DESCRIBE TABLE 
output. For example, this is what I get: 

cqlsh> DESCRIBE TABLE foo.bar; 

CREATE TABLE foo.bar ( 
id uuid PRIMARY KEY, 
name text 
) WITH bloom_filter_fp_chance = 0.01 
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} 
AND cdc = true 
AND comment = '' 
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'} 
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'} 
AND crc_check_chance = 1.0 
AND dclocal_read_repair_chance = 0.1 
AND default_time_to_live = 0 
AND gc_grace_seconds = 864000 
AND max_index_interval = 2048 
AND memtable_flush_period_in_ms = 1001 
AND min_index_interval = 128 
AND read_repair_chance = 0.0 
AND speculative_retry = '99PERCENTILE'; 

cqlsh> select table_name, slow_query_log_timeout_in_ms from 
system_schema.tables where table_name = 'bar' allow filtering; 

table_name | slow_query_log_timeout_in_ms 
+-- 
bar | 103 

The property (which is also called `slow_query_log_timeout_in_ms`) shows up 
in the system_schema table. 

It seems that the file to modify would be 
src/java/org/apache/cassandra/db/ColumnFamilyStoreCQLHelper.java, but I 
didn't have any luck modifying it. 

Any pointers, please? 



-- 

Murukesh Mohanan, 
Yahoo! Japan 


Re: Code quality, principles and rules

2017-03-17 Thread Blake Eggleston
I think we’re getting a little ahead of ourselves talking about DI frameworks. 
Before that even becomes something worth talking about, we’d need to have made 
serious progress on un-spaghettifying Cassandra in the first place. It’s an 
extremely tall order. Adding a DI framework right now would be like throwing 
gasoline on a raging tire fire.

Removing singletons seems to come up every 6-12 months, and is usually abandoned 
once people figure out how difficult they are to remove properly. I do think 
removing them *should* be a long term goal, but we really need something more 
immediately actionable. Otherwise, nothing’s going to happen, and we’ll be 
having this discussion again in a year or so when everyone’s angry that 
Cassandra 5.0 still isn’t ready for production, a year after its release.

That said, the reason singletons regularly get brought up is that doing 
extensive testing of anything in Cassandra is pretty much impossible, since the 
code is basically this big web of interconnected global state. Testing anything 
in isolation can’t be done, which, for a distributed database, is crazy. It’s a 
chronic problem that handicaps our ability to release a stable database.

At this point, I think a more pragmatic approach would be to draft and enforce 
some coding standards that can be applied in day to day development that drive 
incremental improvement of the testing and testability of the project. What 
should be tested, how it should be tested. How to write new code that talks to 
the rest of Cassandra and is testable. How to fix bugs in old code in a way 
that’s testable. We should also have some guidelines around refactoring the 
wildly untested sections, how to get started, what to do, what not to do, etc.

Thoughts?

Can we kill the wiki?

2017-03-17 Thread Blake Eggleston
With CASSANDRA-8700, docs were moved in tree, with the intention that they 
would replace the wiki. However, it looks like we’re still getting regular 
requests to edit the wiki. It seems like we should be directing these folks to 
the in tree docs and either disabling edits for the wiki, or just removing it 
entirely, and replacing it with a link to the hosted docs. I'd prefer we just 
remove it myself; that makes things less confusing for newcomers.

Does that seem reasonable to everyone?

Re: Can we kill the wiki?

2017-03-19 Thread Blake Eggleston
The Cassandra wiki isn't used for collaborative design at all as far as I can 
tell. Also, most of the information that is in the wiki should probably be in 
tree somewhere. Either as proper docs, or contribution guidelines.

On March 19, 2017 at 8:17:18 AM, Edward Capriolo (edlinuxg...@gmail.com) wrote:

Wikis are still good for collaborative design etc. It's a burden to edit the 
docs and it's not the place for all info. 



Re: [VOTE] Ask Infra to move github notification emails to commits@

2017-03-20 Thread Blake Eggleston
Maybe we should add p...@cassandra.apache.org or something and send them there? 
I don't subscribe to commits@ because it's too much email, I would be 
interested in being notified when a PR is opened though.

On March 20, 2017 at 3:00:47 PM, Jeff Jirsa (jji...@gmail.com) wrote:

There's no reason for the dev list to get spammed every time there's a  
github PR. We know most of the time we prefer JIRAs for real code PRs, but  
with docs being in tree and low barrier to entry, we may want to accept  
docs through PRs ( see https://issues.apache.org/jira/browse/CASSANDRA-13256  
, and comment on it if you disagree).  

To make that viable, we should make it not spam dev@ with every comment.  
Therefore I propose we move github PR comments/actions to commits@ so as  
not to clutter the dev@ list.  

Voting to remain open for 72 hours.  

- Jeff  


Re: [VOTE] Ask Infra to move github notification emails to pr@

2017-03-20 Thread Blake Eggleston
+1


On March 20, 2017 at 3:33:16 PM, Jeff Jirsa (jji...@gmail.com) wrote:

There's no reason for the dev list to get spammed every time there's a 
github PR. We know most of the time we prefer JIRAs for real code PRs, but 
with docs being in tree and low barrier to entry, we may want to accept 
docs through PRs ( see https://issues.apache.org/jira/browse/CASSANDRA-13256 
, and comment on it if you disagree). 

To make that viable, we should make it not spam dev@ with every comment. 
Therefore I propose we move github PR comments/actions to pr@ so as 
not to clutter the dev@ list. 

Voting to remain open for 72 hours. 

- Jeff 


Re: [DISCUSS] Implementing code quality principles, and rules (was: Code quality, principles and rules)

2017-03-27 Thread Blake Eggleston
In addition to its test coverage problem, the project has a general 
testability problem, and I think it would be more effective to introduce some 
testing guidelines and standards that drive incremental improvement of both, 
instead of requiring an arbitrary code coverage metric be hit, which doesn’t 
tell the whole story anyway.

It’s not ready yet, but I’ve been putting together a testing standards document 
for the project since bringing it up in the “Code quality, principles and 
rules” email thread a week or so ago.

On March 27, 2017 at 4:51:31 PM, Edward Capriolo (edlinuxg...@gmail.com) wrote:
On Mon, Mar 27, 2017 at 7:03 PM, Josh McKenzie  wrote:  

> How do we plan on verifying #4? Also, root-cause to tie back new code that  
> introduces flaky tests (i.e. passes on commit, fails 5% of the time  
> thereafter) is a non-trivial pursuit (thinking #2 here), and a pretty  
> common problem in this environment.  
>  
> On Mon, Mar 27, 2017 at 6:51 PM, Nate McCall  wrote:  
>  
> > I don't want to lose track of the original idea from François, so  
> > let's do this formally in preparation for a vote. Having this all in  
> > place will make transition to new testing infrastructure more  
> > goal-oriented and keep us more focused moving forward.  
> >  
> > Does anybody have specific feedback/discussion points on the following  
> > (awesome, IMO) proposal:  
> >  
> > Principles:  
> >  
> > 1. Tests always pass. This is the starting point. If we don't care  
> > about test failures, then we should stop writing tests. A recurring  
> > failing test carries no signal and is better deleted.  
> > 2. The code is tested.  
> >  
> > Assuming we can align on these principles, here is a proposal for  
> > their implementation.  
> >  
> > Rules:  
> >  
> > 1. Each new release passes all tests (no flakinesss).  
> > 2. If a patch has a failing test (test touching the same code path),  
> > the code or test should be fixed prior to being accepted.  
> > 3. Bugs fixes should have one test that fails prior to the fix and  
> > passes after fix.  
> > 4. New code should have at least 90% test coverage.  
> >  
>  

True #4 is hard to verify in the current state. This was mentioned in a  
separate thread: If the code was in submodules, the code coverage tools  
should have less work to do because they typically only count coverage for  
a module and the tests inside that module. At that point it should be easy  
to write a plugin on top of something like this:  
http://alvinalexander.com/blog/post/java/sample-cobertura-ant-build-script.  

This is also an option:  

https://about.sonarqube.com/news/2016/05/02/continuous-analysis-for-oss-projects.html
  


Guidelines on testing

2017-04-24 Thread Blake Eggleston
About a month ago, in the ‘Code quality, principles and rules’ thread, I’d 
proposed adding some testing standards to the project in lieu of revisiting the 
idea of removing singletons. The idea was that we could drive incremental 
improvement of the test coverage and testability situation that could be 
applied in day to day work. I’ve pushed a first draft to my repo here:

https://github.com/bdeggleston/cassandra/blob/testing-doc/TESTING.md

Please take a look and let me know what you think. With the blessing of the 
pmc, I’d like this, or something like it, to be adopted as the reference for 
contributors and reviewers when deciding if a contribution is properly tested.

Blake


Re: Guidelines on testing

2017-05-05 Thread Blake Eggleston
I haven't had any objections, so I've opened 
https://issues.apache.org/jira/browse/CASSANDRA-13497

On May 4, 2017 at 10:18:10 AM, Jonathan Haddad (j...@jonhaddad.com) wrote:

+1  

On Tue, Apr 25, 2017 at 2:21 AM Stefan Podkowinski  wrote:  

> I don't see any reasons not to make this part of our guidelines. The  
> idea of having a list of what should be tested in each kind of test  
> makes sense. I also like the examples how to improve tests dealing with  
> global state.  
>  
> Some of the integration test cases, such as "dry  
> start"/"restart"/"shutdown"/"upgrade", could use some further  
> description and how-to examples. Are there any existing tests we can  
> link for reference?  
>  
> We also already have a testing related page in our documentation:  
> http://cassandra.apache.org/doc/latest/development/testing.html  
> Not sure if it would make sense to merge or create an additional document.  
>  
>  
> On 24.04.2017 18:13, Blake Eggleston wrote:  
> > About a month ago, in the ‘Code quality, principles and rules’ thread,  
> I’d proposed adding some testing standards to the project in lieu of  
> revisiting the idea of removing singletons. The idea was that we could  
> drive incremental improvement of the test coverage and testability  
> situation that could be applied in day to day work. I’ve pushed a first  
> draft to my repo here:  
> >  
> > https://github.com/bdeggleston/cassandra/blob/testing-doc/TESTING.md  
> >  
> > Please take a look and let me know what you think. With the blessing of  
> the pmc, I’d like this, or something like it, to be adopted as the  
> reference for contributors and reviewers when deciding if a contribution is  
> properly tested.  
> >  
> > Blake  
> >  
>  


Re: Soliciting volunteers for flaky dtests on trunk

2017-05-10 Thread Blake Eggleston
I've taken CASSANDRA-13194, CASSANDRA-13506, CASSANDRA-13515, and 
CASSANDRA-13372 to start

On May 10, 2017 at 12:44:47 PM, Ariel Weisberg (ar...@weisberg.ws) wrote:

Hi,  

The dev list murdered my rich text formatted email. Here it is  
reformatted as plain text.  

The unit tests are looking pretty reliable right now. There is a long  
tail of infrequently failing tests but it's not bad and almost all  
builds succeed in the current build environment. In CircleCI it seems  
like unit tests might be a little less reliable, but still usable.  

The dtests on the other hand aren't producing clean builds yet. There  
is also a pretty diverse set of failing tests.  

I did a bit of triaging of the flakey dtests. I started by cataloging  
everything, but what I found is that the long tail of flakey dtests is  
very long indeed so I narrowed focus to just the top frequently failing  
tests for now. See https://goo.gl/b96CdO  

I created a spreadsheet with some of the failing tests. Links to JIRA,  
last time the test was seen failing, and how many failures I found in  
Apache Jenkins across the 3 dtest builds. There are a lot of failures  
not listed. There would be 50+ entries if I cataloged each one.  

There are two hard failing tests, but both are already moving along:  
CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta  
reviewing, last updated April 2017) dtest failure in  
topology_test.TestTopology.size_estimates_multidc_test  
CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T Reviewing,  
last updated March 2017) test failure in  
auth_test.TestAuth.system_auth_ks_is_alterable_test  

I think the tests we should tackle first are on this sheet in priority  
order https://goo.gl/S3khv1  

Suite: bootstrap_test  
Test: TestBootstrap.simultaneous_bootstrap_test  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506  
Last failure: 5/5/2017  
Counted failures: 45  

Suite: repair_test  
Test: incremental_repair_test.TestIncRepair.compaction_test  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194  
Last failure: 5/4/2017  
Counted failures: 44  

Suite: sstableutil_test  
Test: SSTableUtilTest.compaction_test  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182  
Last failure: 5/4/2017  
Counted failures: 35  

Suite: paging_test  
Test: TestPagingWithDeletions.test_ttl_deletions  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507  
Last failure: 4/25/2017  
Counted failures: 31  

Suite: repair_test  
Test: incremental_repair_test.TestIncRepair.multiple_repair_test  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515  
Last failed: 5/4/2017  
Counted failures: 18  

Suite: cqlsh_tests  
Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*  
JIRA:  
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
  
Last failed: 5/8/2017  
Counted failures: 23  

Suite: paxos_tests  
Test: TestPaxos.contention_test_many_threads  
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13517  
Last failed: 5/8/2017  
Counted failures: 15  

Suite: repair_test  
Test: TestRepair  
JIRA:  
https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22
  
Last failure: 5/4/2017  
Comment: No one test fails a lot but the number of failing tests is  
substantial  

Suite: cqlsh_tests  
Test: cqlsh_tests.CqlshSmokeTest.[test_insert | test_truncate |  
test_use_keyspace | test_create_keyspace]  
JIRA: No JIRA yet  
Last failed: 4/22/2017  
count: 6  

If you have spare cycles you can make a huge difference in test  
stability by picking off one of these.  

Regards,  
Ariel  

On Wed, May 10, 2017, at 12:45 PM, Ariel Weisberg wrote:  
> Hi all,  
>  
> The unit tests are looking pretty reliable right now. There is a long  
> tail of infrequently failing tests but it's not bad and almost all  
> builds succeed in the current build environment. In CircleCI it seems  
> like unit tests might be a little less reliable, but still usable.  
> The dtests on the other hand aren't producing clean builds yet. There  
> is also a pretty diverse set of failing tests.  
> I did a bit of triaging of the flakey dtests. I started by cataloging  
> everything, but what I found is that the long tail of flakey dtests is  
> very long indeed so I narrowed focus to just the top frequently failing  
> tests for now. See https://goo.gl/b96CdO  
> I created a spreadsheet with some of the failing tests. Links to JIRA,  
> last time the test was seen failing, and how many failures I found in  
> Apache Jenkins across the 3 dtest builds. There are a lot of failures  
> not listed. There would be 50+ entries if I cataloged each one.  
> There are two hard failing tests, but both are already moving along:  
> CAS

Re: Repair Management

2017-05-18 Thread Blake Eggleston
I am looking to improve monitoring and management of repairs (so far I have 
patch for adding ActiveRepairs to table/keyspace metrics) and come across 
ActiveRepairServiceMBean but this appears to be limited to incremental 
repairs. Is there a reason for this
The incremental repair stuff was just the first set of jmx controls added to 
ActiveRepairService. ActiveRepairService is involved in all repairs though.

I was looking to add something very similar to this nodetool repair_admin 
but it would work on co-ordinator repair commands. 

I'm not sure what you mean by "coordinator repair commands". Do you mean full 
repairs?

What is the purpose of the current repair_admin? If I wish to add the above 
should I rename the MBean to say 
org.apache.cassandra.db:type=IncrementalRepairService and the nodetool 
command to inc_repair_admin ? 

nodetool help repair_admin says its purpose is to "list and fail incremental 
repair sessions". However, failing an incremental repair session doesn't 
cancel the validation/sync; it just releases the sstables 
that were involved in the repair back into the unrepaired data set. I don't see 
any reason why you couldn't add this functionality to the existing 
RepairService mbean. That said, before getting into mbean names, it's probably 
best to come up with a plan for cancelling validation and sync on each of the 
replicas involved in a given repair. As far as I know (though I may be wrong), 
that's not currently supported.
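
For reference, a minimal sketch of invoking the existing termination hook mentioned below (ssProxy.forceTerminateAllRepairSessions()) directly over JMX. The host, port, and absence of JMX authentication are assumptions for illustration.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RepairTerminator
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Releases the sstables back into the unrepaired set; note that, as discussed
            // above, this does not cancel in-flight validation/sync on the replicas.
            mbs.invoke(storageService, "forceTerminateAllRepairSessions", new Object[0], new String[0]);
        }
    }
}
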
On May 17, 2017 at 7:36:51 PM, Cameron Zemek (came...@instaclustr.com) wrote:

I am looking to improve monitoring and management of repairs (so far I have  
patch for adding ActiveRepairs to table/keyspace metrics) and come across  
ActiveRepairServiceMBean but this appears to be limited to incremental  
repairs. Is there a reason for this?  

I was looking to add something very similar to this nodetool repair_admin  
but it would work on co-ordinator repair commands.  

For example:  
$ nodetool repair_admin --list  
Repair#1 mykeyspace columnFamilies=colfamilya,colfamilyb; incremental=True;  
parallelism=parallel progress=5%  

$ nodetool repair_admin --terminate 1  
Terminating repair command #1 (19f00c30-1390-11e7-bb50-ffb920a6d70f)  

$ nodetool repair_admin --terminate-all # calls  
ssProxy.forceTerminateAllRepairSessions()  
Terminating all repair sessions  
Terminated repair command #2 (64c44230-21aa-11e7-9ede-cd6eb64e3786)  

What is the purpose of the current repair_admin? If I wish to add the above  
should I rename the MBean to say  
org.apache.cassandra.db:type=IncrementalRepairService and the nodetool  
command to inc_repair_admin ?  


Re: Repair Management

2017-05-19 Thread Blake Eggleston
Cool. Just want to point out that if you're going to expose a command to 
terminate a repair, it should also stop any related validation and sync tasks 
that are in progress.


On May 18, 2017 at 6:02:16 PM, Cameron Zemek (came...@instaclustr.com) wrote:

Here is what I have done so far: 
https://github.com/apache/cassandra/compare/trunk...instaclustr:repair_management
 

> I'm not sure what you mean by "coordinator repair commands". Do you mean 
full repairs? 

By coordinator repair I meant the repair command from the coordinator node. 
That is the repair command from StorageService::repairAsync. Hopefully the 
branch above shows what I mean. 





On 19 May 2017 at 03:16, Blake Eggleston  wrote: 

> I am looking to improve monitoring and management of repairs (so far I 
> have 
> patch for adding ActiveRepairs to table/keyspace metrics) and come across 
> ActiveRepairServiceMBean but this appears to be limited to incremental 
> repairs. Is there a reason for this 
> 
> The incremental repair stuff was just the first set of jmx controls added 
> to ActiveRepairService. ActiveRepairService is involved in all repairs 
> though. 
> 
> I was looking to add something very similar to this nodetool repair_admin 
> but it would work on co-ordinator repair commands. 
> 
> 
> I'm not sure what you mean by "coordinator repair commands". Do you mean 
> full repairs? 
> 
> What is the purpose of the current repair_admin? If I wish to add the 
> above 
> should I rename the MBean to say 
> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool 
> command to inc_repair_admin ? 
> 
> 
> nodetool help repair_admin says it's purpose is to "list and fail 
> incremental repair sessions". However, by failing incremental repair 
> sessions, it doesn't mean that it cancels the validation/sync, just that it 
> releases the sstables that were involved in the repair back into the 
> unrepaired data set. I don't see any reason why you couldn't add this 
> functionality to the existing RepairService mbean. That said, before 
> getting into mbean names, it's probably best to come up with a plan for 
> cancelling validation and sync on each of the replicas involved in a given 
> repair. As far as I know (though I may be wrong), that's not currently 
> supported. 
> 
> On May 17, 2017 at 7:36:51 PM, Cameron Zemek (came...@instaclustr.com) 
> wrote: 
> 
> I am looking to improve monitoring and management of repairs (so far I 
> have 
> patch for adding ActiveRepairs to table/keyspace metrics) and come across 
> ActiveRepairServiceMBean but this appears to be limited to incremental 
> repairs. Is there a reason for this? 
> 
> I was looking to add something very similar to this nodetool repair_admin 
> but it would work on co-ordinator repair commands. 
> 
> For example: 
> $ nodetool repair_admin --list 
> Repair#1 mykeyspace columnFamilies=colfamilya,colfamilyb; 
> incremental=True; 
> parallelism=parallel progress=5% 
> 
> $ nodetool repair_admin --terminate 1 
> Terminating repair command #1 (19f00c30-1390-11e7-bb50-ffb920a6d70f) 
> 
> $ nodetool repair_admin --terminate-all # calls 
> ssProxy.forceTerminateAllRepairSessions() 
> Terminating all repair sessions 
> Terminated repair command #2 (64c44230-21aa-11e7-9ede-cd6eb64e3786) 
> 
> What is the purpose of the current repair_admin? If I wish to add the 
> above 
> should I rename the MBean to say 
> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool 
> command to inc_repair_admin ? 
> 
> 


Re: Proposal: Closing old, unable-to-repro JIRAs

2017-09-15 Thread Blake Eggleston
+1 to that


On September 14, 2017 at 4:50:54 PM, Jeff Jirsa (jji...@gmail.com) wrote:

There's a number of JIRAs that are old - sometimes very old - that 
represent bugs that either don't exist in modern versions, or don't have 
sufficient information for us to repro, and the reporter has gone away. 

Would anyone be offended if I start tagging these with the label 
'UnableToRepro' or 'Unresponsive' and start a 30 day timer to close them? 
Anyone have a better suggestion? 


Proposal to retroactively mark materialized views experimental

2017-09-29 Thread Blake Eggleston
Hi dev@,

I’d like to propose that we retroactively classify materialized views as an 
experimental feature, disable them by default, and require users to enable them 
through a config setting before using them.

Materialized views have several issues that make them (effectively) unusable in 
production. Some of the issues aren’t just implementation problems, but 
problems with the design that aren’t easily fixed. It’s unfair of us to make 
features available to users in this state without providing a clear warning 
that bad or unexpected things are likely to happen if they use them.

Obviously, this isn’t great news for users that have already adopted MVs, and I 
don’t have a great answer for that. I think that’s sort of a sunk cost at this 
point. If they have any MV related problems, they’ll have them whether they’re 
marked experimental or not. I would expect this to reduce the number of users 
adopting MVs in the future though, and if they do, it would be opt-in.

Once MVs reach a point where they’re usable in production, we can remove the 
flag. Specifics of how the experimental flag would work can be hammered out in 
a forthcoming JIRA, but I’d imagine it would just prevent users from creating 
new MVs, and maybe log warnings on startup for existing MVs if the flag isn’t 
enabled.

Let me know what you think.

Thanks,

Blake
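
To make the shape of the proposal concrete, here is a standalone sketch of an opt-in gate. The property name and the system-property fallback are illustrative assumptions; an actual patch would read a cassandra.yaml setting and hook into CreateViewStatement rather than use a system property.

// Standalone illustration only: reject creation of new materialized views unless
// an operator has explicitly opted in.
public final class MaterializedViewGuard
{
    // Assumed flag name for illustration; the real implementation would come from cassandra.yaml.
    private static final boolean ENABLED =
        Boolean.parseBoolean(System.getProperty("cassandra.enable_materialized_views", "false"));

    public static void checkCreateAllowed()
    {
        if (!ENABLED)
            throw new IllegalStateException("Materialized views are experimental and disabled by default; " +
                                            "enable them explicitly in cassandra.yaml before creating new views.");
    }

    public static void main(String[] args)
    {
        checkCreateAllowed(); // throws unless run with -Dcassandra.enable_materialized_views=true
        System.out.println("MV creation allowed");
    }
}
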

Re: Proposal to retroactively mark materialized views experimental

2017-10-01 Thread Blake Eggleston
I'm not sure the main issue in the case of MVs is testing. In this case it 
seems to be that there are some design issues and/or the design only works 
in some overly restrictive use cases. That MVs were committed with these known 
issues seems to be the real problem. So in the case of MVs, sure, I don't 
think they should have ever made it to an experimental stage.

Thinking of how an experimental flag fits in with the project going forward 
though, I disagree that we should avoid adding experimental features. On the 
contrary, I think leaning towards classifying new features as experimental 
would be better for users. Especially larger features and changes.

Even with well spec'd, well tested, and well designed features, there will 
always be edge cases that you didn't think of, or you'll have made assumptions 
about the other parts of C* it relies on that aren't 100% correct. Small 
problems here can often affect correctness, or result in data loss. So, I think 
it makes sense to avoid marking them as ready for regular use until they've had 
time to bake in clusters where there are some expert operators that are 
sophisticated enough to understand the implications of running them, detect 
issues, and report bugs.

Regarding historical examples, in hindsight I think committing 8099, or at the 
very least, parts of it, behind an experimental flag would have been the right 
thing to do. It was a huge change that we're still finding issues with 2 years 
later.

On October 1, 2017 at 6:08:50 AM, DuyHai Doan (doanduy...@gmail.com) wrote:

How should we transition one feature from the "experimental" state to  
"production ready" state ? On which criteria ?  



On Sun, Oct 1, 2017 at 12:12 PM, Marcus Eriksson  wrote:  

> I was just thinking that we should try really hard to avoid adding  
> experimental features - they are experimental due to lack of testing right?  
> There should be a clear path to making the feature non-experimental (or get  
> it removed) and having that path discussed on dev@ might give more  
> visibility to it.  
>  
> I'm also struggling a bit to find good historic examples of "this would  
> have been better off as an experimental feature" - I used to think that it  
> would have been good to commit DTCS with some sort of experimental flag,  
> but that would not have made DTCS any better - it would have been better to  
> do more testing, realise that it does not work and then not commit it at  
> all of course.  
>  
> Does anyone have good examples of features where it would have made sense  
> to commit them behind an experimental flag? SASI might be a good example,  
> but for MVs - if we knew how painful they would be, they really would not  
> have gotten committed at all, right?  
>  
> /Marcus  
>  
> On Sat, Sep 30, 2017 at 7:42 AM, Jeff Jirsa  wrote:  
>  
> > Reviewers should be able to suggest when experimental is warranted, and  
> > conversation on dev+jira to justify when it’s transitioned from  
> > experimental to stable?  
> >  
> > We should remove the flag as soon as we’re (collectively) confident in a  
> > feature’s behavior - at least correctness, if not performance.  
> >  
> >  
> > > On Sep 29, 2017, at 10:31 PM, Marcus Eriksson   
> wrote:  
> > >  
> > > +1 on marking MVs experimental, but should there be some point in the  
> > > future where we consider removing them from the code base unless they  
> > have  
> > > gotten significant improvement as well?  
> > >  
> > > We probably need to enforce some kind of process for adding new  
> > > experimental features in the future - perhaps a mail like this one to  
> > dev@  
> > > motivating why it should be experimental?  
> > >  
> > > /Marcus  
> > >  
> > > On Sat, Sep 30, 2017 at 1:15 AM, Vinay Chella  
> >   
> > > wrote:  
> > >  
> > >> We tried perf testing MVs internally here but did not see good results  
> > with  
> > >> it, hence paused its usage. +1 on tagging certain features which are  
> not  
> > >> PROD ready or not stable enough.  
> > >>  
> > >> Regards,  
> > >> Vinay Chella  
> > >>  
> > >>> On Fri, Sep 29, 2017 at 7:22 PM, Ben Bromhead   
> > wrote:  
> > >>>  
> > >>> I'm a fan of introducing experimental flags in general as well, +1  
> > >>>  
> > >>>  
> > >>>  
> > >>>> On Fri, 29 Sep 2017 at 13:22 Jon Haddad  wrote:  
> > >>>>  
> > >>>> I’m very much +1 on this, and

Re: Proposal to retroactively mark materialized views experimental

2017-10-01 Thread Blake Eggleston
I think you're presenting a false dichotomy here. Yes, there are people who are 
not interested in taking risks with C* and are still running 1.2, and there are 
probably a few people who would put trunk in prod if we packaged it up for 
them, but there's a whole spectrum of users in between. Operator competence / 
sophistication has the same sort of spectrum.

I'd expect the amount of feedback on experimental features would be a function 
of the quality of the design / implementation and the amount of user interest. 
If you're not getting feedback on an experimental feature, it's probably poorly 
implemented, or no one's interested in it.

I don't think labelling features is going to kill the user <-> developer 
feedback loop. It will probably slow down the pace of feature development a 
bit, but it's been slowing down anyway, and that's a good thing imo.

On October 1, 2017 at 9:14:45 AM, DuyHai Doan (doanduy...@gmail.com) wrote:

So basically we're saying that even with a lot of tests, you're never sure  
to cover all the possible edge cases and the real stamp for "production  
readiness" is only when the "experimental features" have been deployed in  
various clusters with various scenarios/use-cases, just re-phrasing Blake  
here. Totally +1 on the idea.  

Now I can foresee a problem with the "experimental" flag, which is that nobody  
(in the community) will use it or even dare to play with it, and thus the  
"experimental" features never get a chance to be tested, and then we break  
the bug-report/bug-fix iteration ...  

How many times have I seen users on the ML asking which version of C* is  
the most fit for production and the answer was always at least 1 major  
version behind the current released major (2.1 was recommended when 3.x was  
released, and so on ...)?  

The fundamental issue here is that a lot of folks in the community do not  
want to take any risk and take a conservative approach for production,  
which is fine and perfectly understandable. But it means that the implicit  
contract for OSS software, e.g. "you have a software for free in exchange  
you will give feedbacks and bug reports to improve it", is completely  
broken.  

Let's take the example of MV. MV was shipped with 3.0 --> considered not  
stable --> nobody/few people uses MV --> few bug reports --> bugs didn't  
have chance to get fixed --> the problem lasts until now  

About SASI, how many people really played with thoroughly apart from some  
toy examples ? Same causes, same consequences. And we can't even blame its  
design because fundamentally the architecture is pretty solid, just a  
question of usage and feedbacks.  

I suspect that this broken community QA/feedback loop also partially explains  
the failure of the tick/tock releases, but it's only my own  
interpretation here.  

So if we don't figure out how to restore the "new feature/community bug  
report" strong feedback loop, we're going to face the same issues and the  
same debate again in the future  


On Sun, Oct 1, 2017 at 5:30 PM, Blake Eggleston   
wrote:  

> I'm not sure the main issue in the case of MVs is testing. In this case it  
> seems to be that there are some design issues and/or the design was only  
> works in some overly restrictive use cases. That MVs were committed knowing  
> these were issues seems to be the real problem. So in the case of MVs, sure  
> I don't think they should have ever made it to an experimental stage.  
>  
> Thinking of how an experimental flag fits in the with the project going  
> forward though, I disagree that we should avoid adding experimental  
> features. On the contrary, I think leaning towards classifying new features  
> as experimental would be better for users. Especially larger features and  
> changes.  
>  
> Even with well spec'd, well tested, and well designed features, there will  
> always be edge cases that you didn't think of, or you'll have made  
> assumptions about the other parts of C* it relies on that aren't 100%  
> correct. Small problems here can often affect correctness, or result in  
> data loss. So, I think it makes sense to avoid marking them as ready for  
> regular use until they've had time to bake in clusters where there are some  
> expert operators that are sophisticated enough to understand the  
> implications of running them, detect issues, and report bugs.  
>  
> Regarding historical examples, in hindsight I think committing 8099, or at  
> the very least, parts of it, behind an experimental flag would have been  
> the right thing to do. It was a huge change that we're still finding issues  
> with 2 years later.  
>  
> On October 1, 2017 at 6:08:50 AM, DuyHai Doan (doanduy...@gmai

Re: Proposal to retroactively mark materialized views experimental

2017-10-02 Thread Blake Eggleston
Yeah I’m not sure that just emitting a warning is enough. The point is to be 
super explicit that bad things will happen if you use MVs. I would (in a patch 
release) disable MV CREATE statements, and emit warnings for ALTER statements 
and on schema load if they’re not explicitly enabled. Only emitting a warning 
really reduces visibility where we need it: in the development process.

By only emitting a warning, we're just protecting users that don't run even 
rudimentary tests before upgrading their clusters. If an operator is going to 
blindly deploy a database update to prod without testing, they’re going to poke 
their eye out on something anyway. Whether it’s an MV flag or something else. 
If we make this change clear in NEWS.txt, and the user@ list, I think that’s 
the best thing to do.


On October 2, 2017 at 10:18:52 AM, Jeremiah D Jordan 
(jeremiah.jor...@gmail.com) wrote:

Hindsight is 20/20. For 8099 this is the reason we cut the 2.2 release before 
8099 got merged.  

But moving forward with where we are now, if we are going to start adding some 
experimental flags to things, then I would definitely put SASI on this list as 
well.  

For both SASI and MV I don’t know that adding flags in the cassandra.yaml 
which prevent their use is the right way to go. I would propose that we emit 
WARN from the native protocol mechanism when a user does an ALTER/CREATE or 
whatever that tries to use an experimental feature, and probably in the system.log as 
well.  So someone who is starting new development using them will get a warning 
showing up in cqlsh “hey the thing you just used is experimental, proceed with 
caution” and also in their logs.  

These things are live on clusters right now, and I would not want someone to 
upgrade their cluster to a new *patch* release and suddenly something that may 
have been working for them now does not function. Anyway, we need to be careful 
about how this gets put into practice if we are going to do it retroactively.  

-Jeremiah  


> On Oct 1, 2017, at 5:36 PM, Josh McKenzie  wrote:  
>  
>>  
>> I think committing 8099, or at the very least, parts of it, behind an  
>> experimental flag would have been the right thing to do.  
>  
> With a major refactor like that, it's a staggering amount of extra work to  
> have a parallel re-write of core components of a storage engine accessible  
> in parallel to the major based on an experimental flag in the same branch.  
> I think the complexity in the code-base of having two such channels in  
> parallel would be an altogether different kind of burden along with making  
> the work take considerably longer. The argument of modularizing a change  
> like that, however, is something I can get behind as a matter of general  
> principle. As we discussed at NGCC, the amount of static state in the C*  
> code-base makes this an aspirational goal rather than a reality all too  
> often, unfortunately.  
>  
> Not looking to get into the discussion of the appropriateness of 8099 and  
> other major refactors like it (nio MessagingService for instance) - but  
> there's a difference between building out new features and shielding the  
> code-base and users from their complexity and reliability and refactoring  
> core components of the code-base to keep it relevant.  
>  
> On Sun, Oct 1, 2017 at 5:01 PM, Dave Brosius  wrote:  
>  
>> triggers  
>>  
>>  
>> On 10/01/2017 11:25 AM, Jeff Jirsa wrote:  
>>  
>>> Historical examples are anything that you wouldn’t bet your job on for  
>>> the first release:  
>>>  
>>> Udf/uda in 2.2  
>>> Incremental repair - would have yanked the flag following 9143  
>>> SASI - probably still experimental  
>>> Counters - all sorts of correctness issues originally, no longer true  
>>> since the rewrite in 2.1  
>>> Vnodes - or at least shuffle  
>>> CDC - is the API going to change or is it good as-is?  
>>> CQL - we’re on v3, what’s that say about v1?  
>>>  
>>> Basically anything where we can’t definitively say “this feature is going  
>>> to work for you, build your product on it” because companies around the  
>>> world are trying to make that determination on their own, and they don’t  
>>> have the same insight that the active committers have.  
>>>  
>>> The transition out we could define as a fixed number of releases or a dev@  
>>> vote, I don’t think you’ll find something that applies to all experimental  
>>> features, so being flexible is probably the best bet there  
>>>  
>>>  
>>>  
>>  
>> -  
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
>> For additional commands, e-mail: dev-h...@cassandra.apache.org  
>>  
>>  





Re: Proposal to retroactively mark materialized views experimental

2017-10-02 Thread Blake Eggleston
Yeah, I'm not proposing that we disable MVs in existing clusters.


On October 2, 2017 at 10:58:11 AM, Aleksey Yeshchenko (alek...@apple.com) wrote:

The idea is to check the flag in CreateViewStatement, so creation of new MVs 
doesn’t succeed without that flag flipped.  

Obviously, just disabling existing MVs working in a minor would be silly.  

As for the warning - yes, that should also be emitted. Unconditionally.  

—  
AY  

On 2 October 2017 at 18:18:52, Jeremiah D Jordan (jeremiah.jor...@gmail.com) 
wrote:  

These things are live on clusters right now, and I would not want someone to 
upgrade their cluster to a new *patch* release and suddenly something that may 
have been working for them now does not function. Anyway, we need to be careful 
about how this gets put into practice if we are going to do it retroactively. 

Re: Proposal to retroactively mark materialized views experimental

2017-10-02 Thread Blake Eggleston
it:  
> in  
> > >> the development process.  
> > >>  
> > >> How does emitting a native protocol warning reduce visibility during  
> the  
> > >> development process? If you run CREATE MV and cqlsh then prints out a  
> > >> giant warning statement about how it is an experimental feature I  
> think  
> > >> that is pretty visible during development?  
> > >>  
> > >> I guess I can see just blocking new ones without a flag set, but we  
> need  
> > >> to be careful here. We need to make sure we don’t cause a problem for  
> > >> someone that is using them currently, even with all the edge cases  
> > issues  
> > >> they have now.  
> > >>  
> > >> -Jeremiah  
> > >>  
> > >>  
> > >>> On Oct 2, 2017, at 2:01 PM, Blake Eggleston   
> > >> wrote:  
> > >>>  
> > >>> Yeah, I'm not proposing that we disable MVs in existing clusters.  
> > >>>  
> > >>>  
> > >>> On October 2, 2017 at 10:58:11 AM, Aleksey Yeshchenko (  
> > alek...@apple.com)  
> > >> wrote:  
> > >>>  
> > >>> The idea is to check the flag in CreateViewStatement, so creation of  
> > new  
> > >> MVs doesn’t succeed without that flag flipped.  
> > >>>  
> > >>> Obviously, just disabling existing MVs working in a minor would be  
> > silly.  
> > >>>  
> > >>> As for the warning - yes, that should also be emitted.  
> Unconditionally.  
> > >>>  
> > >>> —  
> > >>> AY  
> > >>>  
> > >>> On 2 October 2017 at 18:18:52, Jeremiah D Jordan (  
> > >> jeremiah.jor...@gmail.com) wrote:  
> > >>>  
> > >>> These things are live on clusters right now, and I would not want  
> > >> someone to upgrade their cluster to a new *patch* release and suddenly  
> > >> something that may have been working for them now does not function.  
> > >> Anyway, we need to be careful about how this gets put into practice if  
> > we  
> > >> are going to do it retroactively.  
> > >>  
> > >>  
> > >> -  
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org  
> > >>  
> > >>  
> >  
> >  
> > -  
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
> > For additional commands, e-mail: dev-h...@cassandra.apache.org  
> >  
> >  
>  


Re: Proposal to retroactively mark materialized views experimental

2017-10-02 Thread Blake Eggleston
Yes, I understand what you're saying. The points I'm making about logs still 
apply. It's possible for drivers and object mappers to handle queries and 
schema changes, and have developers rarely open cqlsh. It's also not uncommon 
for schema changes to be done by a different group than the developers writing 
the application.

On October 2, 2017 at 2:21:38 PM, Jeremiah D Jordan (jerem...@datastax.com) 
wrote:

Blake,  
We are not saying to just put something in logs, we are talking about the warn 
actually showing up in cqlsh.  
When you issue a native protocol warn cqlsh will print it out on the console in 
front of you in the results of the query.  
https://issues.apache.org/jira/browse/CASSANDRA-8930 
<https://issues.apache.org/jira/browse/CASSANDRA-8930>  

For example for SASI it would look something like:  


cqlsh:ks> CREATE CUSTOM INDEX ON sasi_table (c) USING 
'org.apache.cassandra.index.sasi.SASIIndex';  

Warnings :  
A SASI index was enabled for ‘ks.sasi_table'. SASI is still experimental, take 
extra caution when using it in production.  

cqlsh:ks>  

-Jeremiah  

> On Oct 2, 2017, at 5:05 PM, Blake Eggleston  wrote:  
>  
> The message isn't materially different, but it will reach fewer people, 
> later. People typically aren't as attentive to logs as they should be. 
> Developers finding out about new warnings in the logs later than they could 
> have, sometimes even after it's been deployed, is not uncommon. It's happened 
> to me. Requiring a flag will reach everyone trying to use MVs as soon as they 
> start developing against MVs. Logging a warning will reach a subset of users 
> at some point, hopefully. The only downside I can think of for the flag is 
> that it's not as polite.  
>  
> On October 2, 2017 at 1:16:10 PM, Josh McKenzie (jmcken...@apache.org) wrote: 
>  
>  
> "Nobody is talking about removing MVs."  
> Not precisely true for this email thread:  
>  
> "but should there be some point in the  
> future where we consider removing them from the code base unless they have  
> gotten significant improvement as well?"  
>  
> IMO a .yaml change requirement isn't materially different than barfing a  
> warning on someone's screen during the dev process when they use the DDL  
> for MV's. At the end of the day, it's just a question of how forceful you  
> want that messaging to be. If the cqlsh client prints 'THIS FEATURE IS NOT  
> READY' in big bold letters, that's not going to miscommunicate to a user  
> that 'feature X is ready' when it's not.  
>  
> Much like w/SASI, this is something that's in the code-base that for  
> certain use-cases apparently works just fine. Might be worth considering  
> the approach of making boundaries around those use-cases more rigid instead  
> of throwing the baby out with the bathwater.  
>  
> On Mon, Oct 2, 2017 at 3:32 PM, DuyHai Doan  wrote:  
>  
>> Ok so IF there is a flag to enable MV (à-la UDA/UDF in cassandra.yaml) then  
>> I'm fine with it. I initially understood that we wanted to disable it  
>> definitively. Maybe we should then add an explicit error message when MV is  
>> disabled and someone tries to use it, something like:  
>>  
>> "MV has been disabled, to enable it, turn on the flag  in  
>> cassandra.yaml" so users don't spend 3h searching around  
>>  
>>  
>> On Mon, Oct 2, 2017 at 9:07 PM, Jon Haddad  wrote:  
>>  
>>> There’s a big difference between removal of a protocol that every single  
>>> C* user had to use and disabling a feature which is objectively broken  
>> and  
>>> almost nobody is using. Nobody is talking about removing MVs. If you  
>> want  
>>> to use them you can enable them very trivially, but it should be an  
>>> explicit option because they really aren’t ready for general use.  
>>>  
>>> Claiming disabling by default == removal is not helpful to the  
>>> conversation and is very misleading.  
>>>  
>>> Let’s be practical here. The people that are most likely to put MVs in  
>>> production right now are people new to Cassandra that don’t know any  
>>> better. The people that *should* be using MVs are the contributors to  
>> the  
>>> project. People that actually wrote Cassandra code that can do a patch  
>> and  
>>> push it into prod, and get it submitted upstream when they fix something.  
>>> Yes, a lot of this stuff requires production usage to shake out the bugs,  
>>> that’s fine, but we shouldn’t lie to people and say “feature X is ready”  
>>> when it’s no

Re: Proposal to retroactively mark materialized views experimental

2017-10-03 Thread Blake Eggleston
The remaining issues are:

* There's no way to determine if a view is out of sync with the base table.
* If you do determine that a view is out of sync, the only way to fix it is to 
drop and rebuild the view.
* There are liveness issues with updates being reflected in the view.

On October 3, 2017 at 9:00:32 AM, Sylvain Lebresne (sylv...@datastax.com) wrote:

On Tue, Oct 3, 2017 at 5:54 PM, Aleksey Yeshchenko  wrote:  
> There are a couple compromise options here:  
>  
> a) Introduce the flag (enalbe_experimental_features, or maybe one per 
> experimental feature), set it to ‘false’ in the yaml, but have the default be 
> ‘true’. So that if you are upgrading from a previous minor to the next 
> without updating the yaml, you notice nothing.  
>  
> b) Introduce the flag in the minor, and set it to ‘true’ in the yaml in 3.0 
> and 3.11, but to ‘false’ in 4.0. So the operators and in general people who 
> know better can still disable it with one flip, but nobody would be affected 
> by it in a minor otherwise.  
>  
> B might be more correct, and I’m okay with it  

Does feel more correct to me as well  

> although I do feel that we are behaving irresponsibly as developers by 
> allowing MV creation by default in their current state  

You're giving little credit to the hard work that people have put into  
getting MV in a usable state. To quote Kurt's email:  

> And finally, back onto the original topic. I'm not convinced that MV's need  
> this treatment now. Zhao and Paulo (and others+reviewers) have made quite a  
> lot of fixes, granted there are still some outstanding bugs but the  
> majority of bad ones have been fixed in 3.11.1 and 3.0.15, the remaining  
> bugs mostly only affect views with a poor data model. Plus we've already  
> required the known broken components require a flag to be turned on.  




Re: Cassandra pluggable storage engine (update)

2017-10-04 Thread Blake Eggleston
Hi Dikang,

Cool stuff. Two questions. Based on your presentation at NGCC, it seems like 
RocksDB stores things in byte order. Does this mean that you have code that 
makes each of the existing types byte comparable, or is clustering order 
implementation dependent? Also, I don't see anything in the draft api that 
seems to support splitting the data set into arbitrary categories (ie repaired 
and unrepaired data living in the same token range). Is support for incremental 
repair planned for v1?

Thanks,

Blake
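
Some context on the first question: ordered key-value stores like RocksDB compare keys as raw unsigned byte strings, so any type used in a key needs an encoding whose byte order matches its logical order. Below is a minimal sketch for a signed 64-bit value (illustrative only, not Instagram's actual encoding).

import java.nio.ByteBuffer;

// Big-endian bytes with the sign bit flipped: unsigned lexicographic comparison of
// the encoded bytes then matches signed numeric order of the original values.
public final class ByteComparableLong
{
    static byte[] encode(long v)
    {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Unsigned lexicographic comparison, i.e. what RocksDB's default comparator does.
    static int compareUnsigned(byte[] a, byte[] b)
    {
        for (int i = 0; i < a.length && i < b.length; i++)
        {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args)
    {
        long[] values = { Long.MIN_VALUE, -42L, -1L, 0L, 1L, 42L, Long.MAX_VALUE };
        for (int i = 1; i < values.length; i++)
            if (compareUnsigned(encode(values[i - 1]), encode(values[i])) >= 0)
                throw new AssertionError("byte order does not match numeric order");
        System.out.println("encoded byte order matches numeric order");
    }
}
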


On October 4, 2017 at 1:28:01 PM, Dikang Gu (dikan...@gmail.com) wrote:

Hello C* developers: 

In my previous email 
(https://www.mail-archive.com/dev@cassandra.apache.org/msg11024.html), I 
presented that Instagram was kicking off a project to make C*'s storage engine 
pluggable, as in other modern databases like MySQL, MongoDB, etc., so that 
users will be able to choose the most suitable storage engine for different work 
loads, or to use different features. In addition to that, a pluggable storage 
engine architecture will improve the modularity of the system and help to increase 
the testability and reliability of Cassandra.

After months of development and testing, we'd like to share the work we have 
done, including the first(draft) version of the C* storage engine API, and the 
first version of the RocksDB based storage engine.



For the C* storage engine API, here is the draft version we proposed, 
https://docs.google.com/document/d/1PxYm9oXW2jJtSDiZ-SR9O20jud_0jnA-mW7ttp2dVmk/edit.
 It contains the APIs for read/write requests, streaming, and table management. 
The storage engine related functionalities, like data encoding/decoding format, 
on-disk data read/write, compaction, etc, will be taken care by the storage 
engine implementation.

Each storage engine is a class, and each instance of the class is stored in the 
Keyspace instance. So all the column families within a keyspace will share one 
storage engine instance.

Once a storage engine instance is created, the Cassandra server issues commands to 
the engine instance to perform data storage and retrieval tasks such as 
opening a column family, managing column families and streaming.

How to configure the storage engine for different keyspaces is still open for 
discussion. One proposal is that we can add a storage engine option to the 
CREATE KEYSPACE CQL command, and potentially we can override the option per C* 
node in its config file.

Under that API, we implemented a new storage engine, based on RocksDB, called 
RocksEngine. In the long term, we want to support most of C*'s existing features in 
RocksEngine, and we want to build it in a progressive manner. The first 
version of RocksEngine supports the following features:
Most of non-nested data types
Table schema
Point query
Range query
Mutations
Timestamp
TTL
Deletions/Cell tombstones
Streaming
We do not support the following features in the first version yet:
Multi-partition query
Nested data types
Counters
Range tombstone
Materialized views
Secondary indexes
SASI
Repair
At this moment, we've implemented the V1 features and deployed them to our 
shadow cluster. Using shadow traffic from our production use cases, we saw a ~3X 
P99 read latency drop compared to our C* 2.2 prod clusters. Here are some 
detailed metrics: 
https://docs.google.com/document/d/1DojHPteDPSphO0_N2meZ3zkmqlidRwwe_cJpsXLcp10.

So if you need the features in the existing storage engine, please keep using the 
existing storage engine. If you want more predictable and lower read 
latency, and the features supported by RocksEngine are enough for your use 
cases, then RocksEngine could be a fit for you.

The work is 1% finished, and we want to work together with the community to make it 
happen. We presented the work in NGCC last week, and also pushed the beta 
version of the pluggable storage engine to Instagram github Cassandra repo, 
rocks_3.0 branch (https://github.com/Instagram/cassandra/tree/rocks_3.0), which 
is based on C* 3.0.12, please feel free to play with it! You can download it 
and follow the instructions 
(https://github.com/Instagram/cassandra/blob/rocks_3.0/StorageEngine.md) to try 
it out in your test environment, your feedback will be very valuable to us.

Thanks
Dikang.



Re: Reviewer for LWT bug

2017-12-19 Thread Blake Eggleston
I'll take it


On December 17, 2017 at 3:48:04 PM, kurt greaves (k...@instaclustr.com) wrote:

Need a reviewer for CASSANDRA-14087 
 

Pretty straightforward: we just get an NPE when comparing against a frozen 
collection which is null and we expect a specific collection. Anyone with a 
bit of knowledge around ColumnCondition.java should be able to review. 

Patch is for 3.0 but should apply cleanly to 3.11. Trunk is unaffected. 

Cheers, 
Kurt 


Re: Expensive metrics?

2018-02-22 Thread Blake Eggleston
Hi Micke,

This is really cool, thanks for taking the time to investigate this. I believe 
the metrics around memtable insert time come in handy in identifying high 
partition contention in the memtable. I know I've been involved in a situation 
over the past year where we got actionable info from this metric. Reducing 
resolution to milliseconds is probably a no go since most things in this path 
should complete in less than a millisecond. 

Revisiting the use of the codahale metrics in the hot path like this definitely 
seems like a good idea though. I don't think it's been something we've talked 
about a lot, and it definitely looks like we could benefit from using something 
more specialized here. I think it's worth doing, especially since there won't 
be any major changes to how we do threading in 4.0. It's probably also worth 
opening a JIRA and investigating the calls to nanoTime. We at least need 
microsecond resolution here, and there could be something we haven't thought 
of? It's worth a look at least.

Thanks,

Blake
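
As a minimal sketch of the "record cheaply on the write path, aggregate when the metric is read" direction discussed here: it is illustrative only, and deliberately drops the histogram/percentiles a codahale Timer provides, so a real replacement would still need a cheaper histogram behind it.

import java.util.concurrent.atomic.LongAdder;

// Hot path does two LongAdder.add() calls instead of updating a full codahale
// Timer (plus keyspace/global parents); the mean is computed only when read.
public final class CheapWriteLatency
{
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder count = new LongAdder();

    public void record(long latencyNanos)   // called once per write
    {
        totalNanos.add(latencyNanos);
        count.add(1);
    }

    public double meanMicros()              // called when metrics are scraped
    {
        long n = count.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / (double) n / 1000.0;
    }

    public static void main(String[] args)
    {
        CheapWriteLatency latency = new CheapWriteLatency();
        for (int i = 0; i < 1_000_000; i++)
        {
            long start = System.nanoTime();
            // ... the memtable insert would happen here ...
            latency.record(System.nanoTime() - start);
        }
        System.out.printf("mean recorded latency: %.3f us%n", latency.meanMicros());
    }
}
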

On 2/22/18, 6:10 AM, "Michael Burman"  wrote:

Hi,

I wanted to get some input from the mailing list before making a JIRA 
and potential fixes. I'll touch on the performance more in the latter part, but 
there's one important question regarding the write latency metric 
recording place. Currently we measure the writeLatency (and metric write 
sampler..) in ColumnFamilyStore.apply() and this is also the metric we 
then replicate to Keyspace metrics etc.

This is an odd place for writeLatency. Not only is it in a 
hot path of Memtable modifications, it also does not measure the 
real write latency, since it completely ignores the CommitLog latency in 
that same process. Is the intention really to measure 
Memtable-modification latency only or the actual write latencies?

Then the real issue.. this single metric is a cause of huge overhead in 
Memtable processing. There are several metrics / events in the CFS apply 
method, including metric sampler, storageHook reportWrite, 
colUpdateTimeDeltaHistogram and metric.writeLatency. These are not free 
at all when it comes to the processing. I made a small JMH benchmark 
here: https://gist.github.com/burmanm/b5b284bc9f1d410b1d635f6d3dac3ade 
that I'll be referring to.

The most offending of all these metrics is the writeLatency metric. What 
it does is update the latency in codahale's timer, doing a histogram 
update and then also going through all the parent metrics, which update 
the keyspace writeLatency and globalWriteLatency. When measuring the 
performance of Memtable.put with a parameter of 1 partition (to reduce the 
ConcurrentSkipListMap search speed impact - that's a separate issue and 
takes a little bit longer to solve, although I've started to prototype 
something...), on my machine I see 1.3M/s performance with the metric, and 
when it is disabled the performance climbs to 4M/s. So the overhead of 
this single metric is ~2/3 of total performance. That's insane. My perf 
stats indicate that the CPU is starved as it can't get enough data in.

Removing the replication from TableMetrics to the Keyspace & global 
latencies at write time (and doing this when metrics are requested 
instead) improves the performance to 2.1M/s on my machine. It's an 
improvement, but it's still a huge amount. Even when we pressure the 
ConcurrentSkipListMap with 100 000 partitions in one active Memtable, 
the performance drops by about ~40% due to this metric, so it's never free.

I did not find any discussion about replacing the metric processing with 
something faster, so has this been considered before? At least for these 
performance-sensitive ones. The other issue is obviously the use of 
System.nanoTime(), which by itself is very slow (two System.nanoTime() 
calls eat another ~1M/s from the performance).

My personal quick fix would be to move writeLatency to Keyspace.apply, 
change write-time aggregates to read-time processing (metrics are read 
less often than we write data), and maybe even downgrade nanoTime to 
currentTimeMillis (even given its relative lack of precision). That is 
- if these metrics make any sense at all at the CFS level? Maybe these 
should be measured from the network processing time (including all the 
deserializations and such), especially if at some point the smarter 
threading / eventlooping changes go forward (in which case requests might 
sleep in some "queue" for a while).

   - Micke


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org






Re: A JIRA proposing a seperate repository for the online documentation

2018-03-16 Thread Blake Eggleston
It would probably be more productive to list some specific concerns you have 
with Hugo. Then explain why you think they make using it a bad idea. Then offer 
some alternatives.

On 3/16/18, 1:18 PM, "Kenneth Brotman"  wrote:

Thanks for that Eric Evans.  

I'm not sure Hugo is the way to go.  I don't see how I would generate the 
quality of work I would want with it.  It seems like another example of coders 
learning and using a more complicated program to generate the code they could 
have already generated - it’s a disease in the I.T. industry right now.  But I 
could be wrong.  

Here's the thing.  I've been spending a lot of my time for the past three 
weeks now trying to help with the website.  That is a tiny website.  I've never 
worked with a website that tiny.  Bear with me.  

I'm studying Jeff Carpenter and Eben Hewitt's book: Cassandra The 
Definitive Guide 
https://www.amazon.com/Cassandra-Definitive-Guide-Distributed-Scale/dp/1491933666/ref=sr_1_1?ie=UTF8&qid=1521230539&sr=8-1&keywords=cassandra+the+definitive+guide
 and already have a terrible itch to start contributing some code.  I just 
want to get set up to do that.  The book seems to be a good way to get familiar 
with the internals and the code of Cassandra.

I can only do so much for the group at one time just like anyone else.  
I'll only do top quality work.  I'll only be a part of top quality work.  It 
could be that I won't feel comfortable with what the group wants to do for the 
website.  

Please keep working on it, as it is really embarrassing, terrible, 
substandard, unacceptable, beneath professional standards...

I will contribute if it's possible for me to do so. Let's see what we 
decide to do going forward for the website.

Kenneth Brotman
(Cassandra coder?) 

-Original Message-
From: Eric Evans [mailto:john.eric.ev...@gmail.com] 
Sent: Friday, March 16, 2018 7:59 AM
To: dev@cassandra.apache.org
Subject: Re: A JIRA proposing a seperate repository for the online 
documentation

On Thu, Mar 15, 2018 at 11:40 AM, Kenneth Brotman 
 wrote:
> Well pickle my cucumbers Jon!  It's good to know that you have experience 
with Hugo, see it as a good fit and that all has been well.  I look forward to 
the jira epic!
>
> How exactly does the group make such a decision:  Call for final 
discussion?  Call for vote?  Wait for the PMC to vote?

Good question!

Decisions like this are made by consensus; as the person who is attempting 
to do something, you discuss your ideas with the group, listen to the feedback 
of others, and develop consensus around a direction.

This is more difficult than demanding your way, or having someone(s) in a 
position of absolute power tell you what you can and cannot do, but the result 
is better.


> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
> Haddad
> Sent: Thursday, March 15, 2018 9:24 AM
> To: dev@cassandra.apache.org
> Subject: Re: A JIRA proposing a seperate repository for the online 
> documentation
>
> Murukesh is correct on a very useable, pretty standard process of 
multi-versioned docs.
>
> I’ll put my thoughts in a JIRA epic tonight.  It’ll be a multi-phase 
process.  Also correct in that I’d like us to move to Hugo for the site; I’d 
like us to have a unified system between the site & the docs, and Hugo has been 
excellent. We run the reaper site & docs off Hugo, and it works well.  We just 
don’t do multi-versions (because we don’t support multiple): 
https://github.com/thelastpickle/cassandra-reaper/tree/master/src/docs 
.
>
> Jon
>
>> On Mar 15, 2018, at 8:57 AM, Murukesh Mohanan 
 wrote:
>>
>> On Fri, Mar 16, 2018 at 0:19 Kenneth Brotman  
>> wrote:
>>
>>> Help me out here.  I could have had a website with support for more 
>>> than one version done several different ways by now.
>>>
>>> A website with several versions of documentation is going to have 
>>> sub-directories for each version of documentation obviously.  I've 
>>> offered to create those sub-directories under the "doc" folder of 
>>> the current repository; and I've offered to move the online 
>>> documentation to a separate repository and have the sub-directories 
>>> there.  Both were shot down.  Is there a third way?  If so please just 
spill the beans.
>>>
>>
>> There is. Note that the website is an independent repository. So to 
>> host docs for multiple versions, only the website's repository (or 
>> rather, the final built contents) needs multiple directories. You can 
>> just checkout each branch or tag, generate the docs, make a directory 
>> for that branch or tag in

Repair scheduling tools

2018-04-03 Thread Blake Eggleston
Hi dev@,

 

The question of the best way to schedule repairs came up on CASSANDRA-14346, 
and I thought it would be good to bring up the idea of an external tool on the 
dev list.

 

Cassandra lacks any sort of tools for automating routine tasks that are 
required for running clusters, specifically repair. Regular repair is a must 
for most clusters, like compaction. This means that, especially as far as 
eventual consistency is concerned, Cassandra isn’t totally functional out of 
the box. Operators either need to find a 3rd party solution or implement one 
themselves. Adding this to Cassandra would make it easier to use.

 

Is this something we should be doing? If so, what should it look like?

 

Personally, I feel like this is a pretty big gap in the project and would like 
to see an out of process tool offered. Ideally, Cassandra would just take care 
of itself, but writing a distributed repair scheduler that you trust to run in 
production is a lot harder than writing a single process management application 
that can failover.

 

Any thoughts on this?

 

Thanks,

 

Blake



Re: Roadmap for 4.0

2018-04-04 Thread Blake Eggleston
+1

On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:

Earlier than I’d have personally picked, but I’m +1 too



-- 
Jeff Jirsa


> On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> 
> Top-posting as I think this summary is on point - thanks, Scott! (And
> great to have you back, btw).
> 
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
> 
> How do folks feel about the above points?
> 
> 
>> Re-raising a point made earlier in the thread by Jeff and affirmed by 
Josh:
>> 
>> –––
>> Jeff:
 A hard date for a feature freeze makes sense, a hard date for a release
 does not.
>> 
>> Josh:
>>> Strongly agree. We should also collectively define what "Done" looks 
like
>>> post freeze so we don't end up in bike-shedding hell like we have in the
>>> past.
>> –––
>> 
>> Another way of saying this: ensuring that the 4.0 release is of high 
quality is more important than cutting the release on a specific date.
>> 
>> If we adopt Sylvain's suggestion of freezing features on a "feature 
complete" date (modulo a "definition of done" as Josh suggested), that will 
help us align toward the polish, performance work, and dog-fooding needed to 
feel great about shipping 4.0. It's a good time to start thinking about the 
approaches to testing, profiling, and dog-fooding various contributors will 
want to take on before release.
>> 
>> I love how Ben put it:
>> 
>>> An "exciting" 4.0 release to me is one that is stable and usable
>>> with no perf regressions on day 1 and includes some of the big
>>> internal changes mentioned previously.
>>> 
>>> This will set the community up well for some awesome and exciting
>>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
>> 
>> That sounds great to me, too.
>> 
>> – Scott
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Roadmap for 4.0

2018-04-11 Thread Blake Eggleston
I agree that not releasing semi-regularly is not good for the project. I think 
our habit of releasing half-working software is much worse though. Our 
testing/stability story is not ironclad. I really think the bar for releasing 
4.0 should be that the people in this thread are running the code in 
production, recommending their customers run it in production, or offering and 
supporting it as part of their cloud service.

In that context, the argument for waiting for some features is less about 
trying to do all the things and more about making 4.0 something worth the time 
and expense of validating for production.

On 4/11/18, 1:06 AM, "Sylvain Lebresne"  wrote:

On Wed, Apr 11, 2018 at 12:35 AM Jeff Jirsa  wrote:

> Seriously, what's the rush to branch? Do we all love merging so much we
> want to do a few more times just for the sake of merging? If nothing
> diverges, there's nothing gained from the branch, and if it did diverge, 
we
> add work for no real gain.
>

Again, to me, the "rush" is that 1) there is tons of changes sitting in
trunk
that some user (_not all_, granted)[1], especially new ones, would likely
benefits, and sooner for those is better than later, 2) we want to favor
release stability and we *know* from years of experience (and frankly,
common
sense) that the bigger the release is, the harder it is to test it/ensuring
overall stability[2] and 3) not having major releases for years[3] is
impacting at least the perceived dynamism/liveness of the project to
external
actors (prospective new user come in mind here, but not only) and that's
simply bad for the project.

And having listed arguments for a soon freeze / not accumulating much more
before release, I'd like to reverse the question to you: what are the big
downsides of not doing that? Are we really that hung up on our own developers'
comfort that the annoyance of a bit more merging trumps the arguments above?

Anyway, the reasons above make me think that it's better _for the project_
to freeze 4.0 soon, which doesn't exclude a "short" cycle for the following
major (where my definition of short here is something like 6-8 months), and
I'm happy to decide to make 4.0 a non-mandatory upgrade to whatever
comes next, so that folks who prefer upgrading rarely can simply skip it and
go to the next one. Likely nobody will die if we wait more though, and it's
clear it will make a few people here happier if we do, but I believe the
project as a whole will be a bit worse off, that's all.

--
Sylvain


[1]: I'll note that I don't deny upgrading is a huge deal for some users, but
let's not skew arguments too much based on any one user's interest. For many
users, upgrading even every year to get improvements is still considered a
good deal, and that's not counting new users, for whom it's super frustrating
to miss out on improvements because we release a major only every 2+ years.
[2]: I'll be clear: I will simply not buy anyone's argument that "we'll do
so much better testing this time" at face value. Not anymore. If you want to
use that argument to sell having bigger releases, then prove it first. Let's
do a reasonably sized 4.0 and 4.1/5.0 and prove that our testing/stability
story is ironclad now, and then for 4.2/6.0 I'll be willing to agree that
making bigger releases may not impact stability too much.
[3]: Conservative estimate: if we do care about stable releases as we all seem
to, even if we were to freeze June 1, we will almost surely not release before
October/November, which will be ~1.3 years since the last major release (again,
that's the conservative estimate). If we push a few months to get some big
complex feature in, not only does this push the freeze back by those few
months, it will also require more testing, so we're looking at 2+ years, with
a possibly large '+'.




>
> Beyond that, I still don't like June 1. Validating releases is hard. It
> sounds easy to drop a 4.1 and ask people to validate again, but it's a 
hell
> of a lot harder than it sounds. I'm not saying I'm a hard -1, but I really
> think it's too soon. 50'ish days is too short to draw a line in the sand,
> especially as people balance work obligations with Cassandra feature
> development.
>
>
>
>
> On Tue, Apr 10, 2018 at 3:18 PM, Nate McCall  wrote:
>
> > A lot of good points and everyone's input is really appreciated.
> >
> > So it sounds like we are building consensus towards June 1 for 4.0
> > branch point/feature freeze and the goal is stability. (No one has
> > come with a hard NO anyway).
> >
> > I want to reiterate Sylvain's point that we can do whatever we want in
> > terms of dropping a new feature 4.1/5.0 (or 

Re: Repair scheduling tools

2018-04-16 Thread Blake Eggleston
ven non-overlapping ranges at the same time. That lets people
>> experiment
>> > > with and quickly/safely/easily iterate on different scheduling
>> strategies
>> > > in the short term, and long-term those strategies can be integrated
>> into a
>> > > built-in scheduler
>> > >
>> > > On the subject of scheduling, I think adjusting
>> parallelism/aggression with
>> > > a possible whitelist or blacklist would be a lot more useful than a
>> "time
>> > > between repairs". That is, if repairs run for a few hours then don't
>> run
>> > > for a few (somewhat hard-to-predict) hours, I still have to size the
>> > > cluster for the load when the repairs are running. The only reason I
>> can
>> > > think of for an interval between repairs is to allow re-compaction
>> from
>> > > repair anticompactions, and subrange repairs seem to eliminate this.
>> Even
>> > > if they didn't, a more direct method along the lines of "don't repair
>> when
>> > > the compaction queue is too long" might make more sense. Blacklisted
>> > > timeslots might be useful for avoiding peak time or batch jobs, but
>> only if
>> > > they can be specified for consistent time-of-day intervals instead of
>> > > unpredictable lulls between repairs.
>> > >
>> > > I really like the idea of automatically adjusting gc_grace_seconds
>> based on
>> > > repair state. The only_purge_repaired_tombstones option fixes this
>> > > elegantly for sequential/incremental repairs on STCS, but not for
>> subrange
>> > > repairs or LCS (unless a scheduler gains the ability somehow to
>> determine
>> > > that every subrange in an sstable has been repaired and mark it
>> > > accordingly?)
>> > >
>> > >
>> > > On 2018/04/03 17:48:14, Blake Eggleston  wrote:
>> > > > Hi dev@,
>> > > >
>> > > > >
>> > > >
>> > > > The question of the best way to schedule repairs came up on
>> > > CASSANDRA-14346, and I thought it would be good to bring up the idea
>> of an
>> > > external tool on the dev list.
>> > > >
>> > > > >
>> > > >
>> > > > Cassandra lacks any sort of tools for automating routine tasks that
>> are
>> > > required for running clusters, specifically repair. Regular repair is
>> a
>> > > must for most clusters, like compaction. This means that, especially
>> as far
>> > > as eventual consistency is concerned, Cassandra isn’t totally
>> functional
>> > > out of the box. Operators either need to find a 3rd party solution or
>> > > implement one themselves. Adding this to Cassandra would make it
>> easier to
>> > > use.
>> > > >
>> > > > >
>> > > >
>> > > > Is this something we should be doing? If so, what should it look
>> like?
>> > > >
>> > > > >
>> > > >
>> > > > Personally, I feel like this is a pretty big gap in the project and
>> would
>> > > like to see an out of process tool offered. Ideally, Cassandra would
>> just
>> > > take care of itself, but writing a distributed repair scheduler that
>> you
>> > > trust to run in production is a lot harder than writing a single
>> process
>> > > management application that can failover.
>> > > >
>> > > > >
>> > > >
>> > > > Any thoughts on this?
>> > > >
>> > > > >
>> > > >
>> > > > Thanks,
>> > > >
>> > > > >
>> > > >
>> > > > Blake
>> > > >
>> > > >
>> > >
>>
>
>




-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Cassandra + RAMP transactions

2015-02-09 Thread Blake Eggleston
I've been working on the epaxos implementation. You can take a look at the
ticket here: https://issues.apache.org/jira/browse/CASSANDRA-6246


On Mon, Feb 9, 2015 at 8:51 PM,  wrote:

> Hi Jatin,
>
> I believe there is a lot of interest in developing RAMP transactions for
> Cassandra, but no concrete activity yet.
> https://issues.apache.org/jira/browse/CASSANDRA-7056
>
> -Tupshin
>
> On Mon, Feb 9, 2015, at 11:30 PM, Jatin Ganhotra wrote:
> > Hi,
> >
> > Please forgive me if this is not the right forum for this query.
> > I recently read an article where Jonathan Ellis mentioned that ePaxos and
> > RAMP transactions should be soon added to Cassandra.
> >
> > Is there any work going on in this direction?
> >
> > Thanks
> > —
> > Jatin Ganhotra
> > Graduate Student, Computer Science
> > University of Illinois at Urbana Champaign
> > http://jatinganhotra.com
> > http://linkedin.com/in/jatinganhotra
>


Re: [VOTE] Release Apache Cassandra 4.0-rc1

2021-03-30 Thread Blake Eggleston
+1

> On Mar 29, 2021, at 6:05 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 4.0-rc1 for release.
> 
> sha1: 2facbc97ea215faef1735d9a3d5697162f61bc8c
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-rc1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1234/org/apache/cassandra/cassandra-all/4.0-rc1/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/4.0-rc1/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> Known issues with this release, that are planned to be fixed in 4.0-rc2, are
> - four files were missing copyright headers,
> - LICENSE and NOTICE contain additional unneeded information,
> - jar files under lib/ in the source artefact.
> 
> These issues are actively being worked on, along with our expectations that
> the ASF makes the policy around them more explicit so it is clear exactly
> what is required of us.
> 
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-rc1-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-rc1-tentative


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0-rc1 (take2)

2021-04-21 Thread Blake Eggleston
+1

> On Apr 21, 2021, at 2:25 PM, Scott Andreas  wrote:
> 
> +1nb, thank you!
> 
> 
> From: Ekaterina Dimitrova 
> Sent: Wednesday, April 21, 2021 12:23 PM
> To: dev@cassandra.apache.org
> Subject: Re: [VOTE] Release Apache Cassandra 4.0-rc1 (take2)
> 
> +1 and thanks everyone for all the hard work
> 
> Checked:
> - gpg signatures
> - sha checksums
> - binary convenience artifact runs
> - src convenience artifacts builds with one command, and runs
> - deb and rpm install and run
> 
>> On Wed, 21 Apr 2021 at 14:57, Michael Semb Wever  wrote:
>> 
>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who
>>> has tested the build is invited to vote. Votes by PMC members are
>>> considered binding. A vote passes if there are at least three binding
>>> +1s and no -1's.
>> 
>> 
>> +1
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0.0 (take2)

2021-07-14 Thread Blake Eggleston
+1

> On Jul 14, 2021, at 8:21 AM, Aleksey Yeschenko  wrote:
> 
> +1
> 
>> On 14 Jul 2021, at 15:37, Jonathan Ellis  wrote:
>> 
>> +1
>> 
>>> On Tue, Jul 13, 2021 at 5:14 PM Mick Semb Wever  wrote:
>>> 
>>> Proposing the test build of Cassandra 4.0.0 for release.
>>> 
>>> sha1: 924bf92fab1820942137138c779004acaf834187
>>> Git:
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.0-tentative
>>> Maven Artifacts:
>>> 
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1242/org/apache/cassandra/cassandra-all/4.0.0/
>>> 
>>> The Source and Build Artifacts, and the Debian and RPM packages and
>>> repositories, are available here:
>>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.0/
>>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who
>>> has tested the build is invited to vote. Votes by PMC members are
>>> considered binding. A vote passes if there are at least three binding
>>> +1s and no -1's.
>>> 
>>> [1]: CHANGES.txt:
>>> 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.0-tentative
>>> [2]: NEWS.txt:
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.0-tentative
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


