Re: scheduled work compaction strategy

2018-02-16 Thread Jeff Jirsa
There’s a company using TWCS in this config - I’m not going to out them, but I
think they do it (or used to) with aggressive tombstone sub-properties. They
may have since extended/enhanced it somewhat.
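
As an illustration only (the keyspace/table name and the values below are
made up, not anyone's actual config), "aggressive tombstone sub-properties"
layered on TWCS could look something like this:

    // Hypothetical sketch: print the ALTER TABLE statement an operator might
    // use to combine TWCS windows with aggressive tombstone sub-properties.
    public class TwcsTombstoneSettings {
        public static void main(String[] args) {
            String cql =
                "ALTER TABLE sched.events WITH compaction = {"
                + " 'class': 'TimeWindowCompactionStrategy',"
                + " 'compaction_window_unit': 'DAYS',"
                + " 'compaction_window_size': '1',"
                // Allow single-sstable compactions done purely to purge tombstones:
                + " 'unchecked_tombstone_compaction': 'true',"
                // Trigger at a much lower droppable-tombstone ratio than the 0.2 default:
                + " 'tombstone_threshold': '0.05',"
                // Re-evaluate more often than the default 86400 seconds:
                + " 'tombstone_compaction_interval': '3600' }";
            System.out.println(cql);
        }
    }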

-- 
Jeff Jirsa




Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
An even MORE complicated version could address the case where the TTLs are
at the column key rather than the row key. That would divide the row across
sstables by rowkey, in essence the opposite of what most compaction
strategies try to do: eventually centralize the data for a rowkey in one
sstable. This strategy assumes TTLs would be cleaning up those row
fragments, so the distribution of the data across many sstables wouldn't
pollute the bloom filters too much.



Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
Oh and as a further refinement outside of our use case.

If we could group/organize the sstables by the time value in the rowkey or
by the row's inherent TTL, the naive version would be evenly distributed
buckets extending into the future.

But many/most data patterns like this have "busy" data in the near term,
while far-out scheduled stuff is more sparse. In our case, 50% of the data
is in the first 12 hours, 50% of the remainder in the next day or two, 50%
of the remainder in the next week, and so on.

So we could have a "long term" general bucket to take data far in the
future. But here's the thing, if we could actively process the "long term"
sstable on a regular basis into two sstables: the stuff that is still "long
term" and sstables for the "near term", that could solve many general
cases. The "long term" bucket could even be STCS by default, and as the
near term comes into play, that is considered a different "level".

Of course all this relies on the ability to look at the data in the rowkey
or the TTL associated with the row.
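
A rough sketch of the kind of bucketing meant here (purely illustrative, not
an actual compaction strategy): windows that double in width the further out
the expiration is, so the near term gets fine-grained buckets and everything
far out collapses into one catch-all "long term" bucket.

    import java.util.concurrent.TimeUnit;

    // Illustrative only: assign data to a bucket based on how far in the future
    // it expires. Bucket 0 covers the next 12 hours, bucket 1 up to ~1 day,
    // bucket 2 up to ~2 days, and so on (doubling), with bucket 7 acting as the
    // "long term" catch-all for anything expiring more than ~a month out.
    public class ExpirationBuckets {
        static final long BASE_WINDOW_SECONDS = TimeUnit.HOURS.toSeconds(12);
        static final int LONG_TERM_BUCKET = 7;

        static int bucketFor(long expirationEpochSeconds, long nowEpochSeconds) {
            long untilExpiry = Math.max(0, expirationEpochSeconds - nowEpochSeconds);
            int bucket = 0;
            long windowEnd = BASE_WINDOW_SECONDS;
            while (untilExpiry >= windowEnd && bucket < LONG_TERM_BUCKET) {
                bucket++;
                windowEnd *= 2; // each successive bucket covers twice the span
            }
            return bucket;
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis() / 1000;
            System.out.println(bucketFor(now + 3600, now));                         // 0: within 12h
            System.out.println(bucketFor(now + TimeUnit.DAYS.toSeconds(3), now));   // 3: a few days out
            System.out.println(bucketFor(now + TimeUnit.DAYS.toSeconds(365), now)); // 7: long term
        }
    }

Reprocessing the "long term" bucket would then just be re-running
bucketFor() over its contents as time advances and splitting out anything
that now lands in a nearer bucket.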



scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
We have a scheduler app here at smartthings, where we track per-second
tasks to be executed.

These are all TTL'd to be destroyed once the second the event was
registered for has passed.

If the scheduling window were sufficiently small, say, 1 day, we could
probably use a time window compaction strategy for this. But the window is
one to two years' worth of ad hoc event registration per the contract.

Thus, because events are registered at different times, data TTL'ing at
very different times gets intermingled, and the sstables are not written
with data that expires in the same rough time period. If they were,
compaction would be a relatively easy process, since the entire sstable
would tombstone at once.

We could approximate this with sharded tables for the time periods,
rotating the shards through duty and truncating them as they are recycled.

But a more elegant way would be a custom compaction strategy that would
"window" the data into clustered sstables that could be compacted with
other similarly time-bucketed sstables.

This would require visibility into the rowkey when it comes time to convert
the memtable data to sstables. Is that even possible with compaction
schemes? We would impose the requirement that the time-based data be
present in the row key, making it a required component if the row key is
composite.
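
As a sketch of the write path being described (the keyspace, table, and
column names here are hypothetical), the TTL would simply be the number of
seconds between now and the scheduled second:

    import java.time.Duration;
    import java.time.Instant;

    // Illustrative only: compute the per-row TTL so the row expires right after
    // the second the event was registered for has passed.
    public class ScheduledTaskTtl {
        static long ttlSeconds(Instant scheduledFor, Instant now) {
            long seconds = Duration.between(now, scheduledFor).getSeconds() + 1;
            return Math.max(seconds, 1); // TTL must be at least 1 second to expire
        }

        public static void main(String[] args) {
            Instant now = Instant.now();
            Instant scheduledFor = now.plusSeconds(3600); // event fires in an hour
            // Hypothetical statement; the time component would also live in the row key.
            String cql = "INSERT INTO scheduler.tasks (bucket, fire_at, task_id, payload) "
                       + "VALUES (?, ?, ?, ?) USING TTL " + ttlSeconds(scheduledFor, now);
            System.out.println(cql);
        }
    }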


Re: row tombstones as a separate sstable citizen

2018-02-16 Thread Carl Mueller
Re: the tombstone sstables being read-only inputs to compaction: there
would be one case where the non-tombstone sstables would be inputs to the
compaction of the row tombstones, namely when the row no longer exists in
any of the data sstables with respect to the row tombstone's timestamp.

There may be other opportunities for simplified processing of the row
tombstone sstables, since they are pure key-value (row key : deletion flag)
rather than columnar data. We may be able to offer the option of a memory
map if the row tombstones fit in a sufficiently small space. A "row cache"
may be far simpler for these than the general row-cache difficulties for
Cassandra data. Those caches could also be loaded only during compaction
operations.
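
For what it's worth, that "pure key-value" shape is small enough to sketch:
a map from row key to the tombstone's deletion timestamp, which a compaction
pass could consult to decide whether a data row is shadowed. This is purely
illustrative and not how Cassandra represents tombstones internally.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of a row-tombstone "sstable" reduced to
    // key -> deletion timestamp, consulted while compacting data sstables.
    public class RowTombstoneIndex {
        private final Map<ByteBuffer, Long> deletionTimeByKey = new HashMap<>();

        void addTombstone(ByteBuffer rowKey, long deletionTimestampMicros) {
            deletionTimeByKey.merge(rowKey, deletionTimestampMicros, Math::max);
        }

        // A data row is shadowed (and can be dropped during compaction) if a row
        // tombstone for its key is at least as new as the row's write timestamp.
        boolean shadows(ByteBuffer rowKey, long rowWriteTimestampMicros) {
            Long deletedAt = deletionTimeByKey.get(rowKey);
            return deletedAt != null && deletedAt >= rowWriteTimestampMicros;
        }

        public static void main(String[] args) {
            RowTombstoneIndex index = new RowTombstoneIndex();
            ByteBuffer key = ByteBuffer.wrap("task-42".getBytes(StandardCharsets.UTF_8));
            index.addTombstone(key, 2_000L);
            System.out.println(index.shadows(key, 1_000L)); // true: older data is shadowed
            System.out.println(index.shadows(key, 3_000L)); // false: newer write survives
        }
    }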

On Thu, Feb 15, 2018 at 11:24 AM, Jeff Jirsa  wrote:

> Worth a JIRA, yes
>
>
> On Wed, Feb 14, 2018 at 9:45 AM, Carl Mueller <
> carl.muel...@smartthings.com>
> wrote:
>
> > So is this at least a decent candidate for a feature request ticket?
> >
> >
> > On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller <
> > carl.muel...@smartthings.com>
> > wrote:
> >
> > > I'm particularly interested in getting the tombstones to "promote" up
> > > the levels of LCS more quickly. Currently they get attached at the low
> > > level and don't propagate up to higher levels until enough activity at
> > > a lower level promotes the data. Meanwhile, LCS means compactions can
> > > occur in parallel at each level. So row tombstones in their own sstable
> > > could be promoted up the LCS levels preferentially, before normal
> > > processes would move them up.
> > >
> > > So if the delete-only sstables could move up more quickly, the
> > > compaction at the levels would happen more quickly.
> > >
> > > The threshold stuff is nice if I read 7019 correctly, but what is the %
> > > there? % of rows? % of columns? Or % of the size of the sstable? Row
> > > tombstones are pretty compact, being just the rowkey and the tombstone
> > > marker. So if 7019 is triggered at 10% of the sstable size, even a
> > > crapton of tombstones deleting practically the entire database would
> > > only be a small % of the sstable's size.
> > >
> > > Since the row tombstones are so compact, that's why I think they are
> > > good candidates for special handling.
> > >
> > > On Tue, Feb 13, 2018 at 5:22 PM, J. D. Jordan <
> jeremiah.jor...@gmail.com
> > >
> > > wrote:
> > >
> > >> Have you taken a look at the new stuff introduced by
> > >> https://issues.apache.org/jira/browse/CASSANDRA-7019 ? I think it may
> > >> go a ways toward reducing the need for something complicated like this.
> > >> Though it is an interesting idea as special handling for bulk deletes.
> > >> If they were truly just sstables that only contained deletes, the logic
> > >> from 7019 would probably go a long way. Though if you are bulk inserting
> > >> deletes, that is what you would end up with, so maybe it already works.
> > >>
> > >> -Jeremiah
> > >>
> > >> > On Feb 13, 2018, at 6:04 PM, Jeff Jirsa  wrote:
> > >> >
> > >> > On Tue, Feb 13, 2018 at 2:38 PM, Carl Mueller <
> > >> carl.muel...@smartthings.com>
> > >> > wrote:
> > >> >
> > >> >> In process of doing my second major data purge from a cassandra
> > >> >> system.
> > >> >>
> > >> >> Almost all of my purging is done via row tombstones. While performing
> > >> >> this the second time, while trying to cajole compaction (in 2.1.x,
> > >> >> LevelledCompaction) to goddamn actually compact the data, I've been
> > >> >> thinking as to why there isn't a separate set of sstable
> > >> >> infrastructure set up for row deletion tombstones.
> > >> >>
> > >> >> I'm imagining that row tombstones are written to separate sstables
> > >> >> than mainline data updates/appends and range/column tombstones.
> > >> >>
> > >> >> By writing them to separate sstables, the compaction systems can
> > >> >> preferentially merge / process them when compacting sstables.
> > >> >>
> > >> >> This would create an additional sstable for lookup in the bloom
> > >> >> filters, granted. I had visions of short-circuiting the lookups to
> > >> >> other sstables if a row tombstone was present in one of the special
> > >> >> row tombstone sstables.
> > >> >>
> > >> >>
> > >> > All of the above sounds really interesting to me, but I suspect it's
> > >> > a LOT of work to make it happen correctly.
> > >> >
> > >> > You'd almost end up with 2 sets of logs for the LSM - a tombstone
> > >> > log/generation and a data log/generation, and the tombstone logs
> > >> > would be read-only inputs to data compactions.
> > >> >
> > >> >
> > >> >> But that would only be possible if there was the notion of a "super
> > >> >> row tombstone" that permanently deleted a rowkey, and all future
> > >> >> writes would be invalidated. Kind of like how a tombstone with a
> > >> >> mistakenly huge timestamp becomes a sneaky permanent tombstone, but

Re: Release votes

2018-02-16 Thread Ariel Weisberg
Hi,

I created https://issues.apache.org/jira/browse/CASSANDRA-14241 for this issue. 
You are right there is a solid chunk of failing tests on Apache infrastructure 
that don't fail on CircleCI. I'll find someone to get it done.

I think that fix-before-commit is only going to happen if we go all the way and
route every single commit through testing infrastructure that runs all the
tests multiple times and refuses to merge commits unless the tests pass
somewhat consistently. Short of that, flaky (and hard-failing) tests are going
to keep creeping in (and even then they will to some extent). That's not
feasible without much better infrastructure available to everyone, and it's
not a short-term thing right now, I think. I mean, maybe we move forward with
it on the Apache infrastructure we have.

I'm not sure flaky infrastructure is what is acutely hurting us, although we
do have infrastructure that exposes unreliable tests; maybe that's just a
matter of framing.

Dealing with flaky tests generally devolves into picking victim(s) via some
process. Blocking releases on failing tests picks the people who want the next
release as victims. Blocking commits on flaky tests makes the people who want
to merge stuff the victims. Doing nothing makes victims of some random subset
of volunteers who fix the tests, of all developers who run the tests, and, to
a certain extent, of end users. Excluding tests and re-running tests multiple
times picks the end users of releases as the victims.

RE multi-pronged: we are currently using a flaky annotation that reruns tests,
we have skipped tests with JIRAs, and we are re-running tests right now if
they fail for certain classes of reasons. So we are already down that road. I
think it's fine, but we need a backpressure mechanism because we can't keep
accruing this kind of thing forever.
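
For readers unfamiliar with that pattern, a rerun-on-failure hook can be as
small as the JUnit 4 rule sketched below; this is an illustration of the
general technique, not the project's actual annotation.

    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    // Minimal "retry flaky tests" rule: run the test up to maxAttempts times and
    // only fail if every attempt fails.
    // Usage: @Rule public RetryRule retry = new RetryRule(3);
    public class RetryRule implements TestRule {
        private final int maxAttempts;

        public RetryRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate();
                            return; // passed on this attempt
                        } catch (Throwable t) {
                            last = t;
                            System.err.println(description + " failed attempt " + attempt);
                        }
                    }
                    throw last; // every attempt failed
                }
            };
        }
    }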

In my mind, processes for keeping the tests passing need to provide two
functions: picking victim(s) (task management) and creating backpressure
(slowing new development to match the defect rate). It seems possible to
create backpressure by blocking releases, but that fails to pick victims to an
extent. Many people running C* are so far behind they aren't waiting on that
next release, or they are accustomed to running a private fork and
backporting. When we were able to block commits via informal process I think
it helped, but an informal process has limitations.

I think blocking commits via automation is going to spread the load out most
evenly and make it a priority for everyone in the contributor base. We have 16
Apache nodes to work with, which I think would handle our current commit load.
We can fine-tune the criteria for blocking commits as we go.

I don't have an answer for how we backpressure the use of flaky annotations
and re-running tests. Maybe it's a czar saying no commits until we reach some
goal, done on a period (every 3 months). Maybe we vote on it periodically.
Czars can be really effective in moving the herd, but the czar does need to be
able to wield something to motivate some set of contributors to do the work.
It's not so much about preventing the commits as it is signaling unambiguously
that this is what we are working on now, and if you aren't, you are working on
the wrong thing. It ends up being quite depressing, though, when you end up
working through significant amounts of tech debt all at once. It hurts less
when you have a lot of people working on it.

Ariel

On Thu, Feb 15, 2018, at 6:48 PM, kurt greaves wrote:
> It seems there has been a bit of a slip in testing as of recently, mostly
> due to the fact that there's no canonical testing environment that isn't
> flaky. We probably need to come up with some ideas and a plan on how we're
> going to do testing in the future, and how we're going to make testing
> accessible for all contributors. I think this is the only way we're really
> going to change behaviour. Having an incredibly tedious process and then
> being aggressive about it only leads to resentment and workarounds.
> 
> I'm completely unsure of where dtests are at since the conversion to
> pytest, and there's a lot of failing dtests on the ASF jenkins jobs (which
> appear to be running pytest). As there's currently not a lot of visibility
> into what people are doing with CircleCI for this it's hard to say if
> things are better over there. I'd like to help here if anyone wants to fill
> me in.
> 

[RELEASE] Apache Cassandra 2.2.12 released - PLEASE READ NOTICE

2018-02-16 Thread Michael Shuler
PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092)
--

The maximum expiration timestamp that can be represented by the storage
engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTL
that expire after this date are not currently supported. By default,
INSERTS with TTL exceeding the maximum supported date are rejected, but
it's possible to choose a different expiration overflow policy. See
CASSANDRA-14092.txt for more details.

Prior to 3.0.16 (3.0.x) and 3.11.2 (3.11.x) there was no protection
against INSERTS with TTL expiring after the maximum supported date,
causing the expiration time field to overflow and the records to expire
immediately. Clusters in the 2.X and lower series are not subject to
this bug when assertions are enabled. Backed-up SSTables can potentially
be recovered, and recovery instructions can be found in the
CASSANDRA-14092.txt file.

If you use or plan to use very large TTLs (10 to 20 years), read
CASSANDRA-14092.txt for more information.
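
(As a quick illustration of where that date comes from: the storage engine
keeps the local deletion time as a signed 32-bit count of seconds since the
epoch, and the very last representable second appears to be reserved, hence
:06 rather than :07.)

    import java.time.Instant;

    // Quick check of the 32-bit local-deletion-time limit described above.
    public class MaxTtlDate {
        public static void main(String[] args) {
            System.out.println(Instant.ofEpochSecond(Integer.MAX_VALUE));      // 2038-01-19T03:14:07Z
            System.out.println(Instant.ofEpochSecond(Integer.MAX_VALUE - 1L)); // 2038-01-19T03:14:06Z
        }
    }
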
--

The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.2.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/a6L1TE
[2]: (NEWS.txt) https://goo.gl/M9jhdZ
[3]: https://issues.apache.org/jira/browse/CASSANDRA





[RELEASE] Apache Cassandra 2.1.20 released - PLEASE READ NOTICE

2018-02-16 Thread Michael Shuler
PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092)
--

The maximum expiration timestamp that can be represented by the storage
engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTL
that expire after this date are not currently supported. By default,
INSERTS with TTL exceeding the maximum supported date are rejected, but
it's possible to choose a different expiration overflow policy. See
CASSANDRA-14092.txt for more details.

Prior to 3.0.16 (3.0.x) and 3.11.2 (3.11.x) there was no protection
against INSERTS with TTL expiring after the maximum supported date,
causing the expiration time field to overflow and the records to expire
immediately. Clusters in the 2.X and lower series are not subject to
this bug when assertions are enabled. Backed-up SSTables can potentially
be recovered, and recovery instructions can be found in the
CASSANDRA-14092.txt file.

If you use or plan to use very large TTLs (10 to 20 years), read
CASSANDRA-14092.txt for more information.
--

The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.1.20.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/5M7w4X
[2]: (NEWS.txt) https://goo.gl/Kd2kF3
[3]: https://issues.apache.org/jira/browse/CASSANDRA





[VOTE PASSED] Release Apache Cassandra 2.2.12

2018-02-16 Thread Michael Shuler
With 10 binding +1, 1 non-binding +1, and no other votes, this vote for
2.2.12 passes. I'll upload the artifacts today.

-- 
Kind regards,
Michael

On 02/12/2018 02:30 PM, Michael Shuler wrote:
> I propose the following artifacts for release as 2.2.12.
> 
> sha1: 1602e606348959aead18531cb8027afb15f276e7
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.12-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1153/org/apache/cassandra/apache-cassandra/2.2.12/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1153/
> 
> Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
> 
> *** This release addresses an important fix for CASSANDRA-14092 ***
> "Max ttl of 20 years will overflow localDeletionTime"
> https://issues.apache.org/jira/browse/CASSANDRA-14092
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/QkJeXH
> [2]: (NEWS.txt) https://goo.gl/A4iKFb
> 





[VOTE PASSED] Release Apache Cassandra 2.1.20

2018-02-16 Thread Michael Shuler
With 9 binding +1 and no other votes, the 2.1.20 release passes. I will
get the artifacts uploaded today.

-- 
Kind regards,
Michael

On 02/12/2018 02:30 PM, Michael Shuler wrote:
> I propose the following artifacts for release as 2.1.20.
> 
> sha1: b2949439ec62077128103540e42570238520f4ee
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.20-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1152/org/apache/cassandra/apache-cassandra/2.1.20/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1152/
> 
> Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
> 
> *** This release addresses an important fix for CASSANDRA-14092 ***
> "Max ttl of 20 years will overflow localDeletionTime"
> https://issues.apache.org/jira/browse/CASSANDRA-14092
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/5i2nw9
> [2]: (NEWS.txt) https://goo.gl/i9Fg2u
> 





Re: Release votes

2018-02-16 Thread Jason Brown
Hi,

I'm ecstatic others are now running the tests and, more importantly, that
we're having the conversation.

I've become convinced we cannot always have 100% green tests. I am reminded
of this [1] blog post from Google when thinking about flaky tests.
The TL;DR is "flakiness happens", to the tune of about 1.5% of all tests
across Google.

I am in no way advocating that we simply turn a blind eye to broken or
flaky tests, or shrug our shoulders and rubber-stamp a vote, but rather
that we accept it when it reasonably applies. To achieve this, we might
need to have a discussion at vote/release time (if not sooner) to triage
flaky tests, but I see that as a good thing.

Thanks,

-Jason

[1]
https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html




Re: [VOTE] (Take 2) Release Apache Cassandra 3.0.16

2018-02-16 Thread Tommy Stendahl

+1


On 2018-02-14 21:40, Michael Shuler wrote:

I propose the following artifacts for release as 3.0.16.

sha1: 890f319142ddd3cf2692ff45ff28e71001365e96
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.16-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1157/org/apache/cassandra/apache-cassandra/3.0.16/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1157/

Debian and RPM packages are available here:
http://people.apache.org/~mshuler

*** This release addresses an important fix for CASSANDRA-14092 ***
 "Max ttl of 20 years will overflow localDeletionTime"
 https://issues.apache.org/jira/browse/CASSANDRA-14092

The vote will be open for 72 hours (longer if needed).

[1]: (CHANGES.txt) https://goo.gl/rLj59Z
[2]: (NEWS.txt) https://goo.gl/EkrT4G




Re: [VOTE] (Take 3) Release Apache Cassandra 3.11.2

2018-02-16 Thread Tommy Stendahl

+1


On 2018-02-14 22:09, Michael Shuler wrote:

I propose the following artifacts for release as 3.11.2.

sha1: 1d506f9d09c880ff2b2693e3e27fa58c02ecf398
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.2-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1158/org/apache/cassandra/apache-cassandra/3.11.2/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1158/

Debian and RPM packages are available here:
http://people.apache.org/~mshuler

*** This release addresses an important fix for CASSANDRA-14092 ***
 "Max ttl of 20 years will overflow localDeletionTime"
 https://issues.apache.org/jira/browse/CASSANDRA-14092

The vote will be open for 72 hours (longer if needed).

[1]: (CHANGES.txt) https://goo.gl/RLZLrR
[2]: (NEWS.txt) https://goo.gl/kpnVHp




Re: Release votes

2018-02-16 Thread Dinesh Joshi
I'm new to this project, and here are my two cents.

If there are tests that are constantly failing or flaky and you have had
releases despite their failures, then they're not useful and can be disabled.
They can always be re-enabled if they are in fact valuable. Having a 100% blue
dashboard is not unrealistic, IMHO. Hardware failures are harder, but they can
be addressed too.

I could pitch in to fix the noisy tests or just help in other ways to get the
dashboard to blue.

Dinesh
On Thursday, February 15, 2018, 1:14:33 PM PST, Josh McKenzie 
 wrote: 
 >
> We’ve said in the past that we don’t release without green tests. The PMC
> gets to vote and enforce it. If you don’t vote yes without seeing the test
> results, that enforces it.

I think this is noble and ideal in theory. In practice, the tests take long
enough, hardware infra has proven flaky enough, and the tests *themselves* are
flaky enough, that there's been a consistent low level of test-failure noise
that makes separating signal from noise in this context very time consuming.
Reference 3.11-test-all, for example, re: noise:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-3.11-test-all/test/?width=1024=768

Having spearheaded burning test failures down to 0 multiple times and then
watched them regress over time, my gut intuition is that we should have one
person as our Source of Truth, with a less-flaky source for release-vetting
CI (dedicated hardware, Circle account, etc.) we can use as a reference to
vote on release SHAs.

We’ve declared this a requirement multiple times

Declaring things != changed behavior, and thus != changed culture. The
culture on this project is one of having a constant low level of test
failure noise in our CI as a product of our working processes. Unless we
change those (actually block release w/out green board, actually
aggressively block merge w/any failing tests, aggressively retroactively
track down test failures on a daily basis and RCA), the situation won't
improve. Given that this is a volunteer organization / project, that kind
of daily time investment is a big ask.

On Thu, Feb 15, 2018 at 1:10 PM, Jeff Jirsa  wrote:

> Moving this to its own thread:
>
> We’ve declared this a requirement multiple times and then we occasionally
> get a critical issue and have to decide whether it’s worth the delay. I
> assume Jason’s earlier -1 on attempt 1 was an enforcement of that earlier
> stated goal.
>
> It’s up to the PMC. We’ve said in the past that we don’t release without
> green tests. The PMC gets to vote and enforce it. If you don’t vote yes
> without seeing the test results, that enforces it.
>
> --
> Jeff Jirsa
>
>
> > On Feb 15, 2018, at 9:49 AM, Josh McKenzie  wrote:
> >
> > What would it take for us to get green utest/dtests as a blocking part of
> > the release process? i.e. "for any given SHA, here's a link to the tests
> > that passed" in the release vote email?
> >
> > That being said, +1.
> >
> >> On Wed, Feb 14, 2018 at 4:33 PM, Nate McCall 
> wrote:
> >>
> >> +1
> >>