Re: March 2015 QA retrospective

2015-04-13 Thread Aleksey Yeschenko
CASSANDRA-8285 https://issues.apache.org/jira/browse/CASSANDRA-8285
Aleksey Yeschenko: Move all hints related tasks to hints private executor.
Revisit reason: Pierre's reproducer represents something we weren't doing,
but that users are. Is that now being tested?

That particular issue will not happen again. That class of issues can only
be tested by a sufficiently long-running stress test, with plenty of
chaos-monkeying thrown in. That's essentially how it got caught - by driver
duration tests with chaos-monkeying. It's still being exercised by the
driver tests, and we do now run them prior to releasing stuff, so I'd say
yes - it's being tested.
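For reference, a crude sketch of such a duration test, using ccm and the
2.1-era cassandra-stress (a sketch only; exact flags may need adjusting):

    # long-running stress with a chaos-monkey loop bouncing random nodes
    ccm create chaos -v 2.1.4 -n 3 && ccm start
    cassandra-stress write n=100000000 -node 127.0.0.1 &
    STRESS_PID=$!
    while kill -0 $STRESS_PID 2>/dev/null; do
        NODE="node$(( (RANDOM % 3) + 1 ))"
        ccm $NODE stop
        sleep $(( (RANDOM % 60) + 30 ))    # leave the node down a while
        ccm $NODE start
        sleep $(( (RANDOM % 120) + 60 ))   # let hints/recovery proceed
    done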

Once we have our own framework, we'll migrate it there, from driver tests.

CASSANDRA-8462 https://issues.apache.org/jira/browse/CASSANDRA-8462
Aleksey Yeschenko: Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes.
Revisit reason: Have additional dtest coverage, need to do this in kitchen
sink tests.

dtest coverage would not really help here. It's a tricky race condition
that can only be triggered during upgrade. You'd need some kind of test
that repeatedly upgrades the nodes in the cluster, again and again, to
catch this; otherwise it'd be flaky at best.
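Sketched with ccm, such a loop might look like the following (treat the
per-node "setdir -v" invocation as an assumption about ccm's CLI):

    # rolling-upgrade loop: upgrade one node at a time and check that the
    # remaining 2.0 nodes still agree on schema after every step
    ccm create upgrade-loop -v 2.0.12 -n 3 -s
    cassandra-stress write n=1000000 -node 127.0.0.1
    for NODE in node1 node2 node3; do
        ccm $NODE stop
        ccm $NODE setdir -v 2.1.4          # assumed ccm subcommand
        ccm $NODE start
        ccm $NODE nodetool upgradesstables
        # schema disagreement here is exactly the CFMetaData break
        ccm node1 nodetool describecluster
    done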

The proper fix for that issue is the new versioned schema exchange
protocol - https://issues.apache.org/jira/browse/CASSANDRA-6038 is the
ticket. That one will come with a metric ton of tests.


On Thu, Apr 9, 2015 at 11:45 AM, Ariel Weisberg ariel.weisb...@datastax.com
 wrote:

 Repeated with sort (Key / Assignee / Summary / Revisit reason):

 CASSANDRA-8285 https://issues.apache.org/jira/browse/CASSANDRA-8285
 Aleksey Yeschenko: Move all hints related tasks to hints private executor.
 Revisit reason: Pierre's reproducer represents something we weren't doing,
 but that users are. Is that now being tested?

 CASSANDRA-8462 https://issues.apache.org/jira/browse/CASSANDRA-8462
 Aleksey Yeschenko: Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes.
 Revisit reason: Have additional dtest coverage, need to do this in kitchen
 sink tests.

 CASSANDRA-8640 https://issues.apache.org/jira/browse/CASSANDRA-8640
 Anthony Cozzie: Paxos requires all nodes for CAS.
 Revisit reason: If Paxos is not supposed to require all nodes for CAS, we
 should be able to fail nodes, or a certain number of nodes, and still
 continue to CAS (test availability of CAS under failure conditions). No
 regression test.

 CASSANDRA-8677 https://issues.apache.org/jira/browse/CASSANDRA-8677
 Ariel Weisberg: rpc_interface and listen_interface generate NPE on startup
 when the specified interface doesn't exist.
 Revisit reason: Missing unit tests checking error messages for
 DatabaseDescriptor.

 CASSANDRA-8577 https://issues.apache.org/jira/browse/CASSANDRA-8577
 Artem Aliev: Values of set types not loading correctly into Pig.
 Revisit reason: Full set of interactions with Pig not validated.

 CASSANDRA-7704 https://issues.apache.org/jira/browse/CASSANDRA-7704
 Benedict: FileNotFoundException during STREAM-OUT triggers 100% CPU usage.
 Revisit reason: Streaming testing didn't reproduce this before release.

 CASSANDRA-8383 https://issues.apache.org/jira/browse/CASSANDRA-8383
 Benedict: Memtable flush may expire records from the commit log that are in
 a later memtable.
 Revisit reason: No regression test, no follow-up ticket. Could/should this
 have been reproducible as an actual bug?

 CASSANDRA-8429 https://issues.apache.org/jira/browse/CASSANDRA-8429
 Benedict: Some keys unreadable during compaction.
 Revisit reason: Running stress in CI would have caught this, and we're
 going to do that.

 CASSANDRA-8459 https://issues.apache.org/jira/browse/CASSANDRA-8459
 Benedict: autocompaction on reads can prevent memtable space reclamation.
 Revisit reason: What would have reproduced this before release?

 CASSANDRA-8499 https://issues.apache.org/jira/browse/CASSANDRA-8499
 Benedict: Ensure SSTableWriter cleans up properly after failure.
 Revisit reason: Testing error paths? Any way to test things in a loop to
 detect leaks?

 CASSANDRA-8513 https://issues.apache.org/jira/browse/CASSANDRA-8513
 Benedict: SSTableScanner may not acquire reference, but will still release
 it when closed.
 Revisit reason: This had a user-visible component; what test could have
 caught it before release?

 CASSANDRA-8619 https://issues.apache.org/jira/browse/CASSANDRA-8619
 Benedict: using CQLSSTableWriter gives ConcurrentModificationException.
 Revisit reason: What kind of test would have caught this before release?

 CASSANDRA-8632 https://issues.apache.org/jira/browse/CASSANDRA-8632
 Benedict: cassandra-stress only generating a single unique row.
 Revisit reason: We rely on stress for performance testing; that might mean
 it needs real testing that demonstrates it generates load that looks like
 the load it is supposed to be generating.

 CASSANDRA-8668 https://issues.apache.org/jira/browse/CASSANDRA-8668
 Benedict: We don't enforce offheap memory constraints; regression
 introduced by 7882.
 Revisit reason: Memory constraints were a supported feature/UI, but not
 completely tested before release. Could this have been found most
 effectively by a unit test or a blackbox test?

 CASSANDRA-8719
 

Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Łukasz Dywicki
Hey Benedict,
My replies inline


 According to some recordings from DataStax there is a plan to support in
 Cassandra multiple kinds of store - document, graph - so it won't get
 easier with time but rather harder. Ask yourself: do you really want to
 mix all these things together?
 Well, these certainly won't live in the same repository, so I wouldn't
 worry about that
That's good; it will force separation. If you do that, please consider using
another build system so you don't repeat the mistakes which are present now
in the main Cassandra build.

 As I briefly counted in my earlier mail, there were 116 issues related to
 artifacts published by the build process.
 That does sound like a lot of bugs. How many actual maintenance releases
 were necessary, did you happen to also count? This is something that could
 be raised at the new retrospective that Ariel has begun, to see if there's
 anything that can be done to reduce their incidence and risk.
There have been 159 minor releases of Cassandra (git tag --list | egrep
'rc|beta' | wc -l). I did not track the exact correlation with the bug ratio;
116 vs 159 are just numbers. From my understanding there are 116 unnecessary
issues which could have been avoided. You can read these numbers in two
different ways - every second minor release was fixing maven artifacts, OR
every second release was broken due to the maven artifacts. You seem to
prefer the first reading, while users usually observe the second.


 however, it gives a real boost when it comes to community contributions,
 tool development, or even debugging
 You're conflating the task of upgrading the build system with
 modularisation, which is a bad idea if you want to make progress on either
 one, since they're each a different and difficult discussion, even if they
 relate.
I do that because this is a typical chicken-and-egg problem. One thing cannot
be done without the other; it's just a question of which one comes first.
Code modularization/package separation without strict boundaries is hard to
maintain. However, nothing prevents doing this in reverse - by solving the
code issues first and then introducing a new build tool. It's up to the
Cassandra developers to decide.

 On the topic of the build system: if you can justify why you think Maven
 has a significant chance of reducing our bug burden here, a case can
 perhaps be made, and I will defer to the members of this list with more
 experience of our build system for that in depth discussion. At the moment,
 it seems to be taken as a given this would occur, but I don't yet see a
 clear reason that we should expect this to occur.
You see - I don't have to justify Maven; I have offered you help with it. I
also gave you a couple of reasons why Ant is not a first-class tool these
days. I don't feel responsible for advocating for Maven itself. It's up to
you what you choose. The major thing, the major problem that modern tools
solve for you, is build-time classpath management (both compile and test)
and separate javac executions for each of these. Take whatever you prefer -
Gradle, sbt, Leiningen - anything which does what the previous sentence
describes. Do your own evaluation. Take what works for you, not only for me.
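To make that concrete, the separation amounts to something like the following
sketch (plain javac; the directory names are hypothetical):

    # main sources compile only against declared compile-scope jars
    javac -cp "lib/compile/*" -d build/classes/main \
        $(find src/java -name '*.java')
    # test sources see the main classes plus test-scope jars, but not the
    # other way around, so accidental dependencies fail fast
    javac -cp "build/classes/main:lib/compile/*:lib/test/*" \
        -d build/classes/test $(find test/unit -name '*.java')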

 On the topic of modularisation: Like I said previously, everyone on this
 list is sympathetic to that goal, I think. However the practical reality is
 likely to be too confounding. But that doesn't mean it is absolutely a
 losing battle, if you can demonstrate a sufficiently painless and
 worthwhile transition.
I don't quite get you at this point. On one hand you suppose everyone is for
taking such a step; on the other you ask for proof. In the case of code
relocation there are always multiple ways. The current state forces you to
solve multiple problems, and you can start on any of them (e.g. the circular
dependencies I mentioned in an earlier conversation don't require changing
the tool). From where you stand at this moment there will be no such thing
as a painless transition. As said earlier - it will only get harder over time.
An example from my own experience: we use Cassandra. We have plenty of
mid-level integration tests which verify end-to-end functionality, starting
from the frontend or messaging layer down to data persistence. Each of our
tests, even if it involves a small amount of data, hits I/O on multiple
levels - starting from the socket and ending on disk. We do not test
consistency levels in such cases, as that is assumed to be tested by
Cassandra itself - we are ensuring that incoming data passes the storage
interface and can be retrieved back via the same interface. With what
Cassandra is now, we cannot make our tests run fast. People are prisoners of
cassandra-unit because embedding Cassandra is impossible, even though it's
written in a portable language. It has too many inner and outer dependencies.
On the other hand we have, for example, ActiveMQ, which has lots of options.
Even with all of these it might be embedded

Re: 3.0 and the Cassandra release process

2015-04-13 Thread Jonathan Ellis
On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis jbel...@gmail.com wrote:


 I’m optimistic that as we improve our process this way, our even releases
 will become increasingly stable.  If so, we can skip sub-minor releases
 (3.2.x) entirely, and focus on keeping the release train moving.  In the
 meantime, we will continue delivering 2.1.x stability releases.


The weak point of this plan is the transition from the big-release
development methodology culminating in 3.0 to the monthly tick-tock
releases.  Since 3.0 needs to go through a beta/release-candidate phase,
during which we're going to be serious about not adding new features, 3.1
will come with multiple months' worth of features, so right off the bat
we're starting at a disadvantage from a stability standpoint.

Recognizing that it will take several months for the tick-tock releases to
stabilize, I would like to ship 3.0.x stability releases concurrently with
3.y tick-tock releases.  This should stabilize 3.0.x faster than tick-tock,
while at the same time hedging our bets such that if we assess tick-tock in
six months and decide it's not delivering on its goals, we're not six
months behind in having a usable set of features that we shipped in 3.0.

So, to summarize:

- New features will *only* go into tick-tock releases.
- Bug fixes will go into tick-tock releases and a 3.0.x branch, which will
be maintained for at least a year

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Impact of removing compactions_in_progress folder

2015-04-13 Thread Anuj Wadehra
We often face errors on Cassandra start regarding unfinished compactions,
particularly when Cassandra was abruptly shut down. The problem gets resolved
when we delete the /var/lib/cassandra/data/system/compactions_in_progress
folder. Does deletion of that folder have any impact on the integrity of
data, or any other aspect?
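For clarity, the workaround we apply is roughly the following (paths assume
a default package install - verify before deleting anything):

    # only the system keyspace's compactions_in_progress data is removed,
    # and only while the node is stopped
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/system/compactions_in_progress
    sudo service cassandra start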



Thanks

Anuj Wadehra



Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Anuj Wadehra
Recently we faced an issue where every repair operation caused the addition
of hundreds of sstables (CASSANDRA-9146). In order to bring the situation
under control and make sure reads were not impacted, we were left with no
option but to run major compaction to ensure that thousands of tiny sstables
got compacted.


Queries:
Does major compaction have any drawbacks now that automatic tombstone
compaction is implemented (since 1.2, via the tombstone_threshold
sub-property, CASSANDRA-3442)?
I understand that the huge SSTable created after major compaction won't be
compacted with new data any time soon, but is that a problem if purged data
is removed via automatic tombstone compaction? If major compaction results
in a huge file, say 500GB, what are its drawbacks?

If one big sstable is a problem, is there any way of solving it? We tried
running sstablesplit after major compaction to split the big sstable, but as
the new sstables were all the same size, they were compacted back into a
single huge table once Cassandra was started after executing sstablesplit.
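For clarity, the sequence we ran was roughly the following (keyspace, table
and split size are examples):

    # major compaction, then an offline split of the resulting sstable
    nodetool compact mykeyspace mytable
    nodetool drain && sudo service cassandra stop
    sstablesplit --size 50 /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db
    sudo service cassandra start
    # the equally-sized outputs then met STCS's min_threshold and were
    # compacted straight back into one big sstable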



Thanks

Anuj Wadehra



Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Anuj Wadehra
I haven't got much response regarding this on the user list, so posting it
on the dev list too.


Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From:Anuj Wadehra anujw_2...@yahoo.co.in
Date:Tue, 14 Apr, 2015 at 7:05 am
Subject:Drawbacks of Major Compaction now that Automatic Tombstone Compaction 
Exists

Recently we faced an issue where every repair operation caused the addition
of hundreds of sstables (CASSANDRA-9146). In order to bring the situation
under control and make sure reads were not impacted, we were left with no
option but to run major compaction to ensure that thousands of tiny sstables
got compacted.


Queries:
Does major compaction have any drawbacks now that automatic tombstone
compaction is implemented (since 1.2, via the tombstone_threshold
sub-property, CASSANDRA-3442)?
I understand that the huge SSTable created after major compaction won't be
compacted with new data any time soon, but is that a problem if purged data
is removed via automatic tombstone compaction? If major compaction results
in a huge file, say 500GB, what are its drawbacks?

If one big sstable is a problem, is there any way of solving it? We tried
running sstablesplit after major compaction to split the big sstable, but as
the new sstables were all the same size, they were compacted back into a
single huge table once Cassandra was started after executing sstablesplit.



Thanks

Anuj Wadehra



Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Benedict Elliott Smith

 every second minor release was fixing maven artifacts OR every second
 release was broken due the maven artifacts


Well, it's also possible that just one release had 116 build artefact
problems? Obviously that's the absurd extreme end, but that's why I was
asking whether you had any idea, since you'd done the counting.

 I don’t feel myself responsible for doing any advocating for Maven itself.
 It’s up to you what you choose.


This is a community process, and I'm trying (and apparently failing) to
help you understand at least how *I* understand it to work, and the
problems I see with what you're proposing. The silence on the list suggests
there is significant inertia and no other strong advocates for this change.
This could be for myriad reasons, from people simply not caring, to
thinking there are roughly equal pros and cons, to also just hoping the
conversation will go away because they're against it. Without advocacy, the
inertia is not overcome, and since you're the only person so far to express
a desire for this change, it is unfortunately up to you to convince us. I
am, and I'm sure the rest of the community is, very appreciative of the
offer of your time. We really are. Unfortunately that isn't enough by itself
to warrant utilising it, but we *are* open to discussion and advocacy on the
topic.

The crux of the problem is that Cassandra has a lot of important work being
done to it, work that I personally perceive (and suspect others do also) as
more important than the admitted inadequacy of our modularisation and,
perhaps, our build system (I plead ignorance here). This work currently
exceeds the labour we have available to address it. If this upheaval hinders
that work, that is bad, and that is what I mean by 'warrants': is the
upheaval small enough, or the yield great enough (modularisation doesn't
always pan out, so we may not even get a good result, yet still bear the
significant pain)?

I don't want to give you the impression I am either a gatekeeper or
shooting down your proposal. I'm just attempting to explain my perception
of the view of the existing contributors.


On Mon, Apr 13, 2015 at 9:31 PM, Łukasz Dywicki l...@code-house.org wrote:

 Hey Benedict,
 My replies inline


  According to some recordings from DataStax there is a plan to support in
  Cassandra multiple kinds of store - document, graph - so it won't get
  easier with time but rather harder. Ask yourself: do you really want to
  mix all these things together?
  Well, these certainly won't live in the same repository, so I wouldn't
  worry about that
 That's good; it will force separation. If you do that, please consider
 using another build system so you don't repeat the mistakes which are
 present now in the main Cassandra build.

  As I briefly counted in my earlier mail, there were 116 issues related to
  artifacts published by the build process.
  That does sound like a lot of bugs. How many actual maintenance releases
  were necessary, did you happen to also count? This is something that
  could be raised at the new retrospective that Ariel has begun, to see if
  there's anything that can be done to reduce their incidence and risk.
 There have been 159 minor releases of Cassandra (git tag --list | egrep
 'rc|beta' | wc -l). I did not track the exact correlation with the bug
 ratio; 116 vs 159 are just numbers. From my understanding there are 116
 unnecessary issues which could have been avoided. You can read these
 numbers in two different ways - every second minor release was fixing
 maven artifacts, OR every second release was broken due to the maven
 artifacts. You seem to prefer the first reading, while users usually
 observe the second.

  however, it gives a real boost when it comes to community contributions,
  tool development, or even debugging
  You're conflating the task of upgrading the build system with
  modularisation, which is a bad idea if you want to make progress on
  either one, since they're each a different and difficult discussion,
  even if they relate.
 I do that because this is a typical chicken-and-egg problem. One thing
 cannot be done without the other; it's just a question of which one comes
 first. Code modularization/package separation without strict boundaries is
 hard to maintain. However, nothing prevents doing this in reverse - by
 solving the code issues first and then introducing a new build tool. It's
 up to the Cassandra developers to decide.

  On the topic of the build system: if you can justify why you think Maven
  has a significant chance of reducing our bug burden here, a case can
  perhaps be made, and I will defer to the members of this list with more
  experience of our build system for that in depth discussion. At the
  moment, it seems to be taken as a given this would occur, but I don't
  yet see a clear reason that we should expect this to occur.
 You see - I don't have to justify Maven; I have offered you help with it.
 I also gave you a couple of reasons why Ant is not a first-class tool
 these 

Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Benedict Elliott Smith

 According to some recordings from DataStax there is a plan to support in
 Cassandra multiple kinds of store - document, graph - so it won't get
 easier with time but rather harder. Ask yourself: do you really want to
 mix all these things together?


Well, these certainly won't live in the same repository, so I wouldn't
worry about that

 As I briefly counted in my earlier mail, there were 116 issues related to
 artifacts published by the build process.


That does sound like a lot of bugs. How many actual maintenance releases
were necessary, did you happen to also count? This is something that could
be raised at the new retrospective that Ariel has begun, to see if there's
anything that can be done to reduce their incidence and risk.

however, it gives a real boost when it comes to community contributions,
 tool development, or even debugging


You're conflating the task of upgrading the build system with
modularisation, which is a bad idea if you want to make progress on either
one, since they're each a different and difficult discussion, even if they
relate.

On the topic of the build system: if you can justify why you think Maven
has a significant chance of reducing our bug burden here, a case can
perhaps be made, and I will defer to the members of this list with more
experience of our build system for that in depth discussion. At the moment,
it seems to be taken as a given this would occur, but I don't yet see a
clear reason that we should expect this to occur.

On the topic of modularisation: Like I said previously, everyone on this
list is sympathetic to that goal, I think. However the practical reality is
likely to be too confounding. But that doesn't mean it is absolutely a
losing battle, if you can demonstrate a sufficiently painless and
worthwhile transition.


On Sat, Apr 11, 2015 at 11:12 AM, Łukasz Dywicki l...@code-house.org
wrote:

 Sorry for not coming back to the topic for a long time.

 You are right that what the Cassandra project currently has does work, and
 keeping package-scoping discipline in such a big development community as
 Cassandra is clearly impossible without tool support (if you insist on
 keeping Ant, please try to separate javac tasks for the logical parts in
 the current build to verify that). I clearly pointed out that it doesn't
 work reliably, causing troubles with artifacts uploaded to Maven Central.
 As I briefly counted in my earlier mail, there were 116 issues related to
 artifacts published by the build process. That is a lot, and these problems
 require further maintenance releases to fix, for example, one or another
 bytecode-level dependency causing NoClassDefFoundErrors with invalid
 artifacts. According to some recordings from DataStax there is a plan to
 support in Cassandra multiple kinds of store - document, graph - so it
 won't get easier with time but rather harder. Ask yourself: do you really
 want to mix all these things together?

 Starting from 2.x Cassandra supports triggers, but writing even the
 simplest trigger, one that just drops a log message or publishes a UDP
 packet, requires the entire Cassandra codebase and all its dependencies to
 be present during development.
 The fact that everything sits in one big Ant build.xml is caused by the
 trouble Ant itself creates around supporting multiple build modules,
 placeholders and so on, not because it's handy to do it that way.
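 To illustrate: even compiling a trivial trigger means putting the whole
 server and its libraries on the classpath (a sketch; jar names vary by
 version, and the slim API jar below is hypothetical):

     # compiling a minimal ITrigger implementation today: the whole server
     # jar plus every bundled library has to be on the classpath
     javac -cp "apache-cassandra-2.1.4.jar:lib/*" MyTrigger.java
     # what a modular build could offer instead (hypothetical artifact):
     #   javac -cp cassandra-trigger-api.jar MyTrigger.java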

 Modernization of the build and internal dependencies is not something that
 brings a huge benefit at first, since your frontend is now CQL; however, it
 gives a real boost when it comes to community contributions, tool
 development, or even debugging. Sadly, keeping the current Ant build is a
 silent agreement to keep the internal mess and the rickety architecture of
 the project. Ant was already a legacy tool when Cassandra was launched. The
 longer you stay with it, the more trouble it will cause you over time.

 Kind regards,
 Lukasz


  Message from Robert Stupp sn...@snazy.de, written on 2 Apr 2015 at 14:51:
 
  TL;DR - Benedict is right.
 
  IMO Maven is a nice, straight-forward tool if you know what you’re doing
 and start on a _new_ project.
  But Maven easily becomes a pita if you want to do something that’s not
 supported out-of-the-box.
  I bet that Maven would just not work for the C* source tree with all the
 little nice features that C*'s build.xml offers (just look at the scripted
 stuff in build.xml).
 
  Eventually Gradle could be an option; I proposed switching to Gradle
 several months ago. Same story (although Gradle is better than Maven ;) ).
  But… you need to know that build.xml is not just used to build the code
 and artifacts. It is also used in CI, ccm, cstar-perf and some other
 custom systems that exist and just work. So, if we were to exchange Ant
 for something else, it would take a lot of effort to change several tools
 and systems. And there must be a guarantee that everything works like it
 did before.
 
  Regarding IDEs: I'm using IDEA every day and it works like a charm with
 C*. Eclipse is ”supported natively” by 

Re: March 2015 QA retrospective

2015-04-13 Thread Marcus Eriksson
On Fri, Apr 10, 2015 at 8:34 PM, Ariel Weisberg ariel.weisb...@datastax.com
 wrote:

 Hi Marcus,

 CASSANDRA-8211 https://issues.apache.org/jira/browse/CASSANDRA-8211
 Overlapping
 sstables in L1+

 So the question I would ask is: would the workload that reproduces this
 specific bug be interesting in the general sense? Do we have to do anything
 special to reproduce it, like making many non-overlapping sstables in L0,
 and is that interesting in the general sense?


The way I reproduced it was (and I doubt it is interesting in the general
case)
* insert 10M keys with stress
* major compact
* stop node
* sstablesplit
* start node
* switch to LCS
* wait, repeat if no error within 5 minutes

i.e., very specific to the bug
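Roughly, as a ccm script (a sketch; versions, paths and flags are
illustrative):

    # the steps above, scripted; stress's default keyspace1.standard1
    ccm create lcs-repro -v 2.1.2 -n 3 -s
    cassandra-stress write n=10000000 -node 127.0.0.1
    ccm node1 nodetool compact keyspace1 standard1
    ccm node1 stop
    sstablesplit ~/.ccm/lcs-repro/node1/data/keyspace1/standard1-*/*-Data.db
    ccm node1 start
    echo "ALTER TABLE keyspace1.standard1 WITH compaction = {'class': 'LeveledCompactionStrategy'};" | ccm node1 cqlsh
    # watch the log for overlap errors for ~5 minutes; repeat if none
    ccm node1 showlog | grep -i overlap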



 CASSANDRA-8320 https://issues.apache.org/jira/browse/CASSANDRA-8320
  2.1.2: NullPointerException in SSTableWriter

 What tests did we add for this? Is it a targeted regression test, or are we
 now fully exercising SSTableRewriter the way users do against
 representative configurations and data?


Yes, we added unit tests that exercise the code in SSTableRewriter


 CASSANDRA-8432 https://issues.apache.org/jira/browse/CASSANDRA-8432
 Standalone
 Scrubber broken for LCS

 OK great. Looks like I didn't need to ask Jeremiah to look at the offline
 tools.

 CASSANDRA-8386 https://issues.apache.org/jira/browse/CASSANDRA-8386 Make
 sure we release references to sstables after incremental repair
 CASSANDRA-8316 https://issues.apache.org/jira/browse/CASSANDRA-8316 Did
 not get positive replies from all endpoints error on incremental repair
 CASSANDRA-8580 https://issues.apache.org/jira/browse/CASSANDRA-8580
 AssertionErrors
 after activating unchecked_tombstone_compaction with leveled compaction
 CASSANDRA-8458 https://issues.apache.org/jira/browse/CASSANDRA-8458
 Don't
 give out positions in an sstable beyond its first/last tokens

 Can you capture the elements of what we need and put them under
 CASSANDRA-9012? Maybe a ticket for testing repair under load, a ticket for
 switching compaction strategies and what the test scenarios for that would
 be. And when you say configurations, what kind of configurations are you
 thinking of?


will add subtickets to #9012



 CASSANDRA-8525 https://issues.apache.org/jira/browse/CASSANDRA-8525
 Bloom
 Filter truePositive counter not updated on key cache hit

 I think for JMX we only need to test that the access path delivers the
 value. Knowing we had to test for the value was, I think, an issue for the
 original implementer and reviewer. There was a metric exposed by the unit,
 so we needed a test that shows the metric has a correct value.

 CASSANDRA-8532 https://issues.apache.org/jira/browse/CASSANDRA-8532 Fix
 calculation of expected write size during compaction
 CASSANDRA-8562 https://issues.apache.org/jira/browse/CASSANDRA-8562 Fix
 checking available disk space before compaction starts

 Great. Linked 9154 to 9012 (Triage missing tests)

 CASSANDRA-8635 https://issues.apache.org/jira/browse/CASSANDRA-8635 STCS
 cold sstable omission does not handle overwrites without reads

 Can you create a ticket for this then? We should emit this workload pattern
 and then validate afterwards that utilization is as expected.

 Alternatively this could be caught as a performance regression? It's
 starting to seem like some regressions could be caught as performance
 regressions.


Yes, this would definitely be caught in a performance test, if the workload
was correct (i.e., very few reads, many writes).
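With the 2.1 stress tool that workload is straightforward to express, e.g.
(a sketch):

    # populate first, then run a write-heavy mixed workload
    cassandra-stress write n=10000000 -node 127.0.0.1
    cassandra-stress mixed "ratio(write=99,read=1)" n=10000000 -node 127.0.0.1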

/Marcus


Re: March 2015 QA retrospective

2015-04-13 Thread Ariel Weisberg
Hi Benedict,

This only requires unit testing or dtests to be run this way. However for
 the kitchen sink tests this is just another dimension in the configuration
 state space, which IMO should be addressed as a whole methodically. Perhaps
 we should file a central JIRA, or the Google doc you suggested, for
 tracking all of these data points?


I created a doc
https://docs.google.com/a/datastax.com/document/d/1kccPqxEAoYQpT0gXnp20MYQUDmjOrakAeQhf6vkqjGo/edit?usp=sharing
that covers requirements, but not implementation. I want to list the things
we would like it to test in the general sense, as well as enumerate the
specific bugs that it should have been able to catch.

This does raise an interesting, but probably not significant downside to
 the new approach: I fixed this ticket because somebody mentioned to me that
 it was hurting them, and I saw a quick and easy fix. The testing would not
 be quick and easy, so I am unlikely to volunteer to patch quick fixes in
 the new world order. This will certainly lead to higher quality bug fixes,
 but it may lead to fewer of them, and fewer instances of volunteer work to
 help people out, because the overhead eats too much into the work you're
 actually responsible for. This may lead to bug fixing being seen as much
 more of a chore than it already can be. I don't say this to discourage the
 new approach; it is just a thought that occurs to me off the back of this
 specific discussion.


It's a real problem. People doing bug fixes can be stuck spending months
doing nothing but that and writing tests to fill in coverage. Then they get
unhappy and unproductive.

One of the reasons I leave open the option of filing a JIRA, instead of
saying that they have to do something, is that it gives assignees and
reviewers the option to have the work done later or by someone else. The
person who is scheduling releases can see the test issues before release
(you would set fix version for the next release). The work is still not
done, and the release is not done. That puts pressure on the person who
wants to release to make sure it is in someone's queue.

If you are hardcore agile and doing one- or two-week sprints, what happens
is that there are no tickets left in the sprint other than what was agreed
on at the planning meeting, and people will have no choice but to work on
test tasks. How we manage and prioritize tasks right now is magic to me, and
maybe not something that scales down to monthly releases.

For monthly releases, you need to know on at least a weekly basis what
stands between you and the release being done, and you need to have a plan
for who is going to take care of the blockers that crop up.

The testing would not
 be quick and easy, so I am unlikely to volunteer to patch quick fixes in
 the new world order.


I think this gets into how we load balance bug fixes. There is a clear
benefit to routing the bug to the person who will know how to fix and test
it. I have never seen bugs as something you volunteer for. They typically
belong somewhere and if it is with you then so be it.


 because the overhead eats too much into the work you're
 actually responsible for.


We need to make sure that bug fixing isn't seen that way. I think it's
important to make sure bugs find their way home. The work you're actually
responsible for is not done, so you can't claim that bug fixes are eating
into it. It already done been ate.

We shouldn't prioritize new work over past work that was never finished.
With monthly releases and breaking things down into much smaller chunks it
means you have the option to let new work slip to accommodate without
moving tasks between people.

Ariel



On Fri, Apr 10, 2015 at 7:07 PM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 
  CASSANDRA-8459 https://issues.apache.org/jira/browse/CASSANDRA-8459
  autocompaction
  on reads can prevent memtable space reclamation
 
  Can you link a ticket to CASSANDRA-9012 and characterize, in a way we can
  try and implement, how to make sufficiently large partitions over
  sufficiently large periods of time?

 Maybe also enumerate the other permutations where this matters like
  secondary indexes and the access patterns (scans).
 

 Does this really qualify for its own ticket? This should just be one of
 many configurations for stress' part in the new tests. We should perhaps
 have an aggregation ticket where we ensure we enumerate the configuration
 data points we've met that need to be covered. But, IMO at least, a
 methodical exhaustive approach should be undertaken separately, and only be
 corroborated against such a list to ensure it was done sufficiently well.


 
  CASSANDRA-8619 https://issues.apache.org/jira/browse/CASSANDRA-8619 -
  using
  CQLSSTableWriter gives ConcurrentModificationException
 
  OK. I don't think the original fix meets our new definition of done, since
  there was insufficient coverage, and in this case no regression test. To be
  done you would have to either implement the coverage or