Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-18 Thread Josh McKenzie
> One thing where this “could” come into play is that we currently run with 
> different configs at the CI level and we might be able to make this happen at 
> the class or method level instead..
It'd be great to be able to declaratively indicate which configurations a test 
needs to exercise, and then have a single CI run that includes them as 
appropriate. 
On Mon, Dec 18, 2023, at 7:22 PM, David Capwell wrote:
>> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
>> interesting annotation-based approach to property testing. Curious if you've 
>> looked into or used that at all David (Capwell)? (link for the lazy: 
>> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).
> 
> I have not, no.  Looking at your link, it moves from lambdas to annotations, 
> and tries to define an API for stateful testing… I am neutral on that as it’s 
> mostly style…. One thing to call out is that the project documents that it 
> tries to “shrink”… we ended up disabling this in QuickTheories as shrinking 
> doesn’t work well for many of our tests (too high resource demand and unable 
> to actually shrink once you move past trivial generators).  Looking at their 
> docs and their code, it’s hard for me to see how we would actually create C* 
> generators… there is so much class-generation magic that I really don’t see 
> how to create AbstractType or TableMetadata… the only example they gave was 
> not random data but hand-crafted data… 
> 
>> moving to JUnit 5
> 
> I am a fan of this.  If we add dependencies and don’t keep them up to date, 
> it becomes painful over time (missing features, lack of support, etc).  
> 
>> First of all - when you want to have a parameterized test case you do not 
>> have to make the whole test class parameterized - it is per test case. Also, 
>> each method can have different parameters.
> 
> I strongly prefer this, but it has never been a blocker for me writing param 
> tests…. One thing where this “could” come into play is that we currently run 
> with different configs at the CI level and we might be able to make this 
> happen at the class or method level instead…
> 
> @ServerConfigs(all) // can exclude unsupported configs
> public class InsertTest
> 
> It bothers me deeply that we run tests against configs they don’t even touch 
> in CI, causing us to waste resources… Can we solve this with JUnit 4 param 
> logic… no clue… 
> 
>> On Dec 15, 2023, at 6:52 PM, Josh McKenzie  wrote:
>> 
>>> First of all - when you want to have a parameterized test case you do not 
>>> have to make the whole test class parameterized - it is per test case. 
>>> Also, each method can have different parameters.
>> This is a pretty compelling improvement to me having just had to use the 
>> somewhat painful and blunt instrument of our current framework's 
>> parameterization; pretty clunky and broad.
>> 
>> It also looks like they moved to a "test engine abstracted away from test 
>> identification" approach to their architecture in 5 w/the "vintage" model 
>> providing native unchanged backwards-compatibility w/junit 4. Assuming they 
>> didn't bork up their architecture that *should* lower risk of the framework 
>> change leading to disruption or failure (famous last words...).
>> 
>> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
>> interesting annotation-based approach to property testing. Curious if you've 
>> looked into or used that at all David (Capwell)? (link for the lazy: 
>> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).
>> 
>> On Tue, Dec 12, 2023, at 11:39 AM, Jacek Lewandowski wrote:
>>> First of all - when you want to have a parameterized test case you do not 
>>> have to make the whole test class parameterized - it is per test case. 
>>> Also, each method can have different parameters.
>>> 
>>> For the extensions - we can have extensions which provide Cassandra 
>>> configuration, extensions which provide a running cluster and others. We 
>>> could for example apply some extensions to all test classes externally 
>>> without touching those classes, something like logging the begin and end of 
>>> each test case. 
>>> 
>>> 
>>> 
>>> On Tue, 12 Dec 2023 at 12:07, Benedict wrote:
 
 Could you give (or link to) some examples of how this would actually 
 benefit our test suites?
 
 
> On 12 Dec 2023, at 10:51, Jacek Lewandowski  
> wrote:
> 
> I have two major pros for JUnit 5:
> - much better support for parameterized tests
> - global test hooks (automatically detectable extensions) + 
> multi-inheritance
> 
> 
> 
> 
> On Mon, 11 Dec 2023 at 13:38, Benedict wrote:
>> 
>> Why do we want to move to JUnit 5? 
>> 
>> I’m generally opposed to churn unless well justified, which it may be - 
>> just not immediately obvious to me.
>> 
>> 
>>> On 11 Dec 2023, at 08:33, Jacek Lewandowski 
>>>  wrote:
>>> 
>>> Nobody referred so far to the idea of moving to JUnit 5, what are the 
>>> opinions?

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-18 Thread David Capwell
> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
> interesting annotation-based approach to property testing. Curious if you've 
> looked into or used that at all David (Capwell)? (link for the lazy: 
> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).

I have not, no.  Looking at your link, it moves from lambdas to annotations, 
and tries to define an API for stateful testing… I am neutral on that as it’s 
mostly style…. One thing to call out is that the project documents that it 
tries to “shrink”… we ended up disabling this in QuickTheories as shrinking 
doesn’t work well for many of our tests (too high resource demand and unable 
to actually shrink once you move past trivial generators).  Looking at their 
docs and their code, it’s hard for me to see how we would actually create C* 
generators… there is so much class-generation magic that I really don’t see 
how to create AbstractType or TableMetadata… the only example they gave was 
not random data but hand-crafted data… 
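
For reference, this is roughly what a jqwik property looks like - a minimal 
sketch with invented class and method names, not an attempt at real C* 
generators:

    import net.jqwik.api.*;

    public class IdentifierQuotingProperties
    {
        // jqwik discovers @Property methods and feeds them generated values via @ForAll
        @Property
        boolean quotingRoundTrips(@ForAll("identifiers") String name)
        {
            return name.equals(unquote(quote(name)));
        }

        // custom generators are plain methods returning an Arbitrary
        @Provide
        Arbitrary<String> identifiers()
        {
            return Arbitraries.strings().alpha().numeric().ofMinLength(1).ofMaxLength(48);
        }

        private static String quote(String s)   { return '"' + s.replace("\"", "\"\"") + '"'; }
        private static String unquote(String s) { return s.substring(1, s.length() - 1).replace("\"\"", "\""); }
    }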

> moving to JUnit 5

I am a fan of this.  If we add dependencies and don’t keep them up to date, it 
becomes painful over time (missing features, lack of support, etc).  

> First of all - when you want to have a parameterized test case you do not 
> have to make the whole test class parameterized - it is per test case. Also, 
> each method can have different parameters.

I strongly prefer this, but it has never been a blocker for me writing param 
tests…. One thing where this “could” come into play is that we currently run 
with different configs at the CI level and we might be able to make this happen 
at the class or method level instead…

@ServerConfigs(all) // can exclude unsupported configs
public class InsertTest

It bothers me deeply that we run tests against configs they don’t even touch in 
CI, causing us to waste resources… Can we solve this with JUnit 4 param logic… 
no clue… 
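
To make the idea concrete: in JUnit 5 something like @ServerConfigs could in 
principle be a composed annotation backed by an invocation-context provider. A 
rough sketch only - none of these types exist today and the config names are 
placeholders:

    import java.lang.annotation.*;
    import java.util.stream.Stream;
    import org.junit.jupiter.api.extension.*;

    // Hypothetical: declares which server configs a test class should run against.
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @ExtendWith(ServerConfigProvider.class)
    @interface ServerConfigs
    {
        String[] exclude() default {};   // e.g. exclude unsupported configs
    }

    // Hypothetical provider that turns each named config into a test invocation.
    class ServerConfigProvider implements TestTemplateInvocationContextProvider
    {
        @Override
        public boolean supportsTestTemplate(ExtensionContext context)
        {
            return context.getRequiredTestClass().isAnnotationPresent(ServerConfigs.class);
        }

        @Override
        public Stream<TestTemplateInvocationContext> provideTestTemplateInvocationContexts(ExtensionContext context)
        {
            // placeholder names standing in for whatever config set CI uses today
            return Stream.of("default", "vnodes", "trie", "latest")
                         .map(name -> new TestTemplateInvocationContext()
                         {
                             @Override
                             public String getDisplayName(int invocationIndex) { return name; }
                         });
        }
    }

Test methods in such a class would then be annotated with @TestTemplate and be 
invoked once per provided config.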

> On Dec 15, 2023, at 6:52 PM, Josh McKenzie  wrote:
> 
>> First of all - when you want to have a parameterized test case you do not 
>> have to make the whole test class parameterized - it is per test case. Also, 
>> each method can have different parameters.
> This is a pretty compelling improvement to me having just had to use the 
> somewhat painful and blunt instrument of our current framework's 
> parameterization; pretty clunky and broad.
> 
> It also looks like they moved to a "test engine abstracted away from test 
> identification" approach to their architecture in 5 w/the "vintage" model 
> providing native unchanged backwards-compatibility w/junit 4. Assuming they 
> didn't bork up their architecture that should lower risk of the framework 
> change leading to disruption or failure (famous last words...).
> 
> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
> interesting annotation-based approach to property testing. Curious if you've 
> looked into or used that at all David (Capwell)? (link for the lazy: 
> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).
> 
> On Tue, Dec 12, 2023, at 11:39 AM, Jacek Lewandowski wrote:
>> First of all - when you want to have a parameterized test case you do not 
>> have to make the whole test class parameterized - it is per test case. Also, 
>> each method can have different parameters.
>> 
>> For the extensions - we can have extensions which provide Cassandra 
>> configuration, extensions which provide a running cluster and others. We 
>> could for example apply some extensions to all test classes externally 
>> without touching those classes, something like logging the begin and end of 
>> each test case. 
>> 
>> 
>> 
>> On Tue, 12 Dec 2023 at 12:07, Benedict wrote:
>> 
>> Could you give (or link to) some examples of how this would actually benefit 
>> our test suites?
>> 
>> 
>>> On 12 Dec 2023, at 10:51, Jacek Lewandowski wrote:
>>> 
>>> I have two major pros for JUnit 5:
>>> - much better support for parameterized tests
>>> - global test hooks (automatically detectable extensions) + 
>>> multi-inheritance
>>> 
>>> 
>>> 
>>> 
>>> On Mon, 11 Dec 2023 at 13:38, Benedict wrote:
>>> 
>>> Why do we want to move to JUnit 5? 
>>> 
>>> I’m generally opposed to churn unless well justified, which it may be - 
>>> just not immediately obvious to me.
>>> 
>>> 
 On 11 Dec 2023, at 08:33, Jacek Lewandowski wrote:
 
 Nobody referred so far to the idea of moving to JUnit 5, what are the 
 opinions?
 
 
 
 On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
 
 Alex’s suggestion was that we meta randomise, ie we randomise the config 
 parameters to gain better rather than lesser coverage overall. This means 
 we cover these specific configs and more - just not necessarily on any 
 single commit.
 
 I strongly endorse this approach over the status quo.

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-15 Thread Josh McKenzie
> First of all - when you want to have a parameterized test case you do not 
> have to make the whole test class parameterized - it is per test case. Also, 
> each method can have different parameters.
This is a pretty compelling improvement to me, having just had to use the 
somewhat painful and blunt instrument of our current framework's 
parameterization; it's pretty clunky and broad.
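
For anyone who hasn't hit it recently, the JUnit 4 pattern parameterizes the 
whole class (names here are invented for illustration):

    import java.util.Arrays;
    import java.util.Collection;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;

    import static org.junit.Assert.assertNotNull;

    // JUnit 4: the runner applies the parameters to every test method in the
    // class, even the ones that do not care about them.
    @RunWith(Parameterized.class)
    public class CompressionOptionTest
    {
        @Parameters(name = "compression={0}")
        public static Collection<Object[]> params()
        {
            return Arrays.asList(new Object[][]{ { "lz4" }, { "zstd" }, { "none" } });
        }

        private final String compression;

        public CompressionOptionTest(String compression)
        {
            this.compression = compression;
        }

        @Test
        public void flushUsesConfiguredCompression()
        {
            assertNotNull(compression);   // placeholder assertion
        }

        @Test
        public void unrelatedTestAlsoRunsThreeTimes()
        {
            // ...even when the parameter is irrelevant to the assertion
        }
    }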

It also looks like they moved to a "test engine abstracted away from test 
identification" approach to their architecture in 5 w/the "vintage" model 
providing native unchanged backwards-compatibility w/junit 4. Assuming they 
didn't bork up their architecture that *should* lower risk of the framework 
change leading to disruption or failure (famous last words...).

A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
interesting annotation-based approach to property testing. Curious if you've 
looked into or used that at all David (Capwell)? (link for the lazy: 
https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).

On Tue, Dec 12, 2023, at 11:39 AM, Jacek Lewandowski wrote:
> First of all - when you want to have a parameterized test case you do not 
> have to make the whole test class parameterized - it is per test case. Also, 
> each method can have different parameters.
> 
> For the extensions - we can have extensions which provide Cassandra 
> configuration, extensions which provide a running cluster and others. We 
> could for example apply some extensions to all test classes externally 
> without touching those classes, something like logging the begin and end of 
> each test case. 
> 
> 
> 
> On Tue, 12 Dec 2023 at 12:07, Benedict wrote:
>> 
>> Could you give (or link to) some examples of how this would actually benefit 
>> our test suites?
>> 
>> 
>>> On 12 Dec 2023, at 10:51, Jacek Lewandowski  
>>> wrote:
>>> 
>>> I have two major pros for JUnit 5:
>>> - much better support for parameterized tests
>>> - global test hooks (automatically detectable extensions) + 
>>> multi-inheritance
>>> 
>>> 
>>> 
>>> 
>>> On Mon, 11 Dec 2023 at 13:38, Benedict wrote:
 
 Why do we want to move to JUnit 5? 
 
 I’m generally opposed to churn unless well justified, which it may be - 
 just not immediately obvious to me.
 
 
> On 11 Dec 2023, at 08:33, Jacek Lewandowski  
> wrote:
> 
> Nobody referred so far to the idea of moving to JUnit 5, what are the 
> opinions?
> 
> 
> 
> On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
>> 
>> Alex’s suggestion was that we meta randomise, ie we randomise the config 
>> parameters to gain better rather than lesser coverage overall. This 
>> means we cover these specific configs and more - just not necessarily on 
>> any single commit.
>> 
>> I strongly endorse this approach over the status quo.
>> 
>> 
>>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>> 
>>>  
>>>  
>>>  
 
> I think everyone agrees here, but…. these variations are still 
> catching failures, and until we have an improvement or replacement we 
> do rely on them.   I'm not in favour of removing them until we have 
> proof /confidence that any replacement is catching the same failures. 
>  Especially oa, tries, vnodes. (Not tries and offheap is being 
> replaced with "latest", which will be valuable simplification.)  
 
 What kind of proof do you expect? I cannot imagine how we could prove 
 that because the ability of detecting failures results from the 
 randomness of those tests. That's why when such a test fail you 
 usually cannot reproduce that easily.
>>> 
>>> 
>>> Unit tests that fail consistently but only on one configuration, should 
>>> not be removed/replaced until the replacement also catches the failure.
>>>  
 We could extrapolate that to - why we only have those configurations? 
 why don't test trie / oa + compression, or CDC, or system memtable? 
>>> 
>>> 
>>> Because, along the way, people have decided a certain configuration 
>>> deserves additional testing and it has been done this way in lieu of 
>>> any other more efficient approach.
>>> 
>>> 
>>> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-12 Thread Jacek Lewandowski
First of all - when you want to have a parameterized test case you do not
have to make the whole test class parameterized - it is per test case.
Also, each method can have different parameters.
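
A minimal sketch of what that looks like in JUnit 5 (test and parameter names 
are invented for illustration):

    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.CsvSource;
    import org.junit.jupiter.params.provider.ValueSource;

    import static org.junit.jupiter.api.Assertions.assertTrue;

    class PerMethodParameterizationTest
    {
        // a plain test method - not affected by any parameterization
        @Test
        void unparameterizedCase()
        {
            assertTrue(true);
        }

        // only this method is parameterized, and only over these values
        @ParameterizedTest
        @ValueSource(strings = { "lz4", "zstd", "none" })
        void compressionRoundTrip(String compression)
        {
            assertTrue(compression.length() > 0);
        }

        // another method can take a completely different parameter set
        @ParameterizedTest
        @CsvSource({ "1, true", "0, false" })
        void vnodeFlagParsing(int tokens, boolean expected)
        {
            assertTrue((tokens > 0) == expected);
        }
    }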

For the extensions - we can have extensions which provide Cassandra
configuration, extensions which provide a running cluster, and others. We
could for example apply some extensions to all test classes externally,
without touching those classes - something like logging the beginning and
end of each test case.
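
For the begin/end logging case, a sketch of a globally auto-registered JUnit 5
extension (class name invented):

    import org.junit.jupiter.api.extension.AfterTestExecutionCallback;
    import org.junit.jupiter.api.extension.BeforeTestExecutionCallback;
    import org.junit.jupiter.api.extension.ExtensionContext;

    // Logs the start and end of every test case, without any test class referencing it.
    public class TestBoundaryLogger implements BeforeTestExecutionCallback, AfterTestExecutionCallback
    {
        @Override
        public void beforeTestExecution(ExtensionContext context)
        {
            System.out.println("BEGIN " + context.getDisplayName());
        }

        @Override
        public void afterTestExecution(ExtensionContext context)
        {
            System.out.println("END   " + context.getDisplayName());
        }
    }

Listing the class in META-INF/services/org.junit.jupiter.api.extension.Extension
and setting junit.jupiter.extensions.autodetection.enabled=true registers it for
every test class, with no per-class annotation needed.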



On Tue, 12 Dec 2023 at 12:07, Benedict wrote:

> Could you give (or link to) some examples of how this would actually
> benefit our test suites?
>
> On 12 Dec 2023, at 10:51, Jacek Lewandowski 
> wrote:
>
> 
> I have two major pros for JUnit 5:
> - much better support for parameterized tests
> - global test hooks (automatically detectable extensions) +
> multi-inheritance
>
>
>
>
> On Mon, 11 Dec 2023 at 13:38, Benedict wrote:
>
>> Why do we want to move to JUnit 5?
>>
>> I’m generally opposed to churn unless well justified, which it may be -
>> just not immediately obvious to me.
>>
>> On 11 Dec 2023, at 08:33, Jacek Lewandowski 
>> wrote:
>>
>> 
>> Nobody referred so far to the idea of moving to JUnit 5, what are the
>> opinions?
>>
>>
>>
>> On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
>>
>>> Alex’s suggestion was that we meta randomise, ie we randomise the config
>>> parameters to gain better rather than lesser coverage overall. This means
>>> we cover these specific configs and more - just not necessarily on any
>>> single commit.
>>>
>>> I strongly endorse this approach over the status quo.
>>>
>>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>>
>>> 
>>>
>>>
>>>

 I think everyone agrees here, but…. these variations are still
> catching failures, and until we have an improvement or replacement we
> do rely on them.   I'm not in favour of removing them until we have
> proof /confidence that any replacement is catching the same failures.
> Especially oa, tries, vnodes. (Not tries and offheap is being
> replaced with "latest", which will be valuable simplification.)


 What kind of proof do you expect? I cannot imagine how we could prove
 that because the ability of detecting failures results from the randomness
 of those tests. That's why when such a test fail you usually cannot
 reproduce that easily.

>>>
>>>
>>> Unit tests that fail consistently but only on one configuration, should
>>> not be removed/replaced until the replacement also catches the failure.
>>>
>>>
>>>
 We could extrapolate that to - why we only have those configurations?
 why don't test trie / oa + compression, or CDC, or system memtable?

>>>
>>>
>>> Because, along the way, people have decided a certain configuration
>>> deserves additional testing and it has been done this way in lieu of any
>>> other more efficient approach.
>>>
>>>
>>>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-12 Thread Benedict
Could you give (or link to) some examples of how this would actually benefit our 
test suites?

> On 12 Dec 2023, at 10:51, Jacek Lewandowski  wrote:
> 
> I have two major pros for JUnit 5:
> - much better support for parameterized tests
> - global test hooks (automatically detectable extensions) + multi-inheritance
> 
> On Mon, 11 Dec 2023 at 13:38, Benedict wrote:
>> Why do we want to move to JUnit 5? 
>> 
>> I’m generally opposed to churn unless well justified, which it may be - just 
>> not immediately obvious to me.
>> 
>>> On 11 Dec 2023, at 08:33, Jacek Lewandowski  wrote:
>>> 
>>> Nobody referred so far to the idea of moving to JUnit 5, what are the 
>>> opinions?
>>> 
>>> On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
>>>> Alex’s suggestion was that we meta-randomise, i.e. we randomise the config 
>>>> parameters to gain better rather than lesser coverage overall. This means 
>>>> we cover these specific configs and more - just not necessarily on any 
>>>> single commit.
>>>> 
>>>> I strongly endorse this approach over the status quo.
>>>> 
>>>>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>>>> 
>>>>>>> I think everyone agrees here, but… these variations are still catching 
>>>>>>> failures, and until we have an improvement or replacement we do rely 
>>>>>>> on them. I'm not in favour of removing them until we have 
>>>>>>> proof/confidence that any replacement is catching the same failures. 
>>>>>>> Especially oa, tries, vnodes. (Note: tries and offheap are being 
>>>>>>> replaced with "latest", which will be a valuable simplification.)
>>>>>> 
>>>>>> What kind of proof do you expect? I cannot imagine how we could prove 
>>>>>> that because the ability to detect failures results from the randomness 
>>>>>> of those tests. That's why when such a test fails you usually cannot 
>>>>>> reproduce it easily.
>>>>> 
>>>>> Unit tests that fail consistently but only on one configuration should 
>>>>> not be removed/replaced until the replacement also catches the failure.
>>>>> 
>>>>>> We could extrapolate that to - why do we only have those 
>>>>>> configurations? why don't we test trie / oa + compression, or CDC, or 
>>>>>> system memtable?
>>>>> 
>>>>> Because, along the way, people have decided a certain configuration 
>>>>> deserves additional testing and it has been done this way in lieu of any 
>>>>> other more efficient approach.




Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-12 Thread Jacek Lewandowski
I have two major pros for JUnit 5:
- much better support for parameterized tests
- global test hooks (automatically detectable extensions) +
multi-inheritance




On Mon, 11 Dec 2023 at 13:38, Benedict wrote:

> Why do we want to move to JUnit 5?
>
> I’m generally opposed to churn unless well justified, which it may be -
> just not immediately obvious to me.
>
> On 11 Dec 2023, at 08:33, Jacek Lewandowski 
> wrote:
>
> 
> Nobody referred so far to the idea of moving to JUnit 5, what are the
> opinions?
>
>
>
> On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
>
>> Alex’s suggestion was that we meta randomise, ie we randomise the config
>> parameters to gain better rather than lesser coverage overall. This means
>> we cover these specific configs and more - just not necessarily on any
>> single commit.
>>
>> I strongly endorse this approach over the status quo.
>>
>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>
>> 
>>
>>
>>
>>>
>>> I think everyone agrees here, but…. these variations are still catching
 failures, and until we have an improvement or replacement we do rely
 on them.   I'm not in favour of removing them until we have proof
 /confidence that any replacement is catching the same failures.  Especially
 oa, tries, vnodes. (Not tries and offheap is being replaced with
 "latest", which will be valuable simplification.)
>>>
>>>
>>> What kind of proof do you expect? I cannot imagine how we could prove
>>> that because the ability of detecting failures results from the randomness
>>> of those tests. That's why when such a test fail you usually cannot
>>> reproduce that easily.
>>>
>>
>>
>> Unit tests that fail consistently but only on one configuration, should
>> not be removed/replaced until the replacement also catches the failure.
>>
>>
>>
>>> We could extrapolate that to - why we only have those configurations?
>>> why don't test trie / oa + compression, or CDC, or system memtable?
>>>
>>
>>
>> Because, along the way, people have decided a certain configuration
>> deserves additional testing and it has been done this way in lieu of any
>> other more efficient approach.
>>
>>
>>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-11 Thread Benedict
Why do we want to move to JUnit 5? 

I’m generally opposed to churn unless well justified, which it may be - just 
not immediately obvious to me.

> On 11 Dec 2023, at 08:33, Jacek Lewandowski  wrote:
> 
> Nobody referred so far to the idea of moving to JUnit 5, what are the 
> opinions?
> 
> On Sun, 10 Dec 2023 at 11:03, Benedict wrote:
>> Alex’s suggestion was that we meta-randomise, i.e. we randomise the config 
>> parameters to gain better rather than lesser coverage overall. This means we 
>> cover these specific configs and more - just not necessarily on any single 
>> commit.
>> 
>> I strongly endorse this approach over the status quo.
>> 
>>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>> 
>>>>> I think everyone agrees here, but… these variations are still catching 
>>>>> failures, and until we have an improvement or replacement we do rely on 
>>>>> them. I'm not in favour of removing them until we have proof/confidence 
>>>>> that any replacement is catching the same failures. Especially oa, 
>>>>> tries, vnodes. (Note: tries and offheap are being replaced with 
>>>>> "latest", which will be a valuable simplification.)
>>>> 
>>>> What kind of proof do you expect? I cannot imagine how we could prove 
>>>> that because the ability to detect failures results from the randomness 
>>>> of those tests. That's why when such a test fails you usually cannot 
>>>> reproduce it easily.
>>> 
>>> Unit tests that fail consistently but only on one configuration should not 
>>> be removed/replaced until the replacement also catches the failure.
>>> 
>>>> We could extrapolate that to - why do we only have those configurations? 
>>>> why don't we test trie / oa + compression, or CDC, or system memtable?
>>> 
>>> Because, along the way, people have decided a certain configuration 
>>> deserves additional testing and it has been done this way in lieu of any 
>>> other more efficient approach.



Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-11 Thread Jacek Lewandowski
Nobody referred so far to the idea of moving to JUnit 5, what are the
opinions?



On Sun, 10 Dec 2023 at 11:03, Benedict wrote:

> Alex’s suggestion was that we meta randomise, ie we randomise the config
> parameters to gain better rather than lesser coverage overall. This means
> we cover these specific configs and more - just not necessarily on any
> single commit.
>
> I strongly endorse this approach over the status quo.
>
> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>
> 
>
>
>
>>
>> I think everyone agrees here, but…. these variations are still catching
>>> failures, and until we have an improvement or replacement we do rely on
>>> them.   I'm not in favour of removing them until we have proof /confidence
>>> that any replacement is catching the same failures.  Especially oa, tries,
>>> vnodes. (Not tries and offheap is being replaced with "latest", which
>>> will be valuable simplification.)
>>
>>
>> What kind of proof do you expect? I cannot imagine how we could prove
>> that because the ability of detecting failures results from the randomness
>> of those tests. That's why when such a test fail you usually cannot
>> reproduce that easily.
>>
>
>
> Unit tests that fail consistently but only on one configuration, should
> not be removed/replaced until the replacement also catches the failure.
>
>
>
>> We could extrapolate that to - why we only have those configurations? why
>> don't test trie / oa + compression, or CDC, or system memtable?
>>
>
>
> Because, along the way, people have decided a certain configuration
> deserves additional testing and it has been done this way in lieu of any
> other more efficient approach.
>
>
>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-10 Thread Benedict
Alex’s suggestion was that we meta-randomise, i.e. we randomise the config 
parameters to gain better rather than lesser coverage overall. This means we 
cover these specific configs and more - just not necessarily on any single 
commit.

I strongly endorse this approach over the status quo.
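
A minimal sketch of the idea, with hypothetical knobs and values - the point is 
only that the chosen config is derived from a logged seed so any failure can be 
replayed:

    import java.util.List;
    import java.util.Random;

    public class RandomizedConfig
    {
        // Hypothetical knobs; real code would read candidates from the supported config matrix.
        static final List<String>  MEMTABLES  = List.of("skiplist", "trie");
        static final List<String>  SSTABLES   = List.of("big", "bti");
        static final List<Integer> NUM_TOKENS = List.of(1, 16, 256);

        public static void apply(long seed)
        {
            Random rnd = new Random(seed);
            String memtable = MEMTABLES.get(rnd.nextInt(MEMTABLES.size()));
            String sstable  = SSTABLES.get(rnd.nextInt(SSTABLES.size()));
            int tokens      = NUM_TOKENS.get(rnd.nextInt(NUM_TOKENS.size()));

            // Log the seed so a CI failure can be replayed with the exact same config.
            System.out.printf("test config seed=%d memtable=%s sstable=%s num_tokens=%d%n",
                              seed, memtable, sstable, tokens);
            // ...push the chosen values into the test's cassandra.yaml / system properties here
        }
    }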

> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
> 
> 
>  
>  
>  
>> 
>>> I think everyone agrees here, but…. these variations are still catching 
>>> failures, and until we have an improvement or replacement we do rely on 
>>> them.   I'm not in favour of removing them until we have proof /confidence 
>>> that any replacement is catching the same failures.  Especially oa, tries, 
>>> vnodes. (Not tries and offheap is being replaced with "latest", which will 
>>> be valuable simplification.)  
>> 
>> What kind of proof do you expect? I cannot imagine how we could prove that 
>> because the ability of detecting failures results from the randomness of 
>> those tests. That's why when such a test fail you usually cannot reproduce 
>> that easily.
> 
> 
> Unit tests that fail consistently but only on one configuration, should not 
> be removed/replaced until the replacement also catches the failure.
> 
>  
>> We could extrapolate that to - why we only have those configurations? why 
>> don't test trie / oa + compression, or CDC, or system memtable? 
> 
> 
> Because, along the way, people have decided a certain configuration deserves 
> additional testing and it has been done this way in lieu of any other more 
> efficient approach.
> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-08 Thread Josh McKenzie
> Unit tests that fail consistently but only on one configuration, should not 
> be removed/replaced until the replacement also catches the failure.

> along the way, people have decided a certain configuration deserves 
> additional testing and it has been done this way in lieu of any other more 
> efficient approach.

Totally agree with these sentiments as well as the framing of our current unit 
tests as "bad fuzz-tests thanks to non-determinism".

To me, this reinforces my stance on a "pre-commit vs. post-commit" approach to 
testing *with our current constraints:*
 • Test the default configuration on all supported JDKs pre-commit
 • Post-commit, treat *consistent* failures on non-default configurations as 
immediate interrupts to the author that introduced them
 • Pre-release, push for no consistent failures on any suite in any 
configuration, and no regressions in flaky tests from prior release (in ASF CI 
env).
I think there's value in having the non-default configurations, but I'm not 
convinced the benefits outweigh the costs *specifically in terms of pre-commit 
work* due to flakiness in the execution of the software env itself, not to 
mention hardware env variance on the ASF side today.

All that said - if we got to a world where we could run our jvm-based tests 
deterministically within the simulator, my intuition is that we'd see a lot of 
the test-specific, non-defect flakiness reduced drastically. In such a world 
I'd be in favor of running :allthethings: pre-commit as we'd have *much* higher 
confidence that those failures were actually attributable to the author of 
whatever diff the test is run against. 

On Fri, Dec 8, 2023, at 8:25 AM, Mick Semb Wever wrote:
>  
>  
>  
>> 
>>> I think everyone agrees here, but…. these variations are still catching 
>>> failures, and until we have an improvement or replacement we do rely on 
>>> them.   I'm not in favour of removing them until we have proof /confidence 
>>> that any replacement is catching the same failures.  Especially oa, tries, 
>>> vnodes. (Not tries and offheap is being replaced with "latest", which will 
>>> be valuable simplification.)  
>> 
>> What kind of proof do you expect? I cannot imagine how we could prove that 
>> because the ability of detecting failures results from the randomness of 
>> those tests. That's why when such a test fail you usually cannot reproduce 
>> that easily.
> 
> 
> Unit tests that fail consistently but only on one configuration, should not 
> be removed/replaced until the replacement also catches the failure.
>  
>> We could extrapolate that to - why we only have those configurations? why 
>> don't test trie / oa + compression, or CDC, or system memtable? 
> 
> 
> Because, along the way, people have decided a certain configuration deserves 
> additional testing and it has been done this way in lieu of any other more 
> efficient approach.
> 
> 
> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-08 Thread Mick Semb Wever
>
> I think everyone agrees here, but…. these variations are still catching
>> failures, and until we have an improvement or replacement we do rely on
>> them.   I'm not in favour of removing them until we have proof /confidence
>> that any replacement is catching the same failures.  Especially oa, tries,
>> vnodes. (Not tries and offheap is being replaced with "latest", which
>> will be valuable simplification.)
>
>
> What kind of proof do you expect? I cannot imagine how we could prove that
> because the ability of detecting failures results from the randomness of
> those tests. That's why when such a test fail you usually cannot reproduce
> that easily.
>


Unit tests that fail consistently but only on one configuration, should not
be removed/replaced until the replacement also catches the failure.



> We could extrapolate that to - why we only have those configurations? why
> don't test trie / oa + compression, or CDC, or system memtable?
>


Because, along the way, people have decided a certain configuration
deserves additional testing and it has been done this way in lieu of any
other more efficient approach.


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-08 Thread Jacek Lewandowski
>
> It would be great to setup a JUnitRunner using the simulator and find out
> though.
>

I like this idea - this is what I meant when asking about the current unit
tests - to me, a test is either a simulation or a fuzz test. Due to the pretty
random execution order of unit tests, all of them can be considered rather
non-robust fuzz tests, implemented with the intention of being simulation
tests (with an exact execution order, testing a very specific behaviour).

I think everyone agrees here, but…. these variations are still catching
> failures, and until we have an improvement or replacement we do rely on
> them.   I'm not in favour of removing them until we have proof /confidence
> that any replacement is catching the same failures.  Especially oa, tries,
> vnodes. (Not tries and offheap is being replaced with "latest", which
> will be valuable simplification.)


What kind of proof do you expect? I cannot imagine how we could prove that,
because the ability to detect failures results from the randomness of
those tests. That's why when such a test fails you usually cannot reproduce
it easily. We could extrapolate that to - why do we only have those
configurations? Why don't we test trie / oa + compression, or CDC, or system
memtable? Each random run of any test can find a new problem. I'm in
favour of parameterizing the "clients" of a certain feature - like
parameterizing storage engine tests, streaming and tools tests against
different sstable formats; though it makes no sense to parameterize gossip
tests, utility class tests, or dedicated tests for specific storage
implementations.



On Fri, 8 Dec 2023 at 07:51, Alex Petrov wrote:

> My logic here was that CQLTester tests would probably be the best
> candidate as they are largely single-threaded and single-node. I'm sure
> there are background processes that might slow things down when serialised
> into a single execution thread, but my expectation would be that it will
> not be as significant as with other tests such as multinode in-jvm dtests.
>
> On Thu, Dec 7, 2023, at 7:44 PM, Benedict wrote:
>
>
> I think the biggest impediment to that is that most tests are probably not
> sufficiently robust for simulation. If things happen in a surprising order
> many tests fail, as they implicitly rely on the normal timing of things.
>
> Another issue is that the simulator does potentially slow things down a
> little at the moment. Not sure what the impact would be overall.
>
> It would be great to setup a JUnitRunner using the simulator and find out
> though.
>
>
> On 7 Dec 2023, at 15:43, Alex Petrov  wrote:
>
> 
> We have been extensively using simulator for TCM, and I think we have make
> simulator tests more approachable. I think many of the existing tests
> should be ran under simulator instead of CQLTester, for example. This will
> both strengthen the simulator, and make things better in terms of
> determinism. Of course not to say that CQLTester tests are the biggest
> beneficiary there.
>
> On Thu, Dec 7, 2023, at 4:09 PM, Benedict wrote:
>
> To be fair, the lack of coherent framework doesn’t mean we can’t merge
> them from a naming perspective. I don’t mind losing one of burn or fuzz,
> and merging them.
>
> Today simulator tests are kept under the simulator test tree but that
> primarily exists for the simulator itself and testing it. It’s quite a
> complex source tree, as you might expect, and it exists primarily for
> managing its own complexity. It might make sense to bring the Paxos and
> Accord simulator entry points out into the burn/fuzz trees, though not sure
> it’s all that important.
>
>
> > On 7 Dec 2023, at 15:05, Benedict  wrote:
> >
> > Yes, the only system/real-time timeout is a progress one, wherein if
> nothing happens for ten minutes we assume the simulation has locked up.
> Hitting this is indicative of a bug, and the timeout is so long that no
> realistic system variability could trigger it.
> >
> >> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
> >>
> >> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
>  I've noticed many "sleeps" in the tests - is it possible with
> simulation tests to artificially move the clock forward by, say, 5 seconds
> instead of sleeping just to test, for example whether TTL works?)
> >>>
> >>> Yes, simulator will skip the sleep and do a simulated sleep with a
> simulated clock instead.
> >>
> >> Since it uses an artificial clock, does this mean that the simulator
> >> is also impervious to timeouts caused by the underlying environment?
> >>
> >> Kind Regards,
> >> Brandon
>
>
>
>
>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Alex Petrov
My logic here was that CQLTester tests would probably be the best candidate as 
they are largely single-threaded and single-node. I'm sure there are background 
processes that might slow things down when serialised into a single execution 
thread, but my expectation would be that it will not be as significant as with 
other tests such as multinode in-jvm dtests. 

On Thu, Dec 7, 2023, at 7:44 PM, Benedict wrote:
> 
> I think the biggest impediment to that is that most tests are probably not 
> sufficiently robust for simulation. If things happen in a surprising order 
> many tests fail, as they implicitly rely on the normal timing of things.
> 
> Another issue is that the simulator does potentially slow things down a 
> little at the moment. Not sure what the impact would be overall.
> 
> It would be great to setup a JUnitRunner using the simulator and find out 
> though.
> 
> 
>> On 7 Dec 2023, at 15:43, Alex Petrov  wrote:
>> 
>> We have been extensively using simulator for TCM, and I think we have make 
>> simulator tests more approachable. I think many of the existing tests should 
>> be ran under simulator instead of CQLTester, for example. This will both 
>> strengthen the simulator, and make things better in terms of determinism. Of 
>> course not to say that CQLTester tests are the biggest beneficiary there.
>> 
>> On Thu, Dec 7, 2023, at 4:09 PM, Benedict wrote:
>>> To be fair, the lack of coherent framework doesn’t mean we can’t merge them 
>>> from a naming perspective. I don’t mind losing one of burn or fuzz, and 
>>> merging them.
>>> 
>>> Today simulator tests are kept under the simulator test tree but that 
>>> primarily exists for the simulator itself and testing it. It’s quite a 
>>> complex source tree, as you might expect, and it exists primarily for 
>>> managing its own complexity. It might make sense to bring the Paxos and 
>>> Accord simulator entry points out into the burn/fuzz trees, though not sure 
>>> it’s all that important.
>>> 
>>> 
>>> > On 7 Dec 2023, at 15:05, Benedict  wrote:
>>> > 
>>> > Yes, the only system/real-time timeout is a progress one, wherein if 
>>> > nothing happens for ten minutes we assume the simulation has locked up. 
>>> > Hitting this is indicative of a bug, and the timeout is so long that no 
>>> > realistic system variability could trigger it.
>>> > 
>>> >> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
>>> >> 
>>> >> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
>>>  I've noticed many "sleeps" in the tests - is it possible with 
>>>  simulation tests to artificially move the clock forward by, say, 5 
>>>  seconds instead of sleeping just to test, for example whether TTL 
>>>  works?)
>>> >>> 
>>> >>> Yes, simulator will skip the sleep and do a simulated sleep with a 
>>> >>> simulated clock instead.
>>> >> 
>>> >> Since it uses an artificial clock, does this mean that the simulator
>>> >> is also impervious to timeouts caused by the underlying environment?
>>> >> 
>>> >> Kind Regards,
>>> >> Brandon
>>> 
>>> 
>> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Benedict
I think the biggest impediment to that is that most tests are probably not 
sufficiently robust for simulation. If things happen in a surprising order many 
tests fail, as they implicitly rely on the normal timing of things.

Another issue is that the simulator does potentially slow things down a little 
at the moment. Not sure what the impact would be overall.

It would be great to setup a JUnitRunner using the simulator and find out 
though.

> On 7 Dec 2023, at 15:43, Alex Petrov  wrote:
> 
> We have been extensively using the simulator for TCM, and I think we have to 
> make simulator tests more approachable. I think many of the existing tests 
> should be run under the simulator instead of CQLTester, for example. This 
> will both strengthen the simulator and make things better in terms of 
> determinism. Of course, that's not to say that CQLTester tests are the 
> biggest beneficiary there.
> 
> On Thu, Dec 7, 2023, at 4:09 PM, Benedict wrote:
>> To be fair, the lack of a coherent framework doesn’t mean we can’t merge 
>> them from a naming perspective. I don’t mind losing one of burn or fuzz, 
>> and merging them.
>> 
>> Today simulator tests are kept under the simulator test tree but that 
>> primarily exists for the simulator itself and testing it. It’s quite a 
>> complex source tree, as you might expect, and it exists primarily for 
>> managing its own complexity. It might make sense to bring the Paxos and 
>> Accord simulator entry points out into the burn/fuzz trees, though not sure 
>> it’s all that important.
>> 
>>> On 7 Dec 2023, at 15:05, Benedict  wrote:
>>> 
>>> Yes, the only system/real-time timeout is a progress one, wherein if 
>>> nothing happens for ten minutes we assume the simulation has locked up. 
>>> Hitting this is indicative of a bug, and the timeout is so long that no 
>>> realistic system variability could trigger it.
>>> 
>>>> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
>>>> 
>>>> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
>>>>>> I've noticed many "sleeps" in the tests - is it possible with 
>>>>>> simulation tests to artificially move the clock forward by, say, 5 
>>>>>> seconds instead of sleeping just to test, for example whether TTL 
>>>>>> works?)
>>>>> 
>>>>> Yes, simulator will skip the sleep and do a simulated sleep with a 
>>>>> simulated clock instead.
>>>> 
>>>> Since it uses an artificial clock, does this mean that the simulator
>>>> is also impervious to timeouts caused by the underlying environment?
>>>> 
>>>> Kind Regards,
>>>> Brandon

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Alex Petrov
We have been extensively using the simulator for TCM, and I think we have to 
make simulator tests more approachable. I think many of the existing tests 
should be run under the simulator instead of CQLTester, for example. This will 
both strengthen the simulator and make things better in terms of determinism. 
Of course, that's not to say that CQLTester tests are the biggest beneficiary 
there.

On Thu, Dec 7, 2023, at 4:09 PM, Benedict wrote:
> To be fair, the lack of coherent framework doesn’t mean we can’t merge them 
> from a naming perspective. I don’t mind losing one of burn or fuzz, and 
> merging them.
> 
> Today simulator tests are kept under the simulator test tree but that 
> primarily exists for the simulator itself and testing it. It’s quite a 
> complex source tree, as you might expect, and it exists primarily for 
> managing its own complexity. It might make sense to bring the Paxos and 
> Accord simulator entry points out into the burn/fuzz trees, though not sure 
> it’s all that important.
> 
> 
> > On 7 Dec 2023, at 15:05, Benedict  wrote:
> > 
> > Yes, the only system/real-time timeout is a progress one, wherein if 
> > nothing happens for ten minutes we assume the simulation has locked up. 
> > Hitting this is indicative of a bug, and the timeout is so long that no 
> > realistic system variability could trigger it.
> > 
> >> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
> >> 
> >> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
>  I've noticed many "sleeps" in the tests - is it possible with simulation 
>  tests to artificially move the clock forward by, say, 5 seconds instead 
>  of sleeping just to test, for example whether TTL works?)
> >>> 
> >>> Yes, simulator will skip the sleep and do a simulated sleep with a 
> >>> simulated clock instead.
> >> 
> >> Since it uses an artificial clock, does this mean that the simulator
> >> is also impervious to timeouts caused by the underlying environment?
> >> 
> >> Kind Regards,
> >> Brandon
> 
> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Benedict
To be fair, the lack of a coherent framework doesn’t mean we can’t merge them 
from a naming perspective. I don’t mind losing one of burn or fuzz, and merging 
them.

Today simulator tests are kept under the simulator test tree but that primarily 
exists for the simulator itself and testing it. It’s quite a complex source 
tree, as you might expect, and it exists primarily for managing its own 
complexity. It might make sense to bring the Paxos and Accord simulator entry 
points out into the burn/fuzz trees, though not sure it’s all that important.


> On 7 Dec 2023, at 15:05, Benedict  wrote:
> 
> Yes, the only system/real-time timeout is a progress one, wherein if nothing 
> happens for ten minutes we assume the simulation has locked up. Hitting this 
> is indicative of a bug, and the timeout is so long that no realistic system 
> variability could trigger it.
> 
>> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
>> 
>> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
 I've noticed many "sleeps" in the tests - is it possible with simulation 
 tests to artificially move the clock forward by, say, 5 seconds instead of 
 sleeping just to test, for example whether TTL works?)
>>> 
>>> Yes, simulator will skip the sleep and do a simulated sleep with a 
>>> simulated clock instead.
>> 
>> Since it uses an artificial clock, does this mean that the simulator
>> is also impervious to timeouts caused by the underlying environment?
>> 
>> Kind Regards,
>> Brandon



Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Benedict
Yes, the only system/real-time timeout is a progress one, wherein if nothing 
happens for ten minutes we assume the simulation has locked up. Hitting this is 
indicative of a bug, and the timeout is so long that no realistic system 
variability could trigger it.

> On 7 Dec 2023, at 14:56, Brandon Williams  wrote:
> 
> On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
>>> I've noticed many "sleeps" in the tests - is it possible with simulation 
>>> tests to artificially move the clock forward by, say, 5 seconds instead of 
>>> sleeping just to test, for example whether TTL works?)
>> 
>> Yes, simulator will skip the sleep and do a simulated sleep with a simulated 
>> clock instead.
> 
> Since it uses an artificial clock, does this mean that the simulator
> is also impervious to timeouts caused by the underlying environment?
> 
> Kind Regards,
> Brandon



Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Brandon Williams
On Thu, Dec 7, 2023 at 8:50 AM Alex Petrov  wrote:
> > I've noticed many "sleeps" in the tests - is it possible with simulation 
> > tests to artificially move the clock forward by, say, 5 seconds instead of 
> > sleeping just to test, for example whether TTL works?)
>
> Yes, simulator will skip the sleep and do a simulated sleep with a simulated 
> clock instead.

Since it uses an artificial clock, does this mean that the simulator
is also impervious to timeouts caused by the underlying environment?

Kind Regards,
Brandon


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-07 Thread Alex Petrov
> We should get rid of long-running unit tests altogether. They should run 
> faster or be split.

I think we just need to evaluate on a case-by-case basis. Some tests are bad 
and need to go, and we need other/better ones to replace them. I am 
deliberately not giving examples here, both to avoid controversy and to 
highlight that this will be a long process. 

> I'm still confused about the distinction between burn and fuzz tests - it 
> seems to me that fuzz tests are just modern burn tests - should we refactor 
> the existing burn tests to use the new framework?

At the moment we do not have a coherent generator framework. We have something 
like 15 different ways to generate data and run tests. We need to evaluate 
them and bring them together.

> 3. Simulation tests - since you say they provide a way to execute a test 
> deterministically, it should be a property of unit tests - well, a unit test 
> is either deterministic or a fuzz test.

Unit tests do not have a guarantee of determinism. The fact that you have 
determinism from the perspective of the API (i.e. it is driven by a single 
thread) has no implications for the behaviour of the system. The simulator 
guarantees that all executions, including concurrent ones, are fully 
deterministic - messaging, executors, threads, delays, timeouts, etc. 

> I've noticed many "sleeps" in the tests - is it possible with simulation 
> tests to artificially move the clock forward by, say, 5 seconds instead of 
> sleeping just to test, for example whether TTL works?)

Yes, simulator will skip the sleep and do a simulated sleep with a simulated 
clock instead. 
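
(This is not Cassandra's actual simulator API - just a sketch of the general 
shape, with invented names: the code under test reads a clock the test owns, 
and "sleeping" only advances that clock.)

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // A clock the code under test reads instead of System.nanoTime(); sleeping
    // just advances the counter, so no real time passes.
    public class SimulatedClock
    {
        private final AtomicLong nowNanos = new AtomicLong();

        public long nanoTime()
        {
            return nowNanos.get();
        }

        public void sleep(long duration, TimeUnit unit)
        {
            nowNanos.addAndGet(unit.toNanos(duration));   // advance simulated time only
        }
    }

A TTL test would then call clock.sleep(5, TimeUnit.SECONDS) and immediately 
observe the expiration, instead of blocking the suite for five wall-clock 
seconds.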

> Also, as we start refactoring the tests, it will be an excellent opportunity 
> to move to JUnit 5.

I am working on bringing Harry in-tree. I will need many reviewers and 
collaborators for making the test suite more powerful and coherent. It would be 
nice to have a bit more lenience and flexibility, and shorter turnarounds, when 
we deal with tests, at least in the early phases.

Thank you for the interest in the subject, I think we need to do a lot here.




On Fri, Dec 1, 2023, at 1:31 PM, Jacek Lewandowski wrote:
> Thanks for the exhaustive response, Alex :)
> 
> Let me bring my point of view:
> 
> 1. Since long tests are just unit tests that take a long time to run, it 
> makes sense to separate them for efficient parallelization in CI. Since we 
> are adding new tests, modifying the existing ones, etc., that should be 
> something maintainable; otherwise, the distinction makes no sense to me. For 
> example - adjust timeouts on CI to 1 minute per test class for "short" tests 
> and more for "long" tests. To satisfy CI, the contributor will have to either 
> make the test run faster or move it to the "long" tests. The opposite 
> enforcement could be more difficult, though it is doable as well - failing 
> the "long" test if it takes too little time and should be qualified as a 
> regular unit test. As I'm reading what I've just written, it sounds stupid :/ 
> We should get rid of long-running unit tests altogether. They should run 
> faster or be split.
> 
> 2. I'm still confused about the distinction between burn and fuzz tests - it 
> seems to me that fuzz tests are just modern burn tests - should we refactor 
> the existing burn tests to use the new framework?
> 
> 3. Simulation tests - since you say they provide a way to execute a test 
> deterministically, it should be a property of unit tests - well, a unit test 
> is either deterministic or a fuzz test. Is the simulation framework usable 
> for CQLTester-based tests? (side question here: I've noticed many "sleeps" in 
> the tests - is it possible with simulation tests to artificially move the 
> clock forward by, say, 5 seconds instead of sleeping just to test, for 
> example whether TTL works?)
> 
> 4. Yeah, running a complete suite for each artificially crafted configuration 
> brings little value compared to the maintenance and infrastructure costs. It 
> feels like we are running all tests a bit blindly, hoping we catch something 
> accidentally. I agree this is not the purpose of the unit tests and should be 
> covered instead by fuzz. For features like CDC, compression, different 
> sstable formats, trie memtable, commit log compression/encryption, system 
> directory keyspace, etc... we should have dedicated tests that verify just 
> that functionality
> 
> With more or more functionality offered by Cassandra, they will become a 
> significant pain shortly. Let's start thinking about concrete actions. 
> 
> Also, as we start refactoring the tests, it will be an excellent opportunity 
> to move to JUnit 5.
> 
> thanks,
> Jacek


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-02 Thread Mick Semb Wever
>
> 1. Since long tests are just unit tests that take a long time to run,
>


Yes, they are just "resource intensive" tests, on par to the "large" python
dtests.  they require more machine specs to run.
They are great candidates to improve so they don't require additional
resources, but many often value and cannot.



> 2. I'm still confused about the distinction between burn and fuzz tests -
> it seems to me that fuzz tests are just modern burn tests - should we
> refactor the existing burn tests to use the new framework?
>


Burn tests are not really tests that belong in the CI pipeline. We only run them
in the CI pipeline to validate that they can still compile and run.  So we
only need to run them for an absolute minimum amount of time.  Maybe it
would be nice if they were part of the checks stage instead of being their
own test type.



> 4. Yeah, running a complete suite for each artificially crafted
> configuration brings little value compared to the maintenance and
> infrastructure costs. It feels like we are running all tests a bit blindly,
> hoping we catch something accidentally. I agree this is not the purpose of
> the unit tests and should be covered instead by fuzz. For features like
> CDC, compression, different sstable formats, trie memtable, commit log
> compression/encryption, system directory keyspace, etc... we should have
> dedicated tests that verify just that functionality
>


I think everyone agrees here, but… these variations are still catching
failures, and until we have an improvement or replacement we do rely on
them. I'm not in favour of removing them until we have proof/confidence
that any replacement is catching the same failures - especially oa, tries,
vnodes. (Note: tries and offheap are being replaced with "latest", which will
be a valuable simplification.)

Dedicated unit tests may also be parameterised tests with a base
parameterisation that is extended based on analysis of what a patch touches…

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-01 Thread Jacek Lewandowski
Thanks for the exhaustive response, Alex :)

Let me bring my point of view:

1. Since long tests are just unit tests that take a long time to run, it
makes sense to separate them for efficient parallelization in CI. Since we
are adding new tests, modifying the existing ones, etc., that should be
something maintainable; otherwise, the distinction makes no sense to me.
For example - adjust timeouts on CI to 1 minute per test class for "short"
tests and more for "long" tests. To satisfy CI, the contributor will have
to either make the test run faster or move it to the "long" tests. The
opposite enforcement could be more difficult, though it is doable as well -
failing the "long" test if it takes too little time and should be qualified
as a regular unit test. As I'm reading what I've just written, it sounds
stupid :/ We should get rid of long-running unit tests altogether. They
should run faster or be split.

2. I'm still confused about the distinction between burn and fuzz tests -
it seems to me that fuzz tests are just modern burn tests - should we
refactor the existing burn tests to use the new framework?

3. Simulation tests - since you say they provide a way to execute a test
deterministically, it should be a property of unit tests - well, a unit
test is either deterministic or a fuzz test. Is the simulation framework
usable for CQLTester-based tests? (side question here: I've noticed many
"sleeps" in the tests - is it possible with simulation tests to
artificially move the clock forward by, say, 5 seconds instead of sleeping
just to test, for example whether TTL works?)

4. Yeah, running a complete suite for each artificially crafted
configuration brings little value compared to the maintenance and
infrastructure costs. It feels like we are running all tests a bit blindly,
hoping we catch something accidentally. I agree this is not the purpose of
the unit tests and should be covered instead by fuzz. For features like
CDC, compression, different sstable formats, trie memtable, commit log
compression/encryption, system directory keyspace, etc... we should have
dedicated tests that verify just that functionality

With more and more functionality offered by Cassandra, they will become a
significant pain shortly. Let's start thinking about concrete actions.

Also, as we start refactoring the tests, it will be an excellent
opportunity to move to JUnit 5.

thanks,
Jacek


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Alex Petrov
I will try to respond, but please keep in mind that all these terms are 
somewhat contextual. 

I think long and burn tests are somewhat synonymous. But most long/burn tests 
that we have in-tree aren't actually that long; they are just long compared to 
the unit tests. I personally would call a test long when it runs for hours at 
least, but realistically for days. 

Fuzz tests are randomised tests that attempt to find issues in the system under 
test. Most of fuzz tests we wrote using Harry are also property-based: they are 
using a model checker to simulate an internal state of the system and check its 
responses with a simplified representation.

Simulator tests are just tests that use our simulator framework, which executes 
a test against a cluster of nodes deterministically by fully serialising all of 
its operations. We also have a bunch of smaller simulations that simulate 
different scenarios: bounces, metadata changes, etc., without actually starting 
the cluster. Those are not simulator tests, though. I have also used the word 
"simulate" in the context of model-checking, but mostly to illustrate that 
it's all context-dependent. 

I personally believe that many tests and test pipelines can (and probably 
should) be deprecated. But the last time I brought this up there was a bit of 
pushback, so I think before we can consider deprecating tests that we think 
are redundant, we will have to substantially improve adoption of the tools that 
allow better multiplexing.

As regards configurations, I do not think it is necessary to re-run an entire 
set of u/d/injvm tests with vnode/trie/etc. configurations; instead these 
scenarios should be exercised by config permutation using a fuzzer. As 
experience (and several recent issues in particular) shows, some important 
settings are never touched by any of the tests at all, and since tests are 
static, the chance of finding any issues with some combination of those is slim. 

Apart from what we already have (data and schema generators and failure 
injection), we now need configuration generators that will find interesting 
configurations and run randomly generated workflows against those, expecting 
any configuration of Cassandra to behave the same. 
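
A rough sketch of what such a configuration generator could look like; the
option names below are illustrative placeholders, not a definitive list:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class ConfigGenerator
{
    // Generates one random-but-reproducible configuration permutation.
    public static Map<String, Object> generate(long seed)
    {
        Random rnd = new Random(seed);
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("num_tokens", rnd.nextBoolean() ? 1 : 16);
        config.put("memtable", rnd.nextBoolean() ? "skiplist" : "trie");
        config.put("commitlog_compression", rnd.nextBoolean());
        config.put("sstable_format", rnd.nextBoolean() ? "big" : "bti");
        return config;
    }
}

A fuzzer would then start a cluster per generated config, run the same randomly
generated workload against each, and expect identical responses.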

I do find our test matrix a bit convoluted, and in my experience you spend way 
more time tweaking tests to work for all configurations after some code 
changes, while they find legitimate issues rather infrequently. We would 
probably be better off with a quick "sanity check" for major configurations per 
commit which, again, would exercise a common set of operations, combined with a 
comprehensive test suite which would try to cover as much ground as possible.

Hope this helps.
--Alex


On Thu, Nov 30, 2023, at 10:25 AM, Jacek Lewandowski wrote:
> Hi,
> 
> I'm getting a bit lost - what are the exact differences between those test 
> scenarios? What are the criteria for qualifying a test to be part of a 
> certain scenario?
> 
> I'm working a little bit with tests and build scripts and the number of 
> different configurations for which we have a separate target in the build 
> starts to be problematic, I cannot imagine how problematic it is for a new 
> contributor.
> 
> It is not urgent, but we should at least have a plan on how to simplify and 
> unify things.
> 
> I'm in favour of reducing the number of test targets to the minimum - for 
> different configurations I think we should provide a parameter pointing to 
> jvm options file and maybe to cassandra.yaml. I know that we currently do 
> some super hacky things with cassandra yaml for different configs - like 
> concatenating parts of it. I presume it is not necessary - we can have a 
> default test config yaml and a directory with overriding yamls; while 
> building we could have a tool which is able to load the default 
> configuration, apply the override and save the resulting yaml somewhere in 
> the build/test/configs for example. That would allow us to easily use those 
> yamls in IDE as well - currently it is impossible.
> 
> What do you think?
> 
> Thank you and my apologies for bothering about lower priority stuff while we 
> have a 5.0 release headache...
> 
> Jacek
> 
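
The override-merging tool described in the quoted message could amount to a
recursive map merge; a minimal sketch, assuming SnakeYAML is available and
treating the file paths as placeholders:

import java.io.FileReader;
import java.io.FileWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;

import org.yaml.snakeyaml.Yaml;

public class YamlOverrideTool
{
    public static void main(String[] args) throws Exception
    {
        // args: <default.yaml> <override.yaml> <output.yaml>
        Yaml yaml = new Yaml();
        Map<String, Object> base;
        Map<String, Object> override;
        try (Reader r = new FileReader(args[0])) { base = yaml.load(r); }
        try (Reader r = new FileReader(args[1])) { override = yaml.load(r); }
        merge(base, override);
        try (Writer w = new FileWriter(args[2])) { yaml.dump(base, w); }
    }

    @SuppressWarnings("unchecked")
    private static void merge(Map<String, Object> base, Map<String, Object> override)
    {
        for (Map.Entry<String, Object> e : override.entrySet())
        {
            Object existing = base.get(e.getKey());
            if (existing instanceof Map && e.getValue() instanceof Map)
                merge((Map<String, Object>) existing, (Map<String, Object>) e.getValue());
            else
                base.put(e.getKey(), e.getValue()); // scalars and lists are replaced wholesale
        }
    }
}

A build target (or IDE run configuration) could then point tests directly at
the merged yaml written under build/test/configs.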


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Benedict
I don't know - I'm not sure what fuzz test means in this context. It's a newer 
concept that I didn't introduce.

On 30 Nov 2023, at 20:06, Jacek Lewandowski wrote:
> How do those burn tests then compare to the fuzz tests? (the new ones)
> 
> On Thu, 30 Nov 2023 at 20:22, Benedict wrote:
>> By "could run indefinitely" I don't mean by default they run forever. There 
>> will be parameters that change how much work is done for a given run, but 
>> just running repeatedly (each time with a different generated seed) is the 
>> expected usage. Until you run out of compute or patience.
>> 
>> I agree they are only of value pre-commit to check they haven't been broken 
>> in some way by changes.
>> 
>> On 30 Nov 2023, at 18:36, Josh McKenzie wrote:
>>> that may be long-running and that could be run indefinitely
>>> 
>>> Perfect. That was the distinction I wasn't aware of. Also means having the 
>>> burn target as part of regular CI runs is probably a mistake, yes? i.e. if 
>>> someone adds a burn test that runs indefinitely, are there any guardrails 
>>> or built-in checks or timeouts to keep it from running right up to job 
>>> timeout and then failing?
>>> 
>>> On Thu, Nov 30, 2023, at 1:11 PM, Benedict wrote:
>>>> A burn test is a randomised test targeting broad coverage of a single 
>>>> system, subsystem or utility, that may be long-running and that could be 
>>>> run indefinitely, each run providing incrementally more assurance of 
>>>> quality of the system.
>>>> 
>>>> A long test is a unit test that sometimes takes a long time to run, no 
>>>> more no less. I'm not sure any of these offer all that much value 
>>>> anymore, and perhaps we could look to deprecate them.
>>>> 
>>>> On 30 Nov 2023, at 17:20, Josh McKenzie wrote:
>>>>> Strongly agree. I started working on a declarative refactor out of our 
>>>>> CI configuration so circle, ASFCI, and other systems could inherit from 
>>>>> it (for instance, see pre-commit pipeline declaration here); I had to 
>>>>> set that down while I finished up implementing an internal CI system 
>>>>> since the code in neither the ASF CI structure nor circle structure (.sh 
>>>>> embedded in .yml /cry) was re-usable in their current form.
>>>>> 
>>>>> Having a jvm.options and cassandra.yaml file per suite and referencing 
>>>>> them from a declarative job definition would make things a lot easier to 
>>>>> wrap our heads around and maintain I think.
>>>>> 
>>>>> As for what qualifies as burn vs. long... /shrug couldn't tell you. 
>>>>> Would have to go down the git blame + dev ML + JIRA rabbit hole. :) 
>>>>> Maybe someone else on-list knows.
>>>>> 
>>>>> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm getting a bit lost - what are the exact differences between those 
>>>>>> test scenarios? What are the criteria for qualifying a test to be part 
>>>>>> of a certain scenario?
>>>>>> 
>>>>>> I'm working a little bit with tests and build scripts and the number of 
>>>>>> different configurations for which we have a separate target in the 
>>>>>> build starts to be problematic, I cannot imagine how problematic it is 
>>>>>> for a new contributor.
>>>>>> 
>>>>>> It is not urgent, but we should at least have a plan on how to simplify 
>>>>>> and unify things.
>>>>>> 
>>>>>> I'm in favour of reducing the number of test targets to the minimum - 
>>>>>> for different configurations I think we should provide a parameter 
>>>>>> pointing to jvm options file and maybe to cassandra.yaml. I know that 
>>>>>> we currently do some super hacky things with cassandra yaml for 
>>>>>> different configs - like concatenating parts of it. I presume it is not 
>>>>>> necessary - we can have a default test config yaml and a directory with 
>>>>>> overriding yamls; while building we could have a tool which is able to 
>>>>>> load the default configuration, apply the override and save the 
>>>>>> resulting yaml somewhere in the build/test/configs for example. That 
>>>>>> would allow us to easily use those yamls in IDE as well - currently it 
>>>>>> is impossible.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Thank you and my apologies for bothering about lower priority stuff 
>>>>>> while we have a 5.0 release headache...
>>>>>> 
>>>>>> Jacek


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Jacek Lewandowski
How do those burn tests then compare to the fuzz tests? (the new ones)

On Thu, 30 Nov 2023 at 20:22, Benedict wrote:

> By “could run indefinitely” I don’t mean by default they run forever.
> There will be parameters that change how much work is done for a given run,
> but just running repeatedly (each time with a different generated seed) is
> the expected usage. Until you run out of compute or patience.
>
> I agree they are only of value pre-commit to check they haven’t been
> broken in some way by changes.
>
>
>
> On 30 Nov 2023, at 18:36, Josh McKenzie  wrote:
>
> 
>
> that may be long-running and that could be run indefinitely
>
> Perfect. That was the distinction I wasn't aware of. Also means having the
> burn target as part of regular CI runs is probably a mistake, yes? i.e. if
> someone adds a burn test that runs indefinitely, are there any guardrails
> or built-in checks or timeouts to keep it from running right up to job
> timeout and then failing?
>
> On Thu, Nov 30, 2023, at 1:11 PM, Benedict wrote:
>
>
> A burn test is a randomised test targeting broad coverage of a single
> system, subsystem or utility, that may be long-running and that could be
> run indefinitely, each run providing incrementally more assurance of
> quality of the system.
>
> A long test is a unit test that sometimes takes a long time to run, no
> more no less. I’m not sure any of these offer all that much value anymore,
> and perhaps we could look to deprecate them.
>
> On 30 Nov 2023, at 17:20, Josh McKenzie  wrote:
>
> 
> Strongly agree. I started working on a declarative refactor out of our CI
> configuration so circle, ASFCI, and other systems could inherit from it
> (for instance, see pre-commit pipeline declaration here); I had to set that
> down while I finished up implementing an internal CI
> system since the code in neither the ASF CI structure nor circle structure
> (.sh embedded in .yml /cry) was re-usable in their current form.
>
> Having a jvm.options and cassandra.yaml file per suite and referencing
> them from a declarative job definition would make things a lot easier to
> wrap our heads around and maintain I
> think.
>
> As for what qualifies as burn vs. long... /shrug couldn't tell you. Would
> have to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone
> else on-list knows.
>
> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>
> Hi,
>
> I'm getting a bit lost - what are the exact differences between those
> test scenarios? What are the criteria for qualifying a test to be part of a
> certain scenario?
>
> I'm working a little bit with tests and build scripts and the number of
> different configurations for which we have a separate target in the build
> starts to be problematic, I cannot imagine how problematic it is for a new
> contributor.
>
> It is not urgent, but we should at least have a plan on how to
> simplify and unify things.
>
> I'm in favour of reducing the number of test targets to the minimum - for
> different configurations I think we should provide a parameter pointing to
> jvm options file and maybe to cassandra.yaml. I know that we currently do
> some super hacky things with cassandra yaml for different configs - like
> concatenating parts of it. I presume it is not necessary - we can have a
> default test config yaml and a directory with overriding yamls; while
> building we could have a tool which is able to load the default
> configuration, apply the override and save the resulting yaml somewhere in
> the build/test/configs for example. That would allow us to easily use
> those yamls in IDE as well - currently it is impossible.
>
> What do you think?
>
> Thank you and my apologies for bothering about lower priority stuff while
> we have a 5.0 release headache...
>
> Jacek
>
>
>
>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Benedict
By "could run indefinitely" I don't mean by default they run forever. There 
will be parameters that change how much work is done for a given run, but just 
running repeatedly (each time with a different generated seed) is the expected 
usage. Until you run out of compute or patience.

I agree they are only of value pre-commit to check they haven't been broken in 
some way by changes.

On 30 Nov 2023, at 18:36, Josh McKenzie wrote:
> that may be long-running and that could be run indefinitely
> 
> Perfect. That was the distinction I wasn't aware of. Also means having the 
> burn target as part of regular CI runs is probably a mistake, yes? i.e. if 
> someone adds a burn test that runs indefinitely, are there any guardrails or 
> built-in checks or timeouts to keep it from running right up to job timeout 
> and then failing?
> 
> On Thu, Nov 30, 2023, at 1:11 PM, Benedict wrote:
>> A burn test is a randomised test targeting broad coverage of a single 
>> system, subsystem or utility, that may be long-running and that could be 
>> run indefinitely, each run providing incrementally more assurance of 
>> quality of the system.
>> 
>> A long test is a unit test that sometimes takes a long time to run, no more 
>> no less. I'm not sure any of these offer all that much value anymore, and 
>> perhaps we could look to deprecate them.
>> 
>> On 30 Nov 2023, at 17:20, Josh McKenzie wrote:
>>> Strongly agree. I started working on a declarative refactor out of our CI 
>>> configuration so circle, ASFCI, and other systems could inherit from it 
>>> (for instance, see pre-commit pipeline declaration here); I had to set 
>>> that down while I finished up implementing an internal CI system since the 
>>> code in neither the ASF CI structure nor circle structure (.sh embedded in 
>>> .yml /cry) was re-usable in their current form.
>>> 
>>> Having a jvm.options and cassandra.yaml file per suite and referencing 
>>> them from a declarative job definition would make things a lot easier to 
>>> wrap our heads around and maintain I think.
>>> 
>>> As for what qualifies as burn vs. long... /shrug couldn't tell you. Would 
>>> have to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe 
>>> someone else on-list knows.
>>> 
>>> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>>>> Hi,
>>>> 
>>>> I'm getting a bit lost - what are the exact differences between those 
>>>> test scenarios? What are the criteria for qualifying a test to be part of 
>>>> a certain scenario?
>>>> 
>>>> I'm working a little bit with tests and build scripts and the number of 
>>>> different configurations for which we have a separate target in the build 
>>>> starts to be problematic, I cannot imagine how problematic it is for a 
>>>> new contributor.
>>>> 
>>>> It is not urgent, but we should at least have a plan on how to simplify 
>>>> and unify things.
>>>> 
>>>> I'm in favour of reducing the number of test targets to the minimum - for 
>>>> different configurations I think we should provide a parameter pointing 
>>>> to jvm options file and maybe to cassandra.yaml. I know that we currently 
>>>> do some super hacky things with cassandra yaml for different configs - 
>>>> like concatenating parts of it. I presume it is not necessary - we can 
>>>> have a default test config yaml and a directory with overriding yamls; 
>>>> while building we could have a tool which is able to load the default 
>>>> configuration, apply the override and save the resulting yaml somewhere 
>>>> in the build/test/configs for example. That would allow us to easily use 
>>>> those yamls in IDE as well - currently it is impossible.
>>>> 
>>>> What do you think?
>>>> 
>>>> Thank you and my apologies for bothering about lower priority stuff while 
>>>> we have a 5.0 release headache...
>>>> 
>>>> Jacek

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Josh McKenzie
> that may be long-running and that could be run indefinitely
Perfect. That was the distinction I wasn't aware of. Also means having the burn 
target as part of regular CI runs is probably a mistake, yes? i.e. if someone 
adds a burn test that runs indefinitely, are there any guardrails or built-in 
checks or timeouts to keep it from running right up to job timeout and then 
failing?
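
One possible guardrail, sketched purely as an illustration: a hypothetical
burn-test driver that logs its seed for reproduction and stops at an
externally supplied time budget, so a CI run ends well before the job timeout:

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class BurnRunner
{
    public static void main(String[] args)
    {
        // CI passes e.g. -Dburn.minutes=10; the default is a short pre-commit smoke run.
        long budgetNanos = TimeUnit.MINUTES.toNanos(Long.getLong("burn.minutes", 5));
        long deadline = System.nanoTime() + budgetNanos;

        while (System.nanoTime() < deadline)
        {
            long seed = new Random().nextLong();
            System.out.println("burn iteration, seed=" + seed); // log seed for reproduction
            runOneIteration(new Random(seed));
        }
    }

    private static void runOneIteration(Random rnd)
    {
        // randomised exercise of the system/subsystem under test
    }
}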

On Thu, Nov 30, 2023, at 1:11 PM, Benedict wrote:
> 
> A burn test is a randomised test targeting broad coverage of a single system, 
> subsystem or utility, that may be long-running and that could be run 
> indefinitely, each run providing incrementally more assurance of quality of 
> the system.
> 
> A long test is a unit test that sometimes takes a long time to run, no more 
> no less. I’m not sure any of these offer all that much value anymore, and 
> perhaps we could look to deprecate them.
> 
>> On 30 Nov 2023, at 17:20, Josh McKenzie  wrote:
>> 
>> Strongly agree. I started working on a declarative refactor out of our CI 
>> configuration so circle, ASFCI, and other systems could inherit from it (for 
>> instance, see pre-commit pipeline declaration here); I had to set that down 
>> while I finished up implementing an internal CI 
>> system since the code in neither the ASF CI structure nor circle structure 
>> (.sh embedded in .yml /cry) was re-usable in their current form.
>> 
>> Having a jvm.options and cassandra.yaml file per suite and referencing them 
>> from a declarative job definition would make things a lot easier to wrap 
>> our heads around and maintain I 
>> think.
>> 
>> As for what qualifies as burn vs. long... /shrug couldn't tell you. Would 
>> have to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone 
>> else on-list knows.
>> 
>> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>>> Hi,
>>> 
>>> I'm getting a bit lost - what are the exact differences between those test 
>>> scenarios? What are the criteria for qualifying a test to be part of a 
>>> certain scenario?
>>> 
>>> I'm working a little bit with tests and build scripts and the number of 
>>> different configurations for which we have a separate target in the build 
>>> starts to be problematic, I cannot imagine how problematic it is for a new 
>>> contributor.
>>> 
>>> It is not urgent, but we should at least have a plan on how to simplify and 
>>> unify things.
>>> 
>>> I'm in favour of reducing the number of test targets to the minimum - for 
>>> different configurations I think we should provide a parameter pointing to 
>>> jvm options file and maybe to cassandra.yaml. I know that we currently do 
>>> some super hacky things with cassandra yaml for different configs - like 
>>> concatenating parts of it. I presume it is not necessary - we can have a 
>>> default test config yaml and a directory with overriding yamls; while 
>>> building we could have a tool which is able to load the default 
>>> configuration, apply the override and save the resulting yaml somewhere in 
>>> the build/test/configs for example. That would allow us to easily use 
>>> those yamls in IDE as well - currently it is impossible.
>>> 
>>> What do you think?
>>> 
>>> Thank you and my apologies for bothering about lower priority stuff while 
>>> we have a 5.0 release headache...
>>> 
>>> Jacek
>>> 
>> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Benedict
A burn test is a randomised test targeting broad coverage of a single system, 
subsystem or utility, that may be long-running and that could be run 
indefinitely, each run providing incrementally more assurance of quality of 
the system.

A long test is a unit test that sometimes takes a long time to run, no more no 
less. I'm not sure any of these offer all that much value anymore, and perhaps 
we could look to deprecate them.

On 30 Nov 2023, at 17:20, Josh McKenzie wrote:
> Strongly agree. I started working on a declarative refactor out of our CI 
> configuration so circle, ASFCI, and other systems could inherit from it (for 
> instance, see pre-commit pipeline declaration here); I had to set that down 
> while I finished up implementing an internal CI system since the code in 
> neither the ASF CI structure nor circle structure (.sh embedded in .yml 
> /cry) was re-usable in their current form.
> 
> Having a jvm.options and cassandra.yaml file per suite and referencing them 
> from a declarative job definition would make things a lot easier to wrap our 
> heads around and maintain I think.
> 
> As for what qualifies as burn vs. long... /shrug couldn't tell you. Would 
> have to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone 
> else on-list knows.
> 
> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>> Hi,
>> 
>> I'm getting a bit lost - what are the exact differences between those test 
>> scenarios? What are the criteria for qualifying a test to be part of a 
>> certain scenario?
>> 
>> I'm working a little bit with tests and build scripts and the number of 
>> different configurations for which we have a separate target in the build 
>> starts to be problematic, I cannot imagine how problematic it is for a new 
>> contributor.
>> 
>> It is not urgent, but we should at least have a plan on how to simplify and 
>> unify things.
>> 
>> I'm in favour of reducing the number of test targets to the minimum - for 
>> different configurations I think we should provide a parameter pointing to 
>> jvm options file and maybe to cassandra.yaml. I know that we currently do 
>> some super hacky things with cassandra yaml for different configs - like 
>> concatenating parts of it. I presume it is not necessary - we can have a 
>> default test config yaml and a directory with overriding yamls; while 
>> building we could have a tool which is able to load the default 
>> configuration, apply the override and save the resulting yaml somewhere in 
>> the build/test/configs for example. That would allow us to easily use those 
>> yamls in IDE as well - currently it is impossible.
>> 
>> What do you think?
>> 
>> Thank you and my apologies for bothering about lower priority stuff while 
>> we have a 5.0 release headache...
>> 
>> Jacek

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Josh McKenzie
Strongly agree. I started working on a declarative refactor out of our CI 
configuration so circle, ASFCI, and other systems could inherit from it (for 
instance, see pre-commit pipeline declaration here); I had to set that down 
while I finished up implementing an internal CI system 
since the code in neither the ASF CI structure nor circle structure (.sh 
embedded in .yml /cry) was re-usable in their current form.

Having a jvm.options and cassandra.yaml file per suite and referencing them 
from a declarative job definition would make things a lot easier to wrap our 
heads around and maintain I think.

As for what qualifies as burn vs. long... /shrug couldn't tell you. Would have 
to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone else 
on-list knows.

On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
> Hi,
> 
> I'm getting a bit lost - what are the exact differences between those test 
> scenarios? What are the criteria for qualifying a test to be part of a 
> certain scenario?
> 
> I'm working a little bit with tests and build scripts and the number of 
> different configurations for which we have a separate target in the build 
> starts to be problematic, I cannot imagine how problematic it is for a new 
> contributor.
> 
> It is not urgent, but we should at least have a plan on how to simplify and 
> unify things.
> 
> I'm in favour of reducing the number of test targets to the minimum - for 
> different configurations I think we should provide a parameter pointing to 
> jvm options file and maybe to cassandra.yaml. I know that we currently do 
> some super hacky things with cassandra yaml for different configs - like 
> concatenating parts of it. I presume it is not necessary - we can have a 
> default test config yaml and a directory with overriding yamls; while 
> building we could have a tool which is able to load the default 
> configuration, apply the override and save the resulting yaml somewhere in 
> the build/test/configs for example. That would allow us to easily use those 
> yamls in IDE as well - currently it is impossible.
> 
> What do you think?
> 
> Thank you and my apologies for bothering about lower priority stuff while we 
> have a 5.0 release headache...
> 
> Jacek
>