Re: [rng] Tests in -sampling

2018-07-26 Thread Rob Tompkins



> On Jul 25, 2018, at 9:48 PM, Gilles  wrote:
> 
> On Wed, 25 Jul 2018 21:08:57 -0400, Rob Tompkins wrote:
>>> On Jul 24, 2018, at 9:13 PM, Rob Tompkins  wrote:
>>> 
>>> 
>>> 
 On Jul 24, 2018, at 7:04 PM, Gilles  wrote:
 
 Hi Rob.
 
 On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
> I know that the tests will be necessarily non-deterministic, but we
> can at least get closer to having determinism by running the same test
> 1000 times and expecting some reasonable number of passes right? Could
> we use the underlying distribution that we are testing to sort out
> this value?
 
 This *is* what the test is doing, although it repeats 50 times
 (takes quite some time already) instead of 1000.
 As I've reported on this list, it is quite possible that the
 failure probabilities are underestimated; (first) review welcome:
 the tests are fairly well documented as to what they are doing
 but I might have committed some bugs wrt the statistics involved.
>>> 
>>> Once I get the release out, I’ll have a look.
>> 
>> So the curiosity here is a standard probability problem. It seems
>> that we have N tests, each with some probability of failing P_i. For
>> some arbitrary test T, P_T is fairly inconsequential, but when
>> aggregated together with P_1, P_2, …, P_{T-1}, P_T, …, P_N,
>> the probability of at least one failure approaches something between
>> 10% and 50%, which is indeed consequential.
> 
> If p is the probability that the test will fail, 1-p is
> the probability that it'll succeed. The probability that
> all N tests succeed is (1-p)^N.
> 
> Example from empirical runs: Overall failure is ~25% (3/12 as
> per previous post); there are ~35 such tests, thus p is ~1%.
> We'd have to look for how to reduce this latter value.
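Gilles' back-of-the-envelope calculation can be sketched as follows (a minimal illustration of the (1-p)^N reasoning; the function names are mine, not anything in commons-rng):

```python
# Probability that a suite of N independent stochastic tests all pass,
# when each test fails with probability p, is (1 - p)^N.
def suite_pass_probability(p, n):
    return (1.0 - p) ** n

# Inverse: given an observed suite failure rate, back out the
# per-test failure probability.
def per_test_failure(suite_failure, n):
    return 1.0 - (1.0 - suite_failure) ** (1.0 / n)

# Numbers from the thread: ~25% overall failure (3/12 runs), ~35 tests.
p = per_test_failure(0.25, 35)
print(f"per-test failure probability: {p:.4f}")  # ~0.0082, i.e. ~1%
print(f"suite failure at p=1%: {1 - suite_pass_probability(0.01, 35):.2f}")  # ~0.30
```

This also shows why a per-test p that looks harmless (~1%) still fails roughly a quarter to a third of full builds once ~35 such tests are aggregated.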

If we simply set up surefire to re-run only the failed tests, we’d overcome the 
problem. I checked that into 1.1 last night. I think that’ll help considerably.
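For reference, Surefire (2.18+, with JUnit 4.x) exposes this as `rerunFailingTestsCount`; a sketch of the plugin configuration (the exact plugin version and placement in the commons-rng pom may differ):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Re-run each failing test up to 2 more times; a test that
         eventually passes is reported as a "flake", not a failure. -->
    <rerunFailingTestsCount>2</rerunFailingTestsCount>
  </configuration>
</plugin>
```

The same knob is available from the command line as -Dsurefire.rerunFailingTestsCount=2.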

> 
> Gilles
> 
>> I’m going to have to think
>> about this some. If I recall correctly, we could use the central limit
>> theorem here about overall test failure, right? Could we apply the
>> same characteristic to the overall number of tests in the project? I
>> don’t think we can avoid it. Does surefire accommodate a percentage of
>> test failures for passing the build?
>> 
>> -Rob
>> 
>>> 
>>> Cheers,
>>> -Rob
>>> 
 
 Regards,
 Gilles
 
> 
> -Rob
> 
> 


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [rng] Tests in -sampling

2018-07-25 Thread Gilles

On Wed, 25 Jul 2018 21:08:57 -0400, Rob Tompkins wrote:
>> On Jul 24, 2018, at 9:13 PM, Rob Tompkins  wrote:
>> 
>>> On Jul 24, 2018, at 7:04 PM, Gilles  wrote:
>>> 
>>> Hi Rob.
>>> 
>>> On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
>>>> I know that the tests will be necessarily non-deterministic, but we
>>>> can at least get closer to having determinism by running the same test
>>>> 1000 times and expecting some reasonable number of passes right? Could
>>>> we use the underlying distribution that we are testing to sort out
>>>> this value?
>>> 
>>> This *is* what the test is doing, although it repeats 50 times
>>> (takes quite some time already) instead of 1000.
>>> As I've reported on this list, it is quite possible that the
>>> failure probabilities are underestimated; (first) review welcome:
>>> the tests are fairly well documented as to what they are doing
>>> but I might have committed some bugs wrt the statistics involved.
>> 
>> Once I get the release out, I’ll have a look.
> 
> So the curiosity here is a standard probability problem. It seems
> that we have N tests, each with some probability of failing P_i. For
> some arbitrary test T, P_T is fairly inconsequential, but when
> aggregated together with P_1, P_2, …, P_{T-1}, P_T, …, P_N,
> the probability of at least one failure approaches something between
> 10% and 50%, which is indeed consequential.

If p is the probability that the test will fail, 1-p is
the probability that it'll succeed. The probability that
all N tests succeed is (1-p)^N.

Example from empirical runs: Overall failure is ~25% (3/12 as
per previous post); there are ~35 such tests, thus p is ~1%.
We'd have to look for how to reduce this latter value.

Gilles

> I’m going to have to think
> about this some. If I recall correctly, we could use the central limit
> theorem here about overall test failure, right? Could we apply the
> same characteristic to the overall number of tests in the project? I
> don’t think we can avoid it. Does surefire accommodate a percentage of
> test failures for passing the build?
> 
> -Rob
> 
>> Cheers,
>> -Rob
>> 
>>> Regards,
>>> Gilles
>>> 
>>>> -Rob




Re: [rng] Tests in -sampling

2018-07-25 Thread Rob Tompkins



> On Jul 24, 2018, at 9:13 PM, Rob Tompkins  wrote:
> 
> 
> 
>> On Jul 24, 2018, at 7:04 PM, Gilles  wrote:
>> 
>> Hi Rob.
>> 
>> On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
>>> I know that the tests will be necessarily non-deterministic, but we
>>> can at least get closer to having determinism by running the same test
>>> 1000 times and expecting some reasonable number of passes right? Could
>>> we use the underlying distribution that we are testing to sort out
>>> this value?
>> 
>> This *is* what the test is doing, although it repeats 50 times
>> (takes quite some time already) instead of 1000.
>> As I've reported on this list, it is quite possible that the
>> failure probabilities are underestimated; (first) review welcome:
>> the tests are fairly well documented as to what they are doing
>> but I might have committed some bugs wrt the statistics involved.
> 
> Once I get the release out, I’ll have a look.

So the curiosity here is a standard probability problem. It seems that we have 
N tests, each with some probability of failing P_i. For some arbitrary test T, 
P_T is fairly inconsequential, but when aggregated together with P_1, P_2, …, 
P_{T-1}, P_T, …, P_N, the probability of at least one failure approaches 
something between 10% and 50%, which is indeed consequential. I’m going to have 
to think about this some. If I recall correctly, we could use the central limit 
theorem here about overall test failure, right? Could we apply the same 
reasoning to the overall number of tests in the project? I don’t think we can 
avoid it. Does surefire accommodate a percentage of test failures for passing 
the build?

-Rob

> 
> Cheers,
> -Rob
> 
>> 
>> Regards,
>> Gilles
>> 
>>> 
>>> -Rob
>> 
>> 
>> 
> 





Re: [rng] Tests in -sampling

2018-07-24 Thread Rob Tompkins



> On Jul 24, 2018, at 7:04 PM, Gilles  wrote:
> 
> Hi Rob.
> 
> On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
>> I know that the tests will be necessarily non-deterministic, but we
>> can at least get closer to having determinism by running the same test
>> 1000 times and expecting some reasonable number of passes right? Could
>> we use the underlying distribution that we are testing to sort out
>> this value?
> 
> This *is* what the test is doing, although it repeats 50 times
> (takes quite some time already) instead of 1000.
> As I've reported on this list, it is quite possible that the
> failure probabilities are underestimated; (first) review welcome:
> the tests are fairly well documented as to what they are doing
> but I might have committed some bugs wrt the statistics involved.

Once I get the release out, I’ll have a look.

Cheers,
-Rob

> 
> Regards,
> Gilles
> 
>> 
>> -Rob
> 
> 
> 





Re: [rng] Tests in -sampling

2018-07-24 Thread Gilles

Hi Rob.

On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
> I know that the tests will be necessarily non-deterministic, but we
> can at least get closer to having determinism by running the same test
> 1000 times and expecting some reasonable number of passes right? Could
> we use the underlying distribution that we are testing to sort out
> this value?

This *is* what the test is doing, although it repeats 50 times
(takes quite some time already) instead of 1000.
As I've reported on this list, it is quite possible that the
failure probabilities are underestimated; (first) review welcome:
the tests are fairly well documented as to what they are doing
but I might have committed some bugs wrt the statistics involved.

Regards,
Gilles

> -Rob






[rng] Tests in -sampling

2018-07-24 Thread Rob Tompkins
I know that the tests will be necessarily non-deterministic, but we can at 
least get closer to having determinism by running the same test 1000 times and 
expecting some reasonable number of passes right? Could we use the underlying 
distribution that we are testing to sort out this value?

-Rob
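The "reasonable number of passes" idea can be made concrete with a binomial tail bound: pick the largest failure count whose exceedance probability is below a chosen false-alarm rate. A sketch with illustrative numbers (the 1% per-run failure rate and 0.1% budget are assumptions, not values from the commons-rng tests):

```python
from math import comb

# For n repetitions of a stochastic test that individually fails with
# probability p, return the smallest failure count k such that seeing
# MORE than k failures is rarer than `alpha` under Binomial(n, p).
def max_acceptable_failures(n, p, alpha):
    cdf = 0.0  # accumulates P(X <= k)
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        if 1.0 - cdf < alpha:  # tail P(X > k) has dropped below alpha
            return k
    return n

# 1000 repetitions, 1% per-run failure rate, 0.1% false-alarm budget:
print(max_acceptable_failures(1000, 0.01, 0.001))
```

A build would then pass as long as the observed failure count stays at or below this threshold; tightening alpha trades fewer false alarms for less sensitivity to a genuinely broken sampler.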
-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org