Re: [rng] Tests in -sampling
> On Jul 25, 2018, at 9:48 PM, Gilles wrote:
>
> On Wed, 25 Jul 2018 21:08:57 -0400, Rob Tompkins wrote:
>> So the curiosity here is a standard probability problem. [...]
>
> If p is the probability that the test will fail, 1-p is
> the probability that it'll succeed. The probability that
> all N tests succeed is (1-p)^N.
>
> Example from empirical runs: overall failure is ~25% (3/12 as
> per previous post); there are ~35 such tests, thus p is ~1%.
> We'd have to look for how to reduce this latter value.

If we simply set up Surefire to re-run only the failed tests, we'd
overcome the problem. I checked that into 1.1 last night. I think
that'll help considerably.

> Gilles
>
>> I'm going to have to think about this some. If I recall correctly,
>> we could use the central limit theorem here for overall test
>> failure, right? [...]

-Rob

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
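For reference, the re-run approach Rob describes maps onto Surefire's `rerunFailingTestsCount` option (available for JUnit 4.x since maven-surefire-plugin 2.18). A sketch of what the pom configuration could look like; the retry count and plugin version here are illustrative, not taken from the actual commit:

```xml
<!-- Retry each failed test up to 2 extra times before failing the
     build: a test that fails with probability p ~ 1% per run then
     fails the build with probability ~ p^3, i.e. ~ 1e-6. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <rerunFailingTestsCount>2</rerunFailingTestsCount>
  </configuration>
</plugin>
```

With independent runs, even a single retry drops the per-test build-failure probability from p to roughly p^2, which is why this helps so much for probabilistic tests.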
Re: [rng] Tests in -sampling
On Wed, 25 Jul 2018 21:08:57 -0400, Rob Tompkins wrote:
> So the curiosity here is a standard probability problem. It seems
> that we have N tests, each with some probability of failing, P_N.
> For any single test T, P_T is fairly inconsequential, but when
> aggregated with P_1, P_2, …, P_{T-1}, P_T, …, P_N, the probability
> that at least one test fails approaches something between 10% and
> 50%, which is indeed consequential.

If p is the probability that the test will fail, 1-p is the
probability that it'll succeed. The probability that all N tests
succeed is (1-p)^N.

Example from empirical runs: overall failure is ~25% (3/12 as per
previous post); there are ~35 such tests, thus p is ~1%. We'd have
to look for how to reduce this latter value.

Gilles

> I'm going to have to think about this some. If I recall correctly,
> we could use the central limit theorem here for overall test
> failure, right? Could we apply the same characteristic to the
> overall number of tests in the project? I don't think we can avoid
> it. Does Surefire accommodate a percentage of test failures for
> passing the build?
>
> -Rob
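Gilles' arithmetic can be checked with a short sketch (the function names here are illustrative, not from the test suite):

```python
# Probability that all n independent tests pass, given per-test
# failure probability p.
def suite_pass_prob(p, n):
    return (1 - p) ** n

# Invert: given an observed overall build-failure rate, recover the
# per-test failure probability, assuming independence.
def per_test_failure(overall_failure, n):
    return 1 - (1 - overall_failure) ** (1 / n)

n = 35                # ~35 stochastic tests in -sampling
overall = 3 / 12      # ~25% of runs saw at least one failure
p = per_test_failure(overall, n)
print(f"per-test p ≈ {p:.4f}")   # ≈ 0.0082, i.e. ~1%, as stated above
```

So an individually harmless ~1% failure probability compounds across 35 tests into a ~25% chance that some build fails.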
Re: [rng] Tests in -sampling
> On Jul 24, 2018, at 9:13 PM, Rob Tompkins wrote:
>
>> On Jul 24, 2018, at 7:04 PM, Gilles wrote:
>>
>> This *is* what the test is doing, although it repeats 50 times
>> (takes quite some time already) instead of 1000.
>> As I've reported on this list, it is quite possible that the
>> failure probabilities are underestimated; (first) review welcome:
>> the tests are fairly well documented as to what they are doing,
>> but I might have committed some bugs wrt the statistics involved.
>
> Once I get the release out, I'll have a look.

So the curiosity here is a standard probability problem. It seems that
we have N tests, each with some probability of failing, P_N. For any
single test T, P_T is fairly inconsequential, but when aggregated with
P_1, P_2, …, P_{T-1}, P_T, …, P_N, the probability that at least one
test fails approaches something between 10% and 50%, which is indeed
consequential. I'm going to have to think about this some.

If I recall correctly, we could use the central limit theorem here for
overall test failure, right? Could we apply the same characteristic to
the overall number of tests in the project? I don't think we can avoid
it. Does Surefire accommodate a percentage of test failures for
passing the build?

-Rob

> Cheers,
> -Rob
>
>> Regards,
>> Gilles
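Rob's central-limit idea can be sketched: with N independent tests each failing with probability p, the number of failures per build is Binomial(N, p), which the CLT approximates by a normal with mean N*p and variance N*p*(1-p). The numbers below reuse Gilles' estimates (N ≈ 35, p ≈ 1%):

```python
import math

# Exact binomial probability of k failures among n independent tests.
def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 35, 0.01
mean = n * p                  # expected failures per build: 0.35
var = n * p * (1 - p)         # variance: ~0.347

# Probability of a fully green build; note binom_pmf(0, n, p)
# is exactly (1 - p)**n, the quantity from the earlier message.
p_green = binom_pmf(0, n, p)  # ≈ 0.70
print(mean, var, p_green)
```

One caveat on the CLT route: with N*p ≈ 0.35 the normal approximation is poor in the tail that matters here; for rare failures a Poisson(N*p) approximation to the binomial is the more usual tool.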
Re: [rng] Tests in -sampling
> On Jul 24, 2018, at 7:04 PM, Gilles wrote:
>
> Hi Rob.
>
> On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
>> I know that the tests will be necessarily non-deterministic, but we
>> can at least get closer to having determinism by running the same
>> test 1000 times and expecting some reasonable number of passes,
>> right? Could we use the underlying distribution that we are testing
>> to sort out this value?
>
> This *is* what the test is doing, although it repeats 50 times
> (takes quite some time already) instead of 1000.
> As I've reported on this list, it is quite possible that the
> failure probabilities are underestimated; (first) review welcome:
> the tests are fairly well documented as to what they are doing,
> but I might have committed some bugs wrt the statistics involved.

Once I get the release out, I'll have a look.

Cheers,
-Rob

> Regards,
> Gilles
Re: [rng] Tests in -sampling
Hi Rob.

On Tue, 24 Jul 2018 18:33:40 -0400, Rob Tompkins wrote:
> I know that the tests will be necessarily non-deterministic, but we
> can at least get closer to having determinism by running the same
> test 1000 times and expecting some reasonable number of passes,
> right? Could we use the underlying distribution that we are testing
> to sort out this value?

This *is* what the test is doing, although it repeats 50 times
(takes quite some time already) instead of 1000.
As I've reported on this list, it is quite possible that the
failure probabilities are underestimated; (first) review welcome:
the tests are fairly well documented as to what they are doing,
but I might have committed some bugs wrt the statistics involved.

Regards,
Gilles

> -Rob
[rng] Tests in -sampling
I know that the tests will be necessarily non-deterministic, but we
can at least get closer to having determinism by running the same test
1000 times and expecting some reasonable number of passes, right?
Could we use the underlying distribution that we are testing to sort
out this value?

-Rob
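One way to "sort out this value" from the underlying distribution: if a single run of a *correct* sampler fails its statistical check with probability alpha (the significance level of the test), then the number of failures over n repeats is Binomial(n, alpha), and we can pick the smallest failure budget that still accepts a correct sampler with high probability. The alpha and confidence values below are assumptions for illustration, not numbers from the suite:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k + 1))

# Smallest m such that a correct sampler (per-run failure probability
# alpha) sees at most m failures in n repeats with probability
# >= confidence, i.e. passes the aggregate check almost surely.
def failure_budget(n, alpha, confidence=0.999):
    m = 0
    while binom_cdf(m, n, alpha) < confidence:
        m += 1
    return m

# e.g. 1000 repeats of a check run at significance alpha = 0.01:
print(failure_budget(1000, 0.01))
```

The resulting threshold also bounds the false-pass side: a broken sampler whose per-run failure probability is much larger than alpha will exceed the budget with overwhelming probability.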