Re: [lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Alex Herbert
I opened a PR after changing the expected failure probability to 1e-5.

I was looking at an old version of the GH build file when I estimated that
CI used 4 runs. The latest CI runs 4 JDKs on 3 platforms, plus CodeQL and
coverage, so this is 14 runs (4 x 3 + 2). We should see failures with a
probability of:

(1 - (1 - 1e-5)**14) = 0.0001399, or approximately 1 in 7143.

Compare that to previously:

(1 - (1 - 1e-3)**14) = 0.01391, or approximately 1 in 71.89.
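
The two figures above can be reproduced with a few lines of Python. This is only a sketch: it assumes the 14 runs fail independently, and the function name prob_any_failure is made up for illustration.

```python
def prob_any_failure(p_single: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs fails."""
    # Complement of "every run passes".
    return 1.0 - (1.0 - p_single) ** runs


# Per-run failure probabilities discussed in this thread, over 14 CI runs.
for p in (1e-3, 1e-5):
    p_any = prob_any_failure(p, 14)
    print(f"p={p:g}: P(any failure)={p_any:.4g}, about 1 in {1 / p_any:.0f}")
```

Changing the second argument shows how quickly the build-level failure rate grows as more JDK/platform combinations are added to CI.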

If this is still a problem going forward, we can switch the test to call
the underlying method with a fixed-seed random number generator and cover
the original method by other means.

Alex

On Fri, 20 Oct 2023 at 20:16, Gary D. Gregory  wrote:
>
> Hi Alex,
>
> I'd prefer it if you could take a shot at adjusting this test when you have
> the time.
>
> TY,
> Gary
>
> On 2023/10/20 18:17:35 Alex Herbert wrote:
> > On Fri, 20 Oct 2023 at 18:55, Alex Herbert  wrote:
> > >
> > > The chi-square critical value (13.82) is correct:
> > >
> > > >>> from scipy.stats import chi2
> > > >>> chi2(2).isf(0.001)
> > > 13.815510557964274
> > >
> > > The test seems to fail with the expected frequency when run locally. I
> > > annotated with:
> > >
> > > @RepeatedTest(value = 100000)
> > >
> > > I observe 93 failures (just under 1 in 1000). So it is strange this
> > > fails a lot on the GH CI build.
> > >
> > > We could just use a fixed Random argument to the call that is
> > > ultimately performing the random string generation:
> > >
> > > random(count, 0, chars.length, false, false, chars, random());
> > >
> > > Switch the test to:
> > >
> > > Random rng = new Random(0xdeadbeef);
> > >
> > > gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);
> > >
> > > You will see a drop in coverage by not exercising the public API.
> > >
> > > The alternative is to change the chi-square critical value:
> > >
> > > 1 in 10,000: 18.420680743952364
> > > 1 in 100,000: 23.025850929940457
> > > 1 in 1,000,000: 27.631021115928547
> > >
> > > Alex
> >
> > Also note that although the test fails 1 in 1000 times, we run 4
> > builds in CI (coverage + 3 JDKs). So we expect to see a failure with a
> > probability of:
> >
> > 1 - (1 - 0.001)^4 = 0.00399
> >
> > This is the probability of not seeing a failure subtracted from 1. It
> > is approximately 1 in 250.
> >
> > I did check the computation of the chi-square statistic and AFAIK it is 
> > correct.
> >
> > My suggestion for a first change is to bump the critical value to the
> > level for 1 in 100,000. We should then see failures every 25,000 GH
> > builds. If the frequency is more than that then we have to assume that
> > the ThreadLocalRandom instance is not uniformly sampling from the set
> > of size 3. I find this unlikely as the underlying algorithm for
> > ThreadLocalRandom is good [1].
> >
> > Alex
> >
> > [1] IIRC it passes the TestU01 BigCrush test for random generators
> >

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Gary D. Gregory
Hi Alex,

I'd prefer it if you could take a shot at adjusting this test when you have
the time.

TY,
Gary

On 2023/10/20 18:17:35 Alex Herbert wrote:
> On Fri, 20 Oct 2023 at 18:55, Alex Herbert  wrote:
> >
> > The chi-square critical value (13.82) is correct:
> >
> > >>> from scipy.stats import chi2
> > >>> chi2(2).isf(0.001)
> > 13.815510557964274
> >
> > The test seems to fail with the expected frequency when run locally. I
> > annotated with:
> >
> > @RepeatedTest(value = 100000)
> >
> > I observe 93 failures (just under 1 in 1000). So it is strange this
> > fails a lot on the GH CI build.
> >
> > We could just use a fixed Random argument to the call that is
> > ultimately performing the random string generation:
> >
> > random(count, 0, chars.length, false, false, chars, random());
> >
> > Switch the test to:
> >
> > Random rng = new Random(0xdeadbeef);
> >
> > gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);
> >
> > You will see a drop in coverage by not exercising the public API.
> >
> > The alternative is to change the chi-square critical value:
> >
> > 1 in 10,000: 18.420680743952364
> > 1 in 100,000: 23.025850929940457
> > 1 in 1,000,000: 27.631021115928547
> >
> > Alex
> 
> Also note that although the test fails 1 in 1000 times, we run 4
> builds in CI (coverage + 3 JDKs). So we expect to see a failure with a
> probability of:
> 
> 1 - (1 - 0.001)^4 = 0.00399
> 
> This is the probability of not seeing a failure subtracted from 1. It
> is approximately 1 in 250.
> 
> I did check the computation of the chi-square statistic and AFAIK it is 
> correct.
> 
> My suggestion for a first change is to bump the critical value to the
> level for 1 in 100,000. We should then see failures every 25,000 GH
> builds. If the frequency is more than that then we have to assume that
> the ThreadLocalRandom instance is not uniformly sampling from the set
> of size 3. I find this unlikely as the underlying algorithm for
> ThreadLocalRandom is good [1].
> 
> Alex
> 
> [1] IIRC it passes the TestU01 BigCrush test for random generators
> 




Re: [lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Alex Herbert
On Fri, 20 Oct 2023 at 18:55, Alex Herbert  wrote:
>
> The chi-square critical value (13.82) is correct:
>
> >>> from scipy.stats import chi2
> >>> chi2(2).isf(0.001)
> 13.815510557964274
>
> The test seems to fail with the expected frequency when run locally. I
> annotated with:
>
> @RepeatedTest(value = 100000)
>
> I observe 93 failures (just under 1 in 1000). So it is strange this
> fails a lot on the GH CI build.
>
> We could just use a fixed Random argument to the call that is
> ultimately performing the random string generation:
>
> random(count, 0, chars.length, false, false, chars, random());
>
> Switch the test to:
>
> Random rng = new Random(0xdeadbeef);
>
> gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);
>
> You will see a drop in coverage by not exercising the public API.
>
> The alternative is to change the chi-square critical value:
>
> 1 in 10,000: 18.420680743952364
> 1 in 100,000: 23.025850929940457
> 1 in 1,000,000: 27.631021115928547
>
> Alex

Also note that although the test fails 1 in 1000 times, we run 4
builds in CI (coverage + 3 JDKs). So we expect to see a failure with a
probability of:

1 - (1 - 0.001)^4 = 0.00399

This is the probability of not seeing a failure subtracted from 1. It
is approximately 1 in 250.

I did check the computation of the chi-square statistic and AFAIK it is correct.

My suggestion for a first change is to bump the critical value to the
level for 1 in 100,000. We should then see failures every 25,000 GH
builds. If the frequency is more than that then we have to assume that
the ThreadLocalRandom instance is not uniformly sampling from the set
of size 3. I find this unlikely as the underlying algorithm for
ThreadLocalRandom is good [1].
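
One way to judge whether the observed CI failures are "more than that" is an exact binomial tail probability. The sketch below is illustrative only: the counts (3 failed builds out of 1,000) are hypothetical, the per-build rate of 4e-5 assumes 4 independent runs at 1e-5 each, and prob_at_least is a made-up helper name.

```python
import math


def prob_at_least(k: int, builds: int, p_fail: float) -> float:
    """P(X >= k) for X ~ Binomial(builds, p_fail).

    Computed as 1 minus the lower tail, which is fine for the small k used here.
    """
    lower = sum(
        math.comb(builds, i) * p_fail**i * (1.0 - p_fail) ** (builds - i)
        for i in range(k)
    )
    return 1.0 - lower


# Hypothetical observation: 3 failed builds out of 1,000 at an expected
# per-build failure rate of 4e-5.
print(prob_at_least(3, 1000, 4e-5))
```

A very small result would suggest the failures are not explained by the test's nominal false-positive rate alone.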

Alex

[1] IIRC it passes the TestU01 BigCrush test for random generators




Re: [lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Alex Herbert
The chi-square critical value (13.82) is correct:

>>> from scipy.stats import chi2
>>> chi2(2).isf(0.001)
13.815510557964274

The test seems to fail with the expected frequency when run locally. I
annotated with:

@RepeatedTest(value = 100000)

I observe 93 failures (just under 1 in 1000). So it is strange this
fails a lot on the GH CI build.
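
The expected failure rate can also be sanity-checked outside the Java test with a rough simulation. This sketch rests on assumptions that are not stated in this thread: 600 draws per run over the alphabet "abc", Python's own generator instead of ThreadLocalRandom, and the hypothetical helper names chi_square and failure_rate. It checks the statistics only, not RandomStringUtils.

```python
import random
from collections import Counter

# Chi-square critical value quoted above (2 degrees of freedom, alpha = 0.001).
CRITICAL = 13.815510557964274


def chi_square(expected, observed):
    """Pearson chi-square statistic for two equal-length lists of counts."""
    return sum((o - e) ** 2 / e for e, o in zip(expected, observed))


def failure_rate(trials, draws=600, alphabet="abc", seed=12345):
    """Fraction of simulated runs whose statistic reaches CRITICAL.

    draws=600 is an assumed sample size, not taken from the real test.
    """
    rng = random.Random(seed)
    expected = [draws / len(alphabet)] * len(alphabet)
    failures = 0
    for _ in range(trials):
        counts = Counter(rng.choices(alphabet, k=draws))
        observed = [counts[c] for c in alphabet]
        if chi_square(expected, observed) >= CRITICAL:
            failures += 1
    return failures / trials


print(failure_rate(10_000))
```

With a uniform generator the printed rate should sit in the neighbourhood of 0.001, subject to sampling noise at this number of trials.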

We could just use a fixed Random argument to the call that is
ultimately performing the random string generation:

random(count, 0, chars.length, false, false, chars, random());

Switch the test to:

Random rng = new Random(0xdeadbeef);

gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);

You will see a drop in coverage by not exercising the public API.

The alternative is to change the chi-square critical value:

1 in 10,000: 18.420680743952364
1 in 100,000: 23.025850929940457
1 in 1,000,000: 27.631021115928547
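
These critical values can be reproduced without scipy. With 2 degrees of freedom the chi-square survival function is exp(-x / 2), so the critical value for a tail probability alpha is -2 ln(alpha). A minimal sketch (the function name chi2_critical_df2 is made up, and the shortcut is valid for 2 degrees of freedom only):

```python
import math


def chi2_critical_df2(alpha: float) -> float:
    """Upper-tail critical value of a chi-square distribution with df = 2.

    Inverts the survival function exp(-x / 2); not valid for other df.
    """
    return -2.0 * math.log(alpha)


# Current level (1 in 1,000) followed by the stricter alternatives.
for alpha in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f"1 in {round(1 / alpha):,}: {chi2_critical_df2(alpha):.6f}")
```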

Alex

On Fri, 20 Oct 2023 at 18:44, Elliotte Rusty Harold  wrote:
>
> It's possible the chi square test is miscalculated. Perhaps some stats
> expert can check that. It's also possible the chi square test isn't
> the right one to use here. Again, consult a stats expert.
>
> It's also very possible that the randomness is not nearly as random as
> it's supposed to be. That's incredibly common, and that might be
> noticeable given the very short three-letter character set [a, b, c]
> being picked from.
>
> On Fri, Oct 20, 2023 at 1:31 PM Gary D. Gregory  wrote:
> >
> > Despite the failure comment:
> >
> > RandomStringUtilsTest.testRandomStringUtilsHomog:474 test homogeneity -- 
> > will fail about 1 in 1000 times ==> expected:  but was: 
> >
> > This test fails a LOT more than once every 1000 times, based on how many 
> > GitHub builds I need to restart every week.
> >
> > What can be done to make this test more resilient?
> >
> > TY!
> > Gary
> >
>
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>




Re: [lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Elliotte Rusty Harold
It's possible the chi square test is miscalculated. Perhaps some stats
expert can check that. It's also possible the chi square test isn't
the right one to use here. Again, consult a stats expert.

It's also very possible that the randomness is not nearly as random as
it's supposed to be. That's incredibly common, and that might be
noticeable given the very short three-letter character set [a, b, c]
being picked from.

On Fri, Oct 20, 2023 at 1:31 PM Gary D. Gregory  wrote:
>
> Despite the failure comment:
>
> RandomStringUtilsTest.testRandomStringUtilsHomog:474 test homogeneity -- will 
> fail about 1 in 1000 times ==> expected:  but was: 
>
> This test fails a LOT more than once every 1000 times, based on how many 
> GitHub builds I need to restart every week.
>
> What can be done to make this test more resilient?
>
> TY!
> Gary
>


-- 
Elliotte Rusty Harold
elh...@ibiblio.org




[lang] RandomStringUtilsTest.testRandomStringUtilsHomog fails a lot

2023-10-20 Thread Gary D. Gregory
Despite the failure comment:

RandomStringUtilsTest.testRandomStringUtilsHomog:474 test homogeneity -- will 
fail about 1 in 1000 times ==> expected:  but was: 

This test fails a LOT more than once every 1000 times, based on how many GitHub 
builds I need to restart every week.

What can be done to make this test more resilient?

TY!
Gary
