Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
‐‐‐ Original Message ‐‐‐
On Saturday, February 6, 2021 3:29 PM, Robert Kern  
wrote:

> On Sat, Feb 6, 2021 at 7:27 AM  wrote:
>
>>> Well, I can tell you why it needs to be backward compatible! I use random 
>>> numbers fairly frequently, and to unit test them I set a specific seed and 
>>> then make sure I get the same answers.
>>
>> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
>> same thing with the same bits as the original one, but I couldn't get it to
>> work.
>
> To be clear, this is not our backwards compatibility policy for the methods 
> that you have modified. Our policy is spelled out here:
>
> https://numpy.org/neps/nep-0019-rng-policy.html
>
> The TestRandomDist suite of tests was adapted from the older RandomState 
> (which is indeed frozen and not allowed to change algorithms). It's a mix of 
> correctness tests that are valid regardless of the precise algorithm (does 
> this method reject invalid arguments? do degenerate arguments yield the 
> correct constant value?) and actual "has this algorithm changed 
> unexpectedly?" tests. The former are the most valuable, but the latter are 
> useful for testing in cross-platform contexts. Compilers and different CPUs 
> can do naughty things sometimes, and we want the cross-platform differences 
> to be minimal. When you do change an algorithm implementation for Generator, 
> as you have done, you are expected to do thorough tests (offline, not in the 
> unit tests) that it is correctly sampling from the target probability 
> distribution, then once satisfied, change the hard-coded values in 
> TestRandomDist to match whatever you are generating.
>
> --
> Robert Kern

Ok, cool, that basically explains a lot.

> When you do change an algorithm implementation for Generator, as you have 
> done, you are expected to do thorough tests (offline, not in the unit tests) 
> that it is correctly sampling from the target probability distribution, then 
> once satisfied, change the hard-coded values in TestRandomDist to match 
> whatever you are generating.

I'm probably not versed enough in statistics to do thorough testing. I used the
testing in https://www.seehuhn.de/pages/ziggurat and plotting histograms to
verify correctness, but that probably won't be sufficient.
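
For concreteness, one basic goodness-of-fit check beyond histograms could be a
Kolmogorov-Smirnov test against the normal CDF. A minimal sketch (the
generator, seed, and sample size are arbitrary stand-ins for the
implementation under test, not part of the patch):

    import numpy as np
    from scipy import stats

    # Draw a large sample from the implementation under test ...
    rng = np.random.default_rng(12345)
    samples = rng.standard_normal(1_000_000)

    # ... and compare it against the standard normal CDF with a
    # Kolmogorov-Smirnov test. A tiny p-value would flag a mismatch with
    # N(0, 1); a large one is only "no evidence of a problem", not a proof
    # of correctness.
    statistic, pvalue = stats.kstest(samples, "norm")
    print(f"KS statistic = {statistic:.6f}, p-value = {pvalue:.3f}")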


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
> Well, I can tell you why it needs to be backward compatible! I use random 
> numbers fairly frequently, and to unit test them I set a specific seed and 
> then make sure I get the same answers.

Hmm, I guess that makes sense. I tried to adjust my algorithms to do the same
thing with the same bits as the original one, but I couldn't get it to work.

> Have you benchmarked it using the generator interface? The structure of this
> as a non-monolithic generator makes it a good deal slower than generating in
> straight C (with everything inline). I'm not sure a factor of 2 is enough to
> justify a change (for me 10x would be, 1.2x would not, but I don't know
> where the cutoff is).

I originally benchmarked my implementation against a bunch of other ones in C
(because I was developing a C library,
https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
But I did run the built-in benchmark (./runtests.py --bench
bench_random.RNG.time_normal_zig) and the results are:

          new          old
PCG64     589±3μs      1.06±0.03ms
MT19937   985±4μs      1.44±0.01ms
Philox    981±30μs     1.39±0.01ms
SFC64     508±4μs      900±4μs
numpy     2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
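
For reference, timing this through the Generator interface directly, outside
the ASV suite, could be done with something like the sketch below (the array
size and repeat counts are arbitrary choices, not what the ASV benchmark
uses):

    import timeit
    import numpy as np

    # Time Generator.standard_normal for each bit generator; this measures
    # the full Generator call path rather than the raw C loop.
    for bitgen in (np.random.PCG64, np.random.MT19937,
                   np.random.Philox, np.random.SFC64):
        gen = np.random.Generator(bitgen(12345))
        best = min(timeit.repeat(lambda: gen.standard_normal(100_000),
                                 number=100, repeat=5))
        print(f"{bitgen.__name__:8s} {best / 100 * 1e6:9.1f} us per 100k draws")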

I'm not yet 100% certain about the implementations, but I attached a diff of my
current progress.

diff --git a/numpy/random/src/distributions/distributions.c b/numpy/random/src/distributions/distributions.c
index 93e0bdc5f..447d5e161 100644
--- a/numpy/random/src/distributions/distributions.c
+++ b/numpy/random/src/distributions/distributions.c
@@ -46,9 +46,9 @@ void random_standard_uniform_fill_f(bitgen_t *bitgen_state, npy_intp cnt, float
 static double standard_exponential_unlikely(bitgen_t *bitgen_state,
 uint8_t idx, double x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r - log(1.0 - next_double(bitgen_state));
+return (double)ZIGGURAT_EXP_R - log(1.0 - next_double(bitgen_state));
   } else if ((fe_double[idx - 1] - fe_double[idx]) * next_double(bitgen_state) +
  fe_double[idx] <
  exp(-x)) {
 return x;
@@ -83,9 +83,9 @@ void random_standard_exponential_fill(bitgen_t * bitgen_state, npy_intp cnt, dou
 static float standard_exponential_unlikely_f(bitgen_t *bitgen_state,
  uint8_t idx, float x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r_f - logf(1.0f - next_float(bitgen_state));
+return (float)ZIGGURAT_EXP_R - logf(1.0f - next_float(bitgen_state));
   } else if ((fe_float[idx - 1] - fe_float[idx]) * next_float(bitgen_state) +
  fe_float[idx] <
  expf(-x)) {
 return x;
@@ -132,41 +132,36 @@ void random_standard_exponential_inv_fill_f(bitgen_t * bitgen_state, npy_intp cn
 out[i] = -log(1.0 - next_float(bitgen_state));
   }
 }
 
-
 double random_standard_normal(bitgen_t *bitgen_state) {
-  uint64_t r;
-  int sign;
-  uint64_t rabs;
-  int idx;
-  double x, xx, yy;
-  for (;;) {
-/* r = e3n52sb8 */
-r = next_uint64(bitgen_state);
-idx = r & 0xff;
-r >>= 8;
-sign = r & 0x1;
-rabs = (r >> 1) & 0x000fffffffffffff;
-x = rabs * wi_double[idx];
-if (sign & 0x1)
-  x = -x;
-if (rabs < ki_double[idx])
-  return x; /* 99.3% of the time return here */
+  while (1) {
+double x, y, f0, f1;
+union { uint64_t u; double f; } u;
+const uint64_t u64 = next_uint64(bitgen_state);
+const uint_fast8_t idx = (u64 >> 1) & 0xff;
+const double uf64 = ((u64 >> 11) * (1.0 / (UINT64_C(1) << 53))) *
+xtbl_double[idx];
+
+if (uf64 < xtbl_double[idx + 1])
+  return u.f = uf64, u.u |= (u64 & 1) << 63, u.f;
+
 if (idx == 0) {
-  for (;;) {
-/* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-xx = -ziggurat_nor_inv_r * log(1.0 - next_double(bitgen_state));
-yy = -log(1.0 - next_double(bitgen_state));
-if (yy + yy > xx * xx)
-  return ((rabs >> 8) & 0x1) ? -(ziggurat_nor_r + xx)
- : ziggurat_nor_r + xx;
-  }
-} else {
-  if (((fi_double[idx - 1] - fi_double[idx]) * next_double(bitgen_state) +
-   fi_double[idx]) < exp(-0.5 * x * x))
-return x;
+  do {
+x = log(next_double(bitgen_state)) * 1.0 / ZIGGURAT_NORM_R;
+y = log(next_double(bitgen_state));
+  } while (-(y + y) < x * x);
+  if (u64 & 1)
+return x - (double)ZIGGURAT_NORM_R;
+  else
+return (double)ZIGGURAT_NORM_R - x;
 }
+
+y = uf64 * uf64;
+f0 = exp(-0.5 * (xtbl_double[idx] * xtbl_double[idx] - y));
+f1 = exp(-0.5 * (xtbl_double[idx + 1] * xtbl_double[idx + 1] - y));
+if (f1 + next_double(bitgen_state) * (f0 - f1) < 1)
+  return u.f = uf64, u.u |= (u64 & 1) << 63, u.f;
   }
 }
 
 void random_standard_normal_fill(bitgen_t *bitgen_state, npy_intp cnt, double 

Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Robert Kern
On Sat, Feb 6, 2021 at 7:27 AM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bit's as the original one, but I couldn't get it
> to work.
>

To be clear, this is not our backwards compatibility policy for the methods
that you have modified. Our policy is spelled out here:

  https://numpy.org/neps/nep-0019-rng-policy.html

The TestRandomDist suite of tests was adapted from the older RandomState
(which is indeed frozen and not allowed to change algorithms). It's a mix
of correctness tests that are valid regardless of the precise algorithm
(does this method reject invalid arguments? do degenerate arguments yield
the correct constant value?) and actual "has this algorithm changed
unexpectedly?" tests. The former are the most valuable, but the latter are
useful for testing in cross-platform contexts. Compilers and different CPUs
can do naughty things sometimes, and we want the cross-platform differences
to be minimal. When you do change an algorithm implementation for
Generator, as you have done, you are expected to do thorough tests
(offline, not in the unit tests) that it is correctly sampling from the
target probability distribution, then once satisfied, change the hard-coded
values in TestRandomDist to match whatever you are generating.
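
For illustration only, such a "has this algorithm changed?" check has roughly
the following shape; the seed, draw size, and tolerance here are placeholders,
not the actual TestRandomDist contents:

    import numpy as np
    from numpy.testing import assert_array_almost_equal

    # Step 1 (offline): after validating the new algorithm, regenerate the
    # reference constants -- fix a seed, draw a few values, record them.
    rng = np.random.Generator(np.random.MT19937(seed=1234))
    reference = rng.standard_normal(3)
    print(repr(reference))  # these numbers get pasted into the test as literals

    # Step 2 (in the test suite): the same seed must reproduce exactly those
    # hard-coded values, which catches unintended algorithm or compiler
    # changes across platforms.
    rng = np.random.Generator(np.random.MT19937(seed=1234))
    assert_array_almost_equal(rng.standard_normal(3), reference, decimal=10)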

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Charles R Harris
On Sat, Feb 6, 2021 at 5:27 AM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a non-monolithic generator makes it a good deal slower than
> generating in straight C (with everything inline). I'm not sure a factor
> of 2 is enough to justify a change (for me 10x would be, 1.2x would not,
> but I don't know where the cutoff is).
>
>
> I originally benchmarked my implementation against a bunch of other ones
> in C (because I was developing a C library,
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark (./runtests.py --bench
> bench_random.RNG.time_normal_zig) and the results are:
>
>           new          old
> PCG64     589±3μs      1.06±0.03ms
> MT19937   985±4μs      1.44±0.01ms
> Philox    981±30μs     1.39±0.01ms
> SFC64     508±4μs      900±4μs
> numpy     2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>
>
You can actually get rid of the loop entirely and implement the exponential
function directly by using an exponential bound on the bottom ziggurat
block ends. It just requires a slight change in the block boundaries.

Chuck


[Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
I tried to implement a different version of the ziggurat method for
generating standard normal distributions that is about twice as fast as the
old one and uses 2/3 of the memory.
I tested the implementation separately and am very confident it's correct, but
it does fail 28 tests in coverage testing.
Checking the testing code I found out that all the failed tests are inside
TestRandomDist, which has the goal of "mak[ing] sure the random distribution
returns the correct value for a given seed". Why would this be needed?
The only explanation I can come up with is that standard_normal is, with
regard to seeding, required to be backwards compatible. If that's the case,
how could one even implement a new algorithm?


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Kevin Sheppard
Have you benchmarked it using the generator interface? The structure of
this as a non-monolithic generator makes it a good deal slower than
generating in straight C (with everything inline). I'm not sure a factor
of 2 is enough to justify a change (for me 10x would be, 1.2x would not,
but I don't know where the cutoff is).

Can you post benchmarks from using it through Generator?

Also, those tests would be replaced with new values if the patch was
accepted, so don't worry about them.

Kevin


On Sat, Feb 6, 2021, 09:32  wrote:

> I tried to implement a different version of the ziggurat method for
> generating standard normal distributions that is about twice as fast as the
> old one and uses 2/3 of the memory.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "mak[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that standard_normal is, with
> regard to seeding, required to be backwards compatible. If that's the case,
> how could one even implement a new algorithm?


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Tom Swirly
Well, I can tell you why it needs to be backward compatible!  I use random
numbers fairly frequently, and to unit test them I set a specific seed and
then make sure I get the same answers.

If your change went in (and I were using numpy normal distributions, which
I am not) then my tests would break.

In particular, you'd have the unfixable problem that it would be impossible
to write tests for your code that work regardless of which version of numpy
is installed.

Yes, I agree that in your use case, this is powerfully unfortunate, and
prevents you from making a change that would otherwise benefit everyone.

The three ways to do this would be the following:

   - Add a new parameter to the function call, say, faster=False, which you
     set to True to get the new behavior
   - Add a global flag somewhere that you set to get the new behavior
     everywhere
   - Create a new function called normal_faster or some such

All of these are ugly for obvious reasons.
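
Sketched in Python, the first option would look something like the following;
the faster keyword and the helper name are purely hypothetical, not an actual
or proposed NumPy API:

    import numpy as np

    def _standard_normal_new(rng, size):
        # stand-in for the new, faster ziggurat implementation
        return rng.standard_normal(size)

    def standard_normal(rng, size=None, faster=False):
        if faster:
            return _standard_normal_new(rng, size)  # opt in to the new stream
        return rng.standard_normal(size)            # old, reproducible stream

    rng = np.random.default_rng(0)
    print(standard_normal(rng, 3))                # unchanged default behavior
    print(standard_normal(rng, 3, faster=True))   # explicit opt-in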

On Sat, Feb 6, 2021 at 10:33 AM  wrote:

> I tried to implement a different version of the ziggurat method for
> generating standard normal distributions that is about twice as fast as the
> old one and uses 2/3 of the memory.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "mak[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that standard_normal is, with
> regard to seeding, required to be backwards compatible. If that's the case,
> how could one even implement a new algorithm?


-- 
 /t

PGP Key: https://flowcrypt.com/pub/tom.ritchf...@gmail.com
https://tom.ritchford.com
https://tom.swirly.com