Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Robert Kern
On Mon, Feb 8, 2021 at 12:10 PM Sebastian Berg 
wrote:

>
> This type of change should be in the release notes undoubtedly and
> likely a `.. versionchanged::` directive in the docstring.
>
> Maybe the best thing would be to create a single, prominent but brief,
> changelog listing all (or almost all) stream changes? (Possibly instead
> of documenting it in the individual function as `.. versionchanged::`)
>
> I am thinking just a table with:
>   * version changed
>   * short description
>   * affected functions
>   * how the stream changed (if that is ever relevant)
>

Both are probably useful. Good ideas.

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Charles R Harris
On Sat, Feb 6, 2021 at 2:32 AM  wrote:

> I implemented a different version of the ziggurat method for generating
> standard normal distributions that is about twice as fast and uses 2/3 of
> the memory of the old one.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "Make[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that standard_normal is, with
> regard to seeding, required to be backwards compatible. If that's the case,
> how could one even implement a new algorithm?
>

Just for fun, I've attached the (C++) implementation that uses the
exponentially extended base block. Note that the constructor produces the
table.

Chuck
#include <cmath>    // exp, log, sqrt
#include <cstdint>  // int32_t, uint32_t
#include <cstdlib>  // abs

// RNG is expected to provide operator() -> uint32_t and double3() -> uniform
// double in (0, 1).
template <typename RNG>
class RandomNormal {
    int32_t m_k[256];
    double m_w[256];
    double m_f[256];
    RNG m_rng;

public:
    explicit RandomNormal(RNG & rng) : m_rng(rng)
    {
        // Build the ziggurat tables; the base block is exponentially extended.
        const double a = 4.928760963486943786e-03;   // area of each block
        const double two31 = 2147483648.0;           // 2^31
        double r = 3.655420419026941481;             // rightmost block boundary

        m_f[0] = exp(-.5*r*r);
        m_w[0] = a/(m_f[0]*two31);
        m_k[0] = static_cast<int32_t>(r/m_w[0]);
        for (int i = 1; i < 255; ++i) {
            const double y = m_f[i-1] + a/r;
            const double x = sqrt(-2*log(y));
            m_f[i] = y;
            m_w[i] = r/two31;
            m_k[i] = static_cast<int32_t>(x/m_w[i]);
            r = x;
        }
        m_f[255] = 1.0;
        m_w[255] = r/two31;
        m_k[255] = 0;
    }


    double operator()()
    {
        while (1) {
            const uint32_t randu = m_rng();
            // Low 8 bits select the block, the remaining (signed) bits form
            // the candidate significand.
            const uint32_t mask1 = static_cast<uint32_t>(0x000000ffUL);
            const uint32_t mask2 = static_cast<uint32_t>(0xffffff00UL);
            const int32_t j = static_cast<int32_t>(randu & mask2);
            const int32_t i = static_cast<int32_t>(randu & mask1);
            const double x = j*m_w[i];

            if (abs(j) < m_k[i]) {
                return x;            // fast path: clearly under the density
            }
            if (i > 0) {
                // Wedge region: accept against the density.
                const double randd = m_rng.double3();
                const double y = m_f[i-1] + randd*(m_f[i] - m_f[i-1]);
                if (y < exp(-.5*x*x)) {
                    return x;
                }
            }
            else {
                // Base block beyond r: Marsaglia's tail algorithm.
                const double r = 3.655420419026941481;
                const double t = -log(m_rng.double3())*(1./r);
                const double y = -log(m_rng.double3());
                if (2*y > t*t) {
                    return x > 0 ? (r + t) : -(r + t);
                }
            }
        }
    }
};
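
Schematically, operator() above does the following on each draw (reading
straight off the code, with x_i, f_i, w_i, k_i the tabulated block boundaries,
heights, scales and thresholds):

  u \sim \mathrm{Uniform}\{0,\dots,2^{32}-1\},\quad i = u \bmod 256,\quad
  x = j\,w_i \ \text{with } j \text{ the remaining (signed) high bits}

  |j| < k_i \;\Rightarrow\; \text{accept } x \text{ immediately (the common case)}

  i > 0:\ \text{accept } x \text{ if } f_{i-1} + U\,(f_i - f_{i-1}) < e^{-x^2/2}

  i = 0:\ t = -\ln(U_1)/r,\ y = -\ln(U_2),\ \text{accept if } 2y > t^2,
  \ \text{returning } \pm(r + t)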


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Sebastian Berg
On Mon, 2021-02-08 at 16:36 +, Kevin Sheppard wrote:
> That is good news indeed.  Seems like a good upgrade, especially
> given the breadth of application of normals and the multiple
> appearances within distributions.c (e.g., Cauchy). Is there a
> deprecation for a change like this? Or is it just a note and new
> random numbers in the next major?  I think this is the first time a
> substantially new algo has replaced an existing one.
>  

I don't think we can deprecate or even warn about it; that would just
result in noise that cannot be silenced.
If we really think warnings are necessary, it sounds like you would
need an opt-in `numpy.random.set_warn_if_streams_will_change()`.
That sounds like diminishing returns at first sight.

It may be good that this happens now rather than in a few years, while
adoption of the new API is probably still on the low side.


This type of change should be in the release notes undoubtedly and
likely a `.. versionchanged::` directive in the docstring.

Maybe the best thing would be to create a single, prominent but brief,
changelog listing all (or almost all) stream changes? (Possibly instead
of documenting it in the individual function as `.. versionchanged::`)

I am thinking just a table with:
  * version changed
  * short description
  * affected functions
  * how the stream changed (if that is ever relevant)


Cheers,

Sebastian


> Kevin
>  
>  
> From: Robert Kern
> Sent: Monday, February 8, 2021 4:06 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Question about optimizing
> random_standard_normal
>  
> On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard <
> kevin.k.shepp...@gmail.com> wrote:
> > My reading is that the first 4 are pure C, presumably using the
> > standard practice of inlining so as to make the tightest loop
> > possible, and to allow the compiler to make other optimizations. 
> > The final line is what happens when you replace the existing
> > ziggurat in NumPy with the new one. I read it this way since it has
> > both “new” and “old” with numpy. If it isn’t this, then I’m unsure
> > what “new” and “old” could mean in the context of this thread.
> 
>  
> No, these are our benchmarks of `Generator`. `numpy` is testing
> `RandomState`, which wasn't touched by their contribution.
>  
>   
> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
>   
> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124
>  
> > I suppose camel-cdr can clarify what was actually done.
> 
>  
> > > > > But I did run the built-in benchmark: ./runtests.py --bench
> > > > > bench_random.RNG.time_normal_zig and the results are:
> 
>  
> -- 
> Robert Kern
>  


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Robert Kern
On Mon, Feb 8, 2021 at 11:38 AM Kevin Sheppard 
wrote:

> That is good news indeed.  Seems like a good upgrade, especially given the
> breadth of application of normals and the multiple appearances within
> distributions.c (e.g., Cauchy). Is there a deprecation for a change like
> this? Or is it just a note and new random numbers in the next major?  I
> think this is the first time a substantially new algo has replaced an
> existing one.
>

Per NEP 19, a change like this is a new feature that can be included in a
feature release, like any other feature.

I would like to see some more testing of the quality of the sequences beyond
the results already quoted, using Kolmogorov-Smirnov and/or Anderson-Darling
tests as well, which should be more thorough.


https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L604-L620
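
For concreteness, a minimal sketch of that kind of offline check (not something
for the unit test suite; the seed and sample size here are arbitrary):

import numpy as np
from scipy import stats

# Large sample from the implementation under scrutiny.
rng = np.random.default_rng(12345)
sample = rng.standard_normal(1_000_000)

# Kolmogorov-Smirnov test against the standard normal CDF.
ks = stats.kstest(sample, "norm")
print(ks.statistic, ks.pvalue)

# Anderson-Darling test, which weights the tails more heavily.
ad = stats.anderson(sample, dist="norm")
print(ad.statistic, ad.critical_values)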

There are also some subtle issues involved in ziggurat method
implementations that go beyond just the marginal distributions. I'm not
sure, even, that our current implementation deals with the issues raised in
the following paper, but I'd like to do no worse.

  https://www.doornik.com/research/ziggurat.pdf

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
That is good news indeed.  Seems like a good upgrade, especially given the
breadth of application of normals and the multiple appearances within
distributions.c (e.g., Cauchy). Is there a deprecation for a change like
this? Or is it just a note and new random numbers in the next major?  I
think this is the first time a substantially new algo has replaced an
existing one.

Kevin

From: Robert Kern
Sent: Monday, February 8, 2021 4:06 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question about optimizing random_standard_normal

On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:

> My reading is that the first 4 are pure C, presumably using the standard
> practice of inlining so as to make the tightest loop possible, and to
> allow the compiler to make other optimizations.  The final line is what
> happens when you replace the existing ziggurat in NumPy with the new one.
> I read it this way since it has both “new” and “old” with numpy. If it
> isn’t this, then I’m unsure what “new” and “old” could mean in the
> context of this thread.

No, these are our benchmarks of `Generator`. `numpy` is testing
`RandomState`, which wasn't touched by their contribution.

https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124

> I suppose camel-cdr can clarify what was actually done.

> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Robert Kern
On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard 
wrote:

> My reading is that the first 4 are pure C, presumably using the standard
> practice of inlining so as to make the tightest loop possible, and to
> allow the compiler to make other optimizations.  The final line is what
> happens when you replace the existing ziggurat in NumPy with the new one. I
> read it this way since it has both “new” and “old” with numpy. If it isn’t
> this, then I’m unsure what “new” and “old” could mean in the context of
> this thread.
>

No, these are our benchmarks of `Generator`. `numpy` is testing
`RandomState`, which wasn't touched by their contribution.


https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97

https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124
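
Roughly, and in simplified form (this is a paraphrase, not the actual contents
of bench_random.py), those entries parametrize over the bit generators plus a
"numpy" case that uses the legacy RandomState, so the last row of the table is
RandomState rather than Generator:

import numpy as np

class RNG:
    # Hypothetical, simplified sketch of the linked ASV benchmark class.
    params = ["PCG64", "MT19937", "Philox", "SFC64", "numpy"]
    param_names = ["bitgen"]

    def setup(self, bitgen):
        if bitgen == "numpy":
            self.rg = np.random.RandomState()   # legacy stream, left untouched
        else:
            self.rg = np.random.Generator(getattr(np.random, bitgen)())

    def time_normal_zig(self, bitgen):
        self.rg.standard_normal(size=100_000)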


> I suppose camel-cdr can clarify what was actually done.
>

> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>
-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
My reading is that the first 4 are pure C, presumably using the standard
practice of inlining so as to make the tightest loop possible, and to allow
the compiler to make other optimizations.  The final line is what happens
when you replace the existing ziggurat in NumPy with the new one. I read it
this way since it has both “new” and “old” with numpy. If it isn’t this,
then I’m unsure what “new” and “old” could mean in the context of this
thread. I suppose camel-cdr can clarify what was actually done.

Kevin

From: Robert Kern
Sent: Monday, February 8, 2021 3:41 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question about optimizing random_standard_normal

On Mon, Feb 8, 2021 at 3:05 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:

> If I understand correctly, there is no gain when applying this patch to
> Generator.  I'm not that surprised that this is the case since the compiler
> is much more limited in what it can do in Generator than what it can when
> filling a large array directly with functions available for inlining and
> unrolling. Again, if I understand correctly I think it will be difficult to
> justify breaking the stream for a negligible gain in performance.

Can you explain your understanding of the benchmark results? To me, it looks
like nearly a 2x improvement with the faster BitGenerators (our default PCG64
and SFC64). That may or may not be worth breaking the stream, but it's far
from negligible.

> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>              new          old
> PCG64        589±3μs      1.06±0.03ms
> MT19937      985±4μs      1.44±0.01ms
> Philox       981±30μs     1.39±0.01ms
> SFC64        508±4μs      900±4μs
> numpy        2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Robert Kern
On Mon, Feb 8, 2021 at 3:05 AM Kevin Sheppard 
wrote:

> If I understand correctly, there is no gain when applying this patch to
> Generator.  I'm not that surprised that this is the case since the compiler
> is much more limited in what it can do in Generator than what it can when
> filling a large array directly with functions available for inlining and
> unrolling. Again, if I understand correctly I think it will be difficult to
> justify breaking the stream for a negligible gain in performance.
>

Can you explain your understanding of the benchmark results? To me, it
looks like nearly a 2x improvement with the faster BitGenerators (our
default PCG64 and SFC64). That may or may not be worth breaking the stream,
but it's far from negligible.

> But I did run the built-in benchmark: ./runtests.py --bench
>> bench_random.RNG.time_normal_zig and the results are:
>>
>>
>>              new          old
>> PCG64        589±3μs      1.06±0.03ms
>> MT19937      985±4μs      1.44±0.01ms
>> Philox       981±30μs     1.39±0.01ms
>> SFC64        508±4μs      900±4μs
>> numpy        2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>>
>
-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
If I understand correctly, there is no gain when applying this patch to
Generator.  I'm not that surprised that this is the case since the compiler
is much more limited in what it can do in Generator than what it can when
filling a large array directly with functions available for inlining and
unrolling. Again, if I understand correctly I think it will be difficult to
justify breaking the stream for a negligible gain in performance.

Kevin


On Sat, Feb 6, 2021 at 12:27 PM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a non-monolithic generator makes it a good deal slower than
> generating in straight C (with everything inline).  While I'm not sure a
> factor of 2 is enough to justify a change (for me 10x is, 1.2x is not, but
> I don't know where the cutoff is).
>
>
> I originally benchmarked my implementation against a bunch of other ones
> in C (because I was developing a C library
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>              new          old
> PCG64        589±3μs      1.06±0.03ms
> MT19937      985±4μs      1.44±0.01ms
> Philox       981±30μs     1.39±0.01ms
> SFC64        508±4μs      900±4μs
> numpy        2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
‐‐‐ Original Message ‐‐‐
On Saturday, February 6, 2021 3:29 PM, Robert Kern  
wrote:

> On Sat, Feb 6, 2021 at 7:27 AM  wrote:
>
>>> Well, I can tell you why it needs to be backward compatible! I use random 
>>> numbers fairly frequently, and to unit test them I set a specific seed and 
>>> then make sure I get the same answers.
>>
>> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
>> same thing with the same bits as the original one, but I couldn't get it to
>> work.
>
> To be clear, this is not our backwards compatibility policy for the methods 
> that you have modified. Our policy is spelled out here:
>
> https://numpy.org/neps/nep-0019-rng-policy.html
>
> The TestRandomDist suite of tests was adapted from the older RandomState 
> (which is indeed frozen and not allowed to change algorithms). It's a mix of 
> correctness tests that are valid regardless of the precise algorithm (does 
> this method reject invalid arguments? do degenerate arguments yield the 
> correct constant value?) and actual "has this algorithm changed 
> unexpectedly?" tests. The former are the most valuable, but the latter are 
> useful for testing in cross-platform contexts. Compilers and different CPUs 
> can do naughty things sometimes, and we want the cross-platform differences 
> to be minimal. When you do change an algorithm implementation for Generator, 
> as you have done, you are expected to do thorough tests (offline, not in the 
> unit tests) that it is correctly sampling from the target probability 
> distribution, then once satisfied, change the hard-coded values in 
> TestRandomDist to match whatever you are generating.
>
> --
> Robert Kern

Ok, cool, that basically explains a lot.

> When you do change an algorithm implementation for Generator, as you have 
> done, you are expected to do thorough tests (offline, not in the unit tests) 
> that it is correctly sampling from the target probability distribution, then 
> once satisfied, change the hard-coded values in TestRandomDist to match 
> whatever you are generating.

I'm probably not versed enough in statistics to do thorough testing. I used
the testing in https://www.seehuhn.de/pages/ziggurat and plotted histograms
to verify correctness; that probably won't be sufficient.
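
For what it's worth, one way to make the histogram comparison quantitative (a
rough sketch; the binning and sample size are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.standard_normal(1_000_000)

# Bin the sample and compare observed counts with the normal-CDF mass per bin.
edges = np.linspace(-4.0, 4.0, 81)
observed, _ = np.histogram(sample, bins=edges)
expected = len(sample) * np.diff(stats.norm.cdf(edges))

# Standardized (Pearson) residual per bin; large values flag a bad fit.
residuals = (observed - expected) / np.sqrt(expected)
print(np.abs(residuals).max())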


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
> Well, I can tell you why it needs to be backward compatible! I use random 
> numbers fairly frequently, and to unit test them I set a specific seed and 
> then make sure I get the same answers.

Hmm, I guess that makes sense. I tried to adjust my algorithms to do the same
thing with the same bits as the original one, but I couldn't get it to work.

> Have you benchmarked it using the generator interface? The structure of this
> as a non-monolithic generator makes it a good deal slower than generating in
> straight C (with everything inline). While I'm not sure a factor of 2 is
> enough to justify a change (for me 10x is, 1.2x is not, but I don't know
> where the cutoff is).

I originally benchmarked my implementation against a bunch of other ones in C
(because I was developing a C library
https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
But I did run the built-in benchmark: ./runtests.py --bench 
bench_random.RNG.time_normal_zig and the results are:

             new          old
PCG64        589±3μs      1.06±0.03ms
MT19937      985±4μs      1.44±0.01ms
Philox       981±30μs     1.39±0.01ms
SFC64        508±4μs      900±4μs
numpy        2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom

I'm not yet 100% certain about the implementations, but I attached a diff of my
current progress.

diff --git a/numpy/random/src/distributions/distributions.c b/numpy/random/src/distributions/distributions.c
index 93e0bdc5f..447d5e161 100644
--- a/numpy/random/src/distributions/distributions.c
+++ b/numpy/random/src/distributions/distributions.c
@@ -46,9 +46,9 @@ void random_standard_uniform_fill_f(bitgen_t *bitgen_state, npy_intp cnt, float
 static double standard_exponential_unlikely(bitgen_t *bitgen_state,
 uint8_t idx, double x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r - log(1.0 - next_double(bitgen_state));
+return (double)ZIGGURAT_EXP_R - log(1.0 - next_double(bitgen_state));
   } else if ((fe_double[idx - 1] - fe_double[idx]) * next_double(bitgen_state) +
  fe_double[idx] <
  exp(-x)) {
 return x;
@@ -83,9 +83,9 @@ void random_standard_exponential_fill(bitgen_t * bitgen_state, npy_intp cnt, dou
 static float standard_exponential_unlikely_f(bitgen_t *bitgen_state,
  uint8_t idx, float x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r_f - logf(1.0f - next_float(bitgen_state));
+return (float)ZIGGURAT_EXP_R - logf(1.0f - next_float(bitgen_state));
   } else if ((fe_float[idx - 1] - fe_float[idx]) * next_float(bitgen_state) +
  fe_float[idx] <
  expf(-x)) {
 return x;
@@ -132,41 +132,36 @@ void random_standard_exponential_inv_fill_f(bitgen_t * bitgen_state, npy_intp cn
 out[i] = -log(1.0 - next_float(bitgen_state));
   }
 }
 
-
 double random_standard_normal(bitgen_t *bitgen_state) {
-  uint64_t r;
-  int sign;
-  uint64_t rabs;
-  int idx;
-  double x, xx, yy;
-  for (;;) {
-/* r = e3n52sb8 */
-r = next_uint64(bitgen_state);
-idx = r & 0xff;
-r >>= 8;
-sign = r & 0x1;
-rabs = (r >> 1) & 0x000fffffffffffff;
-x = rabs * wi_double[idx];
-if (sign & 0x1)
-  x = -x;
-if (rabs < ki_double[idx])
-  return x; /* 99.3% of the time return here */
+  while (1) {
+double x, y, f0, f1;
+union { uint64_t u; double f; } u;
+const uint64_t u64 = next_uint64(bitgen_state);
+const uint_fast8_t idx = (u64 >> 1) & 0xff;
+const double uf64 = ((u64 >> 11) * (1.0 / (UINT64_C(1) << 53))) *
+xtbl_double[idx];
+
+if (uf64 < xtbl_double[idx + 1])
+  return u.f = uf64, u.u |= (u64 & 1) << 63, u.f;
+
 if (idx == 0) {
-  for (;;) {
-/* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-xx = -ziggurat_nor_inv_r * log(1.0 - next_double(bitgen_state));
-yy = -log(1.0 - next_double(bitgen_state));
-if (yy + yy > xx * xx)
-  return ((rabs >> 8) & 0x1) ? -(ziggurat_nor_r + xx)
- : ziggurat_nor_r + xx;
-  }
-} else {
-  if (((fi_double[idx - 1] - fi_double[idx]) * next_double(bitgen_state) +
-   fi_double[idx]) < exp(-0.5 * x * x))
-return x;
+  do {
+x = log(next_double(bitgen_state)) * 1.0 / ZIGGURAT_NORM_R;
+y = log(next_double(bitgen_state));
+  } while (-(y + y) < x * x);
+  if (u64 & 1)
+return x - (double)ZIGGURAT_NORM_R;
+  else
+return (double)ZIGGURAT_NORM_R - x;
 }
+
+y = uf64 * uf64;
+f0 = exp(-0.5 * (xtbl_double[idx] * xtbl_double[idx] - y));
+f1 = exp(-0.5 * (xtbl_double[idx + 1] * xtbl_double[idx + 1] - y));
+if (f1 + next_double(bitgen_state) * (f0 - f1) < 1)
+  return u.f = uf64, u.u |= (u64 & 1) << 63, u.f;
   }
 }
 
 void random_standard_normal_fill(bitgen_t *bitgen_state, npy_intp cnt, double 

Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Robert Kern
On Sat, Feb 6, 2021 at 7:27 AM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>

To be clear, this is not our backwards compatibility policy for the methods
that you have modified. Our policy is spelled out here:

  https://numpy.org/neps/nep-0019-rng-policy.html

The TestRandomDist suite of tests was adapted from the older RandomState
(which is indeed frozen and not allowed to change algorithms). It's a mix
of correctness tests that are valid regardless of the precise algorithm
(does this method reject invalid arguments? do degenerate arguments yield
the correct constant value?) and actual "has this algorithm changed
unexpectedly?" tests. The former are the most valuable, but the latter are
useful for testing in cross-platform contexts. Compilers and different CPUs
can do naughty things sometimes, and we want the cross-platform differences
to be minimal. When you do change an algorithm implementation for
Generator, as you have done, you are expected to do thorough tests
(offline, not in the unit tests) that it is correctly sampling from the
target probability distribution, then once satisfied, change the hard-coded
values in TestRandomDist to match whatever you are generating.
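
For illustration, such a test has roughly this shape (in the real suite the
reference values are hard-coded literals; here a second identically seeded
generator stands in for them so the sketch stays runnable):

import numpy as np
from numpy.testing import assert_array_almost_equal

def test_standard_normal_stream():
    # Same seed and bit generator -> the same stream on every platform.
    actual = np.random.Generator(np.random.MT19937(1234)).standard_normal(3)
    # In TestRandomDist this would be a hard-coded array, regenerated offline
    # whenever the algorithm is intentionally changed.
    desired = np.random.Generator(np.random.MT19937(1234)).standard_normal(3)
    assert_array_almost_equal(actual, desired, decimal=15)

test_standard_normal_stream()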

-- 
Robert Kern


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Charles R Harris
On Sat, Feb 6, 2021 at 5:27 AM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a non-monolithic generator makes it a good deal slower than
> generating in straight C (with everything inline).  While I'm not sure a
> factor of 2 is enough to justify a change (for me 10x is, 1.2x is not, but
> I don't know where the cutoff is).
>
>
> I originally benchmarked my implementation against a bunch of other ones
> in C (because I was developing a C library
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>              new          old
> PCG64        589±3μs      1.06±0.03ms
> MT19937      985±4μs      1.44±0.01ms
> Philox       981±30μs     1.39±0.01ms
> SFC64        508±4μs      900±4μs
> numpy        2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>
>
You can actually get rid of the loop entirely and implement the exponential
function directly by using an exponential bound on the bottom ziggurat
block ends. It just requires a slight change in the block boundaries.

Chuck


[Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
I implemented a different version of the ziggurat method for generating
standard normal distributions that is about twice as fast and uses 2/3 of the
memory of the old one.
I tested the implementation separately and am very confident it's correct, but
it does fail 28 tests in coverage testing.
Checking the testing code I found out that all the failed tests are inside
TestRandomDist, which has the goal of "Make[ing] sure the random distribution
returns the correct value for a given seed". Why would this be needed?
The only explanation I can come up with is that standard_normal is, with
regard to seeding, required to be backwards compatible. If that's the case,
how could one even implement a new algorithm?


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Kevin Sheppard
Have you benchmarked it using the generator interface? The structure of
this as a non-monolithic generator makes it a good deal slower than
generating in straight C (with everything inline).  While I'm not sure a
factor of 2 is enough to justify a change (for me 10x is, 1.2x is not, but I
don't know where the cutoff is).

Can you post benchmarks from using it through Generator?

Also, those tests would be replaced with new values if the patch was
accepted, so don't worry about them.

Kevin


On Sat, Feb 6, 2021, 09:32  wrote:

> I implemented a different version of the ziggurat method for generating
> standard normal distributions that is about twice as fast and uses 2/3 of
> the memory of the old one.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "Make[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that standard_normal is, with
> regard to seeding, required to be backwards compatible. If that's the case,
> how could one even implement a new algorithm?


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Tom Swirly
Well, I can tell you why it needs to be backward compatible!  I use random
numbers fairly frequently, and to unit test them I set a specific seed and
then make sure I get the same answers.

If your change went in (and I were using numpy normal distributions, which
I am not) then my tests would break.

Particularly, you'd have the unfixable problem that it would be impossible
to write tests for your code that worked regardless of the version of numpy
that was installed.

Yes, I agree that in your use case, this is powerfully unfortunate, and
prevents you from making a change that would otherwise benefit everyone.

The three ways to do this would be the following:

   - Add a new parameter to the function call, say, faster=False, which you
   set to True to get the new behavior
   - Add a global flag somewhere you set to get the new behavior everywhere
   - Create a new function called normal_faster or some such

All of these are ugly for obvious reasons.

On Sat, Feb 6, 2021 at 10:33 AM  wrote:

> I implemented a different version of the ziggurat method for generating
> standard normal distributions that is about twice as fast and uses 2/3 of
> the memory of the old one.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "Make[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that standard_normal is, with
> regard to seeding, required to be backwards compatible. If that's the case,
> how could one even implement a new algorithm?


-- 
 /t

PGP Key: https://flowcrypt.com/pub/tom.ritchf...@gmail.com
https://tom.ritchford.com
https://tom.swirly.com