Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-14 Thread Robert Kern
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote:
 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.

There is not one currently. Cython has provisions for sharing such
low-level access to other Cython extensions, but I'm not sure how well
it works for exporting data pointers and function pointers to general
C/++ code. We could probably package the necessities into a struct and
export a pointer to it via a PyCapsule.

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-14 Thread Neal Becker
Robert Kern wrote:

 On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote:
 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.
 
 There is not one currently. Cython has provisions for sharing such
 low-level access to other Cython extensions, but I'm not sure how well
 it works for exporting data pointers and function pointers to general
 C/++ code. We could probably package the necessities into a struct and
 export a pointer to it via a PyCapsule.
 

I did find a way to do this, and the results are good enough.  Timing is quite 
comparable to my pure c++ implementation.

I used rk_ulong from mtrand.so.  I also tried using rk_fill, but it was a bit 
slower.

The boost::python c++ code is attached, for posterity.
#include boost/python.hpp

// Include this AFTER Python.h; in projects with multiple source files, you may have to
// #define some things first (see the numpy C-API documentation).
#include numpy/arrayobject.h
#include boost/numpy/dtype.hpp

#include boost/python/class.hpp
#include boost/python/init.hpp
#include boost/python/module.hpp
#include boost/python/def.hpp
#include boost/python/args.hpp
#include boost/python/str.hpp
#include boost/python/call.hpp
#include boost/numpy.hpp
#include ndarray/Array.h
#include ndarray/bp/auto.h
#include inttypes.h
#include stdexcept

using namespace boost::python;
namespace bp=boost::python;
namespace nd=ndarray;

#define RK_STATE_LEN 624

typedef struct rk_state_
{
unsigned long key[RK_STATE_LEN];
int pos;
int has_gauss; /* !=0: gauss contains a gaussian deviate */
double gauss;

/* The rk_state structure has been extended to store the following
 * information for the binomial generator. If the input values of n or p
 * are different than nsave and psave, then the other parameters will be
 * recomputed. RTK 2005-09-02 */

int has_binomial; /* !=0: following parameters initialized for
  binomial */
double psave;
long nsave;
double r;
double q;
double fm;
long m;
double p1;
double xm;
double xl;
double xr;
double c;
double laml;
double lamr;
double p2;
double p3;
double p4;

}
rk_state;

struct RandomState {
  PyObject_HEAD
  rk_state *internal_state;
};

extern C void rk_fill(void *buffer, size_t size, rk_state *state);
extern C unsigned long rk_ulong(rk_state *state);


inline NPY_TYPES dtype_to_type (bp::object const dtype, NPY_TYPES _default) {
  if (dtype == bp::object())
return _default;
  else {
boost::numpy::dtype dt (dtype);
return NPY_TYPES (reinterpret_castPyArray_Descr*(dt.ptr())-type_num);
// PyArray_Descr *tp_descr;
// if (PyArray_DescrConverter(dtype.ptr(), tp_descr) != NPY_SUCCEED)
//   throw bp::error_already_set();
// return NPY_TYPES(tp_descr-type_num);
  }
}

templateint struct type_imap;
template
struct type_imapNPY_INT8 {
  typedef npy_int8 type;
};
template
struct type_imapNPY_INT16 {
  typedef npy_int16 type;
};
template
struct type_imapNPY_INT32 {
  typedef npy_int32 type;
};
template
struct type_imapNPY_INT64 {
  typedef npy_int64 type;
};
template
struct type_imapNPY_FLOAT32 {
  typedef float type;
};
template
struct type_imapNPY_FLOAT64 {
  typedef double type;
};
template
struct type_imapNPY_COMPLEX64 {
  typedef std::complexfloat type;
};
template
struct type_imapNPY_COMPLEX128 {
  typedef std::complexdouble type;
};


  template class IntType=uint64_t
  class pnseq_generator {
  public:
typedef IntType result_type;
typedef IntType input_type;

explicit  pnseq_generator(int _width) :  width (_width), count (0),
	 mask (
		   _width == std::numeric_limitsresult_type::digits ? ~(result_type(0)) : ~(result_type(-1LL)  _width))
{
  if (std::numeric_limitsresult_type::digits  width or width = 0)
	throw std::runtime_error (pnseq_generator::invalid width);
}

explicit pnseq_generator (int _width, int _count, result_type _cache):
  width (_width), count (_count), cache (_cache), mask (~(result_type(-1LL)  _width)) {}

templateclass Engine
result_type operator() (Engine eng) {

  //  BOOST_STATIC_ASSERT (sizeof (typename Engine::result_type) = width);

  if (count = width-1) {
	cache = eng();
	count = std::min (std::numeric_limitstypename Engine::result_type::digits, std::numeric_limitsresult_type::digits);
  }
  result_type bits = cache  mask;
  cache = width;
  count -= width;
  return bits;
}

//  private:
const int width;
int count;
result_type cache;
const result_type mask;

  };

templatetypename out_type=uint64_t
struct engine {
  typedef out_type result_type;
  bp::object rs;		// RandomState object

  engine (bp::object const _rs) : rs (_rs) {}

  result_type 

Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-14 Thread Neal Becker
Robert Kern wrote:

 On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote:
 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.
 
 There is not one currently. Cython has provisions for sharing such
 low-level access to other Cython extensions, but I'm not sure how well
 it works for exporting data pointers and function pointers to general
 C/++ code. We could probably package the necessities into a struct and
 export a pointer to it via a PyCapsule.
 

One thing this code doesn't do: it requires construction of the wrapper class 
passing in a RandomState object.  It doesn't verify you actually gave it a 
RandomState object.  It's hard to do that.  The problem as I see it is to 
perform this check, I need the RandomStateType object, which unfortunately 
mtrand.so does not export.

The only way to do it is in c++ code:

1. import numpy.random
2. get RandomState class
3. call it to create RandomState instance
4. get the ob_type pointer.

Pretty ugly:

  object mod = object (handle 
(borrowed((PyImport_ImportModule(numpy.random);
  object rs_obj = mod.attr(RandomState);
  object rs_inst = callobject (rs_obj.ptr(), 0);
  RandomStateTypeObj = rs_inst.ptr()-ob_type;


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-14 Thread Robert Kern
On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbeck...@gmail.com wrote:
 Robert Kern wrote:

 On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote:
 I guess I talked to you about 100 years ago about sharing state between 
 numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api 
 for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.

 There is not one currently. Cython has provisions for sharing such
 low-level access to other Cython extensions, but I'm not sure how well
 it works for exporting data pointers and function pointers to general
 C/++ code. We could probably package the necessities into a struct and
 export a pointer to it via a PyCapsule.


 One thing this code doesn't do: it requires construction of the wrapper class
 passing in a RandomState object.  It doesn't verify you actually gave it a
 RandomState object.  It's hard to do that.  The problem as I see it is to
 perform this check, I need the RandomStateType object, which unfortunately
 mtrand.so does not export.

 The only way to do it is in c++ code:

 1. import numpy.random
 2. get RandomState class
 3. call it to create RandomState instance
 4. get the ob_type pointer.

 Pretty ugly:

   object mod = object (handle
 (borrowed((PyImport_ImportModule(numpy.random);
   object rs_obj = mod.attr(RandomState);
   object rs_inst = callobject (rs_obj.ptr(), 0);
   RandomStateTypeObj = rs_inst.ptr()-ob_type;

PyObject_IsInstance() should be sufficient.

http://docs.python.org/2/c-api/object.html#PyObject_IsInstance

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-14 Thread Neal Becker
Robert Kern wrote:

 On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbeck...@gmail.com wrote:
 Robert Kern wrote:

 On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote:
 I guess I talked to you about 100 years ago about sharing state between
 numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api
 for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.

 There is not one currently. Cython has provisions for sharing such
 low-level access to other Cython extensions, but I'm not sure how well
 it works for exporting data pointers and function pointers to general
 C/++ code. We could probably package the necessities into a struct and
 export a pointer to it via a PyCapsule.


 One thing this code doesn't do: it requires construction of the wrapper class
 passing in a RandomState object.  It doesn't verify you actually gave it a
 RandomState object.  It's hard to do that.  The problem as I see it is to
 perform this check, I need the RandomStateType object, which unfortunately
 mtrand.so does not export.

 The only way to do it is in c++ code:

 1. import numpy.random
 2. get RandomState class
 3. call it to create RandomState instance
 4. get the ob_type pointer.

 Pretty ugly:

   object mod = object (handle
 (borrowed((PyImport_ImportModule(numpy.random);
   object rs_obj = mod.attr(RandomState);
   object rs_inst = callobject (rs_obj.ptr(), 0);
   RandomStateTypeObj = rs_inst.ptr()-ob_type;
 
 PyObject_IsInstance() should be sufficient.
 
 http://docs.python.org/2/c-api/object.html#PyObject_IsInstance
 
 --
 Robert Kern

Thanks!
For the record, an updated version attached.
#include boost/python.hpp

// Include this AFTER Python.h; in projects with multiple source files, you may have to
// #define some things first (see the numpy C-API documentation).
#include numpy/arrayobject.h
#include boost/numpy/dtype.hpp

#include boost/python/class.hpp
#include boost/python/init.hpp
#include boost/python/module.hpp
#include boost/python/def.hpp
#include boost/python/args.hpp
#include boost/python/str.hpp
#include boost/python/call.hpp
#include boost/numpy.hpp
#include ndarray/Array.h
#include ndarray/bp/auto.h
#include inttypes.h
#include stdexcept

using namespace boost::python;
namespace bp=boost::python;
namespace nd=ndarray;

#define RK_STATE_LEN 624

typedef struct rk_state_
{
unsigned long key[RK_STATE_LEN];
int pos;
int has_gauss; /* !=0: gauss contains a gaussian deviate */
double gauss;

/* The rk_state structure has been extended to store the following
 * information for the binomial generator. If the input values of n or p
 * are different than nsave and psave, then the other parameters will be
 * recomputed. RTK 2005-09-02 */

int has_binomial; /* !=0: following parameters initialized for
  binomial */
double psave;
long nsave;
double r;
double q;
double fm;
long m;
double p1;
double xm;
double xl;
double xr;
double c;
double laml;
double lamr;
double p2;
double p3;
double p4;

}
rk_state;

struct RandomState {
  PyObject_HEAD
  rk_state *internal_state;
};

extern C void rk_fill(void *buffer, size_t size, rk_state *state);
extern C unsigned long rk_ulong(rk_state *state);
extern C unsigned long rk_random(rk_state *state);


inline NPY_TYPES dtype_to_type (bp::object const dtype, NPY_TYPES _default) {
  if (dtype == bp::object())
return _default;
  else {
boost::numpy::dtype dt (dtype);
return NPY_TYPES (reinterpret_castPyArray_Descr*(dt.ptr())-type_num);
// PyArray_Descr *tp_descr;
// if (PyArray_DescrConverter(dtype.ptr(), tp_descr) != NPY_SUCCEED)
//   throw bp::error_already_set();
// return NPY_TYPES(tp_descr-type_num);
  }
}

templateint struct type_imap;
template
struct type_imapNPY_INT8 {
  typedef npy_int8 type;
};
template
struct type_imapNPY_INT16 {
  typedef npy_int16 type;
};
template
struct type_imapNPY_INT32 {
  typedef npy_int32 type;
};
template
struct type_imapNPY_INT64 {
  typedef npy_int64 type;
};
template
struct type_imapNPY_FLOAT32 {
  typedef float type;
};
template
struct type_imapNPY_FLOAT64 {
  typedef double type;
};
template
struct type_imapNPY_COMPLEX64 {
  typedef std::complexfloat type;
};
template
struct type_imapNPY_COMPLEX128 {
  typedef std::complexdouble type;
};


  template class IntType=uint32_t
  class pnseq_generator {
  public:
typedef IntType result_type;
typedef IntType input_type;

explicit  pnseq_generator(int _width) :  width (_width), count (0),
	 mask (
		   _width == std::numeric_limitsresult_type::digits ? ~(result_type(0)) : ~(result_type(-1LL)  _width))
{
  if (std::numeric_limitsresult_type::digits  width or width = 0)
	throw std::runtime_error (pnseq_generator::invalid width);
}

explicit pnseq_generator (int _width, int _count, result_type _cache):
  width 

Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-13 Thread Neal Becker
Neal Becker wrote:

 Neal Becker wrote:
 
 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.
 
 The c++ code could invoke via the python api, but that might be slower.  I'm
 just rambling here, I'd have to see the API to get some ideas.
 
 I think if I could just grab a long int from the underlying mersenne twister,
 through some c api?

Well, this at least appears to work - probably not the most efficient approach 
- 
calls the RandomState object via the python interface to get 4 bytes at a time:

int test1 (bp::object  rs) {
  bp::str bytes = call_methodbp::str (rs.ptr(), bytes, 4); // get 4 bytes
  return *reinterpret_castint* (PyString_AS_STRING (bytes.ptr()));
}

BOOST_PYTHON_MODULE (numpy_rand) {
  boost::numpy::initialize();

  def (test1, test1);
}


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Nathaniel Smith
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select between
 the two implementations (and possibly different algorithms in the future).
 If user does not provide a choice, we use the MT19937-32 by default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

+1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

And our own test suite is a serious offender in this regard, we have
tests that fail if you run the test suite in a non-default order...
  https://github.com/numpy/numpy/issues/347

I wonder if we dare deprecate it? The whole idea of a global random
state is just a bad one, like every other sort of global shared state.
But it's one that's deeply baked into a lot of scientific programmers
expectations about how APIs work...

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Nathaniel Smith
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select between
 the two implementations (and possibly different algorithms in the future).
 If user does not provide a choice, we use the MT19937-32 by default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

 +1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

 And our own test suite is a serious offender in this regard, we have
 tests that fail if you run the test suite in a non-default order...
   https://github.com/numpy/numpy/issues/347

 I wonder if we dare deprecate it? The whole idea of a global random
 state is just a bad one, like every other sort of global shared state.
 But it's one that's deeply baked into a lot of scientific programmers
 expectations about how APIs work...

(To be clear, by 'it' here I meant np.random.set_seed(), not the whole
np.random API. Probably. And by 'deprecate' I mean 'whine loudly in
some fashion when people use it', not 'rip out in a few releases'. I
think.)

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread josef . pktd
On Tue, Mar 12, 2013 at 5:27 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select 
 between
 the two implementations (and possibly different algorithms in the future).
 If user does not provide a choice, we use the MT19937-32 by default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

 +1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

Here is a recipe how to use it
http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052911.html

(I'm just drawing a random number as seed that I can save, instead of
the entire state.)

Josef


 And our own test suite is a serious offender in this regard, we have
 tests that fail if you run the test suite in a non-default order...
   https://github.com/numpy/numpy/issues/347

 I wonder if we dare deprecate it? The whole idea of a global random
 state is just a bad one, like every other sort of global shared state.
 But it's one that's deeply baked into a lot of scientific programmers
 expectations about how APIs work...

 (To be clear, by 'it' here I meant np.random.set_seed(), not the whole
 np.random API. Probably. And by 'deprecate' I mean 'whine loudly in
 some fashion when people use it', not 'rip out in a few releases'. I
 think.)

 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Neal Becker
Nathaniel Smith wrote:

 On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select
 between the two implementations (and possibly different algorithms in the
 future). If user does not provide a choice, we use the MT19937-32 by
 default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

 +1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

 And our own test suite is a serious offender in this regard, we have
 tests that fail if you run the test suite in a non-default order...
   https://github.com/numpy/numpy/issues/347

 I wonder if we dare deprecate it? The whole idea of a global random
 state is just a bad one, like every other sort of global shared state.
 But it's one that's deeply baked into a lot of scientific programmers
 expectations about how APIs work...
 
 (To be clear, by 'it' here I meant np.random.set_seed(), not the whole
 np.random API. Probably. And by 'deprecate' I mean 'whine loudly in
 some fashion when people use it', not 'rip out in a few releases'. I
 think.)
 
 -n

What do you mean that the idea of global shared state is a bad one?  How would 
you prefer the API to look?  An alternative is a stateless rng, where you have 
to pass it it's state on each invocation, which it would update and return.  I 
hope you're not advocating that. 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Robert Kern
On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbeck...@gmail.com wrote:
 Nathaniel Smith wrote:

 On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select
 between the two implementations (and possibly different algorithms in the
 future). If user does not provide a choice, we use the MT19937-32 by
 default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

 +1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

 And our own test suite is a serious offender in this regard, we have
 tests that fail if you run the test suite in a non-default order...
   https://github.com/numpy/numpy/issues/347

 I wonder if we dare deprecate it? The whole idea of a global random
 state is just a bad one, like every other sort of global shared state.
 But it's one that's deeply baked into a lot of scientific programmers
 expectations about how APIs work...

 (To be clear, by 'it' here I meant np.random.set_seed(), not the whole
 np.random API. Probably. And by 'deprecate' I mean 'whine loudly in
 some fashion when people use it', not 'rip out in a few releases'. I
 think.)

 -n

 What do you mean that the idea of global shared state is a bad one?

The words global shared state drives fear into the hearts of
experienced programmers everywhere, whatever the context. :-) It's
rarely a *good* idea.

 How would
 you prefer the API to look?

There are two current APIs:

1. Instantiate RandomState and call it's methods
2. Just call the functions in numpy.random

The latter has a shared global state. In fact, all of those
functions are just references to the methods on a shared global
RandomState instance.

We advocate using the former API. Note that it already exists. It was
the recommended API from day one. No one is recommending adding a new
API.

 An alternative is a stateless rng, where you have
 to pass it it's state on each invocation, which it would update and return.  I
 hope you're not advocating that.

No. This is a place where OOP solved the problem neatly.

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Neal Becker
I guess I talked to you about 100 years ago about sharing state between numpy 
rng and code I have in c++ that wraps boost::random.  So is there a C-api for 
this RandomState object I could use to call from c++?  Maybe I could do 
something with that.

The c++ code could invoke via the python api, but that might be slower.  I'm 
just rambling here, I'd have to see the API to get some ideas.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread Neal Becker
Neal Becker wrote:

 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.
 
 The c++ code could invoke via the python api, but that might be slower.  I'm
 just rambling here, I'd have to see the API to get some ideas.

I think if I could just grab a long int from the underlying mersenne twister, 
through some c api?

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-12 Thread josef . pktd
On Tue, Mar 12, 2013 at 7:10 PM, Robert Kern robert.k...@gmail.com wrote:
 On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbeck...@gmail.com wrote:
 Nathaniel Smith wrote:

 On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 My suggestion to overcome (1) and (2) is to allow the user to select
 between the two implementations (and possibly different algorithms in the
 future). If user does not provide a choice, we use the MT19937-32 by
 default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

 Most likely, the different PRNGs should be different subclasses of
 RandomState. The module-level convenience API should probably be left
 alone. If you need to control the PRNG that you are using, you really
 need to be passing around a RandomState instance and not relying on
 reseeding the shared global instance.

 +1

 Aside: I really wish we hadn't
 exposed `set_state()` in the module API. It's an attractive nuisance.

 And our own test suite is a serious offender in this regard, we have
 tests that fail if you run the test suite in a non-default order...
   https://github.com/numpy/numpy/issues/347

 I wonder if we dare deprecate it? The whole idea of a global random
 state is just a bad one, like every other sort of global shared state.
 But it's one that's deeply baked into a lot of scientific programmers
 expectations about how APIs work...

 (To be clear, by 'it' here I meant np.random.set_seed(), not the whole
 np.random API. Probably. And by 'deprecate' I mean 'whine loudly in
 some fashion when people use it', not 'rip out in a few releases'. I
 think.)

 -n

 What do you mean that the idea of global shared state is a bad one?

 The words global shared state drives fear into the hearts of
 experienced programmers everywhere, whatever the context. :-) It's
 rarely a *good* idea.

 How would
 you prefer the API to look?

 There are two current APIs:

 1. Instantiate RandomState and call it's methods
 2. Just call the functions in numpy.random

 The latter has a shared global state. In fact, all of those
 functions are just references to the methods on a shared global
 RandomState instance.

 We advocate using the former API. Note that it already exists. It was
 the recommended API from day one. No one is recommending adding a new
 API.

I never saw much advertising for the RandomState api, and until
recently wasn't sure why using the global random state function
np.random.norm, ... should be a bad idea.

Learning by example, and seeing almost all examples using the global
state, is not exactly conducive to figuring out that there is an
issue.

All of scipy.stats.distribution random numbers are using the global
random state. (I guess I should open a ticket.)

Josef


 An alternative is a stateless rng, where you have
 to pass it it's state on each invocation, which it would update and return.  
 I
 hope you're not advocating that.

 No. This is a place where OOP solved the problem neatly.

 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-11 Thread Robert Kern
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote:
 Hi all,

 I am redirecting a discussion on github issue tracker here.  My original
 post (https://github.com/numpy/numpy/issues/3137):

 The current implementation of the RNG seems to be MT19937-32. Since 64-bit
 machines are common nowadays, I am suggesting adding or upgrading to
 MT19937-64.  Thoughts?

 Let me start by answering to njsmith's comments on the issue tracker:

 Would it be faster?


 Although I have not benchmarked the 64-bit implementation, it is likely that
 it will be faster on a 64-bit machine since the number of iteration
 (controlled by NN and MM in the reference implementation
 http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c)
 is reduced by half.  In addition, each generation in the 64-bit
 implementation produces a 64-bit random int which can be used to generate
 double precision random number.  Unlike the 32-bit implementation which
 requires generating a pair of 32-bit random int.

From the last time this was brought up, it looks like getting a single
64-bit integer out from MT19937-64 takes about the same amount of time
as getting a single 32-bit integer from MT19937-32, perhaps a little
slower, even on a 64-bit machine.

http://comments.gmane.org/gmane.comp.python.numeric.general/27773

So getting a single double would be not quite twice as fast.

 But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit
 instructions; thus, it is likely to be slower.  (1)

 Use less memory?


 The amount of memory use will remain the same.  The size of the RNG state is
 the same.

 Provide higher quality randomness?


 My naive answer is that 32-bit and 64-bit implementation have the same
 2^19937-1 period. Need to do some research and experiments.

 Would it change the output of this program: import numpy
 numpy.random.seed(0) print numpy.random.random() ?


 Unfortunately, yes.  The 64-bit implementation generates a different random
 number sequence with the same seed. (2)


 My suggestion to overcome (1) and (2) is to allow the user to select between
 the two implementations (and possibly different algorithms in the future).
 If user does not provide a choice, we use the MT19937-32 by default.

 numpy.random.set_state(MT19937_64, …)   # choose the 64-bit
 implementation

Most likely, the different PRNGs should be different subclasses of
RandomState. The module-level convenience API should probably be left
alone. If you need to control the PRNG that you are using, you really
need to be passing around a RandomState instance and not relying on
reseeding the shared global instance. Aside: I really wish we hadn't
exposed `set_state()` in the module API. It's an attractive nuisance.

There is some low-level C work that needs to be done to allow the
non-uniform distributions to be shared between implementations of the
core uniform PRNG, but that's the same no matter how you organize the
upper layer.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-10 Thread Siu Kwan Lam
Hi all,

I am redirecting a discussion on github issue tracker here.  My original post 
(https://github.com/numpy/numpy/issues/3137):

The current implementation of the RNG seems to be MT19937-32. Since 64-bit 
machines are common nowadays, I am suggesting adding or upgrading to 
MT19937-64.  Thoughts?

Let me start by answering to njsmith's comments on the issue tracker:

 Would it be faster?

Although I have not benchmarked the 64-bit implementation, it is likely that it 
will be faster on a 64-bit machine since the number of iteration (controlled by 
NN and MM in the reference implementation 
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) 
is reduced by half.  In addition, each generation in the 64-bit implementation 
produces a 64-bit random int which can be used to generate double precision 
random number.  Unlike the 32-bit implementation which requires generating a 
pair of 32-bit random int.

But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit 
instructions; thus, it is likely to be slower.  (1)

 Use less memory?

The amount of memory use will remain the same.  The size of the RNG state is 
the same.

 Provide higher quality randomness?


My naive answer is that 32-bit and 64-bit implementation have the same 
2^19937-1 period.  Need to do some research and experiments.

 Would it change the output of this program:
   import numpy
   numpy.random.seed(0)
   print numpy.random.random()
 ?

Unfortunately, yes.  The 64-bit implementation generates a different random 
number sequence with the same seed. (2)


My suggestion to overcome (1) and (2) is to allow the user to select between 
the two implementations (and possibly different algorithms in the future).  If 
user does not provide a choice, we use the MT19937-32 by default.

numpy.random.set_state(MT19937_64, …)   # choose the 64-bit 
implementation

Thoughts?

Best,
Siu 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion