Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Robert Kern wrote: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. I did find a way to do this, and the results are good enough. Timing is quite comparable to my pure c++ implementation. I used rk_ulong from mtrand.so. I also tried using rk_fill, but it was a bit slower. The boost::python c++ code is attached, for posterity. #include boost/python.hpp // Include this AFTER Python.h; in projects with multiple source files, you may have to // #define some things first (see the numpy C-API documentation). #include numpy/arrayobject.h #include boost/numpy/dtype.hpp #include boost/python/class.hpp #include boost/python/init.hpp #include boost/python/module.hpp #include boost/python/def.hpp #include boost/python/args.hpp #include boost/python/str.hpp #include boost/python/call.hpp #include boost/numpy.hpp #include ndarray/Array.h #include ndarray/bp/auto.h #include inttypes.h #include stdexcept using namespace boost::python; namespace bp=boost::python; namespace nd=ndarray; #define RK_STATE_LEN 624 typedef struct rk_state_ { unsigned long key[RK_STATE_LEN]; int pos; int has_gauss; /* !=0: gauss contains a gaussian deviate */ double gauss; /* The rk_state structure has been extended to store the following * information for the binomial generator. If the input values of n or p * are different than nsave and psave, then the other parameters will be * recomputed. RTK 2005-09-02 */ int has_binomial; /* !=0: following parameters initialized for binomial */ double psave; long nsave; double r; double q; double fm; long m; double p1; double xm; double xl; double xr; double c; double laml; double lamr; double p2; double p3; double p4; } rk_state; struct RandomState { PyObject_HEAD rk_state *internal_state; }; extern C void rk_fill(void *buffer, size_t size, rk_state *state); extern C unsigned long rk_ulong(rk_state *state); inline NPY_TYPES dtype_to_type (bp::object const dtype, NPY_TYPES _default) { if (dtype == bp::object()) return _default; else { boost::numpy::dtype dt (dtype); return NPY_TYPES (reinterpret_castPyArray_Descr*(dt.ptr())-type_num); // PyArray_Descr *tp_descr; // if (PyArray_DescrConverter(dtype.ptr(), tp_descr) != NPY_SUCCEED) // throw bp::error_already_set(); // return NPY_TYPES(tp_descr-type_num); } } templateint struct type_imap; template struct type_imapNPY_INT8 { typedef npy_int8 type; }; template struct type_imapNPY_INT16 { typedef npy_int16 type; }; template struct type_imapNPY_INT32 { typedef npy_int32 type; }; template struct type_imapNPY_INT64 { typedef npy_int64 type; }; template struct type_imapNPY_FLOAT32 { typedef float type; }; template struct type_imapNPY_FLOAT64 { typedef double type; }; template struct type_imapNPY_COMPLEX64 { typedef std::complexfloat type; }; template struct type_imapNPY_COMPLEX128 { typedef std::complexdouble type; }; template class IntType=uint64_t class pnseq_generator { public: typedef IntType result_type; typedef IntType input_type; explicit pnseq_generator(int _width) : width (_width), count (0), mask ( _width == std::numeric_limitsresult_type::digits ? ~(result_type(0)) : ~(result_type(-1LL) _width)) { if (std::numeric_limitsresult_type::digits width or width = 0) throw std::runtime_error (pnseq_generator::invalid width); } explicit pnseq_generator (int _width, int _count, result_type _cache): width (_width), count (_count), cache (_cache), mask (~(result_type(-1LL) _width)) {} templateclass Engine result_type operator() (Engine eng) { // BOOST_STATIC_ASSERT (sizeof (typename Engine::result_type) = width); if (count = width-1) { cache = eng(); count = std::min (std::numeric_limitstypename Engine::result_type::digits, std::numeric_limitsresult_type::digits); } result_type bits = cache mask; cache = width; count -= width; return bits; } // private: const int width; int count; result_type cache; const result_type mask; }; templatetypename out_type=uint64_t struct engine { typedef out_type result_type; bp::object rs; // RandomState object engine (bp::object const _rs) : rs (_rs) {} result_type
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Robert Kern wrote: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem as I see it is to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export. The only way to do it is in c++ code: 1. import numpy.random 2. get RandomState class 3. call it to create RandomState instance 4. get the ob_type pointer. Pretty ugly: object mod = object (handle (borrowed((PyImport_ImportModule(numpy.random); object rs_obj = mod.attr(RandomState); object rs_inst = callobject (rs_obj.ptr(), 0); RandomStateTypeObj = rs_inst.ptr()-ob_type; ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem as I see it is to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export. The only way to do it is in c++ code: 1. import numpy.random 2. get RandomState class 3. call it to create RandomState instance 4. get the ob_type pointer. Pretty ugly: object mod = object (handle (borrowed((PyImport_ImportModule(numpy.random); object rs_obj = mod.attr(RandomState); object rs_inst = callobject (rs_obj.ptr(), 0); RandomStateTypeObj = rs_inst.ptr()-ob_type; PyObject_IsInstance() should be sufficient. http://docs.python.org/2/c-api/object.html#PyObject_IsInstance -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Robert Kern wrote: On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbeck...@gmail.com wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem as I see it is to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export. The only way to do it is in c++ code: 1. import numpy.random 2. get RandomState class 3. call it to create RandomState instance 4. get the ob_type pointer. Pretty ugly: object mod = object (handle (borrowed((PyImport_ImportModule(numpy.random); object rs_obj = mod.attr(RandomState); object rs_inst = callobject (rs_obj.ptr(), 0); RandomStateTypeObj = rs_inst.ptr()-ob_type; PyObject_IsInstance() should be sufficient. http://docs.python.org/2/c-api/object.html#PyObject_IsInstance -- Robert Kern Thanks! For the record, an updated version attached. #include boost/python.hpp // Include this AFTER Python.h; in projects with multiple source files, you may have to // #define some things first (see the numpy C-API documentation). #include numpy/arrayobject.h #include boost/numpy/dtype.hpp #include boost/python/class.hpp #include boost/python/init.hpp #include boost/python/module.hpp #include boost/python/def.hpp #include boost/python/args.hpp #include boost/python/str.hpp #include boost/python/call.hpp #include boost/numpy.hpp #include ndarray/Array.h #include ndarray/bp/auto.h #include inttypes.h #include stdexcept using namespace boost::python; namespace bp=boost::python; namespace nd=ndarray; #define RK_STATE_LEN 624 typedef struct rk_state_ { unsigned long key[RK_STATE_LEN]; int pos; int has_gauss; /* !=0: gauss contains a gaussian deviate */ double gauss; /* The rk_state structure has been extended to store the following * information for the binomial generator. If the input values of n or p * are different than nsave and psave, then the other parameters will be * recomputed. RTK 2005-09-02 */ int has_binomial; /* !=0: following parameters initialized for binomial */ double psave; long nsave; double r; double q; double fm; long m; double p1; double xm; double xl; double xr; double c; double laml; double lamr; double p2; double p3; double p4; } rk_state; struct RandomState { PyObject_HEAD rk_state *internal_state; }; extern C void rk_fill(void *buffer, size_t size, rk_state *state); extern C unsigned long rk_ulong(rk_state *state); extern C unsigned long rk_random(rk_state *state); inline NPY_TYPES dtype_to_type (bp::object const dtype, NPY_TYPES _default) { if (dtype == bp::object()) return _default; else { boost::numpy::dtype dt (dtype); return NPY_TYPES (reinterpret_castPyArray_Descr*(dt.ptr())-type_num); // PyArray_Descr *tp_descr; // if (PyArray_DescrConverter(dtype.ptr(), tp_descr) != NPY_SUCCEED) // throw bp::error_already_set(); // return NPY_TYPES(tp_descr-type_num); } } templateint struct type_imap; template struct type_imapNPY_INT8 { typedef npy_int8 type; }; template struct type_imapNPY_INT16 { typedef npy_int16 type; }; template struct type_imapNPY_INT32 { typedef npy_int32 type; }; template struct type_imapNPY_INT64 { typedef npy_int64 type; }; template struct type_imapNPY_FLOAT32 { typedef float type; }; template struct type_imapNPY_FLOAT64 { typedef double type; }; template struct type_imapNPY_COMPLEX64 { typedef std::complexfloat type; }; template struct type_imapNPY_COMPLEX128 { typedef std::complexdouble type; }; template class IntType=uint32_t class pnseq_generator { public: typedef IntType result_type; typedef IntType input_type; explicit pnseq_generator(int _width) : width (_width), count (0), mask ( _width == std::numeric_limitsresult_type::digits ? ~(result_type(0)) : ~(result_type(-1LL) _width)) { if (std::numeric_limitsresult_type::digits width or width = 0) throw std::runtime_error (pnseq_generator::invalid width); } explicit pnseq_generator (int _width, int _count, result_type _cache): width
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Neal Becker wrote: Neal Becker wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas. I think if I could just grab a long int from the underlying mersenne twister, through some c api? Well, this at least appears to work - probably not the most efficient approach - calls the RandomState object via the python interface to get 4 bytes at a time: int test1 (bp::object rs) { bp::str bytes = call_methodbp::str (rs.ptr(), bytes, 4); // get 4 bytes return *reinterpret_castint* (PyString_AS_STRING (bytes.ptr())); } BOOST_PYTHON_MODULE (numpy_rand) { boost::numpy::initialize(); def (test1, test1); } ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Tue, Mar 12, 2013 at 5:27 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. Here is a recipe how to use it http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052911.html (I'm just drawing a random number as seed that I can save, instead of the entire state.) Josef And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Nathaniel Smith wrote: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) -n What do you mean that the idea of global shared state is a bad one? How would you prefer the API to look? An alternative is a stateless rng, where you have to pass it it's state on each invocation, which it would update and return. I hope you're not advocating that. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbeck...@gmail.com wrote: Nathaniel Smith wrote: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) -n What do you mean that the idea of global shared state is a bad one? The words global shared state drives fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea. How would you prefer the API to look? There are two current APIs: 1. Instantiate RandomState and call it's methods 2. Just call the functions in numpy.random The latter has a shared global state. In fact, all of those functions are just references to the methods on a shared global RandomState instance. We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API. An alternative is a stateless rng, where you have to pass it it's state on each invocation, which it would update and return. I hope you're not advocating that. No. This is a place where OOP solved the problem neatly. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Neal Becker wrote: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas. I think if I could just grab a long int from the underlying mersenne twister, through some c api? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Tue, Mar 12, 2013 at 7:10 PM, Robert Kern robert.k...@gmail.com wrote: On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbeck...@gmail.com wrote: Nathaniel Smith wrote: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. +1 Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) -n What do you mean that the idea of global shared state is a bad one? The words global shared state drives fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea. How would you prefer the API to look? There are two current APIs: 1. Instantiate RandomState and call it's methods 2. Just call the functions in numpy.random The latter has a shared global state. In fact, all of those functions are just references to the methods on a shared global RandomState instance. We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API. I never saw much advertising for the RandomState api, and until recently wasn't sure why using the global random state function np.random.norm, ... should be a bad idea. Learning by example, and seeing almost all examples using the global state, is not exactly conducive to figuring out that there is an issue. All of scipy.stats.distribution random numbers are using the global random state. (I guess I should open a ticket.) Josef An alternative is a stateless rng, where you have to pass it it's state on each invocation, which it would update and return. I hope you're not advocating that. No. This is a place where OOP solved the problem neatly. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam s...@continuum.io wrote: Hi all, I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137): The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts? Let me start by answering to njsmith's comments on the issue tracker: Would it be faster? Although I have not benchmarked the 64-bit implementation, it is likely that it will be faster on a 64-bit machine since the number of iteration (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. In addition, each generation in the 64-bit implementation produces a 64-bit random int which can be used to generate double precision random number. Unlike the 32-bit implementation which requires generating a pair of 32-bit random int. From the last time this was brought up, it looks like getting a single 64-bit integer out from MT19937-64 takes about the same amount of time as getting a single 32-bit integer from MT19937-32, perhaps a little slower, even on a 64-bit machine. http://comments.gmane.org/gmane.comp.python.numeric.general/27773 So getting a single double would be not quite twice as fast. But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1) Use less memory? The amount of memory use will remain the same. The size of the RNG state is the same. Provide higher quality randomness? My naive answer is that 32-bit and 64-bit implementation have the same 2^19937-1 period. Need to do some research and experiments. Would it change the output of this program: import numpy numpy.random.seed(0) print numpy.random.random() ? Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2) My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. There is some low-level C work that needs to be done to allow the non-uniform distributions to be shared between implementations of the core uniform PRNG, but that's the same no matter how you organize the upper layer. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Adopt Mersenne Twister 64bit?
Hi all, I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137): The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts? Let me start by answering to njsmith's comments on the issue tracker: Would it be faster? Although I have not benchmarked the 64-bit implementation, it is likely that it will be faster on a 64-bit machine since the number of iteration (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. In addition, each generation in the 64-bit implementation produces a 64-bit random int which can be used to generate double precision random number. Unlike the 32-bit implementation which requires generating a pair of 32-bit random int. But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1) Use less memory? The amount of memory use will remain the same. The size of the RNG state is the same. Provide higher quality randomness? My naive answer is that 32-bit and 64-bit implementation have the same 2^19937-1 period. Need to do some research and experiments. Would it change the output of this program: import numpy numpy.random.seed(0) print numpy.random.random() ? Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2) My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If user does not provide a choice, we use the MT19937-32 by default. numpy.random.set_state(MT19937_64, …) # choose the 64-bit implementation Thoughts? Best, Siu ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion