Re: [Numpy-discussion] Scipy 2016 attending
Yup.

On Wed, May 18, 2016 at 5:04 PM, Steve Waterbury wrote:
> Me 3! ;)
>
> Steve
>
> On 05/18/2016 06:03 PM, Nathaniel Smith wrote:
>> Me too.
>>
>> On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote:
>>> I'll be there.
>>>
>>> -CHB
>>>
>>> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
>>>> Hi All,
>>>>
>>>> Out of curiosity, who all here intends to be at Scipy 2016?
>>>>
>>>> Chuck

--
Ryan May

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Scipy 2016 attending
Me 3! ;)

Steve

On 05/18/2016 06:03 PM, Nathaniel Smith wrote:
> Me too.
>
> On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote:
>> I'll be there.
>>
>> -CHB
>>
>> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
>>> Hi All,
>>>
>>> Out of curiosity, who all here intends to be at Scipy 2016?
>>>
>>> Chuck
Re: [Numpy-discussion] Scipy 2016 attending
Me too.

On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote:
> I'll be there.
>
> -CHB
>
> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
>>
>> Hi All,
>>
>> Out of curiosity, who all here intends to be at Scipy 2016?
>>
>> Chuck

--
Nathaniel J. Smith -- https://vorpus.org
Re: [Numpy-discussion] Scipy 2016 attending
I'll be there.

-CHB

On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
> Hi All,
>
> Out of curiosity, who all here intends to be at Scipy 2016?
>
> Chuck

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
[Numpy-discussion] Scipy 2016 attending
Hi All,

Out of curiosity, who all here intends to be at Scipy 2016?

Chuck
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 5:07 AM, Robert Kern wrote:
> On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote:
>>
>> On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote:
>> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote:
>> >>
>> >> On May 17, 2016 1:50 AM, "Robert Kern" wrote:
>> >> >
>> >> [...]
>> >> > What you want is a function that returns many RandomState objects
>> >> > that are hopefully spread around the MT19937 space enough that
>> >> > they are essentially independent (in the absence of true
>> >> > jumpahead). The better implementation of such a function would
>> >> > look something like this:
>> >> >
>> >> > def spread_out_prngs(n, root_prng=None):
>> >> >     if root_prng is None:
>> >> >         root_prng = np.random
>> >> >     elif not isinstance(root_prng, np.random.RandomState):
>> >> >         root_prng = np.random.RandomState(root_prng)
>> >> >     sprouted_prngs = []
>> >> >     for i in range(n):
>> >> >         # dtype=np.uint32 under 1.11
>> >> >         seed_array = root_prng.randint(1<<32, size=624)
>> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
>> >> >     return sprouted_prngs
>> >>
>> >> Maybe a nice way to encapsulate this in the RandomState interface
>> >> would be a method RandomState.random_state() that generates and
>> >> returns a new child RandomState.
>> >
>> > I disagree. This is a workaround in the absence of proper jumpahead
>> > or guaranteed-independent streams. I would not encourage it.
>> >
>> >> > Internally, this generates seed arrays of about the size of the
>> >> > MT19937 state to make sure that you can access more of the state
>> >> > space. That will at least make the chance of collision tiny. And
>> >> > it can be easily rewritten to take advantage of one of the newer
>> >> > PRNGs that have true independent streams:
>> >> >
>> >> > https://github.com/bashtage/ng-numpy-randomstate
>> >>
>> >> ...
>> >> But unfortunately I'm not sure how to make my interface suggestion
>> >> above work on top of one of these RNGs, because for
>> >> RandomState.random_state you really want a tree of independent RNGs
>> >> and the fancy new PRNGs only provide a single flat namespace :-/.
>> >> And even more annoyingly, the tree API is actually a nicer API,
>> >> because with a flat namespace you have to know up front about all
>> >> possible RNGs your code will use, which is an unfortunate global
>> >> coupling that makes it difficult to compose programs out of
>> >> independent pieces, while the RandomState.random_state approach
>> >> composes beautifully. Maybe there's some clever way to allocate a
>> >> 64-bit namespace to make it look tree-like? I'm not sure 64 bits is
>> >> really enough...
>> >
>> > MT19937 doesn't have a "tree" any more than the others. It's the same
>> > flat state space. You are just getting the illusion of a tree by
>> > hoping that you never collide. You ought to think about precisely the
>> > same global coupling issues with MT19937 as you do with
>> > guaranteed-independent streams. Hope-and-prayer isn't really a
>> > substitute for properly engineering your problem. It's just a moral
>> > hazard to promote this method to the main API.
>>
>> Nonsense.
>>
>> If your definition of "hope and prayer" includes assuming that we
>> won't encounter a random collision in a 2**19937 state space, then
>> literally all engineering is hope-and-prayer. A collision could
>> happen, but if it does it's overwhelmingly more likely to happen
>> because of a flaw in the mathematical analysis, or a bug in the
>> implementation, or because random quantum fluctuations caused you and
>> your program to suddenly be transported to a parallel world where 1 +
>> 1 = 1, than that you just got unlucky with your random state. And all
>> of these hazards apply equally to both MT19937 and more modern PRNGs.
>
> Granted.
>
>> ...anyway, the real reason I'm a bit grumpy is because there are solid
>> engineering reasons why users *want* this API,
>
> I remain unconvinced on this mark. Grumpily.

Sorry for getting grumpy :-). The engineering reasons seem pretty obvious
to me though? If you have any use case for independent streams at all,
and you're writing code that's intended to live inside a library's
abstraction barrier, then you need some way to choose your streams to
avoid colliding with arbitrary other code that the end-user might
assemble alongside yours as part of their final program. So AFAICT you
have two options: either you need a "tree-style" API for allocating these
streams, or else you need to add some explicit API to your library that
lets the end-user control in detail which streams you use. Both are
possible, but the latter is obviously undesirable if you can avoid it,
since it breaks the abstraction barrier, making your library more
complicated to use and harder to evolve.

>> so whether or not it
>> turns out to be possible I think we should at l
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 6:20 PM, wrote:
>
> On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote:
>>
>> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
>> >>
>> >> > ...anyway, the real reason I'm a bit grumpy is because there are
>> >> > solid engineering reasons why users *want* this API,
>> >
>> > Honestly, I am lost in the math -- but like any good engineer, I want
>> > to accomplish something anyway :-) I trust you guys to get this right
>> > -- or at least document what's "wrong" with it.
>> >
>> > But, if I'm reading the use case that started all this correctly, it
>> > closely matches my use-case. That is, I have a complex model with
>> > multiple independent "random" processes. And we want to be able to
>> > reproduce simulations EXACTLY -- our users get confused when the
>> > results are "different" even if in a statistically insignificant way.
>> >
>> > At the moment we are using one RNG, with one seed for everything. So
>> > we get reproducible results, but if one thing is changed, then the
>> > entire simulation is different -- which is OK, but it would be nicer
>> > to have each process using its own RNG stream with its own seed.
>> > However, it matters not one whit if those seeds are independent --
>> > the processes are different, you'd never notice if they were using
>> > the same PRN stream -- because they are used differently. So a
>> > "fairly low probability of a clash" would be totally fine.
>>
>> Well, the main question is: do you need to be able to spawn dependent
>> streams at arbitrary points to an arbitrary depth without coordination
>> between processes? The necessity for multiple independent streams per
>> se is not contentious.
>
> I'm similar to Chris, and didn't try to figure out the details of what
> you are talking about.
>
> However, if there are functions getting into numpy that help in using a
> best practice even if it's not bulletproof, then it's still better than
> homemade approaches.
> If it gets in soon, then we can use it in a few years (given dependency
> lag).
> At that point there should be more distributed, nested simulation-based
> algorithms where we don't know in advance how far we have to go to get
> reliable numbers or convergence.
>
> (But I don't see anything like that right now.)

Current best practice is to use PRNGs with settable streams (or fixed
jumpahead for those PRNGs cursed to not have settable streams but blessed
to have super-long periods). The way to get those into numpy is to help
Kevin Sheppard finish:

https://github.com/bashtage/ng-numpy-randomstate

He's done nearly all of the hard work already.

--
Robert Kern
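For readers on a later NumPy than the one discussed here, "settable streams" and "fixed jumpahead" might look roughly like the sketch below. It assumes NumPy 1.17 or newer, where `numpy.random.SeedSequence`, `PCG64`, and `Generator` are available; those names come from later NumPy releases, not from this thread, and the helper names `make_streams` and `make_jumped_streams` are made up for illustration.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence


def make_streams(root_seed, n):
    """Derive n child streams from one root seed (hypothetical helper)."""
    # SeedSequence.spawn() hands out child seed sequences derived from the root.
    child_seqs = SeedSequence(root_seed).spawn(n)
    return [Generator(PCG64(seq)) for seq in child_seqs]


def make_jumped_streams(root_seed, n):
    """Derive n streams by repeated fixed jumpahead (hypothetical helper)."""
    bit_gen = PCG64(root_seed)
    streams = []
    for _ in range(n):
        streams.append(Generator(bit_gen))
        # jumped() returns a new bit generator advanced by one large fixed jump;
        # the one already handed out is left untouched.
        bit_gen = bit_gen.jumped()
    return streams


streams = make_streams(20160518, 3)
jumped_streams = make_jumped_streams(20160518, 3)
```

Either helper gives the same streams every time for the same root seed, so a run can be repeated exactly. Which of the two suits a given program depends on whether the number of streams is known up front.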
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote:
> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
> >>
> >> > ...anyway, the real reason I'm a bit grumpy is because there are
> >> > solid engineering reasons why users *want* this API,
> >
> > Honestly, I am lost in the math -- but like any good engineer, I want
> > to accomplish something anyway :-) I trust you guys to get this right
> > -- or at least document what's "wrong" with it.
> >
> > But, if I'm reading the use case that started all this correctly, it
> > closely matches my use-case. That is, I have a complex model with
> > multiple independent "random" processes. And we want to be able to
> > reproduce simulations EXACTLY -- our users get confused when the
> > results are "different" even if in a statistically insignificant way.
> >
> > At the moment we are using one RNG, with one seed for everything. So
> > we get reproducible results, but if one thing is changed, then the
> > entire simulation is different -- which is OK, but it would be nicer
> > to have each process using its own RNG stream with its own seed.
> > However, it matters not one whit if those seeds are independent --
> > the processes are different, you'd never notice if they were using
> > the same PRN stream -- because they are used differently. So a
> > "fairly low probability of a clash" would be totally fine.
>
> Well, the main question is: do you need to be able to spawn dependent
> streams at arbitrary points to an arbitrary depth without coordination
> between processes? The necessity for multiple independent streams per
> se is not contentious.

I'm similar to Chris, and didn't try to figure out the details of what
you are talking about.

However, if there are functions getting into numpy that help in using a
best practice even if it's not bulletproof, then it's still better than
homemade approaches.
If it gets in soon, then we can use it in a few years (given dependency
lag).
At that point there should be more distributed, nested simulation-based
algorithms where we don't know in advance how far we have to go to get
reliable numbers or convergence.

(But I don't see anything like that right now.)

Josef

> --
> Robert Kern
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
>>
>> > ...anyway, the real reason I'm a bit grumpy is because there are
>> > solid engineering reasons why users *want* this API,
>
> Honestly, I am lost in the math -- but like any good engineer, I want
> to accomplish something anyway :-) I trust you guys to get this right
> -- or at least document what's "wrong" with it.
>
> But, if I'm reading the use case that started all this correctly, it
> closely matches my use-case. That is, I have a complex model with
> multiple independent "random" processes. And we want to be able to
> reproduce simulations EXACTLY -- our users get confused when the
> results are "different" even if in a statistically insignificant way.
>
> At the moment we are using one RNG, with one seed for everything. So we
> get reproducible results, but if one thing is changed, then the entire
> simulation is different -- which is OK, but it would be nicer to have
> each process using its own RNG stream with its own seed. However, it
> matters not one whit if those seeds are independent -- the processes
> are different, you'd never notice if they were using the same PRN
> stream -- because they are used differently. So a "fairly low
> probability of a clash" would be totally fine.

Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se
is not contentious.

--
Robert Kern
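As a rough illustration of what spawning "to an arbitrary depth" looks like with the `spread_out_prngs` utility posted in this thread, here is a runnable sketch. Passing `dtype=np.uint32` assumes NumPy 1.11 or later, and the nested usage at the bottom is an illustration rather than something proposed in the thread. Nothing here guarantees independent streams; it only makes a collision unlikely.

```python
import numpy as np


def spread_out_prngs(n, root_prng=None):
    """Return n RandomState objects seeded from draws of root_prng."""
    if root_prng is None:
        # Fall back to the global generator, as in the original post.
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        # Treat anything else as a seed for a new root generator.
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for _ in range(n):
        # 624 32-bit words is about the size of the MT19937 state.
        seed_array = root_prng.randint(0, 1 << 32, size=624, dtype=np.uint32)
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs


# Spawn two children from a seeded root, then two grandchildren from the
# first child, with no coordination beyond holding the parent object.
children = spread_out_prngs(2, root_prng=42)
grandchildren = spread_out_prngs(2, root_prng=children[0])
```

Because each level only needs its parent `RandomState`, a library can spawn its own children without knowing what the rest of the program does. The cost is that non-collision rests on probability rather than on a guarantee.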
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
>
> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> > engineering reasons why users *want* this API,
>

Honestly, I am lost in the math -- but like any good engineer, I want to
accomplish something anyway :-) I trust you guys to get this right -- or
at least document what's "wrong" with it.

But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with
multiple independent "random" processes. And we want to be able to
reproduce simulations EXACTLY -- our users get confused when the results
are "different" even if in a statistically insignificant way.

At the moment we are using one RNG, with one seed for everything. So we
get reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have
each process using its own RNG stream with its own seed. However, it
matters not one whit if those seeds are independent -- the processes are
different, you'd never notice if they were using the same PRN stream --
because they are used differently. So a "fairly low probability of a
clash" would be totally fine.

Granted, in a Monte Carlo simulation, it could be disastrous... :-)

I guess the point is -- do something reasonable, and document its
limitations, and we're all fine :-)

And thanks for giving your attention to this.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
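The "one RNG stream per process" idea described above can be sketched as follows. The process names (`wind`, `spill`), the seeds, and the `run_simulation` helper are hypothetical and only serve the illustration; the seeds are picked by hand, so this shows the reproducibility benefit and makes no claim about statistical independence.

```python
import numpy as np


def run_simulation(seeds, extra_wind_draws=0):
    """Run two toy 'processes', each with its own seeded RandomState."""
    wind_rng = np.random.RandomState(seeds["wind"])
    spill_rng = np.random.RandomState(seeds["spill"])
    # Simulate a code change that makes the wind process use more draws.
    wind_rng.uniform(size=extra_wind_draws)
    wind = wind_rng.normal(size=5)
    spill = spill_rng.normal(size=5)
    return wind, spill


seeds = {"wind": 12345, "spill": 67890}
wind_a, spill_a = run_simulation(seeds)
wind_b, spill_b = run_simulation(seeds, extra_wind_draws=3)

# The spill results are unaffected by the change to the wind process.
print(np.array_equal(spill_a, spill_b))
```

With a single shared generator, the extra draws in one process would shift every later draw in the other process as well. Separate generators confine the change to the process that was modified.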
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote:
>
> On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote:
> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote:
> >>
> >> On May 17, 2016 1:50 AM, "Robert Kern" wrote:
> >> >
> >> [...]
> >> > What you want is a function that returns many RandomState objects
> >> > that are hopefully spread around the MT19937 space enough that they
> >> > are essentially independent (in the absence of true jumpahead). The
> >> > better implementation of such a function would look something like
> >> > this:
> >> >
> >> > def spread_out_prngs(n, root_prng=None):
> >> >     if root_prng is None:
> >> >         root_prng = np.random
> >> >     elif not isinstance(root_prng, np.random.RandomState):
> >> >         root_prng = np.random.RandomState(root_prng)
> >> >     sprouted_prngs = []
> >> >     for i in range(n):
> >> >         # dtype=np.uint32 under 1.11
> >> >         seed_array = root_prng.randint(1<<32, size=624)
> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >> >     return sprouted_prngs
> >>
> >> Maybe a nice way to encapsulate this in the RandomState interface
> >> would be a method RandomState.random_state() that generates and
> >> returns a new child RandomState.
> >
> > I disagree. This is a workaround in the absence of proper jumpahead or
> > guaranteed-independent streams. I would not encourage it.
> >
> >> > Internally, this generates seed arrays of about the size of the
> >> > MT19937 state to make sure that you can access more of the state
> >> > space. That will at least make the chance of collision tiny. And it
> >> > can be easily rewritten to take advantage of one of the newer PRNGs
> >> > that have true independent streams:
> >> >
> >> > https://github.com/bashtage/ng-numpy-randomstate
> >>
> >> ...
> >> But unfortunately I'm not sure how to make my interface suggestion
> >> above work on top of one of these RNGs, because for
> >> RandomState.random_state you really want a tree of independent RNGs
> >> and the fancy new PRNGs only provide a single flat namespace :-/. And
> >> even more annoyingly, the tree API is actually a nicer API, because
> >> with a flat namespace you have to know up front about all possible
> >> RNGs your code will use, which is an unfortunate global coupling that
> >> makes it difficult to compose programs out of independent pieces,
> >> while the RandomState.random_state approach composes beautifully.
> >> Maybe there's some clever way to allocate a 64-bit namespace to make
> >> it look tree-like? I'm not sure 64 bits is really enough...
> >
> > MT19937 doesn't have a "tree" any more than the others. It's the same
> > flat state space. You are just getting the illusion of a tree by
> > hoping that you never collide. You ought to think about precisely the
> > same global coupling issues with MT19937 as you do with
> > guaranteed-independent streams. Hope-and-prayer isn't really a
> > substitute for properly engineering your problem. It's just a moral
> > hazard to promote this method to the main API.
>
> Nonsense.
>
> If your definition of "hope and prayer" includes assuming that we
> won't encounter a random collision in a 2**19937 state space, then
> literally all engineering is hope-and-prayer. A collision could
> happen, but if it does it's overwhelmingly more likely to happen
> because of a flaw in the mathematical analysis, or a bug in the
> implementation, or because random quantum fluctuations caused you and
> your program to suddenly be transported to a parallel world where 1 +
> 1 = 1, than that you just got unlucky with your random state. And all
> of these hazards apply equally to both MT19937 and more modern PRNGs.

Granted.
> ...anyway, the real reason I'm a bit grumpy is because there are solid
> engineering reasons why users *want* this API,

I remain unconvinced on this mark. Grumpily.

> so whether or not it
> turns out to be possible I think we should at least be allowed to have
> a discussion about whether there's some way to give it to them.

I'm not shutting down discussion of the option. I *implemented* the
option. I think that discussing whether it should be part of the main API
is premature. There probably ought to be a paper or three out there
supporting its safety and utility first. Let the utility function version
flourish first.

> It's
> not even 100% out of the question that we conclude that existing PRNGs
> are buggy because they don't take this use case into account -- it
> would be far from the first time that numpy found itself going beyond
> the limits of older numerical tools that weren't designed to build the
> kind of large composable systems that numpy gets used for.
>
> MT19937's state space is large enough that you could explicitly encode
> a "tree seed" into it, even if you don't trust the laws of probability
> -- e.g., you start with a RandomState with id [], then its children
> have id [0], [1], [2], ..., and their children have ids [0, 0], [0,
> 1], ..., [1, 0], ..., and yo