Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design
On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote: The missing data thread has gotten a bit heated, and after sitting down with Travis to discuss the issues a bit, we've concluded that it would be nice to do a call with everyone who's interested in the discussion with better communication bandwidth. There are lots of good ideas out there, and it is very easy for things to get lost when we're just emailing. Getting on the phone should provide a more effective way to ensure everyone is properly being heard. We're proposing to set up a GotoMeeting call at 4pm CST today. Please respond if you can make it and your level of interest. I've created a Doodle where you can indicate your availability if 4pm today is too short notice, and we should schedule for a different time: http://www.doodle.com/eu9k3xip47a6gnue I hope everyone had a great weekend. Thanks to all who filled in the doodle, we have a unanimous winning time of 2PM central time today. I'll post with details about how to connect to the call when that has been prepared. Cheers, Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] custom atlas
I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] custom atlas
On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] custom atlas
Charles R Harris wrote: On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck Ah, hadn't tried turning off cpuspeed. Tried again... nope, same error. 2 cpus, each: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping: 11 cpu MHz : 800.000 that's what it says @idle cache size : 4096 KB ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Conditional random variables
On 07/05/2011 10:17 AM, josef.p...@gmail.com wrote: On Mon, Jul 4, 2011 at 10:13 PM, Ted To rainexpec...@theo.to wrote: Hi, Is there an easy way to make random draws from a conditional random variable? E.g., draw a random variable, x conditional on x >= \bar x. If you mean here a truncated distribution, then I asked a similar question on the scipy user list a month ago for the normal distribution. The answer was to use rejection sampling, Gibbs or MCMC. I just sample from the original distribution and throw away those values that are not in the desired range. This works fine if there is only a small truncation, but not so well for distributions with support only in the tails. It's reasonably fast for distributions that numpy.random produces relatively fast. (Having a bi- or multi-variate distribution and sampling y conditional on given x sounds more fun.) Yes, that is what I had been doing but in some cases my truncation moves into the upper tail and it takes an extraordinary amount of time. I found that I could use scipy.stats.truncnorm but I haven't yet figured out how to use it for a joint distribution. E.g., I have 2 normal rv's X and Y from which I would like to draw X and Y where X+Y >= U. Any suggestions? Cheers, Ted To ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
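The rejection approach described above can be sketched in a few lines. This is an illustration, not code from the thread; the function name and the one-sided truncation x >= lower of a standard normal are assumptions for the example.

```python
import numpy as np

def sample_truncated(lower, size, mu=0.0, sigma=1.0):
    """Rejection sampling: draw from N(mu, sigma**2) conditional on x >= lower."""
    accepted = np.empty(0)
    while accepted.size < size:
        draws = np.random.normal(mu, sigma, size)
        # keep only the draws that satisfy the truncation condition
        accepted = np.concatenate([accepted, draws[draws >= lower]])
    return accepted[:size]

samples = sample_truncated(lower=1.0, size=1000)
```

As the thread notes, this degrades badly once `lower` moves far into the tail, because the acceptance probability shrinks toward zero and the loop runs many times.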
Re: [Numpy-discussion] custom atlas
On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote: Charles R Harris wrote: On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck Ah, hadn't tried turing off cpuspeed. Try again... nope same error. 2 cpus, each: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping: 11 cpu MHz : 800.000 that's what it says @idle You haven't got cpu frequency scaling under control. Linux? Depending on the distro you can write to a file in /sys (for each cpu) or run a program to make the setting, or click on a panel applet. Sometimes the scaling is set in the bios also. Google is your friend here. I have $charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ondemand And what you want to see is performance instead of ondemand. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] custom atlas
On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris charlesr.har...@gmail.comwrote: On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote: Charles R Harris wrote: On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck Ah, hadn't tried turing off cpuspeed. Try again... nope same error. 2 cpus, each: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping: 11 cpu MHz : 800.000 that's what it says @idle You haven't got cpu frequency scaling under control. Linux? Depending on the distro you can write to a file in /sys (for each cpu) or run a program to make the setting, or click on a panel applet. Sometimes the scaling is set in the bios also. Google is your friend here. I have $charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ondemand And what you want to see is performance instead of ondemand. Here's some good info http://tinyurl.com/o8o7b. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Conditional random variables
On Tue, Jul 5, 2011 at 10:33 AM, Ted To rainexpec...@theo.to wrote: On 07/05/2011 10:17 AM, josef.p...@gmail.com wrote: On Mon, Jul 4, 2011 at 10:13 PM, Ted To rainexpec...@theo.to wrote: Hi, Is there an easy way to make random draws from a conditional random variable? E.g., draw a random variable, x conditional on x >= \bar x. If you mean here a truncated distribution, then I asked a similar question on the scipy user list a month ago for the normal distribution. The answer was to use rejection sampling, Gibbs or MCMC. I just sample from the original distribution and throw away those values that are not in the desired range. This works fine if there is only a small truncation, but not so well for distributions with support only in the tails. It's reasonably fast for distributions that numpy.random produces relatively fast. (Having a bi- or multi-variate distribution and sampling y conditional on given x sounds more fun.) Yes, that is what I had been doing but in some cases my truncation moves into the upper tail and it takes an extraordinary amount of time. I found that I could use scipy.stats.truncnorm but I haven't yet figured out how to use it for a joint distribution. E.g., I have 2 normal rv's X and Y from which I would like to draw X and Y where X+Y >= U. Any suggestions? If you only need to sample the sum Z=X+Y, then it would be just a univariate normal again (in Z). For the general case, I'm at least a month away from being able to sample from a generic multivariate distribution. There is an integral transform that does recursive conditioning y|x (like the F^{-1} transform for multivariate distributions, used for example for copulas). For example, sample x and then sample y >= u-x. That's two univariate normal samples. Another trick I used for the tail is to take the absolute value around the mean; because of symmetry you get twice as many valid samples. I also never tried importance sampling and the other biased sampling procedures. 
If you find something, then I'm also interested in a solution. Cheers, Josef Cheers, Ted To ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Conditional random variables
On 07/05/2011 11:07 AM, josef.p...@gmail.com wrote: For example, sample x and then sample y >= u-x. That's two univariate normal samples. Ah, that's what I was looking for! Many thanks! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
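Josef's two-step suggestion can be sketched with scipy.stats.truncnorm for independent standard normal X and Y under the constraint X + Y >= u. The variable names and the values of u and n are illustrative, not from the thread.

```python
import numpy as np
from scipy.stats import truncnorm

u, n = 1.0, 1000
x = np.random.normal(size=n)            # first univariate sample: X from its marginal
a = u - x                               # per-draw lower bound for Y (standard units)
y = truncnorm.rvs(a, np.inf, size=n)    # second univariate sample: Y >= u - x
```

truncnorm takes its bounds in standard-deviation units around loc, so for nonstandard normals the bound would be (u - x - loc) / scale. Unlike rejection sampling, every pair (x, y) produced this way satisfies the constraint.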
Re: [Numpy-discussion] Missing/accumulating data
On 7/3/11 9:03 PM, Joe Harrington wrote: Christopher Barker, Ph.D. wrote quick note on this: I like the FALSE == good way, because: So, you like to have multiple different kinds of masked, but I need multiple good values for counts. Fair enough; maybe there isn't a consensus about what the best, or most common, interpretation is. However, I was thinking less different kinds of masks than something special -- so if there is ANY additional information about a given element, it has a non-zero value. So less FALSE == good than FALSE == raw_value seems like the cleanest way to do it. That having been said, I generally DON'T like the zero is false convention -- I wish that Python actually required a Boolean where one was called for, rather than being able to pass in zero or any-other-value. Speaking of which, would we make the NA value be false? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Missing/accumulating data
On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.gov wrote: On 7/3/11 9:03 PM, Joe Harrington wrote: Christopher Barker, Ph.D. wrote quick note on this: I like the FALSE == good way, because: So, you like to have multiple different kinds of masked, but I need multiple good values for counts. fair enough, maybe there isn't a consensus about what is best, or most common, interpretation. However, I was thinking less different kinds of masks than, something special -- so if there is ANY additional information about a given element, it has a non-zero value. so less FALSE == good, then FALSE == raw_value seems like the cleanest way to do it. That having been said, I generally DON'T like the zero is false convention -- I wish that Python actually required a Boolean where one was called, for, rather that being able to pass in zero or any-other-value. Speaking of which, would we make the NA value be false? For booleans, it works out like this: http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic In R, trying to test the truth value of NA (if (NA) ...) raises an exception. Adopting this behavior seems reasonable to me. -Mark -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Missing/accumulating data
On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.gov wrote: Speaking of which, would we make the NA value be false? The NEP currently states that accessing np.NA as a boolean will raise an error. However, logical_and([NA, False]) == False and logical_or([NA, True]) == True will be special-cased. This does raise the question... how should np.any() and np.all() behave? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
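For reference, the Kleene truth tables behind those special cases can be written out directly. This is a sketch of the three-valued logic itself, not the eventual NumPy implementation; NA is represented here by Python's None.

```python
NA = None  # stand-in for np.NA in this sketch

def kleene_and(a, b):
    # False dominates: False AND anything is False, even AND NA
    if a is False or b is False:
        return False
    if a is NA or b is NA:
        return NA
    return True

def kleene_or(a, b):
    # True dominates: True OR anything is True, even OR NA
    if a is True or b is True:
        return True
    if a is NA or b is NA:
        return NA
    return False
```

Folding these rules over an array suggests one natural answer to Ben's question: np.all() would return NA only when there is no False but at least one NA, and np.any() would return NA only when there is no True but at least one NA.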
Re: [Numpy-discussion] Missing/accumulating data
On Tue, Jul 5, 2011 at 11:45 AM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.govwrote: Speaking of which, would we make the NA value be false? The NEP currently states that accessing np.NA as a boolean will act as an error. However, logical_and([NA, False]) == False and logical_or([NA, True]) will be special-cased. This does raise the question... how should np.any() and np.all() behave? I've added a paragraph/examples for this case to the NEP in pull request 99. -Mark Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Conditional random variables
On Tue, Jul 5, 2011 at 12:26 PM, Ted To rainexpec...@theo.to wrote: On 07/05/2011 11:07 AM, josef.p...@gmail.com wrote: For example, sample x and then sample y >= u-x. That's two univariate normal samples. Ah, that's what I was looking for! Many thanks! Just in case I wasn't clear: if x and y are correlated, then y >= u-x needs to be sampled from the conditional distribution y|x http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
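The conditional distribution Josef links to has a closed form in the bivariate normal case: y | x is normal with mean mu_y + rho*(sigma_y/sigma_x)*(x - mu_x) and standard deviation sigma_y*sqrt(1 - rho^2). A sketch of the correlated case, with illustrative parameter values that are not from the thread:

```python
import numpy as np
from scipy.stats import truncnorm

mu_x, mu_y, sx, sy, rho, u = 0.0, 0.0, 1.0, 1.0, 0.5, 1.0

x = np.random.normal(mu_x, sx)                    # sample x from its marginal
cond_mean = mu_y + rho * (sy / sx) * (x - mu_x)   # E[Y | X = x]
cond_std = sy * np.sqrt(1.0 - rho ** 2)           # Std[Y | X = x]

# sample y >= u - x from the truncated conditional normal;
# truncnorm's bound is expressed in standard units of the conditional
a = (u - x - cond_mean) / cond_std
y = truncnorm.rvs(a, np.inf, loc=cond_mean, scale=cond_std)
```

With rho = 0 this reduces to the independent two-step recipe from earlier in the thread.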
Re: [Numpy-discussion] custom atlas
Charles R Harris wrote: On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote: Charles R Harris wrote: On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck Ah, hadn't tried turning off cpuspeed. Tried again... nope, same error. 2 cpus, each: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping: 11 cpu MHz : 800.000 that's what it says @idle cache size : 4096 KB You haven't got cpu frequency scaling under control. Linux? Depending on the distro you can write to a file in /sys (for each cpu) or run a program to make the setting, or click on a panel applet. Sometimes the scaling is set in the bios also. Google is your friend here. I have $charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ondemand And what you want to see is performance instead of ondemand. Here's some good info http://tinyurl.com/o8o7b. Chuck Thanks! Good info. But same result. # service cpuspeed stop # echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor # echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor build stopped exactly the same as before. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NEPaNEP lessons - was: alterNEP
On Sat, Jul 2, 2011 at 6:28 AM, Matthew Brett matthew.br...@gmail.com wrote: Here the primary discussion I was trying to start was about why the discussion failed and led to bad feeling. Well, I have a hypothesis, don't know if it's true. It goes like this: Most of the time, when one of us decides to take the trouble to try and implement some change to the numpy core, it's because we really want to be able to take advantage of that change in our own work. This has two consequences: (a) it's only worth bothering if we can make sure that the resulting code is really useful to us. So we're really motivated to make sure we nail at least one use case. (b) we don't get any benefit from our work unless the code actually gets merged. So we're really motivated to build consensus and convince other people they really want our code too, because otherwise it'll probably get dropped. In this case, though, Mark got asked to write some code as part of his job. Making commercial development and FOSS mix has this notorious habit of going off the rails despite everyone having the best of intentions, and I wonder if that was part of the problem here. If Travis hired me to implement some feature demanded by the community, then I wouldn't feel the same urgency to really make sure that everyone was on board before investing my time. And I wouldn't have the same urgency to make sure that it really nailed my use cases, because that wouldn't be so central to my motivation for doing the work. And on a limited-length contract, I'd have more urgency to get something done quick. As it is, I don't want to waste this opportunity enabled by Mark's time and Enthought's money, but I do care a lot more about getting a good result than I do about making something happen this month -- because I'll have to work with, support, and teach people about whatever we come up with for the next however many years, and that weighs a lot more heavily in my calculations. 
Hopefully it goes without saying, but to be clear -- I'm sure Mark *is* worrying about all the things I mentioned, and doing his best to make something awesome that works for people. (And, Mark, sorry for talking about you in the third person... not sure how to talk about this better.) But sometimes that's not enough when the incentives are weird. It also doesn't help that apparently there have been multiple discussions going on in different venues (on the mailing list, in github, and presumably some face-to-face at Enthought's offices too), which makes it very hard to keep everyone in the loop. I'm a big fan of Karl's book too -- here are some sections I think might be particularly relevant: http://producingoss.com/en/contracting.html http://producingoss.com/en/setting-tone.html#avoid-private-discussions http://producingoss.com/en/bug-tracker-usage.html -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design
Also, I can see the motivation for wanting a voice meeting, but on the subject of keeping people in the loop, could we make sure that someone is taking notes on what happens, and that they get posted to the list? -- Nathaniel On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe mwwi...@gmail.com wrote: On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote: The missing data thread has gotten a bit heated, and after sitting down with Travis to discuss the issues a bit, we've concluded that it would be nice to do a call with everyone who's interested in the discussion with better communication bandwidth. There are lots of good ideas out there, and it is very easy for things to get lost when we're just emailing. Getting on the phone should provide a more effective way to ensure everyone is properly being heard. We're proposing to set up a GotoMeeting call at 4pm CST today. Please respond if you can make it and your level of interest. I've created a Doodle where you can indicate your availability if 4pm today is too short notice, and we should schedule for a different time: http://www.doodle.com/eu9k3xip47a6gnue I hope everyone had a great weekend. Thanks to all who filled in the doodle, we have a unanimous winning time of 2PM central time today. I'll post with details about how to connect to the call when that has been prepared. Cheers, Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design
Here are the details for the call: 1. Please join my meeting, Jul 5, 2011 at 2:00 PM Central time. https://www1.gotomeeting.com/join/972295593 2. Use your microphone and speakers (VoIP) - a headset is recommended. Or, call in using your telephone. Dial +1 (312) 878-3070 Access Code: 972-295-593 Audio PIN: Shown after joining the meeting Meeting ID: 972-295-593 GoToMeeting® Online Meetings Made Easy™ We'll have someone taking notes to create a summary as Nathaniel suggested. The NEP and other reference material will be visible with the screen sharing of gotomeeting, but those running Linux can follow along by viewing the document we're browsing here: https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst Thanks, Mark On Tue, Jul 5, 2011 at 12:54 PM, Nathaniel Smith n...@pobox.com wrote: Also, I can see the motivation for wanting a voice meeting, but on the subject of keeping people in the loop, could we make sure that someone is taking notes on what happens, and that they get posted to the list? -- Nathaniel On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe mwwi...@gmail.com wrote: On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote: The missing data thread has gotten a bit heated, and after sitting down with Travis to discuss the issues a bit, we've concluded that it would be nice to do a call with everyone who's interested in the discussion with better communication bandwidth. There are lots of good ideas out there, and it is very easy for things to get lost when we're just emailing. Getting on the phone should provide a more effective way to ensure everyone is properly being heard. We're proposing to set up a GotoMeeting call at 4pm CST today. Please respond if you can make it and your level of interest. 
I've created a Doodle where you can indicate your availability if 4pm today is too short notice, and we should schedule for a different time: http://www.doodle.com/eu9k3xip47a6gnue I hope everyone had a great weekend. Thanks to all who filled in the doodle, we have a unanimous winning time of 2PM central time today. I'll post with details about how to connect to the call when that has been prepared. Cheers, Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Moving lib.recfunctions?
On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote: On Fri, Jul 1, 2011 at 2:22 PM, josef.p...@gmail.com wrote: On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com wrote: lib.recfunctions has never been fully advertised. The two bugs I just discovered lead me to believe that it's not that well vetted, but it is useful. I can't be the only one using these? What do people think of either deprecating lib.recfunctions or at least importing them into the numpy.rec namespace? I'm sure this has come up before, but gmane search isn't working for me. about once a year http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 my guess is not much has changed since then Ah, yes. I recall now. I agree that they're more general than rec, but also don't have a first best solution for this. So I think we should move them (in a correct way) to numpy.rec and add (some of?) them as methods to recarrays. The best we can do beyond that is put some docs on the structured array page and notes in the docstrings that they also work for ndarrays with structured dtype. I'll submit a pull request soon and maybe that'll generate some interest. Had a brief look at what getting lib.recfunctions into rec/core.rec namespace would entail. It's not as simple as it could be, because there are circular imports between core.records and recfunctions (and its imports). It seems that it is possible to work around the circular imports in some of the code except for the degree to which recfunctions is wrapped up with the masked array code. The path of least resistance is to just import lib.recfunctions.* into the (already crowded) main numpy namespace and be done with it. Another option, though it's more work, is to remove all the internal masked array support and let the user decide what to do with the record/structured arrays after they're returned (I invariably have to set usemask=False anyway). 
The functions can then be wrapped by higher-level ones in np.ma if the old usemask behavior is still desirable for people. This should probably wait until the new masked array changes are in and settled a bit though. Skipper ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
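For anyone following along, the functions under discussion already work on plain structured ndarrays, not just recarrays. A small sketch of join_by with usemask=False, as Skipper describes using it; the field names and data here are made up for illustration:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
             dtype=[('key', int), ('x', float)])
b = np.array([(1, 0.5), (2, 1.5)],
             dtype=[('key', int), ('y', float)])

# inner join on 'key'; usemask=False returns a plain structured ndarray
# instead of a masked array
joined = rfn.join_by('key', a, b, usemask=False)
```

The default inner join keeps only keys present in both arrays, so here the row with key 3 is dropped; jointype='outer' or 'leftouter' changes that, which is where the masked-array machinery (filling unmatched fields) comes in.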
Re: [Numpy-discussion] Moving lib.recfunctions?
On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote: On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote: On Fri, Jul 1, 2011 at 2:22 PM, josef.p...@gmail.com wrote: On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com wrote: lib.recfunctions has never been fully advertised. The two bugs I just discovered lead me to believe that it's not that well vetted, but it is useful. I can't be the only one using these? What do people think of either deprecating lib.recfunctions or at least importing them into the numpy.rec namespace? I'm sure this has come up before, but gmane search isn't working for me. about once a year http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 my guess is not much has changed since then Ah, yes. I recall now. I agree that they're more general than rec, but also don't have a first best solution for this. So I think we should move them (in a correct way) to numpy.rec and add (some of?) them as methods to recarrays. The best we can do beyond that is put some docs on the structured array page and notes in the docstrings that they also work for ndarrays with structured dtype. I'll submit a pull request soon and maybe that'll generate some interest. Had a brief look at what getting lib.recfunctions into rec/core.rec namespace would entail. It's not as simple as it could be, because there are circular imports between core.records and recfunctions (and its imports). It seems that it is possible to work around the circular imports in some of the code except for the degree to which recfunctions is wrapped up with the masked array code. Hello, The idea behind having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. 
As to why they were never really advertised? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and made it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. Anyhow. So, yes, there might be some weird imports to polish. Note that if you decided to just rename the package and leave it where it was, it would probably be easier. The path of least resistance is to just import lib.recfunctions.* into the (already crowded) main numpy namespace and be done with it. Why? Why can't you leave it available through numpy.lib? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions; that would improve the visibility. Another option, though it's more work, is to remove all the internal masked array support and let the user decide what to do with the record/structured arrays after they're returned (I invariably have to set usemask=False anyway). Or you just port the functions in numpy.ma (making a numpy.ma.recfunctions, for example). The functions can then be wrapped by higher-level ones in np.ma if the old usemask behavior is still desirable for people. This should probably wait until the new masked array changes are in and settled a bit though. Oh yes... I agree with that P. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Moving lib.recfunctions?
On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM pgmdevl...@gmail.com wrote: On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote: On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote: On Fri, Jul 1, 2011 at 2:22 PM, josef.p...@gmail.com wrote: On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com wrote: lib.recfunctions has never been fully advertised. The two bugs I just discovered lead me to believe that it's not that well vetted, but it is useful. I can't be the only one using these? What do people think of either deprecating lib.recfunctions or at least importing them into the numpy.rec namespace? I'm sure this has come up before, but gmane search isn't working for me. about once a year http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 my guess is not much has changed since then Ah, yes. I recall now. I agree that they're more general than rec, but also don't have a first best solution for this. So I think we should move them (in a correct way) to numpy.rec and add (some of?) them as methods to recarrays. The best we can do beyond that is put some docs on the structured array page and notes in the docstrings that they also work for ndarrays with structured dtype. I'll submit a pull request soon and maybe that'll generate some interest. Had a brief look at what getting lib.recfunctions into rec/core.rec namespace would entail. It's not as simple as it could be, because there are circular imports between core.records and recfunctions (and its imports). It seems that it is possible to work around the circular imports in some of the code except for the degree to which recfunctions is wrapped up with the masked array code. Hello, The idea behin having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. 
Methinks we gonna lose this aspect if you try to rename it, but hey, your call. I agree (even though 'rec' is already in the name). My goal was to just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're right there (rec seems more intuitive than lib to me). Do you think that they may be better off in the main numpy namespace? This is far from my call, just trying to reach some consensus and make an effort to move the status quo. As to why they were never really advertised? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and made it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. As Josef pointed out before, it's a chicken and egg thing re: advertisement and feedback. I think the best advertisement is by namespace. I use them frequently, and I haven't offered any feedback because I've never been left wanting (recent pull request is the only exception). For the most part they do what I want and the docs are good. Anyhow. So, yes, there might be some weird imports to polish. Note that if you decided to just rename the package and leave it where it was, it would probably be easier. Imports are fine as long as they stay where they are and aren't imported into numpy.core. The path of least resistance is to just import lib.recfunctions.* into the (already crowded) main numpy namespace and be done with it. Why? Why can't you leave it available through numpy.lib? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions, that would improve the visibility. I'm fine with leaving the code where it is, but typing numpy.lib.recfunctions.function is painful (ditto `import numpy.lib.recfunctions as nprf`). Every time. And I have to do this often.
Even if they are imported into the lib namespace (they aren't), it would be an improvement, but I still don't think it occurs to people to hunt through lib to try and join two structured arrays. It looks like everything in the lib namespace is imported into the main numpy namespace anyway. And 2) I found a little buglet recently that made me think this code should be banged on more. The best way to do this is to get it out there. If other users are anything like me, I rely on tab-completion and docstrings not online docs for working with projects that I don't need to be intimately familiar with, the implication being that lib is intimate, I guess. Skipper (standing astride this molehill) Another option, though it's more work, is to remove all the internal masked array support and let the user decide what to do with the record/structured arrays after they're returned (I invariably have to set usemask=False anyway). Or you could just port the functions to numpy.ma (making a numpy.ma.recfunctions,
Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design
Mark Wiebe writes: We'll have someone taking notes to create a summary as Nathaniel suggested. Thanks. -- And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer. -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
Re: [Numpy-discussion] Missing/accumulating data
Mark Wiebe wrote: Speaking of which, would we make the NA value be false? For booleans, it works out like this: http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic That's pretty cool! In R, trying to test the truth value of NA (if (NA) ...) raises an exception. Adopting this behavior seems reasonable to me. I'm not so sure. The other precedent is Python, where None is interpreted as False. In general, in non-numpy code, I use None to mean not set yet, or I'm not sure, or whatever. It's pretty useful to have it be false. However, I also do: if x is not None: rather than: if x: so as to be unambiguous about what I'm testing for (and because if x == 0, I don't want the test to fail), so I guess: if arr[i] is np.NA: would be perfectly analogous. -Chris -Mark -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
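For reference, the Kleene three-valued logic linked above can be sketched in a few lines of plain Python, with None standing in for NA. This is purely an illustration of the truth tables under discussion, not numpy's eventual behavior:

```python
# Kleene three-valued logic, with None playing the role of NA.

def kleene_and(x, y):
    # False wins outright; otherwise any NA makes the result NA.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def kleene_or(x, y):
    # True wins outright; otherwise any NA makes the result NA.
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

assert kleene_or(None, True) is True     # NA | True == True
assert kleene_and(None, False) is False  # NA & False == False
assert kleene_and(None, True) is None    # NA & True == NA
```

Note that under these rules NA is neither truthy nor falsy, which is why R raises on `if (NA)`, and why `if arr[i] is np.NA:` sidesteps the question entirely.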
[Numpy-discussion] suggestions on optimising a special matrix reshape
i have to reshape a matrix beta of the form (4**N, 4**N, 4**N, 4**N) into betam like (16**N, 16**N) following:

betam = np.zeros((16**N, 16**N), dtype=complex)
for k in xrange(16**N):
    ind1 = np.mod(k, 4**N)
    ind2 = k / 4**N
    for l in xrange(16**N):
        betam[k, l] = beta[np.mod(l, 4**N), l / 4**N, ind1, ind2]

is there a smarter/faster way of getting the above done? for N=2, that already takes 0.5 seconds but i intend to use it for N=3 and N=4 ... thanks for your input, q -- History consists of nothing more than the lies we tell ourselves to justify the present. The king who needs to remind his people of his rank, is no king. A beggar's mistake harms no one but the beggar. A king's mistake, however, harms everyone but the king. Too often, the measure of power lies not in the number who obey your will, but in the number who suffer your stupidity.
Re: [Numpy-discussion] suggestions on optimising a special matrix reshape
qu...@gmx.at wrote: i have to reshape a matrix beta of the form (4**N, 4**N, 4**N, 4**N) into betam like (16**N, 16**N) following:

betam = np.zeros((16**N, 16**N), dtype=complex)
for k in xrange(16**N):
    ind1 = np.mod(k, 4**N)
    ind2 = k / 4**N
    for l in xrange(16**N):
        betam[k, l] = beta[np.mod(l, 4**N), l / 4**N, ind1, ind2]

is there a smarter/faster way of getting the above done? no time to check if this is what you want, but is this it?

a = np.arange((4**(4*N))).reshape(4**N, 4**N, 4**N, 4**N)
b = a.reshape((16**N, 16**N))

If that doesn't do it right, you may be able to mess with the strides, etc. do some googling, and check out: numpy.lib.stride_tricks -Chris for N=2, that already takes 0.5 seconds but i intend to use it for N=3 and N=4 ... thanks for your input, q -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
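A note for readers: a plain reshape, as hedged in the reply above, keeps the elements in C order and does not reproduce the loop's index mapping betam[k, l] = beta[l % 4**N, l // 4**N, k % 4**N, k // 4**N]. Transposing the axes first so the reshape pairs them as (k-high, k-low, l-high, l-low) does match the loop exactly. A sketch, verified against the original double loop at N=1:

```python
import numpy as np

N = 1                      # small case; the same one-liner works for any N
A, M = 4**N, 16**N
beta = np.arange(M * M, dtype=complex).reshape(A, A, A, A)

# the original double loop, for reference (// is the Python 3
# equivalent of the integer division in the Python 2 code above)
betam_loop = np.zeros((M, M), dtype=complex)
for k in range(M):
    ind1, ind2 = k % A, k // A
    for l in range(M):
        betam_loop[k, l] = beta[l % A, l // A, ind1, ind2]

# vectorized equivalent: reorder the axes, then reshape.  The
# transposed view is non-contiguous, so the reshape copies once --
# still far cheaper than 16**N * 16**N individual element writes.
betam = beta.transpose(3, 2, 1, 0).reshape(M, M)

assert np.array_equal(betam, betam_loop)
```

The axis order (3, 2, 1, 0) follows from reading the loop's subscripts: row index k splits into (k // A, k % A) and must select the last two axes of beta in reversed order, and likewise for l on the first two.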
Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2
In article BANLkTi=LXiTcrv1LgMtP=p9nF8eMr8=+h...@mail.gmail.com, Ralf Gommers ralf.gomm...@googlemail.com wrote: https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Will there be a Mac binary for 32-bit pythons (one that is compatible with older versions of MacOS X)? At present I only see a 64-bit 10.6-only version. -- Russell
Re: [Numpy-discussion] Moving lib.recfunctions?
On Jul 5, 2011, at 9:23 PM, Skipper Seabold wrote: On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM pgmdevl...@gmail.com wrote: ... Hello, The idea behind having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. I agree (even though 'rec' is already in the name). My goal was to just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're right there (rec seems more intuitive than lib to me). Do you think that they may be better off in the main numpy namespace? This is far from my call, just trying to reach some consensus and make an effort to move the status quo. Sure, a np.join_by or np.stack_arrays is easy and non-ambiguous enough... As to why they were never really advertised? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and made it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. As Josef pointed out before, it's a chicken and egg thing re: advertisement and feedback. I think the best advertisement is by namespace. I use them frequently, and I haven't offered any feedback because I've never been left wanting (recent pull request is the only exception). For the most part they do what I want and the docs are good. Cool! The path of least resistance is to just import lib.recfunctions.* into the (already crowded) main numpy namespace and be done with it. Why? Why can't you leave it available through numpy.lib? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions, that would improve the visibility.
I'm fine with leaving the code where it is, but typing numpy.lib.recfunctions.function is painful (ditto `import numpy.lib.recfunctions as nprf`). Every time. And I have to do this often. You have a point. As long as nobody minds and you don't lose too much time trying to tweak the imports, I'm quite OK with it. Even if they are imported into the lib namespace (they aren't), it would be an improvement, but I still don't think it occurs to people to hunt through lib to try and join two structured arrays. It looks like everything in the lib namespace is imported into the main numpy namespace anyway. And 2) I found a little buglet recently that made me think this code should be banged on more. The best way to do this is to get it out there. If other users are anything like me, I rely on tab-completion and docstrings not online docs for working with projects that I don't need to be intimately familiar with, the implication being that lib is intimate, I guess. Skipper (standing astride this molehill) Trample it!
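For readers hunting for these functions today, here is a minimal sketch of the kind of calls under discussion. The field names and sample data are made up for illustration; `join_by` and `stack_arrays` live in `numpy.lib.recfunctions` and work on plain structured ndarrays, exactly as Pierre describes:

```python
import numpy as np
import numpy.lib.recfunctions as rfn   # the long import path being debated

# two structured (not rec-) arrays sharing a key field
a = np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
             dtype=[('key', int), ('x', float)])
b = np.array([(1, 0.1), (3, 0.3)],
             dtype=[('key', int), ('y', float)])

# inner join on 'key'; usemask=False returns a plain structured array
# rather than a masked one (the behavior Skipper says he always wants)
joined = rfn.join_by('key', a, b, usemask=False)
assert sorted(joined.dtype.names) == ['key', 'x', 'y']
assert list(joined['key']) == [1, 3]   # only keys present in both

# stack_arrays concatenates arrays whose dtypes need not match exactly
stacked = rfn.stack_arrays((a, a), usemask=False)
assert stacked.shape == (6,)
```

The `usemask` keyword is precisely the masked-array coupling discussed in this thread: with the default `usemask=True`, both functions return `numpy.ma` masked arrays instead.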
[Numpy-discussion] NA/Missing Data Conference Call Summary
Here's a short-ish summary of the topics discussed in the conference call this afternoon. WARNING: I try to give examples for everything discussed to make it as concrete as possible. However, most of the examples were not explicitly discussed during the conference. I apologize in advance if I mischaracterize anyone's arguments, and please jump in to correct me if I did. Participants: Travis Oliphant, Mark Wiebe, Matthew Brett, Nathaniel Smith, Pierre GM, Ben Root, Chuck Harris, Wes McKinney, Chris Jordan-Squire

First, areas of broad agreement:
* There should be more functionality for missing data
* There should be dtypes which support missing data ('parameterized dtypes' in the current NEP)
* Adding a 'where' semantic to ufuncs
* Have the same data with different sets of missing elements in different views
* Easy for non-expert numpy users

Since Mark is only around Austin until early August, there's also broad agreement that we need to get something done quickly. However, the numpy community (and Travis in particular) is balancing this against the possibility of a sub-optimal solution which can't be taken back.

BIT PATTERN AND MASK IMPLEMENTATIONS FOR NA
--
The current NEP proposes both mask and bit pattern implementations for missing data. I use the terms bit pattern and parameterized dtype interchangeably, since the parameterized dtype will use a bit pattern for its implementation. The two implementations will support the same functionality with respect to NA, and the implementation details will be largely invisible to the user. Their differences are in the 'extra' features each supports. Two common questions were:
1. Why make two implementations of missing data: one with masks and the other with parameterized dtypes?
2. Why does the implementation using masks have higher priority?
The answers are:
1. The mask implementation is more general and easier to implement and maintain.
The bit pattern implementation saves memory, makes interoperability easier, and makes ABI (Application Binary Interface) compatibility easier. Since each has different strengths, the argument is both should be implemented.
2. The implementation for the parameterized dtypes will rely on the implementation using a mask.

NA VS. IGNORE
--
A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in the aNEP sense and NA in the NEP sense. With NA, there is a clear notion of how NA propagates through all basic numpy operations. (e.g., 3+NA=NA and log(NA) = NA, while NA | True = True.) IGNORE is separate from NA, with different interpretations depending on the use case. IGNORE could mean:
1. Data that is being temporarily ignored. e.g., a possible outlier that is temporarily being removed from consideration.
2. Data that cannot exist. e.g., a matrix representing a grid of water depths for a lake. Since the lake isn't square, some entries will represent land, and so depth will be a meaningless concept for those entries.
3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE, 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this leaves open how [1, 2, IGNORE] + [3 , 4] should behave.
Because of these different uses of IGNORE, it doesn't have as clear a theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3, or IGNORE | True?) But several of the discussants thought the use cases for IGNORE were very compelling. Specifically, they wanted to be able to use IGNOREs and NAs simultaneously while still being able to differentiate between them. So, for example, being able to designate some data as IGNORE while still being able to determine which data was NA but not IGNORE. The current NEP does not allow for this directly. Although in some cases it can be done indirectly via views.
(By taking a view of the original data, expanding the values which are considered NA in the view, and then comparing with the original data to see if the NA is in the original or not.) Since both are possible in this sense, Mark's NEP makes it so IGNORE is allowed but isn't the default. Another important point from the current NEP is that not being able to access values considered missing, even if the implementation of missingness is via a mask, is a feature and not a bug. It is a feature because if the data is missing then, conceptually, neither the user nor any function the user calls should be able to obtain that data. This is precisely why the indirect route, via views of the original data, is required to access data that a different view says is missing. The current NEP treats all NA's the same. The reasoning is that, regardless of where the NA originated, the functions the numpy array is fed in to will either ignore all NA's or propagate them (i.e. not ignore them). These two different behaviors are chosen when passed into a ufunc by setting the skipna ufunc
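One of the areas of broad agreement above, having the same data with different sets of missing elements in different views, can already be approximated with today's numpy.ma, which may help make the idea concrete. This is a sketch using the existing masked-array module, not the NEP's proposed machinery:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# two masked "views" over the same underlying buffer (copy=False is
# the default for np.ma.array), each with its own notion of missing
v1 = np.ma.array(data, mask=[False, True, False, False])
v2 = np.ma.array(data, mask=[False, False, True, False])

data[0] = 99.0            # a write to the shared data...
assert v1[0] == 99.0      # ...is visible through both views
assert v2[0] == 99.0

# reductions skip each view's own masked elements
assert v1.sum() == 99.0 + 3.0 + 4.0   # 2.0 is masked in v1
assert v2.sum() == 99.0 + 2.0 + 4.0   # 3.0 is masked in v2
```

The difference under the NEP is that masked elements would be genuinely inaccessible through the view that masks them, per the "feature, not a bug" point above, whereas numpy.ma still exposes the underlying values via the .data attribute.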
Re: [Numpy-discussion] NA/Missing Data Conference Call Summary
Thanks for these notes. Just a couple of thoughts as I looked over these notes. On Tue, Jul 5, 2011 at 6:46 PM, Christopher Jordan-Squire cjord...@uw.edu wrote: 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE, 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this leaves open how [1, 2, IGNORE] + [3 , 4] should behave. I don't think there is any confusion about that particular case. Even when using the IGNORE semantics, numpy broadcasting rules are still in play. This particular case should throw an exception. Because of these different uses of IGNORE, it doesn't have as clear a theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3, or IGNORE | True?) I think we were more referring to matrix operations like dot products. Element-by-element operations should still behave the same as NA. Scalar operations should return IGNORE. HOW DOES THIS RELATE TO THE CURRENT MASKED ARRAY? Everyone seems to agree they'd love it if this could encompass all current use cases of the numpy.ma arrays, so numpy.ma arrays could be deprecated. (However they wouldn't be eliminated for several years, even in the most optimistic scenarios.) This is going to be a very tricky thing to handle and it is going to require coordination and agreements among many of the third-party toolkits like scipy and matplotlib. In addition to these notes (unless I missed it), Nathaniel pointed out that with the ufunc where= parameter feature and the ufunc wrapper, we have the potential to greatly improve the codebase of numpy.ma as it stands, potentially mitigating the need for moving more of numpy.ma into the core, and to focus more on NA. While I am not 100% on board with this idea, I can definitely see the potential for this path. Thanks everybody for the productive chat! Ben Root
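The where= ufunc parameter that Nathaniel refers to did eventually land in NumPy, so in a modern NumPy the semantics being discussed look roughly like this. A sketch of today's behavior, not the 2011 proposal itself:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
valid = np.array([True, False, True])   # False ~ "ignore this element"

# where= restricts the operation to the selected elements; positions
# where `valid` is False are left at whatever `out` already holds
out = np.zeros_like(a)
np.add(a, 10.0, out=out, where=valid)
assert list(out) == [11.0, 0.0, 13.0]

# reductions can skip the ignored elements the same way
assert np.sum(a, where=valid) == 4.0    # 1.0 + 3.0
```

This is close to the IGNORE semantics for element-by-element operations, which is why it was raised as a lighter-weight alternative to moving more of numpy.ma into the core.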
Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2
On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen ro...@uw.edu wrote: In article BANLkTi=LXiTcrv1LgMtP=p9nF8eMr8=+h...@mail.gmail.com, Ralf Gommers ralf.gomm...@googlemail.com wrote: https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Will there be a Mac binary for 32-bit pythons (one that is compatible with older versions of MacOS X)? At present I only see a 64-bit 10.6-only version. Yes there will be for the final release (10.4-10.6 compatible). I can't create those on my own computer, so sometimes I don't make them for RCs. Cheers, Ralf