Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design

2011-07-05 Thread Mark Wiebe
On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote:

 The missing data thread has gotten a bit heated, and after sitting down
 with Travis to discuss the issues a bit, we've concluded that it would be
 nice to do a call with everyone who's interested in the discussion with
 better communication bandwidth. There are lots of good ideas out there, and
 it is very easy for things to get lost when we're just emailing. Getting on
 the phone should provide a more effective way to ensure everyone is properly
 being heard.

 We're proposing to set up a GotoMeeting call at 4pm CST today. Please
 respond if you can make it and your level of interest. I've created a Doodle
 where you can indicate your availability if 4pm today is too short notice,
 and we should schedule for a different time:

 http://www.doodle.com/eu9k3xip47a6gnue


I hope everyone had a great weekend. Thanks to all who filled in the doodle,
we have a unanimous winning time of 2PM central time today. I'll post with
details about how to connect to the call when that has been prepared.

Cheers,
Mark


[Numpy-discussion] custom atlas

2011-07-05 Thread Neal Becker
I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas 
package so it would be tuned for my machine.  But when I do:

rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec 

it fails with:

res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS.

A bit of googling has not revealed a solution.  Any hints?




Re: [Numpy-discussion] custom atlas

2011-07-05 Thread Charles R Harris
On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote:

 I thought I'd try to speed up numpy on my fedora system by rebuilding the
 atlas
 package so it would be tuned for my machine.  But when I do:

 rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec

 it fails with:

 res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS.

 A bit of googling has not revealed a solution.  Any hints?



I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do
you have all the power saving/frequency changing options turned off? What
version of ATLAS are you using? What CPU?

Chuck


Re: [Numpy-discussion] custom atlas

2011-07-05 Thread Neal Becker
Charles R Harris wrote:

 On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote:
 
 I thought I'd try to speed up numpy on my fedora system by rebuilding the
 atlas
 package so it would be tuned for my machine.  But when I do:

 rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec

 it fails with:

 res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS.

 A bit of googling has not revealed a solution.  Any hints?



 I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do
 you have all the power saving/frequency changing options turned off? What
 version of ATLAS are you using? What CPU?
 
 Chuck

Ah, hadn't tried turning off cpuspeed.  Tried again... nope, same error.

2 cpus, each:
model name  : Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
stepping: 11
cpu MHz : 800.000   (that's what it says at idle)
cache size  : 4096 KB




Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread Ted To
On 07/05/2011 10:17 AM, josef.p...@gmail.com wrote:
 On Mon, Jul 4, 2011 at 10:13 PM, Ted To rainexpec...@theo.to wrote:
 Hi,

 Is there an easy way to make random draws from a conditional random
 variable?  E.g., draw a random variable x conditional on x >= \bar x.
 
 If you mean here truncated distribution, then I asked a similar
 question on the scipy user list a month ago for the normal
 distribution.
 
 The answer was use rejection sampling, Gibbs or MCMC.
 
 I just sample from the original distribution and throw away those
 values that are not in the desired range. This works fine if there is
 only a small truncation, but not so well for distributions with support
 only in the tails. It's reasonably fast for distributions that
 numpy.random produces relatively fast.
 
 (Having a bi- or multi-variate distribution and sampling y conditional
 on given x sounds more fun.)

Yes, that is what I had been doing, but in some cases my truncation
moves into the upper tail and it takes an extraordinary amount of time.
I found that I could use scipy.stats.truncnorm, but I haven't yet
figured out how to use it for a joint distribution.  E.g., I have 2
normal rv's X and Y from which I would like to draw X and Y where X+Y >= U.

Any suggestions?

Cheers,
Ted To


Re: [Numpy-discussion] custom atlas

2011-07-05 Thread Charles R Harris
On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote:

 Charles R Harris wrote:

  On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com wrote:
 
  I thought I'd try to speed up numpy on my fedora system by rebuilding
 the
  atlas
  package so it would be tuned for my machine.  But when I do:
 
  rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec
 
  it fails with:
 
  res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER
 REPS.
 
  A bit of googling has not revealed a solution.  Any hints?
 
 
 
  I've never seen that, OTOH, I haven't built ATLAS in the last few years.
 Do
  you have all the power saving/frequency changing options turned off? What
  version of ATLAS are you using? What CPU?
 
  Chuck

 Ah, hadn't tried turning off cpuspeed.  Tried again... nope, same error.

 2 cpus, each:
 model name  : Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
 stepping: 11
 cpu MHz : 800.000   (that's what it says at idle)


You haven't got cpu frequency scaling under control. Linux? Depending on the
distro you can write to a file in /sys (for each cpu) or run a program to
make the setting, or click on a panel applet. Sometimes the scaling is set
in the bios also. Google is your friend here. I have

$charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand

And what you want to see is performance instead of ondemand.
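
For reference, a minimal sketch of the write-to-/sys route described above
(assuming the cpufreq sysfs layout shown here; must be run as root):

    # A sketch: set every CPU's governor to performance via sysfs.
    # Assumes the cpufreq interface is present; run as root.
    import glob

    for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor'):
        with open(path, 'w') as f:
            f.write('performance')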

Chuck


Re: [Numpy-discussion] custom atlas

2011-07-05 Thread Charles R Harris
On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris
charlesr.har...@gmail.comwrote:



 On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote:

 Charles R Harris wrote:

  On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com
 wrote:
 
  I thought I'd try to speed up numpy on my fedora system by rebuilding
 the
  atlas
  package so it would be tuned for my machine.  But when I do:
 
  rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec
 
  it fails with:
 
  res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER
 REPS.
 
  A bit of googling has not revealed a solution.  Any hints?
 
 
 
  I've never seen that, OTOH, I haven't built ATLAS in the last few years.
 Do
  you have all the power saving/frequency changing options turned off?
 What
  version of ATLAS are you using? What CPU?
 
  Chuck

 Ah, hadn't tried turning off cpuspeed.  Tried again... nope, same error.

 2 cpus, each:
 model name  : Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
 stepping: 11
 cpu MHz : 800.000   (that's what it says at idle)


 You haven't got cpu frequency scaling under control. Linux? Depending on
 the distro you can write to a file in /sys (for each cpu) or run a program
 to make the setting, or click on a panel applet. Sometimes the scaling is
 set in the bios also. Google is your friend here. I have

 $charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
 ondemand

 And what you want to see is performance instead of ondemand.


Here's some good info http://tinyurl.com/o8o7b.

Chuck


Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread josef . pktd
On Tue, Jul 5, 2011 at 10:33 AM, Ted To rainexpec...@theo.to wrote:
 On 07/05/2011 10:17 AM, josef.p...@gmail.com wrote:
 On Mon, Jul 4, 2011 at 10:13 PM, Ted To rainexpec...@theo.to wrote:
 Hi,

 Is there an easy way to make random draws from a conditional random
 variable?  E.g., draw a random variable x conditional on x >= \bar x.

 If you mean here truncated distribution, then I asked a similar
 question on the scipy user list a month ago for the normal
 distribution.

 The answer was use rejection sampling, Gibbs or MCMC.

 I just sample from the original distribution and throw away those
 values that are not in the desired range. This works fine if there is
 only a small truncation, but not so well for distributions with support
 only in the tails. It's reasonably fast for distributions that
 numpy.random produces relatively fast.

 (Having a bi- or multi-variate distribution and sampling y conditional
 on given x sounds more fun.)

 Yes, that is what I had been doing, but in some cases my truncation
 moves into the upper tail and it takes an extraordinary amount of time.
 I found that I could use scipy.stats.truncnorm, but I haven't yet
 figured out how to use it for a joint distribution.  E.g., I have 2
 normal rv's X and Y from which I would like to draw X and Y where X+Y >= U.

 Any suggestions?

If you only need to sample the sum Z=X+Y, then it would be just a
univariate normal again (in Z).

For the general case, I'm at least a month away from being able to
sample from a generic multivariate distribution. There is an integral
transform that does recursive conditioning y|x (like the F^{-1}
transform for multivariate distributions, used for example for copulas).

For example, sample x, and then sample y >= u-x. That's two univariate
normal samples.
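
A minimal sketch of that scheme, assuming X and Y are independent standard
normals (the function name is made up; truncnorm's bounds are in
standardized units):

    import numpy as np
    from scipy.stats import truncnorm

    def draw_sum_at_least(u, size):
        # Sample x unconditionally, then y from a normal truncated below
        # at u - x, so that x + y >= u by construction.
        x = np.random.standard_normal(size)
        y = truncnorm.rvs(u - x, np.inf, size=size)
        return x, y

    x, y = draw_sum_at_least(1.5, 1000)
    assert (x + y >= 1.5 - 1e-12).all()  # holds up to float rounding

Note this draws x from its unconditional marginal, per the suggestion above;
for correlated x and y, see the later follow-up about the conditional
distribution y|x.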

Another trick I used for the tail is to take the absolute value around
the mean, because of symmetry you get twice as many valid samples.

I also never tried importance sampling and the other biased sampling procedures.

If you find something, then I'm also interested in a solution.

Cheers,

Josef



 Cheers,
 Ted To


Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread Ted To
On 07/05/2011 11:07 AM, josef.p...@gmail.com wrote:
  For example, sample x, and then sample y >= u-x. That's two univariate
  normal samples.

Ah, that's what I was looking for!  Many thanks!


Re: [Numpy-discussion] Missing/accumulating data

2011-07-05 Thread Chris Barker
On 7/3/11 9:03 PM, Joe Harrington wrote:
 Christopher Barker, Ph.D. wrote
 quick note on this: I like the FALSE == good way, because:

 So, you like to have multiple different kinds of masked, but I need
 multiple good values for counts.

Fair enough -- maybe there isn't a consensus about what the best, or most
common, interpretation is.

However, I was thinking less different kinds of masks than something
special -- so if there is ANY additional information about a given
element, it has a non-zero value.

so less FALSE == good than FALSE == raw_value

seems like the cleanest way to do it.

That having been said, I generally DON'T like the zero is false
convention -- I wish that Python actually required a Boolean where one
was called for, rather than being able to pass in zero or any other value.

Speaking of which, would we make the NA value be false?

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Missing/accumulating data

2011-07-05 Thread Mark Wiebe
On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.gov wrote:

 On 7/3/11 9:03 PM, Joe Harrington wrote:
  Christopher Barker, Ph.D. wrote
  quick note on this: I like the FALSE == good way, because:
 
  So, you like to have multiple different kinds of masked, but I need
  multiple good values for counts.

 Fair enough -- maybe there isn't a consensus about what the best, or most
 common, interpretation is.

 However, I was thinking less different kinds of masks than something
 special -- so if there is ANY additional information about a given
 element, it has a non-zero value.

 so less FALSE == good than FALSE == raw_value

 seems like the cleanest way to do it.

 That having been said, I generally DON'T like the zero is false
 convention -- I wish that Python actually required a Boolean where one
 was called for, rather than being able to pass in zero or any other value.

 Speaking of which, would we make the NA value be false?


For booleans, it works out like this:

http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic

In R, trying to test the truth value of NA (if (NA) ...) raises an
exception. Adopting this behavior seems reasonable to me.
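
For concreteness, a tiny sketch of the Kleene truth tables, using Python's
None as a stand-in for NA (purely illustrative -- this is not the proposed
numpy API):

    def kleene_and(a, b):
        # False dominates; otherwise an unknown (None) makes the result unknown.
        if a is False or b is False:
            return False
        if a is None or b is None:
            return None
        return True

    def kleene_or(a, b):
        # True dominates; otherwise an unknown (None) makes the result unknown.
        if a is True or b is True:
            return True
        if a is None or b is None:
            return None
        return False

    assert kleene_or(None, True) is True     # NA | True == True
    assert kleene_and(None, False) is False  # NA & False == False
    assert kleene_and(None, True) is None    # NA & True == NA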

-Mark





Re: [Numpy-discussion] Missing/accumulating data

2011-07-05 Thread Benjamin Root
On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.gov wrote:


 Speaking of which, would we make the NA value be false?


The NEP currently states that accessing np.NA as a boolean will act as an
error.  However, logical_and([NA, False]) == False and logical_or([NA,
True]) == True will be special-cased.

This does raise the question... how should np.any() and np.all() behave?

Ben Root


Re: [Numpy-discussion] Missing/accumulating data

2011-07-05 Thread Mark Wiebe
On Tue, Jul 5, 2011 at 11:45 AM, Benjamin Root ben.r...@ou.edu wrote:


 On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker chris.bar...@noaa.govwrote:


 Speaking of which, would we make the NA value be false?


 The NEP currently states that accessing np.NA as a boolean will act as an
 error.  However, logical_and([NA, False]) == False and logical_or([NA,
 True]) == True will be special-cased.

 This does raise the question... how should np.any() and np.all() behave?


I've added a paragraph/examples for this case to the NEP in pull request 99.
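
(The examples below are inferred from the Kleene rules above, treating any()
as a chained OR and all() as a chained AND; they are not quoted from the NEP:

    np.any([np.NA, True])    # -> True   (a True settles an OR)
    np.any([np.NA, False])   # -> NA     (the NA could have been True)
    np.all([np.NA, False])   # -> False  (a False settles an AND)
    np.all([np.NA, True])    # -> NA     (the NA could have been False)
)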

-Mark



 Ben Root




Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread josef . pktd
On Tue, Jul 5, 2011 at 12:26 PM, Ted To rainexpec...@theo.to wrote:
 On 07/05/2011 11:07 AM, josef.p...@gmail.com wrote:
 For example, sample x, and then sample y >= u-x. That's two univariate
 normal samples.

 Ah, that's what I was looking for!  Many thanks!

Just in case I wasn't clear: if x and y are correlated, then y (with y >= u-x)
needs to be sampled from the conditional distribution y|x:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions

Josef



Re: [Numpy-discussion] custom atlas

2011-07-05 Thread Neal Becker
Charles R Harris wrote:

 On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris
 charlesr.har...@gmail.comwrote:
 


 On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker ndbeck...@gmail.com wrote:

 Charles R Harris wrote:

  On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker ndbeck...@gmail.com
 wrote:
 
  I thought I'd try to speed up numpy on my fedora system by rebuilding
 the
  atlas
  package so it would be tuned for my machine.  But when I do:
 
  rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec
 
  it fails with:
 
  res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER
 REPS.
 
  A bit of googling has not revealed a solution.  Any hints?
 
 
 
  I've never seen that, OTOH, I haven't built ATLAS in the last few years.
 Do
  you have all the power saving/frequency changing options turned off?
 What
  version of ATLAS are you using? What CPU?
 
  Chuck

 Ah, hadn't tried turning off cpuspeed.  Tried again... nope, same error.

 2 cpus, each:
 model name  : Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
 stepping: 11
 cpu MHz : 800.000   (that's what it says at idle)


 You haven't got cpu frequency scaling under control. Linux? Depending on
 the distro you can write to a file in /sys (for each cpu) or run a program
 to make the setting, or click on a panel applet. Sometimes the scaling is
 set in the bios also. Google is your friend here. I have

 $charris@f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
 ondemand

 And what you want to see is performance instead of ondemand.


 Here's some good info http://tinyurl.com/o8o7b.
 
 Chuck

Thanks!  Good info.  But same result.

# service cpuspeed stop
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

The build stopped exactly the same as before.



Re: [Numpy-discussion] NEPaNEP lessons - was: alterNEP

2011-07-05 Thread Nathaniel Smith
On Sat, Jul 2, 2011 at 6:28 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Here the primary discussion I was trying to start was about why the
 discussion failed and led to bad feeling.

Well, I have a hypothesis, don't know if it's true. It goes like this:
Most of the time, when one of us decides to take the trouble to try
and implement some change to the numpy core, it's because we really
want to be able to take advantage of that change in our own work. This
has two consequences:
 (a) it's only worth bothering if we can make sure that the resulting
code is really useful to us. So we're really motivated to make sure we
nail at least one use case.
 (b) we don't get any benefit from our work unless the code actually
gets merged. So we're really motivated to build consensus and convince
other people they really want our code too, because otherwise it'll
probably get dropped.

In this case, though, Mark got asked to write some code as part of his
job. Making commercial development and FOSS mix has this notorious
habit of going off the rails despite everyone having the best of
intentions, and I wonder if that was part of the problem here. If
Travis hired me to implement some feature demanded by the community,
then I wouldn't feel the same urgency to really make sure that
everyone was on board before investing my time. And I wouldn't have
the same urgency to make sure that it really nailed my use cases,
because that wouldn't be so central to my motivation for doing the
work. And on a limited-length contract, I'd have more urgency to get
something done quick. As it is, I don't want to waste this opportunity
enabled by Mark's time and Enthought's money, but I do care a lot more
about getting a good result than I do about making something happen
this month -- because I'll have to work with, support, and teach
people about whatever we come up with for the next however many years,
and that weighs a lot more heavily in my calculations.

Hopefully it goes without saying, but to be clear -- I'm sure Mark
*is* worrying about all the things I mentioned, and doing his best to
make something awesome that works for people. (And, Mark, sorry for
talking about you in the third person... not sure how to talk about
this better.) But sometimes that's not enough when the incentives are
weird.

It also doesn't help that apparently there have been multiple
discussions going on in different venues (on the mailing list, in
github, and presumably some face-to-face at Enthought's offices too),
which makes it very hard to keep everyone in the loop.

I'm a big fan of Karl's book too -- here are some sections I think
might be particularly relevant:
  http://producingoss.com/en/contracting.html
  http://producingoss.com/en/setting-tone.html#avoid-private-discussions
  http://producingoss.com/en/bug-tracker-usage.html

-- Nathaniel


Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design

2011-07-05 Thread Nathaniel Smith
Also, I can see the motivation for wanting a voice meeting, but on the
subject of keeping people in the loop, could we make sure that someone
is taking notes on what happens, and that they get posted to the list?

-- Nathaniel

On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote:

 The missing data thread has gotten a bit heated, and after sitting down
 with Travis to discuss the issues a bit, we've concluded that it would be
 nice to do a call with everyone who's interested in the discussion with
 better communication bandwidth. There are lots of good ideas out there, and
 it is very easy for things to get lost when we're just emailing. Getting on
 the phone should provide a more effective way to ensure everyone is properly
 being heard.
 We're proposing to set up a GotoMeeting call at 4pm CST today. Please
 respond if you can make it and your level of interest. I've created a Doodle
 where you can indicate your availability if 4pm today is too short notice,
 and we should schedule for a different time:
 http://www.doodle.com/eu9k3xip47a6gnue

 I hope everyone had a great weekend. Thanks to all who filled in the doodle,
 we have a unanimous winning time of 2PM central time today. I'll post with
 details about how to connect to the call when that has been prepared.
 Cheers,
 Mark


Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design

2011-07-05 Thread Mark Wiebe
Here are the details for the call:

1.  Please join my meeting, Jul 5, 2011 at 2:00 PM Central time.
https://www1.gotomeeting.com/join/972295593

2.  Use your microphone and speakers (VoIP) - a headset is recommended. Or,
call in using your telephone.

Dial +1 (312) 878-3070
Access Code: 972-295-593
Audio PIN: Shown after joining the meeting

Meeting ID: 972-295-593

GoToMeeting®
Online Meetings Made Easy™


We'll have someone taking notes to create a summary as Nathaniel suggested.

The NEP and other reference material will be visible with the screen sharing
of gotomeeting, but those running Linux can follow along by viewing the
document we're browsing here:

https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst

Thanks,
Mark

On Tue, Jul 5, 2011 at 12:54 PM, Nathaniel Smith n...@pobox.com wrote:

 Also, I can see the motivation for wanting a voice meeting, but on the
 subject of keeping people in the loop, could we make sure that someone
 is taking notes on what happens, and that they get posted to the list?


 -- Nathaniel

 On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe mwwi...@gmail.com wrote:
  On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe mwwi...@gmail.com wrote:
 
  The missing data thread has gotten a bit heated, and after sitting down
  with Travis to discuss the issues a bit, we've concluded that it would
 be
  nice to do a call with everyone who's interested in the discussion with
  better communication bandwidth. There are lots of good ideas out there,
 and
  it is very easy for things to get lost when we're just emailing. Getting
 on
  the phone should provide a more effective way to ensure everyone is
 properly
  being heard.
  We're proposing to set up a GotoMeeting call at 4pm CST today. Please
  respond if you can make it and your level of interest. I've created a
 Doodle
  where you can indicate your availability if 4pm today is too short
 notice,
  and we should schedule for a different time:
  http://www.doodle.com/eu9k3xip47a6gnue
 
  I hope everyone had a great weekend. Thanks to all who filled in the
 doodle,
  we have a unanimous winning time of 2PM central time today. I'll post
 with
  details about how to connect to the call when that has been prepared.
  Cheers,
  Mark


Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Skipper Seabold
On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 2:22 PM,  josef.p...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com wrote:
 lib.recfunctions has never been fully advertised. The two bugs I just
 discovered led me to believe that it's not that well vetted, but it
 is useful. I can't be the only one using these?

 What do people think of either deprecating lib.recfunctions or at
 least importing them into the numpy.rec namespace?

 I'm sure this has come up before, but gmane search isn't working for me.

 about once a year

 http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655

 my guess is not much has changed since then


 Ah, yes. I recall now.

 I agree that they're more general than rec, but also don't have a
 first best solution for this. So I think we should move them (in a
 correct way) to numpy.rec and add (some of?) them as methods to
 recarrays. The best we can do beyond that is put some docs on the
 structured array page and notes in the docstrings that they also work
 for ndarrays with structured dtype.

 I'll submit a pull request soon and maybe that'll generate some interest.


Had a brief look at what getting lib.recfunctions into rec/core.rec
namespace would entail. It's not as simple as it could be, because
there are circular imports between core.records and recfunctions (and
its imports). It seems that it is possible to work around the circular
imports in some of the code except for the degree to which
recfunctions is wrapped up with the masked array code.

The path of least resistance is to just import lib.recfunctions.* into
the (already crowded) main numpy namespace and be done with it.
Another option, though it's more work, is to remove all the internal
masked array support and let the user decide what do with the
record/structured arrays after they're returned (I invariably have to
set usemask=False anyway). The functions can then be wrapped by
higher-level ones in np.ma if the old usemask behavior is still
desirable for people. This should probably wait until the new masked
array changes are in and settled a bit though.

Skipper


Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Pierre GM

On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote:

 On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 2:22 PM,  josef.p...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com wrote:
 lib.recfunctions has never been fully advertised. The two bugs I just
 discovered led me to believe that it's not that well vetted, but it
 is useful. I can't be the only one using these?
 
 What do people think of either deprecating lib.recfunctions or at
 least importing them into the numpy.rec namespace?
 
 I'm sure this has come up before, but gmane search isn't working for me.
 
 about once a year
 
 http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655
 
 my guess is not much has changed since then
 
 
 Ah, yes. I recall now.
 
 I agree that they're more general than rec, but also don't have a
 first best solution for this. So I think we should move them (in a
 correct way) to numpy.rec and add (some of?) them as methods to
 recarrays. The best we can do beyond that is put some docs on the
 structured array page and notes in the docstrings that they also work
 for ndarrays with structured dtype.
 
 I'll submit a pull request soon and maybe that'll generate some interest.
 
 
 Had a brief look at what getting lib.recfunctions into rec/core.rec
 namespace would entail. It's not as simple as it could be, because
 there are circular imports between core.records and recfunctions (and
 its imports). It seems that it is possible to work around the circular
 imports in some of the code except for the degree to which
 recfunctions is wrapped up with the masked array code.

Hello,
The idea behind having a lib.recfunctions and not a rec.recfunctions or whatever
was to illustrate that the functions of this package are more generic than they
appear. They work with regular structured ndarrays and don't need recarrays.
Methinks we're gonna lose this aspect if you try to rename it, but hey, your call.
As to why they were never really advertised? Because I never received any
feedback when I started developing them (developing is a big word here, I just
took a lot of code that John D Hunter had developed in matplotlib and made it
more consistent). I advertised them once or twice on the list, wrote the basic
docstrings, but waited for other people to start using them.
Anyhow.
So, yes, there might be some weird imports to polish. Note that if you decided
to just rename the package and leave it where it was, it would probably be
easier.


 The path of least resistance is to just import lib.recfunctions.* into
 the (already crowded) main numpy namespace and be done with it.

Why? Why can't you leave it available through numpy.lib? Once again, if it's
only a matter of PR-ing, you could start writing an entry page in the doc
describing the functions; that would improve the visibility.


 Another option, though it's more work, is to remove all the internal
 masked array support and let the user decide what do with the
 record/structured arrays after they're returned (I invariably have to
 set usemask=False anyway).

Or you just port the functions into numpy.ma (making a numpy.ma.recfunctions, for
example).


 The functions can then be wrapped by
 higher-level ones in np.ma if the old usemask behavior is still
 desirable for people. This should probably wait until the new masked
 array changes are in and settled a bit though.

Oh yes... I agree with that
P.




Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Skipper Seabold
On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM pgmdevl...@gmail.com wrote:

 On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote:

 On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold jsseab...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 2:22 PM,  josef.p...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold jsseab...@gmail.com 
 wrote:
 lib.recfunctions has never been fully advertised. The two bugs I just
 discovered led me to believe that it's not that well vetted, but it
 is useful. I can't be the only one using these?

 What do people think of either deprecating lib.recfunctions or at
 least importing them into the numpy.rec namespace?

 I'm sure this has come up before, but gmane search isn't working for me.

 about once a year

 http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655

 my guess is not much has changed since then


 Ah, yes. I recall now.

 I agree that they're more general than rec, but also don't have a
 first best solution for this. So I think we should move them (in a
 correct way) to numpy.rec and add (some of?) them as methods to
 recarrays. The best we can do beyond that is put some docs on the
 structured array page and notes in the docstrings that they also work
 for ndarrays with structured dtype.

 I'll submit a pull request soon and maybe that'll generate some interest.


 Had a brief look at what getting lib.recfunctions into rec/core.rec
 namespace would entail. It's not as simple as it could be, because
 there are circular imports between core.records and recfunctions (and
 its imports). It seems that it is possible to work around the circular
 imports in some of the code except for the degree to which
 recfunctions is wrapped up with the masked array code.

 Hello,
 The idea behind having a lib.recfunctions and not a rec.recfunctions or
 whatever was to illustrate that the functions of this package are more
 generic than they appear. They work with regular structured ndarrays and
 don't need recarrays. Methinks we're gonna lose this aspect if you try to rename
 it, but hey, your call.

I agree (even though 'rec' is already in the name). My goal was to
just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're
right there (rec seems more intuitive than lib to me). Do you think
that they may be better off in the main numpy namespace? This is far
from my call, just trying to reach some consensus and make an effort
to move the status quo.
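
For readers who haven't run into the module, a minimal sketch of the two
functions named above, under their current spelling (the field names and
data here are made up):

    import numpy as np
    import numpy.lib.recfunctions as nprf

    a = np.array([(1, 2.5), (2, 3.5)], dtype=[('key', int), ('x', float)])
    b = np.array([(1, 10), (3, 30)], dtype=[('key', int), ('y', int)])

    # SQL-style inner join on 'key'; usemask=False returns a plain
    # structured ndarray instead of a masked array.
    joined = nprf.join_by('key', a, b, jointype='inner', usemask=False)

    # Concatenate structured arrays field-by-field into one longer array.
    stacked = nprf.stack_arrays((a, a), usemask=False)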

 As to why they were never really advertised? Because I never received any
 feedback when I started developing them (developing is a big word here, I
 just took a lot of code that John D Hunter had developed in matplotlib and
 made it more consistent). I advertised them once or twice on the list, wrote
 the basic docstrings, but waited for other people to start using them.

As Josef pointed out before, it's a chicken and egg thing re:
advertisement and feedback. I think the best advertisement is by
namespace. I use them frequently, and I haven't offered any feedback
because I've never been left wanting (recent pull request is the only
exception). For the most part they do what I want and the docs are
good.

 Anyhow.
 So, yes, there might be some weird imports to polish. Note that if you decided
 to just rename the package and leave it where it was, it would probably be 
 easier.


Imports are fine as long as they stay where they are and aren't
imported into numpy.core.


 The path of least resistance is to just import lib.recfunctions.* into
 the (already crowded) main numpy namespace and be done with it.

 Why? Why can't you leave it available through numpy.lib? Once again, if
 it's only a matter of PR-ing, you could start writing an entry page in the
 doc describing the functions; that would improve the visibility.

I'm fine with leaving the code where it is, but typing
numpy.lib.recfunctions.function is painful (ditto `import
numpy.lib.recfunctions as nprf`). Every time. And I have to do this
often. Even if they are imported into the lib namespace (they aren't),
it would be an improvement, but I still don't think it occurs to
people to hunt through lib to try and join two structured arrays. It
looks like everything in the lib namespace is imported into the main
numpy namespace anyway. Also, I found a little buglet recently that
made me think this code should be banged on more. The best way to do
this is to get it out there. If other users are anything like me, I
rely on tab-completion and docstrings, not online docs, for working with
projects that I don't need to be intimately familiar with -- the
implication being that lib is intimate, I guess.

Skipper
(standing astride this molehill)



 Another option, though it's more work, is to remove all the internal
 masked array support and let the user decide what do with the
 record/structured arrays after they're returned (I invariably have to
 set usemask=False anyway).

 Or you just port the functions in numpy.ma (making a numpy.ma.recfunctions, 
 

Re: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design

2011-07-05 Thread Lluís
Mark Wiebe writes:

 We'll have someone taking notes to create a summary as Nathaniel suggested.

Thanks.

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] Missing/accumulating data

2011-07-05 Thread Christopher Barker
Mark Wiebe wrote:
 Speaking of which, would we make the NA value be false?
 
 For booleans, it works out like this:
 
 http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic

That's pretty cool!

 In R, trying to test the truth value of NA (if (NA) ...) raises an 
 exception. Adopting this behavior seems reasonable to me.

I'm not so sure. The other precedent is Python, where None is
interpreted as False.

In general, in non-numpy code, I use None to mean "not set yet" or "I'm
not sure", or whatever. It's pretty useful to have it be false.

However, I also do:

if x is not None:

rather than:

if x:

so as to be unambiguous about what I'm testing for (and because if x == 
0, I don't want the test to fail), so I guess:

if arr[i] is np.NA:

would be perfectly analogous.

-Chris








-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] suggestions on optimising a special matrix reshape

2011-07-05 Thread qubax
I have to reshape an array beta of the form (4**N, 4**N, 4**N, 4**N)
into betam like (16**N, 16**N) following:

betam = np.zeros((16**N, 16**N), dtype=complex)
for k in xrange(16**N):
    ind1 = np.mod(k, 4**N)
    ind2 = k / 4**N
    for l in xrange(16**N):
        betam[k, l] = beta[np.mod(l, 4**N), l / 4**N, ind1, ind2]

is there a smarter/faster way of getting the above done?

For N=2, that already takes 0.5 seconds, but I intend to use it
for N=3 and N=4 ...

thanks for your input,
q

-- 
History consists of nothing more than the lies we tell ourselves
to justify the present.

The king who needs to remind his people of his rank, is no king.

A beggar's mistake harms no one but the beggar. A king's mistake,
however, harms everyone but the king. Too often, the measure of
power lies not in the number who obey your will, but in the number
who suffer your stupidity.


Re: [Numpy-discussion] suggestions on optimising a special matrix reshape

2011-07-05 Thread Christopher Barker
qu...@gmx.at wrote:
 I have to reshape an array beta of the form (4**N, 4**N, 4**N, 4**N)
 into betam like (16**N, 16**N) following:

 betam = np.zeros((16**N, 16**N), dtype=complex)
 for k in xrange(16**N):
     ind1 = np.mod(k, 4**N)
     ind2 = k / 4**N
     for l in xrange(16**N):
         betam[k, l] = beta[np.mod(l, 4**N), l / 4**N, ind1, ind2]
 
 is there a smarter/faster way of getting the above done?

no time to check if this is what you want, but is this it?

a = np.arange((4**(4*N))).reshape(4**N,4**N,4**N,4**N)

b = a.reshape((16**N, 16**N))

If that doesn't do it right, you may be able to mess with the strides,
etc. Do some googling, and check out:

numpy.lib.stride_tricks
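
For what it's worth, reading the index arithmetic of the original loop
(betam[k, l] = beta[l % M, l // M, k % M, k // M] with M = 4**N), the exact
equivalent seems to be a transpose followed by a reshape -- worth verifying
against the loop for a small N:

    # beta and N as in the original post; reversing the axes first makes
    # the reshape produce the same element order as the double loop.
    M = 4**N
    betam = beta.transpose(3, 2, 1, 0).reshape(M * M, M * M)

A bare reshape keeps the elements in C order, which is a different ordering
than the loop's -- hence the transpose.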

-Chris



 for N=2, that already takes 0.5 seconds but i intend to use it
 for N=3 and N=4 ...
 
 thanks for your input,
 q
 


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2

2011-07-05 Thread Russell E. Owen
In article BANLkTi=LXiTcrv1LgMtP=p9nF8eMr8=+h...@mail.gmail.com,
 Ralf Gommers ralf.gomm...@googlemail.com wrote:

 https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/

Will there be a Mac binary for 32-bit pythons (one that is compatible 
with older versions of MacOS X)? At present I only see a 64-bit 
10.6-only version.

-- Russell



Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Pierre GM

On Jul 5, 2011, at 9:23 PM, Skipper Seabold wrote:

 On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM pgmdevl...@gmail.com wrote:
 
 ...
 
 Hello,
 The idea behind having a lib.recfunctions and not a rec.recfunctions or
 whatever was to illustrate that the functions of this package are more
 generic than they appear. They work with regular structured ndarrays and
 don't need recarrays. Methinks we're gonna lose this aspect if you try to
 rename it, but hey, your call.
 
 I agree (even though 'rec' is already in the name). My goal was to
 just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're
 right there (rec seems more intuitive than lib to me). Do you think
 that they may be better off in the main numpy namespace? This is far
 from my call, just trying to reach some consensus and make an effort
 to move the status quo.

Sure, a np.join_by or np.stack_arrays is easy and non-ambiguous enough...


 As to why they were never really advertised? Because I never received
 any feedback when I started developing them (developing is a big word here,
 I just took a lot of code that John D Hunter had developed in matplotlib and
 made it more consistent). I advertised them once or twice on the list, wrote
 the basic docstrings, but waited for other people to start using them.
 
 As Josef pointed out before, it's a chicken and egg thing re:
 advertisement and feedback. I think the best advertisement is by
 namespace. I use them frequently, and I haven't offered any feedback
 because I've never been left wanting (recent pull request is the only
 exception). For the most part they do what I want and the docs are
 good.

Cool ! 
 
 
 The path of least resistance is to just import lib.recfunctions.* into
 the (already crowded) main numpy namespace and be done with it.
 
 Why ? Why can't you leave it available through numpy.lib ? Once again, if 
 it's only a matter of PRing, you could start writing an entry page in the 
 doc describing the functions, that would improve the visibility.
 
 I'm fine with leaving the code where it is, but typing
 numpy.lib.recfunctions.function is painful (ditto `import
 numpy.lib.recfunctions as nprf`). Every time. And I have to do this
 often.

You have a point. As long as nobody minds and you don't lose too much time 
trying to tweak the imports, I'm quite OK with it.


 Even if they are imported into the lib namespace (they aren't),
 it would be an improvement, but I still don't think it occurs to
 people to hunt through lib to try and join two structured arrays. It
 looks like everything in the lib namespace is imported into the main
 numpy namespace anyway. And 2) I found a little buglet recently that
 made me think this code should be banged on more. The best way to do
 this is to get it out there. If other users are anything like me, I
 rely on tab-completion and docstrings not online docs for working with
 projects that I don't need to be intimately familiar with, the
 implication being that lib is intimate, I guess.
 
 Skipper
 (standing astride this molehill)

Trample it! 


[Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-05 Thread Christopher Jordan-Squire
Here's a short-ish summary of the topics discussed in the conference call
this afternoon. WARNING: I try to give examples for everything discussed to
make it as concrete as possible. However, most of the examples were not
explicitly discussed during the conference. I apologize in advance if I
mischaracterize anyone's arguments, and please jump in to correct me if I
did.

Participants: Travis Oliphant, Mark Wiebe, Matthew Brett, Nathaniel Smith,
Pierre GM, Ben Root, Chuck Harris, Wes McKinney, Chris Jordan-Squire

First, areas of broad agreement:
*There should be more functionality for missing data
*There should be dtypes which support missing data ('parameterized dtypes'
in the current NEP)
*Adding a 'where' semantic to ufuncs (sketched just after this list)
*Have the same data with different sets of missing elements in different
views
*Easy for non-expert numpy users
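
A sketch of what the 'where' semantic could look like on a ufunc call (this
was a proposal at the time; the parameter spelling and behavior shown here
are assumptions, not released numpy):

    # Compute only where the mask is True; entries of `out` under a False
    # mask are left untouched.
    out = np.zeros(4)
    mask = np.array([True, False, True, False])
    np.add(np.arange(4.0), 10.0, out=out, where=mask)
    # out -> [10., 0., 12., 0.]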

Since Mark is only around Austin until early August, there's
also broad agreement that we need to get something done quickly. However,
the numpy community (and Travis in particular) are balancing this against
the possibility of a sub-optimal solution which can't be taken back.

BIT PATTERN & MASK IMPLEMENTATIONS FOR NA
--

The current NEP proposes both mask and bit pattern implementations for
missing data. I use the terms bit pattern and parameterized dtype
interchangeably, since the parameterized dtype will use a bit pattern for
its implementation. The two implementations will support the same
functionality with respect to NA, and the implementation details will be
largely invisible to the user. Their differences are in the 'extra' features
each supports.

Two common questions were:
1. Why make two implementations of missing data: one with masks and the
other with parameterized dtypes?
2. Why does the implementation using masks have higher priority?

The answers are:
1.  The mask implementation is more general and easier to implement and
maintain.  The bit pattern implementation saves memory, makes
interoperability easier, and makes ABI (Application Binary Interface)
compatibility easier. Since each has different strengths, the argument is
both should be implemented.
2. The implementation for the parameterized dtypes will rely on the
implementation using a mask.


NA VS. IGNORE
-

A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in the aNEP
sense and NA in the NEP sense. With NA, there is a clear notion of how NA
propagates through all basic numpy operations.  (e.g., 3+NA=NA and log(NA) =
NA, while NA | True = True.) IGNORE is separate from NA, with different
interpretations depending on the use case.

IGNORE could mean:
1. Data that is being temporarily ignored. e.g., a possible outlier that is
temporarily being removed from consideration.
2. Data that cannot exist. e.g., a matrix representing a grid of water
depths for a lake. Since the lake isn't square, some entries will represent
land, and so depth will be a meaningless concept for those entries.
3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE,
3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this
leaves open how [1, 2, IGNORE] + [3 , 4] should behave.

Because of these different uses of IGNORE, it doesn't have as clear a
theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3,
or IGNORE | True?)

But several of the discussants thought the use cases for IGNORE were very
compelling. Specifically, they wanted to be able to use IGNORE's and NA's
simultaneously while still being able to differentiate between them. So, for
example, being able to designate some data as IGNORE while still able to
determine which data was NA but not IGNORE. The current NEP does not allow
for this directly. Although in some cases it can be indirectly done via
views. (By taking a view of the original data, expanding the values which
are considered NA in the view, and then comparing with the original data to
see if the NA is in the original or not.) Since both are possible in this
sense, Mark's NEP makes it so IGNORE is allowed but isn't the default.

Another important point from the current NEP is that not being able to
access values considered missing, even if the implementation of missingness
is via a mask, is a feature and not a bug. It is a feature because if the
data is missing then, conceptually, neither the user nor any function the
user calls should be able to obtain that data. This is precisely why the
indirect route, via views of the original data, is required to access data
that a different view says is missing.

The current NEP treats all NA's the same. The reasoning is that, regardless
of where the NA originated, the functions the numpy array is fed into will
either ignore all NA's or propagate them (i.e. not ignore them). These two
different behaviors are chosen when passing into a ufunc by setting the
skipna ufunc parameter.
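
Roughly, in the NEP's proposed notation (the maskna flag, NA literal, and
skipna parameter are taken from the NEP draft; this is a sketch, not
released numpy):

    a = np.array([1.0, np.NA, 3.0], maskna=True)
    np.sum(a)               # NA propagates by default -> NA
    np.sum(a, skipna=True)  # the NA is ignored        -> 4.0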

Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-05 Thread Benjamin Root
Thanks for these notes.  Just a couple of thoughts as I looked over these
notes.

On Tue, Jul 5, 2011 at 6:46 PM, Christopher Jordan-Squire
cjord...@uw.eduwrote:



3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE,
 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this
 leaves open how [1, 2, IGNORE] + [3 , 4] should behave.


I don't think there is any confusion about that particular case. Even when
using the IGNORE semantics, numpy broadcasting rules are still in play.
This particular case should throw an exception.



 Because of these different uses of IGNORE, it doesn't have as clear a
 theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3,
 or IGNORE | True?)


I think we were more referring to matrix operations like dot products.
Element-by-element operations should still behave the same as NA.  Scalar
operations should return IGNORE.

HOW DOES THIS RELATE TO THE CURRENT MASKED ARRAY?

 

 Everyone seems to agree they'd love it if this could encompass all current
 use cases of the numpy.ma arrays, so numpy.ma arrays could be deprecated.
 (However they wouldn't be eliminated for several years, even in the most
 optimistic scenarios.)


This is going to be a very tricky thing to handle and it is going to require
coordination and agreements among many of the third-party toolkits like
scipy and matplotlib.


In addition to these notes (unless I missed it), Nathaniel pointed out that
with the ufunc where= parameter feature and the ufunc wrapper, we have the
potential to greatly improve the codebase of numpy.ma as it stands.
Potentially mitigating the need for moving more of numpy.ma into the core,
and to focus more on NA.  While I am not 100% on board with this idea, I can
definitely see the potential for this path.

Thanks everybody for the productive chat!
Ben Root


Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2

2011-07-05 Thread Ralf Gommers
On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen ro...@uw.edu wrote:

 In article BANLkTi=LXiTcrv1LgMtP=p9nF8eMr8=+h...@mail.gmail.com,
  Ralf Gommers ralf.gomm...@googlemail.com wrote:

  https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/

 Will there be a Mac binary for 32-bit pythons (one that is compatible
 with older versions of MacOS X)? At present I only see a 64-bit
 10.6-only version.


Yes, there will be for the final release (10.4-10.6 compatible). I can't
create those on my own computer, so sometimes I don't make them for RCs.

Cheers,
Ralf