Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Anne Archibald
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
 Anne Archibald wrote:

 That was in amax/amin. Pretty much every other function that does
 comparisons needs to be fixed to work with nans. In some cases it's
 not even clear how: where should a sort put the nans in an array?

 The problem is more in how the functions use sort than in sort itself in
 the case of median. There can't be a 'good' way to put nan in sort, for
 example, since nans cannot be ordered.

Well, for example, you might ask that all the non-nan elements be in
order, even if you don't specify where the nan goes.
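(As it happens, this is the contract that later NumPy releases adopted: sort orders the non-NaN values and pushes the NaNs to the end. A quick sketch, assuming a NumPy recent enough to do this:)

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, 2.0])
s = np.sort(a)

# The non-NaN elements come out in order; the NaN is pushed to the end.
print(s)
```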

 I don't know about the best strategy: either we fix every function using
 comparison, handling nan as a special case as you mentioned, or there
 may be a more clever thing to do to avoid special casing everywhere. I
 don't have a clear idea of how many functions rely on ordering in numpy.

You can always just set numpy to raise an exception whenever it comes
across a nan. In fact, apart from the difficulty of correctly frobbing
numpy's floating-point handling, how reasonable is it for (say) median
to just run as it is now, but if an exception is thrown, fall back to
a nan-aware version?
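(A minimal sketch of this fallback idea, with a hypothetical helper name; explicit NaN detection here stands in for catching a floating-point exception, to avoid frobbing the FPU state:)

```python
import numpy as np

def median_with_fallback(a):
    # Fast path: assume no NaNs are present.  Fall back to a
    # NaN-aware computation only when NaNs actually occur.
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    if mask.any():
        return np.median(a[~mask])
    return np.median(a)
```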

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Anne Archibald wrote:

 Well, for example, you might ask that all the non-nan elements be in
 order, even if you don't specify where the nan goes.


Ah, there are two problems, then:
- sort
- how median uses sort.

For sort, I don't know how sort's speed would be affected by handling
nan. In a way, calling sort with nan inside is a user error (if you take
the POV that nans are not comparable), but nans are used for all kinds of
purposes, hence maybe having a nansort would be nice. OTOH (I took a look
at this when I fixed nanmean and co a while ago in scipy), matlab and R
treat sort differently than mean and co.

I am puzzled by this:
- R sorts arrays with nan as you want by default (nan can be ignored,
put in front or at the end of the array).
- R's max does not ignore nan by default.
- R's median does not ignore nan by default.

I don't know how to make this consistent. I don't think we are
consistent by having max/amax/etc... ignore nan but sort not ignore
it. OTOH, R is not consistent either.


 You can always just set numpy to raise an exception whenever it comes
 across a nan. In fact, apart from the difficulty of correctly frobbing
 numpy's floating-point handling, how reasonable is it for (say) median
 to just run as it is now, but if an exception is thrown, fall back to
 a nan-aware version?

It would be different from the current nan vs. usual function behavior
for median/mean/etc...: why should sort handle nan by default, but not
the other functions? For mean/std/variance/median, if having nan is an
error, you see it in the result (once we fix our median), but not with sort.

Hm, I am always puzzled when I think about nan handling :) It always
seems there is no good answer.

David


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 03:11:05 David Cournapeau wrote:

 Hm, I am always puzzled when I think about nan handling :) It always
 seems there is no good answer.

Which is why we have masked arrays, of course ;)


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Anne Archibald
2008/9/19 Pierre GM [EMAIL PROTECTED]:
 On Friday 19 September 2008 03:11:05 David Cournapeau wrote:

 Hm, I am always puzzled when I think about nan handling :) It always
 seems there is no good answer.

 Which is why we have masked arrays, of course ;)

I think the numpy attitude to nans should be that they are unexpected
bogus values that signify that something went wrong with the
calculation somewhere. They can be left in place for most operations,
but any operation that depends on the value should (ideally) return
nan, or failing that, raise an exception. (If users want exceptions
all the time, that's what seterr is for.) If people want to flag bad
data, let's tell them to use masked arrays.

So by this rule amax/maximum/mean/median should all return nan when
there's a nan in their input; I don't think it's reasonable for sort
to return an array full of nans, so I think its default behaviour
should be to raise an exception if there's a nan. It's valuable (for
example in median) to be able to sort them all to the end, but I don't
think this should be the default. If people want nanmin, I would be
tempted to tell them to use masked arrays (is there a convenience
function that makes a masked array with a mask everywhere the data is
nan?).
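(To answer Anne's question: numpy.ma does provide such a convenience function, masked_invalid, which masks NaNs and infs. A short sketch:)

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 2.0])

# Mask every invalid entry (NaN or inf); the masked statistics
# then simply skip the masked elements.
m = np.ma.masked_invalid(data)
print(np.ma.median(m))   # median over the unmasked values only
```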

I am assuming that appropriate masked sort/amax/maximum/mean/median
exist already. They're definitely needed, so how much effort is it
worth putting in to duplicate that functionality with nans instead of
masked elements?

Anne


Re: [Numpy-discussion] Suggestion for recarray.view

2008-09-19 Thread Stéfan van der Walt
2008/9/19 Travis E. Oliphant [EMAIL PROTECTED]:
 #---
 def view(self, dtype=None, type=None):
     if dtype is None:
         return ndarray.view(self, type)
     elif type is None:
         try:
             if issubclass(dtype, ndarray):
                 return ndarray.view(self, dtype)
         except TypeError:
             pass
         dtype = sb.dtype(dtype)
         if dtype.fields is None:
             return self.__array__().view(dtype)
         return ndarray.view(self, dtype)
     else:
         return ndarray.view(self, dtype, type)
 #---

 This looks pretty good to me.

 +1 for adding it.

+1 and another +1 to your karma for requesting peer review.  Let me
know if you need me to whip up a couple of tests for verifying the
different usage cases.
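(Such tests might cover the two view signatures the patch distinguishes; a sketch, assuming the proposed behaviour, with made-up field names:)

```python
import numpy as np

r = np.rec.array([(1, 2.0), (3, 4.0)], dtype=[('x', int), ('y', float)])

# Case 1: view with a type only -- back to a plain ndarray.
a = r.view(np.ndarray)

# Case 2: view with a dtype only -- same layout, renamed fields.
b = r.view([('a', int), ('b', float)])

print(type(a).__name__, b.dtype.names)
```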

Cheers
Stéfan


Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Cimrman
Robert Kern wrote:
 On Thu, Sep 18, 2008 at 06:01, Robert Cimrman [EMAIL PROTECTED] wrote:
 Hi Robert,

 Robert Kern wrote:
 On Mon, Sep 15, 2008 at 11:13, Arnar Flatberg [EMAIL PROTECTED] wrote:
 That would make me an extremely happy user, I've been looking for this for
 years!
 I can't imagine I'm the only one who profiles some hundred lines of
 code and ends up with 90% of total time in the dot-function
 For the time being, you can grab it here:

 http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/

 It requires Cython and a C compiler to build. I'm still debating
 myself about the desired workflow for using it, but for now, it only
 profiles functions which you have registered with it. I have made the
 profiler work as a decorator to make this easy. E.g.,
 many thanks for this! I have wanted to try out the profiler but failed
 to build it (changeset 6 0de294aa75bf):

 $ python setup.py install --root=/home/share/software/
 running install
 running build
 running build_py
 creating build
 creating build/lib.linux-i686-2.4
 copying line_profiler.py -> build/lib.linux-i686-2.4
 running build_ext
 cythoning _line_profiler.pyx to _line_profiler.c
 building '_line_profiler' extension
 creating build/temp.linux-i686-2.4
 i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
 -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c _line_profiler.c
 -o build/temp.linux-i686-2.4/_line_profiler.o
 _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
 function)
 error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1

 I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
 
 It uses the #define'd macro PY_LONG_LONG. Go through your Python
 headers to see what this gets expanded to.
 

I have Python 2.4.4

in pyconfig.h

#define HAVE_LONG_LONG 1

in pyport.h:

#ifdef HAVE_LONG_LONG
#ifndef PY_LONG_LONG
#define PY_LONG_LONG long long
#endif
#endif /* HAVE_LONG_LONG */

so it seems compatible with 'ctypedef long long PY_LONG_LONG'

r.


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Stéfan van der Walt
2008/9/19 Anne Archibald [EMAIL PROTECTED]:
 I think the numpy attitude to nans should be that they are unexpected
 bogus values that signify that something went wrong with the
 calculation somewhere. They can be left in place for most operations,
 but any operation that depends on the value should (ideally) return
 nan, or failing that, raise an exception.

I agree completely.

 I am assuming that appropriate masked sort/amax/maximum/mean/median
 exist already. They're definitely needed, so how much effort is it
 worth putting in to duplicate that functionality with nans instead of
 masked elements?

Unfortunately, this needs to happen at the C level.  Is anyone reading
this willing to spend some time taking care of the issue?  It's an
important one.

Stéfan


Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Cimrman
Ondrej Certik wrote:
 On Thu, Sep 18, 2008 at 4:12 PM, Ryan May [EMAIL PROTECTED] wrote:
 Ondrej Certik wrote:
 On Thu, Sep 18, 2008 at 1:01 PM, Robert Cimrman [EMAIL PROTECTED] wrote:
 It requires Cython and a C compiler to build. I'm still debating
 myself about the desired workflow for using it, but for now, it only
 profiles functions which you have registered with it. I have made the
 profiler work as a decorator to make this easy. E.g.,
 many thanks for this! I have wanted to try out the profiler but failed
 to build it (changeset 6 0de294aa75bf):

 $ python setup.py install --root=/home/share/software/
 running install
 running build
 running build_py
 creating build
 creating build/lib.linux-i686-2.4
 copying line_profiler.py -> build/lib.linux-i686-2.4
 running build_ext
 cythoning _line_profiler.pyx to _line_profiler.c
 building '_line_profiler' extension
 creating build/temp.linux-i686-2.4
 i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
 -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c _line_profiler.c
 -o build/temp.linux-i686-2.4/_line_profiler.o
 _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
 function)
 error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1

 I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
 I am telling you all the time Robert to use Debian that it just works
 and you say, no no, gentoo is the best. :)
 And what's wrong with that? :)  Once you get over the learning curve,
 Gentoo works just fine.  Must be Robert K.'s fault. :)
 
 Well, I think if Robert C. hasn't yet got over the learning curve
 after so many years of hard work, maybe the learning curve is too
 steep. :)

This is most probably not related to Gentoo at all and certainly not 
related to me knowing Gentoo or not :) (and no, learning Gentoo is not 
that hard.)

r.


Re: [Numpy-discussion] Suggestion for recarray.view

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 04:13:39 Stéfan van der Walt wrote:

 +1 and another +1 to your karma for requesting peer review.  Let me
 know if you need me to whip up a couple of tests for verifying the
 different usage cases.

That'd be lovely. I'm a bit swamped with tricky issues in mrecords and 
dependents...


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Stéfan van der Walt wrote:

 I agree completely.

Me too, but I am extremely biased toward "nan is always bogus" by my own
usage of numpy/scipy (I never use NaN as a missing value; nan is
always caused by divide by 0 and co).

I like sort raising an exception by default with NaN: it breaks the
API, but OTOH I can't see a good use of sort with NaN, since sort does
not sort values in that case: we would break the API of a broken function.


 Unfortunately, this needs to happen at the C level. 

Why ?

cheers,

David


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 04:31:38 David Cournapeau wrote:
 Pierre GM wrote:
  That said, numpy.nanmin, numpy.nansum... don't come with the heavy
  machinery of numpy.ma, and are therefore faster.
  I'm really going to have to learn C.

 FWIW, nanmean/nanmedian/etc... are written in python,

I know. I was more dreading the time when MaskedArrays would have to be ported 
to C. In a way, that would probably simplify a few issues. OTOH, I don't 
really see it happening any time soon.


Re: [Numpy-discussion] Generating random samples without repeats

2008-09-19 Thread Paul Moore
Robert Kern robert.kern at gmail.com writes:
 On Thu, Sep 18, 2008 at 16:55, Paul Moore pf_moore at yahoo.co.uk wrote:
  I want to generate a series of random samples, to do simulations based
  on them. Essentially, I want to be able to produce a SAMPLESIZE * N
  matrix, where each row of N values consists of either
 
  1. Integers between 1 and M (simulating M rolls of an N-sided die), or
  2. A sample of N numbers between 1 and M without repeats (simulating
 deals of N cards from an M-card deck).
 
  Example (1) is easy, numpy.random.random_integers(1, M, (SAMPLESIZE, N))
 
  But I can't find an obvious equivalent for (2). Am I missing something
  glaringly obvious? I'm using numpy - is there maybe something in scipy I
  should be looking at?
 
 numpy.array([(numpy.random.permutation(M) + 1)[:N]
 for i in range(SAMPLESIZE)])
 

Thanks.

And yet, this takes over 70s and peaks at around 400M memory use, whereas the 
equivalent for (1)

numpy.random.random_integers(1,M,(SAMPLESIZE,N))

takes less than half a second and negligible working memory (both end up 
allocating an array of the same size, but your suggestion consumes temporary 
working memory - I suspect, but can't prove, that the time taken comes from 
memory allocations rather than computation).

As a one-off cost initialising my data, it's not a disaster, but I anticipate 
using idioms like this later in my calculations as well, where the costs could 
hurt more.

If I'm going to need to write C code, are there any good examples of this? (I 
guess the source for numpy.random is a good place to start).

Paul



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Stéfan van der Walt
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
 Stéfan van der Walt wrote:

 I agree completely.

 Me too, but I am extremely biased toward "nan is always bogus" by my own
 usage of numpy/scipy (I never use NaN as a missing value; nan is
 always caused by divide by 0 and co).

So am I.  In all my use cases, NaNs indicate trouble.

 Why ?

Because we have x.max() silently ignoring NaNs, which causes a lot of
head-scratching, swearing and failed experiments.

Cheers
Stéfan


Re: [Numpy-discussion] Generating random samples without repeats

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 05:08:20 Paul Moore wrote:
 Robert Kern robert.kern at gmail.com writes:
  On Thu, Sep 18, 2008 at 16:55, Paul Moore pf_moore at yahoo.co.uk 
wrote:
   I want to generate a series of random samples, to do simulations based
   on them. Essentially, I want to be able to produce a SAMPLESIZE * N
   matrix, where each row of N values consists of either

   2. A sample of N numbers between 1 and M without repeats (simulating
  deals of N cards from an M-card deck).

Have you considered numpy.random.shuffle ? Note that shuffle works in
place and returns None, so shuffle first and slice afterwards:

a = np.arange(1, M+1)
np.random.shuffle(a)
result = a[:N]
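(Since shuffle permutes a single array in place, a vectorized variant that produces many rows at once, via argsort of uniform draws, might look like this; the helper name is made up for illustration:)

```python
import numpy as np

def deal_rows(samplesize, m, n):
    # Each row of uniforms, argsorted, yields an independent random
    # permutation of 0..m-1; the first n columns of every row are
    # then a sample of n values from 1..m without repeats.
    u = np.random.rand(samplesize, m)
    return u.argsort(axis=1)[:, :n] + 1

hands = deal_rows(5, 52, 7)   # 5 deals of 7 cards from a 52-card deck
```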





Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 03:33, Robert Cimrman [EMAIL PROTECTED] wrote:

 I have Python 2.4.4

 in pyconfig.h

 #define HAVE_LONG_LONG 1

 in pyport.h:

 #ifdef HAVE_LONG_LONG
 #ifndef PY_LONG_LONG
 #define PY_LONG_LONG long long
 #endif
 #endif /* HAVE_LONG_LONG */

 so it seems compatible with 'ctypedef long long PY_LONG_LONG'

Ah, found it. T_LONGLONG is a #define from structmember.h which is
used to describe the types of attributes. Apparently, this was not
added until Python 2.5. That particular member didn't actually need to
be long long, so I've fixed that.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco


Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Ondrej Certik
On Fri, Sep 19, 2008 at 10:37 AM, Robert Cimrman [EMAIL PROTECTED] wrote:
 Ondrej Certik wrote:
 On Thu, Sep 18, 2008 at 4:12 PM, Ryan May [EMAIL PROTECTED] wrote:
 Ondrej Certik wrote:
 On Thu, Sep 18, 2008 at 1:01 PM, Robert Cimrman [EMAIL PROTECTED] wrote:
 It requires Cython and a C compiler to build. I'm still debating
 myself about the desired workflow for using it, but for now, it only
 profiles functions which you have registered with it. I have made the
 profiler work as a decorator to make this easy. E.g.,
 many thanks for this! I have wanted to try out the profiler but failed
 to build it (changeset 6 0de294aa75bf):

 $ python setup.py install --root=/home/share/software/
 running install
 running build
 running build_py
 creating build
 creating build/lib.linux-i686-2.4
 copying line_profiler.py -> build/lib.linux-i686-2.4
 running build_ext
 cythoning _line_profiler.pyx to _line_profiler.c
 building '_line_profiler' extension
 creating build/temp.linux-i686-2.4
 i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
 -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c _line_profiler.c
 -o build/temp.linux-i686-2.4/_line_profiler.o
 _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
 function)
 error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1

 I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
 I am telling you all the time Robert to use Debian that it just works
 and you say, no no, gentoo is the best. :)
 And what's wrong with that? :)  Once you get over the learning curve,
 Gentoo works just fine.  Must be Robert K.'s fault. :)

 Well, I think if Robert C. hasn't yet got over the learning curve
 after so many years of hard work, maybe the learning curve is too
 steep. :)

 This is most probably not related to Gentoo at all and certainly not
 related to me knowing Gentoo or not :) (and no, learning Gentoo is not
 that hard.)

Let us know where the problem was. :) I am just using common sense, if
something works on Debian and macosx and doesn't work on gentoo, I
thought it was safe to say it was gentoo related, but I may well be
wrong. :))

Ondrej


Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Kern
On Wed, Sep 17, 2008 at 18:29, Robert Kern [EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 18:09, Ondrej Certik [EMAIL PROTECTED] wrote:

 This is what I am getting:

 $ ./kernprof.py -l pystone.py
 Wrote profile results to pystone.py.lprof
 $ ./view_line_prof.py pystone.py.lprof
 Timer unit: 1e-06 s

 $

 So I think you meant:

 $ ./kernprof.py -l mystone.py
 20628
 Wrote profile results to mystone.py.lprof
 $ ./view_line_prof.py mystone.py.lprof
 Timer unit: 1e-06 s

 File: pystone.py
 Function: Proc0 at line 79
 Total time: 13.0803 s
 [...]

 Now it works.

 No, I meant pystone.py. My script-finding code may have (incorrectly)
 found a different, uninstrumented pystone.py file somewhere else,
 though. Try with ./pystone.py.

There was a bug in how I was constructing the munged namespaces. Fixed now.

-- 
Robert Kern



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Stéfan van der Walt wrote:

 So am I.  In all my use cases, NaNs indicate trouble.

Yes, so I would like to have the opinion of people with other usages
than ours.

 Because we have x.max() silently ignoring NaNs, which causes a lot of
 head-scratching, swearing and failed experiments.

But cannot this be fixed at the python level of the max function? I
think it is expected that the low-level C functions ignore/are bogus
when you have NaN. After all, if you use the libc's sort with nan,
or C++'s sort on a vector of double, it will not work either.

But on my numpy, it looks like nan breaks min/max; they are not ignored:

np.min(np.array([0, np.nan, 1]))
-> 1.0 # bogus

np.min(np.array([0, np.nan, 2]))
-> 2.0 # ok

np.min(np.array([0, np.nan, -1]))
-> -1.0 # ok

np.max(np.array([0, np.nan, -1]))
-> -1.0 # bogus

Which only makes sense when you guess how they are implemented in C...
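(One plausible reading of that C inner loop, transcribed into Python as a sketch — the actual NumPy source may differ — reproduces exactly this position-dependent behaviour:)

```python
def c_style_min(a):
    # Keep the running minimum only when the comparison succeeds.
    # Every comparison involving NaN is False, so once a NaN appears
    # it replaces the running value, and the next element replaces
    # the NaN in turn -- hence the "bogus" results above.
    m = a[0]
    for x in a[1:]:
        m = m if m < x else x
    return m

print(c_style_min([0.0, float('nan'), 1.0]))   # 1.0, the bogus result above
```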

cheers,

David


Re: [Numpy-discussion] Generating random samples without repeats

2008-09-19 Thread Anne Archibald
2008/9/19 Paul Moore [EMAIL PROTECTED]:
 Robert Kern robert.kern at gmail.com writes:
 On Thu, Sep 18, 2008 at 16:55, Paul Moore pf_moore at yahoo.co.uk wrote:
  I want to generate a series of random samples, to do simulations based
  on them. Essentially, I want to be able to produce a SAMPLESIZE * N
  matrix, where each row of N values consists of either
 
  1. Integers between 1 and M (simulating M rolls of an N-sided die), or
  2. A sample of N numbers between 1 and M without repeats (simulating
 deals of N cards from an M-card deck).
 
  Example (1) is easy, numpy.random.random_integers(1, M, (SAMPLESIZE, N))
 
  But I can't find an obvious equivalent for (2). Am I missing something
  glaringly obvious? I'm using numpy - is there maybe something in scipy I
  should be looking at?

 numpy.array([(numpy.random.permutation(M) + 1)[:N]
 for i in range(SAMPLESIZE)])


 Thanks.

 And yet, this takes over 70s and peaks at around 400M memory use, whereas the
 equivalent for (1)

 numpy.random.random_integers(1,M,(SAMPLESIZE,N))

 takes less than half a second, and negligible working memory (both end up
 allocating an array of the same size, but your suggestion consumes temporary
 working memory - I suspect, but can't prove, that the time taken comes from
 memory allocations rather than computation.

 As a one-off cost initialising my data, it's not a disaster, but I anticipate
 using idioms like this later in my calculations as well, where the costs could
 hurt more.

 If I'm going to need to write C code, are there any good examples of this? (I
 guess the source for numpy.random is a good place to start).

This was discussed on one of the mailing lists several months ago. It
turns out that there is no simple way to efficiently choose without
replacement in numpy/scipy. I posted a hack that does this somewhat
efficiently (if SAMPLESIZEM/2, choose the first SAMPLESIZE of a
permutation; if SAMPLESIZEM/2, choose with replacement and redraw any
duplicates) but it's not vectorized across many sample sets. Is your
problem large M or large N? what is SAMPLESIZE/M?
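(A sketch of the redraw-duplicates half of that hack, efficient when the sample is much smaller than M; the helper name is made up for illustration:)

```python
import numpy as np

def sample_no_repeats(m, n):
    # Draw n values from 1..m with replacement, then redraw only the
    # colliding slots until n distinct values remain.
    vals = np.random.randint(1, m + 1, size=n)
    while True:
        uniq = np.unique(vals)
        if uniq.size == n:
            return vals
        extra = np.random.randint(1, m + 1, size=n - uniq.size)
        vals = np.concatenate([uniq, extra])

hand = sample_no_repeats(52, 5)   # 5 distinct cards from a 52-card deck
```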

Anne


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:

 You can use nanmedian (from scipy.stats):
 

I rejoiced when I saw this answer, because it looks like a function I can just
drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower
than just using lists (with nan values removed, in my experiments) and a
home-brew implementation of median. I was mostly using numpy for speed...

I would like to try the masked array approach, but the Ubuntu packages for scipy
and matplotlib depend on numpy. Does anybody know whether I can naively do sudo
python setup.py install on a more modern numpy without disturbing scipy and
matplotlib, or do I need to uninstall all three packages and install them
manually from source?

On my 64 bit machine, the Ubuntu numpy package is even more out of date:

$ dpkg -l | grep numpy
ii  python-numpy   1:1.0.4-6ubuntu3 

Does anybody know why this is? I might be willing to help bring the repository
up to date, if anybody can give me pointers on how to do this.

Peter



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 05:51:55 Peter Saffrey wrote:

 I would like to try the masked array approach, but the Ubuntu packages for
 scipy and matplotlib depend on numpy. Does anybody know whether I can
 naively do sudo python setup.py install on a more modern numpy without
 disturbing scipy and matplotlib, or do I need to uninstall all three
 packages and install them manually from source?

I think there were some changes on the C side of numpy between 1.0 and 1.1, 
you may have to recompile scipy and matplotlib from sources. What versions 
are you using for those 2 packages ?


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:

 It may be that nanmedian is slow. But I would sincerly be surprised if
 it were slower than python list, except for some pathological cases, or
 maybe a bug in nanmedian. What do your data look like ? (size, number of
 nan, etc...)
 

I've posted my test code below, which gives me the results:

$ ./arrayspeed3.py
list build time: 0.01
list median time: 0.01
array nanmedian time: 0.36

I must have done something wrong to hobble nanmedian in this way... I'm quite
new to numpy, so feel free to point out any obviously egregious errors.

Peter

===

from numpy import array, nan, inf
from pylab import rand
from time import clock
from scipy.stats.stats import nanmedian

import pdb
_pdb = pdb.Pdb()
breakpoint = _pdb.set_trace

def my_median(vallist):
    num_vals = len(vallist)
    vallist.sort()
    if num_vals % 2 == 1: # odd
        index = (num_vals - 1) / 2
        return vallist[index]
    else: # even
        index = num_vals / 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 100
pointlen = 3

t0 = clock()
natests = rand(numtests, testsize, pointlen)
# have to start with inf because list.remove(nan) doesn't remove nan
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests == inf] = nan
for test in tests:
    for point in test:
        if inf in point:
            point.remove(inf)
t1 = clock()
print "list build time:", t1 - t0


t0 = clock()
allmedians = []
for test in tests:
    medians = [my_median(x) for x in test]
    allmedians.append(medians)
t1 = clock()
print "list median time:", t1 - t0

t0 = clock()
namedians = []
for natest in natests:
    thismed = nanmedian(natest, axis=1)
    namedians.append(thismed)
t1 = clock()
print "array nanmedian time:", t1 - t0





Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
Pierre GM pgmdevlist at gmail.com writes:

 I think there were some changes on the C side of numpy between 1.0 and 1.1, 
 you may have to recompile scipy and matplotlib from sources. What versions 
 are you using for those 2 packages ?
 

$ dpkg -l | grep scipy
ii  python-scipy   0.6.0-8ubuntu1  
scientific tools for Python

$ dpkg -l | grep matplotlib
ii  python-matplotlib  0.91.2-0ubuntu1 
Python based plotting system in a style simi
ii  python-matplotlib-data 0.91.2-0ubuntu1 
Python based plotting system (data package)
ii  python-matplotlib-doc  0.91.2-0ubuntu1 
Python based plotting system (documentation 

Peter



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Peter Saffrey wrote:
 Pierre GM pgmdevlist at gmail.com writes:

 I think there were some changes on the C side of numpy between 1.0 and 1.1, 
 you may have to recompile scipy and matplotlib from sources. What versions 
 are you using for those 2 packages ?


 $ dpkg -l | grep scipy
 ii  python-scipy   0.6.0-8ubuntu1 
  
 scientific tools for Python

 $ dpkg -l | grep matplotlib
 ii  python-matplotlib  0.91.2-0ubuntu1
  
 Python based plotting system in a style simi
 ii  python-matplotlib-data 0.91.2-0ubuntu1
  
 Python based plotting system (data package)
 ii  python-matplotlib-doc  0.91.2-0ubuntu1
  
 Python based plotting system (documentation 

If you build numpy from sources, please don't install it into /usr ! It
will more than likely break everything which depends on numpy, as well
as your debian installation (because you will overwrite packages handled
by dpkg). You should really install in a local directory, outside /usr.

You will have to rebuild scipy and matplotlib in any case, too.

cheers,

David


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Peter Saffrey wrote:

 I've posted my test code below, which gives me the results:

 $ ./arrayspeed3.py
 list build time: 0.01
 list median time: 0.01
 array nanmedian time: 0.36

 I must have done something wrong to hobble nanmedian in this way... I'm quite
 new to numpy, so feel free to point out any obviously egregious errors.

Ok: it is pathological, and can be done better :)

First:

 for natest in natests:
   thismed = nanmedian(natest, axis=1)
   namedians.append(thismed)

^^^ Here, you are doing nanmedian along an axis with 3 elements: this
will be slow in numpy, because numpy involves some relatively heavy
machinery to run on arrays. The machinery pays off for 'big' arrays, but
for really small arrays like here, lists can be (and often are) faster.

Still, it is indeed really slow for your case; when I fixed nanmean and
co, I did not know much about numpy, I just wanted them to give the
right answer :) I think this can be made faster, especially for your case
(where the axis along which the median is computed is really small).
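(For the record, a vectorized sketch for the small-axis case — it assumes a NumPy whose sort pushes NaNs to the end of each row, and at least one non-NaN value per row:)

```python
import numpy as np

def fast_nanmedian_axis1(a):
    # Sort each row: NaNs go to the end, so the first n[i] entries of
    # row i are its valid values; average the middle one or two.
    s = np.sort(a, axis=1)
    n = (~np.isnan(a)).sum(axis=1)   # valid count per row
    rows = np.arange(a.shape[0])
    lo = (n - 1) // 2
    hi = n // 2
    return 0.5 * (s[rows, lo] + s[rows, hi])

a = np.array([[1.0, np.nan, 3.0],
              [1.0, 2.0, 3.0]])
print(fast_nanmedian_axis1(a))   # [2. 2.]
```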

I opened a bug:

http://scipy.org/scipy/scipy/ticket/740

cheers,

David


Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Cimrman
Robert Kern wrote:
 Ah, found it. T_LONGLONG is a #define from structmember.h which is
 used to describe the types of attributes. Apparently, this was not
 added until Python 2.5. That particular member didn't actually need to
 be long long, so I've fixed that.

Great, I will try it after it appears on the web page.

Thank you,
r.


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Stéfan van der Walt
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
 But cannot this be fixed at the python level of the max function ? I

Why shouldn't we have nanmin-like behaviour for the C min itself?
I'd rather have a specialised function to deal with the rare kinds of
datasets where NaNs are guaranteed never to occur.

 But on my numpy, it looks like nan breaks min/max, they are not ignored:

Yes, that's the problem.
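
A quick sketch of the distinction being discussed (in current numpy, the
plain min propagates the NaN rather than silently dropping it, and the
specialised function ignores it):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

print(np.min(a))     # plain min propagates the NaN
print(np.nanmin(a))  # nanmin ignores it and returns 1.0
```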

Cheers
Stéfan


Re: [Numpy-discussion] which one is best?

2008-09-19 Thread Stéfan van der Walt
2008/9/19 mark [EMAIL PROTECTED]:
 I need to multiply items in a list and need a list back. Which one of
 the four options is best (I thought in Python there was only one way
 to do something???)

With the emphasis on "preferably" and "obvious" :)

"There should be one-- and preferably only one --obvious way to do it."

The modern idiom is the list comprehension, rather than the for-loop.
Of those options,
I personally prefer using zip.

 [ x * y for x,y in zip(a,b) ]  # method 4
 [10, 40, 90, 160]

If you have very large arrays, you can also consider

(np.array(x) * np.array(y)).tolist()
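
Both forms can be checked side by side (a small sketch with made-up inputs):

```python
import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

via_zip = [x * y for x, y in zip(a, b)]           # pure-Python elementwise product
via_numpy = (np.array(a) * np.array(b)).tolist()  # vectorized, then back to a list

print(via_zip)  # [10, 40, 90, 160]
```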

Cheers
Stéfan


Re: [Numpy-discussion] which one is best?

2008-09-19 Thread Arnar Flatberg
On Fri, Sep 19, 2008 at 3:09 PM, Stéfan van der Walt [EMAIL PROTECTED]wrote:

 2008/9/19 mark [EMAIL PROTECTED]:
  I need to multiply items in a list and need a list back. Which one of
  the four options is best (I thought in Python there was only one way
  to do something???)

 With the emphasis on preferably and obvious :)

 There should be one-- and preferably only one --obvious way to do it.

 The modern idiom is the list comprehension, rather than the for-loop.
 Of those options,
 I personally prefer using zip.

  [ x * y for x,y in zip(a,b) ]  # method 4
  [10, 40, 90, 160]

 If you have very large arrays, you can also consider

 (np.array(x) * np.array(y)).tolist()

 Cheers
 Stéfan


I think
[x*y for x in a for y in b]
feels pythonic; however, it has surprisingly lousy performance.

In [30]: %timeit [ x * y for x,y in zip(a,b) ]
10 loops, best of 3: 3.96 µs per loop

In [31]: %timeit [ i*j for i in a for j in b ]
10 loops, best of 3: 6.53 µs per loop

In [32]: a = range(100)

In [33]: b = range(100)

In [34]: %timeit [ x * y for x,y in zip(a,b) ]
1 loops, best of 3: 51.9 µs per loop

In [35]: %timeit [ i*j for i in a for j in b ]
100 loops, best of 3: 2.78 ms per loop

Arnar


Re: [Numpy-discussion] which one is best?

2008-09-19 Thread Arnar Flatberg
On Fri, Sep 19, 2008 at 4:09 PM, lorenzo [EMAIL PROTECTED] wrote:



 On Fri, Sep 19, 2008 at 2:50 PM, Arnar Flatberg [EMAIL PROTECTED]wrote:



 I think
 [x*y for x in a for y in b]
 feels pythonic, however it has a surprisingly lousy performance.


 This returns a len(x)*len(y) long list, which is not what you want.


My bad. It's Friday afternoon, I'll go home now :-)

Arnar


Re: [Numpy-discussion] which one is best?

2008-09-19 Thread David M. Kaplan
Hi Arnar,

Your two commands below aren't doing the same thing - one is doing
a[i]*b[i] and the other is doing a[i]*b[j] for all i and j.  As the
second is harder, it takes longer.
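
A tiny sketch making the difference concrete:

```python
a = [1, 2]
b = [3, 4]

pairwise = [x * y for x, y in zip(a, b)]  # a[i]*b[i] -- len(a) elements
cross = [i * j for i in a for j in b]     # a[i]*b[j] for all i, j -- len(a)*len(b) elements

print(pairwise)  # [3, 8]
print(cross)     # [3, 4, 6, 8]
```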

Cheers,
David

On Fri, 2008-09-19 at 09:08 -0500, [EMAIL PROTECTED]
wrote:
 I think
 [x*y for x in a for y in b]
 feels pythonic, however it has a surprisingly lousy performance.
 
 In [30]: %timeit [ x * y for x,y in zip(a,b) ]
 10 loops, best of 3: 3.96 µs per loop
 
 In [31]: %timeit [ i*j for i in a for j in b ]
 10 loops, best of 3: 6.53 µs per loop
 
 In [32]: a = range(100)
 
 In [33]: b = range(100)
 
 In [34]: %timeit [ x * y for x,y in zip(a,b) ]
 1 loops, best of 3: 51.9 µs per loop
 
 In [35]: %timeit [ i*j for i in a for j in b ]
 100 loops, best of 3: 2.78 ms per loop
 
 Arnar
-- 
**
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**




Re: [Numpy-discussion] Generating random samples without repeats

2008-09-19 Thread Paul Moore
Rick White rlw at stsci.edu writes:

 It seems like numpy.random.permutation is pretty suboptimal in its  
 speed.  Here's a Python 1-liner that does the same thing (I think)  
 but is a lot faster:
 
 a = 1+numpy.random.rand(M).argsort()[0:N-1]
 
 This still has the problem that it generates a size N array to  
 start with.  But at least it is fast compared with permutation:

Interesting. For my generation of a million samples, this takes about 46 sec 
vs the original 75. That's a 35% increase in speed. As you mention, it doesn't 
help memory, which still peaks at around 450M.
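
The argsort trick can be sketched like this (note in passing: the quoted
one-liner's [0:N-1] slice yields only N-1 values, so a [:N] slice is used
below; M=52, N=5 are made-up inputs):

```python
import numpy as np

M, N = 52, 5
# Rank M uniform randoms; the first N ranks are N distinct values from 1..M.
sample = 1 + np.random.rand(M).argsort()[:N]
print(sample)
```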

Interestingly, I was reminded of J (http://www.jsoftware.com/), an APL 
derivative, which does this in a blistering 1.3 seconds, with no detectable 
memory overhead. Of course, being descended from APL, the code to do this is 
pretty obscure:

5 ? (100 $ 52)

(Here, ? is the deal operator, and $ reshapes an array - so it's deal 5 
from each item in a 100-long array of 52's. Everything is a primitive 
here, so it's not hard to see why it's fast).

A Python/Numpy - J bridge might be a fun exercise...

Paul.



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote:
 Masked arrays.  Using NaN's for missing values is dangerous.  You may  
 do some operation, which generates invalid results, and then you have  
 a mixed bag of missing and invalid values.

That rather evades my full question, I think?

In the case I mentioned,
I am filling an array inside a loop,
and the possible fill values are not constrained.
So I cannot mask based on value,
and I cannot mask based on position
(at least until after the computations are complete).

It seems to me that there are pragmatic reasons
why people work with NaNs for missing values,
that perhaps should not be dismissed so quickly.
But maybe I am overlooking a simple solution.

Alan

PS I confess I do not understand NaNs.
E.g., why could there not be a value np.miss
that would be a NaN that represents a missing value?
Are all NaNs already assigned standard meanings?



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 11:36:17 Alan G Isaac wrote:
 On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote:
  Masked arrays.  Using NaN's for missing values is dangerous.  You may
  do some operation, which generates invalid results, and then you have
  a mixed bag of missing and invalid values.

 That rather evades my full question, I think?

 In the case I mentioned,
 I am filling an array inside a loop,
 and the possible fill values are not constrained.
 So I cannot mask based on value,
 and I cannot mask based on position
 (at least until after the computations are complete).

No, but you may do the opposite: just start with an array completely masked,
and unmask it as you need.
Say, you have a 4x5 array, and want to unmask (0,0), (1,2), (3,4):
>>> a = ma.empty((4,5), dtype=float)
>>> a.mask = True
>>> a[0,0] = 0
>>> a[1,2] = 1
>>> a[3,4] = 3
>>> a
masked_array(data =
 [[0.0 -- -- -- --]
 [-- -- 1.0 -- --]
 [-- -- -- -- --]
 [-- -- -- -- 3.0]],
  mask =
 [[False  True  True  True  True]
 [ True  True False  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True False]],
  fill_value=1e+20)
>>> a.max(axis=0)
masked_array(data = [0.0 -- 1.0 -- 3.0],
  mask = [False  True False  True False],
  fill_value=1e+20)
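
The same construction as a self-contained sketch (ma.masked_all is a
convenience equivalent of the empty-then-mask dance above):

```python
import numpy.ma as ma

# Start fully masked, then unmask cells simply by assigning to them.
a = ma.masked_all((4, 5), dtype=float)
a[0, 0] = 0
a[1, 2] = 1
a[3, 4] = 3

m = a.max(axis=0)
# tolist() renders the still-masked entries as None
print(m.tolist())  # [0.0, None, 1.0, None, 3.0]
```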


 It seems to me that there are pragmatic reasons
 why people work with NaNs for missing values,
 that perhaps shd not be dismissed so quickly.
 But maybe I am overlooking a simple solution.

nansomething solutions tend to be considerably faster, which might be one 
reason. A lack of visibility of numpy.ma could be a second. In any case, I 
can't but agree with other posters: a NaN in an array usually means something 
went astray.

 PS I confess I do not understand NaNs.
 E.g., why could there not be a value np.miss
 that would be a NaN that represents a missing value?

You can't compare NaNs to anything. How do you know this np.miss is a masked 
value, when np.sqrt(-1.) is NaN ?






[Numpy-discussion] New patch for new mgrid / ogrid functionality

2008-09-19 Thread David M. Kaplan
Hi all,

Attached is a newer version of my patch that adds new mgrid / ogrid
functionality for working with arrays in addition to slices.  In fact, I
have attached two versions of the patch: index_tricks.patch, that is
just the last version of the patch I sent, and index_tricks.new.patch,
that has been modified so that it is backward compatible.  In the last
version, mgrid calls where all arguments are slices will return an
array, otherwise it returns a list as ogrid does.  This is the only
reasonable way to have the new functionality and maintain backwards
compatibility.  

My 2 cents - I personally think the version that always returns a list
will ultimately be more transparent and cause fewer problems than the
newer version.  In either case, the plan should be to eventually have it
always return a list, as that is the only fully consistent option; the
question is just when that switch should be made and by whom.  If it is
done at the next major release, someone else will have to remember to axe
the additional code and correct the documentation.

Other changes that would be nice: add a __call__ method, create an
instance called ndgrid for matlab compatibility, and have meshgrid be
reimplemented using an nd_grid instance.

Cheers,
David 


-- 
**
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**

Index: numpy/lib/tests/test_index_tricks.py
===
--- numpy/lib/tests/test_index_tricks.py	(revision 5834)
+++ numpy/lib/tests/test_index_tricks.py	(working copy)
@@ -24,15 +24,21 @@
 def test_nd(self):
 c = mgrid[-1:1:10j,-2:2:10j]
 d = mgrid[-1:1:0.1,-2:2:0.2]
-assert(c.shape == (2,10,10))
-assert(d.shape == (2,20,20))
+assert(array(c).shape == (2,10,10))
+assert(array(d).shape == (2,20,20))
 assert_array_equal(c[0][0,:],-ones(10,'d'))
 assert_array_equal(c[1][:,0],-2*ones(10,'d'))
 assert_array_almost_equal(c[0][-1,:],ones(10,'d'),11)
 assert_array_almost_equal(c[1][:,-1],2*ones(10,'d'),11)
-assert_array_almost_equal(d[0,1,:]-d[0,0,:], 0.1*ones(20,'d'),11)
-assert_array_almost_equal(d[1,:,1]-d[1,:,0], 0.2*ones(20,'d'),11)
+assert_array_almost_equal(d[0][1,:]-d[0][0,:], 0.1*ones(20,'d'),11)
+assert_array_almost_equal(d[1][:,1]-d[1][:,0], 0.2*ones(20,'d'),11)
 
+def test_listargs(self):
+e = mgrid[ :2, ['a', 'b', 'c'], [1,5,50,500] ]
+assert( array(e).shape == (3,2,3,4) )
+assert_array_equal( e[0][:,1,1].ravel(), r_[:2] )
+assert_array_equal( e[1][1,:,1].ravel(), array(['a','b','c']) )
+assert_array_equal( e[2][1,1,:].ravel(), array([1,5,50,500]) )
 
 class TestConcatenator(TestCase):
 def test_1d(self):
Index: numpy/lib/index_tricks.py
===
--- numpy/lib/index_tricks.py	(revision 5834)
+++ numpy/lib/index_tricks.py	(working copy)
@@ -11,7 +11,7 @@
 from numpy.core.numerictypes import find_common_type
 import math
 
-import function_base
+import function_base, shape_base
 import numpy.core.defmatrix as matrix
 makemat = matrix.matrix
 
@@ -118,14 +118,28 @@
 number of points to create between the start and stop values, where
 the stop value **is inclusive**.
 
+One can also use lists or arrays as indexing arguments, in which case
+these will be meshed out themselves instead of generating matrices from
+the slice arguments.  See examples below.
+
 If instantiated with an argument of sparse=True, the mesh-grid is
 open (or not fleshed out) so that only one-dimension of each returned
 argument is greater than 1
 
+***IMPORTANT NOTE*** Indexing an nd_grid instance with
+sparse=False will currently return an (N+1)-axis array if all
+arguments are slices (i.e., something like -4:5:20j or :20:0.5)
+and there are N arguments.  However, if any of the arguments is
+not a slice (e.g., is an array or list), then the return is a list
+of arrays.  This is to maintain backwards compatibility.  However,
+this functionality will disappear during the next major release
+(after today's date: 2008-09-19) so that returns will always be
+lists in the future.
+
 Examples
 
  mgrid = np.lib.index_tricks.nd_grid()
- mgrid[0:5,0:5]
+ mgrid[0:5,0:5] # NOTE currently returns array, but will become a list
 array([[[0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [2, 2, 2, 2, 2],
@@ -139,6 +153,27 @@
 [0, 1, 2, 3, 4]]])
  mgrid[-1:1:5j]
 array([-1. , -0.5,  0. ,  0.5,  1. ])
+ mgrid[:2,[1,5,50],['a','b']] # Example 

Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
Alan G Isaac aisaac at american.edu writes:

 Recently I needed to fill a 2d array with values
 from computations that could go wrong.
 I created an array of NaN and then replaced
 the elements where the computation produced
 a useful value.  I then applied ``nanmax``,
 to get the maximum of the useful values.
 

I'm glad you posted this, because this is exactly the method I'm using. How do
you detect whether there are still any missing spots in your array? nan has some
rather unfortunate properties:

>>> from numpy import *
>>> a = array([1,2,nan])
>>> nan in a
False
>>> nan == nan
False
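
(np.isnan is the reliable check here, since `in` and `==` both depend on
NaN comparing equal to itself, which it never does; a small sketch:)

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])

# 'in' on an array uses ==, and nan == nan is False, so this misses the NaN:
assert not (np.nan in a)

# isnan inspects the value directly instead of comparing, so it works:
assert np.isnan(a).any()   # True: at least one missing spot remains
print(np.isnan(a).sum())   # 1 -- how many spots are still missing
```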

Should I take the earlier advice and switch to masked arrays?

Peter



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 12:02:08 Peter Saffrey wrote:
 Alan G Isaac aisaac at american.edu writes:
  Recently I needed to fill a 2d array with values
  from computations that could go wrong.

 Should I take the earlier advice and switch to masked arrays?

 Peter

Yes. As you've noticed, you can't compare nans (after all, nans are not 
numbers...), which limits their use.


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 11:46 AM Pierre GM apparently wrote:
 You can't compare NaNs to anything. How do you know this np.miss is a masked 
 value, when np.sqrt(-1.) is NaN ?

I thought you could use ``is``.
E.g.,
>>> np.nan == np.nan
False
>>> np.nan is np.nan
True

Alan



Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 11:46 AM Pierre GM apparently wrote:
 No, but you may do the opposite: just start with an array completely masked, 
 and unmasked it as you need:

Very useful example.
I did not understand this possibility.
Alan




Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Charles R Harris
On Fri, Sep 19, 2008 at 1:11 AM, David Cournapeau 
[EMAIL PROTECTED] wrote:

 Anne Archibald wrote:
 
  Well, for example, you might ask that all the non-nan elements be in
  order, even if you don't specify where the nan goes.


 Ah, there are two problems, then:
- sort
- how median use sort.

 For sort, I don't know how sort speed would be influenced by treating
 nan. In a way, calling sort with nan inside is a user error (if you take
 the POV nan are not comparable), but nan are used for all kind of
 purpose,


used - misused. Using nan to flag anything but a numerical error is going
to cause problems. It wouldn't be too hard to implement nansorts; they just
need a real comparison function so that all the nans end up at one end or the
other. I don't know that that would make medians any easier, though. Are the
nans part of the data set? A nansearchsorted would probably be needed also.
If this functionality is added, the best way might be something like
kind='nanquicksort'.
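
For what it's worth, recent numpy versions already sort NaNs to the end
(as if they compared larger than everything), which gives the "all non-NaN
elements in order" behaviour Anne asked for; a quick sketch:

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, 2.0])
s = np.sort(a)

# Non-NaN values come first, in order; the NaN lands at the end.
print(s[:-1])            # [1. 2. 3.]
assert np.isnan(s[-1])
```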

Chuck


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 12:02 PM Peter Saffrey apparently wrote:
  a = array([1,2,nan])
  nan in a
 False

Huh.  I'm inclined to call this a bug,
since normal Python behavior is that
``in`` should check for identity::

>>> xl = [1.,np.nan]
>>> np.nan in xl
True

Alan



[Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Alan G Isaac
Might someone explain this to me?

>>> x = [1.,np.nan]
>>> np.nan in x
True
>>> np.nan in np.array(x)
False
>>> np.nan in np.array(x).tolist()
False
>>> np.nan is float(np.nan)
True

Thank you,
Alan Isaac




Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Lisandro Dalcin
You know, floats are immutable objects, so 'float(f)' just
returns a new reference to 'f' if 'f' is (exactly) of type 'float'.

In [1]: f = 1.234
In [2]: f is float(f)
Out[2]: True

I do not remember right now the implementation of comparisons in core
Python, but I believe the 'in' operator tests first for object
identity, so 'np.nan in [np.nan]' returns True, and the
fact that 'np.nan==np.nan' returns False is never considered.

On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote:
 Might someone explain this to me?

  x = [1.,np.nan]
  np.nan in x
 True
  np.nan in np.array(x)
 False
  np.nan in np.array(x).tolist()
 False
  np.nan is float(np.nan)
 True

 Thank you,
 Alan Isaac






-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Alan G Isaac
  On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote:
  Might someone explain this to me?
 
   x = [1.,np.nan]
   np.nan in x
  True
   np.nan in np.array(x)
  False
   np.nan in np.array(x).tolist()
  False
   np.nan is float(np.nan)
  True


On 9/19/2008 1:15 PM Lisandro Dalcin apparently wrote:
 I do not remember right now the implementations of comparisons in core
 Python, but I believe the 'in' operator is testing first for object
 identity, and then 'np.nan in [np.nan]' then returns True, and then
 the fact that 'np.nan==np.nan' returns False is never considered.


Sure.  All evaluations to True make sense to me.
I am asking about the ones that evaluate to False.
Thanks,
Alan



Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Christopher Barker
Alan G Isaac wrote:
 Might someone explain this to me?
 
   x = [1.,np.nan]
   np.nan in x
  True
   np.nan in np.array(x)
  False
   np.nan in np.array(x).tolist()
  False
   np.nan is float(np.nan)
  True

not quite -- but I do know that ``is`` is tricky -- it tests object 
identity. I think it actually compares the pointer to the object. What 
makes this tricky is that python interns some objects, so that when you 
create two that have the same value, they may actually be the same object:

>>> s1 = "this"
>>> s2 = "this"
>>> s1 is s2
True

So short strings are interned, as are small integers and maybe floats? 
However, longer strings are not:

>>> s1 = "A much longer string"
>>> s2 = "A much longer string"
>>> s1 is s2
False

I don't know the interning rules, but I do know that you should never 
count on them; they may not be consistent between implementations, or 
even different runs.

NaN is a floating point number with a specific value. np.nan is 
particular instance of that, but not all nans will be the same instance:

>>> np.array(0.0) / 0
nan
>>> np.array(0.0) / 0 is np.nan
False

So you can't use is to check.

>>> np.array(0.0) / 0 == np.nan
False

and you can't use ==

The only way to do it reliably is:

>>> np.isnan(np.array(0.0) / 0)
True


So, the short answer is that the only way to deal with NaNs properly is 
to have NaN-aware functions, like nanmin() and friends.


Regardless of how many nan* functions get written, or what exactly they 
do, we really do need to make sure that no numpy function gives bogus 
results in the presence of NaNs, which doesn't appear to be the case now.

I also think I see a consensus building that non-nan-specific numpy 
functions should either preserve NaN's or raise exceptions, rather than 
ignoring them.

-Chris








-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 11:34, Alan G Isaac [EMAIL PROTECTED] wrote:
 On 9/19/2008 12:02 PM Peter Saffrey apparently wrote:
  a = array([1,2,nan])
  nan in a
 False

 Huh.  I'm inclined to call this a bug,
 since normal Python behavior is that
 ``in`` should check for identity::

 xl = [1.,np.nan]
 np.nan in xl
True

Except that there are no objects inside non-object arrays. There is
nothing with identity inside the arrays to compare against.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco


Re: [Numpy-discussion] Understanding mgrid

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 12:59, Brad Malone [EMAIL PROTECTED] wrote:
 Hi, I was wondering if someone could enlighten me on what the geometrical
 significance of numpy.mgrid is. I can play around with it and see trends in
 the sizes and number of arrays, but why does it give the output that it
 does? Looking at the example shown below, why does it return a matrix and
 its transpose?

Well, it returns one array. In your example, there is a (2,5,5) array,
which is basically the concatenation of two arrays which *happen* to
be transposes of each other. If you had chosen differently sized axes,
they wouldn't be transposes.

In [14]: mgrid[0:2,0:3]
Out[14]:
array([[[0, 0, 0],
[1, 1, 1]],

   [[0, 1, 2],
[0, 1, 2]]])

 Is this a representation of some geometrical grid?

It can be. There are other uses for it.

 Does the
 output imply some sort of connectivity?

It describes an orthogonal grid.

 If so, how do you see it?

>>> mgrid[0:5,0:5]
array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])



 I have a cubic grid in 3D space that is spanned by 3 orthogonal vectors. Am
 I able to generate this equivalent grid with mgrid somehow? If so, how is it
 done? I am using mayavi and I need to be able to construct some arrays in
 the same way that mgrid would have constructed them, so this is why I ask.

I would probably use indices() instead of mgrid if you are just given
the x, y, and z vectors. indices([n,m,k]) is equivalent to
mgrid[0:n,0:m,0:k]:


In [19]: x = linspace(0, 1, 3)

In [20]: x
Out[20]: array([ 0. ,  0.5,  1. ])

In [21]: y = linspace(1, 2.5, 4)

In [22]: y
Out[22]: array([ 1. ,  1.5,  2. ,  2.5])

In [23]: z = linspace(3, 5, 5)

In [24]: z
Out[24]: array([ 3. ,  3.5,  4. ,  4.5,  5. ])

In [25]: ix, iy, iz = indices([len(x), len(y), len(z)])

In [26]: x[ix]
Out[26]:
array([[[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ]],

   [[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ]]])

In [27]: y[iy]
Out[27]:
array([[[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]]])

In [28]: z[iz]
Out[28]:
array([[[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]],

   [[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]],

   [[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]]])
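
The indices/mgrid equivalence stated above can be checked directly
(a small sketch):

```python
import numpy as np

n, m, k = 2, 3, 4
# indices((n, m, k)) and mgrid[0:n, 0:m, 0:k] build the same coordinate arrays.
assert (np.indices((n, m, k)) == np.mgrid[0:n, 0:m, 0:k]).all()
print(np.indices((n, m, k)).shape)  # (3, 2, 3, 4): one coordinate array per axis
```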

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Andrew Dalke
On Sep 19, 2008, at 7:52 PM, Christopher Barker wrote:
 I don't know the interning rules, but I do know that you should never
 count on them, then may not be consistent between implementations, or
 even different runs.

There are a few things that Python-the-language guarantees are singleton
objects which can be compared correctly with is.  Those are:

   True, False, None

Otherwise there is no guarantee that two objects of a given type
which are equal in some sense of the word, are actually the same  
object.

As Chris pointed out, the C implementation does (as a performance
matter) have additional singletons.  For example, the integers from
-5 up to (but not including) 257 are also singletons:


#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS   257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS   5
#endif
/* References to small integers are saved in this array so that they
can be shared.
The integers that are saved are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];


This used to be -1 to 100 but some testing showed it was better
to extend the range somewhat.

There was also some performance testing about special-casing 0.0
and +/- 1.0 but I think it showed the results weren't worthwhile.
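
A sketch of that cache boundary (a CPython implementation detail, not a
language guarantee; int() calls are used to sidestep compile-time constant
folding):

```python
# CPython caches small integers; larger values get a fresh object per call.
small_a = int('5')
small_b = int('5')
big_a = int('300')
big_b = int('300')

print(small_a is small_b)  # True  (cached singleton)
print(big_a is big_b)      # False (two distinct objects)
```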


So, back to NaN.  There's no guarantee NaN is a singleton
object, so testing with ``is`` is almost certainly wrong.
In fact, at the bit-level there are multiple NaNs.  A
single-precision NaN (according to Wikipedia) fits the following bit
pattern:

   NaN: x111 1111 1axx xxxx xxxx xxxx xxxx xxxx. x = undefined. If a = 1,
   it is a quiet NaN, otherwise it is a signalling NaN.

So  0111 1111 1100 0000 0000 0000 0000 0000
and 1111 1111 1110 0000 0000 0000 0000 0000
and 0111 1111 1100 0000 0000 0000 0000 0001

are all NaN values.



Andrew
[EMAIL PROTECTED]




Re: [Numpy-discussion] profiling line by line

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 07:00, Robert Cimrman [EMAIL PROTECTED] wrote:
 Robert Kern wrote:
 Ah, found it. T_LONGLONG is a #define from structmember.h which is
 used to describe the types of attributes. Apparently, this was not
 added until Python 2.5. That particular member didn't actually need to
 be long long, so I've fixed that.

 Great, I will try it after it appears on the web page.

Oops! It's now pushed.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco


Re: [Numpy-discussion] Understanding mgrid

2008-09-19 Thread Brad Malone
Thanks for the response Robert.

So, at least in this case, the results of mgrid (or indices) only provide
information about the spacing of the grid and not about the absolute values
of the point coordinates?

In your example, is there a way to see within your x[ix], y[iy], and z[iz]
matrices the same collection of points that you would see if you did
something like the following?

points = []
x = linspace(0, 1, 3)
y = linspace(1, 2.5, 4)
z = linspace(3, 5, 5)
for k in z.tolist():
    for j in y.tolist():
        for i in x.tolist():
            point = array([i, j, k])
            points.append(point)
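
As a sketch, the same point list can be built without the explicit loops,
along the lines of Robert's indices() suggestion (the axis order is chosen
so that x varies fastest, matching the innermost loop):

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(1, 2.5, 4)
z = np.linspace(3, 5, 5)

# One index grid per axis; z is the slowest-varying axis, x the fastest.
kz, ky, kx = np.indices((len(z), len(y), len(x)))
grid = np.column_stack([x[kx].ravel(), y[ky].ravel(), z[kz].ravel()])
print(grid.shape)  # (60, 3): one (x, y, z) row per point
```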

Thanks,
Brad

On Fri, Sep 19, 2008 at 11:22 AM, Robert Kern [EMAIL PROTECTED] wrote:

 On Fri, Sep 19, 2008 at 12:59, Brad Malone [EMAIL PROTECTED] wrote:
  Hi, I was wondering if someone could englighten me on what the
 geometrical
  significance of numpy.mgrid is. I can play around with it and see trends
 in
  the sizes and number of arrays, but why does it give the output that it
  does? Looking at the example shown below, why does it return a matrix and
  its transpose?

 Well, it returns one array. In your example, there is a (2,5,5) array,
 which is basically the concatenation of two arrays which *happen* to
 be transposes of each other. If you had chosen differently sized axes,
 they wouldn't be transposes.

 In [14]: mgrid[0:2,0:3]
 Out[14]:
 array([[[0, 0, 0],
[1, 1, 1]],

   [[0, 1, 2],
[0, 1, 2]]])

  Is this a representation of some geometrical grid?

 It can be. There are other uses for it.

  Does the
  output imply some sort of connectivity?

 It describes an orthogonal grid.

   If so, how do you see it?

  >>> mgrid[0:5,0:5]
  array([[[0, 0, 0, 0, 0],
          [1, 1, 1, 1, 1],
          [2, 2, 2, 2, 2],
          [3, 3, 3, 3, 3],
          [4, 4, 4, 4, 4]],
  <BLANKLINE>
         [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]])

  I have a cubic grid in 3D space that is spanned by 3 orthogonal vectors.
 Am
  I able to generate this equivalent grid with mgrid somehow? If so, how is
 it
  done? I am using mayavi and I need to be able to construct some arrays in
  the same way that mgrid would have constructed them, so this is why I
 ask.

 I would probably use indices() instead of mgrid if you are just given
 the x, y, and z vectors. indices([n,m,k]) is equivalent to
 mgrid[0:n,0:m,0:k]:


 In [19]: x = linspace(0, 1, 3)

 In [20]: x
 Out[20]: array([ 0. ,  0.5,  1. ])

 In [21]: y = linspace(1, 2.5, 4)

 In [22]: y
 Out[22]: array([ 1. ,  1.5,  2. ,  2.5])

 In [23]: z = linspace(3, 5, 5)

 In [24]: z
 Out[24]: array([ 3. ,  3.5,  4. ,  4.5,  5. ])

 In [25]: ix, iy, iz = indices([len(x), len(y), len(z)])

 In [26]: x[ix]
 Out[26]:
 array([[[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ]],

   [[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5],
[ 0.5,  0.5,  0.5,  0.5,  0.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1. ,  1. ,  1. ,  1. ,  1. ]]])

 In [27]: y[iy]
 Out[27]:
 array([[[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]],

   [[ 1. ,  1. ,  1. ,  1. ,  1. ],
[ 1.5,  1.5,  1.5,  1.5,  1.5],
[ 2. ,  2. ,  2. ,  2. ,  2. ],
[ 2.5,  2.5,  2.5,  2.5,  2.5]]])

 In [28]: z[iz]
 Out[28]:
 array([[[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]],

   [[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]],

   [[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ],
[ 3. ,  3.5,  4. ,  4.5,  5. ]]])

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Understanding mgrid

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 14:13, Brad Malone [EMAIL PROTECTED] wrote:
 Thanks for the response Robert.

 So, at least in this case, the results of mgrid (or indices) only provide
 information about the spacing of the grid and not about the absolute values of
 the point coordinates?

No, they give indices. You can use those indices in a variety of ways.
In my example, I used them to index into vectors which gave the
absolute positions of the grid lines. That turned into bricks giving
the absolute coordinates for each 3D grid point.

 In your example, is there a way to see within your x[ix], y[iy], and z[iz]
 matrices the same collection of points that you would see if you did
 something like the following?

 points=[]
 x=linspace(0,1,3)
 y=linspace(1,2.5,4)
 z=linspace(3,5,5)
 for k in z.tolist():
 for j in y.tolist():
  for i in x.tolist():
   point=array([i,j,k])
   points.append(point)

points = column_stack([x[ix].flat, y[iy].flat, z[iz].flat])
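To spell that out (a sketch, not part of the original message): the flattened,
column-stacked grids reproduce the same points as an explicit triple loop,
provided the loop nests in the same order as C-ordered .flat iteration
(outermost x, innermost z):

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(1, 2.5, 4)
z = np.linspace(3, 5, 5)

# indices([n, m, k]) is equivalent to mgrid[0:n, 0:m, 0:k]
ix, iy, iz = np.indices([len(x), len(y), len(z)])

# One (x, y, z) row per grid point, with z varying fastest (C order)
points = np.column_stack([x[ix].flat, y[iy].flat, z[iz].flat])

# The explicit triple loop produces the same rows when it nests in the
# same order: x outermost, z innermost
loop_points = np.array([[i, j, k] for i in x for j in y for k in z])
assert np.allclose(points, loop_points)
assert points.shape == (3 * 4 * 5, 3)
```

Note that the loop in the question nests the other way around (x innermost), so
it visits the same set of points but in a different order.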

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Christian Heimes
Andrew Dalke wrote:
 There are a few things that Python-the-language guarantees are singleton
 objects which can be compared correctly with is.  Those are:
 
True, False, None

The empty tuple () and all interned strings are also guaranteed to be 
singletons. String interning is used to optimize code at the C level. It's 
much faster to compare memory addresses than objects. All strings can be 
interned through the builtin function intern, like s = intern(s). For 
Python 3.x the function was moved into the sys module and changed to 
support str objects, which are PyUnicode objects.


  So, back to NaN.  There's no guarantee NaN is a singleton
  object, so testing with is is almost certainly wrong.
  In fact, at the bit-level there are multiple NaNs.  A
  NaN (according to Wikipedia) fits the following bit pattern:

     NaN: x111 1111 1axx xxxx xxxx xxxx xxxx xxxx
     x = undefined. If a = 1, it is a quiet NaN, otherwise it is a
     signalling NaN.

The definition is correct for all doubles on IEEE 754 aware platforms. 
Python's float type uses the double C type. Almost all modern computers 
have either hardware IEEE 754 support or software support for embedded 
devices (some mobile phones and PDAs). 
http://en.wikipedia.org/wiki/IEEE_754-1985

The Python core makes no difference between quiet NaNs and signaling 
NaNs. Only errno, input and output values are checked to raise an 
exception. We were discussing the possibility of a NaN singleton during 
our revamp of Python's IEEE 754 and math support for Python 2.6 and 3.0. 
But we decided against it because the extra code and cost wasn't worth 
the risks. Instead I added isnan() and isinf() to the math module.

All checks for NaN, inf and the sign bit of a float must be made through 
the appropriate APIs - either the NumPy API or the new APIs for floats.
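For illustration (a sketch, not part of the original message), the portable
checks look like this; math.isnan/math.isinf are the new Python 2.6 additions
mentioned above:

```python
import math

x = float("nan")

# NaN never compares equal to anything, including itself, so neither
# `is` nor `==` is a reliable NaN test.
assert x != x

# The appropriate APIs for checking special float values:
assert math.isnan(x)
assert math.isinf(float("inf"))
assert not math.isnan(1.0)
```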

Hope this sheds some light on things,
Christian

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 11:46 AM Pierre GM apparently wrote:
 a.mask=True

This is great, but is apparently
new behavior as of NumPy 1.2?
Alan Isaac


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 16:28:34 Alan G Isaac wrote:
 On 9/19/2008 11:46 AM Pierre GM apparently wrote:
  a.mask=True

 This is great, but is apparently
 new behavior as of NumPy 1.2?

I'm not sure, sorry. Another way is 
ma.array(np.empty(yourshape,yourdtype), mask=True)
which should work with earlier versions.
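To illustrate the idea (a sketch, not part of the original message): start
fully masked, fill in only the known values, and masked-aware reductions such
as ma.median ignore the rest:

```python
import numpy as np
import numpy.ma as ma

# A fully masked array of the desired shape and dtype
a = ma.array(np.empty((2, 3), dtype=float), mask=True)
assert a.mask.all()

# Assigning to elements unmasks them; the rest stay masked and are
# ignored by masked-aware reductions
a[0, 0] = 1.0
a[0, 1] = 3.0
a[0, 2] = 2.0

# Median over the three unmasked values [1.0, 3.0, 2.0]
assert float(ma.median(a)) == 2.0
```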
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Alan G Isaac
On 9/19/2008 4:54 PM Pierre GM apparently wrote:
 Another way is 
 ma.array(np.empty(yourshape,yourdtype), mask=True)
 which should work with earlier versions.

Seems like ``mask`` would be a natural
keyword for ``ma.empty``?

Thanks,
Alan Isaac

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 17:25:53 Alan G Isaac wrote:
 On 9/19/2008 4:54 PM Pierre GM apparently wrote:
  Another way is
  ma.array(np.empty(yourshape,yourdtype), mask=True)
  which should work with earlier versions.

 Seems like ``mask`` would be a natural
 keyword for ``ma.empty``?

Not a bad idea. I'll plug that in.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] newb question

2008-09-19 Thread paul taney
Hi,

What am I doing wrong here?  The reshape doesn't take.

% cat test1.py
import numpy as np

a = np.uint8([39, 39, 231,  239, 39, 231,  39, 39, 231, 
  39, 39, 231,  239, 39, 231,  39, 39, 231,
  39, 39, 231,  239, 39, 231,  39, 39, 231,
  39, 39, 231,  239, 39, 231,  39, 39, 231,])
a.reshape(3, 4, 3)
print "a = %r" % (a)
% 
% python test1.py
a = array([ 39, 39, 231, 239, 39, 231, 39, 39, 231,  
39, 39, 231, 239, 39, 231, 39, 39, 231,  
39, 39, 231, 239, 39, 231, 39, 39, 231,  
39, 39, 231, 239, 39, 231, 39, 39, 231], dtype=uint8)



I am expecting:

a = array([[[39, 39, 231],  [239, 39, 231],  [39, 39, 231]], 
   [[39, 39, 231],  [239, 39, 231],  [39, 39, 231]],
   [[39, 39, 231],  [239, 39, 231],  [39, 39, 231]],
   [[39, 39, 231],  [239, 39, 231],  [39, 39, 231]]], \
dtype=np.uint8)


paul



def vanderWalt(a, f):
    """thanks Stefan"""
    RED, GRN, BLU = 0, 1, 2
    bluemask = (a[...,BLU] > f*a[...,GRN]) & \
               (a[...,BLU] > f*a[...,RED])
    return np.array(bluemask.nonzero()).swapaxes(0,1)


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] newb question

2008-09-19 Thread Eric Firing
paul taney wrote:
 Hi,
 
 What am I doing wrong here?  The reshape doesnt take.

Reshape does not act in place, it returns either a new view or a copy.

To reshape in place, you can assign to the shape attribute:

In [13]:a = np.arange(10)

In [14]:a.shape = (2,5)

In [15]:a
Out[15]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])

Eric

 
 % cat test1.py
 import numpy as np
 
 a = np.uint8([39, 39, 231,  239, 39, 231,  39, 39, 231, 
   39, 39, 231,  239, 39, 231,  39, 39, 231,
   39, 39, 231,  239, 39, 231,  39, 39, 231,
   39, 39, 231,  239, 39, 231,  39, 39, 231,])
 a.reshape(3, 4, 3)
 print "a = %r" % (a)
 % 
 % python test1.py
 a = array([ 39, 39, 231, 239, 39, 231, 39, 39, 231,  
 39, 39, 231, 239, 39, 231, 39, 39, 231,  
 39, 39, 231, 239, 39, 231, 39, 39, 231,  
 39, 39, 231, 239, 39, 231, 39, 39, 231], dtype=uint8)
 
 
 
 I am expecting:
 
 a = array([[[39, 39, 231],  [239, 39, 231],  [39, 39, 231]], 
[[39, 39, 231],  [239, 39, 231],  [39, 39, 231]],
[[39, 39, 231],  [239, 39, 231],  [39, 39, 231]],
[[39, 39, 231],  [239, 39, 231],  [39, 39, 231]]], \
 dtype=np.uint8)
 
 
 paul
 
 
 
 def vanderWalt(a, f):
     """thanks Stefan"""
     RED, GRN, BLU = 0, 1, 2
     bluemask = (a[...,BLU] > f*a[...,GRN]) & \
                (a[...,BLU] > f*a[...,RED])
     return np.array(bluemask.nonzero()).swapaxes(0,1)
 
 
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] newb question

2008-09-19 Thread Pierre GM
On Friday 19 September 2008 20:47:12 paul taney wrote:
 Hi,

 What am I doing wrong here?  The reshape doesnt take.

help(reshape)
a.reshape(shape, order='C')

Returns an array containing the data of a, but with a new shape.

Refer to `numpy.reshape` for full documentation.

You see that you're not modifying in place.

Instead, you should use 
a.shape = (3,4,3)

Play with the tuple to find what you want -- (4,3,3) seems to meet your 
expectations.
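A short sketch of the difference (not part of the original message): reshape
returns a new array or view and leaves the original untouched, while assigning
to .shape modifies the array in place:

```python
import numpy as np

a = np.arange(36, dtype=np.uint8)

# reshape returns a new view; the original array is unchanged
b = a.reshape(4, 3, 3)
assert a.shape == (36,)
assert b.shape == (4, 3, 3)

# assigning to .shape reshapes in place
a.shape = (4, 3, 3)
assert a.shape == (4, 3, 3)
```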


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Andrew Dalke
On Sep 19, 2008, at 10:04 PM, Christian Heimes wrote:
 Andrew Dalke wrote:
 There are a few things that Python-the-language guarantees are  
 singleton
 objects which can be compared correctly with is.

 The empty tuple () and all interned strings are also guaranteed to be
 singletons.

Where's the guarantee?  As far as I know it's not part of
Python-the-language, and I thought it was only an implementation
detail of CPython.

tupleobject.c says:

PyTuple_Fini(void)
{
#if PyTuple_MAXSAVESIZE > 0
 /* empty tuples are used all over the place and applications may
  * rely on the fact that an empty tuple is a singleton. */
 Py_XDECREF(free_list[0]);
 free_list[0] = NULL;

 (void)PyTuple_ClearFreeList();
#endif
}

but that doesn't hold under Jython 2.2a1:


Jython 2.2a1 on java1.4.2_16 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> () is ()
0
>>> 1 is 1
1



 String interning is used to optimize code at the C level. It's
 much faster to compare memory addresses than objects. All strings
 can be interned through the builtin function intern like s = intern(s). For
 Python 3.x the function was moved into the sys module and changed to
 support str objects, which are PyUnicode objects.

intern being listed in the documentation under
 http://docs.python.org/lib/non-essential-built-in-funcs.html

 2.2 Non-essential Built-in Functions

 There are several built-in functions that are no longer
 essential to learn, know or use in modern Python programming.
 They have been kept here to maintain backwards compatibility
 with programs written for older versions of Python.



Again, I think this is only an aspect of the CPython implementation.



 The Python core makes no difference between quiet NaNs and signaling
 NaNs.

Based on my limited readings just now, it seems that that's the general
consensus:

   http://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm
   Standard C only adopted Quiet NaNs. It did not adopt Signaling
   NaNs because it was believed that they are of too limited
   utility for the amount of work required.

   http://www.digitalmars.com/d/archives/digitalmars/D/ 
signaling_NaNs_and_quiet_NaNs_75844.html
   Signaling NaNs have fallen out of favor. No exceptions get raised  
for them.

   http://en.wikipedia.org/wiki/NaN
   There were questions about if signalling NaNs should continue  
to be
   required in the revised standard. In the end it appears they will
   be left in.



 We were discussing the possibility of a NaN singleton during
 our revamp of Python's IEEE 754 and math support for Python 2.6 and  
 3.0.
 But we decided against it because the extra code and cost wasn't worth
 the risks. Instead I added isnan() and isinf() to the math module.

I couldn't find that thread.  What are the advantages of converting
all NaNs to a singleton?  All I can come up with are disadvantages.

BTW, another place to look is the Decimal module

  >>> import decimal
  >>> decimal.Decimal("nan")
  Decimal("NaN")

Looking at the decimal docs now I see a canonical() method which

The result has the same value as the operand but always
uses a canonical encoding. The definition of canonical
is implementation-defined; if more than one internal
encoding for a given NaN, Infinity, or finite number
is possible then one ‘preferred’ encoding is deemed
canonical. This operation then returns the value using
that preferred encoding.



Andrew
[EMAIL PROTECTED]


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Stéfan van der Walt wrote:

 Why shouldn't we have nanmin-like behaviour for the C min itself?
   

Ah, I was not arguing we should not do it in C, but rather that we did not
have to do it in C. The current behavior for nan with functions relying on
ordering is broken; if someone prefers fixing it in C, great. But I was
guessing more people could fix it using python, that's all.

I opened a bug for min/max and nan, this should be fixed for 1.3.0,
maybe 1.2.1 too.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Robert Kern
On Fri, Sep 19, 2008 at 22:25, David Cournapeau
[EMAIL PROTECTED] wrote:
 Stéfan van der Walt wrote:

 Why shouldn't we have nanmin-like behaviour for the C min itself?


 Ah, I was not arguing we should not do it in C, but rather we did not
 have to do in C. The current behavior for nan with functions relying on
 ordering is broken; if someone prefer fixing it in C, great. But I was
 guessing more people could fix it using python, that's all.

How, exactly? ndarray.min() is where the implementation is.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Robert Kern wrote:
 On Fri, Sep 19, 2008 at 22:25, David Cournapeau
 [EMAIL PROTECTED] wrote:
   

 How, exactly? ndarray.min() is where the implementation is.
   

Ah, I keep forgetting those are implemented in the array object, sorry
for that. Now I understand Stefan point. Do I understand correctly that
we should then do:
- implement a NaN-aware min/max for every float type (real and
complex) in umathmodule.c, which ignores NaN (nanmin-style variants
for each type, etc...)
- fix the current min/max to propagate NaN instead of giving broken
results
- How to do the dispatching ? Having PyArray_Min and PyArray_NanMin
sounds the easiest (we don't change any C api, only add an argument to
the python-callable function min, in array_min method ?)

Or am I missing something ? If this is the right way to fix it I am
willing to do it (we still have to agree on the default behavior first).
I am not really familiar with sort module, but maybe it is really
similar to min/max case.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Anne Archibald
2008/9/19 David Cournapeau [EMAIL PROTECTED]:

 I guess my formulation was poor: I never use NaN as missing values
 because I never use missing values, which is why I wanted the opinion of
 people who use NaN in a different manner (because I don't have a good
 idea on how those people would like to see numpy behave). I was
 certainly not arguing they should not be use for the purpose of missing
 value.

I, on the other hand, was making specifically that suggestion: users
should not use nans to indicate missing values. Users should use
masked arrays to indicate missing values.

 The problem with NaN is that you cannot mix the missing value behavior
 and the error behavior. Dealing with them in a consistent manner is
 difficult. Because numpy is a general numerical computation tool, I
 think that NaN should be propagated and never ignored *by default*. If
 you have NaN because of divide by 0, etc... it should not be ignored at
 all. But if you want it to ignore, then numpy should make it possible:

- max, min: should return NaN if NaN is in the array, or maybe even
 fail ?
- argmax, argmin ?
- sort: should fail ?
- mean, std, variance: should return NaN
- median: should fail (to be consistent if sort fails) ? Should
 return NaN ?

This part I pretty much agree with.

 We could then add an argument to failing functions to tell them either
 to ignore NaN/put them at some special location (like R does, for
 example). The ones I am not sure are median and argmax/argmin. For
 median, failing when sort does is consistent; but this can break a lot
 of code. For argmin/argmax, failing is the most logical, but OTOH,
 making argmin/argmax failing and not max/min is not consistent either.
 Breaking the code is maybe not that bad because currently, neither
 max/min nor argmax/argmin nor sort returns a meaningful result.
 Does that sound reasonable to you ?

The problem with this approach is that all those decisions need to be
made and all that code needs to be implemented for masked arrays. In
fact I suspect that it has already been done in that case. So really
what you are suggesting here is that we duplicate all this effort to
implement the same functions for nans as we have for masked arrays.
It's important, too, that the masked array implementation and the nan
implementation behave the same way, or users will become badly
confused. Who gets the task of keeping the two implementations in
sync?

The current situation is that numpy has two ways to indicate bad data
for floating-point arrays: nans and masked arrays. We can't get rid of
either: nans appear on their own, and masked arrays are the only way
to mark bad data in non-floating-point arrays. We can try to make them
behave the same, which will be a lot of work to provide redundant
capabilities. Or we can make them behave drastically differently.
Masked arrays clearly need to be able to handle masked values flexibly
and explicitly. So I think nans should be handled simply and
conservatively: propagate them if possible, raise if not.
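The conservative policy can already be approximated at the Python level with
numpy's floating-point error state (a sketch, not part of the original
message; note this catches NaNs as they are *produced* by arithmetic, not
NaNs already present in an array):

```python
import numpy as np

# Make invalid operations (those that would produce NaN) raise
# FloatingPointError instead of silently propagating NaN
old_state = np.seterr(invalid="raise")
try:
    np.zeros(3) / np.zeros(3)   # 0/0 is an invalid operation
except FloatingPointError:
    print("caught invalid operation instead of propagating NaN")
finally:
    np.seterr(**old_state)      # restore the previous error state
```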

If users are concerned about performance, it's worth noting that on
some machines nans force a fallback to software floating-point
handling, with a corresponding very large performance hit. This
includes some but not all x86 (and I think x86-64) CPUs. How this
compares to the performance of masked arrays is not clear.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NEW GENERATED DLL ERROR FOUND WITHIN f2PY.py

2008-09-19 Thread Blubaugh, David A.
To All,
 
 
I have now been able to generate a .pyd file from a FORTRAN file that I am 
trying to interface with python.  I was able to execute this with an additional 
insight into how f2py operates.  It seems as though the documentation requires 
an upgrade, since there appears to be missing information that might misdirect 
a f2py newcomer, such as myself.  However, I am now facing the following new 
error:
 
ImportError: DLL load failed with error code 193
 
The python script is as follows:
 
import hello

print hello.__doc__

print hello.foo.__doc__

hello.foo(4) 

 

The Fortran code is as follows:

! -*- f90 -*-

subroutine foo(a)

integer a

print*, "Hello from Fortran!"

print*, "a=", a

end
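For reference, the build step that produces such an extension module
(hello.pyd on Windows) typically looks something like the following -- a
sketch only, assuming f2py and a MinGW Fortran compiler are on the PATH; the
exact compiler options depend on the local setup:

```shell
# Compile the Fortran source into a Python extension module named "hello"
# (hypothetical invocation; adjust --fcompiler/--compiler to your toolchain)
f2py -c -m hello --compiler=mingw32 hello.f90

# Quick smoke test of the freshly built module
python -c "import hello; hello.foo(4)"
```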

 
I was wondering as to what I should now try in order to finally produce a 
python script sending and receiving information from a FORTRAN .pyd file.
 
 
Any Suggestions???
 
Do I have to recompile Python with mingw32 in order to finally resolve this 
issue??  
 
 
 
Thanks,
 
 
David Blubaugh
 
 
  

This e-mail transmission contains information that is confidential and may be 
privileged. It is intended only for the addressee(s) named above. If you 
receive 
this e-mail in error, please do not read, copy or disseminate it in any manner. 
If you are not the intended recipient, any disclosure, copying, distribution or 
use of the contents of this information is prohibited. Please reply to the 
message immediately by informing the sender that the message was misdirected. 
After replying, please erase it from your computer system. Your assistance in 
correcting this error is appreciated.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread David Cournapeau
Anne Archibald wrote:

 I, on the other hand, was making specifically that suggestion: users
 should not use nans to indicate missing values. Users should use
 masked arrays to indicate missing values.

I agree it is the nicest solution in theory, but I think it is
impractical (as mentioned by Eric Firing in his email).


 This part I pretty much agree with.

I can't really see which one is better (failing or returning NaN for
sort/min/max and their arg* counterparts), or if we should let the choice
be left to the user. I am fine with both, and they both require the same
amount of work.

  Or we can make them behave drastically differently.
 Masked arrays clearly need to be able to handle masked values flexibly
 and explicitly. So I think nans should be handled simply and
 conservatively: propagate them if possible, raise if not.

I agree about this behavior being the default. I just think that for a
couple of functions, we could give either separate functions, or
additional arguments to existing functions to ignore them: I am thinking
about min/max/sort and their arg* counterparts, because those are really
basic, and because we already have nanmean/nanstd/nanmedian (e.g. having
a nansort would help for nanmean to be much faster).
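A Python-level sketch of such an ignore-NaN reduction (the helper name
nanmin_py is made up for illustration and is not numpy API; a C-level version
would avoid the temporary copy):

```python
import numpy as np

def nanmin_py(a):
    """Minimum ignoring NaNs -- a pure-Python sketch of what a
    NaN-aware C-level min could do (hypothetical helper)."""
    a = np.asarray(a, dtype=float).ravel()
    good = a[~np.isnan(a)]          # drop the NaN entries
    if good.size == 0:
        return np.nan               # all-NaN input: nothing else to return
    return good.min()

assert nanmin_py([3.0, np.nan, 1.0, 2.0]) == 1.0
assert np.isnan(nanmin_py([np.nan, np.nan]))
```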


 If users are concerned about performance, it's worth noting that on
 some machines nans force a fallback to software floating-point
 handling, with a corresponding very large performance hit.

I was more concerned with the cost of handling NaN when you do not have
NaN in your array, since you then have to test for NaN explicitly
(in everything involving comparison). But I don't see any obvious way to
avoid that cost,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion