Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
> Anne Archibald wrote:
>> That was in amax/amin. Pretty much every other function that does
>> comparisons needs to be fixed to work with nans. In some cases it's
>> not even clear how: where should a sort put the nans in an array?
>
> The problem is more in how the functions use sort than in sort itself,
> in the case of median. There can't be a 'good' way to put nan in sort,
> for example, since nans cannot be ordered.

Well, for example, you might ask that all the non-nan elements be in order, even if you don't specify where the nans go.

> I don't know about the best strategy: either we fix every function
> using comparison, handling nan as a special case as you mentioned, or
> there may be a more clever thing to do to avoid special-casing
> everywhere. I don't have a clear idea of how many functions rely on
> ordering in numpy.

You can always just set numpy to raise an exception whenever it comes across a nan. In fact, apart from the difficulty of correctly frobbing numpy's floating-point handling, how reasonable is it for (say) median to just run as it is now, but if an exception is thrown, fall back to a nan-aware version?

Anne
___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
> Well, for example, you might ask that all the non-nan elements be in
> order, even if you don't specify where the nans go.

Ah, there are two problems, then:
 - sort
 - how median uses sort.

For sort, I don't know how sort speed would be influenced by treating nan. In a way, calling sort with nan inside is a user error (if you take the POV that nan is not comparable), but nan is used for all kinds of purposes, hence maybe having a nansort would be nice. OTOH (I took a look at this when I fixed nanmean and co a while ago in scipy), matlab and R treat sort differently than mean and co. I am puzzled by this:
 - R sorts arrays with nan as you want by default (nan can be ignored, put in front or at the end of the array).
 - R max does not ignore nan by default.
 - R median does not ignore nan by default.

I don't know how to get consistency here. I don't think we are consistent by having max/amax/etc... ignore nan but sort not ignore it. OTOH, R is not consistent either.

> You can always just set numpy to raise an exception whenever it comes
> across a nan. In fact, apart from the difficulty of correctly frobbing
> numpy's floating-point handling, how reasonable is it for (say) median
> to just run as it is now, but if an exception is thrown, fall back to
> a nan-aware version?

It would differ from the current nan vs. usual-function behavior for median/mean/etc...: why should sort handle nan by default, but not the other functions? For mean/std/variance/median, if having nan is an error, you see it in the result (once we fix our median), but not with sort. Hm, I am always puzzled when I think about nan handling :) It always seems there is no good answer.

David
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 03:11:05 David Cournapeau wrote:
> Hm, I am always puzzled when I think about nan handling :) It always
> seems there is no good answer.

Which is why we have masked arrays, of course ;)
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 Pierre GM [EMAIL PROTECTED]:
> On Friday 19 September 2008 03:11:05 David Cournapeau wrote:
>> Hm, I am always puzzled when I think about nan handling :) It always
>> seems there is no good answer.
>
> Which is why we have masked arrays, of course ;)

I think the numpy attitude to nans should be that they are unexpected bogus values that signify that something went wrong with the calculation somewhere. They can be left in place for most operations, but any operation that depends on the value should (ideally) return nan, or failing that, raise an exception. (If users want exceptions all the time, that's what seterr is for.) If people want to flag bad data, let's tell them to use masked arrays.

So by this rule amax/maximum/mean/median should all return nan when there's a nan in their input; I don't think it's reasonable for sort to return an array full of nans, so I think its default behaviour should be to raise an exception if there's a nan. It's valuable (for example in median) to be able to sort them all to the end, but I don't think this should be the default.

If people want nanmin, I would be tempted to tell them to use masked arrays (is there a convenience function that makes a masked array with a mask everywhere the data is nan?). I am assuming that appropriate masked sort/amax/maximum/mean/median exist already. They're definitely needed, so how much effort is it worth putting in to duplicate that functionality with nans instead of masked elements?

Anne
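[There is such a convenience function; a minimal sketch of the masked-array route described above, using np.ma.masked_invalid (which masks NaNs and infs):]

```python
import numpy as np

data = np.array([0.0, np.nan, -1.0, 2.0])
m = np.ma.masked_invalid(data)  # mask everywhere the data is nan (or inf)

# masked reductions skip the masked entry
print(m.min(), m.max(), np.ma.median(m))
```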
Re: [Numpy-discussion] Suggestion for recarray.view
2008/9/19 Travis E. Oliphant [EMAIL PROTECTED]:
> #---
> def view(self, dtype=None, type=None):
>     if dtype is None:
>         return ndarray.view(self, type)
>     elif type is None:
>         try:
>             if issubclass(dtype, ndarray):
>                 return ndarray.view(self, dtype)
>         except TypeError:
>             pass
>         dtype = sb.dtype(dtype)
>         if dtype.fields is None:
>             return self.__array__().view(dtype)
>         return ndarray.view(self, dtype)
>     else:
>         return ndarray.view(self, dtype, type)
> #---

This looks pretty good to me. +1 for adding it.

+1, and another +1 to your karma for requesting peer review. Let me know if you need me to whip up a couple of tests for verifying the different usage cases.

Cheers
Stéfan
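[A sketch of the kind of tests offered here, exercising the three branches of the method above (type-only, dtype-only, and both) via recarray.view; the field names are made up for illustration:]

```python
import numpy as np

r = np.array([(1, 2.0)], dtype=[('x', int), ('y', float)]).view(np.recarray)

# type-only argument: drop back to a plain ndarray
a = r.view(np.ndarray)
assert type(a) is np.ndarray

# dtype-only argument with fields: stays a record array, fields renamed
r2 = r.view([('a', int), ('b', float)])
assert r2.a[0] == 1 and r2.b[0] == 2.0

# dtype and type together
v = r.view(np.dtype([('a', int), ('b', float)]), np.ndarray)
assert v.dtype.names == ('a', 'b')
```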
Re: [Numpy-discussion] profiling line by line
Robert Kern wrote:
> On Thu, Sep 18, 2008 at 06:01, Robert Cimrman [EMAIL PROTECTED] wrote:
>> Hi Robert,
>> Robert Kern wrote:
>>> On Mon, Sep 15, 2008 at 11:13, Arnar Flatberg [EMAIL PROTECTED] wrote:
>>>> That would make me an extremely happy user, I've been looking for
>>>> this for years! I can't imagine I'm the only one who profiles some
>>>> hundred lines of code and ends up with 90% of total time in the
>>>> dot-function
>>> For the time being, you can grab it here:
>>> http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
>>> It requires Cython and a C compiler to build. I'm still debating
>>> myself about the desired workflow for using it, but for now, it only
>>> profiles functions which you have registered with it. I have made the
>>> profiler work as a decorator to make this easy. E.g.,
>> many thanks for this! I have wanted to try out the profiler but failed
>> to build it (changeset 6 0de294aa75bf):
>>
>> $ python setup.py install --root=/home/share/software/
>> running install
>> running build
>> running build_py
>> creating build
>> creating build/lib.linux-i686-2.4
>> copying line_profiler.py -> build/lib.linux-i686-2.4
>> running build_ext
>> cythoning _line_profiler.pyx to _line_profiler.c
>> building '_line_profiler' extension
>> creating build/temp.linux-i686-2.4
>> i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
>> -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c
>> _line_profiler.c -o build/temp.linux-i686-2.4/_line_profiler.o
>> _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
>> function)
>> error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1
>>
>> I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
> It uses the #define'd macro PY_LONG_LONG. Go through your Python
> headers to see what this gets expanded to.

I have Python 2.4.4

in pyconfig.h:
#define HAVE_LONG_LONG 1

in pyport.h:
#ifdef HAVE_LONG_LONG
#ifndef PY_LONG_LONG
#define PY_LONG_LONG long long
#endif
#endif /* HAVE_LONG_LONG */

so it seems compatible with 'ctypedef long long PY_LONG_LONG'

r.
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 Anne Archibald [EMAIL PROTECTED]: I think the numpy attitude to nans should be that they are unexpected bogus values that signify that something went wrong with the calculation somewhere. They can be left in place for most operations, but any operation that depends on the value should (ideally) return nan, or failing that, raise an exception. I agree completely. I am assuming that appropriate masked sort/amax/maximum/mean/median exist already. They're definitely needed, so how much effort is it worth putting in to duplicate that functionality with nans instead of masked elements? Unfortunately, this needs to happen at the C level. Is anyone reading this willing to spend some time taking care of the issue? It's an important one. Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] profiling line by line
Ondrej Certik wrote:
> On Thu, Sep 18, 2008 at 4:12 PM, Ryan May [EMAIL PROTECTED] wrote:
>> Ondrej Certik wrote:
>>> [...]
>>> I am telling you all the time Robert to use Debian that it just works
>>> and you say, no no, gentoo is the best. :)
>> And what's wrong with that? :) Once you get over the learning curve,
>> Gentoo works just fine. Must be Robert K.'s fault. :)
> Well, I think if Robert C. hasn't yet gotten over the learning curve
> after so many years of hard work, maybe the learning curve is too
> steep. :)

This is most probably not related to Gentoo at all, and certainly not related to me knowing Gentoo or not :) (and no, learning Gentoo is not that hard.)

r.
Re: [Numpy-discussion] Suggestion for recarray.view
On Friday 19 September 2008 04:13:39 Stéfan van der Walt wrote:
> +1 and another +1 to your karma for requesting peer review. Let me know
> if you need me to whip up a couple of tests for verifying the different
> usage cases.

That'd be lovely. I'm a bit swamped with tricky issues in mrecords and dependents...
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote:
> I agree completely.

Me too, but I am extremely biased toward "nan is always bogus" by my own usage of numpy/scipy (I never use NaN as a missing value, and nan is always caused by divide by 0 and co). I like the idea that sort raises an exception by default with NaN: it breaks the API; OTOH, I can't see a good use of sort with NaN, since sort does not sort values in that case: we would break the API of a broken function.

> Unfortunately, this needs to happen at the C level.

Why?

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 04:31:38 David Cournapeau wrote:
> Pierre GM wrote:
>> That said, numpy.nanmin, numpy.nansum... don't come with the heavy
>> machinery of numpy.ma, and are therefore faster. I'm really going to
>> have to learn C.
> FWIW, nanmean/nanmedian/etc... are written in python,

I know. I was more dreading the time when MaskedArrays would have to be ported to C. In a way, that would probably simplify a few issues. OTOH, I don't really see it happening any time soon.
Re: [Numpy-discussion] Generating random samples without repeats
Robert Kern robert.kern at gmail.com writes:
> On Thu, Sep 18, 2008 at 16:55, Paul Moore pf_moore at yahoo.co.uk wrote:
>> I want to generate a series of random samples, to do simulations based
>> on them. Essentially, I want to be able to produce a SAMPLESIZE * N
>> matrix, where each row of N values consists of either
>> 1. Integers between 1 and M (simulating N rolls of an M-sided die), or
>> 2. A sample of N numbers between 1 and M without repeats (simulating
>> deals of N cards from an M-card deck).
>> Example (1) is easy: numpy.random.random_integers(1, M, (SAMPLESIZE, N))
>> But I can't find an obvious equivalent for (2). Am I missing something
>> glaringly obvious? I'm using numpy - is there maybe something in scipy
>> I should be looking at?
>
> numpy.array([(numpy.random.permutation(M) + 1)[:N] for i in
> range(SAMPLESIZE)])

Thanks. And yet, this takes over 70s and peaks at around 400M memory use, whereas the equivalent for (1), numpy.random.random_integers(1, M, (SAMPLESIZE, N)), takes less than half a second and negligible working memory (both end up allocating an array of the same size, but your suggestion consumes temporary working memory). I suspect, but can't prove, that the time taken comes from memory allocations rather than computation.

As a one-off cost initialising my data, it's not a disaster, but I anticipate using idioms like this later in my calculations as well, where the costs could hurt more. If I'm going to need to write C code, are there any good examples of this? (I guess the source for numpy.random is a good place to start.)

Paul
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
> Stéfan van der Walt wrote:
>> I agree completely.
> Me too, but I am extremely biased toward "nan is always bogus" by my
> own usage of numpy/scipy (I never use NaN as a missing value, and nan
> is always caused by divide by 0 and co).

So am I. In all my use cases, NaNs indicate trouble.

> Why?

Because we have x.max() silently ignoring NaNs, which causes a lot of head-scratching, swearing and failed experiments.

Cheers
Stéfan
Re: [Numpy-discussion] Generating random samples without repeats
On Friday 19 September 2008 05:08:20 Paul Moore wrote:
> Robert Kern robert.kern at gmail.com writes:
>> On Thu, Sep 18, 2008 at 16:55, Paul Moore pf_moore at yahoo.co.uk wrote:
>>> I want to generate a series of random samples, to do simulations
>>> based on them. Essentially, I want to be able to produce a
>>> SAMPLESIZE * N matrix, where each row of N values consists of either
>>> 2. A sample of N numbers between 1 and M without repeats (simulating
>>> deals of N cards from an M-card deck).

Have you considered numpy.random.shuffle? (Note that shuffle works in place and returns None.)

a = np.arange(1, M+1)
np.random.shuffle(a)
result = a[:N]
Re: [Numpy-discussion] profiling line by line
On Fri, Sep 19, 2008 at 03:33, Robert Cimrman [EMAIL PROTECTED] wrote:
> I have Python 2.4.4
> in pyconfig.h:
> #define HAVE_LONG_LONG 1
> in pyport.h:
> #ifdef HAVE_LONG_LONG
> #ifndef PY_LONG_LONG
> #define PY_LONG_LONG long long
> #endif
> #endif /* HAVE_LONG_LONG */
> so it seems compatible with 'ctypedef long long PY_LONG_LONG'

Ah, found it. T_LONGLONG is a #define from structmember.h which is used to describe the types of attributes. Apparently, this was not added until Python 2.5. That particular member didn't actually need to be long long, so I've fixed that.

-- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] profiling line by line
On Fri, Sep 19, 2008 at 10:37 AM, Robert Cimrman [EMAIL PROTECTED] wrote:
> Ondrej Certik wrote:
>> [...]
>> Well, I think if Robert C. hasn't yet gotten over the learning curve
>> after so many years of hard work, maybe the learning curve is too
>> steep. :)
> This is most probably not related to Gentoo at all, and certainly not
> related to me knowing Gentoo or not :) (and no, learning Gentoo is not
> that hard.)

Let us know where the problem was. :) I am just using common sense: if something works on Debian and macosx and doesn't work on gentoo, I thought it was safe to say it was gentoo related, but I may well be wrong. :))

Ondrej
Re: [Numpy-discussion] profiling line by line
On Wed, Sep 17, 2008 at 18:29, Robert Kern [EMAIL PROTECTED] wrote:
> On Wed, Sep 17, 2008 at 18:09, Ondrej Certik [EMAIL PROTECTED] wrote:
>> This is what I am getting:
>>
>> $ ./kernprof.py -l pystone.py
>> Wrote profile results to pystone.py.lprof
>> $ ./view_line_prof.py pystone.py.lprof
>> Timer unit: 1e-06 s
>> $
>>
>> So I think you meant:
>>
>> $ ./kernprof.py -l mystone.py
>> 20628
>> Wrote profile results to mystone.py.lprof
>> $ ./view_line_prof.py mystone.py.lprof
>> Timer unit: 1e-06 s
>> File: pystone.py
>> Function: Proc0 at line 79
>> Total time: 13.0803 s
>> [...]
>>
>> Now it works.
> No, I meant pystone.py. My script-finding code may have (incorrectly)
> found a different, uninstrumented pystone.py file somewhere else,
> though. Try with ./pystone.py.

There was a bug in how I was constructing the munged namespaces. Fixed now.

-- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote:
> So am I. In all my use cases, NaNs indicate trouble.

Yes, so I would like to have the opinion of people with other usage than ours.

> Because we have x.max() silently ignoring NaNs, which causes a lot of
> head-scratching, swearing and failed experiments.

But cannot this be fixed at the python level of the max function? I think it is expected for the low-level C functions to ignore/be bogus if you have NaN. After all, if you use sort of the libc with nan, or sort in C++ for a vector of double, it will not work either. But on my numpy, it looks like nan breaks min/max; they are not ignored:

np.min(np.array([0, np.nan, 1])) -> 1.0 # bogus
np.min(np.array([0, np.nan, 2])) -> 2.0 # ok
np.min(np.array([0, np.nan, -1])) -> -1.0 # ok
np.max(np.array([0, np.nan, -1])) -> -1.0 # bogus

Which only makes sense when you guess how they are implemented in C...

cheers, David
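[Until the default is settled, a caller can at least get a deterministic answer by filtering NaNs out explicitly before reducing — a minimal sketch:]

```python
import numpy as np

a = np.array([0.0, np.nan, -1.0])
finite = a[~np.isnan(a)]  # drop the NaNs before reducing

assert np.min(finite) == -1.0
assert np.max(finite) == 0.0
```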
Re: [Numpy-discussion] Generating random samples without repeats
2008/9/19 Paul Moore [EMAIL PROTECTED]:
> [...]
> And yet, this takes over 70s and peaks at around 400M memory use,
> whereas the equivalent for (1) takes less than half a second and
> negligible working memory. If I'm going to need to write C code, are
> there any good examples of this? (I guess the source for numpy.random
> is a good place to start.)

This was discussed on one of the mailing lists several months ago. It turns out that there is no simple way to efficiently choose without replacement in numpy/scipy. I posted a hack that does this somewhat efficiently (if SAMPLESIZE > M/2, choose the first SAMPLESIZE of a permutation; if SAMPLESIZE < M/2, choose with replacement and redraw any duplicates), but it's not vectorized across many sample sets.

Is your problem large M or large N? What is SAMPLESIZE/M?

Anne
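[One way to vectorize the deal across all sample sets at once (a sketch, not the hack mentioned above): draw a matrix of uniforms and argsort each row; the first N columns of each row are then a sample of 1..M without repeats. It costs O(SAMPLESIZE * M log M), so it suits moderate M:]

```python
import numpy as np

M, N, SAMPLESIZE = 52, 5, 1000  # e.g. 1000 deals of 5 cards from a 52-card deck

# argsort of i.i.d. uniforms gives an independent random permutation per row
samples = np.random.rand(SAMPLESIZE, M).argsort(axis=1)[:, :N] + 1

assert samples.shape == (SAMPLESIZE, N)
assert all(len(set(row)) == N for row in samples)  # no repeats within a row
assert samples.min() >= 1 and samples.max() <= M
```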
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
> You can use nanmean (from scipy.stats):

I rejoiced when I saw this answer, because it looks like a function I can just drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower than just using lists (ignoring nan values, from my experiments) and a home-brew implementation of median. I was mostly using numpy for speed...

I would like to try the masked array approach, but the Ubuntu packages for scipy and matplotlib depend on numpy. Does anybody know whether I can naively do sudo python setup.py install on a more modern numpy without disturbing scipy and matplotlib, or do I need to uninstall all three packages and install them manually from source?

On my 64-bit machine, the Ubuntu numpy package is even more out of date:

$ dpkg -l | grep numpy
ii python-numpy 1:1.0.4-6ubuntu3

Does anybody know why this is? I might be willing to help bring the repository up to date, if anybody can give me pointers on how to do this.

Peter
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 05:51:55 Peter Saffrey wrote:
> I would like to try the masked array approach, but the Ubuntu packages
> for scipy and matplotlib depend on numpy. Does anybody know whether I
> can naively do sudo python setup.py install on a more modern numpy
> without disturbing scipy and matplotlib, or do I need to uninstall all
> three packages and install them manually from source?

I think there were some changes on the C side of numpy between 1.0 and 1.1; you may have to recompile scipy and matplotlib from sources. What versions are you using for those 2 packages?
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
> It may be that nanmedian is slow. But I would sincerely be surprised if
> it were slower than python lists, except for some pathological cases,
> or maybe a bug in nanmedian. What do your data look like? (size, number
> of nan, etc...)

I've posted my test code below, which gives me the results:

$ ./arrayspeed3.py
list build time: 0.01
list median time: 0.01
array nanmedian time: 0.36

I must have done something wrong to hobble nanmedian in this way... I'm quite new to numpy, so feel free to point out any obviously egregious errors.

Peter

===

from numpy import array, nan, inf
from pylab import rand
from time import clock
from scipy.stats.stats import nanmedian
import pdb
_pdb = pdb.Pdb()
breakpoint = _pdb.set_trace

def my_median(vallist):
    num_vals = len(vallist)
    vallist.sort()
    if num_vals % 2 == 1: # odd
        index = (num_vals - 1) / 2
        return vallist[index]
    else: # even
        index = num_vals / 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 100
pointlen = 3

t0 = clock()
natests = rand(numtests, testsize, pointlen)
# have to start with inf because list.remove(nan) doesn't remove nan
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests == inf] = nan
for test in tests:
    for point in test:
        if inf in point:
            point.remove(inf)
t1 = clock()
print "list build time:", t1 - t0

t0 = clock()
allmedians = []
for test in tests:
    medians = [my_median(x) for x in test]
    allmedians.append(medians)
t1 = clock()
print "list median time:", t1 - t0

t0 = clock()
namedians = []
for natest in natests:
    thismed = nanmedian(natest, axis=1)
    namedians.append(thismed)
t1 = clock()
print "array nanmedian time:", t1 - t0
Re: [Numpy-discussion] Medians that ignore values
Pierre GM pgmdevlist at gmail.com writes:
> I think there were some changes on the C side of numpy between 1.0 and
> 1.1, you may have to recompile scipy and matplotlib from sources. What
> versions are you using for those 2 packages ?

$ dpkg -l | grep scipy
ii python-scipy 0.6.0-8ubuntu1 scientific tools for Python
$ dpkg -l | grep matplotlib
ii python-matplotlib 0.91.2-0ubuntu1 Python based plotting system in a style simi
ii python-matplotlib-data 0.91.2-0ubuntu1 Python based plotting system (data package)
ii python-matplotlib-doc 0.91.2-0ubuntu1 Python based plotting system (documentation

Peter
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote:
> [...]

If you build numpy from sources, please don't install it into /usr! It will more than likely break everything which depends on numpy, as well as your debian installation (because you will overwrite packages handled by dpkg). You should really install it in a local directory, outside /usr. You will have to install scipy and matplotlib in any case, too.

cheers, David
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote:
> I've posted my test code below, which gives me the results:
>
> $ ./arrayspeed3.py
> list build time: 0.01
> list median time: 0.01
> array nanmedian time: 0.36
>
> I must have done something wrong to hobble nanmedian in this way... I'm
> quite new to numpy, so feel free to point out any obviously egregious
> errors.

Ok: it is pathological, and can be done better :) First:

for natest in natests:
    thismed = nanmedian(natest, axis=1)
    namedians.append(thismed)

Here, you are doing nanmedian along a direction with 3 elements: this will be slow in numpy, because numpy involves some relatively heavy machinery to run on arrays. The machinery pays off for 'big' arrays, but for really small arrays like here, lists can be (and often are) faster.

Still, it is indeed really slow for your case; when I fixed nanmean and co, I did not know much about numpy, I just wanted them to give the right answer :) I think this can be made faster, specially for your case (where the axis along which the median is computed is really small). I opened a bug: http://scipy.org/scipy/scipy/ticket/740

cheers, David
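[One way to avoid both the per-row Python loop and the small-axis penalty is a single vectorized sort over the whole array, pushing NaNs past the real values — a rough sketch (the helper name is made up, and it assumes at least one non-NaN per row):]

```python
import numpy as np

def nanmedian_rows(a):
    """NaN-ignoring median of each row of a 2-D array (a sketch)."""
    a = np.asarray(a, dtype=float)
    bad = np.isnan(a)
    s = np.sort(np.where(bad, np.inf, a), axis=1)  # NaNs sort last as +inf
    n = (~bad).sum(axis=1)                         # valid count per row
    rows = np.arange(a.shape[0])
    lo, hi = (n - 1) // 2, n // 2                  # middle one or two entries
    return 0.5 * (s[rows, lo] + s[rows, hi])

vals = nanmedian_rows([[1.0, 2.0, 3.0],
                       [1.0, np.nan, 3.0],
                       [np.nan, 5.0, np.nan]])
assert vals.tolist() == [2.0, 2.0, 5.0]
```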
Re: [Numpy-discussion] profiling line by line
Robert Kern wrote:
> Ah, found it. T_LONGLONG is a #define from structmember.h which is used
> to describe the types of attributes. Apparently, this was not added
> until Python 2.5. That particular member didn't actually need to be
> long long, so I've fixed that.

Great, I will try it after it appears on the web page. Thank you,

r.
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau [EMAIL PROTECTED]:
> But cannot this be fixed at the python level of the max function?

Why shouldn't we have nanmin-like behaviour for the C min itself? I'd rather have a specialised function to deal with the rare kinds of datasets where NaNs are guaranteed never to occur.

> But on my numpy, it looks like nan breaks min/max; they are not
> ignored:

Yes, that's the problem.

Cheers
Stéfan
Re: [Numpy-discussion] which one is best?
2008/9/19 mark [EMAIL PROTECTED]:
> I need to multiply items in a list and need a list back. Which one of
> the four options is best (I thought in Python there was only one way to
> do something???)

With the emphasis on preferably and obvious :)

    There should be one-- and preferably only one --obvious way to do it.

The modern idiom is the list comprehension, rather than the for-loop. Of those options, I personally prefer using zip:

[ x * y for x,y in zip(a,b) ] # method 4
[10, 40, 90, 160]

If you have very large arrays, you can also consider (np.array(x) * np.array(y)).tolist()

Cheers
Stéfan
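[For concreteness, with assumed inputs a = [1, 2, 3, 4] and b = [10, 20, 30, 40] (chosen to match the output shown above), the list-comprehension and numpy idioms agree:]

```python
import numpy as np

a, b = [1, 2, 3, 4], [10, 20, 30, 40]  # assumed inputs matching the thread

by_zip = [x * y for x, y in zip(a, b)]
by_numpy = (np.array(a) * np.array(b)).tolist()

assert by_zip == [10, 40, 90, 160]
assert by_numpy == by_zip
```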
Re: [Numpy-discussion] which one is best?
On Fri, Sep 19, 2008 at 3:09 PM, Stéfan van der Walt [EMAIL PROTECTED] wrote: 2008/9/19 mark [EMAIL PROTECTED]: I need to multiply items in a list and need a list back. Which one of the four options is best (I thought in Python there was only one way to do something???) With the emphasis on preferably and obvious :) There should be one-- and preferably only one --obvious way to do it. The modern idiom is the list comprehension, rather than the for-loop. Of those options, I personally prefer using zip. [ x * y for x,y in zip(a,b) ] # method 4 [10, 40, 90, 160] If you have very large arrays, you can also consider (np.array(x) * np.array(y)).tolist() Cheers Stéfan I think [x*y for x in a for y in b] feels Pythonic; however, it has a surprisingly lousy performance. In [30]: %timeit [ x * y for x,y in zip(a,b) ] 10 loops, best of 3: 3.96 µs per loop In [31]: %timeit [ i*j for i in a for j in b ] 10 loops, best of 3: 6.53 µs per loop In [32]: a = range(100) In [33]: b = range(100) In [34]: %timeit [ x * y for x,y in zip(a,b) ] 1 loops, best of 3: 51.9 µs per loop In [35]: %timeit [ i*j for i in a for j in b ] 100 loops, best of 3: 2.78 ms per loop Arnar
Re: [Numpy-discussion] which one is best?
On Fri, Sep 19, 2008 at 4:09 PM, lorenzo [EMAIL PROTECTED] wrote: On Fri, Sep 19, 2008 at 2:50 PM, Arnar Flatberg [EMAIL PROTECTED] wrote: I think [x*y for x in a for y in b] feels Pythonic; however, it has a surprisingly lousy performance. This returns a len(x)*len(y) long list, which is not what you want. My bad. It's Friday afternoon, I'll go home now :-) Arnar
Re: [Numpy-discussion] which one is best?
Hi Arnar, Your two commands below aren't doing the same thing - one is doing a[i]*b[i] and the other is doing a[i]*b[j] for all i and j. As the second is harder, it takes longer. Cheers, David On Fri, 2008-09-19 at 09:08 -0500, [EMAIL PROTECTED] wrote: I think [x*y for x in a for y in b] feels pythonic, however it has a surprisingly lousy performance. In [30]: %timeit [ x * y for x,y in zip(a,b) ] 10 loops, best of 3: 3.96 µs per loop In [31]: %timeit [ i*j for i in a for j in b ] 10 loops, best of 3: 6.53 µs per loop In [32]: a = range(100) In [33]: b = range(100) In [34]: %timeit [ x * y for x,y in zip(a,b) ] 1 loops, best of 3: 51.9 µs per loop In [35]: %timeit [ i*j for i in a for j in b ] 100 loops, best of 3: 2.78 ms per loop Arnar -- ** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html **
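David's point in runnable form, with small made-up lists: the zip comprehension computes len(a) products, while the nested comprehension computes len(a)*len(b):

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

pairwise = [x * y for x, y in zip(a, b)]   # a[i]*b[i]: 4 products
cartesian = [x * y for x in a for y in b]  # a[i]*b[j]: 16 products

print(pairwise)        # [10, 40, 90, 160]
print(len(cartesian))  # 16
```

The timing gap above is therefore expected: the second form simply does quadratically more work.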
Re: [Numpy-discussion] Generating random samples without repeats
Rick White rlw at stsci.edu writes: It seems like numpy.random.permutation is pretty suboptimal in its speed. Here's a Python 1-liner that does the same thing (I think) but is a lot faster: a = 1+numpy.random.rand(M).argsort()[0:N-1] This still has the problem that it generates a size N array to start with. But at least it is fast compared with permutation: Interesting. For my generation of a million samples, this takes about 46 sec vs the original 75. That's a 35% increase in speed. As you mention, it doesn't help memory, which still peaks at around 450M. Interestingly, I was reminded of J (http://www.jsoftware.com/), an APL derivative, which does this in a blistering 1.3 seconds, with no detectable memory overhead. Of course, being descended from APL, the code to do this is pretty obscure: 5 ? (100 $ 52) (Here, ? is the deal operator, and $ reshapes an array - so it's deal 5 from each item in a 100-long array of 52's. Everything is a primitive here, so it's not hard to see why it's fast). A Python/Numpy - J bridge might be a fun exercise... Paul.
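A sketch of Rick's argsort trick (note that a slice of [:N] yields N samples; the quoted [0:N-1] yields one fewer). The M and N values below just mirror the card-deck example:

```python
import numpy as np

# Draw N distinct values from 1..M by ranking M uniform deviates:
# argsort of random data is a random permutation of 0..M-1.
M, N = 52, 5
sample = 1 + np.random.rand(M).argsort()[:N]
print(len(sample) == len(set(sample.tolist())))  # True: no repeats
```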
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote: Masked arrays. Using NaN's for missing values is dangerous. You may do some operation, which generates invalid results, and then you have a mixed bag of missing and invalid values. That rather evades my full question, I think? In the case I mentioned, I am filling an array inside a loop, and the possible fill values are not constrained. So I cannot mask based on value, and I cannot mask based on position (at least until after the computations are complete). It seems to me that there are pragmatic reasons why people work with NaNs for missing values, that perhaps should not be dismissed so quickly. But maybe I am overlooking a simple solution. Alan PS I confess I do not understand NaNs. E.g., why could there not be a value np.miss that would be a NaN that represents a missing value? Are all NaNs already assigned standard meanings?
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 11:36:17 Alan G Isaac wrote: On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote: Masked arrays. Using NaN's for missing values is dangerous. You may do some operation, which generates invalid results, and then you have a mixed bag of missing and invalid values. That rather evades my full question, I think? In the case I mentioned, I am filling an array inside a loop, and the possible fill values are not constrained. So I cannot mask based on value, and I cannot mask based on position (at least until after the computations are complete). No, but you may do the opposite: just start with an array completely masked, and unmask it as you need: Say you have a 4x5 array, and want to unmask (0,0), (1,2), (3,4) a = ma.empty((4,5), dtype=float) a.mask=True a[0,0] = 0 a[1,2]=1 a[3,4]=3 a masked_array(data = [[0.0 -- -- -- --] [-- -- 1.0 -- --] [-- -- -- -- --] [-- -- -- -- 3.0]], mask = [[False True True True True] [ True True False True True] [ True True True True True] [ True True True True False]], fill_value=1e+20) a.max(axis=0) masked_array(data = [0.0 -- 1.0 -- 3.0], mask = [False True False True False], fill_value=1e+20) It seems to me that there are pragmatic reasons why people work with NaNs for missing values, that perhaps shd not be dismissed so quickly. But maybe I am overlooking a simple solution. nansomething solutions tend to be considerably faster; that might be one reason. A lack of visibility of numpy.ma could be a second. In any case, I can't but agree with other posters: a NaN in an array usually means something went astray. PS I confess I do not understand NaNs. E.g., why could there not be a value np.miss that would be a NaN that represents a missing value? You can't compare NaNs to anything. How do you know this np.miss is a masked value, when np.sqrt(-1.) is NaN ?
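Pierre's start-fully-masked recipe as a self-contained sketch; ma.masked_all is a convenience constructor equivalent to creating the array and setting mask=True:

```python
import numpy.ma as ma

# Begin with every cell masked, then unmask cells by assignment.
a = ma.masked_all((4, 5), dtype=float)
a[0, 0] = 0.0
a[1, 2] = 1.0
a[3, 4] = 3.0
print(a.max(axis=0))  # [0.0 -- 1.0 -- 3.0]
print(a.count())      # 3 unmasked cells
```

Reductions like max then skip the masked cells automatically, with no NaN semantics involved.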
[Numpy-discussion] New patch for new mgrid / ogrid functionality
Hi all, Attached is a newer version of my patch that adds new mgrid / ogrid functionality for working with arrays in addition to slices. In fact, I have attached two versions of the patch: index_tricks.patch, that is just the last version of the patch I sent, and index_tricks.new.patch, that has been modified so that it is backward compatible. In the last version, mgrid calls where all arguments are slices will return an array, otherwise it returns a list as ogrid does. This is the only reasonable way to have the new functionality and maintain backwards compatibility. My 2 cents - I personally think the version that always returns a list will ultimately be more transparent and cause fewer problems than the newer version. In either case, the plan should be to eventually have it always return a list, as that is the only fully consistent option; the question is just when that switch should be made and by whom. If it is done at the next major release, someone else will have to remember to ax the additional code and correct the documentation. Other changes that would be nice: add a __call__ method, create an instance called ndgrid for matlab compatibility, and have meshgrid be reimplemented using an nd_grid instance. Cheers, David -- ** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 
171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html ** Index: numpy/lib/tests/test_index_tricks.py === --- numpy/lib/tests/test_index_tricks.py (revision 5834) +++ numpy/lib/tests/test_index_tricks.py (working copy) @@ -24,15 +24,21 @@ def test_nd(self): c = mgrid[-1:1:10j,-2:2:10j] d = mgrid[-1:1:0.1,-2:2:0.2] -assert(c.shape == (2,10,10)) -assert(d.shape == (2,20,20)) +assert(array(c).shape == (2,10,10)) +assert(array(d).shape == (2,20,20)) assert_array_equal(c[0][0,:],-ones(10,'d')) assert_array_equal(c[1][:,0],-2*ones(10,'d')) assert_array_almost_equal(c[0][-1,:],ones(10,'d'),11) assert_array_almost_equal(c[1][:,-1],2*ones(10,'d'),11) -assert_array_almost_equal(d[0,1,:]-d[0,0,:], 0.1*ones(20,'d'),11) -assert_array_almost_equal(d[1,:,1]-d[1,:,0], 0.2*ones(20,'d'),11) +assert_array_almost_equal(d[0][1,:]-d[0][0,:], 0.1*ones(20,'d'),11) +assert_array_almost_equal(d[1][:,1]-d[1][:,0], 0.2*ones(20,'d'),11) +def test_listargs(self): +e = mgrid[ :2, ['a', 'b', 'c'], [1,5,50,500] ] +assert( array(e).shape == (3,2,3,4) ) +assert_array_equal( e[0][:,1,1].ravel(), r_[:2] ) +assert_array_equal( e[1][1,:,1].ravel(), array(['a','b','c']) ) +assert_array_equal( e[2][1,1,:].ravel(), array([1,5,50,500]) ) class TestConcatenator(TestCase): def test_1d(self): Index: numpy/lib/index_tricks.py === --- numpy/lib/index_tricks.py (revision 5834) +++ numpy/lib/index_tricks.py (working copy) @@ -11,7 +11,7 @@ from numpy.core.numerictypes import find_common_type import math -import function_base +import function_base, shape_base import numpy.core.defmatrix as matrix makemat = matrix.matrix @@ -118,14 +118,28 @@ number of points to create between the start and stop values, where the stop value **is inclusive**. +One can also use lists or arrays as indexing arguments, in which case +these will be meshed out themselves instead of generating matrices from +the slice arguments. See examples below. 
+ If instantiated with an argument of sparse=True, the mesh-grid is open (or not fleshed out) so that only one-dimension of each returned argument is greater than 1 +***IMPORTANT NOTE*** Indexing an nd_grid instance with +sparse=False will currently return an array N+1 axis array if all +arguments are slices (i.e., something like -4:5:20j or :20:0.5) +and there are N arguments. However, if any of the arguments is +not a slice (e.g., is an array or list), then the return is a list +of arrays. This is to maintain backwards compatibility. However, +this functionality will disappear during the next major release +(after today's date: 2008-09-19) so that returns will always be +lists in the future. + Examples mgrid = np.lib.index_tricks.nd_grid() - mgrid[0:5,0:5] + mgrid[0:5,0:5] # NOTE currently returns array, but will become a list array([[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], @@ -139,6 +153,27 @@ [0, 1, 2, 3, 4]]]) mgrid[-1:1:5j] array([-1. , -0.5, 0. , 0.5, 1. ]) + mgrid[:2,[1,5,50],['a','b']] # Example
Re: [Numpy-discussion] Medians that ignore values
Alan G Isaac aisaac at american.edu writes: Recently I needed to fill a 2d array with values from computations that could go wrong. I created an array of NaN and then replaced the elements where the computation produced a useful value. I then applied ``nanmax`` to get the maximum of the useful values. I'm glad you posted this, because this is exactly the method I'm using. How do you detect whether there are still any missing spots in your array? nan has some rather unfortunate properties: from numpy import * a = array([1,2,nan]) nan in a False nan == nan False Should I take the earlier advice and switch to masked arrays? Peter
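To answer Peter's question directly: membership tests are the wrong tool here, and np.isnan is the reliable check for remaining missing spots (a short sketch):

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])
print(np.nan in a)        # False: unreliable, since nan != nan
print(np.isnan(a).any())  # True: at least one missing spot remains
print(np.isnan(a).sum())  # 1: how many
```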
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 12:02:08 Peter Saffrey wrote: Alan G Isaac aisaac at american.edu writes: Recently I needed to fill a 2d array with values from computations that could go wrong. Should I take the earlier advice and switch to masked arrays? Peter Yes. As you've noticed, you can't compare nans (after all, nans are not numbers...), which limits their use.
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote: You can't compare NaNs to anything. How do you know this np.miss is a masked value, when np.sqrt(-1.) is NaN ? I thought you could use ``is``. E.g., np.nan == np.nan False np.nan is np.nan True Alan
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote: No, but you may do the opposite: just start with an array completely masked, and unmask it as you need: Very useful example. I did not understand this possibility. Alan
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 1:11 AM, David Cournapeau [EMAIL PROTECTED] wrote: Anne Archibald wrote: Well, for example, you might ask that all the non-nan elements be in order, even if you don't specify where the nan goes. Ah, there are two problems, then: - sort - how median use sort. For sort, I don't know how sort speed would be influenced by treating nan. In a way, calling sort with nan inside is a user error (if you take the POV nan are not comparable), but nan are used for all kind of purpose, used - misused. Using nan to flag anything but a numerical error is going to cause problems. It wouldn't be too hard to implement nansorts; they just need a real comparison function so that all the nans end up at one end or the other. I don't know that that would make medians any easier, though. Are the nans part of the data set? A nansearchsorted would probably be needed also. If this functionality is added, the best way might be something like kind='nanquicksort'. Chuck
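Later NumPy releases settled on essentially the convention Anne asked for: sort treats NaN as larger than everything, so the non-NaN elements come out ordered and the NaNs collect at the end (a sketch against a modern NumPy):

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, 2.0])
s = np.sort(a)
print(s)  # [ 1.  2.  3. nan]: NaNs sort to the end
print(np.all(np.diff(s[:-1]) >= 0))  # the non-NaN prefix is ordered
```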
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 12:02 PM Peter Saffrey apparently wrote: a = array([1,2,nan]) nan in a False Huh. I'm inclined to call this a bug, since normal Python behavior is that ``in`` should check for identity:: xl = [1.,np.nan] np.nan in xl True Alan
[Numpy-discussion] np.nan and ``is``
Might someone explain this to me? x = [1.,np.nan] np.nan in x True np.nan in np.array(x) False np.nan in np.array(x).tolist() False np.nan is float(np.nan) True Thank you, Alan Isaac
Re: [Numpy-discussion] np.nan and ``is``
You know, floats are immutable objects, and 'float(f)' just returns a new reference to 'f' if 'f' is (exactly) of type 'float'. In [1]: f = 1.234 In [2]: f is float(f) Out[2]: True I do not remember right now the implementation of comparisons in core Python, but I believe the 'in' operator tests first for object identity, so 'np.nan in [np.nan]' returns True and the fact that 'np.nan==np.nan' returns False is never considered. On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote: Might someone explain this to me? x = [1.,np.nan] np.nan in x True np.nan in np.array(x) False np.nan in np.array(x).tolist() False np.nan is float(np.nan) True Thank you, Alan Isaac -- Lisandro Dalcín --- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594
Re: [Numpy-discussion] np.nan and ``is``
On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote: Might someone explain this to me? x = [1.,np.nan] np.nan in x True np.nan in np.array(x) False np.nan in np.array(x).tolist() False np.nan is float(np.nan) True On 9/19/2008 1:15 PM Lisandro Dalcin apparently wrote: I do not remember right now the implementations of comparisons in core Python, but I believe the 'in' operator is testing first for object identity, and then 'np.nan in [np.nan]' then returns True, and then the fact that 'np.nan==np.nan' returns False is never considered. Sure. All evaluations to True make sense to me. I am asking about the ones that evaluate to False. Thanks, Alan
Re: [Numpy-discussion] np.nan and ``is``
Alan G Isaac wrote: Might someone explain this to me? x = [1.,np.nan] np.nan in x True np.nan in np.array(x) False np.nan in np.array(x).tolist() False np.nan is float(np.nan) True not quite -- but I do know that ``is`` is tricky -- it tests object identity. I think it actually compares the pointer to the object. What makes this tricky is that python interns some objects, so that when you create two that have the same value, they may actually be the same object: s1 = 'this' s2 = 'this' s1 is s2 True So short strings are interned, as are small integers and maybe floats? However, longer strings are not: s1 = 'A much longer string' s2 = 'A much longer string' s1 is s2 False I don't know the interning rules, but I do know that you should never count on them; they may not be consistent between implementations, or even different runs. NaN is a floating point number with a specific value. np.nan is a particular instance of that, but not all nans will be the same instance: np.array(0.0) / 0 nan np.array(0.0) / 0 is np.nan False So you can't use is to check. np.array(0.0) / 0 == np.nan False and you can't use == The only way to do it reliably is: np.isnan(np.array(0.0) / 0) True So, the short answer is that the only way to deal with NaNs properly is to have NaN-aware functions, like nanmin() and friends. Regardless of how many nan* functions get written, or what exactly they do, we really do need to make sure that no numpy function gives bogus results in the presence of NaNs, which doesn't appear to be the case now. I also think I see a consensus building that non-nan-specific numpy functions should either preserve NaN's or raise exceptions, rather than ignoring them. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED]
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 11:34, Alan G Isaac [EMAIL PROTECTED] wrote: On 9/19/2008 12:02 PM Peter Saffrey apparently wrote: a = array([1,2,nan]) nan in a False Huh. I'm inclined to call this a bug, since normal Python behavior is that ``in`` should check for identity:: xl = [1.,np.nan] np.nan in xl True Except that there are no objects inside non-object arrays. There is nothing with identity inside the arrays to compare against. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
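A sketch making Robert's point concrete: list membership falls back to identity, while ndarray membership reduces an element-wise ==, and nan != nan:

```python
import numpy as np

x = [1.0, np.nan]
print(np.nan in x)                    # True: list `in` tries identity first
print(np.nan in np.array(x))          # False: reduces to (arr == nan).any()
print((np.array(x) == np.nan).any())  # False: the same comparison
```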
Re: [Numpy-discussion] Understanding mgrid
On Fri, Sep 19, 2008 at 12:59, Brad Malone [EMAIL PROTECTED] wrote: Hi, I was wondering if someone could englighten me on what the geometrical significance of numpy.mgrid is. I can play around with it and see trends in the sizes and number of arrays, but why does it give the output that it does? Looking at the example shown below, why does it return a matrix and its transpose? Well, it returns one array. In your example, there is a (2,5,5) array, which is basically the concatenation of two arrays which *happen* to be transposes of each other. If you had chosen differently sized axes, they wouldn't be transposes. In [14]: mgrid[0:2,0:3] Out[14]: array([[[0, 0, 0], [1, 1, 1]], [[0, 1, 2], [0, 1, 2]]]) Is this a representation of some geometrical grid? It can be. There are other uses for it. Does the output imply some sort of connectivity? It describes an orthogonal grid. If so, how do you see it? mgrid[0:5,0:5] array([[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3], [4, 4, 4, 4, 4]], BLANKLINE [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]]) I have a cubic grid in 3D space that is spanned by 3 orthogonal vectors. Am I able to generate this equivalent grid with mgrid somehow? If so, how is it done? I am using mayavi and I need to be able to construct some arrays in the same way that mgrid would have constructed them, so this is why I ask. I would probably use indices() instead of mgrid if you are just given the x, y, and z vectors. indices([n,m,k]) is equivalent to mgrid[0:n,0:m,0:k]: In [19]: x = linspace(0, 1, 3) In [20]: x Out[20]: array([ 0. , 0.5, 1. ]) In [21]: y = linspace(1, 2.5, 4) In [22]: y Out[22]: array([ 1. , 1.5, 2. , 2.5]) In [23]: z = linspace(3, 5, 5) In [24]: z Out[24]: array([ 3. , 3.5, 4. , 4.5, 5. ]) In [25]: ix, iy, iz = indices([len(x), len(y), len(z)]) In [26]: x[ix] Out[26]: array([[[ 0. , 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. 
, 0. ]], [[ 0.5, 0.5, 0.5, 0.5, 0.5], [ 0.5, 0.5, 0.5, 0.5, 0.5], [ 0.5, 0.5, 0.5, 0.5, 0.5], [ 0.5, 0.5, 0.5, 0.5, 0.5]], [[ 1. , 1. , 1. , 1. , 1. ], [ 1. , 1. , 1. , 1. , 1. ], [ 1. , 1. , 1. , 1. , 1. ], [ 1. , 1. , 1. , 1. , 1. ]]]) In [27]: y[iy] Out[27]: array([[[ 1. , 1. , 1. , 1. , 1. ], [ 1.5, 1.5, 1.5, 1.5, 1.5], [ 2. , 2. , 2. , 2. , 2. ], [ 2.5, 2.5, 2.5, 2.5, 2.5]], [[ 1. , 1. , 1. , 1. , 1. ], [ 1.5, 1.5, 1.5, 1.5, 1.5], [ 2. , 2. , 2. , 2. , 2. ], [ 2.5, 2.5, 2.5, 2.5, 2.5]], [[ 1. , 1. , 1. , 1. , 1. ], [ 1.5, 1.5, 1.5, 1.5, 1.5], [ 2. , 2. , 2. , 2. , 2. ], [ 2.5, 2.5, 2.5, 2.5, 2.5]]]) In [28]: z[iz] Out[28]: array([[[ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ]], [[ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ]], [[ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ], [ 3. , 3.5, 4. , 4.5, 5. ]]]) -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
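Robert's recipe in compact, runnable form: indices() gives integer grids that index into the coordinate vectors, producing the absolute coordinates at every grid point:

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(1, 2.5, 4)
z = np.linspace(3, 5, 5)

ix, iy, iz = np.indices((len(x), len(y), len(z)))
X, Y, Z = x[ix], y[iy], z[iz]  # absolute coordinates at each grid point
print(X.shape)                 # (3, 4, 5)
print(X[2, 0, 0], Y[0, 3, 0], Z[0, 0, 4])  # 1.0 2.5 5.0
```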
Re: [Numpy-discussion] np.nan and ``is``
On Sep 19, 2008, at 7:52 PM, Christopher Barker wrote: I don't know the interning rules, but I do know that you should never count on them; they may not be consistent between implementations, or even different runs. There are a few things that Python-the-language guarantees are singleton objects which can be compared correctly with is. Those are: True, False, None Otherwise there is no guarantee that two objects of a given type which are equal in some sense of the word, are actually the same object. As Chris pointed out, the C implementation does (as a performance matter) have additional singletons. For example, the integers between -5 to 257 are also singletons #ifndef NSMALLPOSINTS #define NSMALLPOSINTS 257 #endif #ifndef NSMALLNEGINTS #define NSMALLNEGINTS 5 #endif /* References to small integers are saved in this array so that they can be shared. The integers that are saved are those in the range -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive). */ static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS]; This used to be -1 to 100 but some testing showed it was better to extend the range somewhat. There was also some performance testing about special-casing 0.0 and +/- 1.0 but I think it showed the results weren't worthwhile. So, back to NaN. There's no guarantee NaN is a singleton object, so testing with is almost certainly wrong. In fact, at the bit-level there are multiple NaNs. A NaN (according to Wikipedia) fits the following bit pattern: the sign bit is undefined, the exponent field is all ones, and the fraction field is nonzero. If the leading fraction bit is 1, it is a quiet NaN, otherwise it is a signalling NaN. So many distinct bit patterns - quiet and signalling, with arbitrary trailing fraction bits - are all NaN values. Andrew [EMAIL PROTECTED]
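Andrew's bit-level description can be checked from Python with struct (a sketch; the constants below encode the IEEE 754 double layout: 1 sign bit, 11 exponent bits, 52 fraction bits):

```python
import math
import struct

# A double is a NaN iff its exponent field is all ones and its
# fraction field is nonzero; many bit patterns qualify, which is
# why identity tests against a single NaN object cannot work.
bits = struct.unpack('<Q', struct.pack('<d', float('nan')))[0]
exponent = (bits >> 52) & 0x7FF
fraction = bits & ((1 << 52) - 1)
print(exponent == 0x7FF and fraction != 0)  # True: this is a NaN
print(math.isnan(float('nan')))             # True: the portable test
```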
Re: [Numpy-discussion] profiling line by line
On Fri, Sep 19, 2008 at 07:00, Robert Cimrman [EMAIL PROTECTED] wrote: Robert Kern wrote: Ah, found it. T_LONGLONG is a #define from structmember.h which is used to describe the types of attributes. Apparently, this was not added until Python 2.5. That particular member didn't actually need to be long long, so I've fixed that. Great, I will try it after it appears on the web page. Oops! It's now pushed. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Understanding mgrid
Thanks for the response Robert. So, at least in this case, the results of mgrid (or indices) only provide information about the spacing of the grid and not on the absolute value of the point coordinates? In your example, is there a way to see within your x[ix], y[iy], and z[iz] matrices the same collection of points that you would see if you did something like the following? points=[] x=linspace(0,1,3) y=linspace(1,2.5,4) z=linspace(3,5,5) for k in z.tolist(): for j in y.tolist(): for i in x.tolist(): point=array([i,j,k]) points.append(point) Thanks, Brad On Fri, Sep 19, 2008 at 11:22 AM, Robert Kern [EMAIL PROTECTED] wrote: [Robert's message of 11:22 AM quoted in full; see above.]
Re: [Numpy-discussion] Understanding mgrid
On Fri, Sep 19, 2008 at 14:13, Brad Malone [EMAIL PROTECTED] wrote: Thanks for the response Robert. So, at least in this case, the results of mgrid (or indices) only provides information about the spacing of the grid and not on the absolute value of the point coordinates? No, they give indices. You can use those indices in a variety of ways. In my example, I used them to index into vectors which gave the absolute positions of the grid lines. That turned into bricks giving the absolute coordinates for each 3D grid point. In your example, is there a way to see within your x[ix], y[iy], and z[iz] matrices the same collection of points that you would see if you did something like the following? points=[] x=linspace(0,1,3) y=linspace(1,2.5,4) z=linspace(3,5,5) for k in z.tolist(): for j in y.tolist(): for i in x.tolist(): point=array([i,j,k]) points.append(point) points = column_stack([x[ix].flat, y[iy].flat, z[iz].flat]) -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
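Robert's column_stack one-liner spelled out and checked against the shapes (using ravel() in place of .flat; both work here):

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(1, 2.5, 4)
z = np.linspace(3, 5, 5)
ix, iy, iz = np.indices((len(x), len(y), len(z)))
X, Y, Z = x[ix], y[iy], z[iz]
points = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
print(points.shape)  # (60, 3): one row of coordinates per grid point
print(points[0])     # [0. 1. 3.]
```

Note the row order differs from Brad's triple loop (here z varies fastest, there x did), but the set of points is identical.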
Re: [Numpy-discussion] np.nan and ``is``
Andrew Dalke wrote:

There are a few things that Python-the-language guarantees are singleton objects which can be compared correctly with is. Those are: True, False, None

The empty tuple () and all interned strings are also guaranteed to be singletons. String interning is used to optimize code at the C level. It's much faster to compare memory addresses than objects. All strings can be interned through the builtin function intern, like s = intern(s). For Python 3.x the function was moved into the sys module and changed to support str, which are PyUnicode objects.

So, back to NaN. There's no guarantee NaN is a singleton object, so testing with is almost certainly is wrong. In fact, at the bit level there are multiple NaNs. A NaN (according to Wikipedia) fits the following bit pattern (single precision):

  x 11111111 axxxxxxxxxxxxxxxxxxxxxx

where x = undefined (the fraction as a whole must be non-zero). If a = 1, it is a quiet NaN, otherwise it is a signalling NaN. The definition is correct for all doubles on IEEE 754 aware platforms. Python's float type uses the double C type. Almost all modern computers have either hardware IEEE 754 support or software support for embedded devices (some mobile phones and PDAs). http://en.wikipedia.org/wiki/IEEE_754-1985

The Python core makes no difference between quiet NaNs and signaling NaNs. Only errno, input and output values are checked to raise an exception.

We were discussing the possibility of a NaN singleton during our revamp of Python's IEEE 754 and math support for Python 2.6 and 3.0. But we decided against it because the extra code and cost wasn't worth the risks. Instead I added isnan() and isinf() to the math module. All checks for NaN, inf and the sign bit of a float must be made through the appropriate APIs - either the NumPy API or the new APIs for floats.

Hope that sheds some light on things,

Christian
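[Editorial sketch, not from the original mail: Christian's two points — that `is` tells you nothing useful about NaN, and that there are many NaN bit patterns — can both be demonstrated directly. The bit-twiddling via struct is an illustration under the assumption of an IEEE 754 double.]

```python
import math
import struct

a = float("nan")
b = float("nan")

# A NaN never compares equal -- not even to itself.
assert a != a
assert not (a == b)

# 'is' only compares object identity; a and b may or may not be the same
# object depending on the Python implementation. The portable checks are
# math.isnan() (or numpy.isnan() for arrays).
assert math.isnan(a) and math.isnan(b)

# At the bit level there are many NaNs: any exponent of all ones with a
# non-zero fraction is NaN, so flipping a payload bit still yields NaN.
bits = struct.unpack("<Q", struct.pack("<d", a))[0]
other = struct.unpack("<d", struct.pack("<Q", bits ^ 0x1))[0]
assert math.isnan(other)
print("ok")
```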
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote: a.mask=True This is great, but is apparently new behavior as of NumPy 1.2? Alan Isaac
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 16:28:34 Alan G Isaac wrote: On 9/19/2008 11:46 AM Pierre GM apparently wrote: a.mask=True This is great, but is apparently new behavior as of NumPy 1.2? I'm not sure, sorry. Another way is ma.array(np.empty(yourshape,yourdtype), mask=True) which should work with earlier versions.
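[Editorial sketch of Pierre's recipe, using only numpy.ma as described in the thread:]

```python
import numpy as np
import numpy.ma as ma

# A fully masked array of the desired shape and dtype.
a = ma.array(np.empty((2, 3), dtype=float), mask=True)
assert a.mask.all()        # every element starts out masked
assert ma.count(a) == 0    # no unmasked data yet

# Assigning a value unmasks that element.
a[0, 1] = 2.5
assert ma.count(a) == 1
assert a[0, 1] == 2.5

# Reductions simply skip the still-masked entries.
assert a.sum() == 2.5
print("ok")
```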
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 4:54 PM Pierre GM apparently wrote: Another way is ma.array(np.empty(yourshape,yourdtype), mask=True) which should work with earlier versions. Seems like ``mask`` would be a natural keyword for ``ma.empty``? Thanks, Alan Isaac
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 17:25:53 Alan G Isaac wrote: On 9/19/2008 4:54 PM Pierre GM apparently wrote: Another way is ma.array(np.empty(yourshape,yourdtype), mask=True) which should work with earlier versions. Seems like ``mask`` would be a natural keyword for ``ma.empty``? Not a bad idea. I'll plug that in.
[Numpy-discussion] newb question
Hi, What am I doing wrong here? The reshape doesn't take.

% cat test1.py
import numpy as np

a = np.uint8([39, 39, 231, 239, 39, 231, 39, 39, 231,
              39, 39, 231, 239, 39, 231, 39, 39, 231,
              39, 39, 231, 239, 39, 231, 39, 39, 231,
              39, 39, 231, 239, 39, 231, 39, 39, 231])
a.reshape(3, 4, 3)
print "a = %r" % (a)

% python test1.py
a = array([ 39, 39, 231, 239, 39, 231, 39, 39, 231, 39, 39, 231, 239, 39, 231, 39, 39, 231, 39, 39, 231, 239, 39, 231, 39, 39, 231, 39, 39, 231, 239, 39, 231, 39, 39, 231], dtype=uint8)

I am expecting:

a = array([[[39, 39, 231], [239, 39, 231], [39, 39, 231]],
           [[39, 39, 231], [239, 39, 231], [39, 39, 231]],
           [[39, 39, 231], [239, 39, 231], [39, 39, 231]],
           [[39, 39, 231], [239, 39, 231], [39, 39, 231]]], dtype=np.uint8)

paul

def vanderWalt(a, f):
    "thanks Stefan"
    RED, GRN, BLU = 0, 1, 2
    bluemask = (a[..., BLU] > f*a[..., GRN]) & \
               (a[..., BLU] > f*a[..., RED])
    return np.array(bluemask.nonzero()).swapaxes(0, 1)
Re: [Numpy-discussion] newb question
paul taney wrote:

Hi, What am I doing wrong here? The reshape doesn't take.

Reshape does not act in place; it returns either a new view or a copy. To reshape in place, you can assign to the shape attribute:

In [13]: a = np.arange(10)

In [14]: a.shape = (2,5)

In [15]: a
Out[15]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Eric
Re: [Numpy-discussion] newb question
On Friday 19 September 2008 20:47:12 paul taney wrote:

Hi, What am I doing wrong here? The reshape doesn't take.

help(reshape):

    a.reshape(shape, order='C')

    Returns an array containing the data of a, but with a new shape.
    Refer to `numpy.reshape` for full documentation.

You see that you're not modifying in place. Instead, you should use a.shape = (3,4,3). Play with the tuple to find what you want -- (4,3,3) seems to meet your expectations.
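[Editorial sketch of the distinction Eric and Pierre describe — reshape() returns a new array object (usually a view), while assigning to .shape modifies the array in place:]

```python
import numpy as np

a = np.arange(10)

# reshape() does not act in place: it returns a new array object.
b = a.reshape(2, 5)
assert a.shape == (10,)      # a itself is unchanged...
assert b.shape == (2, 5)     # ...the reshaped result is b
assert b.base is a           # here b is a view onto a's data, not a copy

# Assigning to .shape reshapes in place (only allowed when no copy is needed).
a.shape = (2, 5)
assert a.shape == (2, 5)
print("ok")
```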
Re: [Numpy-discussion] np.nan and ``is``
On Sep 19, 2008, at 10:04 PM, Christian Heimes wrote:

Andrew Dalke wrote: There are a few things that Python-the-language guarantees are singleton objects which can be compared correctly with is.

The empty tuple () and all interned strings are also guaranteed to be singletons.

Where's the guarantee? As far as I know it's not part of Python-the-language, and I thought it was only an implementation detail of CPython. tupleobject.c says:

PyTuple_Fini(void)
{
#if PyTuple_MAXSAVESIZE > 0
    /* empty tuples are used all over the place and applications may
     * rely on the fact that an empty tuple is a singleton. */
    Py_XDECREF(free_list[0]);
    free_list[0] = NULL;
    (void)PyTuple_ClearFreeList();
#endif
}

but that doesn't hold under Jython 2.2a1:

Jython 2.2a1 on java1.4.2_16 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> () is ()
0
>>> 1 is 1
1

String interning is used to optimize code on C level. It's much faster to compare memory addresses than objects. All strings can be interned through the builtin function intern like s = intern(s). For Python 3.x the function was moved into the sys module and changed to support str which are PyUnicode objects.

intern being listed in the documentation under http://docs.python.org/lib/non-essential-built-in-funcs.html

2.2 Non-essential Built-in Functions: There are several built-in functions that are no longer essential to learn, know or use in modern Python programming. They have been kept here to maintain backwards compatibility with programs written for older versions of Python.

Again, I think this is only an aspect of the CPython implementation.

The Python core makes no difference between quiet NaNs and signaling NaNs.

Based on my limited readings just now, it seems that that's the general consensus:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm
"Standard C only adopted Quiet NaNs. It did not adopt Signaling NaNs because it was believed that they are of too limited utility for the amount of work required."
http://www.digitalmars.com/d/archives/digitalmars/D/signaling_NaNs_and_quiet_NaNs_75844.html
"Signaling NaNs have fallen out of favor. No exceptions get raised for them."

http://en.wikipedia.org/wiki/NaN
"There were questions about if signalling NaNs should continue to be required in the revised standard. In the end it appears they will be left in."

We were discussing the possibility of a NaN singleton during our revamp of Python's IEEE 754 and math support for Python 2.6 and 3.0. But we decided against it because the extra code and cost wasn't worth the risks. Instead I added isnan() and isinf() to the math module.

I couldn't find that thread. What are the advantages of converting all NaNs to a singleton? All I can come up with are disadvantages.

BTW, another place to look is the Decimal module:

>>> import decimal
>>> decimal.Decimal("nan")
Decimal("NaN")

Looking at the decimal docs now I see a canonical() method which: "The result has the same value as the operand but always uses a canonical encoding. The definition of canonical is implementation-defined; if more than one internal encoding for a given NaN, Infinity, or finite number is possible then one 'preferred' encoding is deemed canonical. This operation then returns the value using that preferred encoding."

Andrew
[EMAIL PROTECTED]
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote: Why shouldn't we have nanmin-like behaviour for the C min itself? Ah, I was not arguing we should not do it in C, but rather that we did not have to do it in C. The current behavior for nan with functions relying on ordering is broken; if someone prefers fixing it in C, great. But I was guessing more people could fix it in python, that's all. I opened a bug for min/max and nan; this should be fixed for 1.3.0, maybe 1.2.1 too. cheers, David
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 22:25, David Cournapeau [EMAIL PROTECTED] wrote: Stéfan van der Walt wrote: Why shouldn't we have nanmin-like behaviour for the C min itself? Ah, I was not arguing we should not do it in C, but rather that we did not have to do it in C. The current behavior for nan with functions relying on ordering is broken; if someone prefers fixing it in C, great. But I was guessing more people could fix it in python, that's all. How, exactly? ndarray.min() is where the implementation is. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Medians that ignore values
Robert Kern wrote: On Fri, Sep 19, 2008 at 22:25, David Cournapeau [EMAIL PROTECTED] wrote: How, exactly? ndarray.min() is where the implementation is.

Ah, I keep forgetting those are implemented in the array object, sorry for that. Now I understand Stefan's point. Do I understand correctly that we should then do:

- implement a NaN-aware min/max for every float type (real and complex) in umathmodule.c, which ignores nan (called @[EMAIL PROTECTED], etc...)
- fix the current min/max to propagate NaN instead of giving broken results
- decide how to do the dispatching? Having PyArray_Min and PyArray_NanMin sounds the easiest (we don't change any C API, only add an argument to the python-callable function min, in the array_min method?)

Or am I missing something? If this is the right way to fix it I am willing to do it (we still have to agree on the default behavior first). I am not really familiar with the sort module, but maybe it is really similar to the min/max case. cheers, David
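[Editorial sketch: whatever shape the eventual C implementation takes, the nanmin semantics under discussion — ignore NaNs, return NaN only when nothing else is left — can be expressed at the Python level by masking out NaNs first. This is an illustration, not the patch being proposed.]

```python
import numpy as np

def py_nanmin(a):
    """Minimum over non-NaN entries; NaN only if *all* entries are NaN."""
    a = np.asarray(a, dtype=float)
    good = a[~np.isnan(a)]          # drop the NaNs before comparing
    return np.nan if good.size == 0 else good.min()

a = np.array([3.0, np.nan, 1.0, 2.0])
assert py_nanmin(a) == 1.0
assert np.isnan(py_nanmin([np.nan, np.nan]))
print("ok")
```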
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau [EMAIL PROTECTED]: I guess my formulation was poor: I never use NaN as missing values because I never use missing values, which is why I wanted the opinion of people who use NaN in a different manner (because I don't have a good idea on how those people would like to see numpy behave). I was certainly not arguing they should not be used for the purpose of missing values.

I, on the other hand, was making specifically that suggestion: users should not use nans to indicate missing values. Users should use masked arrays to indicate missing values. The problem with NaN is that you cannot mix the missing value behavior and the error behavior. Dealing with them in a consistent manner is difficult.

Because numpy is a general numerical computation tool, I think that NaN should be propagated and never ignored *by default*. If you have NaN because of divide by 0, etc... it should not be ignored at all. But if you want it to be ignored, then numpy should make it possible:

- max, min: should return NaN if NaN is in the array, or maybe even fail?
- argmax, argmin?
- sort: should fail?
- mean, std, variance: should return NaN
- median: should fail (to be consistent if sort fails)? Should return NaN?

This part I pretty much agree with.

We could then add an argument to failing functions to tell them either to ignore NaN/put them at some special location (like R does, for example). The ones I am not sure about are median and argmax/argmin. For median, failing when sort does is consistent; but this can break a lot of code. For argmin/argmax, failing is the most logical, but OTOH, making argmin/argmax fail and not max/min is not consistent either. Breaking the code is maybe not that bad because currently, neither max/min nor argmax/argmin nor sort returns a meaningful result. Does that sound reasonable to you?

The problem with this approach is that all those decisions need to be made and all that code needs to be implemented for masked arrays.
In fact I suspect that it has already been done in that case. So really what you are suggesting here is that we duplicate all this effort to implement the same functions for nans as we have for masked arrays. It's important, too, that the masked array implementation and the nan implementation behave the same way, or users will become badly confused. Who gets the task of keeping the two implementations in sync? The current situation is that numpy has two ways to indicate bad data for floating-point arrays: nans and masked arrays. We can't get rid of either: nans appear on their own, and masked arrays are the only way to mark bad data in non-floating-point arrays. We can try to make them behave the same, which will be a lot of work to provide redundant capabilities. Or we can make them behave drastically differently. Masked arrays clearly need to be able to handle masked values flexibly and explicitly. So I think nans should be handled simply and conservatively: propagate them if possible, raise if not. If users are concerned about performance, it's worth noting that on some machines nans force a fallback to software floating-point handling, with a corresponding very large performance hit. This includes some but not all x86 (and I think x86-64) CPUs. How this compares to the performance of masked arrays is not clear. Anne
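[Editorial sketch of the contrast Anne draws — NaNs propagate through reductions on plain arrays, while masked arrays mark bad values explicitly and skip them:]

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, np.nan, 4.0])

# Plain arrays: the NaN propagates through the reduction.
assert np.isnan(data.mean())

# Masked arrays: the bad value is marked and simply ignored.
m = ma.masked_invalid(data)
assert m.mean() == (1.0 + 2.0 + 4.0) / 3
print("ok")
```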
[Numpy-discussion] NEW GENERATED DLL ERROR FOUND WITHIN f2PY.py
To All,

I have now been able to generate a .pyd file from a FORTRAN file that I am trying to interface with python. I was able to execute this with an additional insight into how f2py operates. It seems as though the documentation requires an upgrade, since there appears to be missing information that might misdirect an f2py newcomer, such as myself. However, I am now facing the following new error:

ImportError: DLL load failed with error code 193

The python script is as follows:

import hello
print hello.__doc__
print hello.foo.__doc__
hello.foo(4)

The Fortran code is as follows:

! -*- f90 -*-
subroutine foo(a)
integer a
print*, "Hello from Fortran!"
print*, "a=", a
end

I was wondering as to what I should now try in order to finally produce a python sending and receiving information from a FORTRAN .pyd file. Any Suggestions??? Do I have to recompile Python with mingw32 in order to finally resolve this issue??

Thanks, David Blubaugh

This e-mail transmission contains information that is confidential and may be privileged. It is intended only for the addressee(s) named above. If you receive this e-mail in error, please do not read, copy or disseminate it in any manner. If you are not the intended recipient, any disclosure, copying, distribution or use of the contents of this information is prohibited. Please reply to the message immediately by informing the sender that the message was misdirected. After replying, please erase it from your computer system. Your assistance in correcting this error is appreciated.
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote: I, on the other hand, was making specifically that suggestion: users should not use nans to indicate missing values. Users should use masked arrays to indicate missing values.

I agree it is the nicest solution in theory, but I think it is impractical (as mentioned by Eric Firing in his email).

This part I pretty much agree with.

I can't really see which one is better (failing or returning NaN for sort/min/max and their arg* counterparts), or if we should leave the choice to the user. I am fine with both, and they both require the same amount of work.

Or we can make them behave drastically differently. Masked arrays clearly need to be able to handle masked values flexibly and explicitly. So I think nans should be handled simply and conservatively: propagate them if possible, raise if not.

I agree about this behavior being the default. I just think that for a couple of functions, we could give either separate functions, or additional arguments to existing functions, to ignore them: I am thinking about min/max/sort and their arg* counterparts, because those are really basic, and because we already have nanmean/nanstd/nanmedian (e.g. having a nansort would help make nanmean much faster).

If users are concerned about performance, it's worth noting that on some machines nans force a fallback to software floating-point handling, with a corresponding very large performance hit.

I was more concerned with the cost of handling NaN when you do not have NaN in your array, since you have to test for NaN explicitly (everything involving comparison). But I don't see any obvious way to avoid that cost,

David
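[Editorial sketch: a "nansort" of the kind David mentions — all non-NaN elements in order, NaNs collected at the end — written as a hypothetical pure-numpy helper (not an existing numpy function in this thread's timeframe):]

```python
import numpy as np

def nansort(a):
    """Return a sorted copy with all NaNs pushed to the end."""
    a = np.asarray(a, dtype=float)
    n_good = a.size - np.isnan(a).sum()
    out = np.empty_like(a)
    out[:n_good] = np.sort(a[~np.isnan(a)])  # ordinary sort of the finite part
    out[n_good:] = np.nan                    # NaNs go last, in a defined place
    return out

a = np.array([3.0, np.nan, 1.0, np.nan, 2.0])
s = nansort(a)
assert list(s[:3]) == [1.0, 2.0, 3.0]
assert np.isnan(s[3:]).all()
print("ok")
```

A nan-aware median then reduces to sorting and indexing into the first n_good elements, which is essentially how the nanmedian/nanmean family in scipy avoids the broken ordering.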