Re: [Numpy-discussion] Efficient removal of duplicates
Thanks Daran, that works like a charm!

Hanno

On Tue, Dec 16, 2008, Daran Rife dr...@ucar.edu said:

> Whoops! A hasty cut-and-paste from my IDLE session. This should read:
>
>     import numpy as np
>     a = [(x0,y0), (x1,y1), ...]  # A numpy array, but could be a list
>     l = a.tolist()
>     l.sort()
>     unique = [x for i, x in enumerate(l) if not i or x != l[i-1]]
>     a_unique = np.asarray(unique)
>
> Daran
>
> On Dec 15, 2008, at 5:24 PM, Daran Rife wrote:
>
>> How about a solution inspired by recipe 18.1 in the Python Cookbook, 2nd Ed:
>>
>>     import numpy as np
>>     a = [(x0,y0), (x1,y1), ...]
>>     l = a.tolist()
>>     l.sort()
>>     unique = [x for i, x in enumerate(l) if not i or x != b[l-1]]
>>     a_unique = np.asarray(unique)
>>
>> Performance of this approach should be highly scalable.
>>
>> Daran
>>
>> Hi, I have the following problem: I have a relatively long array of points [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which prevent the Delaunay triangulation algorithm from completing its task. Question: is there an efficient way of getting rid of the duplicate entries? All I can think of involves loops.
>>
>> Thanks and regards, Hanno

--
Hanno Klemm
kl...@phys.ethz.ch

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
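For readers following along, the sort-and-compare recipe above can also be kept entirely in numpy primitives. This is only a sketch with made-up coordinate data (the thread does not show the actual arrays), not code from the thread itself:

```python
import numpy as np

# Hypothetical coordinate data standing in for a = [(x0,y0), (x1,y1), ...].
a = np.array([(1.0, 2.0), (0.0, 0.0), (1.0, 2.0), (3.0, 4.0)])

# Sort rows lexicographically (primary key x, secondary key y), then keep
# each row that differs from its predecessor -- the same idea as the
# list-comprehension recipe, but vectorized.
order = np.lexsort(a.T[::-1])
s = a[order]
keep = np.ones(len(s), dtype=bool)
keep[1:] = np.any(s[1:] != s[:-1], axis=1)
a_unique = s[keep]
```

As with the list-based recipe, the result comes out sorted rather than in original order.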
Re: [Numpy-discussion] finding docs (was: unique1d docs)
On 12/16/2008 1:29 AM Jarrod Millman apparently wrote:
> Yes. Please don't start new moin wiki documentation. We have a good solution for documentation that didn't exist when the moin documentation was started. Either put new docs in the docstrings or in the scipy tutorial.

OK, in this case I think the main NumPy page needs a change: an explicit link to the new docs, a section titled Documentation (linked in the contents), and an explicit link to the new Numpy Reference Guide. As far as I can tell, I have no way to edit this page: http://numpy.scipy.org/

Imagine a new user looking for docs. This is what I think they would do:

1. Use `numpy` as a browser search term, and get directed to http://numpy.scipy.org/

2. Notice there is no Documentation link in the contents. *Maybe* notice that "Download the Guide" means getting some documentation, but probably that is more detailed and encyclopedic than many are first seeking.

3. Perhaps they will read the text and get pointed to the Numeric docs. Nothing will point them to the new docs. They may notice "Other Documentation is available at the scipy website", but if they follow that, will they guess that they should try a snapshot of a work in progress?

Alan Isaac
Re: [Numpy-discussion] Efficient removal of duplicates
There was a discussion about this on c.l.p a while ago. Using a sort will scale like O(n log n) or worse, whereas using a set (hash table) will scale like amortized O(n). How to use a Python set to get a unique collection of objects I'll leave to your imagination.

Sturla Molden

On Mon, Dec 15, 2008 at 18:24, Daran Rife dr...@ucar.edu wrote:
> How about a solution inspired by recipe 18.1 in the Python Cookbook, 2nd Ed:
>
>     import numpy as np
>     a = [(x0,y0), (x1,y1), ...]
>     l = a.tolist()
>     l.sort()
>     unique = [x for i, x in enumerate(l) if not i or x != l[i-1]]
>     a_unique = np.asarray(unique)
>
> Performance of this approach should be highly scalable.

That basic idea is what unique1d() does; however, it uses numpy primitives to keep the heavy lifting in C instead of Python.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
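A minimal sketch of the set-based approach Sturla hints at, again with made-up data. A set of tuples gives amortized O(n) deduplication, at the cost of losing the sorted order the sort-based recipe produces:

```python
import numpy as np

a = np.array([(1.0, 2.0), (0.0, 0.0), (1.0, 2.0)])  # made-up example data

# seen.add() returns None (falsy), so each distinct row passes the filter
# exactly once, preserving first-occurrence order.
seen = set()
unique = [row for row in map(tuple, a.tolist())
          if not (row in seen or seen.add(row))]
a_unique = np.asarray(unique)
```

Note the rows must be hashable, hence the conversion to tuples before they go into the set.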
[Numpy-discussion] Unexpected MaskedArray behavior
Hi,

I just noticed the following and I was kind of surprised:

>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
>>> b = a*5
>>> b
masked_array(data = [5 -- -- 20 25],
             mask = [False True True False False],
             fill_value=99)
>>> b.data
array([ 5, 10, 15, 20, 25])

I was expecting that the underlying data wouldn't get modified while masked. Is this actual behavior expected?

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
[Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp
Hi,

I found this earlier dialog about refactoring umathmodule.c (see bottom) where David mentioned it wasn't tested on 64-bit Windows. I tried compiling numpy 1.3.0.dev6118 on both 32-bit and 64-bit Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked on 32-bit but not on 64-bit: the compiler returned a non-specific Internal Compiler Error when working on umathmodule.c:

  ..
  building 'numpy.core.umath' extension
  compiling C sources
  creating build\temp.win-amd64-2.6\Release\build
  creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6
  creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy
  creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core
  creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src
  D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -ID:\Python26\include -ID:\Python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj
  umathmodule.c
  numpy\core\src\umath_funcs_c99.inc.src(140) : warning C4273: '_hypot' : inconsistent dll linkage
  D:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(139) : see previous definition of '_hypot'
  numpy\core\src\umath_funcs_c99.inc.src(341) : warning C4273: 'sinf' : inconsistent dll linkage
  Internal Compiler Error in D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe. You will be prompted to send an error report to Microsoft later.

Any idea what's going on? I'd like to volunteer to test compiling numpy on a 64-bit Windows system, since I have VS 2008 Professional edition installed.

Thanks!

--lin

2008/10/5 David Cournapeau da...@ar.media.kyoto-u.ac.jp:
..
> #ifndef HAVE_FREXPF
> static float frexpf(float x, int *i)
> {
>     return (float)frexp((double)(x), i);
> }
> #endif
>
> #ifndef HAVE_LDEXPF
> static float ldexpf(float x, int i)
> {
>     return (float)ldexp((double)(x), i);
> }
> #endif
>
> At the time I had tried to send further output following a checkout, but couldn't get it to post to the list; I think the message was too big or something. I will probably be having a go with 1.2.0 when I get some time. I'll let you know how it goes.

I did some heavy refactoring for the above problems, and it should now be easier to handle (in the trunk). I could build 1.2.0 with VS 2008 express on 32 bits (w/o blas/lapack), and there are some test errors, albeit relatively minor at first sight. I have not tried on 64 bits.

cheers,

David
Re: [Numpy-discussion] genloadtxt : last call
Pierre GM wrote:
> All, Here's the latest version of genloadtxt, with some recent corrections. With just a couple of tweaks, we end up with some decent speed: it's still slower than np.loadtxt, but only about 15% slower according to the test at the end of the package.

I have one more usage issue that you may or may not want to fix. My problem is that missing values are specified by their string representation, so a value representing a missing entry, while having the same actual numeric value, may not compare equal when represented as a string. For instance, if you specify that -999.0 represents a missing value, but the value written to the file is -999.00, you won't end up masking the -999.00 data point. I'm sure a test case will help here:

    def test_withmissing_float(self):
        data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
        test = mloadtxt(data, dtype=None, delimiter=',',
                        missing='-999.0', names=True)
        control = ma.array([(0, 1.5), (2, -1.)],
                           mask=[(False, False), (False, True)],
                           dtype=[('A', np.int), ('B', np.float)])
        print control
        print test
        assert_equal(test, control)
        assert_equal(test.mask, control.mask)

Right now this fails with the latest version of genloadtxt. I've worked around it by specifying a whole bunch of string representations of the values, but I wasn't sure if you knew of a better way this could be handled within genloadtxt. I can only think of two ways, though I'm not thrilled with either:

1) Call the converter on the string form of the missing value and compare against the converted value from the file to determine if missing. (Probably very slow.)

2) Add a list of objects (ints, floats, etc.) to compare against after conversion to determine if they're missing. This might needlessly complicate the function, which I know you've already taken pains to optimize.

If there's no good way to do it, I'm content to live with a workaround.
Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
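Option #2 above could look something like the following sketch. The names and the missing code here are hypothetical illustrations, not genloadtxt's actual implementation:

```python
import numpy as np

# Hypothetical sketch of option #2: compare *converted* values against a
# list of numeric missing codes, so the strings '-999.0' and '-999.00'
# both mark the same data point as missing.
missing_values = [-999.0]

fields = ['1.5', '-999.00', '2.25']           # strings as read from the file
values = [float(s) for s in fields]           # conversion step
mask = [v in missing_values for v in values]  # post-conversion comparison
col = np.ma.array(values, mask=mask)
```

A plain string comparison would have missed this case, since '-999.00' != '-999.0' as strings even though both convert to the same float.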
Re: [Numpy-discussion] genloadtxt : last call
Ryan,

OK, I'll look into that. I won't have time to address it before next week, however. Option #2 looks like the best. In other news, I was considering renaming genloadtxt to genfromtxt, and using ndfromtxt, mafromtxt, recfromtxt, and recfromcsv for the function names. That way, loadtxt is untouched.

On Dec 16, 2008, at 6:07 PM, Ryan May wrote:
> [...]
Re: [Numpy-discussion] Unexpected MaskedArray behavior
On Dec 16, 2008, at 1:57 PM, Ryan May wrote:
> I just noticed the following and I was kind of surprised:
>
> >>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
> >>> b = a*5
> >>> b
> masked_array(data = [5 -- -- 20 25],
>              mask = [False True True False False],
>              fill_value=99)
> >>> b.data
> array([ 5, 10, 15, 20, 25])
>
> I was expecting that the underlying data wouldn't get modified while masked. Is this actual behavior expected?

Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't really matter one way or the other. But I tend to agree: it'd make more sense to leave masked data untouched (or at least to reset it to its original value after the operation), which would mimic the behavior of gimp/photoshop.

Looks like there's a relatively easy fix. I need time to check that it doesn't break anything elsewhere and that it doesn't slow things down too much. I won't have time to test all that before next week, though. In any case, that would be for 1.3.x, not for 1.2.x. In the meantime, if you need the functionality, use something like:

    ma.where(a.mask, a, a*5)
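The suggested workaround, spelled out with the array from Ryan's example:

```python
import numpy.ma as ma

a = ma.MaskedArray([1, 2, 3, 4, 5], mask=[False, True, True, False, False])

# Where the mask is set, take the original (masked) entries; elsewhere,
# take the scaled values. The masked slots stay masked in the result.
b = ma.where(a.mask, a, a * 5)
```

The unmasked entries come out scaled (5, 20, 25) while the mask itself is unchanged.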
Re: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp
On Wed, Dec 17, 2008 at 5:09 AM, Lin Shao s...@msg.ucsf.edu wrote:
> Hi, I found this earlier dialog about refactoring umathmodule.c (see bottom) where David mentioned it wasn't tested on 64-bit Windows. I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked on 32-bit but not on 64-bit: the compiler returned a non-specific Internal Compiler Error when working on umathmodule.c:

It is a bug in VS, but the problem is caused by buggy code in numpy, so this can be avoided. Incidentally, I was working on it yesterday, but went to bed before having fixed everything :)

David
[Numpy-discussion] Recent umath changes
Hi,

There have been some changes recently in the umath code which break windows 64 compilation, and I don't understand their rationale either. I have myself spent quite a good deal of time to make sure this works on many platforms/toolchains, by fixing the config distutils command and making sure that platform specificities are contained in a very localized part of the code. It may not be very well documented (see below), but may I ask that next time someone wants to change this file, they ask for review before putting it directly in the trunk?

thanks,

David

How to deal with platform oddities:
---

Basically, the code to replace missing C99 math funcs is, for a hypothetical double foo(double) function:

    #ifndef HAVE_FOO
    #undef foo
    static double npy_foo(double a)
    {
        /* define a npy_foo function with the same requirements as C99 foo */
    }
    #define foo npy_foo
    #else
    double foo(double);
    #endif

I think this code is wrong on several accounts:

- we should not undef foo if foo is available: if foo is available at that point, it is a bug in the configuration, and should not be dealt with in the code. Some cases may be complicated (IEEE754-related macros which are sometimes macros, sometimes functions, etc.), but those should be dealt with as very narrow cases.

- we should not declare our own function: function declaration is not portable, and varies among OSes/toolchains. Some toolchains use intrinsics, some use non-standard inline mechanisms, etc., which can crash the resulting binary because there is a discrepancy between our code's calling conventions and the library's conventions. The reported problem with the VS compiler on amd64 is caused by this exact problem.

Unless there is a strong rationale otherwise, I would like us to follow how autoconfed projects do it. They have long experience dealing with platform idiosyncrasies, and the above method is not the one they follow. They follow the simple:

    #ifndef HAVE_FOO
    /* define foo */
    #endif

And they deal with platform oddities in the *configuration* code instead of directly in the code. That really makes my life easier when I deal with windows compilers, which are already painful enough to deal with as it is.
Re: [Numpy-discussion] Recent umath changes
On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau da...@ar.media.kyoto-u.ac.jp wrote:
> [...]

Yes, the rationale was to fix compilation on windows 64 with msvc and on etch on SPARC, both of which were working after the changes. You are, of course, free to break these builds again. However, I designated space at the top of the file for compiler/distro-specific defines; I think you should use them, there is a reason other folks do. The macro undef could be moved, but I preferred to generate an error if there was a conflict with the standard c function prototypes. We can't use inline code for these functions, as they are passed to the generic loops as function pointers. I assume compilers have some way of recognizing this case and perhaps generating function code on the fly. If so, we need to figure out how to detect that.

Chuck
Re: [Numpy-discussion] Recent umath changes
Charles R Harris wrote:
> On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau da...@ar.media.kyoto-u.ac.jp wrote:
> [...]
>
> Yes, the rationale was to fix compilation on windows 64 with msvc and on etch on SPARC, both of which were working after the changes.

It does not work at the moment on windows at least :) But more essentially, I don't see why you declared those functions: can you explain to me what your intention was, because I don't understand the rationale.

> You are, of course, free to break these builds again. However, I designated space at the top of the file for compiler/distro-specific defines; I think you should use them, there is a reason other folks do.

The problem is twofold:

- by declaring functions everywhere in the code, you are effectively spreading toolchain-specific oddities through the whole source file. This is not good, IMHO: those should be detected at the configuration stage and dealt with in the source code using that information. That's how every autoconf project does it. If a function is actually a macro, this should be detected at configuration.

- declarations are toolchain-specific; actually it is worse, since they even depend on the compiler flags. It is at least the case with MS compilers. So there is no way to guarantee that your declaration matches the math runtime's (the compiler crash reported is caused by exactly this).

> The macro undef could be moved, but I preferred to generate an error if there was a conflict with the standard c function prototypes. We can't use inline code for these functions as they are passed to the generic loops as function pointers.

Yes, I believe this is another problem with declaring functions: if we use, say, cosl, and cosl is an inline function in the runtime, then by re-declaring it we are telling the compiler that it is not inline anymore. The compiler then does not know that you can't take the address of cosl, unless it detects the mismatch between the runtime declaration and ours and treats it as an error (I am not sure whether this is always an error with MS compilers; it may only be a warning in some versions, and it is certainly not handled gracefully every time, since the linker crashes in some cases).

David