Re: [Numpy-discussion] Efficient removal of duplicates

2008-12-16 Thread Hanno Klemm

Thanks Daran,

that works like a charm!

Hanno

On Tue, Dec 16, 2008, Daran Rife dr...@ucar.edu said:

 Whoops! A hasty cut-and-paste from my IDLE session.
 This should read:
 
 import numpy as np
 
 a = [(x0,y0), (x1,y1), ...] # a numpy array (if already a plain list, skip the tolist() step)
 l = a.tolist()
 l.sort()
 unique = [x for i, x in enumerate(l) if not i or x != l[i-1]]
 a_unique = np.asarray(unique)
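 A self-contained version of the same recipe, with invented sample points
 standing in for the (x0,y0) pairs above:
 
 import numpy as np
 
 a = np.array([(0.0, 0.0), (1.0, 2.0), (0.0, 0.0), (1.0, 2.0), (3.0, 4.0)])
 l = a.tolist()
 l.sort()
 # keep an element if it is the first, or differs from its predecessor
 unique = [x for i, x in enumerate(l) if not i or x != l[i-1]]
 a_unique = np.asarray(unique)
 print(a_unique)   # three unique points remain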
 
 
 Daran
 
 --
 
 On Dec 15, 2008, at 5:24 PM, Daran Rife wrote:
 
 How about a solution inspired by recipe 18.1 in the Python Cookbook,
 2nd Ed:

 import numpy as np

 a = [(x0,y0), (x1,y1), ...]
 l = a.tolist()
 l.sort()
 unique = [x for i, x in enumerate(l) if not i or x != b[l-1]]
 a_unique = np.asarray(unique)

 Performance of this approach should be highly scalable.

 Daran

 --


 Hi,

 I have the following problem: I have a relatively long array of points
 [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which
 prevent the Delaunay triangulation algorithm from completing its task.

 Question: is there an efficient way of getting rid of the duplicate
 entries? All I can think of involves loops.

 Thanks and regards,
 Hanno
 

-- 
Hanno Klemm
kl...@phys.ethz.ch


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] finding docs (was: unique1d docs)

2008-12-16 Thread Alan G Isaac
On 12/16/2008 1:29 AM Jarrod Millman apparently wrote:
 Yes.  Please don't start new moin wiki documentation.  We have a good
 solution for documentation that didn't exist when the moin
 documentation was started.  Either put new docs in the docstrings or
 in the scipy tutorial.


OK, in this case I think the main NumPy page needs a change:
an explicit link to the new docs,
a section titled Documentation (linked in the contents),
and an explicit link to the new Numpy Reference Guide.

As far as I can tell, I have no way to edit this page:
http://numpy.scipy.org/

Imagine a new user looking for docs.
This is what I think they would do.

1. Use `numpy` as a browser search term,
and get directed to http://numpy.scipy.org/

2. Notice there is no Documentation link in the contents.
*Maybe* notice that Download the Guide means
getting some documentation, but that is probably
more detailed and encyclopedic than many are
first seeking.

3. Perhaps they will read the text and get pointed
to the Numeric docs.

Nothing will point them to the new docs. They
may notice that Other Documentation is available
at the scipy website, but if they follow that,
will they guess that they should try a snapshot
of a work in progress?

Alan Isaac

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient removal of duplicates

2008-12-16 Thread Sturla Molden

There was a discussion about this on c.l.p a while ago. Using a sort
will scale like O(n log n) or worse, whereas using a set (hash table) will
scale like amortized O(n). How to use a Python set to get a unique
collection of objects I'll leave to your imagination (though a sketch
follows below).
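One way to spell it (a minimal sketch; the sample points are invented for
illustration):

import numpy as np

# hash-based dedup: amortized O(n), at the price of losing the input order
a = np.array([(0.0, 0.0), (1.0, 2.0), (0.0, 0.0), (3.0, 4.0)])
unique_pts = np.asarray(list(set(map(tuple, a))))
print(unique_pts)   # three unique points, in arbitrary order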

Sturla Molden

 On Mon, Dec 15, 2008 at 18:24, Daran Rife dr...@ucar.edu wrote:
 How about a solution inspired by recipe 18.1 in the Python Cookbook,
 2nd Ed:

 import numpy as np

 a = [(x0,y0), (x1,y1), ...]
 l = a.tolist()
 l.sort()
 unique = [x for i, x in enumerate(l) if not i or x != b[l-1]]
 a_unique = np.asarray(unique)

 Performance of this approach should be highly scalable.

 That basic idea is what unique1d() does; however, it uses numpy
 primitives to keep the heavy lifting in C instead of Python.
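 A minimal sketch of unique1d at work (the complex-packing step is one
 possible workaround for point pairs, not part of unique1d itself):

 import numpy as np

 a = np.array([3, 1, 2, 3, 1])
 print(np.unique1d(a))   # [1 2 3] -- sorted, duplicates dropped

 # for (x, y) pairs, pack each point into one complex number so the 1-D
 # unique1d applies; numpy orders complex values lexicographically
 pts = np.array([(0.0, 0.0), (1.0, 2.0), (0.0, 0.0)])
 packed = pts[:, 0] + 1j * pts[:, 1]
 u = np.unique1d(packed)
 unique_pts = np.column_stack((u.real, u.imag))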

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
   -- Umberto Eco



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Unexpected MaskedArray behavior

2008-12-16 Thread Ryan May
Hi,

I just noticed the following and I was kind of surprised:

>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
>>> b = a*5
>>> b
masked_array(data = [5 -- -- 20 25],
             mask = [False  True  True False False],
       fill_value=99)
>>> b.data
array([ 5, 10, 15, 20, 25])

I was expecting that the underlying data wouldn't get modified while
masked. Is this behavior actually expected?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp

2008-12-16 Thread Lin Shao
Hi,

I found this earlier dialog about refactoring umathmodule.c (see
bottom) where David mentioned it wasn't tested on 64-bit Windows.

I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit
Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked
on 32-bit but not on 64-bit: the compiler returned a non-specific
Internal Compiler Error when working on umathmodule.c:

..
building 'numpy.core.umath' extension
compiling C sources
creating build\temp.win-amd64-2.6\Release\build
creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6
creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy
creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core
creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src
D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c
/nologo /Ox /MD /W3 /GS- /DNDEBUG
-Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include
-Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src
-Inumpy\core\include -ID:\Python26\include -ID:\Python26\PC
/Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c
/Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj
umathmodule.c
numpy\core\src\umath_funcs_c99.inc.src(140) : warning C4273: '_hypot'
: inconsistent dll linkage
   D:\Program Files\Microsoft Visual Studio
9.0\VC\INCLUDE\math.h(139) : see previous definition of '_hypot'
numpy\core\src\umath_funcs_c99.inc.src(341) : warning C4273: 'sinf' :
inconsistent dll linkage
Internal Compiler Error in D:\Program Files\Microsoft Visual Studio
9.0\VC\BIN\amd64\cl.exe.  You will be prompted to send an error report
to Microsoft later.


Any idea what's going on? I'd like to volunteer to test compiling
numpy on 64-bit Windows systems, since I have a VS 2008 Professional
edition installed.

Thanks!

--lin


2008/10/5 David Cournapeau da...@ar.media.kyoto-u.ac.jp:
 ..

 
  #ifndef HAVE_FREXPF
  static float frexpf(float x, int *i)
  {
      return (float)frexp((double)x, i);
  }
  #endif
  #ifndef HAVE_LDEXPF
  static float ldexpf(float x, int i)
  {
      return (float)ldexp((double)x, i);
  }
  #endif

 At the time I had tried to send further output following a checkout,
 but couldn't get it to post to the list; I think the message was too
 big or something. I will probably be having a go with 1.2.0 when I get
 some time. I'll let you know how it goes.

 I did some heavy refactoring for the above problems, and it should now
 be easier to handle (in the trunk). I could build 1.2.0 with VS 2008
 express on 32 bits (w/o blas/lapack), and there are some test errors -
 albeit relatively minor at first sight. I have not tried on 64 bits.

 cheers,

 David

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genloadtxt : last call

2008-12-16 Thread Ryan May
Pierre GM wrote:
 All,
 Here's the latest version of genloadtxt, with some recent corrections.
 With just a couple of tweaks, we end up with some decent speed: it's
 still slower than np.loadtxt, but only 15% or so according to the test
 at the end of the package.

I have one more use issue that you may or may not want to fix. My problem
is that missing values are specified by their string representation, so
that a string representing a missing value, while having the same actual
numeric value, may not compare equal when represented as a string. For
instance, if you specify that -999.0 represents a missing value, but the
value written to the file is -999.00, you won't end up masking the -999.00
data point. I'm sure a test case will help here:

def test_withmissing_float(self):
    data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
    test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                    names=True)
    control = ma.array([(0, 1.5), (2, -1.)],
                       mask=[(False, False), (False, True)],
                       dtype=[('A', np.int), ('B', np.float)])
    print control
    print test
    assert_equal(test, control)
    assert_equal(test.mask, control.mask)

Right now this fails with the latest version of genloadtxt. I've worked
around this by specifying a whole bunch of string representations of the
values, but I wasn't sure if you knew of a better way that this could be
handled within genloadtxt. I can only think of two ways, though I'm not
thrilled with either (a sketch of the second follows below):

1) Call the converter on the string form of the missing value and compare
against the converted value from the file to determine if missing.
(Probably very slow)

2) Add a list of objects (ints, floats, etc.) to compare against after
conversion to determine if they're missing. This might needlessly
complicate the function, which I know you've already taken pains to
optimize.

If there's no good way to do it, I'm content to live with a workaround.
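A minimal sketch of option 2, outside genloadtxt itself (every name here
is hypothetical, not part of the function):

import numpy as np
import numpy.ma as ma

# compare *converted* values against numeric missing markers, so the
# strings '-999.0' and '-999.00' both end up matching
missing_values = [-999.0]
converted = np.array([1.5, -999.00])
mask = np.array([v in missing_values for v in converted])
result = ma.array(converted, mask=mask)   # masks the -999.00 entry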

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genloadtxt : last call

2008-12-16 Thread Pierre GM
Ryan,
OK, I'll look into that. I won't have time to address it before next
week, however. Option #2 looks like the best.

In other news, I was considering renaming genloadtxt to genfromtxt,  
and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the  
function names. That way, loadtxt is untouched.



On Dec 16, 2008, at 6:07 PM, Ryan May wrote:

 Pierre GM wrote:
 All,
 Here's the latest version of genloadtxt, with some recent corrections.
 With just a couple of tweaks, we end up with some decent speed: it's
 still slower than np.loadtxt, but only 15% or so according to the test
 at the end of the package.

 I have one more use issue that you may or may not want to fix. My
 problem is that missing values are specified by their string
 representation, so that a string representing a missing value, while
 having the same actual numeric value, may not compare equal when
 represented as a string. For instance, if you specify that -999.0
 represents a missing value, but the value written to the file is
 -999.00, you won't end up masking the -999.00 data point. I'm sure a
 test case will help here:

 def test_withmissing_float(self):
     data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
     test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                     names=True)
     control = ma.array([(0, 1.5), (2, -1.)],
                        mask=[(False, False), (False, True)],
                        dtype=[('A', np.int), ('B', np.float)])
     print control
     print test
     assert_equal(test, control)
     assert_equal(test.mask, control.mask)

 Right now this fails with the latest version of genloadtxt. I've worked
 around this by specifying a whole bunch of string representations of
 the values, but I wasn't sure if you knew of a better way that this
 could be handled within genloadtxt. I can only think of two ways,
 though I'm not thrilled with either:

 1) Call the converter on the string form of the missing value and
 compare against the converted value from the file to determine if
 missing. (Probably very slow)

 2) Add a list of objects (ints, floats, etc.) to compare against after
 conversion to determine if they're missing. This might needlessly
 complicate the function, which I know you've already taken pains to
 optimize.

 If there's no good way to do it, I'm content to live with a workaround.

 Ryan

 -- 
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unexpected MaskedArray behavior

2008-12-16 Thread Pierre GM

On Dec 16, 2008, at 1:57 PM, Ryan May wrote:

 I just noticed the following and I was kind of surprised:

 >>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
 >>> b = a*5
 >>> b
 masked_array(data = [5 -- -- 20 25],
              mask = [False  True  True False False],
        fill_value=99)
 >>> b.data
 array([ 5, 10, 15, 20, 25])

 I was expecting that the underlying data wouldn't get modified while
 masked. Is this behavior actually expected?

Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't
really matter one way or the other.
But I tend to agree, it'd make more sense to leave masked data untouched
(or at least to reset it to its original value after the operation),
which would mimic the behavior of gimp/photoshop.
Looks like there's a relatively easy fix. I need time to check that it
doesn't break anything elsewhere and doesn't slow things down too much. I
won't have time to test all that before next week, though. In any case,
that would be for 1.3.x, not for 1.2.x.
In the meantime, if you need the functionality, use something like
ma.where(a.mask, a, a*5)
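For instance (a quick sketch; the sample array mirrors Ryan's example):

import numpy.ma as ma

a = ma.MaskedArray([1, 2, 3, 4, 5], mask=[False, True, True, False, False])
b = ma.where(a.mask, a, a*5)
print(b)        # [5 -- -- 20 25]
print(b.data)   # [ 5  2  3 20 25] -- masked slots keep their original data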

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp

2008-12-16 Thread David Cournapeau
On Wed, Dec 17, 2008 at 5:09 AM, Lin Shao s...@msg.ucsf.edu wrote:
 Hi,

 I found this earlier dialog about refactoring umathmodule.c (see
 bottom) where David mentioned it wasn't tested on 64-bit Windows.

 I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit
 Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked
 on 32-bit but not on 64-bit: the compiler returned a non-specific
 Internal Compiler Error when working on umathmodule.c:

It is a bug in VS, but the problem is caused by buggy code in numpy,
so this can be avoided. Incidentally, I was working on it yesterday,
but went to bed before having fixed everything :)

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Recent umath changes

2008-12-16 Thread David Cournapeau
Hi,

There have been some changes recently in the umath code which break
windows 64 compilation - and I don't understand their rationale either. I
have myself spent quite a good deal of time making sure this code works on
many platforms/toolchains, by fixing the config distutils command and
keeping platform specificities contained in a very localized part of the
code. It may not be very well documented (see below), but may I ask that
next time someone wants to change this file, people ask for review before
putting changes directly in the trunk?

thanks,

David


How to deal with platform oddities:
---

Basically, the code to replace missing C99 math funcs is, for a
hypothetical double foo(double) function:

#ifndef HAVE_FOO
#undef foo
static double npy_foo(double a)
{
    // define a npy_foo function with the same requirements as C99 foo
}

#define foo npy_foo
#else
double foo(double);
#endif

I think this code is wrong on several accounts:
 - we should not undef foo if foo is available: if foo is available at
that point, it is a bug in the configuration, and should not be dealt with
in the code. Some cases may be complicated (IEEE754-related macros, which
are sometimes macros, sometimes functions, etc...), but those should be
handled as very narrow special cases.
 - we should not declare our own functions: function declarations are not
portable, and vary among OS/toolchains. Some toolchains use intrinsics,
some use non-standard inline mechanisms, etc..., which can crash the
resulting binary because there is a discrepancy between our code's calling
conventions and the library's convention. The reported problem with the VS
compiler on amd64 is caused by exactly this problem.

Unless there is a strong rationale otherwise, I would like us to follow
what autoconfed projects do. They have long experience dealing with
platform idiosyncrasies, and the above method is not the one they follow.
They follow the simple:

#ifndef HAVE_FOO
// define foo
#endif

And they deal with platform oddities in the *configuration* code instead
of directly in the source. That really makes my life easier when I deal
with windows compilers, which are already painful enough as it is.
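As a concrete illustration, a sketch only, with hypot standing in for foo
(a production fallback would rescale to avoid overflow):

#include <math.h>

/* the configure step decides whether HAVE_HYPOT gets defined; the source
   merely fills the gap and never re-declares the libm symbol */
#ifndef HAVE_HYPOT
static double npy_hypot(double x, double y)
{
    /* naive fallback, for illustration */
    return sqrt(x*x + y*y);
}
#define hypot npy_hypot
#endif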
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Recent umath changes

2008-12-16 Thread Charles R Harris
On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau 
da...@ar.media.kyoto-u.ac.jp wrote:

 Hi,

 There have been some changes recently in the umath code which break
 windows 64 compilation - and I don't understand their rationale either.
 I have myself spent quite a good deal of time making sure this code
 works on many platforms/toolchains, by fixing the config distutils
 command and keeping platform specificities contained in a very
 localized part of the code. It may not be very well documented (see
 below), but may I ask that next time someone wants to change this file,
 people ask for review before putting changes directly in the trunk?

 thanks,

 David


 How to deal with platform oddities:
 ---

 Basically, the code to replace missing C99 math funcs is, for a
 hypothetical double foo(double) function:

 #ifndef HAVE_FOO
 #undef foo
 static double npy_foo(double a)
 {
     // define a npy_foo function with the same requirements as C99 foo
 }

 #define foo npy_foo
 #else
 double foo(double);
 #endif

 I think this code is wrong on several accounts:
  - we should not undef foo if foo is available: if foo is available at
 that point, it is a bug in the configuration, and should not be dealt
 with in the code. Some cases may be complicated (IEEE754-related
 macros, which are sometimes macros, sometimes functions, etc...), but
 those should be handled as very narrow special cases.
  - we should not declare our own functions: function declarations are
 not portable, and vary among OS/toolchains. Some toolchains use
 intrinsics, some use non-standard inline mechanisms, etc..., which can
 crash the resulting binary because there is a discrepancy between our
 code's calling conventions and the library's convention. The reported
 problem with the VS compiler on amd64 is caused by exactly this problem.

 Unless there is a strong rationale otherwise, I would like us to follow
 what autoconfed projects do. They have long experience dealing with
 platform idiosyncrasies, and the above method is not the one they
 follow. They follow the simple:


Yes, the rationale was to fix compilation on windows 64 with msvc and etch
on SPARC, both of which were working after the changes. You are, of course,
free to break these builds again. However, I designated space at the top of
the file for compiler/distro specific defines; I think you should use them,
there is a reason other folks do. The macro undef could be moved, but I
preferred to generate an error if there was a conflict with the standard
c function prototypes.

We can't use inline code for these functions as they are passed to the
generic loops as function pointers. I assume compilers have some way of
recognizing this case and perhaps generating function code on the fly. If
so, we need to figure out how to detect that.
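A sketch of the constraint (illustrative, not numpy's actual loop code):

#include <math.h>

/* the generic ufunc loops receive the operation as a function pointer,
   so the math function must exist as a real, out-of-line symbol */
typedef float (*unary_f)(float);

static float npy_cosf(float x)   /* always has a callable address */
{
    return (float)cos((double)x);
}

static unary_f loop_func = npy_cosf;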

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Recent umath changes

2008-12-16 Thread David Cournapeau
Charles R Harris wrote:


 On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau
 da...@ar.media.kyoto-u.ac.jp wrote:

 Hi,

 There have been some changes recently in the umath code which break
 windows 64 compilation - and I don't understand their rationale
 either. I have myself spent quite a good deal of time making sure this
 code works on many platforms/toolchains, by fixing the config
 distutils command and keeping platform specificities contained in a
 very localized part of the code. It may not be very well documented
 (see below), but may I ask that next time someone wants to change this
 file, people ask for review before putting changes directly in the
 trunk?

 thanks,

 David


 How to deal with platform oddities:
 ---

 Basically, the code to replace missing C99 math funcs is, for a
 hypothetical double foo(double) function:

 #ifndef HAVE_FOO
 #undef foo
 static double npy_foo(double a)
 {
     // define a npy_foo function with the same requirements as C99 foo
 }

 #define foo npy_foo
 #else
 double foo(double);
 #endif

 I think this code is wrong on several accounts:
  - we should not undef foo if foo is available: if foo is available at
 that point, it is a bug in the configuration, and should not be dealt
 with in the code. Some cases may be complicated (IEEE754-related
 macros, which are sometimes macros, sometimes functions, etc...), but
 those should be handled as very narrow special cases.
  - we should not declare our own functions: function declarations are
 not portable, and vary among OS/toolchains. Some toolchains use
 intrinsics, some use non-standard inline mechanisms, etc..., which can
 crash the resulting binary because there is a discrepancy between our
 code's calling conventions and the library's convention. The reported
 problem with the VS compiler on amd64 is caused by exactly this
 problem.

 Unless there is a strong rationale otherwise, I would like us to
 follow what autoconfed projects do. They have long experience dealing
 with platform idiosyncrasies, and the above method is not the one they
 follow. They follow the simple:


 Yes, the rationale was to fix compilation on windows 64 with msvc and
 etch on SPARC, both of which were working after the changes.

It does not work at the moment on windows, at least :) But more
fundamentally, I don't see why you declared those functions: can you
explain what your intention was, because I don't understand the
rationale.

 You are, of course, free to break these builds again. However, I
 designated space at the top of the file for compiler/distro specific
 defines; I think you should use them, there is a reason other folks do.

The problem is twofold:
  - by declaring functions everywhere in the code, you are effectively
spreading toolchain-specific oddities through the whole source file. This
is not good, IMHO: those should be detected at the configuration stage,
and dealt with in the source code using that information. That's how every
autoconf project does it. If a function is actually a macro, this should
be detected at configuration (a sketch of such a probe follows below).
 - declarations are toolchain-specific; it is actually worse, since they
even depend on the compiler flags. That is at least the case with MS
compilers. So there is no way to guarantee that your declaration matches
the math runtime's (the compiler crash reported is caused by exactly
this).
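A sketch of such a configure-time probe (hypothetical, in the spirit of
autoconf's AC_CHECK_FUNCS rather than numpy's exact distutils code):

/* compile and link this tiny program; if it links, define HAVE_HYPOT */
#include <math.h>

int main(void)
{
    /* use the result so the call cannot be optimized away */
    return hypot(3.0, 4.0) > 0.0 ? 0 : 1;
}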

 The macro undef could be moved, but I preferred to generate an error if
 there was a conflict with the standard c function prototypes.

 We can't use inline code for these functions as they are passed to the
 generic loops as function pointers.

Yes, I believe this is another problem with declaring functions: if we
use, say, cosl, and cosl is an inline function in the runtime, then by
re-declaring it we are telling the compiler that it is not inline
anymore. The compiler then no longer knows that the address of cosl
cannot be taken, unless it detects the mismatch between the runtime
declaration and ours and treats it as an error (I am not sure whether
this is always an error with MS compilers; it may only be a warning on
some versions - it is certainly not handled gracefully every time, since
the linker crashes in some cases).

David

 

   

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion