[Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Charles R Harris
Hi All,

I've added ufuncs fmin and fmax that behave as follows:

In [3]: a = array([NAN, 0, NAN, 1])

In [4]: b = array([0, NAN, NAN, 0])

In [5]: fmax(a,b)
Out[5]: array([  0.,   0.,  NaN,   1.])

In [6]: fmin(a,b)
Out[6]: array([  0.,   0.,  NaN,   0.])

In [7]: fmax.reduce(a)
Out[7]: 1.0

In [8]: fmin.reduce(a)
Out[8]: 0.0

In [9]: fmax.reduce([NAN,NAN])
Out[9]: nan

In [10]: fmin.reduce([NAN,NAN])
Out[10]: nan

I also made the sign ufunc return the sign of nan. That works, but I'm not
sure it is the way to go, because there doesn't seem to be any spec for
what sign nan takes. The current np.nan on my machine is negative, and 0/0
and inf/inf both return negative nan, so the actual sign of nan doesn't
seem to mean anything. Currently sign(NAN) returns 0, which doesn't look
right either, so I think the thing to do is return nan, but that would be a
change in numpy behavior. Any thoughts?
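For concreteness, the NaN-propagating sign being proposed could be sketched
in scalar Python like this (a hypothetical stand-in for the ufunc, not
numpy's actual implementation):

```python
import math

def sign(x):
    # Proposed semantics: propagate NaN instead of returning 0,
    # since the sign bit of a NaN carries no meaningful information.
    if math.isnan(x):
        return float('nan')
    return float((x > 0) - (x < 0))
```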

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Stéfan van der Walt
Hi Charles,

2008/10/2 Charles R Harris [EMAIL PROTECTED]:
 In [3]: a = array([NAN, 0, NAN, 1])
 In [4]: b = array([0, NAN, NAN, 0])

 In [5]: fmax(a,b)
 Out[5]: array([  0.,   0.,  NaN,   1.])

 In [6]: fmin(a,b)
 Out[6]: array([  0.,   0.,  NaN,   0.])

These are great, many thanks!

My only gripe is that they have the same NaN-handling as amin and
friends, which I consider to be broken.  Others also mentioned that
this should be changed, and I think David C wrote a patch for it (but
I am not informed as to the speed implications).

If I had to choose, this would be my preferred output:

In [5]: fmax(a,b)
Out[5]: array([  NaN,   NaN,  NaN,   1.])

Cheers
Stéfan


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Robert Kern
On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt [EMAIL PROTECTED] wrote:
 Hi Charles,

 2008/10/2 Charles R Harris [EMAIL PROTECTED]:
 In [3]: a = array([NAN, 0, NAN, 1])
 In [4]: b = array([0, NAN, NAN, 0])

 In [5]: fmax(a,b)
 Out[5]: array([  0.,   0.,  NaN,   1.])

 In [6]: fmin(a,b)
 Out[6]: array([  0.,   0.,  NaN,   0.])

 These are great, many thanks!

 My only gripe is that they have the same NaN-handling as amin and
 friends, which I consider to be broken.

No, these follow the well-defined C99 semantics of the fmin() and
fmax() functions in libm. If exactly one of the arguments is a NaN,
the non-NaN argument is returned. This is *not* the current behavior
of amin() et al., which just do naive comparisons.
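A scalar sketch of those C99 semantics in Python (illustrative only; the
real ufuncs are implemented in C):

```python
import math

def c99_fmax(x, y):
    # C99 fmax: if exactly one argument is NaN, return the non-NaN one;
    # NaN comes back only when both arguments are NaN.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)
```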

 Others also mentioned that
 this should be changed, and I think David C wrote a patch for it (but
 I am not informed as to the speed implications).

 If I had to choose, this would be my preferred output:

 In [5]: fmax(a,b)
 Out[5]: array([  NaN,   NaN,  NaN,   1.])

Chuck proposes letting minimum() and maximum() have that behavior.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


[Numpy-discussion] loadtxt

2008-10-02 Thread Nils Wagner
Hi all,

how can I load ASCII data if the file contains characters
instead of floats?

Traceback (most recent call last):
  File "test_csv.py", line 2, in <module>
    A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
NameError: name 'char' is not defined
  
Nils


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Stéfan van der Walt
2008/10/2 Robert Kern [EMAIL PROTECTED]:
 My only gripe is that they have the same NaN-handling as amin and
 friends, which I consider to be broken.

 No, these follow well-defined C99 semantics of the fmin() and fmax()
 functions in libm. If exactly one of the arguments is a NaN, the
 non-NaN argument is returned. This is *not* the current behavior of
 amin() et al., which just do naive comparisons.

Let me rephrase: I'm not convinced that these C99 semantics provide an
optimal user experience.  It worries me greatly that NaNs pop up in
operations and then disappear again.  It is entirely possible for a
script to run without failure and spew out garbage without the user
ever knowing.
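The propagating behavior being argued for here would look like this in
scalar form (a sketch of the proposal, not existing numpy code):

```python
import math

def propagating_fmax(x, y):
    # NaN-propagating variant: any NaN input poisons the result,
    # so silent garbage cannot slip through a pipeline unnoticed.
    if math.isnan(x) or math.isnan(y):
        return float('nan')
    return max(x, y)
```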

 Others also mentioned that
 this should be changed, and I think David C wrote a patch for it (but
 I am not informed as to the speed implications).

 If I had to choose, this would be my preferred output:

 In [5]: fmax(a,b)
 Out[5]: array([  NaN,   NaN,  NaN,   1.])

 Chuck proposes letting minimum() and maximum() have that behavior.

That would be a good start, which would be complemented by educating
the user via some appropriate mechanism (I still don't know if one
exists; there is no NumPy Paperclip (TM) that states "You have decided
to commit scientific suicide.  Would you like me to cut your
wrists?").  That's meant only half-tongue-in-cheekedly :)

Thanks for your comments,

Cheers
Stéfan


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread David Cournapeau
On Thu, Oct 2, 2008 at 4:37 PM, Stéfan van der Walt [EMAIL PROTECTED] wrote:

 These are great, many thanks!

 My only gripe is that they have the same NaN-handling as amin and
 friends, which I consider to be broken.  Others also mentioned that
 this should be changed, and I think David C wrote a patch for it (but
 I am not informed as to the speed implications).

Hopefully, Chuck and I synchronised a bit on this :) Previously I
thought there were just two behaviors: NaN-ignoring and
NaN-propagating. Robert later mentioned that fmin/fmax have a third,
well-specified behavior in C99. All three are useful, and as such have
been more or less implemented by Chuck or me.

I think having the new C functions by Chuck makes sense as a new
Python API, following C99 fmax/fmin. They could be used for the new
max/min, but that feels a bit strange compared to nanmax/nanmin, so I
would prefer having the *current* numpy.max and numpy.min propagate
NaN, and nanmax/nanmin ignore NaN altogether.

Also note that matlab does not propagate NaN for max/min.

The last question is FPU status flag handling: I thought comparing NaN
directly with < would throw FPE_INVALID. But this is not the case (at
least on Linux with glibc and on Mac OS X). This is confusing, because
I thought the whole point of the C99 macro isgreater was to avoid
throwing this. This is also how I understand both the glibc manual and
the Mac OS X man page for isgreater. Robert, do you have any insight
on this?
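The quiet behavior described here is easy to observe even from Python:
ordered comparisons on a NaN simply return False rather than raising:

```python
nan = float('nan')

# All ordered comparisons involving NaN are False; no exception is
# raised, matching the quiet-NaN behavior seen on Linux/glibc and OS X.
print(nan < 1.0)   # False
print(nan > 1.0)   # False
print(nan == nan)  # False
print(nan != nan)  # True
```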

David


Re: [Numpy-discussion] complex numpy.ndarray dtypes

2008-10-02 Thread Francesc Alted
A Thursday 02 October 2008, John Gu escrigué:
 Hello,

 I am using numpy in conjunction with pyTables.  The data that I read
 in from pyTables seem to have the following dtype:

 p = hdf5.root.myTable.read()

 p.__class__
 <type 'numpy.ndarray'>

 p[0].__class__
 <type 'numpy.void'>

 p.dtype
 dtype([('time', '<f4'), ('obs1', '<f4'), ('obs2', '<f8'), ('obs3', '<f4')])

 p.shape
 (61230,)

 The manner in which I access a particular column is p['time'] or
 p['obs1']. I have a couple of questions regarding this data
 structure: 1) how do I restructure the array into a 61230 x 4 array
 that can be indexed using [r,c] notation?

In your example, the table (record array in NumPy jargon) is
inhomogeneous (all fields are 'f4' except 'obs2', which is 'f8').  In
that case, you can obtain a homogeneous array by doing something like:

In [44]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f8')])

In [45]: b = numpy.array([(val['obs1'], val['obs2']) for val in a], 
dtype='f4')

In [46]: b
Out[46]:
array([[ 1.,  2.],
   [ 3.,  4.]], dtype=float32)

If your table were homogeneous, there would be a simpler way:

In [41]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f4')])

In [42]: d = a.view(('f4',2))

In [43]: d
Out[43]:
array([[ 1.,  2.],
   [ 3.,  4.]], dtype=float32)

which is faster:

In [68]: timeit d = a.view(('f4',2))
10 loops, best of 3: 11.5 µs per loop

In [69]: timeit b=numpy.array([(val['obs1'], val['obs2']) for val in a], 
dtype='f4')
1 loops, best of 3: 39.8 µs per loop

 2) What kind of dtype is 
 pyTables using?  How do I create a similar array that can be indexed
 by a named column?  I tried various ways:

 a = array([[1,2],[3,4]], dtype=dtype([('obs1','f4'),('obs2','f4')]))
 ---------------------------------------------------------------------------
 <type 'exceptions.TypeError'>             Traceback (most recent call last)

 p:\AsiaDesk\johngu\projects\deltaForce\<ipython console> in <module>()

 <type 'exceptions.TypeError'>: expected a readable buffer object

Yeah, the error message is too terse in this case.  The record array
constructor needs to know where your records start and end, and this
is achieved by mapping tuples to records.  So your example must be
rewritten as:

In [70]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f4')])

In [71]: a
Out[71]:
array([(1.0, 2.0), (3.0, 4.0)],
  dtype=[('obs1', 'f4'), ('obs2', 'f4')])

Have a look at:

http://www.scipy.org/RecordArrays

for more info on record arrays.

 I did find some documentation about array type descriptors when
 reading from files... it seems like these array types are specific to
 arrays created when reading from some sort of file / buffer?  Any
 help is appreciated.  Thanks!

I'm not sure what you are asking here.  At any rate, it might be
useful to have a look at the complex dtype examples in:

http://www.scipy.org/Numpy_Example_List#head-f9175c69cccd74b9e4ee92e2a060af27c7447b76

Hope that helps,

-- 
Francesc Alted


Re: [Numpy-discussion] loadtxt

2008-10-02 Thread Francesc Alted
A Thursday 02 October 2008, Nils Wagner escrigué:
 Hi all,

 how can I load ASCII data if the file contains characters
 instead of floats

 Traceback (most recent call last):
   File "test_csv.py", line 2, in <module>
     A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
 NameError: name 'char' is not defined

You would need to specify the length of your strings.  Try with
dtype='SN', where N is the expected length of the strings.
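For example, assuming a small semicolon-delimited file of single
characters (hypothetical data standing in for ca6_sets.csv):

```python
import numpy as np
from io import StringIO

# Stand-in for a file containing characters rather than floats.
data = StringIO("a;b;c\nd;e;f\n")

# 'S8' means fixed-width byte strings of up to 8 characters.
A = np.loadtxt(data, dtype='S8', delimiter=';')
print(A.shape)
```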

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Portable functions for nans, signbit, etc.

2008-10-02 Thread David Cournapeau
On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris
[EMAIL PROTECTED] wrote:


 Which is rather clever. I think binary_cast will require some pointer abuse.

Yep (the funny thing is that the bit twiddling will likely end up more
readable than this C++ stuff)

cheers,

David


Re: [Numpy-discussion] loadtxt

2008-10-02 Thread Stéfan van der Walt
2008/10/2 Francesc Alted [EMAIL PROTECTED]:
 how can I load ASCII data if the file contains characters
 instead of floats

 You would need to specify the length of your strings.  Try with
 dtype='SN', where N is the expected length of the strings.

Other options include:

- using converters to convert the character to a value:

  np.loadtxt('/tmp/bleh.dat', converters={2: lambda x: 0})

- Skipping the specified column:

  np.loadtxt('/tmp/bleh.dat', usecols=(0,1))

Cheers
Stéfan


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Pete Forman
Stéfan van der Walt [EMAIL PROTECTED] writes:

  Let me rephrase: I'm not convinced that these C99 semantics provide
  an optimal user experience.  It worries me greatly that NaN's pop
  up in operations and then disappear again.  It is entirely possible
  for a script to run without failure and spew out garbage without
  the user ever knowing.

By default NaNs are propagated through operations on them.  At the end
of this discussion we ought to end up with a list of functions such as
fmax, isnan, and copysign that are the exceptions.

I think that it is right to defer to IEEE for their decisions on the
behavior of NaNs, etc.  That is what C and Fortran are doing.  I have
not checked but I would guess that CPUs and FPUs behave that way too.
So it should be easier and faster to follow IEEE.

Note that in the just-released Python 2.6, floating point support of
IEEE 754 has been beefed up.
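Among the 2.6 additions are math.isnan, math.isinf, and math.copysign,
which expose IEEE-754 classification and the sign bit directly:

```python
import math

# New in Python 2.6: direct access to IEEE-754 classification and sign.
print(math.isnan(float('nan')))   # True
print(math.isinf(float('inf')))   # True
print(math.copysign(1.0, -0.0))   # -1.0 (negative zero carries a sign bit)
```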
-- 
Pete Forman-./\.-  Disclaimer: This post is originated
WesternGeco  -./\.-   by myself and does not represent
[EMAIL PROTECTED]-./\.-   the opinion of Schlumberger or
http://petef.22web.net   -./\.-   WesternGeco.



Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Charles R Harris
On Thu, Oct 2, 2008 at 1:42 AM, Robert Kern [EMAIL PROTECTED] wrote:

 On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt [EMAIL PROTECTED]
 wrote:
  Hi Charles,
 
  2008/10/2 Charles R Harris [EMAIL PROTECTED]:
  In [3]: a = array([NAN, 0, NAN, 1])
  In [4]: b = array([0, NAN, NAN, 0])
 
  In [5]: fmax(a,b)
  Out[5]: array([  0.,   0.,  NaN,   1.])
 
  In [6]: fmin(a,b)
  Out[6]: array([  0.,   0.,  NaN,   0.])
 
  These are great, many thanks!
 
  My only gripe is that they have the same NaN-handling as amin and
  friends, which I consider to be broken.

 No, these follow well-defined C99 semantics of the fmin() and fmax()
 functions in libm. If exactly one of the arguments is a NaN, the
 non-NaN argument is returned. This is *not* the current behavior of
 amin() et al., which just do naive comparisons.

  Others also mentioned that
  this should be changed, and I think David C wrote a patch for it (but
  I am not informed as to the speed implications).
 
  If I had to choose, this would be my preferred output:
 
  In [5]: fmax(a,b)
  Out[5]: array([  NaN,   NaN,  NaN,   1.])

 Chuck proposes letting minimum() and maximum() have that behavior.


Yes. If there is any agreement on this I would like to go ahead and do it.
It does change the current behavior of maximum and minimum.

Chuck


Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread David Cournapeau
Charles R Harris wrote:

 Yes. If there is any agreement on this I would like to go ahead and do
 it. It does change the current behavior of maximum and minimum.

If you do it, please do it with as many tests as possible (it should not
be difficult to have a comprehensive test with *all* float data types),
because this is likely to cause problems on some platforms.

thanks,

David


Re: [Numpy-discussion] Portable functions for nans, signbit, etc.

2008-10-02 Thread Charles R Harris
On Thu, Oct 2, 2008 at 2:41 AM, David Cournapeau [EMAIL PROTECTED] wrote:

 On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris
 [EMAIL PROTECTED] wrote:

 
  Which is rather clever. I think binary_cast will require some pointer
 abuse.

 Yep (the funny thing is that the bit twiddling will likely end up more
 readable than this C++ stuff)


The zip file has the bit twiddling, which is worth looking at if only
for the note on the PPC extended precision. Motorola seems to be a
problem, but I don't think we support any of the 66xx series.

Chuck


Re: [Numpy-discussion] Help to process a large data file

2008-10-02 Thread David Huard
Frank,

How about that:

x = np.loadtxt('file')

z = x.sum(1)   # Reduce data to an array of 0, 1, 2

rz = z[z > 0]  # Remove all 0s since you don't want to count those.

loc = np.where(rz == 2)[0]  # The location of the (1,1)s

count = np.diff(loc) - 1  # The spacing between those (1,1)s, i.e., the
                          # number of elements that have one 1.
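Applied to the sample from Frank's message, this recipe gives the
expected counts (data inlined here instead of read via loadtxt):

```python
import numpy as np

# Frank's sample data, inlined as a (rows, 2) array.
x = np.array([[1,0],[0,0],[1,1],[0,0],[0,1],[0,1],[0,0],
              [0,1],[1,1],[0,0],[0,1],[0,1],[1,1]])

z = x.sum(1)                # 0, 1 or 2 per row
rz = z[z > 0]               # drop the all-zero rows
loc = np.where(rz == 2)[0]  # positions of the (1,1) rows
count = np.diff(loc) - 1    # ones between consecutive (1,1)s
print(count)                # [3 2]
```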


HTH,

David


On Wed, Oct 1, 2008 at 9:27 PM, frank wang [EMAIL PROTECTED] wrote:

  Hi,

 I have a large data file which contains 2 columns of data. The two columns
 only have zero and one. Now I want to cound how many one in between if both
 columns are one. For example, if my data is:

 1 0
 0 0
 1 1
 0 0
 0 1   x
 0 1   x
 0 0
 0 1   x
 1 1
 0 0
 0 1   x
 0 1   x
 1 1

 Then my count will be 3 and 2 (the numbers with x).

 Are there an efficient way to do this? My data file is pretty big.

 Thanks

 Frank




Re: [Numpy-discussion] Help to process a large data file

2008-10-02 Thread orionbelt2
Frank,

I would imagine that you cannot get a much better performance in python 
than this, which avoids string conversions:

c = []
count = 0
for line in open('foo'):
if line == '1 1\n':
c.append(count)
count = 0
else:
if '1' in line: count += 1

One could do some numpy trick like:

a = np.loadtxt('foo',dtype=int)
a = np.sum(a,axis=1)# Add the two columns horizontally
b = np.where(a==2)[0]   # Find with sum == 2 (1 + 1)
count = []
for i,j in zip(b[:-1],b[1:]):
count.append( a[i+1:j].sum() )  # Calculate number of lines with 1

but on my machine the numpy version takes about 20 sec for a 'foo' file 
of 2,500,000 lines versus 1.2 sec for the pure python version...

As a side note, if I replace line == '1 1\n' with line.startswith('1 1'),
the pure python version goes up to 1.8 sec... Isn't this a bit
weird? I'd think startswith() should be faster...

Chris

On Wed, Oct 01, 2008 at 07:27:27PM -0600, frank wang wrote:

Hi,
 
I have a large data file which contains 2 columns of data. The two 
columns only have zero and one. Now I want to cound how many one in 
between if both columns are one. For example, if my data is:
 
1 0
0 0
1 1
0 0
0 1x
0 1x
0 0
0 1x
1 1
0 0
0 1x
0 1x
1 1
 
Then my count will be 3 and 2 (the numbers with x).
 
Are there an efficient way to do this? My data file is pretty big.
 
Thanks
 
Frank


Re: [Numpy-discussion] Help to process a large data file

2008-10-02 Thread frank wang

Thanks David and Chris for providing the nice solutions.
 
Both methods work great. I could not tell the speed difference between the two
solutions. My data size is 1048577 lines.
 
I did not try the second solution from Chris since it is too slow, as Chris
stated.
 
Frank
 


Re: [Numpy-discussion] Texas Python Regional Unconference Reminders

2008-10-02 Thread Travis Vaught
Hey Steve,

I'll bring my camera and try to recruit a volunteer.  No guarantees,  
but we should at least be able to record things (any volunteers to  
transcode a pile of scipy videos? ;-) ).

Best,

Travis


On Oct 1, 2008, at 7:56 PM, Steve Lianoglou wrote:

 Hi,

 Are there any plans to tape the presentations? Unfortunately some of
 us can't make it down to Texas, but the talks look quite interesting.

 Thanks,
 -steve

 On Oct 1, 2008, at 10:36 AM, Travis Vaught wrote:

 Greetings,

 The Texas Python Regional Unconference is coming up this weekend
 (October 4-5) and I wanted to send out some more details of the
 meeting.  The web page for the meeting is here:

 http://www.scipy.org/TXUncon2008

 The meeting is _absolutely free_, so please add yourself to the
 Attendees page if you're able to make it.  Also, if you're planning  
 to
 attend, please send me the following information (to [EMAIL PROTECTED]
 ) so I can request wireless access for you during the meeting:

 - Full Name
 - Phone or email
 - Address
 - Affiliation

 There are still opportunities to present your pet projects at the
 meeting, so feel free to sign up on the presentation schedule here:

 http://www.scipy.org/TXUncon2008Schedule

 For those who are in town Friday evening, we're planning to get
 together for a casual dinner in downtown Austin that night.  We'll
 meet at Enthought offices 
 (http://www.enthought.com/contact/map-directions.php
 ) and walk to a casual restaurant nearby.  Show up as early as 5:30pm
 and you can hang out and tour the Enthought offices--we'll head out  
 to
 eat at 7:00pm sharp.

 Best,

 Travis






Re: [Numpy-discussion] nan, sign, and all that

2008-10-02 Thread Robert Kern
On Thu, Oct 2, 2008 at 08:22, Charles R Harris
[EMAIL PROTECTED] wrote:

 On Thu, Oct 2, 2008 at 1:42 AM, Robert Kern [EMAIL PROTECTED] wrote:

 On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt [EMAIL PROTECTED]
 wrote:
  Hi Charles,
 
  2008/10/2 Charles R Harris [EMAIL PROTECTED]:
  In [3]: a = array([NAN, 0, NAN, 1])
  In [4]: b = array([0, NAN, NAN, 0])
 
  In [5]: fmax(a,b)
  Out[5]: array([  0.,   0.,  NaN,   1.])
 
  In [6]: fmin(a,b)
  Out[6]: array([  0.,   0.,  NaN,   0.])
 
  These are great, many thanks!
 
  My only gripe is that they have the same NaN-handling as amin and
  friends, which I consider to be broken.

 No, these follow well-defined C99 semantics of the fmin() and fmax()
 functions in libm. If exactly one of the arguments is a NaN, the
 non-NaN argument is returned. This is *not* the current behavior of
 amin() et al., which just do naive comparisons.

  Others also mentioned that
  this should be changed, and I think David C wrote a patch for it (but
  I am not informed as to the speed implications).
 
  If I had to choose, this would be my preferred output:
 
  In [5]: fmax(a,b)
  Out[5]: array([  NaN,   NaN,  NaN,   1.])

 Chuck proposes letting minimum() and maximum() have that behavior.

 Yes. If there is any agreement on this I would like to go ahead and do it.
 It does change the current behavior of maximum and minimum.

I think the position we've held is that, in the presence of NaNs, the
behavior of these functions has been left unspecified, so I think it
is okay to change them.

-- 
Robert Kern



Re: [Numpy-discussion] Proposal: scipy.spatial

2008-10-02 Thread David Bolme
I also like the idea of a scipy.spatial library.  For the research I  
do in machine learning and computer vision we are often interested in  
specifying different distance measures.  It would be nice to have a  
way to specify the distance measure.  I would like to see a standard  
set included: City Block, Euclidean, Correlation, etc as well as a  
capability for a user defined distance or similarity function.




Re: [Numpy-discussion] Proposal: scipy.spatial

2008-10-02 Thread Matthieu Brucher
2008/10/2 David Bolme [EMAIL PROTECTED]:
 I also like the idea of a scipy.spatial library.  For the research I
 do in machine learning and computer vision we are often interested in
 specifying different distance measures.  It would be nice to have a
 way to specify the distance measure.  I would like to see a standard
 set included: City Block, Euclidean, Correlation, etc as well as a
 capability for a user defined distance or similarity function.

Do you mean similarity or dissimilarity?  Distance is a dissimilarity,
but correlation is a similarity measure.

Matthieu
-- 
French PhD student
Information System Engineer
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


[Numpy-discussion] f2py IS NOW WORKING

2008-10-02 Thread Blubaugh, David A.
To all,


I have now been able to produce a stable file via f2py!  However, I had
to do the following:

1.) First, I had to copy all required library files from my Compaq
Visual Fortran compiler into Python's scripts directory, along with
f2py itself.


2.) I also had to place a DLL from my compiler in Python's DLL
directory.


I needed to take these steps because I do not know the correct
environment variables for Windows XP running Compaq Visual Fortran 6.6.


Once again, I would appreciate knowing which environment variables I
should set on Windows XP, given that the compiler I must use is Compaq
Visual Fortran 6.6.


Thanks,

David Blubaugh




Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--

2008-10-02 Thread Chris Barker
Jarrod Millman wrote:
 The 1.2.0rc2 is now available:
 http://svn.scipy.org/svn/numpy/tags/1.2.0rc2

what's the status of this?

 Here are the Window's binaries:
 http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe

this appears to be a dead link.

thanks,
-Chris



Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--

2008-10-02 Thread Robert Kern
On Thu, Oct 2, 2008 at 16:45, Chris Barker [EMAIL PROTECTED] wrote:
 Jarrod Millman wrote:
 The 1.2.0rc2 is now available:
 http://svn.scipy.org/svn/numpy/tags/1.2.0rc2

 what's the status of this?

Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".

 Here are the Window's binaries:
 http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe

 this appears to be a dead link.

Superseded by
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

-- 
Robert Kern



Re: [Numpy-discussion] Proposal: scipy.spatial

2008-10-02 Thread David Bolme
It may be useful to have an interface that handles both cases:
similarity and dissimilarity.  Often I have seen nearest-neighbor
algorithms that look for maximum similarity instead of minimum
distance.  In my field (biometrics) we often deal with very
specialized distance or similarity measures.  I would like to see
support for user-defined distance and similarity functions.  It should
be easy to implement by passing a function object to the KNN class.  I
am not sure if kd-trees or other fast algorithms are compatible with
similarities or non-Euclidean norms, but I would be willing to
implement an exhaustive-search KNN that would support user-defined
functions.
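An exhaustive-search nearest-neighbor with a pluggable measure is only a
few lines; this sketch (a hypothetical interface, not scipy.spatial code)
accepts any callable dist(a, b):

```python
import numpy as np

def knn(query, data, k, dist):
    # Brute-force k-nearest-neighbor: score every point with the
    # user-supplied distance function, then keep the k smallest.
    scores = np.array([dist(query, p) for p in data])
    idx = np.argsort(scores)[:k]
    return idx, scores[idx]

def euclidean(a, b):
    # Plain Euclidean distance, one of the standard measures mentioned.
    return np.sqrt(((np.asarray(a) - np.asarray(b)) ** 2).sum())
```

For a similarity measure, one could pass lambda a, b: -sim(a, b) so that
larger similarity sorts first.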

On Oct 2, 2008, at 2:01 PM, Matthieu Brucher wrote:

 2008/10/2 David Bolme [EMAIL PROTECTED]:
 I also like the idea of a scipy.spatial library.  For the research I
 do in machine learning and computer vision we are often interested in
 specifying different distance measures.  It would be nice to have a
 way to specify the distance measure.  I would like to see a standard
 set included: City Block, Euclidean, Correlation, etc as well as a
 capability for a user defined distance or similarity function.

 You mean similarity or dissimilarity ? Distance is a dissimilarity but
 correlation is a similarity measure.

 Matthieu
 -- 
 French PhD student
 Information System Engineer
 Website: http://matthieu-brucher.developpez.com/
 Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
 LinkedIn: http://www.linkedin.com/in/matthieubrucher



Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--

2008-10-02 Thread Chris Barker
Robert Kern wrote:
 Superseded by the 1.2.0 release. See the "ANN: NumPy 1.2.0" thread.

I thought I'd seen that, but when I went to:

http://www.scipy.org/Download

And I still got 1.1

 Superseded by
 http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

thanks,

-Chris





Re: [Numpy-discussion] Proposal: scipy.spatial

2008-10-02 Thread Anne Archibald
2008/10/2 David Bolme [EMAIL PROTECTED]:
 It may be useful to have an interface that handles both cases:
 similarity and dissimilarity.  Often I have seen Nearest Neighbor
 algorithms that look for maximum similarity instead of minimum
 distance.  In my field (biometrics) we often deal with very
 specialized distance or similarity measures.  I would like to see
 support for user defined distance and similarity functions.  It should
 be easy to implement by passing a function object to the KNN class.  I
 am not sure if kd-trees or other fast algorithms are compatible with
 similarities or non-Euclidean norms; however, I would be willing to
 implement an exhaustive search KNN that would support user defined
 functions.

kd-trees can only work for distance measures which have certain
special properties (in particular, you have to be able to bound them
based on coordinate differences). This is just fine for all the
Minkowski p-norms (so in particular, Euclidean distance,
maximum-coordinate-difference, and Manhattan distance) and in fact the
current implementation already supports all of these. I don't think
that correlation can be made into such a distance measure - the
neighborhoods are the wrong shape. In fact the basic space is
projective n-1 space rather than affine n-space, so I think you're
going to need some very different algorithm. If you make a metric
space out of it - define d(A,B) to be the angle between A and B - then
cover trees can serve as a spatial data structure for nearest-neighbor
search. Cover trees may be worth implementing, as they're a very
generic data structure, suitable for (among other things)
low-dimensional data in high-dimensional spaces.
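To make the angle-metric idea concrete, here is a small sketch (an illustration, not existing scipy code) of d(A,B) defined as the angle between A and B; the clipping guards against floating-point rounding pushing the cosine outside [-1, 1]:

```python
import numpy as np

def angular_distance(a, b):
    """Angle between vectors a and b: a genuine metric, unlike 1 - correlation.

    For a correlation-style measure, center the vectors (subtract their
    means) before calling this.
    """
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards rounding error
```

Because this satisfies the triangle inequality, it is exactly the kind of measure a cover tree can index, while a kd-tree has no way to bound it by per-coordinate differences.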

Anne


Re: [Numpy-discussion] numpy.random.hypergeometric - strange results

2008-10-02 Thread joep
see http://scipy.org/scipy/numpy/ticket/921

I think I found the error

http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c

{{{
805 /* this is a correction to HRUA* by Ivan Frohne in rv.py */
806 if (good > bad) Z = m - Z;
}}}

Quickly looking at the referenced program,
downloaded from: http://pal.ece.iisc.ernet.in/~dhani/frohne/rv.py

Notation: alpha = bad, beta = good:

{{{
   if alpha > beta:   # Error in HRUA*, this is correct.
  z = m - z
}}}

As you can see, if my interpretation is correct, then line 806 should
have good and bad reversed, i.e.

{{{
806 if (bad > good) Z = m - Z;
}}}

Can you verify this? I never tried to build numpy from source.
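Even without building from source, the bug is easy to demonstrate from Python, because every hypergeometric draw must lie in the support [max(0, nsample - bad), min(good, nsample)]. A quick sanity check along these lines (it passes on a build with the corrected line, and fails on 1.2.0rc2, where draws of 16 or negative values show up):

```python
import numpy as np

# With good=3, bad=17, nsample=12 every draw must lie in
# [max(0, 12 - 17), min(3, 12)] = [0, 3].
good, bad, nsample = 3, 17, 12
draws = np.random.hypergeometric(good, bad, nsample, size=10000)
lo = max(0, nsample - bad)
hi = min(good, nsample)
mean_expected = nsample * good / float(good + bad)  # = 1.8
```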

Josef

On Sep 25, 4:18 pm, joep [EMAIL PROTECTED] wrote:
 In my fuzz testing of scipy stats, I get sometimes a test failure. I
 think there is something
 wrong with numpy.random.hypergeometric for some cases:

 Josef

  import numpy.random as mtrand
  mtrand.hypergeometric(3,17,12,size=10)   # there are only 3 good balls in urn

 array([16, 17, 16, 16, 15, 16, 17, 16, 17, 16]) 
 mtrand.hypergeometric(17,3,12,size=10)   # negative result

 array([-3, -4, -3, -4, -3, -3, -4, -4, -5, -4])

  mtrand.hypergeometric(4,3,12,size=10)
  np.version.version

 '1.2.0rc2'

 I did not find any clear pattern when trying out different parameter
 values:

  mtrand.hypergeometric(10,10,12,size=10)

 array([5, 6, 4, 4, 8, 5, 4, 6, 7, 4]) 
 mtrand.hypergeometric(10,10,20,size=10)

 array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10]) 
 mtrand.hypergeometric(10,10,19,size=10)

 array([10,  9,  9,  9,  9,  9, 10,  9,  9,  9]) 
 mtrand.hypergeometric(10,10,5,size=10)

 array([3, 5, 2, 2, 1, 2, 2, 4, 3, 1])
 mtrand.hypergeometric(10,2,5,size=10)

 array([4, 5, 4, 5, 5, 5, 4, 3, 4, 4])
 mtrand.hypergeometric(2,10,5,size=10)

 array([0, 2, 1, 0, 2, 2, 1, 1, 1, 1])

  mtrand.hypergeometric(17,3,12,size=10)

 array([-5, -3, -4, -4, -4, -3, -4, -4, -3, -3]) 
 mtrand.hypergeometric(3,17,12,size=10)

 array([15, 16, 17, 16, 15, 16, 15, 15, 17, 17]) 
 mtrand.hypergeometric(18,3,12,size=10)

 array([-5, -6, -6, -4, -4, -4, -5, -3, -5, -5])

  mtrand.hypergeometric(18,3,5,size=10)

 array([4, 5, 5, 5, 5, 5, 4, 5, 4, 3]) 
 mtrand.hypergeometric(18,3,19,size=10)

 array([1, 1, 2, 1, 1, 1, 1, 3, 1, 1])


Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--

2008-10-02 Thread Jarrod Millman
On Thu, Oct 2, 2008 at 4:29 PM, Chris Barker [EMAIL PROTECTED] wrote:
 Robert Kern wrote:
 Superseded by the 1.2.0 release. See the "ANN: NumPy 1.2.0" thread.

 I thought I'd seen that, but when I went to:

 http://www.scipy.org/Download

 And I still got 1.1

I updated the page to point to the sourceforge page.  Thanks for catching that.

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/


Re: [Numpy-discussion] numpy.random.logseries - incorrect convergence for k=1, k=2

2008-10-02 Thread joep
Filed as http://scipy.org/scipy/numpy/ticket/923

and I think I finally tracked down the source of the incorrect random
numbers, a reversed inequality in
http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c
line 871, see my last comment to the trac ticket.
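For reference, the check reduces to comparing sampled frequencies against the log-series pmf, P(k) = -pr**k / (k * log(1 - pr)); with pr = 0.8 the mass at k = 1 is about 0.4971, which a build with the corrected inequality reproduces (the buggy build gives about 0.398):

```python
import numpy as np

pr = 0.8
k = np.arange(1, 200)
pmf = -pr**k / (k * np.log(1 - pr))  # log-series probability mass function

rng = np.random.RandomState(0)
draws = rng.logseries(pr, size=200000)
freq1 = np.mean(draws == 1)  # should approach pmf[0] ~ 0.4971
```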

Josef


On Sep 27, 2:12 pm, joep [EMAIL PROTECTED] wrote:
 random numbers generated by numpy.random.logseries do not converge to
 theoretical distribution:

 for probability parameter pr = 0.8, the random number generator
 converges to a frequency for k=1 at 39.8 %, while the theoretical
 probability mass is 49.71 %
 k=2 is oversampled, other k's look ok

 check frequency of k=1 and k=2 at N = 1000000
 0.398406 0.296465
 pmf at k = 1 and k=2 with formula
 [ 0.4971  0.1988]

 for probability parameter pr = 0.3, the results are not as bad, but
 still off: frequency for k=1 at 82.6 %, while the theoretical
 probability mass is 84.11 %

 check frequency of k=1 and k=2 at N = 1000000
 0.826006 0.141244
 pmf at k = 1 and k=2 with formula
 [ 0.8411  0.1262]

 below is a quick script for checking this

 Josef

 {{{
 import numpy as np
 from scipy import stats

 pr = 0.8
 np.set_printoptions(precision=2, suppress=True)

 # calculation for N=1million takes some time
 for N in [1000, 10000, 100000, 1000000]:
     rvsn=np.random.logseries(pr,size=N)
     fr=stats.itemfreq(rvsn)
     pmfs=stats.logser.pmf(fr[:,0],pr)*100
     print 'log series sample frequency and pmf (in %) with N = ', N
     print np.column_stack((fr[:,0],fr[:,1]*100.0/N,pmfs))

 np.set_printoptions(precision=4, suppress=True)

 print 'check frequency of k=1 and k=2 at N = ', N
 print np.sum(rvsn==1)/float(N),
 print np.sum(rvsn==2)/float(N)

 k = np.array([1,2])
 print 'pmf at k = 1 and k=2 with formula'
 print -pr**k * 1.0 / k / np.log(1-pr)
 }}}
