Re: [Numpy-discussion] newbie: convert recarray to floating-point ndarray with mixed types

2010-05-12 Thread Eric Firing
On 05/12/2010 12:37 PM, Gregory, Matthew wrote:
> Apologies for what is likely a simple question and I hope it hasn't been 
> asked before ...
>
> Given a recarray with a dtype consisting of more than one type, e.g.
>
>>>>  import numpy as n
>>>>  a = n.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
>>>>  b = a.view(n.recarray)
>>>>  b
>rec.array([(1.0, 2), (3.0, 4)],
>  dtype=[('x', '<f8'), ('y', '<i8')])
> Is there a simple way to convert 'b' to a floating-point ndarray, casting the 
> integer field to a floating-point?  I've tried the naïve:
>
>>>>  c = b.view(dtype='float').reshape(b.size,-1)
>
> but that fails with:
>
>ValueError: new type not compatible with array.
>
> I understand why this would fail (as it is a view and not a copy), but I'm 
> lost on a method to do this conversion simply.
>

It may not be as simple as you would like, but the following works 
efficiently:

import numpy as np
a = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
b = np.empty((a.shape[0], 2), dtype=np.float)
b[:,0] = a['x']
b[:,1] = a['y']
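
If the recarray has many fields, the same idea generalizes by iterating over
the field names; a plain-numpy sketch (assumes every field is castable to
float):

```python
import numpy as np

a = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])

# Stack one float-cast column per field; works for any number of fields.
b = np.column_stack([a[name].astype(float) for name in a.dtype.names])
```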

Eric



> thanks, matt
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug with how numpy.distutils.system_info handles the site.cfg

2010-05-12 Thread Chris Colbert
On Wed, May 12, 2010 at 11:06 PM, Chris Colbert  wrote:

> I had this problem back in 2009 when building Enthought Enable, and was
> happy with a work around. It just bit me again, and I finally got around to
> drilling down to the problem.
>
> On linux, if one uses the numpy/site.cfg [default] section when building
> from source to specify local library directories, the x11 libs won't be
> found by NumPy.
>
> The relevant section of the site.cfg.example reads as follows:
>
> # Defaults
> # ========
> # The settings given here will apply to all other sections if not
> overridden.
> # This is a good place to add general library and include directories like
> # /usr/local/{lib,include}
> #
> #[DEFAULT]
> #library_dirs = /usr/local/lib
> #include_dirs = /usr/local/include
>
> Now, I build NumPy with Atlas and my Atlas libs are installed in
> /usr/local, so my [default] section of site.cfg looks like this (as
> suggested by the site.cfg.example):
>
> # Defaults
> # ========
> # The settings given here will apply to all other sections if not
> overridden.
> # This is a good place to add general library and include directories like
> # /usr/local/{lib,include}
> #
> [DEFAULT]
> library_dirs = /usr/local/lib:/usr/local/lib/atlas
> include_dirs = /usr/local/include
>
>
> NumPy builds and works fine with this. The problem occurs when other
> libraries use numpy.distutils.system_info.get_info('x11') (ala Enthought
> Enable). That function eventually calls
> numpy.distutils.system_info.system_info.parse_config_files which has the
> following definition:
>
> def parse_config_files(self):
>     self.cp.read(self.files)
>     if not self.cp.has_section(self.section):
>         if self.section is not None:
>             self.cp.add_section(self.section)
>
> When self.cp is instantiated (when looking for the x11 libs), it is
> provided the following defaults:
>
> {'libraries': '', 'src_dirs': '.:/usr/local/src', 'search_static_first':
> '0', 'library_dirs':
> '/usr/X11R6/lib64:/usr/X11R6/lib:/usr/X11/lib64:/usr/X11/lib:/usr/lib64:/usr/lib',
> 'include_dirs': '/usr/X11R6/include:/usr/X11/include:/usr/include'}
>
> As is clearly seen, the 'library_dirs' contains the proper paths to find
> the x11 libs. But since the config file has [default] section, these paths
> get trampled and replaced with whatever is contained in the site.cfg
> [default] section. In my case, this is /usr/local/lib:/usr/local/lib/atlas.
> Thus, my x11 libs aren't found and the Enable build fails.
>
> The workaround is to include an [x11] section in site.cfg with the
> appropriate paths, but I don't really feel this should be necessary. Would
> the better behavior be to look for a [default] section in the config file in
> the parse_config_files method and add those paths to the already specified
> defaults?
>
>
Then again, another workaround could be to add the atlas directory paths to
the [blas_opt] and [lapack_opt] sections. This would work for my case, but
it doesn't solve the larger problem of any directories put in [default]
trouncing any of the other standard dirs that would otherwise be used.


> Changing the site.cfg [default] section to read as follows:
>
> [DEFAULT]
> library_dirs = /usr/lib:/usr/local/lib:/usr/local/lib/atlas
> include_dirs = /usr/include:/usr/local/include
>
> is not an option because then NumPy will find and use the system atlas,
> which in my case is not threaded nor optimized for my machine.
>
> If you want me to patch the parse_config_files method, just let me know.
>
> Cheers,
>
> Chris
>
>
>
>
>


[Numpy-discussion] Bug with how numpy.distutils.system_info handles the site.cfg

2010-05-12 Thread Chris Colbert
I had this problem back in 2009 when building Enthought Enable, and was
happy with a work around. It just bit me again, and I finally got around to
drilling down to the problem.

On linux, if one uses the numpy/site.cfg [default] section when building
from source to specify local library directories, the x11 libs won't be
found by NumPy.

The relevant section of the site.cfg.example reads as follows:

# Defaults
# ========
# The settings given here will apply to all other sections if not
overridden.
# This is a good place to add general library and include directories like
# /usr/local/{lib,include}
#
#[DEFAULT]
#library_dirs = /usr/local/lib
#include_dirs = /usr/local/include

Now, I build NumPy with Atlas and my Atlas libs are installed in /usr/local,
so my [default] section of site.cfg looks like this (as suggested by the
site.cfg.example):

# Defaults
# ========
# The settings given here will apply to all other sections if not
overridden.
# This is a good place to add general library and include directories like
# /usr/local/{lib,include}
#
[DEFAULT]
library_dirs = /usr/local/lib:/usr/local/lib/atlas
include_dirs = /usr/local/include


NumPy builds and works fine with this. The problem occurs when other
libraries use numpy.distutils.system_info.get_info('x11') (ala Enthought
Enable). That function eventually calls
numpy.distutils.system_info.system_info.parse_config_files which has the
following definition:

def parse_config_files(self):
    self.cp.read(self.files)
    if not self.cp.has_section(self.section):
        if self.section is not None:
            self.cp.add_section(self.section)

When self.cp is instantiated (when looking for the x11 libs), it is provided
the following defaults:

{'libraries': '', 'src_dirs': '.:/usr/local/src', 'search_static_first':
'0', 'library_dirs':
'/usr/X11R6/lib64:/usr/X11R6/lib:/usr/X11/lib64:/usr/X11/lib:/usr/lib64:/usr/lib',
'include_dirs': '/usr/X11R6/include:/usr/X11/include:/usr/include'}

As is clearly seen, the 'library_dirs' contains the proper paths to find the
x11 libs. But since the config file has [default] section, these paths get
trampled and replaced with whatever is contained in the site.cfg [default]
section. In my case, this is /usr/local/lib:/usr/local/lib/atlas. Thus, my
x11 libs aren't found and the Enable build fails.

The workaround is to include an [x11] section in site.cfg with the
appropriate paths, but I don't really feel this should be necessary. Would
the better behavior be to look for a [default] section in the config file in
the parse_config_files method and add those paths to the already specified
defaults?
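
A hypothetical sketch of that suggested merge behavior (written against
Python 3's configparser for illustration; the real numpy.distutils code is
organized differently):

```python
from configparser import ConfigParser

SITE_CFG = """\
[DEFAULT]
library_dirs = /usr/local/lib:/usr/local/lib/atlas
"""

# Built-in x11 search paths (abbreviated from the defaults shown above).
X11_LIBRARY_DIRS = ['/usr/X11R6/lib64', '/usr/X11R6/lib',
                    '/usr/lib64', '/usr/lib']

def merged_library_dirs(cfg_text, builtin_dirs):
    cp = ConfigParser()
    cp.read_string(cfg_text)
    user = [d for d in cp.defaults().get('library_dirs', '').split(':') if d]
    # User-specified paths take precedence, but the built-in defaults
    # are appended rather than trampled.
    return user + [d for d in builtin_dirs if d not in user]

merged = merged_library_dirs(SITE_CFG, X11_LIBRARY_DIRS)
```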

Changing the site.cfg [default] section to read as follows:

[DEFAULT]
library_dirs = /usr/lib:/usr/local/lib:/usr/local/lib/atlas
include_dirs = /usr/include:/usr/local/include

is not an option because then NumPy will find and use the system atlas,
which in my case is not threaded nor optimized for my machine.

If you want me to patch the parse_config_files method, just let me know.

Cheers,

Chris


Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread Robert Kern
On Wed, May 12, 2010 at 21:37,   wrote:
> On Wed, May 12, 2010 at 9:27 PM, Robert Kern  wrote:
>> On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman
>>  wrote:
>>>
>>> Warren Weckesser-3 wrote:

 A couple questions:

 How many floats will you be storing?

 When you test for membership, will you want to allow for a numerical
 tolerance, so that if the value 1 - 0.7 is added to the set, a test for
 the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
 1-0.7 is 0.30000000000000004)

 Warren

>>>
>>> Anne- Thanks for that absolutely beautiful explanation!!
>>>
>>> Warren- I had not initially thought about numerical tolerance, but this
>>> could potentially be an issue, in which case the management of the data
>>> would have to be completely different.  Thanks for pointing this out!  I
>>> might have as many as 50,000 values.
>>
>> You may want to explain your higher-level problem. Maintaining sets of
>> floating point numbers is almost never the right approach. With sets,
>> comparison must necessarily be by exact equality because fuzzy
>> equality is not transitive.
>
> with consistent scaling, shouldn't something like rounding to a fixed
> precision be enough?

Then you might as well convert to integers and do integer sets. The
problem is that two floats very close to a border (and hence each
other) would end up in rounding to different bins. They will compare
unequal to each other and equal to numbers farther away but in the
same arbitrary bin.

Again, it depends on the higher-level problem.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread Benjamin Root
On Wed, May 12, 2010 at 8:37 PM,  wrote:

> On Wed, May 12, 2010 at 9:27 PM, Robert Kern 
> wrote:
> > On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman
> >  wrote:
> >>
> >> Warren Weckesser-3 wrote:
> >>>
> >>> A couple questions:
> >>>
> >>> How many floats will you be storing?
> >>>
> >>> When you test for membership, will you want to allow for a numerical
> >>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
> >>> the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
> >>> 1-0.7 is 0.30000000000000004)
> >>>
> >>> Warren
> >>>
> >>
> >> Anne- Thanks for that absolutely beautiful explanation!!
> >>
> >> Warren- I had not initially thought about numerical tolerance, but this
> >> could potentially be an issue, in which case the management of the data
> >> would have to be completely different.  Thanks for pointing this out!  I
> >> might have as many as 50,000 values.
> >
> > You may want to explain your higher-level problem. Maintaining sets of
> > floating point numbers is almost never the right approach. With sets,
> > comparison must necessarily be by exact equality because fuzzy
> > equality is not transitive.
>
> with consistent scaling, shouldn't something like rounding to a fixed
> precision be enough?
>
> >>> round(1 - 0.7,14) == round(0.3, 14)
> True
> >>> 1 - 0.7 == 0.3
> False
>
> or approx_equal instead of almost_equal
>
> Josef
>
> I have to agree with Robert.  Whenever a fellow student comes to me
describing an issue where they needed to find a floating point number in an
array, the problem can usually be restated in a way that makes much more
sense.

There are many issues with doing a naive comparison using round()
(largely because it is intransitive, as someone else already stated).  As a
quick-and-dirty solution to a very specific problem it can work -- but such
comparisons are almost never left as the final solution.

Ben


Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread Anne Archibald
On 12 May 2010 20:09, Dr. Phillip M. Feldman  wrote:
>
>
> Warren Weckesser-3 wrote:
>>
>> A couple questions:
>>
>> How many floats will you be storing?
>>
>> When you test for membership, will you want to allow for a numerical
>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>> the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
>> 1-0.7 is 0.30000000000000004)
>>
>> Warren
>>
>
> Anne- Thanks for that absolutely beautiful explanation!!
>
> Warren- I had not initially thought about numerical tolerance, but this
> could potentially be an issue, in which case the management of the data
> would have to be completely different.  Thanks for pointing this out!  I
> might have as many as 50,000 values.

If you want one-dimensional "sets" with numerical tolerances, then
a sorted-array implementation looks more appealing. A
sorted-tree implementation will be a little awkward, since you will
often need to explore two branches to find out the nearest neighbour
of a query point. In fact what you have is a one-dimensional kd-tree,
which is helpfully provided by scipy.spatial, albeit without insertion
or deletion operators.
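
A minimal sketch of the sorted-array idea, using np.searchsorted for a
log-time, tolerance-aware membership test (the helper name and tolerance are
illustrative, not an existing scipy.spatial API):

```python
import numpy as np

def contains(sorted_vals, x, tol=1e-9):
    """Tolerance-based membership test against a sorted 1-D array."""
    i = np.searchsorted(sorted_vals, x)
    # The nearest stored value is at the insertion point or just before it.
    for j in (i - 1, i):
        if 0 <= j < len(sorted_vals) and abs(sorted_vals[j] - x) <= tol:
            return True
    return False

vals = np.sort(np.array([1 - 0.7, 0.5, 2.0]))
# contains(vals, 0.3) finds 0.30000000000000004 within tolerance;
# contains(vals, 0.31) does not.
```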

I should also point out that when you start wanting approximate
matches, which you will as soon as you do any sort of arithmetic on
your floats, your idea of a "set" becomes extremely messy. For
example, suppose you try to insert a float that's one part in a
million different from one that's in the table. Does it get inserted
too or is it "equal" to what's there? When it comes time to remove it,
your query will probably have a value slightly different from either
previous value - which one, or both, do you remove? Or do you raise an
exception? Resolving these questions satisfactorily will probably
require you to know the scales that are relevant in your problem and
implement sensible handling of scales larger or smaller than this (but
beware of the "teapot in a stadium problem", of wildly different
scales in the same data set). Even so, you will want to write
algorithms that are robust to imprecision, duplication, and
disappearance of points in your sets.

(If this sounds like the voice of bitter experience, well, I
discovered while writing a commercial ray-tracer that when you shoot
billions of rays into millions of triangles, all sorts of astonishing
limitations of floating-point turn into graphical artifacts. Which are
always *highly* visible. It was during this period that the
interval-arithmetic camp nearly gained a convert.)

Anne

> Phillip
> --
> View this message in context: 
> http://old.nabble.com/efficient-way-to-manage-a-set-of-floats--tp28518014p28542439.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>


Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread josef . pktd
On Wed, May 12, 2010 at 9:27 PM, Robert Kern  wrote:
> On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman
>  wrote:
>>
>> Warren Weckesser-3 wrote:
>>>
>>> A couple questions:
>>>
>>> How many floats will you be storing?
>>>
>>> When you test for membership, will you want to allow for a numerical
>>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>>> the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
>>> 1-0.7 is 0.30000000000000004)
>>>
>>> Warren
>>>
>>
>> Anne- Thanks for that absolutely beautiful explanation!!
>>
>> Warren- I had not initially thought about numerical tolerance, but this
>> could potentially be an issue, in which case the management of the data
>> would have to be completely different.  Thanks for pointing this out!  I
>> might have as many as 50,000 values.
>
> You may want to explain your higher-level problem. Maintaining sets of
> floating point numbers is almost never the right approach. With sets,
> comparison must necessarily be by exact equality because fuzzy
> equality is not transitive.

with consistent scaling, shouldn't something like rounding to a fixed
precision be enough?

>>> round(1 - 0.7,14) == round(0.3, 14)
True
>>> 1 - 0.7 == 0.3
False

or approx_equal instead of almost_equal

Josef


> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco


Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread Robert Kern
On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman
 wrote:
>
> Warren Weckesser-3 wrote:
>>
>> A couple questions:
>>
>> How many floats will you be storing?
>>
>> When you test for membership, will you want to allow for a numerical
>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>> the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
>> 1-0.7 is 0.30000000000000004)
>>
>> Warren
>>
>
> Anne- Thanks for that absolutely beautiful explanation!!
>
> Warren- I had not initially thought about numerical tolerance, but this
> could potentially be an issue, in which case the management of the data
> would have to be completely different.  Thanks for pointing this out!  I
> might have as many as 50,000 values.

You may want to explain your higher-level problem. Maintaining sets of
floating point numbers is almost never the right approach. With sets,
comparison must necessarily be by exact equality because fuzzy
equality is not transitive.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


[Numpy-discussion] default behavior of argsort

2010-05-12 Thread Dr. Phillip M. Feldman

When operating on an array whose last dimension is unity, the default
behavior of argsort is not very useful:

|6> x=random.random((4,1))
|7> shape(x)
  <7> (4, 1)
|8> argsort(x)
  <8>
array([[0],
   [0],
   [0],
   [0]])
|9> argsort(x,axis=0)
  <9>
array([[0],
   [2],
   [1],
   [3]])
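
This follows from argsort's default axis being -1 (the last axis): each
length-1 row of a (4, 1) array is "sorted" independently, so every index is
0. A short sketch of the alternatives:

```python
import numpy as np

x = np.random.random((4, 1))

# Default axis=-1 sorts each length-1 row on its own -> all zeros.
by_row = np.argsort(x)
# What was probably intended: sort down the column, or flatten first.
by_col = np.argsort(x, axis=0)
by_flat = np.argsort(x.ravel())
```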
-- 
View this message in context: 
http://old.nabble.com/default-behavior-of-argsort-tp28542476p28542476.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.



Re: [Numpy-discussion] efficient way to manage a set of floats?

2010-05-12 Thread Dr. Phillip M. Feldman


Warren Weckesser-3 wrote:
> 
> A couple questions:
> 
> How many floats will you be storing?
> 
> When you test for membership, will you want to allow for a numerical 
> tolerance, so that if the value 1 - 0.7 is added to the set, a test for 
> the value 0.3 returns True?  (0.3 is actually 0.29999999999999999, while
> 1-0.7 is 0.30000000000000004)
> 
> Warren
> 

Anne- Thanks for that absolutely beautiful explanation!!

Warren- I had not initially thought about numerical tolerance, but this
could potentially be an issue, in which case the management of the data
would have to be completely different.  Thanks for pointing this out!  I
might have as many as 50,000 values.

Phillip
-- 
View this message in context: 
http://old.nabble.com/efficient-way-to-manage-a-set-of-floats--tp28518014p28542439.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.



[Numpy-discussion] newbie: convert recarray to floating-point ndarray with mixed types

2010-05-12 Thread Gregory, Matthew
Apologies for what is likely a simple question and I hope it hasn't been asked 
before ...

Given a recarray with a dtype consisting of more than one type, e.g.

  >>> import numpy as n
  >>> a = n.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
  >>> b = a.view(n.recarray)
  >>> b
  rec.array([(1.0, 2), (3.0, 4)],
dtype=[('x', '<f8'), ('y', '<i8')])

Is there a simple way to convert 'b' to a floating-point ndarray, casting the
integer field to a floating-point?  I've tried the naive:

  >>> c = b.view(dtype='float').reshape(b.size,-1)

but that fails with:

  ValueError: new type not compatible with array.

I understand why this would fail (as it is a view and not a copy), but I'm lost 
on a method to do this conversion simply.  

thanks, matt
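
[For the record, the view has to fail because the fields have different
types, so the bytes cannot be reinterpreted in place; any conversion must
copy. One simple copy-based sketch:]

```python
import numpy as np

a = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
# A view can't mix an f8 and an i8 field into one float array in place,
# so build a float copy instead:
c = np.array(a.tolist(), dtype=float)
```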


