Re: [Numpy-discussion] newbie: convert recarray to floating-point ndarray with mixed types
On 05/12/2010 12:37 PM, Gregory, Matthew wrote:
> Apologies for what is likely a simple question and I hope it hasn't been
> asked before ...
>
> Given a recarray with a dtype consisting of more than one type, e.g.
>
> >>> import numpy as n
> >>> a = n.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
> >>> b = a.view(n.recarray)
> >>> b
> rec.array([(1.0, 2), (3.0, 4)],
>       dtype=[('x', '<f8'), ('y', '<i8')])
>
> Is there a simple way to convert 'b' to a floating-point ndarray, casting the
> integer field to a floating-point? I've tried the naive:
>
> >>> c = b.view(dtype='float').reshape(b.size, -1)
>
> but that fails with:
>
> ValueError: new type not compatible with array.
>
> I understand why this would fail (as it is a view and not a copy), but I'm
> lost on a method to do this conversion simply.

It may not be as simple as you would like, but the following works efficiently:

import numpy as np

a = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
b = np.empty((a.shape[0], 2), dtype=float)
b[:, 0] = a['x']
b[:, 1] = a['y']

Eric

> thanks, matt

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
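[Editor's note: Eric's two-column copy generalizes to any number of fields by looping over the dtype's field names. A minimal sketch, not from the thread itself:]

```python
import numpy as np

a = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])

# Allocate a plain 2-D float array and copy each named field into a column;
# integer fields are cast to float on assignment.
c = np.empty((a.shape[0], len(a.dtype.names)), dtype=np.float64)
for i, name in enumerate(a.dtype.names):
    c[:, i] = a[name]

print(c)  # [[1. 2.]
          #  [3. 4.]]
```

This stays a copy (not a view), which is what makes the mixed-dtype cast possible.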
Re: [Numpy-discussion] Bug with how numpy.distutils.system_info handles the site.cfg
On Wed, May 12, 2010 at 11:06 PM, Chris Colbert wrote:
> I had this problem back in 2009 when building Enthought Enable, and was
> happy with a workaround. It just bit me again, and I finally got around to
> drilling down to the problem.
>
> On Linux, if one uses the numpy/site.cfg [default] section when building
> from source to specify local library directories, the x11 libs won't be
> found by NumPy.
>
> The relevant section of site.cfg.example reads as follows:
>
> # Defaults
> #
> # The settings given here will apply to all other sections if not overridden.
> # This is a good place to add general library and include directories like
> # /usr/local/{lib,include}
> #
> #[DEFAULT]
> #library_dirs = /usr/local/lib
> #include_dirs = /usr/local/include
>
> Now, I build NumPy with Atlas and my Atlas libs are installed in
> /usr/local, so my [default] section of site.cfg looks like this (as
> suggested by site.cfg.example):
>
> [DEFAULT]
> library_dirs = /usr/local/lib:/usr/local/lib/atlas
> include_dirs = /usr/local/include
>
> NumPy builds and works fine with this. The problem occurs when other
> libraries use numpy.distutils.system_info.get_info('x11') (a la Enthought
> Enable).
>
> That function eventually calls
> numpy.distutils.system_info.system_info.parse_config_files, which has the
> following definition:
>
>     def parse_config_files(self):
>         self.cp.read(self.files)
>         if not self.cp.has_section(self.section):
>             if self.section is not None:
>                 self.cp.add_section(self.section)
>
> When self.cp is instantiated (when looking for the x11 libs), it is
> provided the following defaults:
>
>     {'libraries': '',
>      'src_dirs': '.:/usr/local/src',
>      'search_static_first': '0',
>      'library_dirs': '/usr/X11R6/lib64:/usr/X11R6/lib:/usr/X11/lib64:/usr/X11/lib:/usr/lib64:/usr/lib',
>      'include_dirs': '/usr/X11R6/include:/usr/X11/include:/usr/include'}
>
> As is clearly seen, 'library_dirs' contains the proper paths to find
> the x11 libs. But since the config file has a [default] section, these paths
> get trampled and replaced with whatever is contained in the site.cfg
> [default] section. In my case, this is /usr/local/lib:/usr/local/lib/atlas.
> Thus, my x11 libs aren't found and the Enable build fails.
>
> The workaround is to include an [x11] section in site.cfg with the
> appropriate paths, but I don't really feel this should be necessary. Would
> the better behavior be to look for a [default] section in the config file in
> the parse_config_files method and add those paths to the already specified
> defaults?

Then again, another workaround could be to add the atlas directory paths to
the [blas_opt] and [lapack_opt] sections. This would work for my case, but it
doesn't solve the larger problem of any directories put in [default] trouncing
any of the other standard dirs that would otherwise be used.

> Changing the site.cfg [default] section to read as follows:
>
> [DEFAULT]
> library_dirs = /usr/lib:/usr/local/lib:/usr/local/lib/atlas
> include_dirs = /usr/include:/usr/local/include
>
> is not an option because then NumPy will find and use the system atlas,
> which in my case is not threaded nor optimized for my machine.
>
> If you want me to patch the parse_config_files method, just let me know.
>
> Cheers,
>
> Chris
[Numpy-discussion] Bug with how numpy.distutils.system_info handles the site.cfg
I had this problem back in 2009 when building Enthought Enable, and was happy
with a workaround. It just bit me again, and I finally got around to drilling
down to the problem.

On Linux, if one uses the numpy/site.cfg [default] section when building from
source to specify local library directories, the x11 libs won't be found by
NumPy.

The relevant section of site.cfg.example reads as follows:

# Defaults
#
# The settings given here will apply to all other sections if not overridden.
# This is a good place to add general library and include directories like
# /usr/local/{lib,include}
#
#[DEFAULT]
#library_dirs = /usr/local/lib
#include_dirs = /usr/local/include

Now, I build NumPy with Atlas and my Atlas libs are installed in /usr/local,
so my [default] section of site.cfg looks like this (as suggested by
site.cfg.example):

[DEFAULT]
library_dirs = /usr/local/lib:/usr/local/lib/atlas
include_dirs = /usr/local/include

NumPy builds and works fine with this. The problem occurs when other
libraries use numpy.distutils.system_info.get_info('x11') (a la Enthought
Enable).

That function eventually calls
numpy.distutils.system_info.system_info.parse_config_files, which has the
following definition:

    def parse_config_files(self):
        self.cp.read(self.files)
        if not self.cp.has_section(self.section):
            if self.section is not None:
                self.cp.add_section(self.section)

When self.cp is instantiated (when looking for the x11 libs), it is provided
the following defaults:

    {'libraries': '',
     'src_dirs': '.:/usr/local/src',
     'search_static_first': '0',
     'library_dirs': '/usr/X11R6/lib64:/usr/X11R6/lib:/usr/X11/lib64:/usr/X11/lib:/usr/lib64:/usr/lib',
     'include_dirs': '/usr/X11R6/include:/usr/X11/include:/usr/include'}

As is clearly seen, 'library_dirs' contains the proper paths to find the x11
libs. But since the config file has a [default] section, these paths get
trampled and replaced with whatever is contained in the site.cfg [default]
section. In my case, this is /usr/local/lib:/usr/local/lib/atlas. Thus, my
x11 libs aren't found and the Enable build fails.

The workaround is to include an [x11] section in site.cfg with the
appropriate paths, but I don't really feel this should be necessary. Would
the better behavior be to look for a [default] section in the config file in
the parse_config_files method and add those paths to the already specified
defaults?

Changing the site.cfg [default] section to read as follows:

[DEFAULT]
library_dirs = /usr/lib:/usr/local/lib:/usr/local/lib/atlas
include_dirs = /usr/include:/usr/local/include

is not an option because then NumPy will find and use the system atlas,
which in my case is not threaded nor optimized for my machine.

If you want me to patch the parse_config_files method, just let me know.

Cheers,

Chris
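[Editor's note: the trampling Chris describes is standard ConfigParser behavior: a [DEFAULT] section in a parsed file overwrites defaults supplied at construction. A minimal sketch with the modern configparser module, not NumPy's actual code, using abbreviated hypothetical paths:]

```python
import configparser

# Defaults supplied in code, analogous to system_info's built-in x11 paths.
cp = configparser.ConfigParser({'library_dirs': '/usr/X11R6/lib:/usr/lib'})

# A site.cfg-style file whose [DEFAULT] section sets the same key,
# plus an (empty) [x11] section.
cp.read_string("[DEFAULT]\nlibrary_dirs = /usr/local/lib\n\n[x11]\n")

# The file's [DEFAULT] value wins: the built-in x11 search paths are gone.
print(cp.get('x11', 'library_dirs'))  # -> /usr/local/lib
```

Merging the file's [DEFAULT] entries into the constructor defaults, rather than replacing them, is essentially the fix Chris proposes for parse_config_files.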
Re: [Numpy-discussion] efficient way to manage a set of floats?
On Wed, May 12, 2010 at 21:37, wrote:
> On Wed, May 12, 2010 at 9:27 PM, Robert Kern wrote:
>> On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman wrote:
>>> Warren Weckesser-3 wrote:
>>>> A couple questions:
>>>>
>>>> How many floats will you be storing?
>>>>
>>>> When you test for membership, will you want to allow for a numerical
>>>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>>>> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
>>>> 1-0.7 is 0.30000000000000004)
>>>>
>>>> Warren
>>>
>>> Anne- Thanks for that absolutely beautiful explanation!!
>>>
>>> Warren- I had not initially thought about numerical tolerance, but this
>>> could potentially be an issue, in which case the management of the data
>>> would have to be completely different. Thanks for pointing this out! I
>>> might have as many as 50,000 values.
>>
>> You may want to explain your higher-level problem. Maintaining sets of
>> floating point numbers is almost never the right approach. With sets,
>> comparison must necessarily be by exact equality because fuzzy
>> equality is not transitive.
>
> with consistent scaling, shouldn't something like rounding to a fixed
> precision be enough?

Then you might as well convert to integers and do integer sets. The problem
is that two floats very close to a border (and hence to each other) would end
up rounding to different bins. They will compare unequal to each other and
equal to numbers farther away but in the same arbitrary bin. Again, it
depends on the higher-level problem.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
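[Editor's note: Robert's bin-boundary objection is easy to demonstrate. An illustrative sketch, not from the thread:]

```python
import math

def bin_key(x, ndigits=1):
    # Put floats into fixed-precision bins, as the rounding proposal suggests.
    return math.floor(x * 10**ndigits)

a = 0.19999999   # these two differ by only 2e-8 ...
b = 0.20000001
c = 0.11         # ... while this one is about 0.09 away from a

assert abs(a - b) < 1e-7
assert bin_key(a) != bin_key(b)   # near-identical values land in different bins
assert bin_key(a) == bin_key(c)   # distant values land in the same bin
```

So binning gives both false negatives (a vs. b) and false positives (a vs. c) as a membership test, whatever precision is chosen.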
Re: [Numpy-discussion] efficient way to manage a set of floats?
On Wed, May 12, 2010 at 8:37 PM, wrote:
> On Wed, May 12, 2010 at 9:27 PM, Robert Kern wrote:
>> On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman wrote:
>>> Warren Weckesser-3 wrote:
>>>> A couple questions:
>>>>
>>>> How many floats will you be storing?
>>>>
>>>> When you test for membership, will you want to allow for a numerical
>>>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>>>> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
>>>> 1-0.7 is 0.30000000000000004)
>>>>
>>>> Warren
>>>
>>> Anne- Thanks for that absolutely beautiful explanation!!
>>>
>>> Warren- I had not initially thought about numerical tolerance, but this
>>> could potentially be an issue, in which case the management of the data
>>> would have to be completely different. Thanks for pointing this out! I
>>> might have as many as 50,000 values.
>>
>> You may want to explain your higher-level problem. Maintaining sets of
>> floating point numbers is almost never the right approach. With sets,
>> comparison must necessarily be by exact equality because fuzzy
>> equality is not transitive.
>
> with consistent scaling, shouldn't something like rounding to a fixed
> precision be enough?
>
> >>> round(1 - 0.7, 14) == round(0.3, 14)
> True
> >>> 1 - 0.7 == 0.3
> False
>
> or approx_equal instead of almost_equal
>
> Josef

I have to agree with Robert. Whenever a fellow student comes to me describing
an issue where they needed to find a floating point number in an array, the
problem can usually be restated in a way that makes much more sense. There
are so many issues with doing a naive comparison using round() (largely
because it is intransitive, as someone else already stated). As a quick and
dirty solution to very specific issues, it works -- but it is almost never
left as a final solution.

Ben
Re: [Numpy-discussion] efficient way to manage a set of floats?
On 12 May 2010 20:09, Dr. Phillip M. Feldman wrote:
> Warren Weckesser-3 wrote:
>> A couple questions:
>>
>> How many floats will you be storing?
>>
>> When you test for membership, will you want to allow for a numerical
>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
>> 1-0.7 is 0.30000000000000004)
>>
>> Warren
>
> Anne- Thanks for that absolutely beautiful explanation!!
>
> Warren- I had not initially thought about numerical tolerance, but this
> could potentially be an issue, in which case the management of the data
> would have to be completely different. Thanks for pointing this out! I
> might have as many as 50,000 values.

If you want one-dimensional "sets" with numerical tolerances, then a
sorted-array implementation looks more appealing. A sorted-tree
implementation will be a little awkward, since you will often need to
explore two branches to find the nearest neighbour of a query point. In
fact what you have is a one-dimensional kd-tree, which is helpfully
provided by scipy.spatial, albeit without insertion or deletion operators.

I should also point out that when you start wanting approximate matches,
which you will as soon as you do any sort of arithmetic on your floats,
your idea of a "set" becomes extremely messy. For example, suppose you try
to insert a float that's one part in a million different from one that's
in the table. Does it get inserted too, or is it "equal" to what's there?
When it comes time to remove it, your query will probably have a value
slightly different from either previous value -- which one, or both, do
you remove? Or do you raise an exception? Resolving these questions
satisfactorily will probably require you to know the scales that are
relevant in your problem and implement sensible handling of scales larger
or smaller than this (but beware of the "teapot in a stadium" problem of
wildly different scales in the same data set).

Even so, you will want to write algorithms that are robust to imprecision,
duplication, and disappearance of points in your sets. (If this sounds like
the voice of bitter experience, well, I discovered while writing a
commercial ray-tracer that when you shoot billions of rays into millions of
triangles, all sorts of astonishing limitations of floating-point turn into
graphical artifacts. Which are always *highly* visible. It was during this
period that the interval-arithmetic camp nearly gained a convert.)

Anne

> Phillip
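[Editor's note: Anne's sorted-array suggestion can be sketched with the standard library's bisect module. An illustrative sketch assuming a fixed absolute tolerance; the function name and tolerance are the editor's, not from the thread:]

```python
import bisect

def contains_approx(sorted_vals, x, tol=1e-9):
    """True if some value in the sorted list is within tol of x."""
    i = bisect.bisect_left(sorted_vals, x)
    # The nearest neighbour is at index i or i-1; check both.
    for j in (i - 1, i):
        if 0 <= j < len(sorted_vals) and abs(sorted_vals[j] - x) <= tol:
            return True
    return False

vals = sorted([0.1, 0.2, 1 - 0.7, 0.5])
print(contains_approx(vals, 0.3))    # True: 1 - 0.7 is within 1e-9 of 0.3
print(contains_approx(vals, 0.35))   # False
```

Each query is O(log n), and the same bisection point is where an insertion would go; deletion raises exactly the "which one do you remove?" question Anne describes.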
Re: [Numpy-discussion] efficient way to manage a set of floats?
On Wed, May 12, 2010 at 9:27 PM, Robert Kern wrote:
> On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman wrote:
>> Warren Weckesser-3 wrote:
>>> A couple questions:
>>>
>>> How many floats will you be storing?
>>>
>>> When you test for membership, will you want to allow for a numerical
>>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>>> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
>>> 1-0.7 is 0.30000000000000004)
>>>
>>> Warren
>>
>> Anne- Thanks for that absolutely beautiful explanation!!
>>
>> Warren- I had not initially thought about numerical tolerance, but this
>> could potentially be an issue, in which case the management of the data
>> would have to be completely different. Thanks for pointing this out! I
>> might have as many as 50,000 values.
>
> You may want to explain your higher-level problem. Maintaining sets of
> floating point numbers is almost never the right approach. With sets,
> comparison must necessarily be by exact equality because fuzzy
> equality is not transitive.

with consistent scaling, shouldn't something like rounding to a fixed
precision be enough?

>>> round(1 - 0.7, 14) == round(0.3, 14)
True
>>> 1 - 0.7 == 0.3
False

or approx_equal instead of almost_equal

Josef

> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
Re: [Numpy-discussion] efficient way to manage a set of floats?
On Wed, May 12, 2010 at 20:09, Dr. Phillip M. Feldman wrote:
> Warren Weckesser-3 wrote:
>> A couple questions:
>>
>> How many floats will you be storing?
>>
>> When you test for membership, will you want to allow for a numerical
>> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
>> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
>> 1-0.7 is 0.30000000000000004)
>>
>> Warren
>
> Anne- Thanks for that absolutely beautiful explanation!!
>
> Warren- I had not initially thought about numerical tolerance, but this
> could potentially be an issue, in which case the management of the data
> would have to be completely different. Thanks for pointing this out! I
> might have as many as 50,000 values.

You may want to explain your higher-level problem. Maintaining sets of
floating point numbers is almost never the right approach. With sets,
comparison must necessarily be by exact equality because fuzzy equality is
not transitive.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
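[Editor's note: the intransitivity Robert mentions is concrete: fuzzy equality can hold pairwise without holding overall. A small illustration, not from the thread, with an arbitrary tolerance:]

```python
def approx_eq(a, b, tol=0.1):
    # Fuzzy equality with an absolute tolerance.
    return abs(a - b) <= tol

a, b, c = 0.0, 0.09, 0.18

assert approx_eq(a, b)        # a ~ b
assert approx_eq(b, c)        # b ~ c
assert not approx_eq(a, c)    # but not a ~ c: fuzzy equality is intransitive
```

A set (or dict/hash) needs a true equivalence relation to partition its elements consistently, which is why exact equality is the only workable choice there.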
[Numpy-discussion] default behavior of argsort
When operating on an array whose last dimension is unity, the default
behavior of argsort is not very useful:

|6> x = random.random((4,1))
|7> shape(x)
<7> (4, 1)
|8> argsort(x)
<8> array([[0],
           [0],
           [0],
           [0]])
|9> argsort(x, axis=0)
<9> array([[0],
           [2],
           [1],
           [3]])
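[Editor's note: the behavior follows from argsort's default axis=-1 (the last axis), so each length-1 row is "sorted" on its own. A quick sketch with concrete values:]

```python
import numpy as np

x = np.array([[0.4], [0.9], [0.6], [0.1]])   # shape (4, 1)

# Default axis=-1 sorts each row independently; every row has length 1,
# so every index is trivially 0.
print(np.argsort(x))           # [[0] [0] [0] [0]]

# axis=0 sorts down the column, which is what was wanted here.
print(np.argsort(x, axis=0))   # [[3] [0] [2] [1]]
```

Passing axis=None is another option: it flattens first and returns indices into the flattened array.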
Re: [Numpy-discussion] efficient way to manage a set of floats?
Warren Weckesser-3 wrote:
> A couple questions:
>
> How many floats will you be storing?
>
> When you test for membership, will you want to allow for a numerical
> tolerance, so that if the value 1 - 0.7 is added to the set, a test for
> the value 0.3 returns True? (0.3 is actually 0.29999999999999999, while
> 1-0.7 is 0.30000000000000004)
>
> Warren

Anne- Thanks for that absolutely beautiful explanation!!

Warren- I had not initially thought about numerical tolerance, but this
could potentially be an issue, in which case the management of the data
would have to be completely different. Thanks for pointing this out! I
might have as many as 50,000 values.

Phillip
[Numpy-discussion] newbie: convert recarray to floating-point ndarray with mixed types
Apologies for what is likely a simple question and I hope it hasn't been
asked before ...

Given a recarray with a dtype consisting of more than one type, e.g.

>>> import numpy as n
>>> a = n.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
>>> b = a.view(n.recarray)
>>> b
rec.array([(1.0, 2), (3.0, 4)],
      dtype=[('x', '<f8'), ('y', '<i8')])

Is there a simple way to convert 'b' to a floating-point ndarray, casting
the integer field to a floating-point? I've tried the naive:

>>> c = b.view(dtype='float').reshape(b.size, -1)

but that fails with:

ValueError: new type not compatible with array.

I understand why this would fail (as it is a view and not a copy), but I'm
lost on a method to do this conversion simply.

thanks, matt