[Numpy-discussion] nan, sign, and all that
Hi All,

I've added ufuncs fmin and fmax that behave as follows:

In [3]: a = array([NAN, 0, NAN, 1])
In [4]: b = array([0, NAN, NAN, 0])
In [5]: fmax(a,b)
Out[5]: array([ 0., 0., NaN, 1.])
In [6]: fmin(a,b)
Out[6]: array([ 0., 0., NaN, 0.])
In [7]: fmax.reduce(a)
Out[7]: 1.0
In [8]: fmin.reduce(a)
Out[8]: 0.0
In [9]: fmax.reduce([NAN,NAN])
Out[9]: nan
In [10]: fmin.reduce([NAN,NAN])
Out[10]: nan

I also made the sign ufunc return the sign of nan. That works, but I'm not sure it is the way to go, because there doesn't seem to be any spec for what sign nan takes. The current np.nan on my machine is negative, and 0/0 and inf/inf both return negative nan, so the actual sign of nan doesn't seem to make any sense. Currently sign(NAN) returns 0, which doesn't look right either, so I think the thing to do is return nan -- but this will be a change in numpy behavior. Any thoughts?

Chuck

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
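[Editor's note: for readers following the thread without a fresh numpy build, the C99 fmax/fmin semantics Chuck describes -- a NaN survives only when *both* arguments are NaN -- can be sketched in plain Python. The function names mirror the new ufuncs, but this is only an illustration, not numpy's implementation.]

```python
import math
from functools import reduce

def fmax(x, y):
    # C99 semantics: if exactly one argument is NaN, return the other;
    # NaN comes out only when both arguments are NaN.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)

def fmin(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)

NAN = float('nan')
a = [NAN, 0.0, NAN, 1.0]
b = [0.0, NAN, NAN, 0.0]
print([fmax(x, y) for x, y in zip(a, b)])  # [0.0, 0.0, nan, 1.0]
print(reduce(fmax, a))                     # 1.0 -- NaNs are ignored
print(reduce(fmin, [NAN, NAN]))            # nan -- all-NaN input
```

This reproduces the Out[5]-Out[10] results above element by element.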
Re: [Numpy-discussion] nan, sign, and all that
Hi Charles,

2008/10/2 Charles R Harris [EMAIL PROTECTED]:
> In [3]: a = array([NAN, 0, NAN, 1])
> In [4]: b = array([0, NAN, NAN, 0])
> In [5]: fmax(a,b)
> Out[5]: array([ 0., 0., NaN, 1.])
> In [6]: fmin(a,b)
> Out[6]: array([ 0., 0., NaN, 0.])

These are great, many thanks! My only gripe is that they have the same NaN-handling as amin and friends, which I consider to be broken. Others also mentioned that this should be changed, and I think David C wrote a patch for it (but I am not informed as to the speed implications). If I had to choose, this would be my preferred output:

In [5]: fmax(a,b)
Out[5]: array([ NaN, NaN, NaN, 1.])

Cheers
Stéfan
Re: [Numpy-discussion] nan, sign, and all that
On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt [EMAIL PROTECTED] wrote:
> These are great, many thanks! My only gripe is that they have the same
> NaN-handling as amin and friends, which I consider to be broken.

No, these follow the well-defined C99 semantics of the fmin() and fmax() functions in libm. If exactly one of the arguments is a NaN, the non-NaN argument is returned. This is *not* the current behavior of amin() et al., which just do naive comparisons.

> Others also mentioned that this should be changed, and I think David C
> wrote a patch for it (but I am not informed as to the speed
> implications). If I had to choose, this would be my preferred output:
>
> In [5]: fmax(a,b)
> Out[5]: array([ NaN, NaN, NaN, 1.])

Chuck proposes letting minimum() and maximum() have that behavior.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
[Numpy-discussion] loadtxt
Hi all,

how can I load ASCII data if the file contains characters instead of floats?

Traceback (most recent call last):
  File "test_csv.py", line 2, in <module>
    A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
NameError: name 'char' is not defined

Nils
Re: [Numpy-discussion] nan, sign, and all that
2008/10/2 Robert Kern [EMAIL PROTECTED]:
>> My only gripe is that they have the same NaN-handling as amin and
>> friends, which I consider to be broken.
>
> No, these follow well-defined C99 semantics of the fmin() and fmax()
> functions in libm. If exactly one of the arguments is a NaN, the
> non-NaN argument is returned. This is *not* the current behavior of
> amin() et al., which just do naive comparisons.

Let me rephrase: I'm not convinced that these C99 semantics provide an optimal user experience. It worries me greatly that NaNs pop up in operations and then disappear again. It is entirely possible for a script to run without failure and spew out garbage without the user ever knowing.

> Chuck proposes letting minimum() and maximum() have that behavior.

That would be a good start, which would be complemented by educating the user via some appropriate mechanism (I still don't know if one exists; there is no NumPy Paperclip (TM) that states "You have decided to commit scientific suicide. Would you like me to cut your wrists?"). That's meant only half-tongue-in-cheekedly :)

Thanks for your comments,

Cheers
Stéfan
Re: [Numpy-discussion] nan, sign, and all that
On Thu, Oct 2, 2008 at 4:37 PM, Stéfan van der Walt [EMAIL PROTECTED] wrote:
> These are great, many thanks! My only gripe is that they have the same
> NaN-handling as amin and friends, which I consider to be broken. Others
> also mentioned that this should be changed, and I think David C wrote a
> patch for it (but I am not informed as to the speed implications).

Hopefully, Chuck and me synchronised a bit on this :) The idea is that before, I thought that there were a nan-ignoring and a nan-propagating behavior. Robert later mentioned that fmin/fmax has a third, well-specified behavior in C99. All three are useful, and as such have been more or less implemented by Chuck or me.

I think having the new C functions by Chuck makes sense as a new python API, to follow C99 fmax/fmin. They could be used for the new max/min, but then it feels a bit strange compared to nanmax/nanmin, so I would prefer having the *current* numpy.max and numpy.min propagate the NaN, and nanmax/nanmin ignoring the NaN altogether. Also note that matlab does not propagate NaN for max/min.

The last question is FPU status flag handling: I thought comparing NaN directly with < or > would throw a FPE_INVALID. But this is not the case (at least on Linux with glibc and on Mac OS X). This is confusing, because I thought the whole point of the C99 macro isgreater was to not throw this. This is also how I understand both the glibc manual and the Mac OS X man page for isgreater. Robert, do you have any insight on this?

David
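[Editor's note: David's three behaviors can be made concrete with scalar sketches in plain Python. The helper names are hypothetical; the point is only to contrast the semantics under discussion.]

```python
import math

def max_propagate(x, y):
    # NaN-propagating: any NaN input poisons the result
    # (the behavior proposed here for numpy.maximum).
    return float('nan') if (math.isnan(x) or math.isnan(y)) else max(x, y)

def max_ignore(x, y):
    # NaN-ignoring, as in C99 fmax / nanmax: a lone NaN is dropped.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)

def max_naive(x, y):
    # Naive comparison, like the then-current amin/amax: since every
    # comparison with NaN is False, the answer depends on operand order.
    return x if x > y else y

nan = float('nan')
print(max_naive(nan, 1.0))  # 1.0  (nan > 1.0 is False, so y wins)
print(max_naive(1.0, nan))  # nan  (1.0 > nan is False, so y wins)
```

The order-dependence of max_naive is exactly why Stéfan calls the old behavior "broken": the same pair of inputs gives different answers depending on argument order.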
Re: [Numpy-discussion] complex numpy.ndarray dtypes
On Thursday 02 October 2008, John Gu wrote:
> Hello, I am using numpy in conjunction with pyTables. The data that I
> read in from pyTables seems to have the following dtype:
>
> >>> p = hdf5.root.myTable.read()
> >>> p.__class__
> <type 'numpy.ndarray'>
> >>> p[0].__class__
> <type 'numpy.void'>
> >>> p.dtype
> dtype([('time', 'f4'), ('obs1', 'f4'), ('obs2', 'f8'), ('obs3', 'f4')])
> >>> p.shape
> (61230,)
>
> The manner in which I access a particular column is p['time'] or
> p['obs1']. I have a couple of questions regarding this data structure:
>
> 1) how do I restructure the array into a 61230 x 4 array that can be
> indexed using [r,c] notation?

In your example, the table (record array in NumPy jargon) is inhomogeneous (all fields are 'f4' except 'obs2', which is 'f8'). In that case, you can obtain a homogeneous array by doing something like:

In [44]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'), ('obs2','f8')])
In [45]: b = numpy.array([(val['obs1'], val['obs2']) for val in a], dtype='f4')
In [46]: b
Out[46]:
array([[ 1., 2.],
       [ 3., 4.]], dtype=float32)

In case your table is homogeneous, there is a simpler way:

In [41]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'), ('obs2','f4')])
In [42]: d = a.view(('f4',2))
In [43]: d
Out[43]:
array([[ 1., 2.],
       [ 3., 4.]], dtype=float32)

which is also faster:

In [68]: timeit d = a.view(('f4',2))
10 loops, best of 3: 11.5 µs per loop

In [69]: timeit b = numpy.array([(val['obs1'], val['obs2']) for val in a], dtype='f4')
1 loops, best of 3: 39.8 µs per loop

> 2) What kind of dtype is pyTables using? How do I create a similar
> array that can be indexed by a named column? I tried various ways:
>
> a = array([[1,2],[3,4]], dtype=dtype([('obs1','f4'),('obs2','f4')]))
> TypeError: expected a readable buffer object

Yeah, the error message is too terse in this case. The record array constructor needs to be sure where your records start and end, and this is achieved by mapping tuples to records. So, your example must be rewritten as:

In [70]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'), ('obs2','f4')])
In [71]: a
Out[71]:
array([(1.0, 2.0), (3.0, 4.0)],
      dtype=[('obs1', 'f4'), ('obs2', 'f4')])

Have a look at http://www.scipy.org/RecordArrays for more info on record arrays.

> I did find some documentation about array type descriptors when
> reading from files... it seems like these array types are specific to
> arrays created when reading from some sort of file / buffer? Any help
> is appreciated. Thanks!

I'm not sure what you are asking here. At any rate, it might be useful to have a look at the complex dtype examples in:

http://www.scipy.org/Numpy_Example_List#head-f9175c69cccd74b9e4ee92e2a060af27c7447b76

Hope that helps,

--
Francesc Alted
Re: [Numpy-discussion] loadtxt
On Thursday 02 October 2008, Nils Wagner wrote:
> how can I load ASCII data if the file contains characters instead of
> floats?
>
> A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
> NameError: name 'char' is not defined

You would need to specify the length of your strings. Try with dtype='SN', where N is the expected length of the strings.

Cheers,

--
Francesc Alted
Re: [Numpy-discussion] Portable functions for nans, signbit, etc.
On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris [EMAIL PROTECTED] wrote:
> Which is rather clever. I think binary_cast will require some pointer
> abuse.

Yep (the funny thing is that the bit twiddling will likely end up more readable than this C++ stuff)

cheers,

David
Re: [Numpy-discussion] loadtxt
2008/10/2 Francesc Alted [EMAIL PROTECTED]:
>> how can I load ASCII data if the file contains characters instead of
>> floats?
>
> You would need to specify the length of your strings. Try with
> dtype='SN', where N is the expected length of the strings.

Other options include:

- using converters to convert the character to a value:

  np.loadtxt('/tmp/bleh.dat', converters={2: lambda x: 0})

- skipping the specified column:

  np.loadtxt('/tmp/bleh.dat', usecols=(0,1))

Cheers
Stéfan
Re: [Numpy-discussion] nan, sign, and all that
Stéfan van der Walt [EMAIL PROTECTED] writes:
> Let me rephrase: I'm not convinced that these C99 semantics provide an
> optimal user experience. It worries me greatly that NaN's pop up in
> operations and then disappear again. It is entirely possible for a
> script to run without failure and spew out garbage without the user
> ever knowing.

By default NaNs are propagated through operations on them. At the end of this discussion we ought to end up with a list of functions, such as fmax, isnan, and copysign, that are the exceptions.

I think that it is right to defer to IEEE for their decisions on the behavior of NaNs, etc. That is what C and Fortran are doing. I have not checked, but I would guess that CPUs and FPUs behave that way too, so it should be easier and faster to follow IEEE.

Note that in the just-released Python 2.6, floating point support of IEEE 754 has been beefed up.

--
Pete Forman          -./\.-  Disclaimer: This post is originated
WesternGeco          -./\.-  by myself and does not represent
[EMAIL PROTECTED]    -./\.-  the opinion of Schlumberger or
http://petef.22web.net -./\.-  WesternGeco.
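[Editor's note: Pete's point -- NaNs propagate through arithmetic by default, while ordinary comparisons silently yield False -- is easy to check in any Python with math.isnan (2.6+):]

```python
import math

nan = float('nan')

# Arithmetic propagates NaN...
assert math.isnan(nan + 1.0)
assert math.isnan(nan * 0.0)

# ...but ordinary comparisons do not: they are simply False,
# which is how NaNs slip silently through naive min/max code.
assert not (nan > 0.0)
assert not (nan < 0.0)
assert not (nan == nan)

print("all NaN checks passed")
```

The comparison behavior is exactly what makes the naive amin/amax results order-dependent, as discussed earlier in the thread.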
Re: [Numpy-discussion] nan, sign, and all that
On Thu, Oct 2, 2008 at 1:42 AM, Robert Kern [EMAIL PROTECTED] wrote:
> On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt [EMAIL PROTECTED] wrote:
>> These are great, many thanks! My only gripe is that they have the same
>> NaN-handling as amin and friends, which I consider to be broken.
>
> No, these follow well-defined C99 semantics of the fmin() and fmax()
> functions in libm. If exactly one of the arguments is a NaN, the
> non-NaN argument is returned. This is *not* the current behavior of
> amin() et al., which just do naive comparisons.
>
>> If I had to choose, this would be my preferred output:
>>
>> In [5]: fmax(a,b)
>> Out[5]: array([ NaN, NaN, NaN, 1.])
>
> Chuck proposes letting minimum() and maximum() have that behavior.

Yes. If there is any agreement on this I would like to go ahead and do it. It does change the current behavior of maximum and minimum.

Chuck
Re: [Numpy-discussion] nan, sign, and all that
Charles R Harris wrote:
> Yes. If there is any agreement on this I would like to go ahead and do
> it. It does change the current behavior of maximum and minimum.

If you do it, please do it with as many tests as possible (it should not be difficult to have a comprehensive test with *all* float data types), because this is likely to cause problems on some platforms.

thanks,

David
Re: [Numpy-discussion] Portable functions for nans, signbit, etc.
On Thu, Oct 2, 2008 at 2:41 AM, David Cournapeau [EMAIL PROTECTED] wrote:
> On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris [EMAIL PROTECTED] wrote:
>> Which is rather clever. I think binary_cast will require some pointer
>> abuse.
>
> Yep (the funny thing is that the bit twiddling will likely end up more
> readable than this C++ stuff)

The zip file has the bit twiddling, which is worth looking at if only for the note on the PPC extended precision. Motorola seems to be a problem, but I don't think we support any of the 66xx series.

Chuck
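[Editor's note: the "binary cast" / pointer abuse being discussed -- reinterpreting a double's bytes as an integer to reach the sign bit portably -- can be sketched safely in Python with the struct module. The helper name is hypothetical; this is only an illustration of the technique, not numpy's code.]

```python
import struct

def signbit(x):
    # Reinterpret the IEEE 754 double's bytes as an unsigned 64-bit
    # integer (the "binary cast"), then pick off the top bit.
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))
    return (bits >> 63) & 1

print(signbit(-0.0))  # 1 -- distinguishes -0.0 from +0.0,
print(signbit(0.0))   # 0    which comparisons alone cannot do
```

Note that applied to a NaN this reports whatever sign bit the platform happened to produce, which is exactly the ambiguity Chuck raised at the start of the thread.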
Re: [Numpy-discussion] Help to process a large data file
Frank,

How about this:

x = np.loadtxt('file')
z = x.sum(1)                # Reduce data to an array of 0, 1, 2
rz = z[z > 0]               # Remove all 0s since you don't want to count those.
loc = np.where(rz == 2)[0]  # The locations of the (1,1)s
count = np.diff(loc) - 1    # The spacing between those (1,1)s, i.e. the
                            # number of elements that have one 1.

HTH,

David

On Wed, Oct 1, 2008 at 9:27 PM, frank wang [EMAIL PROTECTED] wrote:
> Hi,
>
> I have a large data file which contains 2 columns of data. The two
> columns only have zeros and ones. Now I want to count how many ones are
> in between rows where both columns are one. For example, if my data is:
>
> 1 0
> 0 0
> 1 1
> 0 0
> 0 1  x
> 0 1  x
> 0 0
> 0 1  x
> 1 1
> 0 0
> 0 1  x
> 0 1  x
> 1 1
>
> Then my counts will be 3 and 2 (the numbers marked with x). Is there an
> efficient way to do this? My data file is pretty big.
>
> Thanks
>
> Frank
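[Editor's note: David's recipe, translated to pure Python (the helper name is mine), shows the logic step by step on Frank's example data:]

```python
def count_between(rows):
    """Count the single-1 rows between consecutive (1, 1) rows."""
    sums = [a + b for a, b in rows]                # 0, 1, or 2 per row
    rz = [s for s in sums if s > 0]                # drop the (0, 0) rows
    loc = [i for i, s in enumerate(rz) if s == 2]  # positions of the (1, 1)s
    # The gap between consecutive (1,1)s now contains only rows with
    # exactly one 1, so the gap size minus one is the desired count.
    return [j - i - 1 for i, j in zip(loc, loc[1:])]

rows = [(1, 0), (0, 0), (1, 1), (0, 0), (0, 1), (0, 1), (0, 0),
        (0, 1), (1, 1), (0, 0), (0, 1), (0, 1), (1, 1)]
print(count_between(rows))  # [3, 2]
```

This matches the 3 and 2 from Frank's worked example.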
Re: [Numpy-discussion] Help to process a large data file
Frank,

I would imagine that you cannot get much better performance in python than this, which avoids string conversions:

c = []
count = 0
for line in open('foo'):
    if line == '1 1\n':
        c.append(count)
        count = 0
    else:
        if '1' in line:
            count += 1

One could do some numpy trick like:

a = np.loadtxt('foo', dtype=int)
a = np.sum(a, axis=1)    # Add the two columns horizontally
b = np.where(a == 2)[0]  # Find rows with sum == 2 (1 + 1)
count = []
for i, j in zip(b[:-1], b[1:]):
    count.append(a[i+1:j].sum())  # Number of lines in between with a 1

but on my machine the numpy version takes about 20 sec for a 'foo' file of 2,500,000 lines versus 1.2 sec for the pure python version...

As a side note, if I replace line == '1 1\n' with line.startswith('1 1'), the pure python version goes up to 1.8 sec... Isn't this a bit weird? I'd think startswith() should be faster...

Chris
Re: [Numpy-discussion] Help to process a large data file
Thanks David and Chris for providing the nice solutions. Both methods work great. I could not tell the speed difference between the two solutions. My data size is 1048577 lines. I did not try the second solution from Chris since it is too slow, as Chris stated.

Frank
Re: [Numpy-discussion] Texas Python Regional Unconference Reminders
Hey Steve,

I'll bring my camera and try to recruit a volunteer. No guarantees, but we should at least be able to record things (any volunteers to transcode a pile of scipy videos? ;-) ).

Best,

Travis

On Oct 1, 2008, at 7:56 PM, Steve Lianoglou wrote:
> Hi,
>
> Are there any plans to tape the presentations? Unfortunately some of us
> can't make it down to Texas, but the talks look quite interesting.
>
> Thanks,
> -steve
>
> On Oct 1, 2008, at 10:36 AM, Travis Vaught wrote:
>> Greetings,
>>
>> The Texas Python Regional Unconference is coming up this weekend
>> (October 4-5) and I wanted to send out some more details of the
>> meeting. The web page for the meeting is here:
>>
>> http://www.scipy.org/TXUncon2008
>>
>> The meeting is _absolutely free_, so please add yourself to the
>> Attendees page if you're able to make it. Also, if you're planning to
>> attend, please send me the following information (to [EMAIL PROTECTED])
>> so I can request wireless access for you during the meeting:
>>
>> - Full Name
>> - Phone or email
>> - Address
>> - Affiliation
>>
>> There are still opportunities to present your pet projects at the
>> meeting, so feel free to sign up on the presentation schedule here:
>>
>> http://www.scipy.org/TXUncon2008Schedule
>>
>> For those who are in town Friday evening, we're planning to get
>> together for a casual dinner in downtown Austin that night. We'll meet
>> at the Enthought offices (http://www.enthought.com/contact/map-directions.php)
>> and walk to a casual restaurant nearby. Show up as early as 5:30pm and
>> you can hang out and tour the Enthought offices -- we'll head out to
>> eat at 7:00pm sharp.
>>
>> Best,
>>
>> Travis
Re: [Numpy-discussion] nan, sign, and all that
On Thu, Oct 2, 2008 at 08:22, Charles R Harris [EMAIL PROTECTED] wrote:
> Yes. If there is any agreement on this I would like to go ahead and do
> it. It does change the current behavior of maximum and minimum.

I think the position we've held is that in the presence of NaNs, the behavior of these functions has been left unspecified, so I think it is okay to change them.

--
Robert Kern
Re: [Numpy-discussion] Proposal: scipy.spatial
I also like the idea of a scipy.spatial library. For the research I do in machine learning and computer vision, we are often interested in specifying different distance measures. It would be nice to have a way to specify the distance measure. I would like to see a standard set included -- City Block, Euclidean, Correlation, etc. -- as well as a capability for a user-defined distance or similarity function.
Re: [Numpy-discussion] Proposal: scipy.spatial
2008/10/2 David Bolme [EMAIL PROTECTED]:
> It would be nice to have a way to specify the distance measure. I would
> like to see a standard set included: City Block, Euclidean, Correlation,
> etc., as well as a capability for a user defined distance or similarity
> function.

Do you mean similarity or dissimilarity? Distance is a dissimilarity, but correlation is a similarity measure.

Matthieu
--
French PhD student
Information System Engineer
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
[Numpy-discussion] f2py IS NOW WORKING
To all,

I have now been able to develop a stable file via f2py!! However, I had to execute the following:

1.) First, I had to copy all required library files from my selected Compaq Visual Fortran compiler into python's scripts directory, along with f2py itself.

2.) I also had to include a dll from my compiler under python's dll directory as well.

I know that the reason I needed to take these actions is that I do not know what the correct environment variables are within Windows XP when running Compaq Visual Fortran 6.6. Once again, I would appreciate knowing what the correct environment variables should be for my Windows XP setup, given that the compiler I must utilize is Compaq Visual Fortran 6.6.

Thanks,

David Blubaugh

This e-mail transmission contains information that is confidential and may be privileged. It is intended only for the addressee(s) named above. If you receive this e-mail in error, please do not read, copy or disseminate it in any manner. If you are not the intended recipient, any disclosure, copying, distribution or use of the contents of this information is prohibited. Please reply to the message immediately by informing the sender that the message was misdirected. After replying, please erase it from your computer system. Your assistance in correcting this error is appreciated.
Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
Jarrod Millman wrote:
> The 1.2.0rc2 is now available:
>
> http://svn.scipy.org/svn/numpy/tags/1.2.0rc2

What's the status of this?

> Here are the Windows binaries:
>
> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe

This appears to be a dead link.

thanks,

-Chris
Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
On Thu, Oct 2, 2008 at 16:45, Chris Barker [EMAIL PROTECTED] wrote:
> Jarrod Millman wrote:
>> The 1.2.0rc2 is now available:
>> http://svn.scipy.org/svn/numpy/tags/1.2.0rc2
>
> what's the status of this?

Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".

>> Here are the Windows binaries:
>> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe
>
> this appears to be a dead link.

Superseded by http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

--
Robert Kern
Re: [Numpy-discussion] Proposal: scipy.spatial
It may be useful to have an interface that handles both cases: similarity and dissimilarity. Often I have seen nearest-neighbor algorithms that look for maximum similarity instead of minimum distance. In my field (biometrics) we often deal with very specialized distance or similarity measures, and I would like to see support for user-defined distance and similarity functions. This should be easy to implement by passing a function object to the KNN class. I am not sure if kd-trees or other fast algorithms are compatible with similarities or non-Euclidean norms, but I would be willing to implement an exhaustive-search KNN that would support user-defined functions.

On Oct 2, 2008, at 2:01 PM, Matthieu Brucher wrote:
> Do you mean similarity or dissimilarity? Distance is a dissimilarity,
> but correlation is a similarity measure.
Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
Robert Kern wrote:
> Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".

I thought I'd seen that, but when I went to:

http://www.scipy.org/Download

I still got 1.1.

> Superseded by
> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

thanks,

-Chris
Re: [Numpy-discussion] Proposal: scipy.spatial
2008/10/2 David Bolme [EMAIL PROTECTED]:
> I am not sure if kd-trees or other fast algorithms are compatible with
> similarities or non-Euclidean norms, but I would be willing to
> implement an exhaustive-search KNN that would support user-defined
> functions.

kd-trees can only work for distance measures which have certain special properties (in particular, you have to be able to bound them based on coordinate differences). This is just fine for all the Minkowski p-norms (so in particular, Euclidean distance, maximum coordinate difference, and Manhattan distance), and in fact the current implementation already supports all of these.

I don't think that correlation can be made into such a distance measure -- the neighborhoods are the wrong shape. In fact the basic space is projective (n-1)-space rather than affine n-space, so I think you're going to need some very different algorithm. If you make a metric space out of it -- define d(A,B) to be the angle between A and B -- then cover trees can serve as a spatial data structure for nearest-neighbor search. Cover trees may be worth implementing, as they're a very generic data structure, suitable for (among other things) low-dimensional data in high-dimensional spaces.

Anne
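[Editor's note: the exhaustive-search KNN with a pluggable measure that David proposes can be sketched in a few lines of plain Python (all names here are hypothetical, not a proposed scipy.spatial API). A flag covers both distances and similarities; because it scans every point, it works for measures a kd-tree cannot index, at O(n log n) cost per query.]

```python
import math

def knn(query, points, k, measure, larger_is_closer=False):
    """Exhaustive k-nearest search with a user-defined measure.

    Set larger_is_closer=True for similarity measures (e.g.
    correlation), False for dissimilarities (distances).
    """
    return sorted(points, key=lambda p: measure(query, p),
                  reverse=larger_is_closer)[:k]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
print(knn((0.9, 0.9), pts, 1, euclidean))  # [(1.0, 1.0)]
```

Any callable taking two points works as the measure, which is exactly the flexibility the thread is asking for; the kd-tree restrictions Anne describes only apply once you want sublinear queries.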
Re: [Numpy-discussion] numpy.random.hypergeometric - strange results
see http://scipy.org/scipy/numpy/ticket/921

I think I found the error, in http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c:

{{{
805 /* this is a correction to HRUA* by Ivan Frohne in rv.py */
806 if (good > bad) Z = m - Z;
}}}

Quickly looking at the referenced program, downloaded from http://pal.ece.iisc.ernet.in/~dhani/frohne/rv.py (notation: alpha = bad, beta = good):

{{{
if alpha > beta:   # Error in HRUA*, this is correct.
    z = m - z
}}}

As you can see, if my interpretation is correct, then line 806 should have good and bad reversed, i.e.

{{{
806 if (bad > good) Z = m - Z;
}}}

Can you verify this? I never tried to build numpy from source.

Josef

On Sep 25, 4:18 pm, joep [EMAIL PROTECTED] wrote:

In my fuzz testing of scipy.stats I sometimes get a test failure. I think there is something wrong with numpy.random.hypergeometric for some cases:

Josef

>>> import numpy.random as mtrand
>>> mtrand.hypergeometric(3,17,12,size=10)  # there are only 3 good balls in the urn
array([16, 17, 16, 16, 15, 16, 17, 16, 17, 16])
>>> mtrand.hypergeometric(17,3,12,size=10)  # negative result
array([-3, -4, -3, -4, -3, -3, -4, -4, -5, -4])
>>> mtrand.hypergeometric(4,3,12,size=10)
>>> np.version.version
'1.2.0rc2'

I did not find any clear pattern when trying out different parameter values:

>>> mtrand.hypergeometric(10,10,12,size=10)
array([5, 6, 4, 4, 8, 5, 4, 6, 7, 4])
>>> mtrand.hypergeometric(10,10,20,size=10)
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> mtrand.hypergeometric(10,10,19,size=10)
array([10, 9, 9, 9, 9, 9, 10, 9, 9, 9])
>>> mtrand.hypergeometric(10,10,5,size=10)
array([3, 5, 2, 2, 1, 2, 2, 4, 3, 1])
>>> mtrand.hypergeometric(10,2,5,size=10)
array([4, 5, 4, 5, 5, 5, 4, 3, 4, 4])
>>> mtrand.hypergeometric(2,10,5,size=10)
array([0, 2, 1, 0, 2, 2, 1, 1, 1, 1])
>>> mtrand.hypergeometric(17,3,12,size=10)
array([-5, -3, -4, -4, -4, -3, -4, -4, -3, -3])
>>> mtrand.hypergeometric(3,17,12,size=10)
array([15, 16, 17, 16, 15, 16, 15, 15, 17, 17])
>>> mtrand.hypergeometric(18,3,12,size=10)
array([-5, -6, -6, -4, -4, -4, -5, -3, -5, -5])
>>> mtrand.hypergeometric(18,3,5,size=10)
array([4, 5, 5, 5, 5, 5, 4, 5, 4, 3])
>>> mtrand.hypergeometric(18,3,19,size=10)
array([1, 1, 2, 1, 1, 1, 1, 3, 1, 1])
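The proposed swap rests on a symmetry of the hypergeometric distribution: drawing k good balls is the same event as drawing nsample - k bad balls with the roles of good and bad exchanged. That is why HRUA* may sample from the smaller side and correct with Z = m - Z, provided the condition tests the right pair. A quick pure-Python check of that identity (the helper `hypergeom_pmf` is illustrative, not part of numpy; it uses today's `math.comb`, which needs Python 3.8+):

```python
import math

def hypergeom_pmf(k, ngood, nbad, nsample):
    """P(exactly k good draws) when sampling nsample balls without replacement."""
    return (math.comb(ngood, k) * math.comb(nbad, nsample - k)
            / math.comb(ngood + nbad, nsample))

# Swapping good and bad maps k to nsample - k: the symmetry behind Z = m - Z.
for k in range(9, 13):  # support of hypergeometric(ngood=17, nbad=3, nsample=12)
    assert abs(hypergeom_pmf(k, 17, 3, 12)
               - hypergeom_pmf(12 - k, 3, 17, 12)) < 1e-12

# The pmf sums to 1 over k = 9..12, so a correct sampler can never return
# a negative count or one above nsample, unlike the buggy outputs above.
assert abs(sum(hypergeom_pmf(k, 17, 3, 12) for k in range(9, 13)) - 1.0) < 1e-12
```

With (ngood=17, nbad=3, nsample=12) the support is k = 9..12, so the reported values of -3 to -6 show the correction being applied with the comparison reversed.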
Re: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
On Thu, Oct 2, 2008 at 4:29 PM, Chris Barker [EMAIL PROTECTED] wrote: Robert Kern wrote: Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0". I thought I'd seen that, but when I went to: http://www.scipy.org/Download I still got 1.1.

I updated the page to point to the sourceforge page. Thanks for catching that.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Re: [Numpy-discussion] numpy.random.logseries - incorrect convergence for k=1, k=2
Filed as http://scipy.org/scipy/numpy/ticket/923, and I think I finally tracked down the source of the incorrect random numbers: a reversed inequality at line 871 of http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c; see my last comment on the trac ticket.

Josef

On Sep 27, 2:12 pm, joep [EMAIL PROTECTED] wrote:

Random numbers generated by numpy.random.logseries do not converge to the theoretical distribution. For probability parameter pr = 0.8, the random number generator converges to a frequency for k=1 of 39.8 %, while the theoretical probability mass is 49.71 %; k=2 is oversampled, the other k's look ok:

check frequency of k=1 and k=2 at N = 1000000
0.398406 0.296465
pmf at k = 1 and k = 2 with formula
[ 0.4971  0.1988]

For probability parameter pr = 0.3, the results are not as bad, but still off: frequency for k=1 of 82.6 %, while the theoretical probability mass is 84.11 %:

check frequency of k=1 and k=2 at N = 1000000
0.826006 0.141244
pmf at k = 1 and k = 2 with formula
[ 0.8411  0.1262]

Below is a quick script for checking this.

Josef

{{{
import numpy as np
from scipy import stats

pr = 0.8
np.set_printoptions(precision=2, suppress=True)
# calculation for N = 1 million takes some time
for N in [1000, 10000, 100000, 1000000]:
    rvsn = np.random.logseries(pr, size=N)
    fr = stats.itemfreq(rvsn)
    pmfs = stats.logser.pmf(fr[:, 0], pr) * 100
    print 'log series sample frequency and pmf (in %) with N = ', N
    print np.column_stack((fr[:, 0], fr[:, 1] * 100.0 / N, pmfs))

np.set_printoptions(precision=4, suppress=True)
print 'check frequency of k=1 and k=2 at N = ', N
print np.sum(rvsn == 1) / float(N),
print np.sum(rvsn == 2) / float(N)
k = np.array([1, 2])
print 'pmf at k = 1 and k=2 with formula'
print -pr**k * 1.0 / k / np.log(1 - pr)
}}}
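For context, the sampler in distributions.c follows Kemp's generator for the logarithmic series distribution. A pure-Python transcription with the final inequality the right way round (a sketch under that assumption, not numpy's actual C source) recovers the theoretical masses P(1) ≈ 0.4971 and P(2) ≈ 0.1988 at pr = 0.8:

```python
import math
import random

def logseries(p, rng):
    """Kemp's sampler for the logarithmic series distribution (a sketch)."""
    r = math.log(1.0 - p)
    while True:
        V = rng.random()
        if V >= p:              # q = 1-(1-p)**U never exceeds p, so k must be 1
            return 1
        U = rng.random()
        q = 1.0 - math.exp(r * U)
        if V <= q * q:          # here ln V / ln q >= 2, so k >= 3
            k = int(math.floor(1.0 + math.log(V) / math.log(q)))
            if k < 1:           # guard against floating-point trouble
                continue
            return k
        if V >= q:              # reversing this test moves mass from k=1 to k=2
            return 1
        return 2

rng = random.Random(12345)
sample = [logseries(0.8, rng) for _ in range(200000)]
freq1 = sample.count(1) / len(sample)
freq2 = sample.count(2) / len(sample)
# Theoretical pmf -p**k / (k * ln(1-p)) gives 0.4971 and 0.1988 at p = 0.8.
assert abs(freq1 - 0.4971) < 0.01
assert abs(freq2 - 0.1988) < 0.01
```

The key step is the representation k = floor(1 + ln V / ln q): k equals 1 exactly when V > q and 2 exactly when q*q < V <= q, so flipping the comparison on the second-to-last branch produces the undersampled k=1 and oversampled k=2 that Josef measured.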