Re: [Numpy-discussion] using loadtxt to load a text file in to a numpy array
On 17/01/14 13:09, Aldcroft, Thomas wrote:
> I've been playing around with porting a stack of analysis libraries to Python 3 and this is a very timely thread and comment. What I discovered right away is that all the string data coming from binary HDF5 files show up (as expected) as 'S' type, but that trying to make everything actually work in Python 3 without converting to 'U' is a big mess of whack-a-mole. Yes, it's possible to change my libraries to use bytestring literals everywhere, but the Python 3 user experience becomes horrible because to interact with the data all downstream applications need to use bytestring literals everywhere. E.g. doing a simple filter like `string_array == 'foo'` doesn't work, and this will break all existing code when trying to run in Python 3. And every time you try to print something it has this horrible b in front. Ugly, and it just won't work well in the end.

In terms of HDF5 it is interesting to look at how h5py -- which has to go between NumPy types and HDF5 conventions -- handles the problem as described here: http://www.h5py.org/docs/topics/strings.html which IMHO got it about right.

Regards, Freddie.

signature.asc Description: OpenPGP digital signature
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
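The 'S' versus 'U' point above can be illustrated with a minimal sketch. The array contents and variable names here are made up for illustration, and ASCII-only data is assumed; this is not how h5py itself handles strings.

```python
import numpy as np

# Hypothetical data standing in for strings read from a binary file:
# NumPy stores these as fixed-width byte strings (dtype kind 'S').
raw = np.array([b"foo", b"bar", b"foo"])

# Decode once at the boundary so downstream code can use ordinary
# str literals (dtype kind 'U'). Assumes the bytes are ASCII.
text = np.char.decode(raw, "ascii")

# A plain str comparison now filters as a Python 3 user would expect.
mask = text == "foo"
selected = text[mask]
```

Decoding once where the data enters the program keeps bytestring literals out of downstream code, at the cost of the extra memory a 'U' array needs.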
Re: [Numpy-discussion] adding fused multiply and add to numpy
On 08/01/14 21:39, Julian Taylor wrote:
> An issue is software emulation of real fma. This can be enabled in the test ufunc with npfma.set_type(libc). This is unfortunately incredibly slow, about a factor of 300 on my machine without hardware fma. This means we either have a function that is fast on some platforms and slow on others but always gives the same result, or we have a fast function that gives better results on some platforms. Given that we are not worse than what numpy currently provides I favor the latter. Any opinions on whether this should go into numpy or maybe stay a third party ufunc?

My preference would be to initially add an madd intrinsic. This can be supported on all platforms and can be documented to permit the use of FMA where available.

A 'true' FMA intrinsic function should only be provided when hardware FMA support is available. Many of the more interesting applications of FMA depend on there only being a single rounding step and as such FMA should probably mean a*b + c with only a single rounding.

Regards, Freddie.
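The difference between a two-rounding multiply-add and a single-rounding FMA can be seen in a small sketch. The function names madd and fma_reference are hypothetical and are not part of NumPy or of the npfma test ufunc; the single-rounding result is emulated with exact rational arithmetic, which is slow and only meant to show the numerical effect.

```python
from fractions import Fraction

def madd(a, b, c):
    # Ordinary multiply-add: the product is rounded to a double,
    # then the sum is rounded again (two rounding steps).
    return a * b + c

def fma_reference(a, b, c):
    # Emulated fused multiply-add: a*b + c is formed exactly as a
    # rational number and rounded to a double only once at the end.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so that the exact product is 1 - 2**-60, which rounds
# to 1.0 when stored as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = madd(a, b, c)          # the low-order bits of a*b are lost
fused = fma_reference(a, b, c)   # the low-order bits survive
```

With these inputs the two-rounding version returns exactly zero while the single-rounding version returns -2**-60, which is the kind of residual that FMA-dependent algorithms rely on.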
[Numpy-discussion] Padding An Array Along A Single Axis
Hi all,

This should be an easy one but I cannot come up with a good solution. Given an ndarray with a shape of (..., X) I wish to zero-pad it to have a shape of (..., X + K), presumably obtaining a new array in the process.

My best solution thus far is to use np.zeros(curr.shape[:-1] + (curr.shape[-1] + K,)) followed by an assignment. However, this seems needlessly cumbersome. I looked at np.pad but it does not seem to provide a means of just padding a single axis easily.

Regards, Freddie.
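One possible way to pad only the last axis with np.pad is to build the per-axis pad widths explicitly, giving (0, 0) for every leading axis. The helper name pad_last_axis is hypothetical, and the sketch assumes the array has at least one dimension.

```python
import numpy as np

def pad_last_axis(arr, k):
    # No padding on the leading axes, k trailing zeros on the last axis.
    pad_width = [(0, 0)] * (arr.ndim - 1) + [(0, k)]
    return np.pad(arr, pad_width, mode="constant", constant_values=0)

curr = np.arange(6.0).reshape(2, 3)   # shape (2, 3)
padded = pad_last_axis(curr, 2)       # shape (2, 5), a new array
```

This returns a new array, as the np.zeros-plus-assignment approach does, and leaves curr unchanged.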
Re: [Numpy-discussion] Robust Sorting of Points
On 28/10/2013 12:44, Pierre Haessig wrote:
> Hi,
>
> On 27/10/2013 19:28, Freddie Witherden wrote:
>> I wish to sort these points into a canonical order in a fashion which is robust against small perturbations. In other words changing any component of any of the points by an epsilon ~ 1e-12 should not affect the resulting sorted order.
>
> Can you be more precise about what you mean by canonical order? Since there is no natural order in R^n for n > 1, I guess your problem is more about *defining* what is the order you want rather than *implementing* it in C/Python or whatever.

The order itself does not need to satisfy any specific properties. Any order which can be defined and implemented in a robust fashion will do.

Regards, Freddie.
[Numpy-discussion] Robust Sorting of Points
Hi all,

This is a question which has been bugging me for a while. I have an (N, 3) array of points where N ~ 16. These points are all unique and separated by a reasonable distance. I wish to sort these points into a canonical order in a fashion which is robust against small perturbations. In other words changing any component of any of the points by an epsilon ~ 1e-12 should not affect the resulting sorted order.

Considering a direct application of np.lexsort:

    In [6]: my_array = np.array([[-0.5, 0, 2**0.5], [0.5, 0, 2**0.5 - 1e-15]])
    In [7]: my_array[np.lexsort(my_array.T)]
    Out[7]:
    array([[ 0.5 , 0., 1.41421356],
           [-0.5 , 0., 1.41421356]])

However, if the small 1e-15 perturbation is removed the order changes to the 'natural' ordering. Hence, np.lexsort is out. Rounding the array before sorting is not suitable either; just because abs(a - b) < epsilon does not mean that np.around(a, decimals=x) == np.around(b, decimals=x).

I am therefore looking for an efficient (that is, within a factor of 10 of np.lexsort) solution to the problem. I've looked at writing my own comparison function cmp(x, y) which looks at the next dimension if abs(x[i] - y[i]) < epsilon; however, using this with sorted is thousands of times slower. Given that I have well over 100,000 of these arrays this is a nuisance.

My other idea is therefore to find a means of quickly replacing all numbers within 10*epsilon of a given number in an array with that number. This should permit the application of np.lexsort in order to obtain the desired ordering (which is what I'm interested in). However, I am yet to figure out how to do this efficiently.

Before I throw in the towel and drop down to C are there any other neat tricks I am missing?

Regards, Freddie.
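The remark that rounding does not help can be checked with a short sketch. The values of a and b and the choice of six decimals are picked purely for illustration and do not come from the original post.

```python
import numpy as np

# Two values that differ by 2e-13 but sit on either side of a rounding
# boundary at six decimal places (the boundary is 0.5e-6).
a = 5e-7 + 1e-13
b = 5e-7 - 1e-13

close = abs(a - b) < 1e-12      # the values agree to within epsilon
rounded_a = np.around(a, decimals=6)
rounded_b = np.around(b, decimals=6)
same_after_rounding = rounded_a == rounded_b
```

Here close is true while same_after_rounding is false: however many decimals are kept, some pair of nearly equal values straddles a rounding boundary, so rounding before np.lexsort can still separate them.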
Re: [Numpy-discussion] Robust Sorting of Points
On 27/10/13 18:35, Nathaniel Smith wrote:
> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden fred...@witherden.org wrote:
>> Hi all, This is a question which has been bugging me for a while. I have an (N, 3) array of points where N ~ 16. These points are all unique and separated by a reasonable distance. I wish to sort these points into a canonical order in a fashion which is robust against small perturbations. In other words changing any component of any of the points by an epsilon ~ 1e-12 should not affect the resulting sorted order.
>
> I don't understand how this is possible even in principle. Say your points are
>
>     a = [0, 0, 0]
>     b = [0, 0, 1e-12]
>
> According to your criterion, either a or b should go first -- I don't know which. Let's say our canonical ordering decides that a goes first. But if you perturb both of them, then you have
>
>     a = [0, 0, 1e-12]
>     b = [0, 0, 0]
>
> And now your requirement says that a still has to go first. But if a goes first this time, then b had to go first the last time, by symmetry. Thus your criterion is self-contradictory...?

Not exactly; in your case the distance between a and b is of the order epsilon. However, my points are all unique and separated by a reasonable distance. This requires at least one of the components of any two points to differ in all instances, permitting an ordering to be defined. (Where if epsilon ~ 1e-12 the minimum distance between any two points is of order ~ 1e-6.)

Regards, Freddie.
Re: [Numpy-discussion] Robust Sorting of Points
On 27/10/13 18:54, Daniele Nicolodi wrote:
> On 27/10/2013 19:42, Freddie Witherden wrote:
>> On 27/10/13 18:35, Nathaniel Smith wrote:
>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden fred...@witherden.org wrote:
>>>> Hi all, This is a question which has been bugging me for a while. I have an (N, 3) array of points where N ~ 16. These points are all unique and separated by a reasonable distance. I wish to sort these points into a canonical order in a fashion which is robust against small perturbations. In other words changing any component of any of the points by an epsilon ~ 1e-12 should not affect the resulting sorted order.
>>>
>>> I don't understand how this is possible even in principle. Say your points are
>>>
>>>     a = [0, 0, 0]
>>>     b = [0, 0, 1e-12]
>>>
>>> According to your criterion, either a or b should go first -- I don't know which. Let's say our canonical ordering decides that a goes first. But if you perturb both of them, then you have
>>>
>>>     a = [0, 0, 1e-12]
>>>     b = [0, 0, 0]
>>>
>>> And now your requirement says that a still has to go first. But if a goes first this time, then b had to go first the last time, by symmetry. Thus your criterion is self-contradictory...?
>>
>> Not exactly; in your case the distance between a and b is of the order epsilon. However, my points are all unique and separated by a reasonable distance. This requires at least one of the components of any two points to differ in all instances, permitting an ordering to be defined. (Where if epsilon ~ 1e-12 the minimum distance between any two points is of order ~ 1e-6.)
>
> Do you mean that all your points are distributed around some fixed points in your space? In this case, it looks like what you are looking for is categorization or clustering and not sorting. Once you perform clustering, you can simply define an arbitrary order in which to report the content of each cluster. If this is not the case, the problem that Nathaniel highlights is still present.

I am not entirely sure what you mean here. If x is my array of points of size (16, 3) then I am guaranteeing that

    np.min(scipy.spatial.distance.pdist(x)) >= 1e-6

In this instance I am unsure how the issue highlighted by Nathaniel might arise. Of course it is (very) possible that I am missing something; however, I believe under the terms of this constraint that it is always possible to define an order with which to iterate through the points which is invariant to shuffling of the points and small perturbations of the components.

Regards, Freddie.
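The separation guarantee stated above can be checked numerically. The following sketch uses plain NumPy instead of scipy.spatial.distance.pdist so that it has no SciPy dependency; the helper name min_pairwise_distance and the sample points are illustrative assumptions.

```python
import numpy as np

def min_pairwise_distance(x):
    # Euclidean distance between every pair of rows of x, computed by
    # broadcasting; only the pairs with i < j are kept.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(x), k=1)
    return dist[upper].min()

# Hypothetical points: the closest pair is 1e-3 apart.
x = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1e-3],
              [1.0, 0.0, 0.0]])

well_separated = min_pairwise_distance(x) >= 1e-6
```

For N ~ 16 the (N, N, 3) intermediate array is tiny, so the quadratic cost of this check is negligible.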
Re: [Numpy-discussion] Robust Sorting of Points
On 27/10/13 20:22, josef.p...@gmail.com wrote:
> On Sun, Oct 27, 2013 at 3:22 PM, Freddie Witherden fred...@witherden.org wrote:
>> On 27/10/13 18:54, Daniele Nicolodi wrote:
>>> On 27/10/2013 19:42, Freddie Witherden wrote:
>>>> On 27/10/13 18:35, Nathaniel Smith wrote:
>>>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden fred...@witherden.org wrote:
>>>>>> Hi all, This is a question which has been bugging me for a while. I have an (N, 3) array of points where N ~ 16. These points are all unique and separated by a reasonable distance. I wish to sort these points into a canonical order in a fashion which is robust against small perturbations. In other words changing any component of any of the points by an epsilon ~ 1e-12 should not affect the resulting sorted order.
>>>>>
>>>>> I don't understand how this is possible even in principle. Say your points are
>>>>>
>>>>>     a = [0, 0, 0]
>>>>>     b = [0, 0, 1e-12]
>>>>>
>>>>> According to your criterion, either a or b should go first -- I don't know which. Let's say our canonical ordering decides that a goes first. But if you perturb both of them, then you have
>>>>>
>>>>>     a = [0, 0, 1e-12]
>>>>>     b = [0, 0, 0]
>>>>>
>>>>> And now your requirement says that a still has to go first. But if a goes first this time, then b had to go first the last time, by symmetry. Thus your criterion is self-contradictory...?
>>>>
>>>> Not exactly; in your case the distance between a and b is of the order epsilon. However, my points are all unique and separated by a reasonable distance. This requires at least one of the components of any two points to differ in all instances, permitting an ordering to be defined. (Where if epsilon ~ 1e-12 the minimum distance between any two points is of order ~ 1e-6.)
>>>
>>> Do you mean that all your points are distributed around some fixed points in your space? In this case, it looks like what you are looking for is categorization or clustering and not sorting. Once you perform clustering, you can simply define an arbitrary order in which to report the content of each cluster. If this is not the case, the problem that Nathaniel highlights is still present.
>>
>> I am not entirely sure what you mean here. If x is my array of points of size (16, 3) then I am guaranteeing that
>>
>>     np.min(scipy.spatial.distance.pdist(x)) >= 1e-6
>>
>> In this instance I am unsure how the issue highlighted by Nathaniel might arise. Of course it is (very) possible that I am missing something; however, I believe under the terms of this constraint that it is always possible to define an order with which to iterate through the points which is invariant to shuffling of the points and small perturbations of the components.
>
> If the epsilon or scale depends on the column, then, I think, divmod should work to cut off the noise
>
>     my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
>     array([[-0.5 , 0., 1.41421356],
>            [ 0.5 , 0., 1.41421356]])

An interesting proposal. However, it appears to have the same issues as the rounding approach. Consider:

    In [5]: a, b = 1.0 + 1e-13, 1.0 - 1e-13
    In [6]: abs(a - b) < 1e-12
    Out[6]: True
    In [7]: divmod(a, 1e-6)[0] == divmod(b, 1e-6)[0]
    Out[7]: False

Hence should np.lexsort encounter such a pair it will consider a and b to be different even though they are within epsilon of one another.

Regards, Freddie.
Re: [Numpy-discussion] Robust Sorting of Points
On 27/10/13 21:05, Jonathan March wrote:
> If an almost always works solution is good enough, then sort on the distance to some fixed random point that is in the vicinity of your N points.

I had considered this. Unfortunately I need a solution which really does always work.

The only pure-Python solution I can envision -- at the moment anyway -- is to do some cleverness with the output of np.unique to identify similar values and replace them with an arbitrarily chosen one. This should permit the output to be passed to np.lexsort without issue.

Regards, Freddie.
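The idea of replacing similar values with one representative before calling np.lexsort might look something like the sketch below. It uses np.argsort and np.diff instead of np.unique, and the function names, the tolerance default, and the grouping rule are all assumptions made for illustration. It also relies on an extra assumption not stated in the thread: any two component values are either within tol of each other or separated by much more than tol. Values that drift apart by close to tol, or chains of values each within tol of the next, are not handled robustly.

```python
import numpy as np

def snap_close_values(points, tol=1e-9):
    # Sort all components together and start a new group wherever the
    # gap to the previous sorted value exceeds tol.
    points = np.asarray(points, dtype=float)
    flat = points.ravel()
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    new_group = np.concatenate(([True], np.diff(sorted_vals) > tol))
    group_ids = np.cumsum(new_group) - 1

    # Every member of a group is replaced by the group's smallest value.
    representatives = sorted_vals[new_group]
    snapped = np.empty_like(flat)
    snapped[order] = representatives[group_ids]
    return snapped.reshape(points.shape)

def robust_lexsort(points, tol=1e-9):
    # As in np.lexsort(my_array.T), the last column is the primary key.
    return np.lexsort(snap_close_values(points, tol).T)

perturbed = np.array([[-0.5, 0, 2**0.5], [0.5, 0, 2**0.5 - 1e-15]])
clean = np.array([[-0.5, 0, 2**0.5], [0.5, 0, 2**0.5]])

order_perturbed = robust_lexsort(perturbed)
order_clean = robust_lexsort(clean)
```

For the two-point example from earlier in the thread, both the perturbed and the unperturbed arrays now sort into the same order, because the two nearly equal third components are snapped to a common value before sorting.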