Re: [Numpy-discussion] using loadtxt to load a text file in to a numpy array

2014-01-17 Thread Freddie Witherden
On 17/01/14 13:09, Aldcroft, Thomas wrote:
 I've been playing around with porting a stack of analysis libraries to
 Python 3 and this is a very timely thread and comment.  What I
 discovered right away is that all the string data coming from binary
 HDF5 files show up (as expected) as 'S' type, but that trying to make
 everything actually work in Python 3 without converting to 'U' is a big
 mess of whack-a-mole.  
 
 Yes, it's possible to change my libraries to use bytestring literals
 everywhere, but the Python 3 user experience becomes horrible because to
 interact with the data all downstream applications need to use
 bytestring literals everywhere.  E.g. doing a simple filter like
 `string_array == 'foo'` doesn't work, and this will break all existing
 code when trying to run in Python 3.  And every time you try to print
 something it has this horrible b prefix in front.  Ugly, and it just won't
 work well in the end.

In terms of HDF5 it is interesting to look at how h5py -- which has to
go between NumPy types and HDF5 conventions -- handles the problem as
described here:

  http://www.h5py.org/docs/topics/strings.html

which IMHO got it about right.

Regards, Freddie.





Re: [Numpy-discussion] adding fused multiply and add to numpy

2014-01-09 Thread Freddie Witherden
On 08/01/14 21:39, Julian Taylor wrote:
 An issue is software emulation of real fma. This can be enabled in the
 test ufunc with npfma.set_type(libc).
 This is unfortunately incredibly slow, about a factor of 300 on my machine
 without hardware fma.
 This means we either have a function that is fast on some platforms and
 slow on others but always gives the same result or we have a fast
 function that gives better results on some platforms.
 Given that we are not worse than what numpy currently provides I favor
 the latter.
 
 Any opinions on whether this should go into numpy or maybe stay a third
 party ufunc?

My preference would be to initially add an 'madd' intrinsic.  This can
be supported on all platforms and can be documented to permit the use of
FMA where available.

A 'true' FMA intrinsic function should only be provided when hardware
FMA support is available.  Many of the more interesting applications of
FMA depend on there only being a single rounding step and as such FMA
should probably mean a*b + c with only a single rounding.
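
To make the distinction concrete, here is a small sketch (not from the
thread) contrasting the madd-style a*b + c, which rounds twice, with a
single-rounding result.  Emulating a float32 FMA by computing in float64
is purely illustrative and only works here because these particular
operands multiply exactly in double precision:

  import numpy as np

  a = np.float32(1 + 2**-13)
  b = np.float32(1 - 2**-13)
  c = np.float32(-1)

  # madd-style: the product is rounded to float32, then the sum is rounded.
  two_roundings = a*b + c

  # FMA-style: the product and sum are formed exactly (possible in float64
  # for these operands) and rounded to float32 only once.
  one_rounding = np.float32(np.float64(a)*np.float64(b) + np.float64(c))

  print(two_roundings, one_rounding)   # 0.0 versus ~-1.49e-08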

Regards, Freddie.





[Numpy-discussion] Padding An Array Along A Single Axis

2014-01-03 Thread Freddie Witherden
Hi all,

This should be an easy one but I can not come up with a good solution.
Given an ndarray with a shape of (..., X) I wish to zero-pad it to have
a shape of (..., X + K), presumably obtaining a new array in the process.

My best solution thus far is to use

   np.zeros(curr.shape[:-1] + (curr.shape[-1] + K,))

followed by an assignment.  However, this seems needlessly cumbersome.
I looked at np.pad but it does not seem to provide a means of just
padding a single axis easily.
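
For reference, a minimal sketch of both routes, where curr and K stand in
for the (..., X) array and the amount of trailing padding.  np.pad can be
confined to the last axis by passing a per-axis pad width, though it is
admittedly somewhat verbose:

  import numpy as np

  curr = np.random.rand(4, 5, 7)   # stand-in for the (..., X) array
  K = 3

  # Allocate-and-assign approach described above.
  padded = np.zeros(curr.shape[:-1] + (curr.shape[-1] + K,), dtype=curr.dtype)
  padded[..., :curr.shape[-1]] = curr

  # np.pad restricted to the last axis via a per-axis pad width.
  pad_width = [(0, 0)]*(curr.ndim - 1) + [(0, K)]
  padded2 = np.pad(curr, pad_width, mode='constant')

  assert np.array_equal(padded, padded2)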

Regards, Freddie.





Re: [Numpy-discussion] Robust Sorting of Points

2013-10-29 Thread Freddie Witherden
On 28/10/2013 12:44, Pierre Haessig wrote:
 Hi,
 
 On 27/10/2013 19:28, Freddie Witherden wrote:
 I wish to sort these points into a canonical order in a fashion
 which is robust against small perturbations.  In other words
 changing any component of any of the points by an epsilon ~ 1e-12
 should not affect the resulting sorted order.
 Can you give more precision on what you mean by canonical order?
 Since there is no natural order in R^n for n > 1, I guess your
 problem is more about *defining* what is the order you want rather
 than *implementing* it in C/Python or whatever.

The order itself does not need to satisfy any specific properties.
Any order which can be defined and implemented in a robust fashion
will do.

Regards, Freddie.



[Numpy-discussion] Robust Sorting of Points

2013-10-27 Thread Freddie Witherden
Hi all,

This is a question which has been bugging me for a while.  I have an (N,
3) array of points where N ~ 16.  These points are all unique and
separated by a reasonable distance.

I wish to sort these points into a canonical order in a fashion which is
robust against small perturbations.  In other words changing any
component of any of the points by an epsilon ~ 1e-12 should not affect
the resulting sorted order.

Considering a direct application of np.lexsort:

  In [6]: my_array = np.array([[-0.5, 0, 2**0.5],
                               [0.5, 0, 2**0.5 - 1e-15]])

  In [7]: my_array[np.lexsort(my_array.T)]
  Out[7]: array([[ 0.5       ,  0.        ,  1.41421356],
                 [-0.5       ,  0.        ,  1.41421356]])

however, if the small 1e-15 perturbation is removed the order changes to
the 'natural' ordering.  Hence, np.lexsort is out.

Rounding the array before sorting is not suitable either; just because
(a - b) < epsilon does not mean that np.around(a, decimals=x) ==
np.around(b, decimals=x).
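
As an illustration (the values below are chosen purely to straddle a
rounding boundary):

  import numpy as np

  a = 0.12345675 + 1e-9
  b = 0.12345675 - 1e-9

  abs(a - b) < 1e-8                       # True: within epsilon
  np.around(a, 7) == np.around(b, 7)      # False: 0.1234568 vs 0.1234567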

I am therefore looking for an efficient (within a factor of 10 of
np.lexsort) solution to the problem.  I've looked at writing my own
comparison function cmp(x, y) which looks at the next dimension if
abs(x[i] - y[i]) < epsilon; however, using this with sorted is thousands
of times slower.  Given that I have well over 100,000 of these arrays
this is a nuisance.
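
For reference, a minimal sketch of such a comparator; the epsilon, the use
of functools.cmp_to_key and the random test points are illustrative only,
as the actual code referred to above is not shown:

  from functools import cmp_to_key

  import numpy as np

  def fuzzy_cmp(x, y, eps=1e-12):
      # Compare component by component, moving on whenever the two
      # components agree to within eps.
      for xi, yi in zip(x, y):
          if abs(xi - yi) > eps:
              return -1 if xi < yi else 1
      return 0

  pts = np.random.rand(16, 3)
  ordered = sorted(pts.tolist(), key=cmp_to_key(fuzzy_cmp))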

My other idea is therefore to find a means of quickly replacing all
numbers within 10*epsilon of a given number in an array with that
number.  This should permit the application of np.lexsort in order to
obtain the desired ordering (which is what I'm interested in).
However, I have yet to figure out how to do this efficiently.

Before I throw in the towel and drop down to C are there any other neat
tricks I am missing?

Regards, Freddie.





Re: [Numpy-discussion] Robust Sorting of Points

2013-10-27 Thread Freddie Witherden
On 27/10/13 18:35, Nathaniel Smith wrote:
 On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
 fred...@witherden.org wrote:
 Hi all,

 This is a question which has been bugging me for a while.  I have an (N,
 3) array where N ~ 16 of points.  These points are all unique and
 separated by a reasonable distance.

 I wish to sort these points into a canonical order in a fashion which is
 robust against small perturbations.  In other words changing any
 component of any of the points by an epsilon ~ 1e-12 should not affect
 the resulting sorted order.
 
 I don't understand how this is possible even in principle.
 
 Say your points are
 
  a = [0, 0, 0]
  b = [0, 0, 1e-12]
 
 According to your criterion, either a or b should go first -- I don't
 know which. Let's say our canonical ordering decides that a goes
 first. But if you perturb both of them, then you have
 
  a = [0, 0, 1e-12]
  b = [0, 0, 0]
 
 And now your requirement says that a still has to go first. But if a
 goes first this time, then b had to go first the last time, by
 symmetry. Thus your criterion is self-contradictory...?

Not exactly; in your case the distance between a and b is of the order
of epsilon.  However, my points are all unique and separated by a
reasonable distance.  This requires at least one of the components of
any two points to differ in all instances, permitting an ordering to be
defined.  (Where if epsilon ~ 1e-12 the minimum distance between any two
points is of order ~ 1e-6.)

Regards, Freddie.





Re: [Numpy-discussion] Robust Sorting of Points

2013-10-27 Thread Freddie Witherden
On 27/10/13 18:54, Daniele Nicolodi wrote:
 On 27/10/2013 19:42, Freddie Witherden wrote:
 On 27/10/13 18:35, Nathaniel Smith wrote:
 On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
 fred...@witherden.org wrote:
 Hi all,

 This is a question which has been bugging me for a while.  I have an (N,
 3) array where N ~ 16 of points.  These points are all unique and
 separated by a reasonable distance.

 I wish to sort these points into a canonical order in a fashion which is
 robust against small perturbations.  In other words changing any
 component of any of the points by an epsilon ~ 1e-12 should not affect
 the resulting sorted order.

 I don't understand how this is possible even in principle.

 Say your points are

  a = [0, 0, 0]
  b = [0, 0, 1e-12]

 According to your criterion, either a or b should go first -- I don't
 know which. Let's say our canonical ordering decides that a goes
 first. But if you perturb both of them, then you have

  a = [0, 0, 1e-12]
  b = [0, 0, 0]

 And now your requirement says that a still has to go first. But if a
 goes first this time, then b had to go first the last time, by
 symmetry. Thus your criterion is self-contradictory...?

 Not exactly; in your case the distance between a and b is of the order
 of epsilon.  However, my points are all unique and separated by a
 reasonable distance.  This requires at least one of the components of
 any two points to differ in all instances, permitting an ordering to be
 defined.  (Where if epsilon ~ 1e-12 the minimum distance between any two
 points is of order ~ 1e-6.)
 
 Do you mean that all your points are distributed around some fixed points
 in your space?  In this case, it looks like what you are looking for is
 categorization or clustering and not sorting.  Once you perform
 clustering, you can simply define an arbitrary order in which to report
 the content of each cluster.  If this is not the case, the problem that
 Nathaniel highlights is still present.

I am not entirely sure what you mean here.  If x is my array of points
of size (16, 3) then I am guaranteeing that

  np.min(scipy.spatial.distance.pdist(x)) >= 1e-6

In this instance I am unsure how the issue highlighted by Nathaniel
might arise.  Of course it is (very) possible that I am missing
something; however, I believe under the terms of this constraint that it
is always possible to define an order with which to iterate through the
points which is invariant to shuffling of the points and small
perturbations of the components.

Regards, Freddie.






Re: [Numpy-discussion] Robust Sorting of Points

2013-10-27 Thread Freddie Witherden
On 27/10/13 20:22, josef.p...@gmail.com wrote:
 On Sun, Oct 27, 2013 at 3:22 PM, Freddie Witherden
 fred...@witherden.org wrote:
 On 27/10/13 18:54, Daniele Nicolodi wrote:
 On 27/10/2013 19:42, Freddie Witherden wrote:
 On 27/10/13 18:35, Nathaniel Smith wrote:
 On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
 fred...@witherden.org wrote:
 Hi all,

 This is a question which has been bugging me for a while.  I have an (N,
 3) array where N ~ 16 of points.  These points are all unique and
 separated by a reasonable distance.

 I wish to sort these points into a canonical order in a fashion which is
 robust against small perturbations.  In other words changing any
 component of any of the points by an epsilon ~ 1e-12 should not affect
 the resulting sorted order.

 I don't understand how this is possible even in principle.

 Say your points are

  a = [0, 0, 0]
  b = [0, 0, 1e-12]

 According to your criterion, either a or b should go first -- I don't
 know which. Let's say our canonical ordering decides that a goes
 first. But if you perturb both of them, then you have

  a = [0, 0, 1e-12]
  b = [0, 0, 0]

 And now your requirement says that a still has to go first. But if a
 goes first this time, then b had to go first the last time, by
 symmetry. Thus your criterion is self-contradictory...?

 Not exactly; in your case the distance between a and b is of the order
 of epsilon.  However, my points are all unique and separated by a
 reasonable distance.  This requires at least one of the components of
 any two points to differ in all instances, permitting an ordering to be
 defined.  (Where if epsilon ~ 1e-12 the minimum distance between any two
 points is of order ~ 1e-6.)

 Do you mean that all your points are distributed around some fixed points
 in your space?  In this case, it looks like what you are looking for is
 categorization or clustering and not sorting.  Once you perform
 clustering, you can simply define an arbitrary order in which to report
 the content of each cluster.  If this is not the case, the problem that
 Nathaniel highlights is still present.

 I am not entirely sure what you mean here.  If x is my array of points
 of size (16, 3) then I am guaranteeing that

   np.min(scipy.spatial.distance.pdist(x)) >= 1e-6

 In this instance I am unsure how the issue highlighted by Nathaniel
 might arise.  Of course it is (very) possible that I am missing
 something; however, I believe under the terms of this constraint that it
 is always possible to define an order with which to iterate through the
 points which is invariant to shuffling of the points and small
 perturbations of the components.
 
 
 If the epsilon or scale depends on the column, then, I think, divmod
 should work to cut off the noise:

 my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
 array([[-0.5       ,  0.        ,  1.41421356],
        [ 0.5       ,  0.        ,  1.41421356]])

An interesting proposal.  However, it appears to have the same issues as
the rounding approach.  Consider:

In [5]: a, b = 1.0 + 1e-13, 1.0 - 1e-13

In [6]: abs(a - b) < 1e-12
Out[6]: True

In [7]: divmod(a, 1e-6)[0] == divmod(b, 1e-6)[0]
Out[7]: False

Hence should np.lexsort encounter such a pair it will consider a and b
to be different even though they are within epsilon of one another.

Regards, Freddie.







Re: [Numpy-discussion] Robust Sorting of Points

2013-10-27 Thread Freddie Witherden
On 27/10/13 21:05, Jonathan March wrote:
 If an "almost always works" solution is good enough, then sort on the
 distance to some fixed random point that is in the vicinity of your N
 points.

I had considered this.  Unfortunately I need a solution which really
does always work.

The only pure-Python solution I can envision -- at the moment anyway --
is to do some cleverness with the output of np.unique to identify
similar values and replace them with an arbitrarily chosen one.  This
should permit the output to be passed to np.lexsort without issue.
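
A minimal sketch of that np.unique idea, assuming (as guaranteed earlier
in the thread) that values which should compare equal lie far closer to
one another than to any other value; the function name, the epsilon and
the choice of each group's smallest member as representative are
illustrative only:

  import numpy as np

  def snap(a, eps=1e-12):
      u = np.unique(a)                    # sorted unique values
      # Start a new group wherever the gap to the previous unique value
      # exceeds eps; members of a group are treated as equal.
      grp = np.concatenate(([0], np.cumsum(np.diff(u) > eps)))
      # Representative of each group: its first (smallest) member.
      rep = u[np.searchsorted(grp, grp)]
      # Map every element of a onto its group's representative.
      return rep[np.searchsorted(u, a.ravel())].reshape(a.shape)

  pts = np.array([[-0.5, 0.0, 2**0.5],
                  [ 0.5, 0.0, 2**0.5 - 1e-15]])
  print(pts[np.lexsort(snap(pts).T)])     # perturbation no longer matters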

Regards, Freddie.





