[Numpy-discussion] argsort

2013-01-15 Thread Mads Ipsen

Hi,

I simply can't understand this. I'm trying to use argsort to produce 
indices that can be used to sort an array:


  from numpy import *

  indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
  args = argsort(indices, axis=0)
  print indices[args]

gives:

[[[ 1 12]
  [ 4  3]]

 [[ 4  3]
  [11  6]]

 [[ 8  9]
  [23  7]]

 [[11  6]
  [ 8  9]]

 [[23  7]
  [ 1 12]]]

I thought this should produce a sorted version of the indices array.

Any help is appreciated.

Best regards,

Mads

--
+-+
| Mads Ipsen  |
+--+--+
| Gåsebæksvej 7, 4. tv |  |
| DK-2500 Valby| phone:  +45-29716388 |
| Denmark  | email:  mads.ip...@gmail.com |
+--+--+

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread Charles R Harris
On Tue, Jan 15, 2013 at 4:50 AM, Mads Ipsen madsip...@gmail.com wrote:

  Hi,

 I simply can't understand this. I'm trying to use argsort to produce
 indices that can be used to sort an array:

   from numpy import *

   indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
   args = argsort(indices, axis=0)
   print indices[args]

 gives:

 [[[ 1 12]
   [ 4  3]]

  [[ 4  3]
   [11  6]]

  [[ 8  9]
   [23  7]]

  [[11  6]
   [ 8  9]]

  [[23  7]
   [ 1 12]]]

 I thought this should produce a sorted version of the indices array.

 Any help is appreciated.


Fancy indexing is a funny creature and not easy to understand in more than
one dimension. What is happening is that each index is replaced by the
corresponding row of a and the result is of shape (5,2,2). To do what you
want to do:

In [20]: a[i, [[0,1]]*5]
Out[20]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])

I agree that there should be an easier way to do this.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread Robert Kern
On Tue, Jan 15, 2013 at 3:44 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Fancy indexing is a funny creature and not easy to understand in more than
 one dimension. What is happening is that each index is replaced by the
 corresponding row of a and the result is of shape (5,2,2). To do what you
 want to do:

 In [20]: a[i, [[0,1]]*5]
 Out[20]:
 array([[ 1,  3],
[ 4,  6],
[ 8,  7],
[11,  9],
[23, 12]])

 I agree that there should be an easier way to do this.

Slightly easier, though no more transparent:

  a[i, [0,1]]

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dtype reduction [SOLVED]

2013-01-15 Thread Nicolas Rougier


I ended coding the dtype reduction, it's not foolproof but it might be useful 
for others as well.

Nicolas



import numpy as np

def dtype_reduce(dtype, level=0, depth=0):

Try to reduce dtype up to a given level when it is possible

dtype =  [ ('vertex',  [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
   ('normal',  [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
   ('color',   [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 
'f4')])]

level 0: ['color,vertex,normal,', 10, 'float32']
level 1: [['color', 4, 'float32']
  ['normal', 3, 'float32']
  ['vertex', 3, 'float32']]

dtype = np.dtype(dtype)
fields = dtype.fields

# No fields
if fields is None:
if dtype.shape:
count = reduce(mul, dtype.shape)
else:
count = 1
size = dtype.itemsize/count
if dtype.subdtype:
name = str( dtype.subdtype[0] )
else:
name = str( dtype )
return ['', count, name]
else:
items = []
name = ''
# Get reduced fields
for key,value in fields.items():
l =  dtype_reduce(value[0], level, depth+1)
if type(l[0]) is str:
items.append( [key, l[1], l[2]] )
else:
items.append( l )
name += key+','

# Check if we can reduce item list
ctype = None
count = 0
for i,item in enumerate(items):
# One item is a list, we cannot reduce
if type(item[0]) is not str:
return items
else:
if i==0:
ctype = item[2]
count += item[1]
else:
if item[2] != ctype:
return items
count += item[1]
if depth = level:
return [name, count, ctype]
else:
return items

if __name__ == '__main__':

# Fully reductible
dtype =  [ ('vertex',  [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
   ('normal',  [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
   ('color',   [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 
'f4')])]
print 'level 0:'
print dtype_reduce(dtype,level=0)
print 'level 1:'
print dtype_reduce(dtype,level=1)
print

# Not fully reductible
dtype =  [ ('vertex',  [('x', 'i4'), ('y', 'i4'), ('z', 'i4')]),
   ('normal',  [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
   ('color',   [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 
'f4')])]
print 'level 0:'
print dtype_reduce(dtype,level=0)
print

# Not reductible at all
dtype =  [ ('vertex',  [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]),
   ('normal',  [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]),
   ('color',   [('r', 'f4'), ('g', 'f4'), ('b', 'i4'), ('a', 
'f4')])]
print 'level 0:'
print dtype_reduce(dtype,level=0)





On Dec 27, 2012, at 9:11 , Nicolas Rougier wrote:

 
 Yep, I'm trying to construct dtype2 programmaticaly and was hoping for some 
 function giving me a canonical expression of the dtype. I've started 
 playing with fields but it's just a bit harder than I though (lot of 
 different cases and recursion).
 
 Thanks for the answer.
 
 
 Nicolas
 
 On Dec 27, 2012, at 1:32 , Nathaniel Smith wrote:
 
 On Wed, Dec 26, 2012 at 8:09 PM, Nicolas Rougier
 nicolas.roug...@inria.fr wrote:
 
 
 Hi all,
 
 
 I'm looking for a way to reduce dtype1 into dtype2 (when it is possible 
 of course).
 Is there some easy way to do that by any chance ?
 
 
 dtype1 = np.dtype( [ ('vertex',  [('x', 'f4'),
 ('y', 'f4'),
 ('z', 'f4')]),
   ('normal',  [('x', 'f4'),
('y', 'f4'),
('z', 'f4')]),
   ('color',   [('r', 'f4'),
('g', 'f4'),
('b', 'f4'),
('a', 'f4')]) ] )
 
 dtype2 = np.dtype( [ ('vertex',  'f4', 3),
('normal',  'f4', 3),
('color',   'f4', 4)] )
 
 
 If you have an array whose dtype is dtype1, and you want to convert it
 into an array with dtype2, then you just do
 my_dtype2_array = my_dtype1_array.view(dtype2)
 
 If you have dtype1 and you want to programmaticaly construct dtype2,
 then that's a little more fiddly and depends on what exactly you're
 trying to do, but start by poking around with dtype1.names and
 dtype1.fields, which contain information on how dtype1 is put together
 in the form of regular python structures.
 
 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 

[Numpy-discussion] algorithm for faster median calculation ?

2013-01-15 Thread Jerome Caron
Dear all,
I am new to the Numpy-discussion list.
I would like to follow up some possibly useful information about calculating 
median.
The message below was posted today on the AstroPy mailing list.
Kind regards
Jerome Caron

#
I think the calculation of median values in Numpy is not optimal. I don't know 
if there are other libraries that do better?
On my machine I get these results:
 data = numpy.random.rand(5000,5000)
 t0=time.time();print numpy.ma.median(data);print time.time()-t0
0.499845739822
15.194332
 t0=time.time();print numpy.median(data);print time.time()-t0
0.499845739822
4.3219918
 t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0
[ 0.49984574]
0.9047139


The median calculation in Aspylib is using C code from Nicolas Devillard (can 
be found here: http://ndevilla.free.fr/median/index.html) interfaced with 
ctypes.
It could be easily re-used for other, more official packages. I think the code 
also finds quantiles efficiently.
See: http://www.aspylib.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] algorithm for faster median calculation ?

2013-01-15 Thread Christoph Deil

On Jan 15, 2013, at 8:31 PM, Jerome Caron jerome_caron_as...@ymail.com wrote:

 Dear all,
 I am new to the Numpy-discussion list.
 I would like to follow up some possibly useful information about calculating 
 median.
 The message below was posted today on the AstroPy mailing list.
 Kind regards
 Jerome Caron
  
 #
 I think the calculation of median values in Numpy is not optimal. I don't 
 know if there are other libraries that do better?
 On my machine I get these results:
  data = numpy.random.rand(5000,5000)
  t0=time.time();print numpy.ma.median(data);print time.time()-t0
 0.499845739822
 15.194332
  t0=time.time();print numpy.median(data);print time.time()-t0
 0.499845739822
 4.3219918
  t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0
 [ 0.49984574]
 0.9047139
 
 The median calculation in Aspylib is using C code from Nicolas Devillard (can 
 be found here: http://ndevilla.free.fr/median/index.html) interfaced with 
 ctypes.
 It could be easily re-used for other, more official packages. I think the 
 code also finds quantiles efficiently.
 See: http://www.aspylib.com/
  
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi Jerome,

some of the numpy devs are already discussing how to best implement the fast 
median for numpy here:
https://github.com/numpy/numpy/issues/1811 median in average O(n) time

If you want to get an email when someone posts a comment on that github ticket, 
sign up for a free github account, then click on watch tread at the bottom of 
that issue.

Note that numpy is BSD-licensed, so they can't take GPL-licensed code.
But I think looking at the method you have in aspylib is OK, so thanks for 
sharing!

Christoph___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] algorithm for faster median calculation ?

2013-01-15 Thread Sturla Molden
You might want to look at this first:

https://github.com/numpy/numpy/issues/1811

Yes it is possible to compute the median faster by doing quickselect 
instead of quicksort. Best case O(n) for quickselect, O(n log n) for 
quicksort. But adding selection and partial sorting to NumPy is a bigger 
issue than just computing medians and percentiles faster.

If we are to do this I think we should add partial sorting and selection 
to npysort, not patch in some C or Cython quickselect just for the 
median. When npysort has quickselect, changing the Python code to use it 
for medians and percentiles is a nobrainer.

https://github.com/numpy/numpy/tree/master/numpy/core/src/npysort



Sturla



On 15.01.2013 20:31, Jerome Caron wrote:
 Dear all,
 I am new to the Numpy-discussion list.
 I would like to follow up some possibly useful information about
 calculating median.
 The message below was posted today on the AstroPy mailing list.
 Kind regards
 Jerome Caron
 #
 I think the calculation of median values in Numpy is not optimal. I
 don't know if there are other libraries that do better?
 On my machine I get these results:
   data = numpy.random.rand(5000,5000)
   t0=time.time();print numpy.ma.median(data);print time.time()-t0
 0.499845739822
 15.194332
   t0=time.time();print numpy.median(data);print time.time()-t0
 0.499845739822
 4.3219918
   t0=time.time();print aspylib.astro.get_median(data);print
 time.time()-t0
 [ 0.49984574]
 0.9047139
  
 The median calculation in Aspylib is using C code from Nicolas Devillard
 (can be found here: http://ndevilla.free.fr/median/index.html
 http://ndevilla.free.fr/median/index.html) interfaced with ctypes.
 It could be easily re-used for other, more official packages. I think
 the code also finds quantiles efficiently.
 See: http://www.aspylib.com/


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread eat
Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com wrote:

  Hi,

 I simply can't understand this. I'm trying to use argsort to produce
 indices that can be used to sort an array:

   from numpy import *

   indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
   args = argsort(indices, axis=0)
   print indices[args]

 gives:

 [[[ 1 12]
   [ 4  3]]

  [[ 4  3]
   [11  6]]

  [[ 8  9]
   [23  7]]

  [[11  6]
   [ 8  9]]

  [[23  7]
   [ 1 12]]]

 I thought this should produce a sorted version of the indices array.

 Any help is appreciated.

Perhaps these three different point of views will help you a little bit
more to move on:
In []: x
Out[]:
array([[ 4,  3],
   [ 1, 12],
   [23,  7],
   [11,  6],
   [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
   [0, 3],
   [4, 2],
   [3, 4],
   [2, 1]])

In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
   [ 4,  3],
   [ 8,  9],
   [11,  6],
   [23,  7]])

In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
   [11,  6],
   [23,  7],
   [ 8,  9],
   [ 1, 12]])

In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])
-eat


 Best regards,

 Mads

  --
 +-+
 | Mads Ipsen  |
 +--+--+
 | Gåsebæksvej 7, 4. tv |  |
 | DK-2500 Valby| phone:  +45-29716388 |
 | Denmark  | email:  mads.ip...@gmail.com |
 +--+--+



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] algorithm for faster median calculation ?

2013-01-15 Thread Sturla Molden



On 15.01.2013 20:50, Sturla Molden wrote:

You might want to look at this first:

https://github.com/numpy/numpy/issues/1811

Yes it is possible to compute the median faster by doing quickselect
instead of quicksort. Best case O(n) for quickselect, O(n log n) for
quicksort. But adding selection and partial sorting to NumPy is a bigger
issue than just computing medians and percentiles faster.



Anyway, here is the code, a bit updated.

I prefer quickselect with a better pivot though.

Sturla

# proposed avg. O(n) replacement for NumPy's median
# (C) 2009 Sturla Molden
# SciPy license


import numpy as np

try:
from quickselect import select 

except ImportError:

def _select(a, k):
 Python quickselect for reference only 
l = 0
r = a.shape[0] - 1
while l  r:
x = a[k]
i = l
j = r
while 1:
while a[i]  x: i += 1
while x  a[j]: j -= 1
if i = j:
tmp = a[i]
a[i] = a[j]
a[j] = tmp
i += 1
j -= 1
if i  j: break
if j  k: l = i
if k  i: r = j

def select(a, k, inplace=False):
'''
Wirth's version of Hoare's quick select

Parameters
--
a : array_like
k : integer
inplace : boolean
The partial sort is done inplace if a is a
contiguous ndarray and ndarray and inplace=True. Default: False. 

Returns
---
out : ndarray
Partially sorted a such that out[k] is
the k largest element. Elements smaller than
out[k] are unsorted in out[:k]. Elements larger
than out[k] are unsorted in out[k:].

Python version for reference only!

'''

if inplace:
_a = np.ascontiguousarray(a)
else:
_a = np.array(a)
_select(_a,k)
return _a


def _median(x, inplace):
assert(x.ndim == 1)
n = x.shape[0]
if n  3:
k = n  1
s = select(x, k, inplace=inplace)
if n  1:
return s[k]
else:
return 0.5*(s[k]+s[:k].max())  
elif n == 0:
return np.nan
elif n == 2:
return 0.5*(x[0]+x[1])
else: # n == 3
s = select(x, 1, inplace=inplace)
return s[1]



def median(a, axis=None, out=None, overwrite_input=False):

Compute the median along the specified axis.

Returns the median of the array elements.

Parameters
--
a : array_like
Input array or object that can be converted to an array.
axis : {None, int}, optional
Axis along which the medians are computed. The default (axis=None)
is to compute the median along a flattened version of the array.
out : ndarray, optional
Alternative output array in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type (of the output) will be cast if necessary.
overwrite_input : {False, True}, optional
   If True, then allow use of memory of input array (a) for
   calculations. The input array will be modified by the call to
   median. This will save memory when you do not need to preserve
   the contents of the input array. Treat the input as undefined,
   but it will probably be fully or partially sorted. Default is
   False. Note that, if `overwrite_input` is True and the input
   is not already an ndarray, an error will be raised.

Returns
---
median : ndarray
A new array holding the result (unless `out` is specified, in
which case that array is returned instead).  If the input contains
integers, or floats of smaller precision than 64, then the output
data-type is float64.  Otherwise, the output data-type is the same
as that of the input.

See Also

mean

Notes
-
Given a vector V of length N, the median of V is the middle value of
a sorted copy of V, ``V_sorted`` - i.e., ``V_sorted[(N-1)/2]``, when N is
odd.  When N is even, it is the average of the two middle values of
``V_sorted``.

Examples

 a = np.array([[10, 7, 4], [3, 2, 1]])
 a
array([[10,  7,  4],
   [ 3,  2,  1]])
 np.median(a)
3.5
 np.median(a, axis=0)
array([ 6.5,  4.5,  2.5])
 np.median(a, axis=1)
array([ 7.,  2.])
 m = np.median(a, axis=0)
 out = np.zeros_like(m)
 np.median(a, axis=0, out=m)
array([ 6.5,  4.5,  2.5])
 m
array([ 6.5,  4.5,  2.5])
 b = a.copy()
 np.median(b, axis=1, overwrite_input=True)
array([ 7.,  2.])
 assert not np.all(a==b)
 b = a.copy()
 np.median(b, axis=None, overwrite_input=True)
3.5
 assert not np.all(a==b)



if overwrite_input and not isinstance(a,