[Numpy-discussion] argsort
Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Best regards, Mads -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone: +45-29716388 | | Denmark | email: mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
On Tue, Jan 15, 2013 at 4:50 AM, Mads Ipsen madsip...@gmail.com wrote: Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Fancy indexing is a funny creature and not easy to understand in more than one dimension. What is happening is that each index is replaced by the corresponding row of a and the result is of shape (5,2,2). To do what you want to do: In [20]: a[i, [[0,1]]*5] Out[20]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) I agree that there should be an easier way to do this. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
On Tue, Jan 15, 2013 at 3:44 PM, Charles R Harris charlesr.har...@gmail.com wrote: Fancy indexing is a funny creature and not easy to understand in more than one dimension. What is happening is that each index is replaced by the corresponding row of a and the result is of shape (5,2,2). To do what you want to do: In [20]: a[i, [[0,1]]*5] Out[20]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) I agree that there should be an easier way to do this. Slightly easier, though no more transparent: a[i, [0,1]] http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] dtype reduction [SOLVED]
I ended coding the dtype reduction, it's not foolproof but it might be useful for others as well. Nicolas import numpy as np def dtype_reduce(dtype, level=0, depth=0): Try to reduce dtype up to a given level when it is possible dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] level 0: ['color,vertex,normal,', 10, 'float32'] level 1: [['color', 4, 'float32'] ['normal', 3, 'float32'] ['vertex', 3, 'float32']] dtype = np.dtype(dtype) fields = dtype.fields # No fields if fields is None: if dtype.shape: count = reduce(mul, dtype.shape) else: count = 1 size = dtype.itemsize/count if dtype.subdtype: name = str( dtype.subdtype[0] ) else: name = str( dtype ) return ['', count, name] else: items = [] name = '' # Get reduced fields for key,value in fields.items(): l = dtype_reduce(value[0], level, depth+1) if type(l[0]) is str: items.append( [key, l[1], l[2]] ) else: items.append( l ) name += key+',' # Check if we can reduce item list ctype = None count = 0 for i,item in enumerate(items): # One item is a list, we cannot reduce if type(item[0]) is not str: return items else: if i==0: ctype = item[2] count += item[1] else: if item[2] != ctype: return items count += item[1] if depth = level: return [name, count, ctype] else: return items if __name__ == '__main__': # Fully reductible dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) print 'level 1:' print dtype_reduce(dtype,level=1) print # Not fully reductible dtype = [ ('vertex', [('x', 'i4'), ('y', 'i4'), ('z', 'i4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) print # Not reductible at all dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'i4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) On Dec 27, 2012, at 9:11 , Nicolas Rougier wrote: Yep, I'm trying to construct dtype2 programmaticaly and was hoping for some function giving me a canonical expression of the dtype. I've started playing with fields but it's just a bit harder than I though (lot of different cases and recursion). Thanks for the answer. Nicolas On Dec 27, 2012, at 1:32 , Nathaniel Smith wrote: On Wed, Dec 26, 2012 at 8:09 PM, Nicolas Rougier nicolas.roug...@inria.fr wrote: Hi all, I'm looking for a way to reduce dtype1 into dtype2 (when it is possible of course). Is there some easy way to do that by any chance ? dtype1 = np.dtype( [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')]) ] ) dtype2 = np.dtype( [ ('vertex', 'f4', 3), ('normal', 'f4', 3), ('color', 'f4', 4)] ) If you have an array whose dtype is dtype1, and you want to convert it into an array with dtype2, then you just do my_dtype2_array = my_dtype1_array.view(dtype2) If you have dtype1 and you want to programmaticaly construct dtype2, then that's a little more fiddly and depends on what exactly you're trying to do, but start by poking around with dtype1.names and dtype1.fields, which contain information on how dtype1 is put together in the form of regular python structures. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list
[Numpy-discussion] algorithm for faster median calculation ?
Dear all, I am new to the Numpy-discussion list. I would like to follow up some possibly useful information about calculating median. The message below was posted today on the AstroPy mailing list. Kind regards Jerome Caron # I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better? On my machine I get these results: data = numpy.random.rand(5000,5000) t0=time.time();print numpy.ma.median(data);print time.time()-t0 0.499845739822 15.194332 t0=time.time();print numpy.median(data);print time.time()-t0 0.499845739822 4.3219918 t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0 [ 0.49984574] 0.9047139 The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes. It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently. See: http://www.aspylib.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] algorithm for faster median calculation ?
On Jan 15, 2013, at 8:31 PM, Jerome Caron jerome_caron_as...@ymail.com wrote: Dear all, I am new to the Numpy-discussion list. I would like to follow up some possibly useful information about calculating median. The message below was posted today on the AstroPy mailing list. Kind regards Jerome Caron # I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better? On my machine I get these results: data = numpy.random.rand(5000,5000) t0=time.time();print numpy.ma.median(data);print time.time()-t0 0.499845739822 15.194332 t0=time.time();print numpy.median(data);print time.time()-t0 0.499845739822 4.3219918 t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0 [ 0.49984574] 0.9047139 The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes. It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently. See: http://www.aspylib.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi Jerome, some of the numpy devs are already discussing how to best implement the fast median for numpy here: https://github.com/numpy/numpy/issues/1811 median in average O(n) time If you want to get an email when someone posts a comment on that github ticket, sign up for a free github account, then click on watch tread at the bottom of that issue. Note that numpy is BSD-licensed, so they can't take GPL-licensed code. But I think looking at the method you have in aspylib is OK, so thanks for sharing! Christoph___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] algorithm for faster median calculation ?
You might want to look at this first: https://github.com/numpy/numpy/issues/1811 Yes it is possible to compute the median faster by doing quickselect instead of quicksort. Best case O(n) for quickselect, O(n log n) for quicksort. But adding selection and partial sorting to NumPy is a bigger issue than just computing medians and percentiles faster. If we are to do this I think we should add partial sorting and selection to npysort, not patch in some C or Cython quickselect just for the median. When npysort has quickselect, changing the Python code to use it for medians and percentiles is a nobrainer. https://github.com/numpy/numpy/tree/master/numpy/core/src/npysort Sturla On 15.01.2013 20:31, Jerome Caron wrote: Dear all, I am new to the Numpy-discussion list. I would like to follow up some possibly useful information about calculating median. The message below was posted today on the AstroPy mailing list. Kind regards Jerome Caron # I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better? On my machine I get these results: data = numpy.random.rand(5000,5000) t0=time.time();print numpy.ma.median(data);print time.time()-t0 0.499845739822 15.194332 t0=time.time();print numpy.median(data);print time.time()-t0 0.499845739822 4.3219918 t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0 [ 0.49984574] 0.9047139 The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html http://ndevilla.free.fr/median/index.html) interfaced with ctypes. It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently. See: http://www.aspylib.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
Hi, On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com wrote: Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Perhaps these three different point of views will help you a little bit more to move on: In []: x Out[]: array([[ 4, 3], [ 1, 12], [23, 7], [11, 6], [ 8, 9]]) In []: ind= x.argsort(axis= 0) In []: ind Out[]: array([[1, 0], [0, 3], [4, 2], [3, 4], [2, 1]]) In []: x[ind[:, 0]] Out[]: array([[ 1, 12], [ 4, 3], [ 8, 9], [11, 6], [23, 7]]) In []: x[ind[:, 1]] Out[]: array([[ 4, 3], [11, 6], [23, 7], [ 8, 9], [ 1, 12]]) In []: x[ind, [0, 1]] Out[]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) -eat Best regards, Mads -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone: +45-29716388 | | Denmark | email: mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] algorithm for faster median calculation ?
On 15.01.2013 20:50, Sturla Molden wrote: You might want to look at this first: https://github.com/numpy/numpy/issues/1811 Yes it is possible to compute the median faster by doing quickselect instead of quicksort. Best case O(n) for quickselect, O(n log n) for quicksort. But adding selection and partial sorting to NumPy is a bigger issue than just computing medians and percentiles faster. Anyway, here is the code, a bit updated. I prefer quickselect with a better pivot though. Sturla # proposed avg. O(n) replacement for NumPy's median # (C) 2009 Sturla Molden # SciPy license import numpy as np try: from quickselect import select except ImportError: def _select(a, k): Python quickselect for reference only l = 0 r = a.shape[0] - 1 while l r: x = a[k] i = l j = r while 1: while a[i] x: i += 1 while x a[j]: j -= 1 if i = j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j -= 1 if i j: break if j k: l = i if k i: r = j def select(a, k, inplace=False): ''' Wirth's version of Hoare's quick select Parameters -- a : array_like k : integer inplace : boolean The partial sort is done inplace if a is a contiguous ndarray and ndarray and inplace=True. Default: False. Returns --- out : ndarray Partially sorted a such that out[k] is the k largest element. Elements smaller than out[k] are unsorted in out[:k]. Elements larger than out[k] are unsorted in out[k:]. Python version for reference only! ''' if inplace: _a = np.ascontiguousarray(a) else: _a = np.array(a) _select(_a,k) return _a def _median(x, inplace): assert(x.ndim == 1) n = x.shape[0] if n 3: k = n 1 s = select(x, k, inplace=inplace) if n 1: return s[k] else: return 0.5*(s[k]+s[:k].max()) elif n == 0: return np.nan elif n == 2: return 0.5*(x[0]+x[1]) else: # n == 3 s = select(x, 1, inplace=inplace) return s[1] def median(a, axis=None, out=None, overwrite_input=False): Compute the median along the specified axis. Returns the median of the array elements. Parameters -- a : array_like Input array or object that can be converted to an array. axis : {None, int}, optional Axis along which the medians are computed. The default (axis=None) is to compute the median along a flattened version of the array. out : ndarray, optional Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type (of the output) will be cast if necessary. overwrite_input : {False, True}, optional If True, then allow use of memory of input array (a) for calculations. The input array will be modified by the call to median. This will save memory when you do not need to preserve the contents of the input array. Treat the input as undefined, but it will probably be fully or partially sorted. Default is False. Note that, if `overwrite_input` is True and the input is not already an ndarray, an error will be raised. Returns --- median : ndarray A new array holding the result (unless `out` is specified, in which case that array is returned instead). If the input contains integers, or floats of smaller precision than 64, then the output data-type is float64. Otherwise, the output data-type is the same as that of the input. See Also mean Notes - Given a vector V of length N, the median of V is the middle value of a sorted copy of V, ``V_sorted`` - i.e., ``V_sorted[(N-1)/2]``, when N is odd. When N is even, it is the average of the two middle values of ``V_sorted``. Examples a = np.array([[10, 7, 4], [3, 2, 1]]) a array([[10, 7, 4], [ 3, 2, 1]]) np.median(a) 3.5 np.median(a, axis=0) array([ 6.5, 4.5, 2.5]) np.median(a, axis=1) array([ 7., 2.]) m = np.median(a, axis=0) out = np.zeros_like(m) np.median(a, axis=0, out=m) array([ 6.5, 4.5, 2.5]) m array([ 6.5, 4.5, 2.5]) b = a.copy() np.median(b, axis=1, overwrite_input=True) array([ 7., 2.]) assert not np.all(a==b) b = a.copy() np.median(b, axis=None, overwrite_input=True) 3.5 assert not np.all(a==b) if overwrite_input and not isinstance(a,