Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Gregor Thalhammer


On 28.1.2013 at 23:15, Lluís wrote:

> Hi,
>
> I have a somewhat convoluted N-dimensional array that contains information
> about a set of experiments.
>
> The last dimension has as many entries as iterations in the experiment (an
> iterative application), and the penultimate dimension has as many entries as
> times I have run that experiment; the rest of the dimensions describe the
> features of the experiment:
>
>     data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
>
> So, what I want is to get the data for the best run of each experiment:
>
>     best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)
>
> by selecting, for each experiment, the run with the lowest total time (sum of
> the time of all iterations for that experiment).
>
> So far I've got the trivial part, but not the final indexing into data:
>
>     dsum = data.sum(axis = -1)
>     dmin = dsum.min(axis = -1)
>     best = data[???]
>
> I'm sure there must be some numpythonic and generic way to get what I want,
> but fancy indexing is beating me here :)

Have you looked at the argmin function? It returns the indices of the
minimum values along an axis. Untested guess:

dmin_idx = np.argmin(dsum, axis = -1)
best = data[..., dmin_idx, :]
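
As a minimal sketch of what argmin returns in this setting (the array sizes and random contents are made up for illustration, not taken from the original post), the indices line up with the per-experiment minima of dsum:

```python
import numpy as np

# Hypothetical data: 2 experiments, 5 runs, 10 iterations each.
rng = np.random.default_rng(0)
data = rng.random((2, 5, 10))

dsum = data.sum(axis=-1)             # total time per run, shape (2, 5)
dmin_idx = np.argmin(dsum, axis=-1)  # fastest run per experiment, shape (2,)

# The minimum values can be recovered from the indices.
dmin = dsum[np.arange(dsum.shape[0]), dmin_idx]
```

This only covers the argmin step; how to index data with the result is a separate matter.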

Gregor

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Lluís

Gregor Thalhammer writes:

> Did you have a look at the argmin function? It delivers the indices of the
> minimum values along an axis. Untested guess:
>
>     dmin_idx = argmin(dsum, axis = -1)
>     best = data[..., dmin_idx, :]

Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing
with it does not work exactly as I expected:

    >>> d1.shape
    (2, 5, 10)
    >>> dsum = d1.sum(axis = -1)
    >>> dmin = dsum.argmin(axis = -1)
    >>> dmin.shape
    (2,)
    >>> d1_best = d1[...,dmin,:]
    >>> d1_best.shape
    (2, 2, 10)


Assuming the 1st dimension is the test, the 2nd the run and the 3rd the
iterations, using this previous code with some example values:

    >>> dmin
    [4 3]
    >>> d1_best
    [[[ ... contents of d1[0,4,:] ...]
      [ ... contents of d1[0,3,:] ...]]
     [[ ... contents of d1[1,4,:] ...]
      [ ... contents of d1[1,3,:] ...]]]


While I actually want this:

    [[ ... contents of d1[0,4,:] ...]
     [ ... contents of d1[1,3,:] ...]]


Thanks,
  Lluis

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Sebastian Berg

On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
> Ah, sorry, my example is incorrect. I was actually using 'argmin', but
> indexing with it does not work exactly as I expected:
>
>     >>> d1.shape
>     (2, 5, 10)
>     >>> dsum = d1.sum(axis = -1)
>     >>> dmin = dsum.argmin(axis = -1)
>     >>> dmin.shape
>     (2,)
>     >>> d1_best = d1[...,dmin,:]

You need to use fancy indexing. Something like:

    d1_best = d1[np.arange(2), dmin, :]

The Ellipsis takes everything along the axes it covers, while you want to
pick one element per position from several axes at the same time. That can
be achieved with fancy indexing (indexing with arrays): the index arrays are
broadcast against each other and paired up element by element. From another
perspective, you want to get rid of two axes in favor of a new one, but a
slice/Ellipsis always preserves the axes it works on.
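
To make the difference concrete, here is a small sketch comparing the two indexing forms. The array d1 below is a random stand-in with the shape from the thread; its contents are an assumption for illustration only.

```python
import numpy as np

# Hypothetical stand-in for d1: 2 tests, 5 runs, 10 iterations.
rng = np.random.default_rng(1)
d1 = rng.random((2, 5, 10))

dsum = d1.sum(axis=-1)       # shape (2, 5)
dmin = dsum.argmin(axis=-1)  # shape (2,)

# The Ellipsis keeps the first axis whole and applies dmin to the run axis,
# so every test is combined with every entry of dmin.
outer = d1[..., dmin, :]
print(outer.shape)  # (2, 2, 10)

# Two index arrays are paired element-wise: (0, dmin[0]) and (1, dmin[1]).
best = d1[np.arange(2), dmin, :]
print(best.shape)   # (2, 10)
```

The wanted rows are the "diagonal" of the Ellipsis result, that is, best[i] equals outer[i, i].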


Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Lluís

Sebastian Berg writes:

> You need to use fancy indexing. Something like:
>
>     d1_best = d1[np.arange(2), dmin,:]
>
> Because the Ellipsis takes everything from the axis, while you want to
> pick from multiple axes at the same time. That can be achieved with
> fancy indexing (indexing with arrays). From another perspective, you
> want to get rid of two axes in favor of a new one, but a slice/Ellipsis
> always preserves the axis it works on.

Nice, thanks. That works for this specific example, but I couldn't get it to
work with an array of shape (1, 2, 16, 5, 10) (thus dmin.shape == (1, 2, 16)):

    >>> def get_best_run(data, field):
    ...     """Returns the best run."""
    ...     data = data.view(np.ndarray)
    ...     assert data.ndim >= 2
    ...     dsum = data[field].sum(axis=-1)
    ...     dmin = dsum.argmin(axis=-1)
    ...     idxs  = [np.arange(dlen) for dlen in data.shape[:-2]]
    ...     idxs += [dmin]
    ...     idxs += [slice(None)]
    ...     return data[tuple(idxs)]
    >>> d1.shape
    (2, 5, 10)
    >>> get_best_run(d1, "time").shape
    (2, 10)
    >>> d2.shape
    (1, 2, 16, 5, 10)
    >>> get_best_run(d2, "time").shape
    Traceback (most recent call last):
      ...
      File "./plot-user.py", line 89, in get_best_run
        res = data.view(np.ndarray)[tuple(idxs)]
    ValueError: shape mismatch: objects cannot be broadcast to a single shape


After reading the "Advanced indexing" section of the documentation, my
understanding is that the elements in idxs are not broadcastable to a common
shape, but I'm not sure how to build them so that they are, or what shape
they should broadcast to.
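
One possible way to build broadcastable indices, as a hedged sketch rather than a definitive answer: the index arrays np.arange(1), np.arange(2) and np.arange(16) have shapes (1,), (2,) and (16,), which cannot broadcast with each other or with dmin of shape (1, 2, 16). Using np.indices(dmin.shape) instead gives one index array per leading axis, each already of shape dmin.shape. The function name best_run and the use of a plain numeric array (instead of the structured array with a field used above) are assumptions made for illustration.

```python
import numpy as np

def best_run(data):
    """Pick, for each experiment, the run with the lowest total time.

    Assumes a plain numeric array of shape (..., NUM_RUNS, NUM_ITERATIONS).
    """
    dsum = data.sum(axis=-1)     # shape (..., NUM_RUNS)
    dmin = dsum.argmin(axis=-1)  # shape (...)
    # One index array per leading axis, each of shape dmin.shape, so they
    # all broadcast trivially against dmin.
    lead = np.indices(dmin.shape)
    return data[tuple(lead) + (dmin, slice(None))]

# Hypothetical data with the shape that failed in the traceback above.
d2 = np.random.default_rng(2).random((1, 2, 16, 5, 10))
best = best_run(d2)
print(best.shape)  # (1, 2, 16, 10)
```

np.ogrid or explicitly reshaped np.arange arrays would avoid materializing full index arrays, at the cost of slightly more bookkeeping.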


Thanks a lot,
  Lluis



Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Lluís

Lluís  writes:

> After reading the Advanced indexing section, my understanding is that the
> elements in idxs are not broadcastable to the same shape, but I'm not sure
> how I should build them to be broadcastable to what specific shape.

BTW, here's an equivalent that seems to work in all cases, although I would
prefer to avoid control code that fills in the result manually:


    >>> def get_best_run(data, field):
    ...     """Returns the best run."""
    ...     data = data.view(np.ndarray)
    ...     assert data.ndim >= 2
    ...     dsum = data[field].sum(axis=-1)
    ...     dmin = dsum.argmin(axis=-1)
    ...     # The result has the same shape as data, minus the run axis.
    ...     res_shape = list(data.shape)
    ...     del res_shape[-2]
    ...     res = np.empty(res_shape, dtype=data.dtype)
    ...     # Visit every experiment and copy its best run into the result.
    ...     idxs = np.unravel_index(np.arange(dmin.size), dmin.shape)
    ...     for idx in zip(*idxs):
    ...         imin = dmin[idx]
    ...         res[idx] = data[idx + (imin,)]
    ...     return res
    >>> d1.shape
    (2, 5, 10)
    >>> get_best_run(d1, "time").shape
    (2, 10)
    >>> d2.shape
    (1, 2, 16, 5, 10)
    >>> get_best_run(d2, "time").shape
    (1, 2, 16, 10)
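
A loop-free alternative, sketched under the assumption that NumPy 1.15 or later is available (np.take_along_axis did not exist when this thread was written) and that data is a plain numeric array rather than the structured array used above. The function name best_run is hypothetical.

```python
import numpy as np

def best_run(data):
    """Vectorized selection of the lowest-total run along axis -2.

    Assumes a plain numeric array of shape (..., NUM_RUNS, NUM_ITERATIONS).
    """
    dmin = data.sum(axis=-1).argmin(axis=-1)  # shape (...)
    # take_along_axis needs an index array with as many dimensions as data;
    # the two added length-1 axes stand in for the run and iteration axes.
    idx = dmin[..., np.newaxis, np.newaxis]   # shape (..., 1, 1)
    picked = np.take_along_axis(data, idx, axis=-2)  # (..., 1, NUM_ITERATIONS)
    return picked.squeeze(axis=-2)

# Hypothetical data with the shapes used in the thread.
d2 = np.random.default_rng(3).random((1, 2, 16, 5, 10))
best = best_run(d2)
print(best.shape)  # (1, 2, 16, 10)
```

This removes the explicit Python loop while keeping the function independent of the number of leading dimensions.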


Thanks,
  Lluis



[Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-28 Thread Lluís

Hi,

I have a somewhat convoluted N-dimensional array that contains information
about a set of experiments.

The last dimension has as many entries as iterations in the experiment (an
iterative application), and the penultimate dimension has as many entries as
times I have run that experiment; the rest of the dimensions describe the
features of the experiment:

    data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)

So, what I want is to get the data for the best run of each experiment:

    best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)

by selecting, for each experiment, the run with the lowest total time (sum of
the time of all iterations for that experiment).


So far I've got the trivial part, but not the final indexing into data:

    dsum = data.sum(axis = -1)
    dmin = dsum.min(axis = -1)
    best = data[???]


I'm sure there must be some numpythonic and generic way to get what I want, but
fancy indexing is beating me here :)


Thanks a lot!
  Lluis
