Re: [Numpy-discussion] numpythonically getting elements with the minimum sum
On 28.1.2013 at 23:15, Lluís wrote:
> Hi,
>
> I have a somewhat convoluted N-dimensional array that contains information of a set of experiments. The last dimension has as many entries as iterations in the experiment (an iterative application), and the penultimate dimension has as many entries as times I have run that experiment; the rest of the dimensions describe the features of the experiment:
>
>     data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
>
> So, what I want is to get the data for the best run of each experiment:
>
>     best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)
>
> by selecting, for each experiment, the run with the lowest total time (sum of the time of all iterations for that experiment).
>
> So far I've got the trivial part, but not the final indexing into data:
>
>     dsum = data.sum(axis = -1)
>     dmin = dsum.min(axis = -1)
>     best = data[???]
>
> I'm sure there must be some numpythonic and generic way to get what I want, but fancy indexing is beating me here :)

Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess:

    dmin_idx = argmin(dsum, axis = -1)
    best = data[..., dmin_idx, :]

Gregor

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpythonically getting elements with the minimum sum
Gregor Thalhammer writes:
> On 28.1.2013 at 23:15, Lluís wrote:
>> [...]
>>     dsum = data.sum(axis = -1)
>>     dmin = dsum.min(axis = -1)
>>     best = data[???]
>> [...]
>
> Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess:
>
>     dmin_idx = argmin(dsum, axis = -1)
>     best = data[..., dmin_idx, :]

Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing with it does not exactly work as I expected:

    d1.shape                      # (2, 5, 10)
    dsum = d1.sum(axis = -1)
    dmin = dsum.argmin(axis = -1)
    dmin.shape                    # (2,)
    d1_best = d1[..., dmin, :]
    d1_best.shape                 # (2, 2, 10)

Assuming the 1st dimension is the test, the 2nd the run and the 3rd the iterations, this is what the code above gives with some example values:

    dmin                          # [4 3]
    d1_best
    [[[ ... contents of d1[0,4,:] ...]
      [ ... contents of d1[0,3,:] ...]]
     [[ ... contents of d1[1,4,:] ...]
      [ ... contents of d1[1,3,:] ...]]]

While I actually want this:

    [[ ... contents of d1[0,4,:] ...]
     [ ... contents of d1[1,3,:] ...]]

Thanks,
  Lluis

-- 
And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer.
  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
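Why the result has shape (2, 2, 10): with the Ellipsis covering the test axis, the single index array `dmin` is applied to the run axis for every test, so entry `[i, j]` of the result is `d1[i, dmin[j], :]`. The wanted rows are the "diagonal" ones where `j == i`. A small sketch, assuming random stand-in data since the thread does not show the real values of `d1`:

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = rng.random((2, 5, 10))      # stand-in data: (tests, runs, iterations)
dsum = d1.sum(axis=-1)           # total time per (test, run), shape (2, 5)
dmin = dsum.argmin(axis=-1)      # best run per test, shape (2,)

wrong = d1[..., dmin, :]         # every test paired with every best-run index
print(wrong.shape)               # (2, 2, 10)

# wrong[i, j] is d1[i, dmin[j], :]; only the i == j entries are wanted
wanted = np.stack([wrong[i, i] for i in range(d1.shape[0])])
print(wanted.shape)              # (2, 10)
```

Picking the diagonal afterwards wastes work (it builds all test/run combinations first), which is why the fancy-indexing answer below is preferable.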
Re: [Numpy-discussion] numpythonically getting elements with the minimum sum
On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
> Gregor Thalhammer writes:
> [...]
> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing with it does not exactly work as I expected:
> [...]
>     d1_best = d1[..., dmin, :]

You need to use fancy indexing. Something like:

    d1_best = d1[np.arange(2), dmin, :]

This is because the Ellipsis takes everything along the axes it covers, while you want to pick from multiple axes at the same time. That can be achieved with fancy indexing (indexing with arrays): np.arange(2) and dmin are paired element by element, so row i of the result is d1[i, dmin[i], :]. From another perspective, you want to get rid of two axes in favor of a new one, but a slice/Ellipsis always preserves the axis it works on.
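A quick check of the fancy-indexing suggestion, again on random stand-in data (an assumption; any array of shape (tests, runs, iterations) would do). Because `np.arange(2)` and `dmin` have the same shape, they are broadcast together and consumed pairwise:

```python
import numpy as np

rng = np.random.default_rng(1)
d1 = rng.random((2, 5, 10))              # stand-in data: (tests, runs, iterations)
dmin = d1.sum(axis=-1).argmin(axis=-1)   # best run per test, shape (2,)

# The two index arrays are paired element by element: best[i] is d1[i, dmin[i], :]
best = d1[np.arange(d1.shape[0]), dmin, :]
print(best.shape)                        # (2, 10)
```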
Re: [Numpy-discussion] numpythonically getting elements with the minimum sum
Sebastian Berg writes:
> On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
>> [...]
>>     d1_best = d1[..., dmin, :]
>
> You need to use fancy indexing. Something like:
>
>     d1_best = d1[np.arange(2), dmin, :]
>
> [...]

Nice, thanks.

That works for this specific example, but I couldn't get it to work with d1.shape == (1, 2, 16, 5, 10) (thus dmin.shape == (1, 2, 16)):

    import numpy as np

    def get_best_run(data, field):
        """Returns the best run."""
        data = data.view(np.ndarray)
        assert data.ndim >= 2
        dsum = data[field].sum(axis=-1)
        dmin = dsum.argmin(axis=-1)
        idxs = [np.arange(dlen) for dlen in data.shape[:-2]]
        idxs += [dmin]
        idxs += [slice(None)]
        return data[tuple(idxs)]

    d1.shape                      # (2, 5, 10)
    get_best_run(d1, 'time')      # result shape: (2, 10)
    d2.shape                      # (1, 2, 16, 5, 10)
    get_best_run(d2, 'time')
    Traceback (most recent call last):
      ...
      File "./plot-user.py", line 89, in get_best_run
        res = data.view(np.ndarray)[tuple(idxs)]
    ValueError: shape mismatch: objects cannot be broadcast to a single shape

After reading the "Advanced indexing" section, my understanding is that the elements in idxs are not broadcastable to the same shape, but I'm not sure how to build them, or to what shape they should broadcast.

Thanks a lot,
  Lluis
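The error comes from the leading index arrays: for d2 they have shapes (1,), (2,) and (16,), which cannot be broadcast against each other or against dmin's shape (1, 2, 16). In the 3-D case there was only one such array, of shape (2,), which happens to match dmin. One possible fix (a sketch, not taken from the thread) is to give each arange its own axis so that together they form an "open mesh" shaped like dmin, which `np.ix_` builds. The sketch keeps the thread's function name but drops the `field` argument and works on a plain numeric array (an assumption; with the structured data one would compute `dsum` from `data[field]` instead):

```python
import numpy as np

def get_best_run(data):
    """Return the run with the smallest total time for every experiment.

    data has shape (..., NUM_RUNS, NUM_ITERATIONS); the result has shape
    (..., NUM_ITERATIONS).
    """
    dsum = data.sum(axis=-1)         # (..., NUM_RUNS)
    dmin = dsum.argmin(axis=-1)      # (...,): index of the best run
    # np.ix_ reshapes the k-th arange to length n_k on axis k and length 1
    # elsewhere, so all of them broadcast together with dmin to dmin.shape.
    grid = np.ix_(*[np.arange(n) for n in dmin.shape])
    return data[grid + (dmin, slice(None))]

rng = np.random.default_rng(2)
d2 = rng.random((1, 2, 16, 5, 10))   # stand-in data with the shape from the thread
print(get_best_run(d2).shape)        # (1, 2, 16, 10)
```

For a 3-D array, `np.ix_` returns a single arange of shape (n,), so this reduces to the `d1[np.arange(2), dmin, :]` form above.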
Re: [Numpy-discussion] numpythonically getting elements with the minimum sum
Lluís writes:
> [...]
> That works for this specific example, but I couldn't get it to work with d1.shape == (1, 2, 16, 5, 10) (thus dmin.shape == (1, 2, 16)):
> [...]
> After reading the "Advanced indexing" section, my understanding is that the elements in idxs are not broadcastable to the same shape, but I'm not sure how to build them, or to what shape they should broadcast.

BTW, here's an equivalent that seems to work in all cases, although I would prefer to avoid control code that manually fills in the result:

    import numpy as np

    def get_best_run(data, field):
        """Returns the best run."""
        data = data.view(np.ndarray)
        assert data.ndim >= 2
        dsum = data[field].sum(axis=-1)
        dmin = dsum.argmin(axis=-1)

        # Result shape: same as data, without the runs axis
        res_shape = list(data.shape)
        del res_shape[-2]
        res = np.empty(res_shape, dtype=data.dtype)

        # Visit every experiment and copy its best run
        idxs = np.unravel_index(np.arange(dmin.size), dmin.shape)
        for idx in zip(*idxs):
            imin = dmin[idx]
            res[idx] = data[tuple(list(idx) + [imin])]

        return res

    d1.shape                      # (2, 5, 10)
    get_best_run(d1, 'time')      # result shape: (2, 10)
    d2.shape                      # (1, 2, 16, 5, 10)
    get_best_run(d2, 'time')      # result shape: (1, 2, 16, 10)

Thanks,
  Lluis
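The loop above can also be replaced by a single call to `np.take_along_axis`. Note that this function was added in NumPy 1.15, years after this thread, so it was not available to the posters. A sketch on a plain numeric array (an assumption; the `field` argument is dropped here too), checked against a per-experiment loop:

```python
import numpy as np

def get_best_run(data):
    """Vectorised pick of the lowest-total-time run (sketch)."""
    dmin = data.sum(axis=-1).argmin(axis=-1)          # (...,)
    # take_along_axis wants indices with data's ndim: shape (..., 1, 1).
    # The trailing 1 broadcasts over NUM_ITERATIONS; axis -2 is the run axis.
    picked = np.take_along_axis(data, dmin[..., np.newaxis, np.newaxis], axis=-2)
    return picked[..., 0, :]                          # drop the length-1 run axis

rng = np.random.default_rng(4)
d2 = rng.random((1, 2, 16, 5, 10))   # stand-in data
print(get_best_run(d2).shape)        # (1, 2, 16, 10)
```

Compared with the loop version, no result array has to be allocated and filled by hand; the index array only needs the two extra length-1 axes.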
[Numpy-discussion] numpythonically getting elements with the minimum sum
Hi,

I have a somewhat convoluted N-dimensional array that contains information of a set of experiments. The last dimension has as many entries as iterations in the experiment (an iterative application), and the penultimate dimension has as many entries as times I have run that experiment; the rest of the dimensions describe the features of the experiment:

    data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)

So, what I want is to get the data for the best run of each experiment:

    best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)

by selecting, for each experiment, the run with the lowest total time (sum of the time of all iterations for that experiment).

So far I've got the trivial part, but not the final indexing into data:

    dsum = data.sum(axis = -1)
    dmin = dsum.min(axis = -1)
    best = data[???]

I'm sure there must be some numpythonic and generic way to get what I want, but fancy indexing is beating me here :)

Thanks a lot!
  Lluis
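The question's sketch uses `dsum.min`, which returns the smallest total time itself rather than the position of the run that achieved it; indexing into `data` needs the position, which is what `argmin` returns (as the replies above point out). A tiny illustration with made-up numbers (two experiments, three runs, two iterations):

```python
import numpy as np

# Made-up timings: 2 experiments x 3 runs x 2 iterations
data = np.array([[[3, 3], [1, 1], [2, 2]],
                 [[0, 5], [4, 4], [1, 0]]])
dsum = data.sum(axis=-1)        # [[6 2 4], [5 8 1]]: total time per run
print(dsum.min(axis=-1))        # [2 1]: the smallest totals themselves
print(dsum.argmin(axis=-1))     # [1 2]: which run produced each of them
```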