Re: [Numpy-discussion] Data cube optimization for combination

2012-03-06 Thread Sebastian Berg
Hello,

On Tue, 2012-03-06 at 13:00 +0100, Jose Miguel Ibáñez wrote:
> Hello everyone,
> 
> does anyone know of an efficient implementation (maybe using
> numpy.where statement) of the next code for data cube (3d array)
> combining ?
> 
You can use the axis keyword argument of sum; whether you then want to
cast the result to float32 is up to you:
result = np.sqrt(cube).sum(axis=0)
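
A slightly fuller sketch (the cast back to float32 is an assumption taken
from the loop version quoted below):

import numpy as np

def combine_fast(cube):
    # Sum sqrt(cube) over the first (plane) axis in one vectorized call,
    # then cast to float32 only at the end, as the original loop version does.
    return np.sqrt(cube).sum(axis=0).astype(np.float32)

cube = np.random.rand(32, 2048, 2048)
result = combine_fast(cube)   # shape (2048, 2048)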

> import numpy as np
> 
> def combine():
> 
>     cube = np.random.rand(32, 2048, 2048)
>     result = np.zeros([2048, 2048], np.float32)
> 
>     for ii in range(2048):
>         for jj in range(2048):
>             result[ii, jj] = np.sqrt(cube[:, ii, jj]).sum()
> 
> 
> It takes a long time to run. However,
> 
> 
> >> result = np.median(cube,0)
> 
> 
> takes only around one second! What makes the difference? Any suggestions?
> 
> 
> 
> Thanks !
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] Bug in as_strided/reshape

2012-08-09 Thread Sebastian Berg
Hello,

looking at the code: when only adding/removing dimensions of size 1, numpy
takes a small shortcut in reshape. It temporarily uses a stride of 0 for the
new one-element dimensions and then replaces those strides again to ensure
the new array is contiguous. That replacement step does not check whether
the dimension actually has size 1.
There is probably a better way to fix it, but the attached diff should do
the job.

Regards,

Sebastian

On Do, 2012-08-09 at 13:06 +, Dave Hirschfeld wrote:
> Dave Hirschfeld  gmail.com> writes:
> 
> > 
> > It seems that reshape doesn't work correctly on an array which has been
> > resized using the 0-stride trick e.g.
> > 
> > In [73]: x = array([5])
> > 
> > In [74]: y = as_strided(x, shape=(10,), strides=(0,))
> > 
> > In [75]: y
> > Out[75]: array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5])
> > 
> > In [76]: y.reshape([10,1])
> > Out[76]: 
> > array([[  5],
> >[  8],
> >[  762933412],
> >[-2013265919],
> >[ 26],
> >[ 64],
> >[  762933414],
> >[-2013244356],
> >[ 26],
> >[ 64]])  <-- Should all be 5
> > 
> > In [77]: y.copy().reshape([10,1])
> > Out[77]: 
> > array([[5],
> >[5],
> >[5],
> >[5],
> >[5],
> >[5],
> >[5],
> >[5],
> >[5],
> >[5]])
--- a/numpy/core/src/multiarray/shape.c
+++ b/numpy/core/src/multiarray/shape.c
@@ -273,21 +273,21 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
  * appropriate value to preserve contiguousness
  */
 if (order == NPY_FORTRANORDER) {
-if (strides[0] == 0) {
+if ((strides[0] == 0) && (dimensions[0] == 1)) {
 strides[0] = PyArray_DESCR(self)->elsize;
 }
 for (i = 1; i < ndim; i++) {
-if (strides[i] == 0) {
+if ((strides[i] == 0) && (dimensions[i] == 1)) {
 strides[i] = strides[i-1] * dimensions[i-1];
 }
 }
 }
 else {
-if (strides[ndim-1] == 0) {
+if ((strides[ndim-1] == 0) && (dimensions[ndim-1] == 1)) {
 strides[ndim-1] = PyArray_DESCR(self)->elsize;
 }
 for (i = ndim - 2; i > -1; i--) {
-if (strides[i] == 0) {
+if ((strides[i] == 0) && (dimensions[i] == 1)) {
 strides[i] = strides[i+1] * dimensions[i+1];
 }
 }
> > 
> > In [78]: np.__version__
> > Out[78]: '1.6.2'
> > 
> > Perhaps a clause such as below is required in reshape?
> > 
> > if any(stride == 0 for stride in y.strides):
> > return y.copy().reshape(shape)
> > else:
> > return y.reshape(shape)
> > 
> > Regards,
> > Dave
> > 
> 
> Though it would be good to avoid the copy, which you should be able to do
> in this case. Investigating further:
> 
> In [15]: y.strides
> Out[15]: (0,)
> 
> In [16]: z = y.reshape([10,1])
> 
> In [17]: z.strides
> Out[17]: (4, 4)
> 
> In [18]: z.strides = (0, 4)
> 
> In [19]: z
> Out[19]: 
> array([[5],
>[5],
>[5],
>[5],
>[5],
>[5],
>[5],
>[5],
>[5],
>[5]])
> 
> In [32]: y.reshape([5, 2])
> Out[32]: 
> array([[5, 5],
>[5, 5],
>[5, 5],
>[5, 5],
>    [5, 5]])
> 
> In [33]: y.reshape([5, 2]).strides
> Out[33]: (0, 0)
> 
> So it seems that reshape is incorrectly setting the stride of axis0 to 4, but 
> only when the appended axis is of size 1.
> 
> -Dave
> 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

From eed2abca6144e16c5d9ca208ef90dd01f7dd6009 Mon Sep 17 00:00:00 2001
From: Sebastian Berg 
Date: Thu, 9 Aug 2012 17:17:32 +0200
Subject: [PATCH] Fix reshaping of arrays with stride 0 in a dimension with
 size of more than 1.

---
 numpy/core/src/multiarray/shape.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/numpy/core/src/multiarray/shape.c b/numpy/core/src/multiarray/shape.c
index 0672326..09a6cb0 100644
--- a/numpy/core/src/multiarray/shape.c
+++ b/numpy/core/src/multiarray/shape.c
@@ -273,21 +273,21 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
  * appropriate value to preserve contiguousness
 

Re: [Numpy-discussion] Multidimensional neighbours

2012-08-16 Thread Sebastian Berg
Hello,

Just to throw it in: if you do not mind using scipy, you can use its
multidimensional correlate function instead:

stamp = np.ones((3,3,3))
stamp[1,1,1] = 0
num_neighbours = scipy.ndimage.correlate(x, stamp, mode='wrap')

In the linked example np.roll is used to implement periodic boundaries
(mode='wrap' when using correlate); if you want constant boundary
conditions, padding is probably simplest. Maybe someone else has a
trick to avoid so much code duplication without using scipy...
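
Putting the above together, a small self-contained sketch (the example array
x and its shape are made up for illustration):

import numpy as np
import scipy.ndimage

x = np.random.randint(0, 2, (10, 10, 10))   # 3-D grid of dead/alive cells
stamp = np.ones((3, 3, 3))
stamp[1, 1, 1] = 0                          # do not count the cell itself

# Periodic boundaries, like the np.roll version:
num_neighbours = scipy.ndimage.correlate(x, stamp, mode='wrap')
# Constant (dead) boundaries instead:
num_neighbours_const = scipy.ndimage.correlate(x, stamp, mode='constant', cval=0)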

Regards,

Sebastian

On Do, 2012-08-16 at 18:46 +0100, Jose Gomez-Dans wrote:
> Hi,
> I've just come across Travis Oliphant's array version of the game of
> life here . It's a really nice
> example of how to efficiently find neighbours using numpy and family.
> However, I wanted to apply these ideas to higher order arrays (eg GRID
> being three dimensional rather than two dimensional). My previous
> attempts at this include the (sorry for the ensuing horror!)
>   neighbours = np.array( [ \
> # Lower row...
> x[:-2, :-2, :-2], x[:-2, :-2, 1:-1], x[:-2, :-2, 2: ], 
> x[:-2, 1:-1,:-2], x[:-2, 1:-1, 1:-1], x[:-2, 1:-1, 2:], 
> x[:-2, 2:,:-2], x[:-2, 2:, 1:-1], x[:-2, 2:, 2:], 
> # Middle row
> x[1:-1, :-2, :-2], x[1:-1, :-2, 1:-1], x[1:-1, :-2, 2: ], 
> x[1:-1, 1:-1,:-2], x[1:-1, 1:-1, 2:], 
> x[1:-1, 2:,:-2], x[1:-1, 2:, 1:-1], x[1:-1, 2:, 2:], 
> # Top row
> x[2:, :-2, :-2], x[2:, :-2, 1:-1], x[2:, :-2, 2: ], 
> x[2:, 1:-1,:-2], x[2:, 1:-1, 1:-1], x[2:, 1:-1, 2:], 
> x[2:, 2:,:-2], x[2:, 2:, 1:-1], x[2:, 2:, 2:] ] )
> 
> 
> I think this works (it's been a while! ;D), but it's a bit messy and
> the boundary condition is a bit crude. I was wondering whether there's
> some other nice way of doing it, like demonstrated in the above link.
> Also, what if you want to have a larger neighbourhood (say 5x5x2
> instead of 3x3x3)?
> 
> 
> While I appreciate these index tricks, I also find them quite mind
> boggling!
> Thanks!
> Jose
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




[Numpy-discussion] Functions for stride manipulations

2012-08-19 Thread Sebastian Berg
Hey,

Inspired by an existing PR into numpy, I created two functions based on
stride_tricks which I thought might be useful inside numpy. So if I get
any feedback/pointers, I would add some tests and create a PR.

The first function, rolling_window, creates for every point of the original
array a view of its neighbourhood of size window, in arbitrary dimensions,
with some options to make it easy to use.
For example, for a 3-dimensional array:
rolling_window(a, (3,3,3), asteps=(3,3,3)).max((3,4,5))
gives the maximum over all non-overlapping 3x3x3 subarrays, and:
rolling_window(a, 3, axes=0).max(-1) would create the rolling maximum over
every 3 successive values along the 0th axis.

There is also a function permute_axes, which takes a tuple for reordering
axes and can also merge them. For a (10, 4, 3) shaped array:
permute_axes(a, (2, 1, 0)).shape -> (3, 4, 10)
permute_axes(a, (1, (0, 2))).shape -> (4, 30)
A copy is created only when necessary; this basically allows doing multiple
swapaxes calls and a reshape in a single call.
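
A rough sketch of what permute_axes is described to do above (illustrative
only; the actual posted implementation is not part of this mail):

import numpy as np

def permute_axes_sketch(a, spec):
    # Flatten the spec into a plain axis order for transpose.
    flat = []
    for item in spec:
        flat.extend(item if isinstance(item, tuple) else (item,))
    a = a.transpose(flat)
    # Merge grouped axes by reshaping; reshape copies only when necessary.
    shape, i = [], 0
    for item in spec:
        n = len(item) if isinstance(item, tuple) else 1
        shape.append(int(np.prod(a.shape[i:i + n])))
        i += n
    return a.reshape(shape)

a = np.ones((10, 4, 3))
print(permute_axes_sketch(a, (2, 1, 0)).shape)    # (3, 4, 10)
print(permute_axes_sketch(a, (1, (0, 2))).shape)  # (4, 30)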

I hope this might be useful...

Regards,

Sebastian
import numpy as np

def rolling_window(array, window=(0,), asteps=None, wsteps=None, axes=None, intersperse=False):
"""Create a view of `array` which for every point gives the n-dimensional
neighbourhood of size window. New dimensions are added at the end of
`array` or after the corresponding original dimension.

Parameters
----------
array : array_like
Array to which the rolling window is applied.
window : int or tuple
Either a single integer to create a window of only the last axis or a
tuple to create it for the last len(window) axes. 0 can be used to
ignore a dimension in the window.
asteps : tuple
Aligned at the last axis, new steps for the original array, ie. for
creation of non-overlapping windows. (Equivalent to slicing result)
wsteps : int or tuple (same size as window)
steps for the added window dimensions. These can be 0 to repeat values
along the axis.
axes: int or tuple
If given, must have the same size as window. In this case window is
interpreted as the size in the dimension given by axes. IE. a window
of (2, 1) is equivalent to window=2 and axis=-2.   
intersperse : bool
If True, the new dimensions are right after the corresponding original
dimension, instead of at the end of the array.

Returns
-------
A view on `array` which is smaller so that the windows fit, with the window
dimensions added (0s not counting), i.e. every point of `array` becomes an
array of size window.

Examples
--------
>>> a = np.arange(9).reshape(3,3)
>>> rolling_window(a, (2,2))
array([[[[0, 1],
         [3, 4]],

        [[1, 2],
         [4, 5]]],


       [[[3, 4],
         [6, 7]],

        [[4, 5],
         [7, 8]]]])

Or to create non-overlapping windows, but only along the first dimension:
>>> rolling_window(a, (2,0), asteps=(2,1))
array([[[0, 3],
[1, 4],
[2, 5]]])

Note that the 0 is discarded, so that the output has 3 dimensions:
>>> rolling_window(a, (2,0), asteps=(2,1)).shape
(1, 3, 2)

This is useful, for example, to calculate the maximum of all (overlapping)
2x2 submatrices:
>>> rolling_window(a, (2,2)).max((2,3))
array([[4, 5],
   [7, 8]])
   
Or delay embedding (3D embedding with delay 2):
>>> x = np.arange(10)
>>> rolling_window(x, 3, wsteps=2)
array([[0, 2, 4],
   [1, 3, 5],
   [2, 4, 6],
   [3, 5, 7],
   [4, 6, 8],
   [5, 7, 9]])
"""
array = np.asarray(array)
orig_shape = np.asarray(array.shape)
window = np.atleast_1d(window).astype(int) # maybe crude to cast to int...

if axes is not None:
    axes = np.atleast_1d(axes)
    w = np.zeros(array.ndim, dtype=int)
    for axis, size in zip(axes, window):
        w[axis] = size
    window = w

# Check if window is legal:
if window.ndim > 1:
    raise ValueError("`window` must be one-dimensional.")
if np.any(window < 0):
    raise ValueError("All elements of `window` must be non-negative.")
if len(array.shape) < len(window):
    raise ValueError("`window` length must be less than or equal to the `array` dimension.")

_asteps = np.ones_like(orig_shape)
if asteps is not None:
    asteps = np.atleast_1d(asteps)
    if asteps.ndim != 1:
        raise ValueError("`asteps` must be either a scalar or one dimensional.")
    if len(asteps) > array.ndim:
        raise ValueError("`asteps` cannot be longer than the `array` dimension.")
    # does not enforce alignment, so that steps can be same as window too.
    _asteps[-len(asteps):] = asteps

    if np.any(asteps < 1):
        raise ValueError("All elements of `asteps` must be larger than 0.")

Re: [Numpy-discussion] Difference between np.loadtxt depending on whether you supply a file object or a filename

2012-08-20 Thread Sebastian Berg
Hello,

On Mo, 2012-08-20 at 20:55 +1000, Andrew Nelson wrote:
> Dear list,
> I observe a difference when I try to load a 2D numpy array from a file
> object compared to if I supply a filename viz:
> 
> 
> >>> np.version.version
> '1.5.1'
> >>> f=open('fit_theoretical.txt')
> >>> a=np.loadtxt(f)
> >>> a.shape
> (1000,)
> >>> a=np.loadtxt('fit_theoretical.txt')
> >>> a.shape
> (500, 2)

There is actually a difference between the two, because np.loadtxt opens the
file itself with open(..., 'U'), which means that newlines are interpreted
universally (because of the differences between linux/windows/mac newlines
using \n, \r, etc.). So the problem is that newlines are interpreted wrongly
for you if you open the file with just the default mode.
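
Following that explanation, one workaround is to open the file in universal
newline mode yourself (a sketch for the Python 2 / numpy 1.5 setup of this
thread; on Python 3 the default text mode already uses universal newlines):

import numpy as np

with open('fit_theoretical.txt', 'U') as f:   # 'U' = universal newline mode
    a = np.loadtxt(f)
print(a.shape)   # should now match np.loadtxt('fit_theoretical.txt'), i.e. (500, 2)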

Regards,

Sebastian
> 
> This strikes me as unexpected, it's not a documented behaviour. Any
> ideas?
> 
> 
> cheers,
> Andrew.
> 
> 
> 
> 
> -- 
> _
> Dr. Andrew Nelson
> 
> 
> _
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] how is y += x computed when y.strides = (0, 8) and x.strides=(16, 8) ?

2012-09-05 Thread Sebastian Berg
Hey,

No idea if this is simply not supported or just a bug, though I am guessing
that such usage simply is not planned for. However, this also has to do with
buffering, so unless the behaviour is substantially changed, I would not
expect even predictable results. I have used things like
a[1:] += a[:-1] (cumsum is better here, and clearer as to what it does),
but this is different, as here the same data is read again after being
written.

And here an example showing the buffer is involved:

In [14]: np.setbufsize(1024)
Out[14]: 1024

In [19]: x = numpy.arange(6*1000).reshape(-1,2)

In [20]: y = numpy.arange(2)

In [21]: u,v = numpy.broadcast_arrays(x, y)

In [22]: v += u

In [23]: v
Out[23]: 
array([[21348, 21355],
   [21348, 21355],
   [21348, 21355],
   ..., 
   [21348, 21355],
   [21348, 21355],
   [21348, 21355]])

In [24]: np.setbufsize(100)
Out[24]: 1024 # note it gives old bufsize...

In [25]: x = numpy.arange(6*1000).reshape(-1,2)

In [26]: y = numpy.arange(2)

In [27]: u,v = numpy.broadcast_arrays(x, y)

In [28]: v += u

In [29]: v
Out[29]: 
array([[5998, 6000],
   [5998, 6000],
   [5998, 6000],
   ..., 
   [5998, 6000],
   [5998, 6000],
   [5998, 6000]])
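
For the record, a way to get a well-defined result without relying on how the
buffered in-place loop walks the stride-0 broadcast view (a sketch, not what
the thread settles on):

import numpy as np

x = np.arange(6).reshape(3, 2)
y = np.arange(2)

# Instead of adding into a broadcast view of y, reduce explicitly:
y += x.sum(axis=0)
print(y)   # [ 6 10], the value numpy 1.4.1 happened to produce below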

On Thu, 2012-09-06 at 01:55 +0200, Friedrich Romstedt wrote:
> Poor Sebastian, you make the mistake of asking difficult questions.
> 
> I noticed that it should be [6, 10] not [6, 12], and in fact is with 
> numpy-1.4.1; while I observe the [4, 6] result with numpy-1.6.1.  Logs follow:
> 
> numpy-1.4.1 in Python-2.6.5 on Mac (intel 64bit) with Python + numpy built 
> from sources dual-arch.  The result in terms of output does not depend on the 
> architecture chosen for run.  The other is numpy-1.6.1 with Python-2.7.2.
> 
> numpy-1.4.1 (64bit Python 2.6.5):
> 
> Python 2.6.5 (r265:79063, Jul 18 2010, 12:14:53) 
> [GCC 4.2.1 (Apple Inc. build 5659)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> print numpy.__version__
> 1.4.1
> >>> import numpy
> >>> 
> >>> x = numpy.arange(6).reshape((3,2))
> >>> y = numpy.arange(2)
> >>> 
> >>> print 'x=\n', x
> x=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'y=\n', y
> y=
> [0 1]
> >>> 
> >>> u,v = numpy.broadcast_arrays(x, y)
> >>> 
> >>> print 'u=\n', u
> u=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'v=\n', v
> v=
> [[0 1]
>  [0 1]
>  [0 1]]
> >>> print 'v.strides=\n', v.strides
> v.strides=
> (0, 8)
> >>> 
> >>> v += u
> >>> 
> >>> print 'v=\n', v  # expectation: v = [[6,12], [6,12], [6,12]]
> v=
> [[ 6 10]
>  [ 6 10]
>  [ 6 10]]
> >>> print 'u=\n', u
> u=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'y=\n', y  # expectation: y = [6,12]
> y=
> [ 6 10]
> 
> And numpy-1.6.1 (64bit Python-2.7.2):
> 
> Python 2.7.2 (default, Mar 15 2012, 15:42:23) 
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> [fiddling with sys.maxint edited out]
> >>> import numpy
> >>> 
> >>> x = numpy.arange(6).reshape((3,2))
> >>> y = numpy.arange(2)
> >>> 
> >>> print 'x=\n', x
> x=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'y=\n', y
> y=
> [0 1]
> >>> 
> >>> u,v = numpy.broadcast_arrays(x, y)
> >>> 
> >>> print 'u=\n', u
> u=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'v=\n', v
> v=
> [[0 1]
>  [0 1]
>  [0 1]]
> >>> print 'v.strides=\n', v.strides
> v.strides=
> (0, 8)
> >>> 
> >>> v += u
> >>> 
> >>> print 'v=\n', v  # expectation: v = [[6,12], [6,12], [6,12]]
> v=
> [[4 6]
>  [4 6]
>  [4 6]]
> >>> print 'u=\n', u
> u=
> [[0 1]
>  [2 3]
>  [4 5]]
> >>> print 'y=\n', y  # expectation: y = [6,12]
> y=
> [4 6]
> 
> Maybe this helps bisecting it.
> 
> Friedrich.
> 
> P.S.: I took the time scrolling through the tickets, with an empty set 
> resulting (first three pages by date or so).  This does not mean such a 
> ticket does not exist.  Also the docs are rather quiet about this (e.g. 
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.__iadd__.html#numpy.ndarray.__iadd__,
>  or 
> http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#arithmetic-and-comparison-operations).
> 
> My guess is that v is duplicated during the in-place addition (including 
> strides), and the results of the calculation (using the original v) are 
> written to that copy.  After that, the copy's data would be [4, 6] with the 
> strides as for v.  So copying over to v's "original" data or letting it point 
> to the copy and freeing the remaining copy would explain the behaviour after 
> all.  Maybe there's a sensible explanation for why it should be done that way.
> 
> It is and remains a guess after all.
> 
> I find the problem difficult to term after all.  Maybe "In-place operation 
> not in-place for zero-strided first operand?"  I don't think it is easy to 
> get a good notion of an "in-place operation" for a zero-strided first 
> operand.  It might be that your definition does just not match up.  But I 
> think in-place operations should be expected to be independent on the order

Re: [Numpy-discussion] numpy.ma.MaskedArray.min() makes a copy?

2012-09-18 Thread Sebastian Berg
On Tue, 2012-09-18 at 08:42 -1000, Eric Firing wrote:
> On 2012/09/18 7:40 AM, Benjamin Root wrote:
> >
> >
> > On Fri, Sep 7, 2012 at 12:05 PM, Nathaniel Smith  > > wrote:
> >
> > On 7 Sep 2012 14:38, "Benjamin Root"  > > wrote:
> >  >
> >  > An issue just reported on the matplotlib-users list involved a
> > user who ran out of memory while attempting to do an imshow() on a
> > large array.  While this wouldn't be totally unexpected, the user's
> > traceback shows that they ran out of memory before any actual
> > building of the image occurred.  Memory usage sky-rocketed when
> > imshow() attempted to determine the min and max of the image.  The
> > input data was a masked array, and it appears that the
> > implementation of min() for masked arrays goes something like this
> > (paraphrasing here):
> >  >
> >  > obj.filled(inf).min()
> >  >
> >  > The idea is that any masked element is set to the largest
> > possible value for their dtype in a copied array of itself, and then
> > a min() is performed on that copied array.  I am assuming that max()
> > does the same thing.
> >  >
> >  > Can this be done differently/more efficiently?  If the "filled"
> > approach has to be done, maybe it would be a good idea to make the
> > copy in chunks instead of all at once?  Ideally, it would be nice to
> > avoid the copying altogether and utilize some of the special
> > iterators that Mark Weibe created last year.
> >
> > I think what you're looking for is where= support for ufunc.reduce.
> > This isn't implemented yet but at least it's straightforward in
> > principle... otherwise I don't know anything better than
> > reimplementing .min() by hand.
> >
> > -n
> >
> >
> >
> > Yes, it was the where= support that I was thinking of.  I take it that
> > it was pulled out of the 1.7 branch with the rest of the NA stuff?
> 
> The where= support was left in:
> http://docs.scipy.org/doc/numpy/reference/ufuncs.html
> 

It seems that the keyword argument is still missing from the ufunc help
(`help(np.add)` and the individual `np.info(np.add)`), though.
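
As an aside, a minimal sketch of the "copy in chunks" idea mentioned in the
quoted discussion (the function name and chunk size are illustrative, not an
existing numpy.ma API):

import numpy as np

def masked_min_chunked(marr, chunk=1 << 20):
    # Minimum of a masked array without filling a full-size copy; only one
    # chunk-sized temporary is created per iteration.
    data = marr.data.ravel()
    mask = np.ma.getmaskarray(marr).ravel()
    best = np.inf
    for start in range(0, data.size, chunk):
        d = data[start:start + chunk]
        m = mask[start:start + chunk]
        if not m.all():                 # skip fully masked chunks
            best = min(best, d[~m].min())
    return best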


> See also get_ufunc_arguments in ufunc_object.c.
> 
> Eric
> 
> 
> >
> > Ben Root
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




[Numpy-discussion] np.delete fix

2012-09-20 Thread Sebastian Berg
Hey,

I have written a small PR to fix np.delete. Since it would change the
behavior a little (for the better, IMO), I think I should also write to the
list. So here are the problems with np.delete:

1. When using slices with negative strides, it does not work (best case) or
   even gives wrong results.
2. When using an index array, it ignores negative indices.
3. The fact that it uses setdiff1d makes it unnecessarily slow.

https://github.com/numpy/numpy/pull/452/files fixes these things.

The change is that out-of-bounds indices now raise an exception (even if one
might argue that they do not matter for deletion); however, I consider that
a feature.

And a small example of how badly it goes wrong with slices:
In [1]: arr = np.arange(4)

In [2]: set(arr) - set(arr[1::-1])
Out[2]: set([2, 3])

In [3]: np.delete(arr, np.s_[1::-1])
Out[3]: array([3, 3])
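
For comparison, a sketch of the expected behaviour using a boolean mask
(illustrative only, not necessarily how the PR implements it):

In [4]: keep = np.ones(arr.shape[0], dtype=bool)

In [5]: keep[np.arange(arr.shape[0])[np.s_[1::-1]]] = False

In [6]: arr[keep]
Out[6]: array([2, 3])   # matches the set difference above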

Regards,

Sebastian



Re: [Numpy-discussion] ZeroRank memmap behavior?

2012-09-21 Thread Sebastian Berg
Hey,

this is indirectly related (I think it might fix many of these memmap
oddities though?)...

Why does the memmap object not implement:

def __array_wrap__(self, obj):
if self is obj:
return obj
return np.array(obj, copy=False, subok=False)

By doing so, if a ufunc gets only memmap objects, the result will not be a
memmap object, but if the ufunc has an output parameter, then "self is obj",
which means that it is not cast. The problem is that subclasses automatically
inherit an __array_wrap__ method that sets the result to the subclass's type
(which is not generally wanted for memmaps). I may make a PR for this...
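
To illustrate the effect of such a wrap on a small made-up subclass (a
sketch; memmap itself needs a file, so a plain ndarray subclass stands in
for it here):

import numpy as np

class ArrayLike(np.ndarray):
    def __array_wrap__(self, obj):
        if self is obj:          # out= was given: leave the result untouched
            return obj
        return np.array(obj, copy=False, subok=False)

a = np.arange(3.).view(ArrayLike)
print(type(a + a))   # plain numpy.ndarray, not ArrayLike, thanks to the wrap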

Regards,

Sebastian


On Fri, 2012-09-21 at 14:59 +0200, Wim Bakker wrote:
> I'm deeply puzzled by the recently changed behavior of zero-rank
> memmaps. I think this change happened from version 1.6.0 to 1.6.1,
> which I'm currently using.
> 
> 
> >>> import numpy as np
> 
> 
> Create a zero-rank memmap.
> 
> 
> >>> x = np.memmap(filename='/tmp/m', dtype=float, mode='w+', shape=())
> 
> 
> Give it a value:
> 
> 
> >>> x[...] = 22
> >>> x
> memmap(22.0)
> 
> 
> So far so good. But now:
> 
> 
> >>> b = (x + x) / 1.5
> >>> b
> memmap(29.332)
> 
> 
> WT.? Why is the result of this calculation a memmap?
> 
> 
> It even thinks that it's still linked to the file, but it's not:
> 
> 
> >>> b.filename
> '/tmp/m'
> 
> 
> If I try this with arrays then I don't get this weird behavior:
> 
> 
> >>> a = np.array(2, dtype=float)
> 
> 
> >>> (a + a) / 2.5
> 1.6001
> 
> 
> which gives me a Python float, not a zero-rank array.
> 
> 
> Why does the memmap behave like that? Why do I get a memmap even
> though it's not connected to any file?
> 
> 
> Regards,
> 
> 
> Wim
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




[Numpy-discussion] Ignore axes with dimension==1 for contiguous flags

2012-09-22 Thread Sebastian Berg
Hey,

Numpy currently assumes that if "ndim > 1" it is impossible for an array to
be both C- and F-contiguous; however, an axis of dimension 1 has no effect
on the memory layout. I think I have made the most important changes
(actually really very few), though I bet some parts of numpy still need
adapting because of smaller quirks:
https://github.com/seberg/numpy/compare/master...cflags

This example sums up two advantages. On that branch:

In [9]: a = np.arange(9).reshape(3,3)[::3,:]

In [10]: a.flags.contiguous, a.flags.fortran
Out[10]: (True, True)

Note that currently _both_ are false, because numpy does not reset the
strides for the first dimension.

The only real problem I see is that someone who assumes that, for a
contiguous array, strides[0] or strides[-1] equals the element size has to
change their code or face segmentation faults, but maybe I am missing
something big?
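
A small sketch of that pitfall, using the same array as above (the printed
stride values assume a 64-bit build):

import numpy as np

a = np.arange(9).reshape(3, 3)[::3, :]   # shape (1, 3)
print(a.shape, a.strides, a.itemsize)    # (1, 3) (72, 8) 8
# On the proposed branch this array is flagged both C- and F-contiguous even
# though strides[0] != itemsize, so code that derives offsets from strides[0]
# (or strides[-1]) for contiguous arrays would need to use shape/itemsize
# instead.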

Any comments on whether this is the right idea? And if so, where would more
changes be needed?

Regards,

Sebastian



[Numpy-discussion] np.array execution path

2012-09-22 Thread Sebastian Berg
Hi,

I have a bit of trouble figuring this out. I would have expected
np.asarray(array) to go through ctors, PyArray_NewFromArray, but it
seems to me it does not, so which execution path is exactly taken here?
The reason I am asking is that I want to figure out this behavior/bug,
and I really am not sure which function is responsible:

In [69]: o = np.ones(3)

In [70]: no = np.asarray(o, order='C')

In [71]: no[:] = 10

In [72]: o # OK, o was changed in place:
Out[72]: array([ 10.,  10.,  10.])

In [73]: no.flags # But no claims to own its data!
Out[73]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [74]: no = np.asarray(o, order='F')

In [75]: no[:] = 11

In [76]: o # Here asarray actually returned a real copy!
Out[76]: array([ 10.,  10.,  10.])


Thanks,

Sebastian




Re: [Numpy-discussion] np.array execution path

2012-09-22 Thread Sebastian Berg
Ooops, obviously. Thanks a lot, stupid me. That was also enough to figure
the rest out myself...

On Sat, 2012-09-22 at 13:12 -0500, Travis Oliphant wrote:
> Check to see if this expression is true
> 
> no is o
> 
> In the first case no and o are the same object
> 
> 
> Travis 
> 
> --
> Travis Oliphant
> (on a mobile)
> 512-826-7480
> 
> 
> On Sep 22, 2012, at 1:01 PM, Sebastian Berg  
> wrote:
> 
> > Hi,
> > 
> > I have a bit of trouble figuring this out. I would have expected
> > np.asarray(array) to go through ctors, PyArray_NewFromArray, but it
> > seems to me it does not, so which execution path is exactly taken here?
> > The reason I am asking is that I want to figure out this behavior/bug,
> > and I really am not sure which function is responsible:
> > 
> > In [69]: o = np.ones(3)
> > 
> > In [70]: no = np.asarray(o, order='C')
> > 
> > In [71]: no[:] = 10
> > 
> > In [72]: o # OK, o was changed in place:
> > Out[72]: array([ 10.,  10.,  10.])
> > 
> > In [73]: no.flags # But no claims to own its data!
> > Out[73]: 
> >  C_CONTIGUOUS : True
> >  F_CONTIGUOUS : True
> >  OWNDATA : True
> >  WRITEABLE : True
> >  ALIGNED : True
> >  UPDATEIFCOPY : False
> > 
> > In [74]: no = np.asarray(o, order='F')
> > 
> > In [75]: no[:] = 11
> > 
> > In [76]: o # Here asarray actually returned a real copy!
> > Out[76]: array([ 10.,  10.,  10.])
> > 
> > 
> > Thanks,
> > 
> > Sebastian
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] np.array execution path

2012-09-23 Thread Sebastian Berg
In case you are interested, the second (real oddity) is caused by an
ISFORTRAN and IS_F_CONTIGUOUS mixup; I have found three occurrences where
I think ISFORTRAN should be replaced by the latter. Check also:

https://github.com/seberg/numpy/commit/4d2713ce8f2107d225fe291f5da6c6a75436647e


Sebastian

On Sat, 2012-09-22 at 13:12 -0500, Travis Oliphant wrote:
> Check to see if this expression is true
> 
> no is o
> 
> In the first case no and o are the same object
> 
> 
> Travis 
> 
> --
> Travis Oliphant
> (on a mobile)
> 512-826-7480
> 
> 
> On Sep 22, 2012, at 1:01 PM, Sebastian Berg  
> wrote:
> 
> > Hi,
> > 
> > I have a bit of trouble figuring this out. I would have expected
> > np.asarray(array) to go through ctors, PyArray_NewFromArray, but it
> > seems to me it does not, so which execution path is exactly taken here?
> > The reason I am asking is that I want to figure out this behavior/bug,
> > and I really am not sure which function is responsible:
> > 
> > In [69]: o = np.ones(3)
> > 
> > In [70]: no = np.asarray(o, order='C')
> > 
> > In [71]: no[:] = 10
> > 
> > In [72]: o # OK, o was changed in place:
> > Out[72]: array([ 10.,  10.,  10.])
> > 
> > In [73]: no.flags # But no claims to own its data!
> > Out[73]: 
> >  C_CONTIGUOUS : True
> >  F_CONTIGUOUS : True
> >  OWNDATA : True
> >  WRITEABLE : True
> >  ALIGNED : True
> >  UPDATEIFCOPY : False
> > 
> > In [74]: no = np.asarray(o, order='F')
> > 
> > In [75]: no[:] = 11
> > 
> > In [76]: o # Here asarray actually returned a real copy!
> > Out[76]: array([ 10.,  10.,  10.])
> > 
> > 
> > Thanks,
> > 
> > Sebastian
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] np.array execution path

2012-09-23 Thread Sebastian Berg
On Sun, 2012-09-23 at 18:54 +0100, Nathaniel Smith wrote:
> On Sat, Sep 22, 2012 at 10:24 PM, Sebastian Berg
>  wrote:
> > In case you are interested, the second (real odditiy), is caused by
> > ISFORTRAN and IS_F_CONTIGUOUS mixup, I have found three occurances where
> > I think ISFORTRAN should be replaced by the latter. Check also:
> >
> > https://github.com/seberg/numpy/commit/4d2713ce8f2107d225fe291f5da6c6a75436647e
> 
> So I guess we have this ISFORTRAN function (also exposed to
> Python[1]). It's documented as checking the rather odd condition of an
> array being in fortran-order AND having ndim > 1. Sebastian, as part
> of polishing up some of our contiguity-handling code, is suggesting
> changing this so that ISFORTRAN is true for an array that is (fortran
> order && !C order). Off the top of my head I can't think of any
> situation where *either* of these predicates is actually useful. (I
> can see why you want to check if an array is in fortran order, but not
> why it'd be natural to check whether it's in fortran order and also
> these other conditions together in one function call.) The problem is,
> this makes it hard to know whether Sebastian's change is a good idea.
> 
> Can anyone think of legitimate uses for ISFORTRAN? Or should it just
> be deprecated altogether?

Maybe I am missing things, but I think ISFORTRAN is used to decide the order
in which a new array is requested when "Anyorder" (order='A') is used.
In some use cases it does not matter, but for example in these cases (where
the new array has a different shape than the original) the result would
change if you just changed ISFORTRAN:

a = np.ones(4) # C/F-contiguous
a.reshape(2, 2, order='A')
np.add.outer(a, a, order='A')

These would return Fortran order instead of C order if ISFORTRAN did not
check dimension > 1 (or equivalently !C-contiguous).
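
A quick illustration with the Python-level predicate (np.isfortran is
documented to be true only for Fortran-ordered arrays that are not also
C-ordered, which is what the ndim > 1 condition amounts to):

import numpy as np

a = np.ones(4)                   # 1-D: both C- and F-contiguous
print(np.isfortran(a))           # False, so order='A' operations stay C-ordered
b = np.ones((2, 2), order='F')
print(np.isfortran(b))           # True, so order='A' operations keep Fortran order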

> 
> -n
> 
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.isfortran.html
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] ANN: NumPy 1.7.0b2 release

2012-10-01 Thread Sebastian Berg
Hey,

About the imaginary part being ignored for all/any function...




> The all method fails also.
> 
> In [1]: a = zeros(5, complex)
> 
> In [2]: a.imag = 1
> 
> In [3]: a.all()
> Out[3]: False
> 
> Chuck
> 
I believe the diff below fixes the issue (also posted on the tracker). I
doubt it is the best way to fix it, but if anyone who knows this code wants
to look into it, it is good to know what is broken. Note that
a.astype(bool) fails as well:

--- a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
+++ b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
@@ -811,9 +811,17 @@ static void
 dst_value[0] = _CONVERT_FN(src_value[0]);
 dst_value[1] = _CONVERT_FN(src_value[1]);
 #  elif !@aligned@
-dst_value = _CONVERT_FN(src_value[0]);
+#if @is_bool2@
+   dst_value = _CONVERT_FN(src_value[0]) || _CONVERT_FN(src_value[1]);
+#else
+   dst_value = _CONVERT_FN(src_value[0]);
+#endif
 #  else
-*(_TYPE2 *)dst = _CONVERT_FN(src_value[0]);
+#if @is_bool2@
+   *(_TYPE2 *)dst = _CONVERT_FN(src_value[0]) || _CONVERT_FN(src_value[1]);
+#else
+   *(_TYPE2 *)dst = _CONVERT_FN(src_value[0]);
+#endif
 #  endif
 #else
 #  if @is_complex2@


Regards,

Sebastian
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] ANN: NumPy 1.7.0b2 release

2012-10-01 Thread Sebastian Berg
On Mon, 2012-10-01 at 10:59 -0600, Charles R Harris wrote:
> 
> 
> On Mon, Oct 1, 2012 at 10:09 AM, Sebastian Berg
>  wrote:
> Hey,
> 
> About the imaginary part being ignored for all/any function...
> 
> 
> 
> 
> > The all method fails also.
> >
> > In [1]: a = zeros(5, complex)
> >
> > In [2]: a.imag = 1
> >
> > In [3]: a.all()
> > Out[3]: False
> >
> > Chuck
> >
> I believe this diff fixes the issue (also posted on Tracker),
> I doubt
> its the best way to fix the issue, but if anyone who knows
> this code
> wants to look into it its good to know what is broken. Note
> that also
> a.astype(bool) fails:
> 
> --- a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
> +++ b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
> @@ -811,9 +811,17 @@ static void
>  dst_value[0] = _CONVERT_FN(src_value[0]);
>  dst_value[1] = _CONVERT_FN(src_value[1]);
>  #  elif !@aligned@
> -dst_value = _CONVERT_FN(src_value[0]);
> +#if @is_bool2@
> +   dst_value = _CONVERT_FN(src_value[0]) ||
> _CONVERT_FN(src_value[1]);
> +#else
> +   dst_value = _CONVERT_FN(src_value[0]);
> +#endif
>  #  else
> -*(_TYPE2 *)dst = _CONVERT_FN(src_value[0]);
> +#if @is_bool2@
> +   *(_TYPE2 *)dst = _CONVERT_FN(src_value[0]) ||
> _CONVERT_FN(src_value[1]);
> +#else
> +   *(_TYPE2 *)dst = _CONVERT_FN(src_value[0]);
> +#endif
>  #  endif
>  #else
>  #  if @is_complex2@
> 
> 
>  
> Hey, I think you are onto something. I'm I correct in assuming the
> this also fixes astype(bool)? In any case, it would be nice if you
> submitted this as an ordinary PR and added some tests. That would be
> the fastest route to getting it reviewed and committed.
> 
Yes, it fixes that too, of course; I just noted it because it may then be
obvious to you guys who know numpy well why the code there needs changing :).
I will do a PR, then we can see if this is the best fix.

> Chuck
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] Fwd: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)

2012-10-11 Thread Sebastian Berg
On Wed, 2012-10-10 at 12:55 -0400, Cera, Tim wrote:
> On Wed, Oct 10, 2012 at 1:58 AM, Travis E.
> Oliphant  wrote:
> I'm not sure what to make of no comments on this PR. This
> seems like a useful addition. @timcera are you still
> interested in having this PR merged?
> 

> 
> Internally I implemented something like rolling window, but I don't
> return the windows.  Instead the contents of the windows are used for
> calculation of each windows 'central' cell in the results array.
> 
A rolling window can help with everything except the, I guess, typical case
of neighbor(..., mode='same', pad=None), where the idea must fail since not
all neighborhoods have the same size. It is probably quite a bit faster to
replace the shapeintersect with it in those cases, but I am not sure whether
it is worth it.
What I personally do not quite like is that in the case where the function
is a ufunc, the mode is not (..., mode='same', pad=None) and
weights=np.ones(...), the rolling-window approach will be much faster, as it
can be fully vectorized with the new numpy versions. But that's just me, and
the documentation would have a "see also", so I guess users should be able
to figure that out if they need the speed. Plus, I don't see an elegant way
to find the neighborhoods for (mode='same', pad=None), so having a function
to help with it should be nice.
On a general note, maybe it would make sense to allow a binary mask instead
of the np.isfinite check, and would it be better if the returned arrays were
not raveled unless there is such a mask? Also, there is a ** missing for the
kwargs in the function call.
> 
> After seeing the rolling window function I thought it might be nice to
> bring that out into a callable function, so that similar functionality
> would be available.  That particular function isn't useful to me
> directly, but perhaps others? 
> 
> 
> Kindest regards,
> Tim
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] copy/deepcopy pull request, Travis build failure

2012-10-25 Thread Sebastian Berg
On Thu, 2012-10-25 at 17:48 -0400, David Warde-Farley wrote:
> I submitted a pull request and one of the Travis builds is failing:
> 
> https://travis-ci.org/#!/numpy/numpy/jobs/2933551
> 
Don't worry about that failure on Travis... It happens randomly at the
moment and is unrelated to anything you are doing.
I am not sure, though, that you can change behavior like that, since you
also change the default behavior of the `.copy()` method and someone might
rely on that? Making it specific to the copy module would make it unlikely
that anyone relies on the default; it would seem sensible that
copy.copy(array) does basically the same as np.copy(array) and not the same
as the method .copy, though ideally the differences could maybe be removed
in the long run, I guess.
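
For reference, the difference in defaults being discussed (assuming current
NumPy behaviour: ndarray.copy defaults to order='C', np.copy to order='K'):

import numpy as np

a = np.ones((2, 3), order='F')
print(a.copy().flags['F_CONTIGUOUS'])    # False: the method defaults to C order
print(np.copy(a).flags['F_CONTIGUOUS'])  # True: np.copy keeps the original layout
# The question above is whether copy.copy(a)/copy.deepcopy(a) should follow
# np.copy rather than the method's default.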

Regards,

Sebastian

> Given my changes,
> 
> 
> https://github.com/dwf/numpy/commit/4c88fdafc003397d6879f81bf59f68adeeb59f2b
> 
> I don't see how the masked array module (responsible for the failing
> test) could possibly be affected in this manner (RuntimeWarning raised
> by encountering an invalid value in power()).
> 
> Moreover, I cannot reproduce it after setting NPY_SEPARATE_COMPILATION
> on my own machine (whatever that does -- I'm not quite clear).
> 
> Does anyone have any insight into what's going on here? Can anyone
> else reproduce it outside of Travis?
> 
> Thanks,
> David
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] copy/deepcopy pull request, Travis build failure

2012-10-26 Thread Sebastian Berg
Hey

On Thu, 2012-10-25 at 19:27 -0600, Charles R Harris wrote:
> Hi Sebastian,
> 

> 
> You seem to becoming more involved in the code maintenance. Would you
> be interested in gaining commit rights at some point?
> 
Maybe, but honestly I am not sure if I will keep following numpy very
closely in the future.

Regards,

Sebastian

> Chuck 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] Regression in mpl: AttributeError: incompatible shape for a non-contiguous array

2012-10-29 Thread Sebastian Berg
Hey,

On Mon, 2012-10-29 at 09:54 -0400, Benjamin Root wrote:
> This error started showing up in the test suite for mpl when using
> numpy master.
> 
> AttributeError: incompatible shape for a non-contiguous array
> 
> The tracebacks all point back to various code points where we are
> trying to set the shape of an array, e.g.,
> 
> offsets.shape = (-1, 2)
> 
Could you give a hint about these arrays' history (how they were created)
and maybe their .shape/.strides? It sounds like the array is not contiguous
when it is expected to be, or the attribute setting itself fails in some
corner cases on master?

Regards,

Sebastian

> Those lines haven't changed in a couple of years, and was intended to
> be done this way to raise an error when reshaping would result in a
> copy (since we needed to use the original in those places).  I don't
> know how these arrays have become non-contiguous, so I am wondering if
> there was some sort of attribute that got screwed up somewhere (maybe
> with views?)
> 
> Cheers!
> Ben Root
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] Regression in mpl: AttributeError: incompatible shape for a non-contiguous array

2012-10-29 Thread Sebastian Berg
Thanks a lot! I feared it would have something to do with those changes; it
turns out I had tried with the wrong numpy version. But at least it is not
in 1.7 for sure, so nothing to worry about :).

I will have a look at it.

On Mon, 2012-10-29 at 10:15 -0500, Patrick Marsh wrote:
> I've tracked down the problem to this
> commit: 
> https://github.com/numpy/numpy/commit/c48156dfdc408f0a1e59ef54ac490cccbd6b8d73
> 
> 
> 
> 
> 
> 
> Patrick.Marsh@buxton numpy> git bisect good
> c48156dfdc408f0a1e59ef54ac490cccbd6b8d73 is the first bad commit
> commit c48156dfdc408f0a1e59ef54ac490cccbd6b8d73
> Author: Sebastian Berg 
> Date:   Sun Oct 21 18:50:28 2012 +0200
> 
> 
> API: Change Flags Updateing to allow C-/F-contiguous arrays
> 
> This changes UpdateFlags to ignore 1-dimensional axis when
> setting C-/F-contiguous flags. Updates both flags always now.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ---
> Patrick Marsh
> Ph.D. Candidate / Liaison to the HWT
> School of Meteorology / University of Oklahoma
> Cooperative Institute for Mesoscale Meteorological Studies
> National Severe Storms Laboratory
> http://www.patricktmarsh.com
> 
> 
> 
> On Mon, Oct 29, 2012 at 10:04 AM, Patrick Marsh
>  wrote:
> Turns out it isn't the commit I thought it was. I'm currently
> going through a git bisect to track down the actual commit
> that introduced this bug. I'll post back when I've found it.
> 
> 
> 
> 
> PTM
> ---
> Patrick Marsh
> Ph.D. Candidate / Liaison to the HWT
> School of Meteorology / University of Oklahoma
> Cooperative Institute for Mesoscale Meteorological Studies
> National Severe Storms Laboratory
> http://www.patricktmarsh.com
>     
>     
> 
> On Mon, Oct 29, 2012 at 9:43 AM, Benjamin Root
>  wrote:
> 
> 
> 
> On Mon, Oct 29, 2012 at 10:33 AM, Sebastian Berg
>  wrote:
> Hey,
> 
> On Mon, 2012-10-29 at 09:54 -0400, Benjamin
> Root wrote:
> > This error started showing up in the test
> suite for mpl when using
> > numpy master.
> >
> > AttributeError: incompatible shape for a
> non-contiguous array
> >
> > The tracebacks all point back to various
> code points where we are
> > trying to set the shape of an array, e.g.,
> >
> > offsets.shape = (-1, 2)
> >
> 
> Could you give a hint what these arrays
> history (how it was created) and
> maybe .shape/.strides is? Sounds like the
> array is not contiguous when
> it is expected to be, or the attribute setting
> itself fails in some
> corner cases on master?
> 
> Regards,
> 
> Sebastian
> 
> 
> 
> The original reporter of the bug dug into the commit
> list and suspects it was this one:
> 
> 
> https://github.com/numpy/numpy/commit/02ebf8b3e7674a6b8a06636feaa6c761fcdf4e2d
> 
> However, it might be earlier than that (he is
> currently doing a clean rebuild to make sure).
> 
> As for the history:
> 
> offsets = np.asanyarray(offsets)
> offsets.shape = (-1, 2) # Make
> it Nx2
> 
> Where "offsets" comes in from (possibly) user-supplied
> data.  Nothing really all that special.  I will see if
> I can get stride information.
> 
> Ben Root
> 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> 

Re: [Numpy-discussion] using numpy.argmax to index into another array

2012-10-31 Thread Sebastian Berg
Hey,

On Wed, 2012-10-31 at 20:22 -0400, David Warde-Farley wrote:
> On Wed, Oct 31, 2012 at 7:23 PM, Moroney, Catherine M (388D)
>  wrote:
> > Hello Everybody,
> >
> > I have the following problem that I would be interested in finding an 
> > easy/elegant solution to.
> > I've got it working, but my solution is exceedingly clunky and I'm sure 
> > that there must be a
> > better way.
> >
> > Here is the (boiled-down) problem:
> >
> > I have 2 different 3-d array, shaped (4,4,4), "a" and "b"
> >
> > I can find the max values and location of those max values in "a" in the 
> > 0-th dimension
> > using max and argmax resulting in a 4x4 matrix.  So far, very easy.
> >
> > I then want to find the values in "b" that correspond to the maximum values 
> > in a.
> > This is where I got stuck.
> >
> > Below find the sample code I used (pretty clumsy stuff ...)
> > Can somebody show a better (less clumsy) way of finding bmax
> > in the code excerpt below?
> 

It's a bit of an off-label usage (as the documentation says), so maybe it's
slower for larger arrays, but as long as you choose along axis 0, you
could also use:

np.choose(amax, b)

> Hi Catherine,
> 
> It's not a huge improvement, but one simple thing is that you can
> generate those index arrays easily and avoid the pesky reshaping at
> the end like so:
> 
> In [27]: idx2, idx3 = numpy.mgrid[0:4, 0:4]
> 
> In [28]: b[amax, idx2, idx3]
> Out[28]:
> array([[12,  1, 13,  3],
>[ 8, 11,  9, 10],
>[11, 11,  1, 10],
>[ 5,  7,  9,  5]])
> 
> In [29]: b[amax, idx2, idx3] == bmax
> Out[29]:
> array([[ True,  True,  True,  True],
>[ True,  True,  True,  True],
>[ True,  True,  True,  True],
>[ True,  True,  True,  True]], dtype=bool)
> 
> numpy.ogrid would work here too, and will use a bit less memory. mgrid
> and ogrid are special objects from the numpy.lib.index_tricks module
> that generate, respectively, a "mesh grid" and an "open mesh grid"
> (where the unchanging dimensions are of length 1) when indexed like
> so. In the latter case, broadcasting takes effect when you index with
> them.
> 
> Cheers,
> 
> David
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] using numpy.argmax to index into another array

2012-10-31 Thread Sebastian Berg
On Thu, 2012-11-01 at 01:30 +0100, Sebastian Berg wrote:
> Hey,
> 
> On Wed, 2012-10-31 at 20:22 -0400, David Warde-Farley wrote:
> > On Wed, Oct 31, 2012 at 7:23 PM, Moroney, Catherine M (388D)
> >  wrote:
> > > Hello Everybody,
> > >
> > > I have the following problem that I would be interested in finding an 
> > > easy/elegant solution to.
> > > I've got it working, but my solution is exceedingly clunky and I'm sure 
> > > that there must be a
> > > better way.
> > >
> > > Here is the (boiled-down) problem:
> > >
> > > I have 2 different 3-d array, shaped (4,4,4), "a" and "b"
> > >
> > > I can find the max values and location of those max values in "a" in the 
> > > 0-th dimension
> > > using max and argmax resulting in a 4x4 matrix.  So far, very easy.
> > >
> > > I then want to find the values in "b" that correspond to the maximum 
> > > values in a.
> > > This is where I got stuck.
> > >
> > > Below find the sample code I used (pretty clumsy stuff ...)
> > > Can somebody show a better (less clumsy) way of finding bmax
> > > in the code excerpt below?
> > 
> 
> Its a bit off-label usage (as the documentation says), so maybe its
> slower for larger arrays, but as long as you choose along axis 0, you
> could also use:
> 
> np.choose(amax, b)
> 
Sorry, never mind that. choose just is not made for this; it only works
for very small arrays...

> > Hi Catherine,
> > 
> > It's not a huge improvement, but one simple thing is that you can
> > generate those index arrays easily and avoid the pesky reshaping at
> > the end like so:
> > 
> > In [27]: idx2, idx3 = numpy.mgrid[0:4, 0:4]
> > 
> > In [28]: b[amax, idx2, idx3]
> > Out[28]:
> > array([[12,  1, 13,  3],
> >[ 8, 11,  9, 10],
> >[11, 11,  1, 10],
> >[ 5,  7,  9,  5]])
> > 
> > In [29]: b[amax, idx2, idx3] == bmax
> > Out[29]:
> > array([[ True,  True,  True,  True],
> >[ True,  True,  True,  True],
> >[ True,  True,  True,  True],
> >[ True,  True,  True,  True]], dtype=bool)
> > 
> > numpy.ogrid would work here too, and will use a bit less memory. mgrid
> > and ogrid are special objects from the numpy.lib.index_tricks module
> > that generate, respectively, a "mesh grid" and an "open mesh grid"
> > (where the unchanging dimensions are of length 1) when indexed like
> > so. In the latter case, broadcasting takes effect when you index with
> > them.
> > 
> > Cheers,
> > 
> > David
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 




Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Sebastian Berg
Hey,

On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote:
> Well, hinted by what Fabien said, I looked at the C level dot function.
> Quite verbose!
> 
> But starting line 757, we can see that it shouldn't be too much work
> to fix that bug (well there is even a comment there that states just
> that)
> https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757
> I now think that should be the cleanest.
> 
> This would only work for gemm though. I don't know what the benefit is
> for gemv for instance, but we should make that kind of changes
> everywhere we can.
> The evil PyArray_Copy is there twice and that's what we want to get rid of.
> 
> I'm not sure, but it looks to me that removing the copy and doing the
> following would do the work:
> Order = CblasRowMajor;
> Trans1 = CblasNoTrans;
> Trans2 = CblasNoTrans;
> if (!PyArray_ISCONTIGUOUS(ap1)) {
> Trans1 = CblasTrans;
> }
> if (!PyArray_ISCONTIGUOUS(ap2)) {
> Trans2 = CblasTrans;
> }
> might be too easy to be true.
> 

Sounds nice, though don't forget that the array may also be neither C- nor
F-contiguous, in which case you need a copy in any case. So it would
probably be more like:

if (PyArray_IS_C_CONTIGUOUS(ap1)) {
Trans1 = CblasNoTrans;
}
else if (PyArray_IS_F_CONTIGUOUS(ap1)) {
Trans1 = CblasTrans;
}
else {
Trans1 = CblasNoTrans;
PyObject *new = PyArray_Copy(ap1);
Py_DECREF(ap1);
ap1 = (PyArrayObject *)new;
}

Regards,

Sebastian

> 
> 
> On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER
>  wrote:
> > I've made the necessary changes to get the proper order for the output 
> > array.
> > Also, a pass of pep8 and some tests (fixmes are in failing tests)
> > http://pastebin.com/M8TfbURi
> >
> > -n
> >
> > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER
> >  wrote:
> >> Thanks for all the responses folks. This is indeed a nice problem to solve.
> >>
> >> Few points:
> >> I. Change the order from 'F' to 'C': I'll look into it.
> >> II. Integration with scipy / numpy: opinions are diverging here.
> >> Let's wait a bit to get more responses on what people think.
> >> One thing though: I'd need the same functionality as get_blas_funcs in 
> >> numpy.
> >> Since numpy does not require lapack, what functions can I get?
> >> III. Complex arrays
> >> I unfortunately don't have enough knowledge here. If someone could
> >> propose a fix, that'd be great.
> >> IV. C
> >> Writing this in C sounds like a good idea. I'm not sure I'd be the
> >> right person to this though.
> >> V. Patch in numpy
> >> I'd love to do that and learn to do it as a byproduct.
> >> Let's make sure we agree this can go in numpy first and that all FIXME
> >> can be fixed.
> >> Although I guess we can resolve fixmes using git.
> >>
> >> Let me know how you'd like to proceed,
> >>
> >> Thanks!
> >>
> >> FIXMEs:
> >> - Fix for ndim != 2
> >> - Fix for dtype == np.complex*
> >> - Fix order of output array
> >>
> >> On Thu, Nov 8, 2012 at 9:42 AM, Frédéric Bastien  wrote:
> >>> Hi,
> >>>
> >>> I also think it should go into numpy.dot and that the output order should
> >>> not be changed.
> >>>
> >>> A new point, what about the additional overhead for small ndarray? To 
> >>> remove
> >>> this, I would suggest to put this code into the C function that do the
> >>> actual work (at least, from memory it is a c function, not a python one).
> >>>
> >>> HTH
> >>>
> >>> Fred
> >>>
> >>>
> >>>
> >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz  
> >>> wrote:
> 
>  On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau 
>  wrote:
> >
> > On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn
> >  wrote:
> > > On 11/08/2012 01:07 PM, Gael Varoquaux wrote:
> > >> On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote:
> > >>> I think everyone would be very happy to see numpy.dot modified to do
> > >>> this automatically. But adding a scipy.dot IMHO would be fixing
> > >>> things
> > >>> in the wrong place and just create extra confusion.
> > >>
> > >> I am not sure I agree: numpy is often compiled without lapack 
> > >> support,
> > >> as
> > >> it is not necessary. On the other hand scipy is always compiled with
> > >> lapack. Thus this makes more sens in scipy.
> > >
> > > Well, numpy.dot already contains multiple fallback cases for when it 
> > > is
> > > compiled with BLAS and not. So I'm +1 on just making this an
> > > improvement
> > > on numpy.dot. I don't think there's a time when you would not want to
> > > use this (provided the output order issue is fixed), and it doesn't
> > > make
> > > sense to not have old codes take advantage of the speed improvement.
> >
> > Indeed, there is no reason not to make this available in NumPy.
> >
> > Nicolas, can you prepare a patch for numpy ?
> 
> 
>  +1, I agree, this should be a fix in numpy, not scipy.
> 
>  Be Well
>  Anthony
> 
> >
> 

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Sebastian Berg
On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote:
> Hey,
> 
> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote:
> > Well, hinted by what Fabien said, I looked at the C level dot function.
> > Quite verbose!
> > 
> > But starting line 757, we can see that it shouldn't be too much work
> > to fix that bug (well there is even a comment there that states just
> > that)
> > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757
> > I now think that should be the cleanest.
> > 
> > This would only work for gemm though. I don't know what the benefit is
> > for gemv for instance, but we should make that kind of changes
> > everywhere we can.
> > The evil PyArray_Copy is there twice and that's what we want to get rid of.
> > 
> > I'm not sure, but it looks to me that removing the copy and doing the
> > following would do the work:
> > Order = CblasRowMajor;
> > Trans1 = CblasNoTrans;
> > Trans2 = CblasNoTrans;
> > if (!PyArray_ISCONTIGUOUS(ap1)) {
> > Trans1 = CblasTrans;
> > }
> > if (!PyArray_ISCONTIGUOUS(ap2)) {
> > Trans2 = CblasTrans;
> > }
> > might be too easy to be true.
> > 
> 
> Sounds nice, though don't forget that the array may also be neither C-
> or F-Contiguous, in which case you need a copy in any case. So it would
> probably be more like:
> 
> if (PyArray_IS_C_CONTIGUOUS(ap1)) {
> Trans1 = CblasNoTrans;
> }
> else if (PyArray_IS_F_CONTIGUOUS(ap1)) {
> Trans1 = CblasTrans;
> }
> else {
> Trans1 = CblasNoTrans;
> PyObject *new = PyArray_Copy(ap1);
> Py_DECREF(ap1);
> ap1 = (PyArrayObject *)new;
> }
> 

Well, of course I forgot error checking there, and maybe you need to set
some of the other parameters differently, but it looks like it's probably
that easy, and I am sure everyone will welcome a PR with such changes.
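
For anyone who wants the saving at the Python level in the meantime, here
is a rough sketch of the same idea using scipy's BLAS wrappers (fast_dot
and as_fortran are just made-up names; only 2-d real arrays of a matching
dtype are considered):

import numpy as np
from scipy.linalg import get_blas_funcs

def fast_dot(a, b):
    gemm = get_blas_funcs('gemm', (a, b))

    def as_fortran(x):
        # Return (y, trans) with op(y) == x.  BLAS wants Fortran order, and
        # x.T is already Fortran-contiguous whenever x is C-contiguous, so
        # neither memory layout needs a copy (anything else still gets
        # copied by the wrapper itself).
        if x.flags.f_contiguous:
            return x, False
        return x.T, True

    a_, trans_a = as_fortran(a)
    b_, trans_b = as_fortran(b)
    return gemm(1.0, a_, b_, trans_a=trans_a, trans_b=trans_b)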

> Regards,
> 
> Sebastian
> 
> > 
> > 
> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER
> >  wrote:
> > > I've made the necessary changes to get the proper order for the output 
> > > array.
> > > Also, a pass of pep8 and some tests (fixmes are in failing tests)
> > > http://pastebin.com/M8TfbURi
> > >
> > > -n
> > >
> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER
> > >  wrote:
> > >> Thanks for all the responses folks. This is indeed a nice problem to 
> > >> solve.
> > >>
> > >> Few points:
> > >> I. Change the order from 'F' to 'C': I'll look into it.
> > >> II. Integration with scipy / numpy: opinions are diverging here.
> > >> Let's wait a bit to get more responses on what people think.
> > >> One thing though: I'd need the same functionality as get_blas_funcs in 
> > >> numpy.
> > >> Since numpy does not require lapack, what functions can I get?
> > >> III. Complex arrays
> > >> I unfortunately don't have enough knowledge here. If someone could
> > >> propose a fix, that'd be great.
> > >> IV. C
> > >> Writing this in C sounds like a good idea. I'm not sure I'd be the
> > >> right person to this though.
> > >> V. Patch in numpy
> > >> I'd love to do that and learn to do it as a byproduct.
> > >> Let's make sure we agree this can go in numpy first and that all FIXME
> > >> can be fixed.
> > >> Although I guess we can resolve fixmes using git.
> > >>
> > >> Let me know how you'd like to proceed,
> > >>
> > >> Thanks!
> > >>
> > >> FIXMEs:
> > >> - Fix for ndim != 2
> > >> - Fix for dtype == np.complex*
> > >> - Fix order of output array
> > >>
> > >> On Thu, Nov 8, 2012 at 9:42 AM, Frédéric Bastien  wrote:
> > >>> Hi,
> > >>>
> > >>> I also think it should go into numpy.dot and that the output order 
> > >>> should
> > >>> not be changed.
> > >>>
> > >>> A new point, what about the additional overhead for small ndarray? To 
> > >>> remove
> > >>> this, I would suggest to put this code into the C function that do the
> > >>> actual work (at least, from memory it is a c function, not a python 
> > >>> one).
> > >>>
> > >>> HTH
> > >>>
> > >>> Fred
> > 

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Sebastian Berg
On Fri, 2012-11-09 at 14:52 -0800, Nicolas SCHEFFER wrote:
> Ok: comparing apples to apples. I'm clueless on my observations and
> would need input from you guys.
> 
> Using ATLAS 3.10, numpy with and without my changes, I'm getting these
> timings and comparisons.
> 
> #
> #I. Generate matrices using regular dot:
> #
> big = np.array(np.random.randn(2000, 2000), 'f');
> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
> left=big.T.dot(big), right=big.dot(big.T))"
> 
> #
> #II. Timings with regular dot
> #
> In [3]: %timeit np.dot(big, big)
> 10 loops, best of 3: 138 ms per loop
> 
> In [4]: %timeit np.dot(big, big.T)
> 10 loops, best of 3: 166 ms per loop
> 
> In [5]: %timeit np.dot(big.T, big.T)
> 10 loops, best of 3: 193 ms per loop
> 
> In [6]: %timeit np.dot(big.T, big)
> 10 loops, best of 3: 165 ms per loop
> 
> #
> #III. I load these arrays and time again with the "fast" dot
> #
> In [21]: %timeit np.dot(big, big)
> 10 loops, best of 3: 138 ms per loop
> 
> In [22]: %timeit np.dot(big.T, big)
> 10 loops, best of 3: 104 ms per loop
> 
> In [23]: %timeit np.dot(big.T, big.T)
> 10 loops, best of 3: 138 ms per loop
> 
> In [24]: %timeit np.dot(big, big.T)
> 10 loops, best of 3: 102 ms per loop
> 
> 1. A'.A': great!
> 2. A.A' becomes faster than A.A !?!
> 
> #
> #IV. MSE on differences
> #
> In [25]: np.sqrt(((arr['none'] - none)**2).sum())
> Out[25]: 0.0
> 
> In [26]: np.sqrt(((arr['both'] - both)**2).sum())
> Out[26]: 0.0
> 
> In [27]: np.sqrt(((arr['left'] - left)**2).sum())
> Out[27]: 0.015900515
> 
> In [28]: np.sqrt(((arr['right'] - right)**2).sum())
> Out[28]: 0.015331409
> 
> #
> # CCl
> #
> While the MSE are small, I'm wondering whether:
> - It's a bug: it should be exactly the same
> - It's a feature: BLAS is taking shortcuts when you have A.A'. The
> difference is not significant. Quick: PR that asap!
> 

I think it's a feature, because memory can be accessed in a more regular
fashion when one array is Fortran- and the other C-contiguous. I.e. if it
is A.B', then fetching a row of A is perfect, since it is one contiguous
chunk of memory (C order), and it has to be multiplied with a column of
B', which, due to being transposed, is Fortran order, so that column is
also one contiguous chunk of memory.
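
To make the layout argument concrete (just a quick look at the flags, not
a benchmark):

>>> big = np.ones((2000, 2000), dtype='f')
>>> big.flags.c_contiguous, big.flags.f_contiguous
(True, False)
>>> big.T.flags.c_contiguous, big.T.flags.f_contiguous
(False, True)

So in big.dot(big.T) every row of big and every column of big.T is one
contiguous chunk of memory, which is exactly the access pattern the
matrix product wants.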

> I don't have enough expertise to answer that...
> 
> Thanks much!
> 
> -nicolas
> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
>  wrote:
> > I too encourage users to use scipy.linalg for speed and robustness
> > (hence calling this scipy.dot), but it just brings so much confusion!
> > When using the scipy + numpy ecosystem, you'd almost want everything
> > be done with scipy so that you get the best implementation in all
> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().
> >
> > Anyway this is indeed for another thread, the confusion we'd like to
> > fix here is that users shouldn't have to understand the C/F contiguous
> > concepts to get the maximum speed for np.dot()
> >
> > To summarize:
> > - The python snippet I posted is still valid and can speed up your
> > code if you can change all your dot() calls.
> > - The change in dotblas.c is a bit more problematic because it's very
> > core. I'm having issues right now to replicate the timings, I've got
> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug.
> >
> > It's a pain to test because I cannot do the test in a single python session.
> > I'm going to try to integrate most of your suggestions, I cannot
> > guarantee I'll have time to do them all though.
> >
> > -nicolas
> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith  wrote:
> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
> >>  wrote:
> >>> On Fri, Nov 09, 2012 at 03:12:42PM +, Nathaniel Smith wrote:
>  But what if someone compiles numpy against an optimized blas (mkl,
>  say) and then compiles SciPy against the reference blas? What do you
>  do then!? ;-)
> >>>
> >>> This could happen. But the converse happens very often. What happens is
> >>> that users (eg on shared computing resource) ask for a scientific python
> >>> environment. The administrator than installs the package starting from
> >>> the most basic one, to the most advanced one, thus starting with numpy
> >>> that can very well build without any external blas. When he gets to scipy
> >>> he hits the problem that the build system does not detect properly the
> >>> blas, and he solves that problem.
> >>>
> >>> Also, it used to be that on the major linux distributions, numpy would not
> >>> be build with an optimize lapack because numpy was in the 'base' set of
> >>> packages, but not lapack. On the contrary, scipy being in the 'contrib'
> >>> set, it could depend on lapack. I just checked, and this has been fixed
> >>> in the major distributions (Fedora, Debian, Ubuntu).
> >>>
> >>> Now we can discuss with such problems should not happen, and put the
> >>> blame on the users/administrators, the fact is that they happen often. I
> >>>

Re: [Numpy-discussion] fix random.choice for 1.7?

2012-11-12 Thread Sebastian Berg
Hey,

On Mon, 2012-11-12 at 08:48 -0500, Alan G Isaac wrote:
> On 11/9/2012 12:21 PM, Nathaniel Smith wrote:
> > you might want to double-check that the
> > np.random.choice in 1.7 actually*does*  give an error if the input
> > array is not 1-d
> 
> 
> Any idea where I can look at the code?
> I browsed github after failing to find
> a productive search string, but failed
> to find it.
> 

It's here:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L919

Sounds like it should be pretty simple to add axis=None, which would
change the current behavior very little; it would no longer give an error
for non-1-d arrays, though.

Regards,

Sebastian


> Which remind me: it would be nice if the
> docs linked to the source.
> 
> Thanks,
> Alan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fix random.choice for 1.7?

2012-11-12 Thread Sebastian Berg
On Mon, 2012-11-12 at 17:52 +0100, Nathaniel Smith wrote:
> On Mon, Nov 12, 2012 at 5:31 PM, Alan G Isaac  wrote:
> > In a comment on the issue https://github.com/numpy/numpy/issues/2724 
> > Sebastian notes:
> > "it could also be reasonable to have size=None as default and have it 
> > return a scalar/the given axes removed in that case. That would be a real 
> > change
> > in functionality unfortunately, but it would make sense for similarity to 
> > import random; random.choice mostly."
> >
> > If this is under serious consider, then perhaps
> > random.choice should not be in 1.7 unless a decision can
> > be quickly made about the default for `size`.
> > (Allowing an axis argument can however be postponed.)
> >
> > I am inclined to think that Sebastian's suggestion
> > is correct.
> 
> For anyone else trying to follow, here's the current function:
>   
> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
> 
> I'm afraid I don't understand what Sebastian is suggesting should
> happen by default. size=None doesn't have any intuitive meaning to me,
> and I don't know what "a scalar/the given axes removed" means.
> 
None is a little awkward I agree (but I don't think there is anything
better), but basically what I meant is this:

>>> random.choice([1, 1])
1
>>> np.random.choice([1, 2])
array([1]) # it's 1-D, not 0-D.

So instead of returning a sequence of length 1, the default would return an element.

> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fix random.choice for 1.7?

2012-11-12 Thread Sebastian Berg
On Mon, 2012-11-12 at 18:36 -0500, Alan G Isaac wrote:
> On 11/12/2012 5:46 PM, Nathaniel Smith wrote:
> > Want to make a pull request?
> 
> 
> Well, I'd be happy to help Sebastien to change the
> code, but I'm not a git user.
> 

I have created a pull request, but tests are still needed... If you like,
it would be very nice if you could check it and maybe also write some
tests. Git is relatively simple (and likely worth learning), but even
otherwise posting code would be nice.

> And I'd have some questions.  E.g., with `size=None`,
> couldn't we just call Python's random.choice?  And for
> sampling without replacement, wouldn't it be faster to
> just call Python's random.sample (rather than
> implement this as currently done)?
> 

I don't think the difference should be really noticeable. But even if it
were, I doubt it's worth special-casing. If someone cares a lot about
speed they probably should not use it to get single values anyway. Also,
in this case of random numbers it would be a bad idea, since you would be
using a different random number generator and a different seed!

Regards,

Sebastian

> Alan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fix random.choice for 1.7?

2012-11-13 Thread Sebastian Berg
On Mon, 2012-11-12 at 22:44 -0500, Alan G Isaac wrote:
> On 11/12/2012 8:18 PM, Sebastian Berg wrote:
> > I have created a pull request
> 
> 
> This is still a bit different than I thought you intended.
> With `size=None` we don't get an element,
> but rather a 0d array.
> 
You are right, it should not be a 0d array. I overlooked that tuple()
does not give the same result as None, at least for the random functions.

> I thought the idea was to return an element in this case?
> 
> Alan
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Crash using "reshape"...

2012-11-21 Thread Sebastian Berg
Hey,

On Wed, 2012-11-21 at 01:12 -0800, Terry J. Ligocki wrote:
> I am having a problem with "reshape" crashing:
> > python
> Python 2.6.4 (r264:75706, Jan 16 2010, 21:11:47) 
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
> >>> import numpy
> >>> numpy.version.version
> '1.6.2'
> >>> npData = numpy.ones([701,701,7899],dtype=numpy.dtype('b'))
> >>> npDataSubset =
> npData[[slice(0,700),slice(0,700),slice(0,5000)]]
> >>> npDataOutput =
> npDataSubset.reshape([700*700*5000],order='F')
> Segmentation fault
> If I change the "5000" to a "4000", everything is fine.  I'm not
> running out of memory - my system had 48 GB of memory and nothing else
> is using a significant portion of this memory.
> 
> Note:  700x700x4000 = 1,960,000,000 < 2^31 and 700x700x5000 =
> 2,450,000,000 > 2^31.  I suspect somewhere in the underlying code there
> is a signed 32-bit integer being used for an index/pointer offset
> (this is running on a 64-bit machine).

yes, an int is used where npy_intp should be in one place. I have created
a PR that fixes the issue:
https://github.com/numpy/numpy/pull/2754
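
For reference, the sizes involved (plain Python arithmetic showing where
the signed 32-bit limit of 2**31 - 1 = 2147483647 is crossed):

>>> 700 * 700 * 4000    # still fits in a signed 32-bit int
1960000000
>>> 700 * 700 * 5000    # no longer fits
2450000000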

Regards,

Sebastian

> I did some searching of the archives and didn't find a match for this
> problem.  Thank you for any and all help!
> 
> Terry J. (Ligocki, tjligo...@lbl.gov)
> 
> __
> Wishes
> 
> When you wish upon a falling star, your dreams can come true.
> Unless it's really a meteorite hurtling to the Earth which will
> destroy all life.
> Then you're pretty much hosed no matter what you wish for.
> Unless it's death by meteor.
> 
> (Despair, Inc.)
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the mean, var, std of empty arrays

2012-11-22 Thread Sebastian Berg
On Wed, 2012-11-21 at 22:58 -0500, josef.p...@gmail.com wrote:
> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris
>  wrote:
> >
> >
> > On Wed, Nov 21, 2012 at 7:45 PM,  wrote:
> >>
> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau  wrote:
> >> > Current behavior looks sensible to me. I personally would prefer no
> >> > warning
> >> > but I think it makes sense to have one as it can be helpful to detect
> >> > issues
> >> > faster.
> >>
> >> I agree that nan should be the correct answer.
> >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.)
> >>
> >> some funnier cases
> >>
> >> >>> np.var([1], ddof=1)
> >> 0.0
> >
> >
> > This one is a nan in development.
> >
> >>
> >> >>> np.var([1], ddof=5)
> >> -0
> >> >>> np.var([1,2], ddof=5)
> >> -0.1
> >> >>> np.std([1,2], ddof=5)
> >> nan
> >>
> >
> > These still do this. Also
> >
> > In [10]: var([], ddof=1)
> > Out[10]: -0
> >
> > Which suggests that the nan is pretty much an accidental byproduct of
> > division by zero. I think it might make sense to have a definite policy for
> > these corner cases.
> 
> It would also be consistent with the usual pattern to raise a
> ValueError on this. ddof too large, size too small.
> It wouldn't be the case that for some columns or rows we get valid
> answers in this case, as long as we don't allow for missing values.
> 

It seems to me that nan is the reasonable result for these operations
(reduce-like operations that do not have an identity). Though actually
reduce operations without an identity throw a ValueError (e.g.
`np.minimum.reduce([])`), but then mean/std/var seem special enough to
be different from other reduce operations (for example their result is
always floating point). As for usability, I think for example when
plotting error bars using std, it would be rather annoying to get a
ValueError, so if anything the reduce machinery could give more special
results for empty floating point reductions.

In any case the warning should be clearer, and for too large ddofs I
would say it should return nan plus a Warning as well.
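
To make the comparison explicit (shown schematically; the exact messages
and corner-case values differ between versions, as discussed above):

>>> np.minimum.reduce([])   # a reduction without an identity raises
ValueError
>>> np.mean([])             # mean/std/var give nan (plus a RuntimeWarning) instead
nan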

Sebastian

> 
> quick check with np.ma
> 
> looks correct except when delegating to numpy ?
> 
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0)
> >>> s
> masked_array(data = [-- --],
>  mask = [ True  True],
>fill_value = 1e+20)
> 
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0)
> >>> s
> masked_array(data = [0.0 --],
>  mask = [False  True],
>fill_value = 1e+20)
> 
> >>> s = np.ma.std([1,2], ddof=5)
> >>> s
> masked
> >>> type(s)
> 
> 
> >>> np.ma.var([1,2], ddof=5)
> -0.1
> 
> 
> Josef
> 
> >
> > 
> >
> > Chuck
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the mean, var, std of empty arrays

2012-11-22 Thread Sebastian Berg
On Thu, 2012-11-22 at 16:05 +0100, Daπid wrote:
> On Thu, Nov 22, 2012 at 3:54 PM,   wrote:
> > Why don't operations on empty arrays not return empty arrays?
> 
> Because functions like mean or std are expected to return a scalar.
> Functions that are piecewiese can (and should) return an empty array,
> but not the mean.

I agree, this makes sense, note that:

In [2]: a = np.empty((5,0))

In [3]: a.std(0)
Out[3]: array([], dtype=float64)

In [4]: a.std(1)
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in divide
  #!/usr/bin/env python
Out[4]: array([ nan,  nan,  nan,  nan,  nan])

However, here you are reducing, and with a reduction you expect exactly
one scalar result (along that dimension).

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] When are 0-d arrays writeable?

2012-11-23 Thread Sebastian Berg
On Fri, 2012-11-23 at 10:49 +, Nathaniel Smith wrote:
> On 23 Nov 2012 03:34, "Charles R Harris" 
> wrote:
> >
> > Examples,
> >
> > In [13]: ones(()).flags.writeable
> > Out[13]: True
> >
> > In [14]: (-ones(())).flags.writeable
> > Out[14]: False
> >
> > In [15]: (-1*ones(())).flags.writeable
> > Out[15]: False
> >
> > In [16]: (1 + ones(())).flags.writeable
> > Out[16]: False
> >
> > In [17]: array(1)
> > Out[17]: array(1)
> >
> > In [18]: array(1).shape
> > Out[18]: ()
> >
> > In [19]: array(1).flags.writeable
> > Out[19]: True
> 
> Looks like a bug in the ufunc output value setup code or something?
> 
It might be possible to rethink when (or if) to convert a 0-d array to a
scalar. However, at the moment, as far as I understand, many functions
generally do not return 0-d arrays but scalars. Which makes sense,
because mostly we would rather use scalars than 0-d arrays, as they are
closer to typical python (hashable and subclasses of python types).

Of course the way this is done is not aware of what is put in (scalar
vs. 0-d array), since all input is normally converted to an array, which
means that most (all?) functions either return 0-d arrays or scalars and
are never aware whether the original input was a scalar or an array. Maybe
there could be an np.asarray_or_scalar or such, so that it's easier to give
the same output type as the original input type?
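
Something very rough along those lines (asarray_or_scalar is purely a
made-up name for illustration):

import numpy as np

def asarray_or_scalar(obj):
    # Convert to an array for the computation, but remember whether the
    # caller actually passed an array, so the result can match the input.
    return np.asarray(obj), isinstance(obj, np.ndarray)

def some_func(x):
    arr, was_array = asarray_or_scalar(x)
    res = np.negative(arr)                # stand-in for the real computation
    return res if was_array else res[()]  # res[()] unwraps a 0-d result to a scalar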

Regards,

Sebastian

> -n
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar

2012-11-27 Thread Sebastian Berg
On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote:
> I discovered this because scipy.optimize.fmin_powell appears to
> squeeze 1d argmin to 0d unlike the other optimizers, but that's a
> different story.
> 
> 
> I would expect the 0d array to behave like the 1d array not the 2d as
> it does below. Thoughts? Maybe too big of a pain to change this
> behavior if indeed it's not desired, but I found it to be unexpected.

I don't quite understand why it is unexpected. A 1-d array is considered
a vector, a 0-d array is a scalar.

> [255]: np.version.full_version # same on 1.5.1
> [255]: '1.8.0.dev-8e0a542'
> 
> 
> [262]: arr = np.random.random((25,1))
> 
> 
> [~/]
> [263]: np.dot(arr, np.array([1.])).shape
> [263]: (25,)
> 
Matrix times vector = vector
> 
> [~/]
> [264]: np.dot(arr, np.array([[1.]])).shape
> [264]: (25, 1)
> 
Matrix times matrix = matrix
> 
> [~/]
> [265]: np.dot(arr, np.array(1.)).shape
> [265]: (25, 1)
> 
matrix times scalar = matrix (of same shape)
> 
> [~/]
> [271]: np.dot(arr.squeeze(), np.array(1.)).shape
> [271]: (25,)
> 
vector times scalar = vector (of same shape)
> 
> Huh? 0d arrays broadcast with dot?
> 
Remember a 0-d array is a scalar, there is no actual broadcasting
involved here. (except that vectors (1-d arrays) are special)

> [~]
> [279]: arr = np.random.random((25,2))
> 
> 
> [~/]
> [280]: np.dot(arr.squeeze(), np.array(2.)).shape
> [280]: (25, 2)
> 
> 
> Skipper
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar

2012-11-28 Thread Sebastian Berg
On Wed, 2012-11-28 at 11:11 -0500, Skipper Seabold wrote:
> On Tue, Nov 27, 2012 at 11:16 AM, Sebastian Berg
>  wrote:
> On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote:
> > I discovered this because scipy.optimize.fmin_powell appears
> to
> > squeeze 1d argmin to 0d unlike the other optimizers, but
> that's a
> > different story.
> >
> >
> > I would expect the 0d array to behave like the 1d array not
> the 2d as
> > it does below. Thoughts? Maybe too big of a pain to change
> this
> > behavior if indeed it's not desired, but I found it to be
> unexpected.
> 
> 
> I don't quite understand why it is unexpected. A 1-d array is
> considered
> a vector, a 0-d array is a scalar.
> 
> 
> 
> 
> When you put it like this I guess it makes sense. I don't encounter 0d
> arrays often and never think of a 0d array as truly a scalar like
> np.array(1.).item(). See below for my intuition.
>  
I think you should see them as scalars for mathematical operations,
though. The differences are minor in any case, and numpy typically
silently converts scalars -> 0d arrays on function calls and back again
to return scalars.


> 
> Maybe I'm misunderstanding. How do you mean there is no broadcasting?

Broadcasting adds dimensions at the start. To handle a vector like a
matrix product in dot, you do not always add the dimension at the start:
for matrix.vector the vector (N,) is treated much like (N,1). Also, the
result of dot is not necessarily 2-d, which it would have to be in your
reasoning and if you think about what happens in broadcasting terms.

>  They're clearly not conformable. Is vector.scalar specially defined
> (I have no idea)? I recall arguing once and submitting a patch such
> that np.linalg.det(5) and np.linalg.inv(5) should be well-defined and
> work but the counter-argument was that a scalar is not the same as a
> scalar matrix. This seems to be an exception.

I do not see an exception: in all cases there is no implicit
(broadcasting-like) adding of extra dimensions (leading to an error in
most linear algebra functions if the input is not 2-d), which is good
since "explicit is better than implicit".

> Here, I guess, following that counterargument, I'd expected the scalar
> to fail in dot. I certainly don't expect a (N,2).scalar -> (N,2). Or

If you say dot is strictly a matrix product, yes (though it should also
throw errors for vectors then). I think it is simply trying to be more
like the dot that I would write down on paper, and thus special-cases
vectors and scalars; this generalization only replaces what would
otherwise be an error in a matrix product!

Maybe a strict matrix product would make sense too, but the dot function's
behavior cannot be changed in any case, so it's pointless to argue about
it. Just make sure your arrays are 2-d (or matrices) if you want a
matrix product, which will give the behavior you expect in a much more
controlled fashion anyway.
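
For example (just illustrating the "make it explicitly 2-d" point with
existing functions, output shown schematically):

>>> arr = np.random.random((25, 2))
>>> np.dot(arr, np.array(2.)).shape        # 0-d array acts like a scalar
(25, 2)
>>> np.dot(arr, np.atleast_2d(2.)).shape   # explicit (1, 1) matrix: shapes must align
Traceback (most recent call last):
  ...
ValueError: ...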

>  I'd expect it to follow the rules of matrix notation and be treated
> like the 1d scalar vector so that (N,1).scalar -> (N,). To my mind,
> this follows more closely to the expectation that (J,K).(M,N) ->
> (J,N), i.e., the second dimension of the result is the same as the
> second dimension of whatever is post-multiplying where the first
> dimension is inferred if necessary (or should fail if non-existent).
> So my expectations are (were)
> 
> 
> (N,).() -> (N,)
> (N,1).() -> (N,)
> (N,1).(1,) -> (N,)
> (N,1).(1,1) -> (N,1)
> (N,2).() -> Error
>  
> Skipper
> 
> 
> > [~]
> > [279]: arr = np.random.random((25,2))
> >
> >
> > [~/]
> > [280]: np.dot(arr.squeeze(), np.array(2.)).shape
> > [280]: (25, 2)
> >
> >
> > Skipper
> >
> >
> 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Allowing 0-d arrays in np.take

2012-12-04 Thread Sebastian Berg
Hey,
 
Maybe someone has an opinion about this (since it is in fact new
behavior, it is so far undefined). `np.take` used to not allow 0-d/scalar
input, but did allow any other number of dimensions for the indices. I am
thinking about changing this, meaning that:

np.take(np.arange(5), 0)

works. I was wondering if anyone has feelings about whether this should
return a scalar or a 0-d array. Typically numpy prefers scalars for
these cases (indexing would return a scalar too) for good reasons, so I
guess that is correct. But since I caught myself wondering whether it
should maybe return a 0-d array, I thought I would ask here.
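
In other words, the question is which of these two results the new
behavior should produce (shown schematically; this is exactly the
undecided part):

np.take(np.arange(5), 0)   ->  0          (a scalar, like np.arange(5)[0])
                           or  array(0)   (a 0-d array)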

Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy's definition of contiguous arrays

2012-12-04 Thread Sebastian Berg
Hi,

maybe someone has an opinion about how this can be handled and was not
yet aware of this.

In current numpy master (probably being reverted), the definition for
contiguous arrays is changed such that it means "Contiguous in memory"
and nothing more. What this means is this:

1. An array of shape (1,3,1) is both C- and F-contiguous (assuming
`arr.strides[1] == arr.itemsize`).
2. However, it is not guaranteed that `arr.strides[-1] == arr.itemsize`,
because the corresponding axis has length 1, so its stride does not matter
for the memory layout. Other similar assumptions about "clean strides"
are incorrect as well. (This was always incorrect in corner cases.)

I think most will agree that this change reflects what these flags
should indicate, because the exact value of the strides is not really
important for the memory layout and for example for a row vector there
is no reason to say it cannot be both C- and F-contiguous.
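
A concrete example of point 1 (flags shown as they are under the changed
definition in current master; with the old definition the Fortran flag
here would be False):

>>> arr = np.zeros((1, 3, 1))
>>> arr.flags.c_contiguous, arr.flags.f_contiguous
(True, True)
>>> arr.strides   # the strides of the two length-1 axes carry no layout information
(24, 8, 8)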

However, the change broke some code in scipy as well as in scikit-learn
that relied on `arr.strides[-1] == arr.itemsize` (for C-contiguous arrays).
The fact that it was never noticed that this isn't quite correct
indicates that there is certainly more code out there just like it.

There is more discussion here: https://github.com/numpy/numpy/pull/2735
with suggestions for a possible deprecation process of having both
definitions next to each other and deprecating the current, etc.
I was personally wondering if it is good enough to ensure strides are
cleaned up when an array is explicitly requested as contiguous which
means:

np.array(arr, copy=False, order='C').strides[-1] == arr.itemsize

is always True, but:

if arr.flags.c_contiguous:
# It is possible that:
arr.strides[-1] != arr.itemsize

This fixes the problems found so far, since typically, if you want to use
the fact that an array is contiguous, you use this kind of command to
make sure it is. But I guess it is likely too dangerous to assume that
nobody only checks the flags and then goes on to make unwarranted
assumptions about the strides.

Best Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] non-integer index misfeature?

2012-12-12 Thread Sebastian Berg
On Wed, 2012-12-12 at 20:48 +, Nathaniel Smith wrote:
> On Wed, Dec 12, 2012 at 8:20 PM, Ralf Gommers  wrote:
> >
> > On Tue, Dec 11, 2012 at 5:44 PM, Neal Becker  wrote:
> >>
> >> I think it's a misfeature that a floating point is silently accepted as an
> >> index.  I would prefer a warning for:
> >>
> >> bins = np.arange (...)
> >>
> >> for b in bins:
> >> ...
> >>   w[b] = blah
> >>
> >> when I meant:
> >>
> >> for ib,b in enumerate (bins):
> >>   w[ib] = blah
> >
> >
> > Agreed. Scipy.special functions were just changed to generate warnings on
> > truncation of float inputs where ints are expected (only if truncation
> > changes the value, so 3.0 is silent and 3.1 is not).
> >
> > For numpy indexing this may not be appropriate though; checking every index
> > value used could slow things down and/or be quite disruptive.
> 
> I doubt this is measurable, and it only affects people who are using
> floats as indexes, which is a risky thing to be doing in the first
> place. The only good reason to use floats as indexes is if you're
> doing floating point arithmetic to calculate indexes -- but now you're
> going to get bitten as soon as some operation returns N - epsilon
> instead of N, and gets truncated to N - 1.
> 
> I'd be +1 for a patch to make numpy warn when indexing with
> non-integer floats. (Heck, I'd probably be +1 on deprecating allowing
> floating point numbers as indexes at all... it's risky as heck and
> reminding people to think about rounding can only be a good thing,
> given that risk.)
> 

Personally +1 on just deprecating that stuff in the long run. In case
someone is interested, I remember seeing this comment (which applies to
the scalar case):
/*
 * PyNumber_Index was introduced in Python 2.5 because of NumPy.
 * http://www.python.org/dev/peps/pep-0357/
 * Let's use it for indexing!
 *
 * Unfortunately, SciPy and possibly other code seems to rely
 * on the lenient coercion. :(
 */
#if 0 /*PY_VERSION_HEX >= 0x0205*/
PyObject *ind = PyNumber_Index(op);
if (ind != NULL) {
value = PyArray_PyIntAsIntp(ind);
Py_DECREF(ind);
}
else {
value = -1;
}
#else

and it is somewhat related. But with a long deprecation process,
switching to using `__index__` would seem possible to me.
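
To illustrate the difference (behavior at the time of this thread; the
exact error message depends on the Python version):

>>> from operator import index
>>> index(3.0)                 # __index__ refuses any float, even an integral one
Traceback (most recent call last):
  ...
TypeError: ...
>>> np.arange(10)[3.9999999]   # the lenient coercion silently truncates
3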

Regards,

Sebastian

> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Howto bisect old commits correctly

2013-01-04 Thread Sebastian Berg
Hey,

this is probably just because I do not have any experience with bisect
and the like, but when I try running a bisect I keep running into:

ImportError: 
/home/sebastian/.../lib/python2.7/site-packages/numpy/core/multiarray.so: 
undefined symbol: PyDataMem_NEW
or:
RuntimeError: module compiled against API version 8 but this version of numpy 
is 7

I am sure I am missing something simple, but I have no idea where to
look. Am I just forgetting to delete some things and my version is not
clean!?

Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Howto bisect old commits correctly

2013-01-04 Thread Sebastian Berg
On Sat, 2013-01-05 at 00:17 +0100, Sebastian Berg wrote:
> Hey,
> 
> this is probably just because I do not have any experience with bisect
> and the like, but when I try running a bisect keep running into:
> 

Nevermind that. Probably I just stumbled on some bad versions...

> ImportError: 
> /home/sebastian/.../lib/python2.7/site-packages/numpy/core/multiarray.so: 
> undefined symbol: PyDataMem_NEW
> or:
> RuntimeError: module compiled against API version 8 but this version of numpy 
> is 7
> 
> I am sure I am missing something simple, but I have no idea where to
> look. Am I just forgetting to delete some things and my version is not
> clean!?
> 
> Regards,
> 
> Sebastian
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rank-0 arrays - reprise

2013-01-06 Thread Sebastian Berg
On Sun, 2013-01-06 at 08:58 +0100, Dag Sverre Seljebotn wrote:
> On 01/05/2013 10:31 PM, Nathaniel Smith wrote:
> > On 5 Jan 2013 12:16, "Matthew Brett"  wrote:
> >>
> >> Hi,
> >>
> >> Following on from Nathaniel's explorations of the scalar - array
> >> casting rules, some resources on rank-0 arrays.
> >>



> > Q: Yeah. I mean, I remember that seemed weird when I first learned
> > Python, but when have you ever felt the Python was really missing a
> > "character" type like C has?
> 
> str is immutable which makes this a lot easier to deal with without 
> getting confused. So basically you have:
> 
> a[0:1] # read-write view
> a[[0]] # read-write copy
> a[0] # read-only view
> 
> AND, += are allowed on all read-only arrays, they just transparently 
> create a copy instead of doing the operation in-place.
> 
> Try to enumerate all the fundamentally different things (if you count 
> memory use/running time) that can happen for ndarrays a, b, and 
> arbitrary x here:
> 
> a += b[x]
> 
> That's already quite a lot, your proposal adds even more options. It's 
> certainly a lot more complicated than str.
> 
> To me it all sounds like a lot of rules introduced just to have the 
> result of a[0] be "kind of a scalar" without actually choosing that option.
> 

Yes, but I don't think there is any option other than making the elements
of an array immutable. Firstly, if you switch normal python code to numpy
code, you suddenly get numpy data types spilled into your code, and
mutable objects are simply very different (also true for code updating
to this new version). Do you expect:

arr = np.zeros(10, dtype=np.intp)
b = arr[5]
while condition:
    # might change the array?!
    b += 1
# This would not be possible and would break:
dictionary[b] = b**2

Because mutable objects are not hashable, which is important considering
that dictionaries are a very central data type, making element access
return something mutable would be a bad idea.
One could argue about structured datatypes, but maybe then it should be
a datatype property whether it is mutable or not, and even then the
element should probably be a copy (though I did not check what happens
here right now).

> BUT I should read up on that thread you posted on why that won't work, 
> didn't have time yet...
> 
> Dag Sverre
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] high dimensional array -> python scalar/index

2013-01-06 Thread Sebastian Berg
Question for everyone, is this really reasonable:

>>> import numpy as np
>>> from operator import index
>>> index(np.array([[5]]))
5
>>> int(np.array([[5]]))
5
>>> [0,1,2,3][np.array([[2]])]
2

To me, this does not make sense: why should we allow using a high
dimensional object like a normal scalar (it's ok for 0-d arrays I guess)?
Personally I would be for deprecating these usages, even if that
(probably) means you cannot reshape your array with a matrix (as it is
2D) ;-):
>>> np.arange(10).reshape(np.matrix([5,-1]).T)
array([[0, 1],
   [2, 3],
   [4, 5],
   [6, 7],
   [8, 9]])

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] high dimensional array -> python scalar/index

2013-01-06 Thread Sebastian Berg
On Sun, 2013-01-06 at 13:28 -0500, josef.p...@gmail.com wrote:
> On Sun, Jan 6, 2013 at 12:57 PM, Sebastian Berg
>  wrote:
> > Question for everyone, is this really reasonable:
> >
> >>>> import numpy as np
> >>>> from operator import index
> >>>> index(np.array([[5]]))
> > 5
> >>>> int(np.array([[5]]))
> > 5
> >>>> [0,1,2,3][np.array([[2]])]
> > 2
> 
> Not sure I understand the point
> 
> looks reasonable to my
> 
> int has an implied squeeze, if it succeeds
> 

Exactly *why* should it have an implied squeeze? Note I agree that
int(np.array([3])) is OK, since int('10') also works; however, for index
I think it is not OK: you simply cannot index a list with '10'.

> not so python lists
> 
> >>> int([[1]])
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: int() argument must be a string or a number, not 'list'

Exactly, so why should numpy be much more forgiving?

> 
> >>> [0,1,2,3][np.array([[2, 2], [0, 1]])]
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: only integer arrays with one element can be converted to an index
> 
> 
> but we can to more fun things with numpy
> 
> >>> np.array([0,1,2,3])[np.array([[2, 2], [0, 1]])]
> array([[2, 2],
>[0, 1]])
> 

Of course... But if you compare to lists, that's actually a point why
index should fail:

>>> np.array([0,1,2,3])[np.array([[3]])]

is very different from:

>>> [0,1,2,3][np.array([[3]])]

and in my opinion there is no reason why the latter should not simply
fail.

> Josef
> 
> >
> > To me, this does not make sense, why should we allow to use a high
> > dimensional object like a normal scalar (its ok for 0-d arrays I guess)?
> > Personally I would be for deprecating these usages, even if that
> > (probably) means you cannot reshape your array with a matrix (as it is
> > 2D) ;-):
> >>>> np.arange(10).reshape(np.matrix([5,-1]).T)
> > array([[0, 1],
> >[2, 3],
> >[4, 5],
> >[6, 7],
> >[8, 9]])
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Sebastian Berg
On Tue, 2013-01-08 at 19:59 +, Nathaniel Smith wrote:
> On 8 Jan 2013 17:24, "Andrew Collette"  wrote:
> >
> > Hi,
> >
> > > I think you are voting strongly for the current casting rules, because
> > > they make it less obvious to the user that scalars are different from
> > > arrays.
> >
> > Maybe this is the source of my confusion... why should scalars be
> > different from arrays?  They should follow the same rules, as closely
> > as possible.  If a scalar value would fit in an int16, why not add it
> > using the rules for an int16 array?
> 
> The problem is that rule for arrays - and for every other party of
> numpy in general - are that we *don't* pick types based on values.
> Numpy always uses input types to determine output types, not input
> values.
> 
> # This value fits in an int8
> In [5]: a = np.array([1])
> 
> # And yet...
> In [6]: a.dtype
> Out[6]: dtype('int64')
> 
> In [7]: small = np.array([1], dtype=np.int8)
> 
> # Computing 1 + 1 doesn't need a large integer... but we use one
> In [8]: (small + a).dtype
> Out[8]: dtype('int64')
> 
> Python scalars have an unambiguous types: a Python 'int' is a C
> 'long', and a Python 'float' is a C 'double'. And these are the types
> that np.array() converts them to. So it's pretty unambiguous that
> "using the same rules for arrays and scalars" would mean, ignore the
> value of the scalar, and in expressions like
>   np.array([1], dtype=np.int8) + 1
> we should always upcast to int32/int64. The problem is that this makes
> working with narrow types very awkward for no real benefit, so
> everyone pretty much seems to want *some* kind of special case. These
> are both absolutely special cases:
> 
> numarray through 1.5: in a binary operation, if one operand has
> ndim==0 and the other has ndim>0, ignore the width of the ndim==0
> operand.
> 
> 1.6, your proposal: in a binary operation, if one operand has ndim==0
> and the other has ndim>0, downcast the ndim==0 item to the smallest
> width that is consistent with its value and the other operand's type.
> 

Well, that leaves the maybe-not-quite-implausible proposal of saying
that numpy scalars behave like arrays with ndim > 0, but python scalars
keep behaving like they do in 1.6, to allow for easier working with narrow
types.

> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] int and long issues

2013-01-10 Thread Sebastian Berg
On Thu, 2013-01-10 at 11:32 +0100, Mads Ipsen wrote:
> Hi,
> 
> I find this to be a little strange:
> 
> x = numpy.arange(10)
> isinstance(x[0],int)
> 
> gives True
> 
> y = numpy.where(x < 5)[0]
> isinstance(y[0],int)
> 
> gives False
> 
> isinstance(y[0],long)
> 

Check what type(x[0])/type(y[0]) prints; I expect these are very
different, because the default integer type and the integer type used
for indexing (addressing memory in general) are not necessarily the
same. And because of that, `y[0]` probably simply isn't compatible with
the datatype of a python integer for your hardware and OS (for example
for me, your code works). So on python 2 (python 3 abolishes int and
makes long the only integer, so this should work as expected there) you
have to just check both, even in a plain python context, because you can
never really know (there may be some nice trick for that, but I am not sure).
And if you want to allow for the rare 0d arrays as well (well, they are very
rare admittedly)... it gets even a bit hairier.
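
A sketch of the kind of check I mean (the scalar types shown are only
examples, they depend on platform and OS):

x = numpy.arange(10)
y = numpy.where(x < 5)[0]

type(x[0])   # e.g. <type 'numpy.int32'> (the default integer type)
type(y[0])   # e.g. <type 'numpy.int64'> (the intp/index type)

isinstance(y[0], numpy.integer)          # True for every numpy integer scalar type
isinstance(y[0], (int, numpy.integer))   # also accept plain python ints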


Regards,

Sebastian

> gives True
> 
> Specs: Python 2.7.2, numpy-1.6.1, Win7, 64 bit
> 
> Best regards,
> 
> Mads
> -- 
> +-+
> | Mads Ipsen  |
> +--+--+
> | Gåsebæksvej 7, 4. tv |  |
> | DK-2500 Valby| phone:  +45-29716388 |
> | Denmark  | email:  mads.ip...@gmail.com |
> +--+--+
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] return index of maximum value in an array easily?

2013-01-11 Thread Sebastian Berg
On Sat, 2013-01-12 at 00:26 +0100, Chao YUE wrote:
> Hi,
> 
> I don't know how others think about this. Like you point out, one can
> use 
> np.nonzero(a==np.max(a)) as a workaround. 
> 
> For the second point, in case I have an array:
> a = np.arange(24.).reshape(2,3,4)
> 
> suppose I want to find the index for maximum value of each 2X3 array
> along 
> the 3rd dimension, what I can think of will be:
> 
> index_list = []
> for i in range(a.shape[-1]):
> data = a[...,i]
> index_list.append(np.nonzero(data==np.max(data)))
> 
To stay close to min/max (and other ufunc-based reduce operations), it
would seem consistent to allow something like
np.argmax(array, axis=(1, 2)), which would give a tuple of
arrays as the result, such that

array[np.argmax(array, axis=(1,2))] == np.max(array, axis=(1,2))

But apart from consistency, I am not sure anyone would get the idea of
passing multiple axes to the function...
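
Just to make the idea concrete: something similar can already be emulated
with argmax plus unravel_index (multi_axis_argmax is only an invented
helper name here, and it assumes the reduced axes are the trailing ones):

import numpy as np

def multi_axis_argmax(arr, n_axes):
    # argmax over the last `n_axes` axes at once, returning an index tuple
    # that can be used directly for fancy indexing.
    lead, tail = arr.shape[:-n_axes], arr.shape[-n_axes:]
    flat_idx = arr.reshape(lead + (-1,)).argmax(axis=-1)
    return tuple(np.indices(lead)) + np.unravel_index(flat_idx, tail)

a = np.random.rand(2, 3, 4)
idx = multi_axis_argmax(a, n_axes=2)
np.all(a[idx] == a.max(axis=(1, 2)))   # True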

> 
> In [87]:
> 
> 
> index_list
> Out[87]:
> [(array([1]), array([2])),
>  (array([1]), array([2])),
>  (array([1]), array([2])),
>  (array([1]), array([2]))]
> 
> 
> If we want to make the np.argmax function doing the job of this part
> of code,
> could we add another some kind of boolean keyword argument, for
> example, 
> "exclude" to the function? 
> [this is only my thinking, and I am only a beginner, maybe it's
> stupid!!!]
> 
> np.argmax(a,axis=2,exclude=True) (default value for exclude is False)
> 
> it will give the index of maximum value along all other axis except
> the axis=2
> (which is acutally the 3rd axis)
> 
> The output will be:
> 
> np.array(index_list).squeeze()
> 
> array([[1, 2],
>[1, 2],
>[1, 2],
>[1, 2]])
> 
> and one can use a[1,2,i] (i=1,2,3,4) to extract the maximum value. 
> 
> I doubt this is really useful.. too complicated..
> 
> Chao
> 
> On Fri, Jan 11, 2013 at 11:00 PM, Nathaniel Smith 
> wrote:
> On Thu, Jan 10, 2013 at 9:40 AM, Chao YUE
>  wrote:
> > Dear all,
> >
> > Are we going to consider returning the index of maximum
> value in an array
> > easily
> > without calling np.argmax and np.unravel_index
> consecutively?
> 
> 
> This does seem like a good thing to support somehow. What
> would a good
> interface look like? Something like np.nonzero(a ==
> np.max(a))? Should
> we support vectorized operation (e.g. argmax of each 2-d
> subarray of a
> 3-d array along some axis)?
> 
> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> 
> -- 
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Subclassing ndarray with concatenate

2013-01-22 Thread Sebastian Berg
Hey,

On Tue, 2013-01-22 at 10:21 +0100, Todd wrote:
> I am trying to create a subclass of ndarray that has additional
> attributes.  These attributes are maintained with most numpy functions
> if __array_finalize__ is used.  
> 
You can cover a bit more if you also implement `__array_wrap__`, though
unless you want to do something fancy, that just replaces the
`__array_finalize__` for the most part. But some (very few) functions
currently call `__array_wrap__` explicitly.

> The main exception I have found is concatenate (and hstack/vstack,
> which just wrap concatenate).  In this case, __array_finalize__ is
> passed an array that has already been stripped of the additional
> attributes, and I don't see a way to recover this information.  
> 
There are quite a few functions that simply do not preserve subclasses
(though I think more could/should probably call `__array_wrap__`; even
if the documentation may say that it is about ufuncs, there are some
examples of this already).
`np.concatenate` is one of these. It always returns a base array. In any
case it gets a bit difficult if you have multiple input arrays (which
may not matter for you).

> In my particular case at least, there are clear ways to handle corner
> cases (like being passed a class that lacks these attributes), so in
> principle there no problem handling concatenate in a general way,
> assuming I can get access to the attributes.
> 
> 
> So is there any way to subclass ndarray in such a way that concatenate
> can be handled properly?
> 
Quite simply, no. If you compare masked arrays, they also provide their
own concatenate for this reason.
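
For completeness, a minimal sketch of that masked-array-style workaround:
ship your own concatenate that re-attaches the attributes (the `meta`
attribute and the function name are made up here):

import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, input_array, meta=None):
        obj = np.asarray(input_array).view(cls)
        obj.meta = meta
        return obj

    def __array_finalize__(self, obj):
        # obj is None for explicit construction, otherwise the parent array
        self.meta = getattr(obj, 'meta', None)

def my_concatenate(arrays, axis=0):
    # Do the actual work on plain ndarrays, then restore the extra
    # attribute from the inputs yourself; this is essentially what np.ma does.
    data = np.concatenate([np.asarray(a) for a in arrays], axis=axis)
    result = data.view(MyArray)
    result.meta = [getattr(a, 'meta', None) for a in arrays]
    return result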

I hope that helps a bit...

Regards,

Sebastian

> I have been looking extensively online, but have not been able to find
> a clear answer on how to do this, or if there even is a way.
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Subclassing ndarray with concatenate

2013-01-22 Thread Sebastian Berg
On Tue, 2013-01-22 at 13:44 +0100, Sebastian Berg wrote:
> Hey,
> 
> On Tue, 2013-01-22 at 10:21 +0100, Todd wrote:
> > I am trying to create a subclass of ndarray that has additional
> > attributes.  These attributes are maintained with most numpy functions
> > if __array_finalize__ is used.  
> > 
> You can cover a bit more if you also implement `__array_wrap__`, though
> unless you want to do something fancy, that just replaces the
> `__array_finalize__` for the most part. But some (very few) functions
> currently call `__array_wrap__` explicitly.
> 

Actually, I have to correct myself here. The default __array_wrap__ causes
__array_finalize__ to be called as you would expect, so there is no need
to use it unless you want to do something fancy.

> > The main exception I have found is concatenate (and hstack/vstack,
> > which just wrap concatenate).  In this case, __array_finalize__ is
> > passed an array that has already been stripped of the additional
> > attributes, and I don't see a way to recover this information.  
> > 
> There are quite a few functions that simply do not preserve subclasses
> (though I think more could/should call `__array_wrap__` probably, even
> if the documentation may say that it is about ufuncs, there are some
> example of this already).
> `np.concatenate` is one of these. It always returns a base array. In any
> case it gets a bit difficult if you have multiple input arrays (which
> may not matter for you).
> 
> > In my particular case at least, there are clear ways to handle corner
> > cases (like being passed a class that lacks these attributes), so in
> > principle there no problem handling concatenate in a general way,
> > assuming I can get access to the attributes.
> > 
> > 
> > So is there any way to subclass ndarray in such a way that concatenate
> > can be handled properly?
> > 
> Quite simply, no. If you compare masked arrays, they also provide their
> own concatenate for this reason.
> 
> I hope that helps a bit...
> 
> Regards,
> 
> Sebastian
> 
> > I have been looking extensively online, but have not been able to find
> > a clear answer on how to do this, or if there even is a way.
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpythonically getting elements with the minimum sum

2013-01-29 Thread Sebastian Berg
On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
> Gregor Thalhammer writes:
> 
> > Am 28.1.2013 um 23:15 schrieb Lluís:
> 
> >> Hi,
> >> 
> >> I have a somewhat convoluted N-dimensional array that contains information 
> >> of a
> >> set of experiments.
> >> 
> >> The last dimension has as many entries as iterations in the experiment (an
> >> iterative application), and the penultimate dimension has as many entries 
> >> as
> >> times I have run that experiment; the rest of dimensions describe the 
> >> features
> >> of the experiment:
> >> 
> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, 
> >> NUM_ITERATIONS)
> >> 
> >> So, what I want is to get the data for the best run of each experiment:
> >> 
> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)
> >> 
> >> by selecting, for each experiment, the run with the lowest total time (sum 
> >> of
> >> the time of all iterations for that experiment).
> >> 
> >> 
> >> So far I've got the trivial part, but not the final indexing into "data":
> >> 
> >> dsum = data.sum(axis = -1)
> >> dmin = dsum.min(axis = -1)
> >> best = data[???]
> >> 
> >> 
> >> I'm sure there must be some numpythonic and generic way to get what I 
> >> want, but
> >> fancy indexing is beating me here :)
> 
> > Did you have a look at the argmin function? It delivers the indices of the 
> > minimum values along an axis. Untested guess:
> 
> > dmin_idx = argmin(dsum, axis = -1)
> > best = data[..., dmin_idx, :]
> 
> Ah, sorry, my example is incorrect. I was actually using 'argmin', but 
> indexing
> with it does not exactly work as I expected:
> 
>   >>> d1.shape
>   (2, 5, 10)
>   >>> dsum = d1.sum(axis = -1)
>   >>> dmin = d1.argmin(axis = -1)
>   >>> dmin.shape
>   (2,)
>   >>> d1_best = d1[...,dmin,:]

You need to use fancy indexing. Something like:
>>> d1_best = d1[np.arange(2), dmin, :]

The Ellipsis takes everything along the axes it covers, while you want to
pick from multiple axes at the same time. That can be achieved with
fancy indexing (indexing with arrays). From another perspective, you
want to get rid of two axes in favor of a new one, but a slice/Ellipsis
always preserves the axis it works on.
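
Spelled out for the general case from the original question (any number of
leading "experiment" dimensions; best_runs is just an invented name):

import numpy as np

def best_runs(data):
    # data.shape == (..., NUM_RUNS, NUM_ITERATIONS)
    dsum = data.sum(axis=-1)               # total time of each run
    best = dsum.argmin(axis=-1)            # index of the fastest run, shape (...)
    lead = tuple(np.indices(best.shape))   # fancy indices for all leading axes
    return data[lead + (best, slice(None))]

data = np.random.rand(4, 3, 5, 10)         # e.g. 4x3 experiments, 5 runs, 10 iterations
best_runs(data).shape                      # (4, 3, 10)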

>   >>> d1_best.shape
>   (2, 2, 10)
> 
> 
> Assuming 1st dimension is the test, 2nd the run and 10th the iterations, using
> this previous code with some example values:
> 
>   >>> dmin
>   [4 3]
>   >>> d1_best
>   [[[ ... contents of d1[0,4,:] ...]
> [ ... contents of d1[0,3,:] ...]]
>[[ ... contents of d1[1,4,:] ...]
> [ ... contents of d1[1,3,:] ...]]]
> 
> 
> While I actually want this:
> 
>   [[ ... contents of d1[0,4,:] ...]
>[ ... contents of d1[1,3,:] ...]]
> 
> 
> Thanks,
>   Lluis
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Subclassing ndarray with concatenate

2013-01-30 Thread Sebastian Berg
On Wed, 2013-01-30 at 10:24 +0100, Todd wrote:
> On Tue, Jan 22, 2013 at 1:44 PM, Sebastian Berg
>  wrote:
> Hey,
> 
> On Tue, 2013-01-22 at 10:21 +0100, Todd wrote:
> 
> 
> > The main exception I have found is concatenate (and
> hstack/vstack,
> > which just wrap concatenate).  In this case,
> __array_finalize__ is
> > passed an array that has already been stripped of the
> additional
> > attributes, and I don't see a way to recover this
> information.
> >
> 
> There are quite a few functions that simply do not preserve
> subclasses
> (though I think more could/should call `__array_wrap__`
> probably, even
> if the documentation may say that it is about ufuncs, there
> are some
> example of this already).
> `np.concatenate` is one of these. It always returns a base
> array. In any
> case it gets a bit difficult if you have multiple input arrays
> (which
> may not matter for you).
> 
> 
> 
> I don't think this is right.  I tried it and it doesn't return a base
> array, it returns an instance of the original array subclass.

Yes, you are right, it preserves the type. I was fooled by
`__array_priority__` being 0 by default; I thought it defaulted to more
than 0 (for ufuncs everything beats plain arrays, not sure if it really
should), so I missed that.

In any case, yes, it calls __array_finalize__, but as you noticed, it
calls it without the original array. Now it would be very easy and
harmless to change that, however I am not sure if giving only one parent
array is very useful (i.e. you only get the one with the highest array
priority).

Another way to get around it might be to call __array_wrap__ like
ufuncs do (with a context, so you get all inputs, but then the non-array
axis argument may not fit reasonably into the context).

In any case, if you think it would be helpful to at least get the single
parent array, that would be a very simple change, but I feel the whole
subclassing machinery could use a bit of thinking and probably quite a
bit of work, since I am not quite convinced that calling __array_wrap__
with a complicated context from as many functions as possible is the
right approach for allowing more complex subclasses.
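
For what it is worth, the usual workaround looks roughly like the sketch
below (the class, the `info` attribute and the helper name are made up
for illustration): provide your own concatenate that re-attaches the
extra attributes, much like masked arrays ship their own concatenate.

import numpy as np

class InfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is a plain ndarray when coming from np.concatenate, so
        # the attribute has to fall back to a default here.
        self.info = getattr(obj, 'info', None)

def concatenate_info(arrays, axis=0):
    # Concatenate as plain ndarrays, then re-attach the attribute from
    # the first input (any other merging rule would do as well).
    result = np.concatenate([np.asarray(a) for a in arrays], axis=axis)
    result = result.view(InfoArray)
    result.info = getattr(arrays[0], 'info', None)
    return result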

>  
> 
> > In my particular case at least, there are clear ways to
> handle corner
> > cases (like being passed a class that lacks these
> attributes), so in
> > principle there no problem handling concatenate in a general
> way,
> > assuming I can get access to the attributes.
> >
> >
> > So is there any way to subclass ndarray in such a way that
> concatenate
> > can be handled properly?
> >
> 
> Quite simply, no. If you compare masked arrays, they also
> provide their
> own concatenate for this reason.
> 
> I hope that helps a bit...
> 
> 
> 
> Is this something that should be available?  For instance a method
> that provides both the new array and the arrays that were used to
> construct it.  This would seem to be an extremely common use-case for
> array subclasses, so letting them gracefully handle this would seem to
> be very important.
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Issues to fix for 1.7.0rc2.

2013-02-06 Thread Sebastian Berg
On Wed, 2013-02-06 at 10:18 +0100, Dag Sverre Seljebotn wrote:
> On 02/06/2013 08:41 AM, Charles R Harris wrote:
> >
> >
> > On Tue, Feb 5, 2013 at 11:50 PM, Jason Grout
> > mailto:jason-s...@creativetrax.com>> wrote:
> >
> > On 2/6/13 12:46 AM, Charles R Harris wrote:
> >  > if we decide to do so
> >
> > I should mention that we don't really depend on either behavior (we
> > probably should have a better doctest testing for an array of None
> > values anyway), but we noticed the oddity and thought we ought to
> > mention it.  So it doesn't matter to us which way the decision goes.
> >
> >
> > More Python craziness
> >
> > In [6]: print None or 0
> > 0
> >
> > In [7]: print 0 or None
> > None
> 
> To me this seems natural and is just how Python works? I think the rule 
> for "or" is simply "evaluate __nonzero__ of left operand, if it is 
> False, return right operand".
> 
> The reason is so that you can use it like this:
> 

Yes, but the any() and all() functions in python forcibly return a bool,
as one would expect. So probably logical_and.reduce and all should
simply not be the same thing, at least for objects.
Though it is a bit weird that objects would do something different from
the other types, so maybe it would be OK to say that numpy just differs
from python here, since I am not sure you can easily change it for the
other types.
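
For comparison, the python side always coerces (the numpy object-dtype
results are the ones quoted above):

>>> any([None, 0]), all([None, 0])   # python forces a bool
(False, False)
>>> any([0, None]), all([0, None])   # the order does not change the type
(False, False)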

Regards,

Sebastian

> x = get_foo() or get_bar() # if get_foo() returns None
> # use result of get_bar
> 
> or
> 
> def f(x=None):
>  x = x or create_default_x()
>  ...
> 
> I guess that after the "a if expr else b" was introduced this has become 
> less common.
> 
> Dag Sverre
> 
> >
> > Numpy any is consistent with python when considered as logical_or.reduce
> >
> > In [13]: print array([0, None]).any()
> > None
> >
> > In [14]: print array([None, 0]).any()
> > 0
> >
> > This appears to be an __ror__, __or__ inconsistency in python. Note that
> > None possesses neither of those operators.
> >
> > Chuck
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Building numpy for python3.3: No _numpyconfig.h

2013-02-06 Thread Sebastian Berg
On Wed, 2013-02-06 at 13:31 +, gk230-free...@yahoo.de wrote:
> Hi,
> 
> I'm currently trying to build numpy 1.6.2 for python python 3.3 from ports on 
> FreeBSD. Unfortunately, the setup.py execution fails because some [1] gcc 
> command trying to access _numpyconfig.h fails since _numpyconfig.h is not 
> generated from _numpyconfig.h.in. How do I manually build the proper header 
> from the .h.in, and why does that not happen automatically?
> 

1.6.2 probably does not support python 3.3 at all. You should instead
try the 1.7 release candidate (or wait a bit longer for 1.7rc2 or even
the final release), which supports python 3.3. Note that when 1.6.2 was
released, python 3.3 did not exist.

Regards,

Sebastian

> --
> -- Gereon
> 
> [1]
> # gcc46 -DNDEBUG -O2 -pipe -fno-strict-aliasing -O2 -pipe 
> -Wl,-rpath=/usr/local/lib/gcc46 -fno-strict-aliasing -fPIC 
> -Inumpy/core/include 
> -Ibuild/src.freebsd-9.0-RELEASE-amd64-3.3/numpy/core/include/numpy 
> -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core 
> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath 
> -Inumpy/core/include -I/usr/local/include/python3.3m 
> -Ibuild/src.freebsd-9.0-RELEASE-amd64-3.3/numpy/core/src/multiarray 
> -Ibuild/src.freebsd-9.0-RELEASE-amd64-3.3/numpy/core/src/umath -c 
> numpy/core/src/multiarray/multiarraymodule_onefile.c -o 
> build/temp.freebsd-9.0-RELEASE-amd64-3.3/numpy/core/src/multiarray/multiarraymodule_onefile.o
> Assembler messages:
> Fatal error: can't create 
> build/temp.freebsd-9.0-RELEASE-amd64-3.3/numpy/core/src/multiarray/multiarraymodule_onefile.o:
>  No such file or directory
> In file included from numpy/core/include/numpy/ndarraytypes.h:5:0,
>  from numpy/core/include/numpy/ndarrayobject.h:17,
>  from numpy/core/include/numpy/arrayobject.h:14,
>  from numpy/core/src/multiarray/common.c:6,
>  from numpy/core/src/multiarray/multiarraymodule_onefile.c:8:
> numpy/core/include/numpy/numpyconfig.h:4:26: fatal error: _numpyconfig.h: No 
> such file or directory
> compilation terminated.
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where's that function?

2013-02-06 Thread Sebastian Berg
On Wed, 2013-02-06 at 13:08 -0500, josef.p...@gmail.com wrote:
> I'm convinced that I saw a while ago a function that uses a list of
> interval boundaries to index into an array, either to iterate or to
> take.
> I thought that's very useful, but didn't make a note.
> 
> Now, I have no idea where I saw this (I thought numpy), and I cannot
> find it anywhere.
> 
> any clues?
> 

It does not quite sound like what you are looking for, but the only
thing I can think of in numpy right now that does something in that
direction is the ufunc.reduceat functionality:

In [1]: a = np.arange(10)

In [2]: a[2:5].sum(), a[5:9].sum(), a[9:].sum()
Out[2]: (9, 26, 9)

In [3]: np.add.reduceat(a, [2, 5, 9])
Out[3]: array([ 9, 26,  9])

Regards,

Sebastian

> Thanks,
> 
> Josef
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.percentile docstring

2013-02-17 Thread Sebastian Berg
Hey,

On Sun, 2013-02-17 at 11:50 +0100, Andreas Hilboll wrote:
> In my numpy 1.6.1 (from Ubuntu 12.04LTS repository), the docstring of
> np.percentile is wrong. I'm not just submitting a PR because I don't
> understand something.
> 
> in the "Notes" and "Examples" sections, there seems to be some confusion
> if ``q`` should be in the range [0,1] or in [0,100]. The parameters
> section is very clear about this, it should be [0,100].
> 
> However, in the Examples section, it says
> 
>>>> np.percentile(a, 0.5, axis=0)
>array([ 6.5,  4.5,  2.5])
> 
> The given array is clearly the result of a call to np.percentile(a, 50.,
> axis=0).
> 
> I thought the docs are auto-generated, and that the "array..." result of
> the docstring would be calculated by numpy while building the docs? Or
> am I misunderstanding something here?

They are not auto-generated; in principle the other way around would be
possible, i.e. numpy could use the documentation to test the code. But
the doctests have been relatively broken for a while I think, so at the
moment there is no way to automatically find such issues. There is some
work going on for percentile right now, but please just open your pull
request.
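
For reference, a quick check (assuming the docstring uses this same
example array):

>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> np.percentile(a, 50, axis=0)    # q is in [0, 100], so 50 is the median
array([ 6.5,  4.5,  2.5])
>>> np.percentile(a, 0.5, axis=0)   # 0.5 really is the 0.5th percentile
array([ 3.035,  2.025,  1.015])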

Regards,

Sebastian

> 
> Cheers, Andreas.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array accumulation in numpy

2013-02-19 Thread Sebastian Berg
On Tue, 2013-02-19 at 10:00 -0500, Tony Ladd wrote:
> I want to accumulate elements of a vector (x) to an array (f) based on 
> an index list (ind).
> 
> For example:
> 
> x=[1,2,3,4,5,6]
> ind=[1,3,9,3,4,1]
> f=np.zeros(10)
> 
> What I want would be produced by the loop
> 
> for i in range(6):
>  f[ind[i]]=f[ind[i]]+x[i]
> 
> The answer is f=array([ 0.,  7.,  0.,  6.,  5.,  0.,  0.,  0.,  0., 3.])
> 
> When I try to use implicit arguments
> 
> f[ind]=f[ind]+x
> 
> I get f=array([ 0.,  6.,  0.,  4.,  5.,  0.,  0.,  0.,  0.,  3.])
> 
> 
> So it takes the last value of x that is pointed to by ind and adds it to 
> f, but its the wrong answer when there are repeats of the same entry in 
> ind (e.g. 3 or 1)
> 
> I realize my code is incorrect, but is there a way to make numpy 
> accumulate without using loops? I would have thought so but I cannot 
> find anything in the documentation.
> 

You might be interested in this:
https://github.com/numpy/numpy/pull/2821

But anyway, you should be able to do what you want using np.bincount
with the weights keyword argument.
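
A quick sketch with the numbers from your example:

>>> x = np.array([1., 2, 3, 4, 5, 6])
>>> ind = np.array([1, 3, 9, 3, 4, 1])
>>> np.bincount(ind, weights=x, minlength=10)
array([ 0.,  7.,  0.,  6.,  5.,  0.,  0.,  0.,  0.,  3.])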

Regards,

Sebastian

> Would much appreciate any help - probably a really simple question.
> 
> Thanks
> 
> Tony
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] What should np.ndarray.__contains__ do

2013-02-25 Thread Sebastian Berg
Hello all,

currently the `__contains__` method, or the `in` operator, on arrays
does not return what the user would expect when, in the operation
`a in b`, `a` is not a single element (see "In [3]-[4]" below).

The first solution coming to mind might be checking `all()` for all
dimensions given in argument `a` (see line "In [5]" for a simplistic
example). This does not play too well with broadcasting however, but one
could maybe simply *not* broadcast at all (i.e. a.shape ==
b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise.

On the other hand one could say broadcasting of `a` onto `b` should be
"any" along that dimension (see "In [8]"). The other way should maybe
raise an error though (see "In [9]" to understand what I mean).

I think using the broadcasting dimensions where `a` is repeated over `b`
as the dimensions to apply "any" logic on is the most general way for
numpy to handle this consistently, while the other way around could be
handled with an `all`, but to me that makes so little sense that I think
it should be an error. Of course this is different from a list of lists,
which gives False in these cases, but arrays are not lists of lists...
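
As a rough sketch of what I mean (the function name is made up, and it
requires an exact match of the trailing shape instead of full
broadcasting):

import numpy as np

def contains(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape[b.ndim - a.ndim:]:
        raise ValueError("shapes do not match")
    # "all" over the dimensions of `a`, "any" over the leading ones of `b`:
    trailing = tuple(range(b.ndim - a.ndim, b.ndim))
    return bool((b == a).all(axis=trailing).any())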

As a side note, since for loops, etc. use "for item in array", I do not
think that vectorizing along `a` as np.in1d does is reasonable. `in`
should return a single boolean.

I have opened an issue for it:
https://github.com/numpy/numpy/issues/3016#issuecomment-14045545


Regards,

Sebastian

In [1]: a = np.array([0, 2])

In [2]: b = np.arange(10).reshape(5,2)

In [3]: b
Out[3]: 
array([[0, 1],
   [2, 3],
   [4, 5],
   [6, 7],
   [8, 9]])

In [4]: a in b
Out[4]: True

In [5]: (b == a).any()
Out[5]: True

In [6]: (b == a).all(0).any() # the 0 could be multiple axes
Out[6]: False

In [7]: a_2d = a[None,:]

In [8]: a_2d in b # broadcast dimension means "any" -> True
Out[8]: True

In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
Out[9]: True


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should np.ndarray.__contains__ do

2013-02-25 Thread Sebastian Berg
On Mon, 2013-02-25 at 16:33 +, Nathaniel Smith wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
>  wrote:
> > Hello all,
> >
> > currently the `__contains__` method or the `in` operator on arrays, does
> > not return what the user would expect when in the operation `a in b` the
> > `a` is not a single element (see "In [3]-[4]" below).
> 
> True, I did not expect that!
> 



> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
> 
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
> 
> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equals(B[i], A).
> 
> Question 1: are there any other sensible options that aren't on this list?
> 
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
> 
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equals.
> So (b) feels pretty inconsistent with other numpy conventions to me.
> 

I agree with rejecting (b). (a) seems a good way to think about the
problem and I don't see other sensible options. The question is: let's
say you have an array b = [[0, 1], [2, 3]] and a = [[0, 1]]; since they
are both 2d, should b be interpreted as containing two 2d elements?
Another way of seeing this would be to ignore size-one dimensions in `a`
for the sake of defining its "element". This would allow:

In [1]: b = np.arange(10).reshape(5,2)

In [2]: b
Out[2]: 
array([[0, 1],
   [2, 3],
   [4, 5],
   [6, 7],
   [8, 9]])

In [3]: a = np.array([[0, 1]]) # extra dimensions at the start

In [4]: a in b
Out[4]: True

# But it would also allow the transpose, since now the last axis is a dummy:
In [5]: a.T in b.T
Out[5]: True

Those two examples could also be a shape mismatch error; I tend to think
they are reasonable enough to work, but then the user could just
reshape/transpose to achieve the same.

I also wondered about b having e.g. b.shape = (5,1) with a.shape = (1,2)
being sensible enough not to be an error, but this "element" way of
thinking is a good reason for rejecting it IMO.

Maybe this is clearer,

Sebastian


> -n
> 
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >[2, 3],
> >[4, 5],
> >[6, 7],
> >[8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should np.ndarray.__contains__ do

2013-02-25 Thread Sebastian Berg
On Mon, 2013-02-25 at 18:01 +0100, Todd wrote:
> The problem with b is that it breaks down if the two status have the
> same dimensionality. 
> 
> I think a better approach would be for 
> 
> a in b
> 
> With a having n dimensions, it returns true if there is any subarray
> of b that matches a along the last n dimensions.
> 
> So if a has 3 dimensions and b has 6, a in b is true iff there is any
> i, j, k, m, n, p such that
> 
> a=b[i, j, k,
> m:m+a.shape[0], 
> n:n+a.shape[1],
> p:p+a.shape[2]] ]
> 
> This isn't a very clear way to describe it, but I think it is
> consistent with the concept of a being a subarray of b even when they
> have the same dimensionality. 
> 
Oh, great point. I guess this is the most general way; I completely
missed this option. It allows [0, 3] in [1, 0, 3, 5] to be true. I am
not sure whether this kind of matching should be part of the in operator
or not, though on the other hand it would only do something reasonable
where otherwise an error would be thrown, and it is definitely useful
and compatible with what anyone else might expect.
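
For the 1-d case, a tiny sketch of that kind of matching (the name is
made up, and a real version would have to handle more dimensions):

import numpy as np

def contains_subarray(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # All contiguous windows of b with the length of a, as a view:
    windows = np.lib.stride_tricks.as_strided(
        b, (b.shape[0] - a.shape[0] + 1, a.shape[0]), (b.strides[0],) * 2)
    return bool((windows == a).all(axis=1).any())

>>> contains_subarray([0, 3], [1, 0, 3, 5])
True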

> On Feb 25, 2013 5:34 PM, "Nathaniel Smith"  wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
>  wrote:
> > Hello all,
> >
> > currently the `__contains__` method or the `in` operator on
> arrays, does
> > not return what the user would expect when in the operation
> `a in b` the
> > `a` is not a single element (see "In [3]-[4]" below).
> 
> True, I did not expect that!
> 
> > The first solution coming to mind might be checking `all()`
> for all
> > dimensions given in argument `a` (see line "In [5]" for a
> simplistic
> > example). This does not play too well with broadcasting
> however, but one
> > could maybe simply *not* broadcast at all (i.e. a.shape ==
> > b.shape[b.ndim-a.ndim:]) and raise an error/return False
> otherwise.
> >
> > On the other hand one could say broadcasting of `a` onto `b`
> should be
> > "any" along that dimension (see "In [8]"). The other way
> should maybe
> > raise an error though (see "In [9]" to understand what I
> mean).
> >
> > I think using broadcasting dimensions where `a` is repeated
> over `b` as
> > the dimensions to use "any" logic on is the most general way
> for numpy
> > to handle this consistently, while the other way around
> could be handled
> > with an `all` but to me makes so little sense that I think
> it should be
> > an error. Of course this is different to a list of lists,
> which gives
> > False in these cases, but arrays are not list of lists...
> >
> > As a side note, since for loop, etc.  use "for item in
> array", I do not
> > think that vectorizing along `a` as np.in1d does is
> reasonable. `in`
> > should return a single boolean.
> 
> Python effectively calls bool() on the return value from
> __contains__,
> so reasonableness doesn't even come into it -- the only
> possible
> behaviours for `in` are to return True, False, or raise an
> exception.
> 
> I admit that I don't actually really understand any of this
> discussion
> of broadcasting. in's semantics are, "is this scalar in this
> container"? (And the scalarness is enforced by Python, as per
> above.)
> So I think we should find some approach where the left
> argument is
> treated as a scalar.
> 
> The two approaches that I can see, and which generalize the
> behaviour
> of simple Python lists in natural ways, are:
> 
> a) the left argument is coerced to a scalar of the appropriate
> type,
> then we check if that value appears anywhere in the array
> (basically
> raveling the right argument).
> 
> b) for an array with shape (n1, n2, n3, ...), the left
> argument is
> treated as an array of shape (n2, n3, ...), and we check if
> that
> subarray (as a whole) appears anywhere in the array. Or in
> other
> words, 'A in B' is true iff there is some i such that
> np.array_equals(B[

Re: [Numpy-discussion] Adding .abs() method to the array object

2013-02-25 Thread Sebastian Berg
On Mon, 2013-02-25 at 10:50 -0500, Skipper Seabold wrote:
> On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki 
> wrote:
> >
> > First, sorry that i didnt search for an old thread, but because i
> disagree with
> > conclusion i would at least address my reason:
> >
> >> I don't like
> >> np.abs(arr).max()
> >> because I have to concentrate to much on the braces, especially if
> arr
> >> is a calculation
> >
> > This exactly, adding an abs into an old expression is always a
> little annoyance
> > due to the parenthesis. The argument that np.abs() also works is
> true for
> > (almost?) every other method. The fact that so many methods already
> exists,
> > especially for most of the commonly used functions (min, max, dot,
> mean, std,
> > argmin, argmax, conj, T) makes me missing abs. Of course, if one
> would redesign
> > the api, one would drop most methods (i am looking at you ptp and
> byteswap). But
> > the objected is already cluttered and adding abs is imo logical
> application of
> > "practicality beats purity".
> >
> 
> I tend to agree here. The situation isn't all that dire for the number
> of methods in an array. No scrolling at reasonably small terminal
> sizes.
> 
> [~/]
> [3]: x.
> x.T x.copy  x.getfield  x.put   x.std
> x.all   x.ctypesx.imag  x.ravel
> x.strides
> x.any   x.cumprod   x.item  x.real  x.sum
> x.argmaxx.cumsumx.itemset   x.repeat
>  x.swapaxes
> x.argminx.data  x.itemsize  x.reshape   x.take
> x.argsort   x.diagonal  x.max   x.resize
>  x.tofile
> x.astypex.dot   x.mean  x.round
> x.tolist
> x.base  x.dtype x.min   x.searchsorted
>  x.tostring
> x.byteswap  x.dump  x.nbytesx.setfield
>  x.trace
> x.choosex.dumps x.ndim  x.setflags
>  x.transpose
> x.clip  x.fill  x.newbyteorder  x.shape x.var
> x.compress  x.flags x.nonzero   x.size  x.view
> x.conj  x.flat  x.prod  x.sort  
> x.conjugate x.flatten   x.ptp   x.squeeze   
> 
> 
Two small things (not sure if it matters much). First, almost all of
these methods are related to the container and not the elements. Second,
actually using a method arr.abs() has a tiny pitfall, since .abs() would
work on numpy types, but not on python types. This means that:

np.array([1, 2, 3]).max().abs()

works, but

np.array([1, 2, 3], dtype=object).max().abs()

breaks. Python has a safe name for abs already...


> I find myself typing things like 
> 
> arr.abs()
> 
> and
> 
> arr.unique()
> 
> quite often.
> 
> Skipper
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding .abs() method to the array object

2013-02-26 Thread Sebastian Berg
On Mon, 2013-02-25 at 22:04 -0500, josef.p...@gmail.com wrote:
> On Mon, Feb 25, 2013 at 9:58 PM,   wrote:
> > On Mon, Feb 25, 2013 at 9:20 PM, Sebastian Berg
> >  wrote:
> >> On Mon, 2013-02-25 at 10:50 -0500, Skipper Seabold wrote:
> >>> On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki 
> >>> wrote:
> >>> >
> >>> > First, sorry that i didnt search for an old thread, but because i
> >>> disagree with
> >>> > conclusion i would at least address my reason:
> >>> >

> >> Two small things (not sure if it matters much). But first almost all of
> >> these methods are related to the container and not the elements. Second
> >> actually using a method arr.abs() has a tiny pitfall, since abs would
> >> work on numpy types, but not on python types. This means that:
> >>
> >> np.array([1, 2, 3]).max().abs()
> >>
> >> works, but
> >>
> >> np.array([1, 2, 3], dtype=object).max().abs()
> >>
> >> breaks. Python has a safe name for abs already...
> >
> >>>> (np.array([1, 2, 3], dtype=object)).max()
> > 3
> >>>> (np.array([1, 2, 3], dtype=object)).__abs__().max()
> > 3
> >>>> (np.array([1, 2, '3'], dtype=object)).__abs__()
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > TypeError: bad operand type for abs(): 'str'
> >
> >>>> map(abs, [1, 2, 3])
> > [1, 2, 3]
> >>>> map(abs, [1, 2, '3'])
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > TypeError: bad operand type for abs(): 'str'
> 
> or maybe more useful
> 
> >>> from decimal import Decimal
> >>> d = [Decimal(str(k)) for k in np.linspace(-1, 1, 5)]
> >>> map(abs, d)
> [Decimal('1.0'), Decimal('0.5'), Decimal('0.0'), Decimal('0.5'), 
> Decimal('1.0')]
> 
> >>> np.asarray(d).__abs__()
> array([1.0, 0.5, 0.0, 0.5, 1.0], dtype=object)
> >>> np.asarray(d).__abs__()[0]
> Decimal('1.0')
> 
> Josef
> 
> >
> > I don't see a difference.
> >
> > (I don't expect to use max abs on anything else than numbers.)
> >

The difference is about scalars only. And of course __abs__ is fine, but
if numpy adds an abs method, its scalars would logically have it too.
But then you diverge from python scalars. That has exactly the downside
that you may write code that suddenly stops working for python scalars
without noticing.

I turned around the abs and max order here, so that the abs works on the
scalar, not useful but just as an example.

> > Josef
> >>
> >>
> >>> I find myself typing things like
> >>>
> >>> arr.abs()
> >>>
> >>> and
> >>>
> >>> arr.unique()
> >>>
> >>> quite often.
> >>>
> >>> Skipper
> >>> ___
> >>> NumPy-Discussion mailing list
> >>> NumPy-Discussion@scipy.org
> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >>
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should np.ndarray.__contains__ do

2013-02-26 Thread Sebastian Berg
On Mon, 2013-02-25 at 16:33 +, Nathaniel Smith wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
>  wrote:
> > Hello all,
> >
> > currently the `__contains__` method or the `in` operator on arrays, does
> > not return what the user would expect when in the operation `a in b` the
> > `a` is not a single element (see "In [3]-[4]" below).
> 
> True, I did not expect that!
> 


> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
> 
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
> 

How did I misread that? I guess you mean element matching and never
subarray matching. Actually, I am starting to think that is best.
Subarray matching may be useful, but would probably be better off inside
its own function.
That also might be best for object arrays, since it is difficult to know
whether the user means a tuple as an element or as a two-element
subarray, unless you say "input is array-like", which is possible (or
more sensible) for a function.

That would mean just turning the use cases that currently give weird
results into errors. And maybe those errors could hint at np.in1d and,
if numpy ever gets it, at a dedicated subarray matching function.

-- Sebastian

> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equals(B[i], A).
> 
> Question 1: are there any other sensible options that aren't on this list?
> 
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
> 
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equals.
> So (b) feels pretty inconsistent with other numpy conventions to me.
> 
> -n
> 
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >[2, 3],
> >[4, 5],
> >[6, 7],
> >[8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding .abs() method to the array object

2013-02-26 Thread Sebastian Berg
On Tue, 2013-02-26 at 11:16 +0100, Todd wrote:
> On Tue, Feb 26, 2013 at 10:58 AM, Sebastian Berg
>  wrote:
> On Mon, 2013-02-25 at 22:04 -0500, josef.p...@gmail.com wrote:
> > On Mon, Feb 25, 2013 at 9:58 PM,  
> wrote:
> > > On Mon, Feb 25, 2013 at 9:20 PM, Sebastian Berg
> > >  wrote:
> > >> On Mon, 2013-02-25 at 10:50 -0500, Skipper Seabold wrote:
> > >>> On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki
> 
> > >>> wrote:
> > >>> >
> > >>> > First, sorry that i didnt search for an old thread,
> but because i
> > >>> disagree with
> > >>> > conclusion i would at least address my reason:
> > >>> >
> 
> 
> > >> Two small things (not sure if it matters much). But first
> almost all of
> > >> these methods are related to the container and not the
> elements. Second
> > >> actually using a method arr.abs() has a tiny pitfall,
> since abs would
> > >> work on numpy types, but not on python types. This means
> that:
> > >>
> > >> np.array([1, 2, 3]).max().abs()
> > >>
> > >> works, but
> > >>
> > >> np.array([1, 2, 3], dtype=object).max().abs()
> > >>
> > >> breaks. Python has a safe name for abs already...
> > >
> > >>>> (np.array([1, 2, 3], dtype=object)).max()
> > > 3
> > >>>> (np.array([1, 2, 3], dtype=object)).__abs__().max()
> > > 3
> > >>>> (np.array([1, 2, '3'], dtype=object)).__abs__()
> > > Traceback (most recent call last):
> > >   File "", line 1, in 
> > > TypeError: bad operand type for abs(): 'str'
> > >
> > >>>> map(abs, [1, 2, 3])
> > > [1, 2, 3]
> > >>>> map(abs, [1, 2, '3'])
> > > Traceback (most recent call last):
> > >   File "", line 1, in 
> > > TypeError: bad operand type for abs(): 'str'
> >
> > or maybe more useful
> >
> > >>> from decimal import Decimal
> > >>> d = [Decimal(str(k)) for k in np.linspace(-1, 1, 5)]
> > >>> map(abs, d)
> > [Decimal('1.0'), Decimal('0.5'), Decimal('0.0'),
> Decimal('0.5'), Decimal('1.0')]
> >
> > >>> np.asarray(d).__abs__()
> > array([1.0, 0.5, 0.0, 0.5, 1.0], dtype=object)
> > >>> np.asarray(d).__abs__()[0]
> > Decimal('1.0')
> >
> > Josef
> >
> > >
> > > I don't see a difference.
> > >
> > > (I don't expect to use max abs on anything else than
> numbers.)
> > >
> 
> 
> The difference is about scalars only. And of course __abs__ is
> fine, but
> if numpy adds an abs method, its scalars would logically have
> it too.
> But then you diverge from python scalars. That has exactly the
> downside
> that you may write code that suddenly stops working for python
> scalars
> without noticing.
> 
> I turned around the abs and max order here, so that the abs
> works on the
> scalar, not useful but just as an example.
> 
> 
> But doesn't this also apply to many existing methods?
> 
I do not think that it does, or only at a different quality. Almost all
of those methods either logically work on the container and not the
elements, e.g. reshape etc., or are the most common reductions like
mean/max/min (which also only make sense for containers). Now those also
work on numpy scalars, but you should rarely use them on scalars. The
only example I can see is round (there is no special method for that
before python 3, so you could argue that python did not provide a
canonical way for numpy).

Now this is not a huge argument, obviously scalars can have a lot of
specialized methods (like decimals for example), but in the case of abs,
python provides a default way of doing it which always works, and it
makes me tend to say that it is better to use that (even if I sometimes
write .abs() too...).

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array indexing and repeated indices

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 08:30 +0100, Nicolas Rougier wrote:
> Hi,
> 
> I'm trying to increment an array using indexing and a second array for 
> increment values (since it might be a little tedious to explain, see below 
> for a short example). 
> 
> Using "direct" indexing, the values in the example are incremented by 1 only 
> while I want to achieve the alternative behavior. My question is whether 
> there is such function in numpy or if there a re better way to achieve the 
> same result ?
> (I would like to avoid the while statement)
> 
> I found and adapted the alternative solution from: 
> http://stackoverflow.com/questions/2004364/increment-numpy-array-with-repeated-indices
>  but it is only for a fixed increment from what I've understood.
> 
> 
> Nicolas
> 
> 
> # 
> 
> import numpy as np
> 
> n,p = 5,100
> nodes = np.zeros( n, [('value', 'f4', 1)] )
> links = np.zeros( p, [('source', 'i4', 1),
>   ('target', 'i4', 1)])
> links['source'] = np.random.randint(0, n, p)
> links['target'] = np.random.randint(0, n, p)
> 
> targets = links['target'] # Indices can be repeated
> K = np.ones(len(targets)) # Note K could be anything
> 
> # Direct indexing
> nodes['value'] = 0
> nodes['value'][targets] += K
> print nodes
> 
> # "Alternative" indexing
> nodes['value'] = 0
> B = np.bincount(targets)

bincount takes a weights argument which should do exactly what you are
looking for.
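
I.e., something along these lines (a small sketch using the names from
your example):

import numpy as np

n, p = 5, 100
nodes = np.zeros(n, dtype=[('value', 'f4')])
targets = np.random.randint(0, n, p)
K = np.ones(p)                      # K could be any per-link weights

nodes['value'] = np.bincount(targets, weights=K, minlength=n)
# equivalent to: for t, k in zip(targets, K): nodes['value'][t] += k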

- Sebastian

> while B.any():
> I = np.argwhere(B>=1)
> nodes['value'][I] += K[I]
> B = np.maximum(B-1,0)
> print nodes
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg

Hi,

there has been a request on the issue tracker for a step parameter to
linspace. This is of course tricky with the imprecision of floating
point numbers.
As a trade-off, I was thinking of a step parameter that is only used to
calculate the integer number of steps. However, to be certain that it
never misbehaves, this would be done strictly, up to the numerical
precision of the (float) numbers. Effectively this means:

In [9]: np.linspace(0, 1.2, step=0.3)
Out[9]: array([ 0. ,  0.3,  0.6,  0.9,  1.2])

In [10]: np.linspace(0, 1.2+5-5, step=0.3)
Out[10]: array([ 0. ,  0.3,  0.6,  0.9,  1.2])

In [11]: np.linspace(0, 1.2+500-500, step=0.3)
ValueError: could not determine exact number of samples for given step

I.e. the last one fails, because 1.2 + 500 - 500 == 1.1999999999999886,
which is an error that is larger than the imprecision of the floating
point numbers.
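
In pure python such a check could look roughly like this (only a sketch;
the exact tolerance would be the thing to discuss, and the function name
is made up):

import numpy as np

def linspace_step(start, stop, step):
    n_float = (stop - start) / float(step)
    num = int(round(n_float))
    # accept only deviations on the order of the floating point spacing:
    if abs(n_float - num) > 4 * np.spacing(n_float):
        raise ValueError("could not determine exact number of samples "
                         "for given step")
    return np.linspace(start, stop, num + 1)

>>> linspace_step(0, 1.2, 0.3)
array([ 0. ,  0.3,  0.6,  0.9,  1.2])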

Is this considered useful, or as it can easily fail for calculated
numbers, and is thus only a convenience, it is not?

Regards,

Sebastian


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 12:33 +, Henry Gomersall wrote:
> On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
> > there has been a request on the issue tracker for a step parameter to
> > linspace. This is of course tricky with the imprecision of floating
> > point numbers.
> 
> How is that different to arange? Either you specify the number of points
> with linspace, or you specify the step with arange. Is there a third
> option?
> 
> My usual hack to deal with the numerical bounds issue is to add/subtract
> half the step.
> 

There is not much. It does that half step logic for you, and you
actually know that the end point is exact (since linspace makes sure of
that).

In arange, the start and step are exact. In linspace the start and stop
are exact (even with a given step, it would vary on the order of
floating point accuracy).

Maybe the larger point is the hope that by adding this to linspace it is
easier to get new users to use it and avoid pitfalls of arange with
floating points when you are not aware of that half step thing.

> Henry
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 13:44 +0100, Sebastian Berg wrote:
> On Fri, 2013-03-01 at 12:33 +, Henry Gomersall wrote:
> > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
> > > there has been a request on the issue tracker for a step parameter to
> > > linspace. This is of course tricky with the imprecision of floating
> > > point numbers.
> > 
> > How is that different to arange? Either you specify the number of points
> > with linspace, or you specify the step with arange. Is there a third
> > option?
> > 
> > My usual hack to deal with the numerical bounds issue is to add/subtract
> > half the step.
> > 
> 
> There is not much. It does that half step logic for you, and you
> actually know that the end point is exact (since linspace makes sure of
> that).
> 
> In arange, the start and step are exact. In linspace the start and stop
> are exact (even with a given step, it would vary on the order of
> floating point accuracy).
> 
> Maybe the larger point is the hope that by adding this to linspace it is
> easier to get new users to use it and avoid pitfalls of arange with
> floating points when you are not aware of that half step thing.
> 

That said, I am honestly not sure this is worth it. I guess I might use
it once in a while, but overall probably hardly at all and it is easy to
do something else...

> > Henry
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 13:34 +, Nathaniel Smith wrote:
> On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall  wrote:
> > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
> >> there has been a request on the issue tracker for a step parameter to
> >> linspace. This is of course tricky with the imprecision of floating
> >> point numbers.
> >
> > How is that different to arange? Either you specify the number of points
> > with linspace, or you specify the step with arange. Is there a third
> > option?
> 
> arange is designed for ints and gives you a half-open interval,
> linspace is designed for floats and gives you a closed interval. This
> means that when arange is used on floats, it does weird things that
> linspace doesn't:
> 
> In [11]: eps = np.finfo(float).eps
> 
> In [12]: np.arange(0, 1, step=0.2)
> Out[12]: array([ 0. ,  0.2,  0.4,  0.6,  0.8])
> 
> In [13]: np.arange(0, 1 + eps, step=0.2)
> Out[13]: array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
> 
> In [14]: np.linspace(0, 1, 6)
> Out[14]: array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
> 
> In [15]: np.linspace(0, 1 + eps, 6)
> Out[15]: array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
> 
> The half-open/closed thing also has effects on what kind of api is
> reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like
> python range(0, 10, step=8)). linspace(0, 1, step=0.8) is just
> incoherent, though, because linspace guarantees that both the start
> and end points are included.
> 
> > My usual hack to deal with the numerical bounds issue is to add/subtract
> > half the step.
> 
> Right. Which is exactly the sort of annoying, content-free code that a
> library is supposed to handle for you, so you can save mental energy
> for more important things :-).
> 
> The problem is to figure out exactly how strict we should be. Like,
> presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8
> to 0.5 or 1. That would clearly violate "in the face of ambiguity,
> refuse the temptation to guess".
> 
> OTOH, as Sebastian points out, requiring that the step be *exactly* a
> divisor of the value (stop - start), within 1 ULP, is probably
> obnoxious.
> 
> Would anything bad happen if we just required that, say, (stop -
> start)/step had to be within "np.allclose" of an integer, i.e., to
> some reasonable relative and absolute precision, and then rounded the
> number of steps to match that integer exactly?

I was a bit worried about what happens for a huge number of steps. I
have to think about it a bit more, but I guess one should be able to
relax it... or maybe someone here has a nice idea on how to relax it.
It seems to me that there is a bit of a trade-off once you get into the
range of millions of steps, because absolute errors that make sense for
a few steps are suddenly on the order of whole integers.

> 
> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 15:07 +, Henry Gomersall wrote:
> On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote:
> > On 3/1/2013 9:32 AM, Henry Gomersall wrote:
> > > there should be an equivalent for floats that
> > > unambiguously returns a range for the half open interval
> > 
> > 
> > If I've understood you:
> > start + stepsize*np.arange(nsteps)
> 
> yes, except that nsteps is computed for you, otherwise you could just
> use linspace ;)

If you could just use linspace, you should use linspace (and give it a
step argument) in my opinion, but I don't think you meant that ;).

linspace keeps start and stop exact and guarantees that you actually get
to stop. Even a modified/new arange will never do that, but I think many
use arange like that, and giving linspace a step argument could migrate
that usage (which is simply ill-defined for arange) to it. That might
give an error once in a while, but that should be much less often and
much more enlightening than a sudden "one value too many".

I think the accuracy requirements for the step in linspace can probably
be relaxed enough, though I am not quite certain yet as to how (there is
a bit of a trade-off/problem when you get to a very large number of
steps).

> 
> hen
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] step parameter for linspace

2013-03-01 Thread Sebastian Berg
On Fri, 2013-03-01 at 10:49 -0500, Alan G Isaac wrote:
> One motivation of this thread was that
> adding a step parameter to linspace might make
> things easier for beginners.
> 
> I claim this thread has put the lie to that,
> starting with the initial post.  So what is the
> persuasive case for the change?
> 
> Imo, the current situation is good:
> use arange if you want to specify the stepsize,
> or use linspace if you want to specify the
> number of points.  Nice and clean.
> 

Maybe you are right, and it is not easier. But there was a "please
include an end_point=True/False option in arange" request, and that does
not make sense by arange logic.

The fact that the initial example was overly strict is something that
can be relaxed quite a bit I am sure, though I guess you may always have
an odd case here or there with floats.

I agree the difference is nice and clean right now, but I disagree that
this would change much. Arange guarantees the step size. Linspace the
end point. There is a bit of a shift, but if I thought it was less clean
I would not have asked if it is deemed useful :).

At this time it seems there is more sentiment against it and that is
fine with me. I thought it might be useful for some who normally want
the linspace behavior, but do not want to worry about the right num in
some cases. Someone who actually wants an error if the step they put in
quickly (and which they would have used to calculate num) was wrong.

> Cheers,
> Alan Isaac
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)

2013-03-06 Thread Sebastian Berg
On Wed, 2013-03-06 at 12:42 -0600, Kurt Smith wrote:
> On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith  wrote:
> > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted  
> > wrote:
> >>
> >> I would not run too much.  The example above takes 9 bytes to host the
> >> structure, while a `aligned=True` will take 16 bytes.  I'd rather let
> >> the default as it is, and in case performance is critical, you can
> >> always copy the unaligned field to a new (homogeneous) array.
> >
> > Yes, I can absolutely see the case you're making here, and I made my
> > "vote" with the understanding that `aligned=False` will almost
> > certainly stay the default.  Adding 'aligned=True' is simple for me to
> > do, so no harm done.
> >
> > My case is based on what's the least surprising behavior: C structs /
> > all C compilers, the builtin `struct` module, and ctypes `Structure`
> > subclasses all use padding to ensure aligned fields by default.  You
> > can turn this off to get packed structures, but the default behavior
> > in these other places is alignment, which is why I was surprised when
> > I first saw that NumPy structured dtypes are packed by default.
> >
> 
> Some surprises with aligned / unaligned arrays:
> 
> #-
> 
> import numpy as np
> 
> packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
> aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)
> 
> packed_arr = np.ones((10**6,), dtype=packed_dt)
> aligned_arr = np.ones((10**6,), dtype=aligned_dt)
> 
> print "all(packed_arr['a'] == aligned_arr['a'])",
> np.all(packed_arr['a'] == aligned_arr['a']) # True
> print "all(packed_arr['b'] == aligned_arr['b'])",
> np.all(packed_arr['b'] == aligned_arr['b']) # True
> print "all(packed_arr == aligned_arr)", np.all(packed_arr ==
> aligned_arr) # False (!!)
> 
> #-
> 
> I can understand what's likely going on under the covers that makes
> these arrays not compare equal, but I'd expect that if all columns of
> two structured arrays are everywhere equal, then the arrays themselves
> would be everywhere equal.  Bug?
> 

Yes and no... equal for structured types seems not to be implemented;
you get the same (wrong) False also with (packed_arr == packed_arr). But
if the types are equivalent yet np.equal is not implemented, just
returning False is a bit dangerous, I agree. Not sure what the solution
is exactly; I think the == operator should probably raise an error
instead of swallowing them all...
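
As a workaround, comparing field by field does what one would expect
(a small sketch, using the arrays from your example and assuming both
share the same field names):

def structured_equal(x, y):
    return all(np.array_equal(x[name], y[name]) for name in x.dtype.names)

print structured_equal(packed_arr, aligned_arr)   # True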

- Sebastian

> And regarding performance, doing simple timings shows a 30%-ish
> slowdown for unaligned operations:
> 
> In [36]: %timeit packed_arr['b']**2
> 100 loops, best of 3: 2.48 ms per loop
> 
> In [37]: %timeit aligned_arr['b']**2
> 1000 loops, best of 3: 1.9 ms per loop
> 
> Whereas summing shows just a 10%-ish slowdown:
> 
> In [38]: %timeit packed_arr['b'].sum()
> 1000 loops, best of 3: 1.29 ms per loop
> 
> In [39]: %timeit aligned_arr['b'].sum()
> 1000 loops, best of 3: 1.14 ms per loop
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Compile time flag for numpy

2013-03-09 Thread Sebastian Berg
Hey,

how would I go about making a compile time flag for numpy to use as a
macro?

The reason is: https://github.com/numpy/numpy/pull/2735

so that it would be possible to compile numpy differently for debugging
if software depending on numpy is broken by this change.

Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compile time flag for numpy

2013-03-09 Thread Sebastian Berg
On Sat, 2013-03-09 at 17:17 +0100, Sebastian Berg wrote:
> Hey,
> 
> how would I go about making a compile time flag for numpy to use as a
> macro?
> 

To be clear I mean an environment variable.
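
In general (this is just a generic distutils sketch with a made-up flag
name, not numpy's actual build code), an environment variable can be
turned into a C macro at build time roughly like this:

import os
from distutils.core import setup, Extension

define_macros = []
if os.environ.get('EXAMPLE_DEBUG_FLAG'):
    # becomes "#define EXAMPLE_DEBUG_FLAG 1" for the compiled sources
    define_macros.append(('EXAMPLE_DEBUG_FLAG', '1'))

setup(name='example',
      ext_modules=[Extension('example', sources=['example.c'],
                             define_macros=define_macros)])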

> The reason is: https://github.com/numpy/numpy/pull/2735
> 
> so that it would be possible to compile numpy differently for debugging
> if software depending on numpy is broken by this change.
> 
> Regards,
> 
> Sebastian
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] SVD does not converge

2011-06-28 Thread Sebastian Berg
Hi,

On Tue, 2011-06-28 at 10:56 -0500, santhu kumar wrote:
> Hello,
> 
> I have a 380X5 matrix and when I am calculating pseudo-inverse of the
> matrix using pinv(numpy.linalg) I get the following error message:
> 
> raise LinAlgError, 'SVD did not converge'
> numpy.linalg.linalg.LinAlgError: SVD did not converge
> 
Just a quick guess, but is there any chance that your array includes a
nan value?
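
A quick way to check (with `a` standing for your 380x5 matrix):

>>> np.isfinite(a).all()   # False if any nan or inf is present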

Regards,

Sebastian

> I have looked in the list that it is a recurring issue but I was
> unable to find any solution. Can somebody please guide me on how to
> fix that issue?
> 
> Thanks
> Santhosh
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread Sebastian Berg
I guess einsum is much cleaner, but I had already started with this and
maybe someone likes it; it is fully vectorized and uses a bit of funny
stuff too:

# The dot product(s), written using broadcasting rules:
a = -(x.reshape(-1,1,3) * x[...,None])

# Magic to avoid the eye thing: take all diagonal elements of a as a
# view (maybe there is a cooler way to do it):
diagonals = np.lib.stride_tricks.as_strided(a, (a.shape[0], 3),
    (a.dtype.itemsize*9, a.dtype.itemsize*4))

# Add the x**2 (diagonals is a view into a), the sum is broadcast:
diagonals += (x**2).sum(1)[:,None]

# And multiply by mass using broadcasting:
a *= mass[...,None]

# And sum up all the intermediate results:
inert = a.sum(0)

print inert

Regards,

Sebastian

On Fri, 2012-02-03 at 15:43 +0100, Søren Gammelmark wrote:
> What about this?
> 
> 
> A = einsum("i,ij->", mass, x ** 2)
> B = einsum("i,ij,ik->jk", mass, x, x)
> I = A * eye(3) - B
> 
> 
> /Søren
> 
> On 3 February 2012 15:10,  wrote:
> On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac
>  wrote:
> > On 2/3/2012 5:16 AM, santhu kumar wrote:
> >> x = nX3 vector.
> >> mass = nX1 vector
> >> inert = zeros((3,3))
> >> for i in range(n):
> >>ri = x[i,:].reshape(1,3)
> >>inert = inert + mass[i,]*(sum(ri*ri)*eye(3) -
> dot(ri.T,ri))
> >>
> >
> >
> > This should buy you a bit.
> >
> > xdot = (x*x).sum(axis=1)
> > for (massi,xi,xdoti) in zip(mass.flat,x,xdot):
> >   temp = -np.outer(xi,xi)
> >   temp.flat[slice(0,None,4)] += xdoti
> >   inert += massi*temp
> >
> > Alan Isaac
> 
> 
> maybe something like this, (self contained example and name
> spaces to
> make running it easier)
> 
> import numpy as np
> n = 15
> x = np.arange(n*3.).reshape(-1,3) #nX3 vector.
> mass = np.linspace(1,2,n)[:,None] #nX1 vector
> inert = np.zeros((3,3))
> for i in range(n):
>  ri = x[i,:].reshape(1,3)
> 
>  inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) -
> np.dot(ri.T,ri))
> print inert
> 
> print  np.diag((mass * x**2).sum(0))  - np.dot(x.T, mass*x)
> 
> [[ 0.  -16755.  -17287.5]
>  [-16755.   0.  -17865. ]
>  [-17287.5 -17865.   0. ]]
> [[ 0.  -16755.  -17287.5]
>  [-16755.   0.  -17865. ]
>  [-17287.5 -17865.   0. ]]
> 
> Josef
> 
> 
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to efficiently select multiple slices from an array?

2013-03-20 Thread Sebastian Berg
Hey,

On Wed, 2013-03-20 at 16:31 +0100, Andreas Hilboll wrote:
> Cross-posting a question I asked on SO
> (http://stackoverflow.com/q/15527666/152439):
> 
> 
> Given an array
> 
> d = np.random.randn(100)
> 
> and an index array
> 
> i = np.random.random_integers(low=3, high=d.size - 5, size=20)
> 
> how can I efficiently create a 2d array r with
> 
> r.shape = (20, 8)
> 
> such that for all j=0..19,
> 
> r[j] = d[i[j]-3:i[j]+5]
> 
> In my case, the arrays are quite large (~20 instead of 100 and 20),
> so something quick would be useful.


You can use stride tricks; it's simple to do by hand, but since I have
it around anyway, maybe just use this: https://gist.github.com/seberg/3866040

d = np.random.randn(100)
windowed_d = rolling_window(d, 8)
i = np.random.random_integers(low=3, high=d.size - 5, size=20)
r = windowed_d[i - 3, :]   # windowed_d[k] is the view d[k:k+8]

Or use stride_tricks by hand, with:
windowed_d = np.lib.stride_tricks.as_strided(d, (d.shape[0]-7, 8),
(d.strides[0],)*2)

Note that windowed_d is just a view into the same data as the original
array, but since the fancy indexing creates a copy, that is of course
not the case for the end result.

Regards,

Sebastian

> 
> Cheers, Andreas.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013

2013-03-24 Thread Sebastian Berg
On Thu, 2013-03-21 at 22:20 +0100, Ralf Gommers wrote:
> Hi all,
> 
> 
> It is the time of the year for Google Summer of Code applications. If
> we want to participate with Numpy and/or Scipy, we need two things:
> enough mentors and ideas for projects. If we get those, we'll apply
> under the PSF umbrella. They've outlined the timeline they're working
> by and guidelines at
> http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html. 
> 
> We should be able to come up with some interesting project ideas I'd
> think, let's put those at
> http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably
> with enough detail to be understandable for people new to the projects
> and a proposed mentor.

Just some more ideas for numpy. I have not thought much about whether
they fit the GSoC format well, but maybe a possible mentor likes one of them:

1. Speed improvement for scalars/small arrays. This would start with
ideas along the lines of two current pull requests for numpy, which try
to improve array + python scalar speed by circumventing costly scalar ->
array conversions, etc., and continue with improving the speed of
finding the correct ufunc (which I believe Nathaniel timed to be a
pretty big factor). But it would probably touch a lot of numpy
internals, so the learning curve may be pretty steep.

2. Implement stable summation. Basically it would be about creating
generalized ufuncs (if that is possible) implementing different kinds of
stable summation algorithms for the inexact types and then adding it as
an option to np.sum (a small sketch of one such algorithm follows after
this list).

3. This has been suggested before in some way or another: improving the
subclassing of arrays. Though I am unsure whether user code might
dislike changes, even if they are improvements...
It would start off with checking which python side functions should
explicitly call __array_wrap__ (possibly writing more helpers to do it)
and calling it more consistently, plus adding the context information
where it is currently not added (only simple ufunc calls add it, not
even reductions I think). I am sure you can dig a lot deeper into it
all, but it would require some serious thinking and is not
straightforward.

4. Partial sorting. This would be about implementing partial sorting and
O(N) median calculation into numpy. Plus maybe new functions that can
make use of it (though I don't know what exactly that would be, but
these functions could also have a home in scipy not numpy).
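
Regarding point 2, here is a minimal pure-Python sketch of one such
algorithm (Kahan compensated summation), just to illustrate what "stable
summation" means; the GSoC project would of course implement something
like this in C as a (generalized) ufunc:

import math

def kahan_sum(values):
    total = 0.0
    comp = 0.0                      # running compensation for lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y      # recover what the addition just rounded away
        total = t
    return total

vals = [0.1] * 10**6
exact = math.fsum(vals)              # correctly rounded reference sum
print(abs(sum(vals) - exact))        # naive summation drifts noticeably
print(abs(kahan_sum(vals) - exact))  # compensated summation stays much closer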

> 
> We need at least 3 people willing to mentor a student. Ideally we'd
> have enough mentors this week, so we can apply to the PSF on time. If
> you're willing to be a mentor, please send me the following: name,
> email address, phone nr, and what you're interested in mentoring. If
> you have time constaints and have doubts about being able to be a
> primary mentor, being a backup mentor would also be helpful.
> 
> 
> Cheers,
> Ralf
> 
> 
> P.S. as you can probably tell from the above, I'm happy to coordinate
> the GSoC applications for Numpy and Scipy
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-30 Thread Sebastian Berg
On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
> Hi,
> 
> We were teaching today, and found ourselves getting very confused
> about ravel and shape in numpy.
> 
> Summary
> --
> 
> There are two separate ideas needed to understand ordering in ravel and 
> reshape:
> 
> Idea 1): ravel / reshape can proceed from the last axis to the first,
> or the first to the last.  This is "ravel index ordering"
> Idea 2) The physical layout of the array (on disk or in memory) can be
> "C" or "F" contiguous or neither.
> This is "memory ordering"
> 
> The index ordering is usually (but see below) orthogonal to the memory 
> ordering.
> 
> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
> index ordering, and this mixes the two ideas and is confusing.
> 
> What the current situation looks like
> 
> 
> Specifically, we've been rolling this around 4 experienced numpy users
> and we all predicted at least one of the results below wrongly.
> 
> This was what we knew, or should have known:
> 
> In [2]: import numpy as np
> 
> In [3]: arr = np.arange(10).reshape((2, 5))
> 
> In [5]: arr.ravel()
> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> So, the 'ravel' operation unravels over the last axis (1) first,
> followed by axis 0.
> 
> So far so good (even if the opposite to MATLAB, Octave).
> 
> Then we found the 'order' flag to ravel:
> 
> In [10]: arr.flags
> Out[10]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> 
> In [11]: arr.ravel('C')
> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> But we soon got confused.  How about this?
> 
> In [12]: arr_F = np.array(arr, order='F')
> 
> In [13]: arr_F.flags
> Out[13]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> 
> In [16]: arr_F
> Out[16]:
> array([[0, 1, 2, 3, 4],
>[5, 6, 7, 8, 9]])
> 
> In [17]: arr_F.ravel('C')
> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> Right - so the flag 'C' to ravel, has got nothing to do with *memory*
> ordering, but is to do with *index* ordering.
> 
> And in fact, we can ask for memory ordering specifically:
> 
> In [22]: arr.ravel('K')
> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> In [23]: arr_F.ravel('K')
> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
> 
> In [24]: arr.ravel('A')
> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> In [25]: arr_F.ravel('A')
> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
> 
> There are some confusions to get into with the 'order' flag to reshape
> as well, of the same type.
> 
> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering.
> 
> This is very confusing.  We think the index ordering and memory
> ordering ideas need to be separated, and specifically, we should avoid
> using "C" and "F" to refer to index ordering.
> 
> Proposal
> -
> 
> * Deprecate the use of "C" and "F" meaning backwards and forwards
> index ordering for ravel, reshape
> * Prefer "Z" and "N", being graphical representations of unraveling in
> 2 dimensions, axis1 first and axis0 first respectively (excellent
> naming idea by Paul Ivanov)
> 
> What do y'all think?
> 

Personally I think it is clear enough and that "Z" and "N" would confuse
me just as much (though I am used to the other names). Also "Z" and "N"
would seem more like aliases, which would also make sense in the memory
order context.
If anything, I would prefer renaming the arguments to iteration_order and
memory_order, but that seems like overdoing it...
Maybe the documentation should just be checked to make sure it is always
clear. I.e. maybe it does not use "iteration" or "memory" order
consistently (though I somewhat feel it is usually clear that it must be
iteration order, since no numpy function cares about the input memory
order as they will just do a copy if necessary).

Regards,

Sebastian 

> Cheers,
> 
> Matthew
> Paul Ivanov
> JB Poline
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-30 Thread Sebastian Berg
On Sat, 2013-03-30 at 12:45 -0700, Matthew Brett wrote:
> Hi,
> 
> On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg
>  wrote:
> > On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
> >> Hi,
> >>
> >> We were teaching today, and found ourselves getting very confused
> >> about ravel and shape in numpy.
> >>



> >>
> >> What do y'all think?
> >>
> >
> > Personally I think it is clear enough and that "Z" and "N" would confuse
> > me just as much (though I am used to the other names). Also "Z" and "N"
> > would seem more like aliases, which would also make sense in the memory
> > order context.
> > If anything, I would prefer renaming the arguments iteration_order and
> > memory_order, but it seems overdoing it...
> 
> I am not sure what you mean - at the moment  there is one argument
> called 'order' that can refer to iteration order or memory order.  Are
> you proposing two arguments?
> 

Yes that is what I meant. The reason that it is not convincing to me is
that if I write `np.reshape(arr, ..., order='Z')`, I may be tempted to
also write `np.copy(arr, order='Z')`. I don't see anything against
allowing 'Z' as a more memorable 'C' (I also used to forget which was
which), but I don't really see enforcing a different _value_ on the same
named argument making it clearer.
Renaming the argument itself would seem more sensible to me right now,
but I cannot think of a decent name, so I would prefer trying to clarify
the documentation if necessary.

> > Maybe the documentation could just be checked if it is always clear
> > though. I.e. maybe it does not use "iteration" or "memory" order
> > consistently (though I somewhat feel it is usually clear that it must be
> > iteration order, since no numpy function cares about the input memory
> > order as they will just do a copy if necessary).
> 
> Do you really mean this?  Numpy is full of 'order=' flags that refer to 
> memory.
> 

I somewhat imagined there were more iteration order flags, and I count
empty/ones/.../copy as basically one "array creation" monster...

> Cheers,
> 
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-01 Thread Sebastian Berg
On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
> Hi,
> 
> On Sun, Mar 31, 2013 at 1:43 PM,   wrote:
> > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett  
> > wrote:
> >> Hi,
> >>
> >> On Sat, Mar 30, 2013 at 10:38 PM,   wrote:
> >>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett  
> >>> wrote:
>  Hi,
> 
>  On Sat, Mar 30, 2013 at 9:37 PM,   wrote:
> > On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett 
> >  wrote:
> >> Hi,
> >>
> >> On Sat, Mar 30, 2013 at 7:02 PM,   wrote:
> >>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett 
> >>>  wrote:
>  Hi,
> 
>  On Sat, Mar 30, 2013 at 7:50 PM,   wrote:
> > On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
> >  wrote:
> >> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett 
> >> 
> >> wrote:
> >>>
> >>> On Sat, Mar 30, 2013 at 2:20 PM,   wrote:
> >>> > On Sat, Mar 30, 2013 at 4:57 PM,   wrote:
> >>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
> >>> >>  wrote:
> >>> >>> On Sat, Mar 30, 2013 at 4:14 AM,   
> >>> >>> wrote:
> >>>  On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
> >>>   wrote:
> >>> >
> >>> > Ravel and reshape use the tems 'C' and 'F" in the sense of 
> >>> > index
> >>> > ordering.
> >>> >
> >>> > This is very confusing.  We think the index ordering and 
> >>> > memory
> >>> > ordering ideas need to be separated, and specifically, we 
> >>> > should
> >>> > avoid
> >>> > using "C" and "F" to refer to index ordering.
> >>> >
> >>> > Proposal
> >>> > -
> >>> >
> >>> > * Deprecate the use of "C" and "F" meaning backwards and 
> >>> > forwards
> >>> > index ordering for ravel, reshape
> >>> > * Prefer "Z" and "N", being graphical representations of 
> >>> > unraveling
> >>> > in
> >>> > 2 dimensions, axis1 first and axis0 first respectively 
> >>> > (excellent
> >>> > naming idea by Paul Ivanov)
> >>> >
> >>> > What do y'all think?
> >>> 
> >>>  I always thought "F" and "C" are easy to understand, I 
> >>>  always thought
> >>>  about
> >>>  the content and never about the memory when using it.
> >>> >>
> >>> >> changing the names doesn't make it easier to understand.
> >>> >> I think the confusion is because the new A and K refer to 
> >>> >> existing
> >>> >> memory
> >>> >>
> >>>
> >>> I disagree, I think it's confusing, but I have evidence, and that 
> >>> is
> >>> that four out of four of us tested ourselves and got it wrong.
> >>>
> >>> Perhaps we are particularly dumb or poorly informed, but I think 
> >>> it's
> >>> rash to assert there is no problem here.
> >
> > I think you are overcomplicating things or phrased it as a "trick 
> > question"
> 
>  I don't know what you mean by trick question - was there something
>  over-complicated in the example?  I deliberately didn't include
>  various much more confusing examples in "reshape".
> >>>
> >>> I meant making the "candidates" think about memory instead of just
> >>> column versus row stacking.
> >>
> >> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
> >> array, it was an image, with time as the 4th dimension (N time
> >> points).   Raveling and reshaping 3D and 4D arrays is a common thing
> >> to do in neuroimaging, as you can imagine.
> >>
> >> A student asked what he would get back from raveling this array, a
> >> concatenated time series, or something spatial?
> >>
> >> We showed (I'd worked it out by this time) that the first N values
> >> were the time series given by [0, 0, 0, :].
> >>
> >> He said - "Oh - I see - so the data is stored as a whole lot of time
> >> series one by one, I thought it would be stored as a series of
> >> images'.
> >>
> >> Ironically, this was a Fortran-ordered array in memory, and he was 
> >> wrong.
> >>
> >> So, I think the idea of memory ordering and index ordering is very
> >> easy to confuse, and comes up naturally.
> >>
> >> I would like, as a teacher, to be able to say something like:
> >>
> >> This is what C memory layout is (it's the memory layout  that gives
> >> arr.flags.C_CONTIGUOUS=True)
> >> This is what F memory layout is (it's the memory layout  that gives
> >> arr.flags.F_CONTIGUOUS=True)
> >> It's rather easy to get something that is neither C or F memory layout
> 

Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-03 Thread Sebastian Berg
On Tue, 2013-04-02 at 22:52 +0100, Nathaniel Smith wrote:
> On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett  
> wrote:
> >> This is like observing that if I say "go North" then it's ambiguous
> >> about whether I want you to drive or walk, and concluding that we need
> >> new words for the directions depending on what sort of vehicle you
> >> use. So "go North" means drive North, "go htuoS" means walk North,
> >> etc. Totally silly. Makes much more sense to have one set of words for
> >> directions, and then make clear from context what the directions are
> >> used for -- "drive North", "walk North". Or "iterate C-wards", "store
> >> F-wards".
> >>
> >> "C" and "Z" mean exactly the same thing -- they describe a way of
> >> unraveling a cube into a straight line. The difference is what we do
> >> with the resulting straight line. That's why I'm suggesting that the
> >> distinction should be made in the name of the argument.
> >
> > Could you unpack that for the 'ravel' docstring?  Because these
> > options all refer to the way of unraveling and not the memory layout
> > that results.
> 
> Z/C/column-major/whatever-you-want-to-call-it is a general strategy
> for converting between a 1-dim representation and a n-dim
> representation. In the case of memory storage, the 1-dim
> representation is the flat space of pointer arithmetic. In the case of
> ravel, the 1-dim representation is the flat space of a 1-dim indexed
> array. But the 1-dim-to-n-dim part is the same in both cases.
> 
> I think that's why you're seeing people baffled by your proposal -- to
> them the "C" refers to this general strategy, and what's different is
> the context where it gets applied. So giving the same strategy two
> different names is silly; if anything it's the contexts that should
> have different names.
> 

Yup, that's how I think about it too... So I am against different values
for the order argument. I am somewhat fine with a new name, but it seems
like that may get clumsy.
But I would really love it if someone would try to make the documentation
simpler! There is also never a mention of "contiguity", even though when
we refer to "memory order", having a C/F contiguous array is often the
reason why we care (in np.asarray, "contiguous='C'" would make as much
sense as "order", maybe even more so). Also 'A' often seems to be
explained not quite correctly (that does not matter much, except for
reshape, where its explanation is fuzzy, but it will matter more in the
future -- even if I don't expect 'A' to be actually used).

If there is not one yet, there should maybe be an overview in the
user/reference guide of what order means, and how its application to
newly created memory differs from how reshape, etc. use it... Then the
functions using order can also reference that, plus maybe we have some
place to look up what C and F mean for all of us who like to forget it...

- Sebastian

> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-03 Thread Sebastian Berg
On Wed, 2013-04-03 at 08:52 -0700, Chris Barker - NOAA Federal wrote:
> On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
>  wrote:
> >> the context where it gets applied. So giving the same strategy two
> >> different names is silly; if anything it's the contexts that should
> >> have different names.
> >>
> >
> > Yup, thats how I think about it too...
> 
> me too...
> 
> > But I would really love if someone would try to make the documentation
> > simpler!
> 
> yes, I think this is where the solution lies.
> 
> > There is also never a mention of "contiguity", even though when
> > we refer to "memory order", then having a C/F contiguous array is often
> > the reason why
> 
> good point -- in fact, I have no idea what would happen in many of
> these cases for a discontiguous array (or one with arbitrarily weird
> strides...)
> 
> >  Also 'A' seems often explained not
> > quite correctly (though that does not matter (except for reshape, where
> > its explanation is fuzzy), it will matter more in the future -- even if
> > I don't expect 'A' to be actually used).
> 
> I wonder about having a 'A' option in reshape at all -- what the heck
> does it mean? why do we need it? Again, I come back to the fact that
> memory order is kind-of orthogonal to index order. So for reshape (or
> ravel, which is really just a special case of reshape...) the 'A' flag
> and 'K' flag (huh?) is pretty dangerous, and prone to error. I think
> of it this way:
> 

Actually 'K' + reshape is not even implemented sensibly and in current
master I changed it to an error. I would not even know how to define it,
and even if you find a definition I cannot imagine it being useful...
Deprecating 'A' for reshape would seem OK to me since I doubt anyone
actually uses it. It is currently equivalent to `'F' if input.flags.fnc
else 'C'` (fnc means "fortran not c"), and as such is shaky business.

I just realized that 'A' is a bit funny. Basically it means anything
(Anyorder), including discontinuous memory chunks for np.array with
copy=False. But if you do a copy (or reshape), lacking a more free way
to do it, it means `'F' if input.flags.fnc else 'C'` again.

Not sure about the history, but it seems to me 'K' basically supersedes
'A' for most stuff, and 'A' meaning Fortran-or-C is more an accident
because it was the simplest way to implement "I don't care".

The use of 'K' is very sensible for copies of course. 'K' actually does
make some sense for ravel, since if you don't care, it has the best
chance of no copy. 'A' for ravel could/should in my opinion be
deprecated just like for reshape, since it is pretty unpredictable.
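
For concreteness, a small illustration (a quick made-up example) of that
`'F' if input.flags.fnc else 'C'` behaviour, using ravel here:

import numpy as np

a = np.arange(6).reshape(2, 3)    # C-contiguous, a.flags.fnc is False
f = np.asfortranarray(a)          # F- but not C-contiguous, f.flags.fnc is True

print(a.ravel('A'))   # [0 1 2 3 4 5]  -> behaves like order='C'
print(f.ravel('A'))   # [0 3 1 4 2 5]  -> behaves like order='F'
print(f.ravel('K'))   # [0 3 1 4 2 5]  -> simply follows the memory layout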


> Much of the beauty of numpy is that it presents a consistent interface
> to various forms of strided data -- that way, folks can write code
> that works the same way for any ndarray, while still being able to
> have internal storage be efficient for the use at hand -- i.e. C order
> for the common case, Fortran order for interaction with libraries that
> expect that order (or for algorithms that are more efficient in that
> order, though that's mostly external libs..), and non-contiguous data
> so one can work on sub-parts of arrays without copying data around.
> 
> In most places, the numpy API hides the internal memory order -- this
> is a good thing, most people have no need to think about it (or most
> code, anyway), and you can write code that works (even if not
> optimally) for any (strided) memory layout. All is good.
> 
> There are times when you really need to understand, or control or
> manipulate the memory layout, to make sure your routines are
> optimized, or the data is in the right form to pass of to an external
> lib, or to make sense of raw data read from a file, or... That's what
> we have .view() and friends for.
> 

Yeah, I somewhat dislike the fact that "view" only works right for
(roughly) C-contiguous arrays, that's another one of those old traps that
are difficult to impossible to get rid of. Maybe some or all of the view
usages should be superseded by a new command...

Regards,

Sebastian

> However, the 'A' and 'K' flags mix and match these concepts -- and I
> think that's dangerous. it would be easy for the a to use the 'A'
> flag, and have everything work fine and dandy with all their test
> cases, only to have it blow up when  someone passes in a
> different-than-expected array. So really, they should only be used in
> cases where the code has checked memory order before hand, or in a

Re: [Numpy-discussion] try to solve issue #2649 and revisit #473

2013-04-03 Thread Sebastian Berg
On Wed, 2013-04-03 at 16:03 -0400, Alan G Isaac wrote:
> On 4/3/2013 3:18 PM, huangkan...@gmail.com wrote:
> > A 5*5 matrix multiplies another matrix, we expect answer to be error or a 
> > 5*? matrix, not a 1*5 matrix.
> 
> 
> That is what happens.
> But you are post"multiplying" a matrix by a one-dimensional list.
> What should happen then?  That is the question.
> 
> One could argue that this should just raise an error,
> or that the result should be 1d.
> In my view, the result should be a 1d array,
> the same as I.A.dot(x).
> 

Would it be reasonable if this was an Nx1 matrix? I am not sure how you
would implement that exactly in dot. Maybe by transposing the result if
the second argument was a vector and the result is not a base class? And
then __mul__ could use np.asarray instead of np.asmatrix. Or just fix
__mul__ itself to transpose the result and not care about np.dot?
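
To make the idea concrete, a hypothetical little sketch (not the actual
np.matrix implementation): convert the right-hand operand with np.asarray
and transpose the result when that operand was 1-d, so that matrix * list
comes back as an Nx1 matrix:

import numpy as np

def mul_with_column_result(mat, other):
    other_arr = np.asarray(other)
    result = np.asmatrix(np.dot(np.asarray(mat), other_arr))
    if other_arr.ndim == 1:
        result = result.T          # column vector of shape (N, 1)
    return result

M = np.asmatrix(np.eye(5))
x = [1, 2, 3, 4, 5]
print(mul_with_column_result(M, x).shape)   # (5, 1) instead of (1, 5)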

- Sebastian

> But the maintainers wanted operations with matrices to
> return matrices whenever possible.  So instead of
> returning x it returns np.matrix(x).
> 
> My related grievance is that I[0] is a matrix,
> not an array.  There was a long discussion of
> this a couple years ago.
> 
> Anyway, the bottom line is: don't mix matrices and
> other objects.  The matrix object is really built
> only to interact with other matrix objects.
> 
> Alan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-04 Thread Sebastian Berg
On Thu, 2013-04-04 at 12:40 -0700, Matthew Brett wrote:
> Hi,
> 

> 
> So - to restate in other words - this :
> 
> np.reshape(a, (3, 4), order='F')
> 
> could reasonably mean one of two orthogonal things
> 
> 1) Retrieve data from the array using first-to-last indexing, return
> any memory layout you like
> 2) Retrieve data from the array using the default last-to-first index
> ordering, and return memory in F-contiguous layout
> 

Yes, it could mean both. I am simply not sure if it helps enough to
warrant the trouble. So if it still interests someone, I feel the docs
are more important, but I am neutral to changing this.
I don't quite see a big gain, so I am just worried that it bugs a lot of
people either because of changing or because of having to remember the
different name (you can argue that is good, but if it bugs most maybe it
does not help either).

As to being confused: did anyone ever see a np.reshape(arr, ...,
order='F') and then continue assuming the result is F-contiguous (when
the original arr is not known to be contiguous)? If that actually created
a real bug somewhere, it might convince me that it is worth walking
through the trouble and complaints. I guess I just don't believe it
really happens in the real world.

- Sebastian

> Cheers,
> 
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-05 Thread Sebastian Berg
Hey

On Thu, 2013-04-04 at 14:20 -0700, Matthew Brett wrote:
> Hi,
> 
> On Tue, Apr 2, 2013 at 4:32 AM, Nathaniel Smith  wrote:
> 
> > Maybe we should go through and rename "order" to something more descriptive
> > in each case, so we'd have
> >   a.reshape(..., index_order="C")
> >   a.copy(memory_order="F")
> > etc.?
> 
> I'd like to propose this instead:
> 
> a.reshape(..., order="C")
> a.copy(layout="F")
> 

I actually like this: it makes the point clearer that it has to do with
memory layout and implies contiguity, plus it is short, and from the
numpy perspective copy, etc. are the functions that add additional
meaning to "order", not reshape (because IMO memory order is something
new users should not worry about at first). A and K orders will still
have their quirks with np.array and copy=True/False, but for many
functions they are esoteric anyway.

It will be one hell of a deprecation though. I am +0.5 for adding an
alias for now (maybe someone knows an even better name?), but I think
that in this case it probably really is better to wait with actual
deprecation warnings for a few versions, since it touches a *lot* of
code. Plus I think that by the time deprecation warnings start (and
better earlier) numpy should provide an automatic fixer script...

The only counterpoint that remains for me is the difficulty of the
deprecation, since I think the new name idea is very clean. And this is
unfortunately even more invasive than the index_order proposal.

Fun point at the end: ndarray.tostring takes an order argument, which is
correct as "order" but has a lot in common with "layout" :). (that is
not an issue IMO, but for me it is a reason to prefer the layout
proposal over the index_order one).

Regards,

Sebastian

> This fits well with the terms we've been using during the discussion.
> It reduces the changes to only one of the two meanings.
> 
> Thinking about it, I feel that this would have been considerably
> clearer to me as I learned numpy.
> 
> Cheers,
> 
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] einsum and broadcasting

2013-04-05 Thread Sebastian Berg
On Thu, 2013-04-04 at 16:56 +0300, Jaakko Luttinen wrote:
> I don't quite understand how einsum handles broadcasting. I get the
> following error, but I don't understand why:
> 
> In [8]: import numpy as np
> In [9]: A = np.arange(12).reshape((4,3))
> In [10]: B = np.arange(6).reshape((3,2))
> In [11]: np.einsum('ik,k...->i...', A, B)
> ---
> ValueError: operand 0 did not have enough dimensions to match the
> broadcasting, and couldn't be extended because einstein sum subscripts
> were specified at both the start and end
> 
> However, if I use explicit indexing, it works:
> 
> In [12]: np.einsum('ik,kj->ij', A, B)
> Out[12]:
> array([[10, 13],
>[28, 40],
>[46, 67],
>[64, 94]])
> 
> It seems that it also works if I add '...' to the first operand:
> 
> In [12]: np.einsum('ik...,k...->i...', A, B)
> Out[12]:
> array([[10, 13],
>[28, 40],
>[46, 67],
>[64, 94]])
> 
> However, as far as I understand, the syntax
> np.einsum('ik,k...->i...', A, B)
> should work. Have I misunderstood something or is there a bug?
> 

My guess is that it is by design, because the purpose of the ellipsis is
more to allow extra dimensions that are not important to the problem
itself. An elementwise vector product is np.einsum('i,i->i'), and if I
write np.einsum('...i,...i->...i') I allow generalizing that to arrays
of 1-d arrays (like the new gufunc linalg stuff).
I did not check the code though, so maybe that's not the case. But in
any case I don't see a reason why it should not be possible to allow
extra dims for only some inputs (I guess it can also make sense to not
give ... for the output).
So I would say, if you want to generalize it, go ahead ;).
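
To illustrate the kind of generalization I mean (a small made-up example,
with both operands carrying the ellipsis):

import numpy as np

a = np.arange(12).reshape(4, 3)   # four 1-d vectors of length 3
b = np.arange(3)

# elementwise product of every row of a with b; the ellipsis carries the
# extra leading axis of a and matches nothing for b
print(np.einsum('...i,...i->...i', a, b))

# dot product of every row of a with b
print(np.einsum('...i,...i->...', a, b))   # [ 5 14 23 32]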

- Sebastian

> Thanks for your help!
> Jaakko
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OpenOpt Suite release 0.45

2013-04-10 Thread Sebastian Berg
On Wed, 2013-04-10 at 11:54 +0300, Dmitrey wrote:
> On 04/10/2013 10:31 AM, Robert Kern wrote: 
> > You think comparing tracked bug counts across different projects
> > means anything? That's adorable. I admire your diligence at
> > addressing the bugs that you do acknowledge. That was never in
> > question. But refusing to acknowledge a bug is not the same thing as
> > fixing a bug. You cannot use objects that do not have a valid
> > __eq__() (as in, returns boolean True if and only if they are to be
> > considered equivalent for the purpose of dictionary lookup,
> > otherwise returns False) as dictionary keys. Your oofun object still
> > violates this principle. As dictionary keys, you want them to use
> > their `id` attributes to distinguish them, but their __eq__() method
> > still just returns another oofun with the default
> > object.__nonzero__() implementation. This means that bool(some_oofun
> > == other_oofun) is always True regardless of the `id` attributes.
> > You have been unfortunate enough to not run into cases where this
> > causes a problem yet, but the bug is still there, lurking, waiting
> > for a chance hash collision to silently give you wrong results. That
> > is the worst kind of bug. -- Robert Kern 
> I had encountered the bugs with bool(some_oofun == other_oofun) when
> it was raised from other, than dict, cases, e.g. from "in list" (e.f.
> "if my_oofun in freeVarsList") etc, and had fixed them all. But that
> one doesn't occur from "in dict", I traced it with both debugger and
> putting print("in __eq__"),print("in __le__"), print("in __lt__"),
> print('in __gt__'),  print('in __ge__') statements. 

This is all good and nice, but Robert is still right. For dictionaries
to work predictable you need to ensure two things.

First:

if object1 == object2:
    assert bool(hash(object1) == hash(object2))

and second (which is your case), for the dictionary lookup to be
predictable this must always work:

keys, values = zip(*dictionary.keys())
assert(keys.count(object2) == 1)

index = keys.index(object2)
value = values[index]

And apparently this is not the case and it simply invites bugs which are
impossible to track down because you will not see them in small tests.
Instead you will have code that runs great for toy problems and can
suddenly break down in impossible to understand ways when you have large
problems.
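
To make that concrete, a small made-up example (not FuncDesigner code) of
how an __eq__ that does not return a real boolean silently breaks dict
lookup as soon as two keys collide:

class BadKey(object):
    def __init__(self, ident):
        self.ident = ident
    def __hash__(self):
        return 0                 # force a hash collision for the demo
    def __eq__(self, other):
        return object()          # always truthy, never a real comparison

a, b = BadKey(1), BadKey(2)
d = {a: 'first', b: 'second'}
print(len(d))    # 1  -- inserting b silently overwrote the entry for a
print(d[a])      # 'second', not 'first'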

So hopefully the list fixes you mention provide that the second code
block will work as you would expect dictionary[object2] to work, but if
this is not the case...

- Sebastian

> As I had mentioned, removing mapping oofuns as dict keys is mere
> impossible - this is fundamental thing whole FuncDesigner is build on,
> as well as its user API. 
> D. 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OpenOpt Suite release 0.45

2013-04-10 Thread Sebastian Berg
On Wed, 2013-04-10 at 11:45 +0200, Sebastian Berg wrote:
> On Wed, 2013-04-10 at 11:54 +0300, Dmitrey wrote:
> > On 04/10/2013 10:31 AM, Robert Kern wrote: 

> 
> This is all good and nice, but Robert is still right. For dictionaries
> to work predictable you need to ensure two things.
> 
> First:
> 
> if object1 == object2:
> assert bool(hash(object1) == hash(object2))
> 
> and second, which is your case for the dictionary lookup to be
> predictable this must always work:
> 
> keys, values = zip(*dictionary.keys())

sorry dictionary.items() of course.

> assert(keys.count(object2) == 1)
> 
> index = keys.index(object2)
> value = values[index]
> 
> And apparently this is not the case and it simply invites bugs which are
> impossible to track down because you will not see them in small tests.
> Instead you will have code that runs great for toy problems and can
> suddenly break down in impossible to understand ways when you have large
> problems.
> 
> So hopefully the list fixes you mention provide that the second code
> block will work as you would expect dictionary[object2] to work, but if
> this is not the case...
> 
> - Sebastian
> 
> > As I had mentioned, removing mapping oofuns as dict keys is mere
> > impossible - this is fundamental thing whole FuncDesigner is build on,
> > as well as its user API. 
> > D. 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OpenOpt Suite release 0.45

2013-04-10 Thread Sebastian Berg
On Wed, 2013-04-10 at 11:45 +0200, Sebastian Berg wrote:
> On Wed, 2013-04-10 at 11:54 +0300, Dmitrey wrote:
> > On 04/10/2013 10:31 AM, Robert Kern wrote: 
> > > You think comparing tracked bug counts across different projects
> > > means anything? That's adorable. I admire your diligence at
> > > addressing the bugs that you do acknowledge. That was never in
> > > question. But refusing to acknowledge a bug is not the same thing as
> > > fixing a bug. You cannot use objects that do not have a valid
> > > __eq__() (as in, returns boolean True if and only if they are to be
> > > considered equivalent for the purpose of dictionary lookup,
> > > otherwise returns False) as dictionary keys. Your oofun object still
> > > violates this principle. As dictionary keys, you want them to use
> > > their `id` attributes to distinguish them, but their __eq__() method
> > > still just returns another oofun with the default
> > > object.__nonzero__() implementation. This means that bool(some_oofun
> > > == other_oofun) is always True regardless of the `id` attributes.
> > > You have been unfortunate enough to not run into cases where this
> > > causes a problem yet, but the bug is still there, lurking, waiting
> > > for a chance hash collision to silently give you wrong results. That
> > > is the worst kind of bug. -- Robert Kern 
> > I had encountered the bugs with bool(some_oofun == other_oofun) when
> > it was raised from other, than dict, cases, e.g. from "in list" (e.f.
> > "if my_oofun in freeVarsList") etc, and had fixed them all. But that
> > one doesn't occur from "in dict", I traced it with both debugger and
> > putting print("in __eq__"),print("in __le__"), print("in __lt__"),
> > print('in __gt__'),  print('in __ge__') statements. 
> 
> This is all good and nice, but Robert is still right. For dictionaries
> to work predictable you need to ensure two things.
> 
> First:
> 
> if object1 == object2:
> assert bool(hash(object1) == hash(object2))
> 
> and second, which is your case for the dictionary lookup to be
> predictable this must always work:
> 
> keys, values = zip(*dictionary.keys())
> assert(keys.count(object2) == 1)
> 
> index = keys.index(object2)
> value = values[index]

Ah well, so maybe it is not quite as strict, if you know that hashes
can't match (and if they do for another type, equal evaluates to False).
But it would be a bit weird anyway if these give different results.

> 
> And apparently this is not the case and it simply invites bugs which are
> impossible to track down because you will not see them in small tests.
> Instead you will have code that runs great for toy problems and can
> suddenly break down in impossible to understand ways when you have large
> problems.
> 
> So hopefully the list fixes you mention provide that the second code
> block will work as you would expect dictionary[object2] to work, but if
> this is not the case...
> 
> - Sebastian
> 
> > As I had mentioned, removing mapping oofuns as dict keys is mere
> > impossible - this is fundamental thing whole FuncDesigner is build on,
> > as well as its user API. 
> > D. 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases

2013-04-11 Thread Sebastian Berg
On Wed, 2013-03-06 at 11:43 -0700, Charles R Harris wrote:
> Hi All,
> 


> The development branch has been accumulating stuff since last summer,
> I suggest we look to get it out in May, branching at the end of this
> month.

Hey,

maybe it is a bit early, but I was wondering: what are the things that
should be done before branching? Most of the blockers should be
resolved? I think the largest of those are maybe the deprecations (but
since 1.7.0 is not too long ago, many of them maybe don't need touching).
Aside from that, I think these are lurking/in-progress:

Fix the piled up mapping.c things:
* This means resolving https://github.com/numpy/numpy/pull/436 .
  It will be even worse now, but maybe at least most of it can
  be put in without too much work?
* After that, non-integer deprecations should be done fully before
  1.8. I think, but these need to change mapping.c too (the
  most work is probably the tests...):
- https://github.com/numpy/numpy/pull/2891
- https://github.com/numpy/numpy/pull/2825
* (https://github.com/numpy/numpy/pull/2701)
 
Are the 2to3 fixers supposed to be finished for 1.8? If so, there is
probably no point in thinking about branching, etc., since backporting
them all is just pain.

Regards,

Sebastian

> Thoughts?
> 
> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random number generation and testing across different OS's.

2013-04-12 Thread Sebastian Berg
On Fri, 2013-04-12 at 10:50 -0400, Andrew Nelson wrote:
> I have written a differential evolution optimiser that i use for
> curvefitting.  As a genetic optimisation technique it is stochastic and
> relies heavily on random number generators to do the minimisation.  As 
> part
> of the module tests I would like to write a cross-platform test that 
> checks
> if the fitting is being done correctly.
> I use an instance of numpy.random.RandomState for the generation.  If I 
> use
> the seed method on a single platform I get the same output, which I could
> use to write a test.  However, I am unsure of how the seeding and
> RandomState works across platforms.
> If I use the same seed on OSX/Windows/Linux, will I get the same stream of
> random numbers being generated? I need to know if the test I write works
> across platforms.

Hi,

yes, you can be certain of that. NumPy does exactly this for its own
tests as well.
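
A minimal sketch of such a check (my own illustration, not the numpy
test code): the same seed gives the same stream everywhere, so reference
values recorded on one OS stay valid on the others.

import numpy as np

rng1 = np.random.RandomState(12345)
rng2 = np.random.RandomState(12345)

# identical seeds -> identical streams, independent of the platform;
# in a real test you would compare against hard-coded reference values
# recorded once with the chosen seed.
assert np.array_equal(rng1.uniform(size=10), rng2.uniform(size=10))
assert np.array_equal(rng1.standard_normal(5), rng2.standard_normal(5))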

Regards,

sebastian

> regards,
> Andrew.
> 
> --
> _
> Dr. Andrew Nelson
> 
> 
> _
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Non-Integer deprecations

2013-04-12 Thread Sebastian Berg
Hey all,

just revisiting non-integer (index) deprecations (basically
https://github.com/numpy/numpy/pull/2891). I believe for all natural
integer arguments, it is correct to do a deprecation if the input is not
an integer. (Technically most of these go through PyArray_PyIntAsIntp,
and if not, they probably should.)

This affects all axes, shapes, strides, indices (as well as slices) and
(in the future) possibly other integer arguments. Indexing already has a
deprecation warning in master, but I think it should be moved further
down into PyArray_PyIntAsIntp.

Just a couple of minor things to note or I am wondering about:
  1. Does anyone see a problem with rejecting python bools as not
 integers? Python does not do it, but for array indices they
 misbehave and I don't see a real reason to allow them even
 for other integer arguments.
  2. This will mean that 1e4, etc. is not valid, 10**4 must be used.
  3. Deprecate (maybe faster then the rest) that array.__index__()
 works for non 0-d arrays seems right, but makes me a bit wonder
 about __int__ and __float__. (This is unrelated though)
  4. This will affect third party behavior using the public numpy
 conversion functions.

These are long deprecations (except possibly the __index__ one, which I
think already has a bug report as being wrong...), so if any problems
occur there should be time to clear them. This is just a note in case
someone sees any problems with the general plan.

Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] MapIter api

2013-04-15 Thread Sebastian Berg
Hey,

the MapIter API has only been made public in master right? So it is no
problem at all to change at least the mapiter struct, right?

I got annoyed at all those special cases that make things difficult to
get an idea where to put i.e. to fix the boolean array-like stuff. So
actually started rewriting it (and I already got one big function that
does all index preparation -- ok it is untested but its basically
there).

I would guess it is not really a big problem even if it was public for
longer, since you shouldn't do those direct struct access probably? But
just checking.

Since I got the test which mimics complex indexes in the tests, I thinks
it should actually be feasible to do bigger refactoring there without
having to worry too much about breaking things.

Regards,

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] MapIter api

2013-04-15 Thread Sebastian Berg
On Mon, 2013-04-15 at 11:16 -0600, Charles R Harris wrote:
> 
> 
> On Mon, Apr 15, 2013 at 10:29 AM, Sebastian Berg
>  wrote:
> Hey,
> 
> the MapIter API has only been made public in master right? So
> it is no
> problem at all to change at least the mapiter struct, right?
> 
> I got annoyed at all those special cases that make things
> difficult to
> get an idea where to put i.e. to fix the boolean array-like
> stuff. So
> actually started rewriting it (and I already got one big
> function that
> does all index preparation -- ok it is untested but its
> basically
> there).
> 
> I would guess it is not really a big problem even if it was
> public for
> longer, since you shouldn't do those direct struct access
> probably? But
> just checking.
> 
> Since I got the test which mimics complex indexes in the
> tests, I thinks
> it should actually be feasible to do bigger refactoring there
> without
> having to worry too much about breaking things.
> 
> 
> Looks like the public API went in last August but didn't make it into
> the 1.7.x release. What sort of schedule are you looking at?
> 
Not sure about a schedule, I somewhat think it is not even that hard,
but of course it would still take a while (once I get a bit further, I
will put it out there, hopefully someone else will be interested to
help), but certainly not aiming to get anything done for 1.8.

My first idea was to just do the parsing differently and keep the
mapiter part identical (or with minor modifications). That seems
actually impractical, since MapIter has a lot of stuff that it does not
need. Plus it seems to me that it might be worth it to use the new
nditer. One could try keep the fields somewhat identical (likely
identical enough to be binary compatible with that ufunc.at pull request
even), but I am not even sure that that is something to aim for, since
the ufunc.at could be modified too (and might get good speed
improvements out of that).

- Sebastian


> Chuck 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] MapIter api

2013-04-16 Thread Sebastian Berg
On Mon, 2013-04-15 at 13:36 -0600, Charles R Harris wrote:
> 
> 
> On Mon, Apr 15, 2013 at 1:27 PM, Sebastian Berg
>  wrote:
> On Mon, 2013-04-15 at 11:16 -0600, Charles R Harris wrote:
> >
> >
> > On Mon, Apr 15, 2013 at 10:29 AM, Sebastian Berg
> >  wrote:
> > Hey,
> >
> > the MapIter API has only been made public in master
> right?



> > Looks like the public API went in last August but didn't
> make it into
> > the 1.7.x release. What sort of schedule are you looking at?
> >
> 
> Not sure about a schedule, I somewhat think it is not even
> that hard,
> but of course it would still take a while (once I get a bit
> further, I
> will put it out there, hopefully someone else will be
> interested to
> help), but certainly not aiming to get anything done for 1.8.
> 
> My first idea was to just do the parsing differently and keep
> the
> mapiter part identical (or with minor modifications). That
> seems
> actually impractical, since MapIter has a lot of stuff that it
> does not
> need. Plus it seems to me that it might be worth it to use the
> new
> nditer. One could try keep the fields somewhat identical
> (likely
> identical enough to be binary compatible with that ufunc.at
> pull request
> even), but I am not even sure that that is something to aim
> for, since
> the ufunc.at could be modified too (and might get good speed
> improvements out of that).
> 
> - Sebastian
> 
> 
> Makes me wonder if we should expose the API in 1.8 if you are thinking
> a change might be appropriate. Or am I missing something here?
> 
Yeah, I am wondering about that. But since I am not clear on exactly if
and how one would reimplement it right now (certainly it would look very
similar in the basic design), there is a bit of time before deciding
that. And maybe someone else has an opinion one way or another? 

For example the MapIter currently does not expose the subspace as a
separate iterator. You could access it, but you cannot optimize subspace
iteration by handling it separately. I am thinking about something like
the np.nestediters, but the user would maybe have to check if the inner
iterator even exists (the subspace can easily be 0-d or have only one
element).
Also, I could imagine tagging a second array onto the fancy index
iteration itself. That would be iterated together with the fancy indexes
in one nditer and return pointers into its own subspace. That array
would be the value array in an assignment, or the new array in a
subscription.

- Sebastian

> Chuck 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Finding the same value in List

2013-04-17 Thread Sebastian Berg
On Wed, 2013-04-17 at 13:32 +0400, Happyman wrote:
> Hi Todd,
> Greaaat thanks for your help.. By the way, the first one (I think) is
> much simpler.. I tested it and ,of course, it is 1D, but it is also a
> good idea to consider it for Ndimensional.
> I prefer the first one! Do you you think first version is okay to
> use? 

If you are only interested in the count, using np.bincount should be
much faster than the list comprehension with "==". Of course that gives
you a count of zero for all indexes that do not exist. But even then I
very much expect that filtering those out afterwards will be faster,
unless your "indexes" can be arbitrarily large. Of course bincount loses
the order information, so if you need that, you can only replace the
second step with it.
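
A small made-up example of what I mean:

import numpy as np

y = np.array([3, 1, 3, 0, 1, 3])

counts = np.bincount(y)          # counts[v] == number of occurrences of v
values = np.nonzero(counts)[0]   # drop the values that never occur

print(values)          # [0 1 3]
print(counts[values])  # [1 2 3]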

- Sebastian
> 
> 
> Wednesday, 17 April 2013, 11:02 +02:00 from Todd :
> On Wed, Apr 17, 2013 at 10:46 AM, Todd 
> wrote:
> x,i=numpy.unique(y, return_inverse=True)
> 
> f=[numpy.where(i==ind) for ind in range(len(x))]
> 
> 
> 
> 
> 
> 
> 
> A better version would be (np.where returns tuples, but we
> don't want tuples):
> 
> x,i=numpy.unique(y, return_inverse=True)
> f=[numpy.where(i==ind)[0] for ind in range(len(x))]
> 
> You can also do it this way, but it is much harder to read
> IMO:
> 
> x=numpy.unique(y)
> f=numpy.split(numpy.argsort(y),
> numpy.nonzero(numpy.diff(numpy.sort(y)))[0]+1)
> 
> 
> This version figures out the indexes needed to put the values
> of y in sorted order (the same order x uses), then splits it
> into sub-arrays based on value.  The principle is simpler but
> the implementation looks like clear to me.
> 
> 
> Note that these are only guaranteed to work on 1D arrays, I
> have not tested them on multidimensional arrays
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] datetime64 1970 issue

2013-04-17 Thread Sebastian Berg
On Wed, 2013-04-17 at 09:07 -0700, Chris Barker - NOAA Federal wrote:
> On Wed, Apr 17, 2013 at 9:04 AM, Chris Barker - NOAA Federal
>  wrote:
> > On Tue, Apr 16, 2013 at 8:23 PM, Zachary Ploskey  wrote:
> > I'd say we need some more unit-tests!
> 
> speaking of which, where are the tests? I just did a quick poke at
> github, and found:
> 
> https://github.com/numpy/numpy/tree/master/numpy/testing
> and
> https://github.com/numpy/numpy/tree/master/numpy/test
> 
> but there's very little in there.
> 

The datetime is implemented in the core, so the tests are here:

https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_datetime.py

- Sebastian

> -Chris
> 
> 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Index Parsing redo

2013-04-18 Thread Sebastian Berg
Hey,

so I ignored trying to redo MapIter (likely it is lobotomized at this
time though), but actually got a working new index parsing (it still
needs cleanup, etc.). Also, some of the fast paths are not yet put back.
For most pure integer indices it got a bit slower; if it turns out to be
too much, one could re-add such special cases I guess...

If anyone wants to take a look, it is here:

https://github.com/seberg/numpy/compare/rewrite-index-parser

The tests run through fine with the exception of the matrix item setting
[1] and changes in errors. Certainly many errors still need to be made
indexing specific.

If anyone is interested in messing with it, give me a ping for direct
access. Polishing such a thing up (if deemed right) is a lot of work and
I am not sure I will find the time soon.

Regards,

Sebastian

[1] That is because subclass item setting (for non-fancy, non-scalar
result and non-single integer indices) is really

tmp = subclass.view(np.ndarray)
tmp[index] = values

which needs to be put back in. The whole matrix indexing business
probably needs some extra functionality in the core to get right I
guess.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy scalars and savez -- bug?

2013-04-19 Thread Sebastian Berg
On Fri, 2013-04-19 at 08:03 -0700, Chris Barker - NOAA Federal wrote:
> On Apr 18, 2013, at 11:33 PM, Nathaniel Smith  wrote:
> 
> 
> 
> > On 18 Apr 2013 01:29, "Chris Barker - NOAA Federal"
> >  wrote:
> > > This has been annoying, particular as rank-zero scalars are kind
> > of a pain.
> > 
> > BTW, while we're on the topic, can you elaborate on this? I tend to
> > think scalars (as opposed to 0d ndarrays) are kind of a pain, so I'm
> > curious if you have specific issues you've run into with 0d
> > ndarrays.
> > 
> > 
> Well, I suppose what's really a pain is that we have both, and they
> are not the same, and neither can be used in all cases one may want.
> 
> 
> In the case at hand, I really wanted a datetime64 scalar. By saving
> and re-loading in an npz, it got converted to a rank-zero array, which
> had different behavior. In this case, the frustrating bit was how to
> extract a scalar again ( which I really wanted to turn into a datetime
> object).
> 
> 
> After the fact, I discovered .item(), which seems to do what I want.
> 
Fun fact: array[()] will convert a 0-d array to a scalar, but do nothing
(or currently create a view) for other arrays. Which is actually a good
question: should array[()] force a view or not?
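
For example (my own quick illustration):

import numpy as np

d0 = np.asarray(np.datetime64('2013-04-19'))   # a 0-d datetime64 array

print(repr(d0[()]))     # numpy.datetime64('2013-04-19')  -- a scalar again
print(repr(d0.item()))  # datetime.date(2013, 4, 19)  -- a plain Python object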

- Sebastian
> 
> On a phone now, so sorry about the lack of examples.
> 
> 
> Note: I've lost track of why we need both scalers and rank-zero
> arrays. I can't help thinking that there could be an object that acts
> like a scalar in most contexts, but also has the array methods that
> make sense.
> 
> 
> But I know it's far from simple.
> 
> 
> -Chris 
> 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

