Re: [Numpy-discussion] PyData Barcelona this May

2017-03-09 Thread Sebastian Berg
On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
> There will be a PyData conference in Barcelona this May:
> 
> http://pydata.org/barcelona2017/
> 
> I am planning on attending, and was thinking of maybe proposing to
> organize a numpy-themed workshop or tutorial.
> 
> My personal inclination would be to look at some advanced topic that
> I know well, like writing gufuncs in Cython, but wouldn't mind doing
> a more run of the mill thing. Anyone has any thoughts or experiences
> on what has worked well in similar situations? Any specific topic you
> always wanted to attend a workshop on, but were afraid to ask?
> 
> Alternatively, or on top of the workshop, I could propose to do a
> talk: talking last year at PyData Madrid about the new indexing was a
> lot of fun! Thing is, I have been quite disconnected from the project
> this past year, and can't really think of any worthwhile topic. Is
> there any message that we as a project would like to get out to the
> larger community?
> 

Francesc already pointed out the temporary optimization. From what I
remember, my personal highlight would probably be Pauli's work on the
memory overlap detection. Though both are rather passive improvements I
guess (you don't really have to learn them to use them), it's very
cool! And if it's about highlighting new stuff, these can probably
easily fill a talk.
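As a quick sketch of what the overlap machinery exposes to users (assuming numpy >= 1.11, where `np.shares_memory` was added; the ufunc-level handling itself is automatic):

```python
import numpy as np

a = np.arange(10)
# exact check: do these two views touch any common bytes?
print(np.shares_memory(a[:-1], a[1:]))    # True - overlapping slices
print(np.shares_memory(a[::2], a[1::2]))  # False - interleaved but disjoint
```

With the default `max_work=None` the check is exact, which is the hard part Pauli's work solves.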

> And if you are planning on attending, please give me a shout.
> 

Barcelona :). Maybe I should think about it, but probably not.


> Thanks,
> 
> Jaime
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him
> with his plans for world domination.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] vectorization vs. numpy.linalg (shape (3, 3, 777) vs shape (777, 3, 3))

2017-03-05 Thread Sebastian Berg
On Sun, 2017-03-05 at 15:46 +, Nico Schlömer wrote:
> > I am honestly not sure where you are going with this. This order seems
> the more natural order for most operations. 
> 
> Not sure what you mean by "natural". The most basic operations
> like `a[0] + a[1]` are several times faster than `a[...,0] + a[...,
> 1]`. Perhaps one can come up with code examples that don't use those
> operations at all, but I would guess that'll be a rare case. My
> point: If your code uses vector operations (as `+`) _and_
> numpy.linalg functions, your vector operations will be several times
> slower than necessary. One way around this would perhaps be to
> initialize all your data in Fortran-style data layout, where
> `a[...,0] + a[..., 1]` is indeed faster than `a[0] + a[1]`.
> 
> What I'll end up doing is probably to rewrite whatever numpy.linalg
> function I need for C-style ordering. For dets of 2x2 or 3x3
> matrices, this shouldn't be too hard (`a[0][0]*a[1][1] -
> a[0][1]*a[1][0]`). :)

Well, linalg functions are (stacked) low-level calls into LAPACK.

Now, for some functions (not simple math), it is possible that if you
have chosen a non-C-order memory layout, numpy has to jump through
hoops to do your operations, or it iterates in C-order anyway.

And yes, this is also the case for most of linalg, since most of them
call into highly optimized code and algorithms defined for a single
array.

In that case manual solutions like the one you suggested may be
faster. Although I would suggest being very careful with them, since
there may be subtleties one can miss, and they probably make the code
harder to maintain and more error-prone.
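For illustration, a hedged sketch of such a manual solution for stacked 2x2 determinants with the stack axis last (C order), cross-checked against `numpy.linalg.det`; the shapes and `np.moveaxis` call are mine, not from the thread:

```python
import numpy as np

# stack axis last (C order): each a[i, j] is a contiguous vector of 777 values
a = np.random.rand(2, 2, 777)

# manual determinant of the stacked 2x2 matrices, vectorized over the last axis
det_manual = a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]

# reference: numpy.linalg wants the stack axis up front
det_ref = np.linalg.det(np.moveaxis(a, -1, 0))
assert np.allclose(det_manual, det_ref)
```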

In general numpy is smart about memory/iteration order in most cases.
If you work on stacked arrays (and not on single slices of them as in
your examples), the memory order often has little to no effect on the
operation speed: e.g. while `a[0] + a[1]` is faster in your examples,
whether you do `a.sum(0)` or `a.sum(-1)` typically makes little
difference.

Of course there are some cases where you can out-smart numpy, and of
course in general choosing the right memory order can have a huge
impact on your performance. But again, trying to out-smart it also
comes at a cost, and I would be very reluctant to do it; typically
there are lower-hanging fruit to get first.

- Sebastian


> 
> Cheers,
> Nico
> 
> 
> On Sun, Mar 5, 2017 at 3:53 PM Sebastian Berg <sebastian@sipsolutions
> .net> wrote:
> > On Thu, 2017-03-02 at 10:27 +, Nico Schlömer wrote:
> > > Hi everyone,
> > >
> > > When trying to speed up my code, I noticed that simply by
> > reordering
> > > my data I could get more than twice as fast for the simplest
> > > operations:
> > > ```
> > > import numpy
> > > a = numpy.random.rand(50, 50, 50)
> > >
> > > %timeit a[0] + a[1]
> > > 100 loops, best of 3: 1.7 µs per loop
> > >
> > > %timeit a[:, 0] + a[:, 1]
> > > 10 loops, best of 3: 4.42 µs per loop
> > >
> > > %timeit a[..., 0] + a[..., 1]
> > > 10 loops, best of 3: 5.99 µs per loop
> > > ```
> > > This makes sense considering the fact that, by default, NumPy
> > > features C-style memory allocation: the last index runs fastest.
> > The
> > > blocks that are added with `a[0] + a[1]` are contiguous in
> > memory, so
> > > cache is nicely made use of. As opposed to that, the blocks that
> > are
> > > added with `a[:, 0] + a[:, 1]` are not contiguous, even more so
> > with
> > > `a[..., 0] + a[..., 1]`; hence the slowdown. Would that be the
> > > correct explanation?
> > >
> > > If yes, I'm wondering why most numpy.linalg methods, when
> > vectorized,
> > > put the vector index up front. E.g., to mass-compute
> > determinants,
> > > one has to do
> > > ```
> > > a = numpy.random.rand(777, 3, 3)
> > > numpy.linalg.det(a)
> > > ```
> > > This way, all 3x3 matrices form a memory block, so if you do
> > `det`
> > > block by block, that'll be fine. However, vectorized operations
> > (like
> > > `+` above) will be slower than necessary.
> > > Any background on this? 
> > >
> > 
> > I am honestly not sure where you are going with this. This order seems the
> > more natural order for most operations. Also numpy does not create
> > copies normally even for transposed data (though some functions may
> > internally, numpy just _defaults_ to C-order). So of course what is
> > faster depends on your use-case, but if you have an operation on
> > many
> > 3x3 arrays the way numpy does it is the more natural way.

Re: [Numpy-discussion] vectorization vs. numpy.linalg (shape (3, 3, 777) vs shape (777, 3, 3))

2017-03-05 Thread Sebastian Berg
On Thu, 2017-03-02 at 10:27 +, Nico Schlömer wrote:
> Hi everyone,
> 
> When trying to speed up my code, I noticed that simply by reordering
> my data I could get more than twice as fast for the simplest
> operations:
> ```
> import numpy
> a = numpy.random.rand(50, 50, 50)
> 
> %timeit a[0] + a[1]
> 100 loops, best of 3: 1.7 µs per loop
> 
> %timeit a[:, 0] + a[:, 1]
> 10 loops, best of 3: 4.42 µs per loop
> 
> %timeit a[..., 0] + a[..., 1]
> 10 loops, best of 3: 5.99 µs per loop
> ```
> This makes sense considering the fact that, by default, NumPy
> features C-style memory allocation: the last index runs fastest. The
> blocks that are added with `a[0] + a[1]` are contiguous in memory, so
> cache is nicely made use of. As opposed to that, the blocks that are
> added with `a[:, 0] + a[:, 1]` are not contiguous, even more so with
> `a[..., 0] + a[..., 1]`; hence the slowdown. Would that be the
> correct explanation?
> 
> If yes, I'm wondering why most numpy.linalg methods, when vectorized,
> put the vector index up front. E.g., to mass-compute determinants,
> one has to do
> ```
> a = numpy.random.rand(777, 3, 3)
> numpy.linalg.det(a)
> ```
> This way, all 3x3 matrices form a memory block, so if you do `det`
> block by block, that'll be fine. However, vectorized operations (like
> `+` above) will be slower than necessary.
> Any background on this? 
> 

I am honestly not sure where you are going with this. This order seems
the more natural order for most operations. Also, numpy does not
normally create copies, even for transposed data (though some functions
may internally; numpy just _defaults_ to C-order). So of course what is
faster depends on your use-case, but if you have an operation on many
3x3 arrays, the way numpy does it is the more natural way. If you do
other things that are faster the other way around, you will have to
decide which operation is the more important one overall.
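To illustrate the "no copies for transposed data" point, a small sketch (the shapes follow the example in the thread):

```python
import numpy as np

a = np.random.rand(777, 3, 3)   # linalg-friendly: stack axis first
b = a.transpose(1, 2, 0)        # shape (3, 3, 777) - no data is copied
assert b.base is a              # just a view with different strides
assert not b.flags['C_CONTIGUOUS']
```

So switching between the two layouts for different parts of a computation costs only the creation of a view, not a copy (although subsequent operations may still run at C-order speed or not, as discussed above).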

- Sebastian

> (I came across this when having to rearrange my data (swapaxes,
> rollaxis) from shape (3, 3, 777) (which allows for fast vectorized
> operations in the rest of the code) to (777, 3, 3) for using numpy's
> svd.)
> 
> Cheers,
> Nico


Re: [Numpy-discussion] Numpy Overhead

2017-02-28 Thread Sebastian K
Thank you! That is the information I needed.

2017-03-01 0:18 GMT+01:00 Matthew Brett <matthew.br...@gmail.com>:

> Hi,
>
> On Tue, Feb 28, 2017 at 3:04 PM, Sebastian K
> <sebastiankas...@googlemail.com> wrote:
> > Yes you are right. There is no need to add that line. I deleted it. But
> the
> > measured heap peak is still the same.
>
> You're applying the naive matrix multiplication algorithm, which is
> ideal for minimizing memory use during the computation, but terrible
> for speed-related stuff like keeping values in the CPU cache:
>
> https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
>
> The Numpy version is likely calling into a highly optimized compiled
> routine for matrix multiplication, which can load chunks of the
> matrices at a time, to speed up computation.   If you really need
> minimum memory heap usage and don't care about the order of
> magnitude(s) slowdown, then you might need to use the naive method,
> maybe implemented in Cython / C.
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] Numpy Overhead

2017-02-28 Thread Sebastian K
Yes you are right. There is no need to add that line. I deleted it. But the
measured heap peak is still the same.

2017-03-01 0:00 GMT+01:00 Joseph Fox-Rabinovitz <jfoxrabinov...@gmail.com>:

> For one thing, `C = np.empty(shape=(n,n), dtype='float64')` allocates 10^4
> extra elements before being immediately discarded.
>
> -Joe
>
> On Tue, Feb 28, 2017 at 5:57 PM, Sebastian K <sebastiankaster@googlemail.
> com> wrote:
>
>> Yes it is true the execution time is much faster with the numpy function.
>>
>>  The Code for numpy version:
>>
>> def createMatrix(n):
>>     Matrix = np.empty(shape=(n,n), dtype='float64')
>>     for x in range(n):
>>         for y in range(n):
>>             Matrix[x, y] = 0.1 + ((x*y)%1000)/1000.0
>>     return Matrix
>>
>>
>> if __name__ == '__main__':
>>     n = getDimension()
>>     if n > 0:
>>         A = createMatrix(n)
>>         B = createMatrix(n)
>>         C = np.empty(shape=(n,n), dtype='float64')
>>         C = np.dot(A,B)
>>
>>         #print(C)
>>
>> In the pure python version I am just implementing the multiplication with
>> three for-loops.
>>
>> Measured data with libmemusage:
>> dimension of matrix: 100x100
>> heap peak pure python3: 1060565
>> heap peak numpy function: 4917180
>>
>>
>> 2017-02-28 23:17 GMT+01:00 Matthew Brett <matthew.br...@gmail.com>:
>>
>>> Hi,
>>>
>>> On Tue, Feb 28, 2017 at 2:12 PM, Sebastian K
>>> <sebastiankas...@googlemail.com> wrote:
>>> > Thank you for your answer.
>>> > For example a very simple algorithm is a matrix multiplication. I can
>>> see
>>> > that the heap peak is much higher for the numpy version in comparison
>>> to a
>>> > pure python 3 implementation.
>>> > The heap is measured with the libmemusage from libc:
>>> >
>>> >   heap peak
>>> >   Maximum of all size arguments of malloc(3), all
>>> products
>>> >   of nmemb*size of calloc(3), all size arguments of
>>> >   realloc(3), length arguments of mmap(2), and new_size
>>> >   arguments of mremap(2).
>>>
>>> Could you post the exact code you're comparing?
>>>
>>> I think you'll find that a naive Python 3 matrix multiplication method
>>> is much, much slower than the same thing with Numpy, with arrays of
>>> any reasonable size.
>>>
>>> Cheers,
>>>
>>> Matthew


Re: [Numpy-discussion] Numpy Overhead

2017-02-28 Thread Sebastian K
Yes it is true the execution time is much faster with the numpy function.

 The Code for numpy version:

def createMatrix(n):
    Matrix = np.empty(shape=(n,n), dtype='float64')
    for x in range(n):
        for y in range(n):
            Matrix[x, y] = 0.1 + ((x*y)%1000)/1000.0
    return Matrix


if __name__ == '__main__':
    n = getDimension()
    if n > 0:
        A = createMatrix(n)
        B = createMatrix(n)
        C = np.empty(shape=(n,n), dtype='float64')
        C = np.dot(A,B)

        #print(C)

In the pure python version I am just implementing the multiplication with
three for-loops.

Measured data with libmemusage:
dimension of matrix: 100x100
heap peak pure python3: 1060565
heap peak numpy function: 4917180
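As a hedged back-of-envelope check on these numbers: the three float64 matrices themselves only account for roughly 240 kB, so most of the ~4.9 MB numpy heap peak is presumably the fixed cost of loading numpy and its compiled BLAS routine, independent of the matrix size:

```python
import numpy as np

n = 100
A = np.empty((n, n), dtype='float64')
per_matrix = A.nbytes        # n * n * 8 bytes of raw data
print(per_matrix)            # 80000
print(3 * per_matrix)        # 240000 - A, B and the np.dot result together
```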


2017-02-28 23:17 GMT+01:00 Matthew Brett <matthew.br...@gmail.com>:

> Hi,
>
> On Tue, Feb 28, 2017 at 2:12 PM, Sebastian K
> <sebastiankas...@googlemail.com> wrote:
> > Thank you for your answer.
> > For example a very simple algorithm is a matrix multiplication. I can see
> > that the heap peak is much higher for the numpy version in comparison to
> a
> > pure python 3 implementation.
> > The heap is measured with the libmemusage from libc:
> >
> >   heap peak
> >   Maximum of all size arguments of malloc(3), all
> products
> >   of nmemb*size of calloc(3), all size arguments of
> >   realloc(3), length arguments of mmap(2), and new_size
> >   arguments of mremap(2).
>
> Could you post the exact code you're comparing?
>
> I think you'll find that a naive Python 3 matrix multiplication method
> is much, much slower than the same thing with Numpy, with arrays of
> any reasonable size.
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] Numpy Overhead

2017-02-28 Thread Sebastian K
Thank you for your answer.
For example a very simple algorithm is a matrix multiplication. I can see
that the heap peak is much higher for the numpy version in comparison to a
pure python 3 implementation.
The heap is measured with libmemusage from libc:


  heap peak
  Maximum of all size arguments of malloc(3), all products
  of nmemb*size of calloc(3), all size arguments of
  realloc(3), length arguments of mmap(2), and new_size
  arguments of mremap(2).

Regards

Sebastian


On 28 Feb 2017 11:03 p.m., "Benjamin Root" <ben.v.r...@gmail.com> wrote:

> You are going to need to provide much more context than that. Overhead
> compared to what? And where (io, cpu, etc.)? What are the size of your
> arrays, and what sort of operations are you doing? Finally, how much
> overhead are you seeing?
>
> There can be all sorts of reasons for overhead, and some can easily be
> mitigated, and others not so much.
>
> Cheers!
> Ben Root
>
>
> On Tue, Feb 28, 2017 at 4:47 PM, Sebastian K <
> sebastiankas...@googlemail.com> wrote:
>
>> Hello everyone,
>>
>> I'm interested in the numpy project and have experimented a lot with
>> numpy arrays. I'm wondering what is actually done that causes so much
>> overhead when I call a function in NumPy. What is the reason?
>> Thanks in advance.
>>
>> Regards
>>
>> Sebastian Kaster
>>


[Numpy-discussion] Numpy Overhead

2017-02-28 Thread Sebastian K
Hello everyone,

I'm interested in the numpy project and have experimented a lot with
numpy arrays. I'm wondering what is actually done that causes so much
overhead when I call a function in NumPy. What is the reason?
Thanks in advance.

Regards

Sebastian Kaster


Re: [Numpy-discussion] Default type for functions that accumulate integers

2017-01-03 Thread Sebastian Berg
On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote:
> On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
> > 
> > Hi All,
> > 
> > Currently functions like trace use the C long type as the default
> > accumulator for integer types of lesser precision:
> > 



> 
> Things we'd need to know more about before making a decision:
> - compatibility: if we flip this switch, how much code breaks? In
> general correct numpy-using code has to be prepared to handle
> np.dtype(int) being 64-bits, and in fact there might be more code
> that
> accidentally assumes that np.dtype(int) is always 64-bits than there
> is code that assumes it is always 32-bits. But that's theory; to know
> how bad this is we would need to try actually running some projects
> test suites and see whether they break or not.
> - speed: there's probably some cost to using 64-bit integers on 32-
> bit
> systems; how big is the penalty in practice?
> 

I agree with trying to switch the default in general first; I don't
like the idea of having two different "defaults".

There are two issues: one is the change on Python 2 (no inheritance of
Python int by the default numpy type anymore), the other is any issue
due to the increased precision (more RAM usage, code that actually
expects the lower precision somehow, etc.).
I cannot say I know for sure, but I would be extremely surprised if
there were a speed difference between 32-bit and 64-bit architectures,
except for the general slowdown you get due to bus speeds, etc., when
going to a higher bit width.

If the inheritance is a bigger issue for some reason, we might limit
the change to Python 3. For other possible problems, I think we may
have difficulties assessing how much is affected. The problem is that
the most affected things are probably projects only being used on
Windows, or similar. Bigger projects should work fine already (they are
more likely to improve, since they are not tested as well on platforms
with a 32-bit long, especially 64-bit Windows).

Of course limiting the change to Python 3 could have the advantage of
not affecting older projects, which are possibly more likely to be
relying specifically on the current behaviour.

So, I would be open to trying the change. I think the idea of at least
changing it in Python 3 has been brought up a couple of times,
including by Julian, so maybe it is time to give it a shot.

It would be interesting to see if anyone knows of projects that may be
affected (for example because they are designed to only run on Windows
or on limited hardware), and whether avoiding any change on Python 2
might mitigate problems here as well (in addition to avoiding the
inheritance change).
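A quick way to see the behaviour under discussion on a given platform (a sketch; the input dtype is illustrative, and the exact result dtype is precisely the platform-dependent point of this thread):

```python
import numpy as np

# the default accumulator for lesser-precision ints mirrors the C long:
# 32-bit on Windows (even 64-bit Windows), 64-bit on most other 64-bit systems
a = np.ones(5, dtype=np.int8)
print(a.sum().dtype)   # platform dependent; typically int64 on 64-bit Linux
```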

Best,

Sebastian


> -n
> 



Re: [Numpy-discussion] Casting to np.byte before clearing values

2016-12-26 Thread Sebastian Berg
On Mo, 2016-12-26 at 10:34 +0100, Nicolas P. Rougier wrote:
> Hi all,
> 
> 
> I'm trying to understand why viewing an array as bytes before
> clearing makes the whole operation faster.
> I imagine there is some kind of special treatment for byte arrays but
> I've no clue. 
> 

Sure, if it's a 1-byte-wide type, the code will end up calling
`memset`. If it is not, it will end up running a loop like:

while (N > 0) {
    *dst = output;
    dst += 8;  /* or whatever the element size/stride is */
    --N;
}

Now, why this gives such a difference I don't really know, but I guess
it is not too surprising, and it may depend on other things as well.
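A hedged sketch of why the byte-view trick is safe here at all: zeroing through a 1-byte view produces the same array contents because, in IEEE 754, the all-zero byte pattern is exactly 0.0:

```python
import numpy as np

z = np.ones(1000, dtype=np.float64)
z.view(np.int8)[...] = 0   # 1-byte itemsize: the assignment becomes a memset
# relies on IEEE 754: all-zero bytes represent exactly 0.0
assert (z == 0.0).all()
```

Note this equivalence only holds for filling with zero; writing any other byte value through the view would not produce a meaningful float.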

- Sebastian


> 
> # Native float
> Z_float = np.ones(100, float)
> Z_int   = np.ones(100, int)
> 
> %timeit Z_float[...] = 0
> 1000 loops, best of 3: 361 µs per loop
> 
> %timeit Z_int[...] = 0
> 1000 loops, best of 3: 366 µs per loop
> 
> %timeit Z_float.view(np.byte)[...] = 0
> 1000 loops, best of 3: 267 µs per loop
> 
> %timeit Z_int.view(np.byte)[...] = 0
> 1000 loops, best of 3: 266 µs per loop
> 
> 
> Nicolas


Re: [Numpy-discussion] Does numpy.bincount support numpy.float128 type weights?

2016-11-30 Thread Sebastian Berg
First off, a word of caution: float128 depends on your system and maps
to whatever longdouble is (IIRC), or may not even exist. So I hope you
don't expect IEEE 128-bit floats; if you are unsure, maybe check
`np.finfo`.

If speed does not matter:
```
res = np.zeros(np.max(b) + 1, dtype=np.longdouble)
np.add.at(res, b, a)
```
will work (note the `+ 1`, matching bincount's output length), but do
not expect it to be fast.
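A quick sanity check of this workaround against `np.bincount` itself, done in float64 where bincount is supported (the arrays follow the example from the original mail):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0], dtype=np.longdouble)
b = np.array([1, 2, 0])

res = np.zeros(np.max(b) + 1, dtype=np.longdouble)  # bincount-style length
np.add.at(res, b, a)                                # unbuffered accumulate

# cross-check against bincount in plain float64
expected = np.bincount(b, weights=a.astype(np.float64))
assert np.allclose(res.astype(np.float64), expected)
```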

- Sebastian


On Do, 2016-12-01 at 05:54 +0800, Wei, Huayi wrote:
> Hi, There,
> 
> Here is a sample code using `numpy.bincount`
> 
>  import numpy as np
>  a = np.array([1.0, 2.0, 3.0], dtype=np.float128)
>  b = np.array([1, 2, 0], dtype=np.int)
>  c = np.bincount(b, weights=a)
> 
> If run it, I get the following error report:
> 
>  > 1 c = np.bincount(b, weights=a)
>  TypeError: Cannot cast array data from dtype('float128') to 
> dtype('float64') according to the rule 'safe'
> 
> Is it a bug of `np.bincount`? Does there exist any similar function 
> which I can use to do the similar thing with numpy.float128 type
> weights?
> 
> Best
> 
> Huayi
> 
> 
> 


Re: [Numpy-discussion] Numpy 1.12.x branched

2016-11-06 Thread Sebastian Berg
On Sa, 2016-11-05 at 17:04 -0600, Charles R Harris wrote:
> Hi All,
> 
> Numpy 1.12.x has been branched and the 1.13 development branch is
> open. It would be helpful if folks could review the release notes as
> it is likely I've missed something.  I'd like to make the first beta
> release in a couple of days.
> 

Very cool, thanks for all the hard work!

- Sebastian


> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Sebastian Berg
On Fr, 2016-11-04 at 15:42 -0400, Matthew Harrigan wrote:
> I didn't notice identity before.  Seems like frompyfunc always sets
> it to None.  If it were zero maybe it would work as desired here.
> 
> In the writing your own ufunc doc, I was wondering if the pointer to
> data could be used to get a constant at runtime.  If not, what could
> that be used for?
> static void double_logit(char **args, npy_intp *dimensions,
> npy_intp* steps, void* data)
> Why would the numerical accuracy be any different?  The subtraction
> and square operations look identical and I thought np.sum just calls
> np.add.reduce, so the reduction step uses the same code and would
> therefore have the same accuracy.
> 

Sorry, I did not read it carefully. I guess `c` is the mean, so you
are doing the two-pass method.
- Sebastian


> Thanks
> 
> On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg <sebastian@sipsolution
> s.net> wrote:
> > On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > > I was reading this and got thinking about if a ufunc could
> > compute
> > > the sum of squared differences in a single pass without a
> > temporary
> > > array.  The python code below demonstrates a possible approach.
> > >
> > > import numpy as np
> > > x = np.arange(10)
> > > c = 1.0
> > > def add_square_diff(x1, x2):
> > >     return x1 + (x2-c)**2
> > > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > > print(np.sum(np.square(x-c)))
> > >
> > > I have (at least) 4 questions:
> > > 1. Is it possible to pass run time constants to a ufunc written
> > in C
> > > for use in its inner loop, and if so how?
> > 
> > I don't think it's anticipated, since a ufunc could in most cases
> > use a
> > third argument, but a 3-arg ufunc can't be reduced. Not sure if
> > there
> > might be some trickery possible.
> > 
> > > 2. Is it possible to pass an initial value to reduce to avoid the
> > > clean up required for the first element?
> > 
> > This is the identity normally. But the identity can only be 0, 1 or
> > -1
> > right now I think. The identity is what the output array gets
> > initialized with (which effectively makes it the first value passed
> > into the inner loop).
> > 
> > > 3. Does that ufunc work, or are there special cases which cause
> > it to
> > > fall apart?
> > > 4. Would a very specialized ufunc such as this be considered for
> > > incorporating in numpy since it would help reduce time and memory
> > of
> > > functions already in numpy?
> > >
> > 
> > I might be mixing things up; however, IIRC the single-pass approach
> > has
> > bad numerical accuracy, so I doubt that it is a good default
> > algorithm.
> > 
> > - Sebastian
> > 
> > 
> > > Thank you,
> > > Matt


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Sebastian Berg
On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> I was reading this and got thinking about if a ufunc could compute
> the sum of squared differences in a single pass without a temporary
> array.  The python code below demonstrates a possible approach.
> 
> import numpy as np
> x = np.arange(10)
> c = 1.0
> def add_square_diff(x1, x2):
>     return x1 + (x2-c)**2
> ufunc = np.frompyfunc(add_square_diff, 2, 1)
> print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> print(np.sum(np.square(x-c)))
> 
> I have (at least) 4 questions:
> 1. Is it possible to pass run time constants to a ufunc written in C
> for use in its inner loop, and if so how?

I don't think it's anticipated, since a ufunc could in most cases use a
third argument, but a 3-arg ufunc can't be reduced. Not sure if there
might be some trickery possible.

> 2. Is it possible to pass an initial value to reduce to avoid the
> clean up required for the first element?

This is normally the identity. But the identity can only be 0, 1 or -1
right now, I think. The identity is what the output array gets
initialized with (which effectively makes it the first value passed
into the inner loop).

> 3. Does that ufunc work, or are there special cases which cause it to
> fall apart?
> 4. Would a very specialized ufunc such as this be considered for
> incorporating in numpy since it would help reduce time and memory of
> functions already in numpy?
> 

I might be mixing things up; however, IIRC the single-pass approach has
bad numerical accuracy, so I doubt that it is a good default
algorithm.
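To illustrate the accuracy concern, a hedged sketch comparing the textbook single-pass formula (sum of squares minus n times the squared mean - not Matthew's exact ufunc, which precomputes the mean) with the two-pass method; the data, with a large offset and small spread, is chosen to trigger cancellation:

```python
import numpy as np

rng = np.random.RandomState(0)
x = 1e8 + rng.rand(10**6)          # large offset, small spread

# textbook single-pass: sum(x^2) - n*mean^2 suffers catastrophic cancellation
naive = np.sum(x**2) - len(x) * np.mean(x)**2

# two-pass: subtract the mean first, then square and sum
two_pass = np.sum((x - np.mean(x))**2)

print(naive, two_pass)  # the single-pass value is wildly off; it can even go negative
```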

- Sebastian


> Thank you,
> Matt


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Sebastian Berg
On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote:
> On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.debian@google
> mail.com> wrote:
> >
> > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> 
> >> I prefer for the full functionality of numpy to stay available
> with a
> >> stack of community owned software, even if it may be less powerful
> that
> >> way.
> >
> > But then if this is really just the same random numbers numpy
> already provides just faster, it is probably acceptable in principle.
> I haven't actually looked at the PR yet.
> 
> I think the stream is different in some places, at least. And it's
> not a silent backend drop-in like np.linalg being built against an
> optimized BLAS, just a separate module that is inoperative without
> MKL.
> 

I might be swayed, but my gut feeling is that a backend change (an
explicit one if the default stream changes, though maybe one could add
a "fastest" option) would be the only reasonable way to provide such a
thing in numpy itself.

- Sebastian



> --
> Robert Kern


Re: [Numpy-discussion] fpower ufunc

2016-10-21 Thread Sebastian Berg
On Fr, 2016-10-21 at 09:45 +0200, Sebastian Berg wrote:
> On Do, 2016-10-20 at 21:38 -0600, Charles R Harris wrote:
> > 
> > 
> > 
> > On Thu, Oct 20, 2016 at 9:11 PM, Nathaniel Smith <n...@pobox.com>
> > wrote:
> > > 
> > > On Thu, Oct 20, 2016 at 7:58 PM, Charles R Harris
> > > <charlesr.har...@gmail.com> wrote:
> > > > 
> > > > Hi All,
> > > > 
> > > > I've put up a preliminary PR for the proposed fpower ufunc.
> > > > Apart
> > > from
> > > > 
> > > > adding more tests and documentation, I'd like to settle a few
> > > other things.
> > > > 
> > > > The first is the name, two names have been proposed and we
> > > > should
> > > settle on
> > > > 
> > > > one
> > > > 
> > > > fpower (short)
> > > > float_power (obvious)
> > > +0.6 for float_power
> > > 
> > > > 
> > > > The second thing is the minimum precision. In the preliminary
> > > version I have
> > > > 
> > > > used float32, but perhaps it makes more sense for the intended
> > > use to make
> > > > 
> > > > the minimum precision float64 instead.
> > > Can you elaborate on what you're thinking? I guess this is
> > > because
> > > float32 has limited range compared to float64, so is more likely
> > > to
> > > see overflow? float32 still goes up to 10**38 which is <
> > > int64_max**2,
> > > FWIW. Or maybe there's some subtlety with the int->float casting
> > > here?
> > logical, (u)int8, (u)int16, and float16 get converted to float32,
> > which is probably sufficient to avoid overflow and such. My thought
> > was that float32 is something of a "specialized" type these days,
> > while float64 is the standard floating point precision for everyday
> > computation.
> > 
> 
> Isn't the behaviour we already have (e.g. such as mean).
> 
> ints -> float64
> inexacts do not get upcast?
> 

Ah, on the other hand, some or most of the float-only ufuncs probably
already do it the way you made it work?


> - Sebastian
> 
> 
> > 
> > Chuck 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] fpower ufunc

2016-10-21 Thread Sebastian Berg
On Do, 2016-10-20 at 21:38 -0600, Charles R Harris wrote:
> 
> 
> On Thu, Oct 20, 2016 at 9:11 PM, Nathaniel Smith <n...@pobox.com>
> wrote:
> > On Thu, Oct 20, 2016 at 7:58 PM, Charles R Harris
> > <charlesr.har...@gmail.com> wrote:
> > > Hi All,
> > >
> > > I've put up a preliminary PR for the proposed fpower ufunc. Apart
> > from
> > > adding more tests and documentation, I'd like to settle a few
> > other things.
> > > The first is the name, two names have been proposed and we should
> > settle on
> > > one
> > >
> > > fpower (short)
> > > float_power (obvious)
> > 
> > +0.6 for float_power
> > 
> > > The second thing is the minimum precision. In the preliminary
> > version I have
> > > used float32, but perhaps it makes more sense for the intended
> > use to make
> > > the minimum precision float64 instead.
> > 
> > Can you elaborate on what you're thinking? I guess this is because
> > float32 has limited range compared to float64, so is more likely to
> > see overflow? float32 still goes up to 10**38 which is <
> > int64_max**2,
> > FWIW. Or maybe there's some subtlety with the int->float casting
> > here?
> logical, (u)int8, (u)int16, and float16 get converted to float32,
> which is probably sufficient to avoid overflow and such. My thought
> was that float32 is something of a "specialized" type these days,
> while float64 is the standard floating point precision for everyday
> computation.
> 


Isn't that the behaviour we already have (e.g. in mean)?

ints -> float64
inexacts do not get upcast.
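For reference, and assuming a current numpy: the proposal eventually
landed as `np.float_power`, with float64 as the minimum precision, while
`mean` shows the ints-to-float64 / inexacts-unchanged behaviour referred
to above.

```python
import numpy as np

# float_power upcasts to at least float64, even for float32 inputs ...
assert np.float_power(np.arange(3), 2).dtype == np.float64
assert np.float_power(np.float32(2.0), 3).dtype == np.float64

# ... whereas mean upcasts ints to float64 but leaves inexacts alone.
assert np.mean(np.arange(3)).dtype == np.float64
assert np.mean(np.arange(3, dtype=np.float32)).dtype == np.float32
```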

- Sebastian


> Chuck 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-14 Thread Sebastian Berg
On Fr, 2016-10-14 at 13:00 -0400, Allan Haldane wrote:
> Hi all,
> 
> Eric Wieser has a PR which defines new functions np.ma.correlate and
> np.ma.convolve:
> 
> https://github.com/numpy/numpy/pull/7922
> 
> We're deciding how to name the keyword arg which determines whether
> masked elements are "propagated" in the convolution sums. Currently
> we
> are leaning towards calling it "contagious", with default of True:
> 
> def convolve(a, v, mode='full', contagious=True):
> 
> Any thoughts?
> 

Sounds a bit overly odd to me, to be honest. Just brainstorming: you
could think of/name it the other way around, maybe? Should the masked
values be considered as zero/ignored?
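For what it is worth, the keyword that eventually shipped is
`propagate_mask` (default True). A small illustration of the two
behaviours, assuming a current numpy:

```python
import numpy as np

a = np.ma.array([1.0, 1.0, 1.0], mask=[False, True, False])
v = np.array([1.0, 1.0])

# propagate_mask=True: a masked input taints every sum it touches.
prop = np.ma.convolve(a, v, propagate_mask=True)

# propagate_mask=False: masked inputs are ignored in the sums.
ignored = np.ma.convolve(a, v, propagate_mask=False)

assert np.ma.getmaskarray(prop).any()
assert not np.ma.getmaskarray(ignored).any()
```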

- Sebastian


> Cheers,
> Allan
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] Integers to negative integer powers, time for a decision.

2016-10-09 Thread Sebastian Berg
On Fr, 2016-10-07 at 19:12 -0600, Charles R Harris wrote:
> Hi All,
> 
> The time for NumPy 1.12.0 approaches and I like to have a final
> decision on the treatment of integers to negative integer powers with
> the `**` operator. The two alternatives looked to be
> 
> Raise an error for arrays and numpy scalars, including 1 and -1 to
> negative powers.
> 

For what it's worth, I still feel raising an error is probably the only
real option; changing to float may have weird effects. Which does not
mean it is impossible, I admit, though I would like some data on how
downstream would handle it. Also, would we need an int power? The fpower
approach seems the more straightforward/common pattern.
If errors turned out to be annoying in some cases, a seterr option might
be plausible too (as well as a deprecation).
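For the record, in released numpy the error option is what shipped; a
quick check, with `float_power` as the always-float escape hatch:

```python
import numpy as np

# Integer arrays to negative integer powers raise ...
try:
    np.arange(1, 4) ** -1
except ValueError:
    pass
else:
    raise AssertionError("expected a ValueError")

# ... and float_power provides the always-float alternative.
result = np.float_power(np.arange(1, 4), -1)
assert result.dtype == np.float64
```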

- Sebastian


> Pluses
> Backward compatible
> Allows common powers to be integer, e.g., arange(3)**2
> Consistent with inplace operators
> Fixes current wrong behavior.
> Preserves type
> 
> Minuses
> Integer overflow
> Computational inconvenience
> Inconsistent with Python integers
> 
> Always return a float 
> 
> Pluses
> Computational convenience
> 
> Minuses
> Loss of type
> Possible backward incompatibilities
> Not applicable to inplace operators
> 
> 
> Thoughts?
> 
> Chuck
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] [7949] How to handle these deprecation Warning errors

2016-09-26 Thread Sebastian Berg
On Mo, 2016-09-26 at 15:36 +0200, Saurabh Mehta wrote:
> Hi
> 
> I am working on issue #7949, and need to use "-3" switch while
> running python 2.7 for my tests.
> python -3 -c "import numpy as np; np.test()"
> 
> Several errors are reported and all all of them are
> DeprecationWarnings, which is ok. (https://travis-ci.org/njase/numpy/
> jobs/162733502)
> 

OK, most of them seem harmless (i.e. don't use the slice C-slot).

I think we should just get rid of the slice C-slot warnings globally.
The simplest way would be to go to numpy/testing/nosetester.py and look
for things like `sup.filter(message='Not importing directory')`. Then,
perhaps guarded by `if sys.version_info.major == 2 and
sys.py3kwarning:`, just add some `sup.filter` calls such as:

```
if sys.version_info.major == 2 and sys.py3kwarning:
    sup.filter(DeprecationWarning, message="in 3.x, __setslice__")
    sup.filter(DeprecationWarning, message="in 3.x, __getslice__")
```

From a first scroll through the errors, my guess is that the other ones
we can also silence more locally.

This silencing might also leak into scipy, but frankly I am not worried
about it for those slots. The threading warnings seem also quite noisy
(and useless), but I am not sure right away what the best approach for
those would be.
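As a self-contained version of the `sup.filter` pattern (using
`numpy.testing.suppress_warnings` directly as a context manager, which
is, I believe, essentially what the nosetester code sets up):

```python
import warnings
from numpy.testing import suppress_warnings

# Matching DeprecationWarnings are swallowed inside the context;
# everything else propagates as usual.
with suppress_warnings() as sup:
    sup.filter(DeprecationWarning, message="in 3.x")
    warnings.warn("in 3.x, __getslice__ has been removed",
                  DeprecationWarning)
```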

- Sebastian


> But now these errors must be either fixed or skipped. This is where I
> am facing problem. Pls suggest:
> 
> 1. Identify that python was invoked with -3 and skip these cases:
> There seems no way to know if python was invoked with -3
> sys.argv only reports about "-c" and ignores other switches
> 
> 
> 2. Fix these issues: All of them are about deprecated APIs and new
> APIs have been introduced in python 3. Since I am using python 2.x, I
> don't see a way to fix them
> 
> What to do?
> 
> Regards
> Saurabh
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] String & unicode arrays vs text loading in python 3

2016-09-13 Thread Sebastian Berg
On Di, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
> Hi! I'm giving a shot to issue #3184 [1], based on the observation
> that the
> string dtype ('S') under python 3 uses byte arrays instead of unicode
> (the only
> readable string type in python 3).
> 
> This brings two major problems:
> 
> * numpy code has to go through loops to open and read files as binary
> data to
>   load text into a bytes array, and does not play well with users
> providing
>   string (unicode) arguments
> 
> * the repr of these arrays shows strings as b'text' instead of
> 'text', which
>   breaks doctests of software built on numpy
> 
> What I'm trying to do is make dtypes 'S' and 'U' equivalnt
> (NPY_STRING and
> NPY_UNICODE).
> 
> Now the question. Keeping 'S' and 'U' as separate dtypes (but same
> internal
> implementation) will provide the best backwards compatibility, but is
> more
> cumbersome to implement.

I am not sure how that can be possible. Those types are fundamentally
different in how they store their data. String types use one byte per
character, unicode types will use 4 bytes per character. You can maybe
default to unicode in more cases in python 3, but you cannot make them
identical internally.
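The storage difference is easy to see from the itemsizes, with any
current numpy:

```python
import numpy as np

s = np.array('foo', dtype='S')  # bytes: one byte per character
u = np.array('foo', dtype='U')  # unicode: four bytes per character

assert s.dtype == np.dtype('S3') and s.dtype.itemsize == 3
assert u.dtype == np.dtype('U3') and u.dtype.itemsize == 12
```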

What about giving `np.loadtxt` an encoding kwarg or something along
that line?

- Sebastian


> 
> Is it acceptable to internally just translate all appearances of 'S'
> (NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when
> running in
> python 3?
> 
> The main drawback I see is that dtype reprs would not always be as
> expected:
> 
>    # python 2
>    >>> np.array('foo', dtype='S')
>    array('foo',
>  dtype='|S3')
> 
>    # python 3
>    >>> np.array('foo', dtype='S')
>    array('foo',
>  dtype='<U3')
> 
> 
> [1] https://github.com/numpy/numpy/issues/3184
> 
> 
> Cheers,
>   Lluis
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy

2016-09-12 Thread Sebastian Berg
On Mo, 2016-09-12 at 20:22 +, Pauli Virtanen wrote:
> Mon, 12 Sep 2016 11:31:07 +0200, Sebastian Berg wrote:
> > 
> > > 
> > > * NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
> > >   flags for NpyIter_New.
> > > 
> > > * New API function PyArray_MapIterArrayCopyIfOverlap,
> > >   as ufunc.at needs to check overlaps for index arrays before
> > >   constructing iterators, and the parsing is done in multiarray.
> > I think here Nathaniels point might be right. It could be we can
> > assume
> > that copying is always fine, there is probably only one or two
> > downstream projects using this function, plus it seems harder to
> > create
> > abusing structures that actually do something useful.
> > It was only exposed for usage in `ufunc.at` if I remember right. I
> > know
> > theano uses it though, but not sure about anyone else, maybe numba.
> > On
> > the other hand It is not the worst API clutter in history.
> Do you suggest that I break the PyArray_MapIterArray API?
> 
> One issue here is that the function doesn't make distinction between
> read-
> only access and read-write access, so copying may give unnecessary 
> slowdown. The second thing is that it will result to a bit uglier
> code, as 
> I need to manage the overlap with the second operation in ufunc_at.
> 

Yeah, I was only wondering about MapIterArray because I might get away
with the API break in the case that it works everywhere for our
internal usage. But if it's not quite straightforward, there is no
point in thinking about it.

> For NpyIter, I'd still be wary about copying by default, because it's
> not 
> needed everywhere (the may_share_memory checks are better done
> earlier), 
> and since the semantic change can break things inside Numpy.
> 

Yes, I tend to agree here. You can always wonder whether it's still
the most convenient place to do the checks (at least for a few places),
but from glancing at the code, it still seems elegant to me. If we are
concerned about making the iterator more and more complex, maybe we can
really do something else about that as well.

I am not sure whether I will manage to look at it very closely soon, so
I would invite anyone to take a look; this is definitely a highlight!

- Sebastian


>   Pauli
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy

2016-09-12 Thread Sebastian Berg
On So, 2016-09-11 at 21:21 +, Pauli Virtanen wrote:
> Hi,
> 
> In the end some further API additions turn out to appear needed:
> 

Very nice :).

> * NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
>   flags for NpyIter_New.
> 
> * New API function PyArray_MapIterArrayCopyIfOverlap,
>   as ufunc.at needs to check overlaps for index arrays 
>   before constructing iterators, and the parsing is done 
>   in multiarray.

I think here Nathaniel's point might be right. It could be that we can
assume copying is always fine; there are probably only one or two
downstream projects using this function, plus it seems harder to create
abusing structures that actually do something useful.
It was only exposed for usage in `ufunc.at`, if I remember right. I
know theano uses it, though I am not sure about anyone else; maybe
numba. On the other hand, it is not the worst API clutter in history.

> 
> Continuation here: https://github.com/numpy/numpy/pull/8043
> 
> 
> 
> Wed, 07 Sep 2016 18:02:59 +0200, Sebastian Berg wrote:
> 
> > 
> > Hi all,
> > 
> > Pauli just opened a nice pull request [1] to add overlap detection
> > to
> > the new iterator, this means adding a new iterator flag:
> > 
> > `NPY_ITER_COPY_IF_OVERLAP`
> > 
> > If passed to the iterator (also exposed in python), the iterator
> > will
> > copy the operands such that reading and writing should only occur
> > for
> > identical operands. For now this is implemented by always copying
> > the
> > output/writable operand (this could be improved though, so I would
> > not
> > say its fixed API).
> > 
> > Since adding this flag is new API, please feel free to suggest
> > other
> > names/approaches or even decline the change ;).
> > 
> > 
> > This is basically a first step, which should be easily followed by
> > adding overlap detection to ufuncs, removing traps such as the well
> > (or
> > not so well known) `a += a.T`. Other parts of numpy may follow one
> > by
> > one.
> > 
> > The work is based on his older awesome new memory overlap detection
> > implementation.
> > 
> > If there are no comments, I will probably merge it very soon, so we
> > can
> > look at the follow up things.
> > 
> > - Sebastian
> > 
> > 
> > [1] https://github.com/numpy/numpy/
> pull/8026___
> > 
> > NumPy-Discussion mailing list NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-11 Thread Sebastian Berg
On So, 2016-09-11 at 11:19 -0400, Marten van Kerkwijk wrote:
> There remains the option to just let subclasses deal with new ndarray
> features...  Certainly, for `Quantity`, I'll quite happily do that.
> And if it alllows the ndarray code to remain simple and efficient, it
> is probably the best solution.  -- Marten
> 

Maybe, but I can't quite shake the feeling that we would see a lot of
annoying bugs for subclasses that don't adapt very quickly.


> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-11 Thread Sebastian Berg
On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another
> option would be work by analogy with np.ix_ and define pre-dispatch
> index preparation routines np.ox_ and np.vx_ (say), which would be
> used as in:
> ```
> array[np.ox_[:, 10]]   -- or -- array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer, or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.
> 

A specific subclass of tuple... Part of me thinks this is horrifying,
but it actually would solve some of the subclassing issues if
`arr.vindex[...]` could end up calling `__getitem__` with a slightly
special indexing tuple value.

I simply can't quite find the end of the subclassing issues. We have
tests for things like masked arrays correctly calling the `_data`
subclass, but if the `_data` subclass does not implement the new
method, numpy would have to run in circles (or something).

- Sebastian

> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
> 
> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] Continued New Indexing Methods Revival #N (subclasses!)

2016-09-10 Thread Sebastian Berg
On Sa, 2016-09-10 at 12:01 +0200, Sebastian Berg wrote:
> Hi all,
> 
> from the discussion, I was thinking maybe something like this:
> 
> class B():
>     def __numpy_getitem__(self, index, indexing_method="plain"):
>         # do magic.
>         return super().__numpy_getitem__(
>             index, indexing_method=indexing_method)
> 
> as new API. There are some issues, though. An old subclass may define
> `__getitem__`. Now the behaviour that would seem nice to me is:
> 
> 1. No new attribute (no `__numpy_getitem__`) and also no
>    `__getitem__`/`__setitem__`: Should just work
> 2. No new attribute but old attributes defined: Should at
>    least give a warning (or an error) when using the new
>    attributes, since the behaviour might buggy.
> 3. `__numpy_getitem__` defined: Will channel all indexing through it
>    (except maybe some edge cases in python 2). Best, also avoid that
>    use getitem in setitem trick If you define both (which might
>    make sense for some edge case stuff), you should just channel it
>    through this yourself.
> 

Maybe in shorter; I would like to know if a subclass:

1. requires no fixup
2. may need fixup
3. supports everything.

And I am not sure how to approach this.


> Now the issue I have is that for 1. and 2. to work correctly, I need
> to
> know which methods are overloaded by the subclass. Checking is a bit
> tedious and the method I hacked first for getitem and setitem does
> not
> work for a normal method.
> 
> Can anyone think of a nicer way to do this trick that does not
> require
> quite as much hackery. Or is there an easy way to do the overloading
> check?
> 
> - Sebastian
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



[Numpy-discussion] Continued New Indexing Methods Revival #N (subclasses!)

2016-09-10 Thread Sebastian Berg
Hi all,

from the discussion, I was thinking maybe something like this:

class B():
    def __numpy_getitem__(self, index, indexing_method="plain"):
        # do magic.
        return super().__numpy_getitem__(
            index, indexing_method=indexing_method)

as new API. There are some issues, though. An old subclass may define
`__getitem__`. Now the behaviour that would seem nice to me is:

1. No new attribute (no `__numpy_getitem__`) and also no
   `__getitem__`/`__setitem__`: Should just work
2. No new attribute but old attributes defined: Should at
   least give a warning (or an error) when using the new
   attributes, since the behaviour might buggy.
3. `__numpy_getitem__` defined: Will channel all indexing through it
   (except maybe some edge cases in python 2). Best, also avoid that
   use getitem in setitem trick If you define both (which might
   make sense for some edge case stuff), you should just channel it
   through this yourself.

Now the issue I have is that for 1. and 2. to work correctly, I need to
know which methods are overloaded by the subclass. Checking is a bit
tedious, and the method I first hacked together for getitem and setitem
does not work for a normal method.

Can anyone think of a nicer way to do this trick that does not require
quite as much hackery? Or is there an easy way to do the overloading
check?
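Not sure it avoids the tedium, but one possible shape for the
overloading check is to compare the class attribute against ndarray's
own slot; whether this covers all the edge cases is exactly the open
question. A sketch:

```python
import numpy as np

class Plain(np.ndarray):
    pass

class Legacy(np.ndarray):
    def __getitem__(self, index):
        return super(Legacy, self).__getitem__(index)

def overloads(cls, name):
    # True if `cls` (or a non-ndarray base) overrides the method:
    # attribute lookup resolves to ndarray's slot wrapper only when
    # nothing in the MRO has replaced it.
    return getattr(cls, name) is not getattr(np.ndarray, name)

assert not overloads(Plain, '__getitem__')
assert overloads(Legacy, '__getitem__')
```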

- Sebastian





Re: [Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy

2016-09-07 Thread Sebastian Berg
On Mi, 2016-09-07 at 09:22 -0700, Nathaniel Smith wrote:
> On Sep 7, 2016 9:03 AM, "Sebastian Berg" <sebast...@sipsolutions.net>
> wrote:
> >
> > Hi all,
> >
> > Pauli just opened a nice pull request [1] to add overlap detection
> to
> > the new iterator, this means adding a new iterator flag:
> >
> > `NPY_ITER_COPY_IF_OVERLAP`
> >
> > If passed to the iterator (also exposed in python), the iterator
> will
> > copy the operands such that reading and writing should only occur
> for
> > identical operands. For now this is implemented by always copying
> the
> > output/writable operand (this could be improved though, so I would
> not
> > say its fixed API).
> I wonder if there is any way we can avoid the flag, and just make
> this happen automatically when appropriate? nditer has too many
> "unbreak-me" flags already.
> Are there any cases where we *don't* want the copy-if-overlap
> behavior? Traditionally overlap has triggered undefined behavior, so
> there's no backcompat issue, right?

Puh, I remember weird abuses that sometimes stopped working. Even just
adding it to ufuncs might break some weird cases in someone's
script...

Whether or not we can just make it the default might be worth thinking
about. What do downstream projects that use the API think? My guess is
that would be projects such as numexpr, numba, or I think theano?

Maybe another approach is to think about some other way to make good
defaults for the iterator easier/saner. Heck, I wonder whether, if we
defaulted to things like "zero size ok" and warned about it, anyone
would even notice, except as in: oh, I should make it zero-size ok ;).

- Sebastian


> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



[Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy

2016-09-07 Thread Sebastian Berg
Hi all,

Pauli just opened a nice pull request [1] to add overlap detection to
the new iterator, this means adding a new iterator flag:

`NPY_ITER_COPY_IF_OVERLAP`

If passed to the iterator (also exposed in python), the iterator will
copy the operands such that reading and writing should only occur for
identical operands. For now this is implemented by always copying the
output/writable operand (this could be improved though, so I would not
say its fixed API).

Since adding this flag is new API, please feel free to suggest other
names/approaches or even decline the change ;).


This is basically a first step, which should be easily followed by
adding overlap detection to ufuncs, removing traps such as the well (or
not so well known) `a += a.T`. Other parts of numpy may follow one by
one.
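The `a += a.T` trap, and what the overlap work fixes, can be seen
directly, assuming a numpy with this merged:

```python
import numpy as np

a = np.arange(9, dtype=float).reshape(3, 3)
assert np.shares_memory(a, a.T)   # the two operands alias

expected = a + a.T                # out-of-place: always correct
a += a.T                          # in-place on aliased memory
assert np.array_equal(a, expected)
```

Without the overlap detection, the in-place version could read elements
of `a.T` that the loop had already overwritten, silently corrupting the
result.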

The work is based on his older awesome new memory overlap detection
implementation.

If there are no comments, I will probably merge it very soon, so we can
look at the follow up things.

- Sebastian


[1] https://github.com/numpy/numpy/pull/8026



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another
> option would be work by analogy with np.ix_ and define pre-dispatch
> index preparation routines np.ox_ and np.vx_ (say), which would be
> used as in:
> ```
> array[np.ox_[:, 10]]   -- or -- array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer, or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.

Sure, it would be a solution, but I am not sure it is any better
implementation-wise than just passing an extra argument. As for the
syntax for plain arrays, I am not convinced, to be honest.
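For comparison, the existing `np.ix_` helper that the proposed `np.ox_`
would be modeled on, next to plain fancy indexing:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

outer = a[np.ix_([0, 2], [1, 3])]   # outer/cross-product indexing
fancy = a[[0, 2], [1, 3]]           # plain fancy indexing: zipped pairs

assert outer.tolist() == [[1, 3], [9, 11]]
assert fancy.tolist() == [1, 11]
```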

- Sebastian


> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
> 
> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Di, 2016-09-06 at 10:10 -0700, Stephan Hoyer wrote:
> On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk  gmail.com> wrote:
> > p.s. Just to be clear: personally, I think we should have neither
> > `__numpy_getitem__` nor a mixin; we should just get the quite
> > wonderful new indexing methods!
> +1
> 
> I don't maintain ndarray subclasses (I prefer composition), but I
> don't think it's too difficult to require implementing vindex and
> oindex properties from scratch.
> 

Well, in some sense that is why I brought it up: masked arrays. They
have quite a bit of code in `__getitem__`, including applying an
identical indexing operation to the mask. I suppose you can always
solve it with code such as:

def __special_getitem__(self, index, method=None):
    # do magic
    if method is None:
        # (not sure if this gets passed to the base class)
        res = super().__getitem__(index)
    elif method == "outer":
        res = super().oindex[index]
    # ...
    # more magic.
    return res

def __getitem__(self, index):
    return self.__special_getitem__(index)

@property
def oindex(self):
    # define the class elsewhere, I guess
    class _indexer(object):
        def __init__(self, arr):
            self.arr = arr
        def __getitem__(self, index):
            return self.arr.__special_getitem__(index, method='outer')
    return _indexer(self)

Though I am not 100% sure without testing, a superclass method that
understands the `method` kwarg might work better. We can teach numpy to
pass in that `method` to getitem so that you don't have to implement
that `_indexer` class for the special attribute. I first started to do
that for MaskedArrays, and while it is not hard, it seemed a bit
tedious.

If we move this to a method with a new name, a slight advantage would
be that other oddities could be removed maybe.

By now it seems to me that nobody really needs the extra information
(i.e. preprocessing information of the indexing tuple)...? I thought it
could be useful to know things about the result, but I suppose you can
check most things (view vs. no view; field access; scalar access) also
afterwards?


> Side note: I would prefer the more verbose "legacy_index" to
> "lindex". We really want to discourage this one, and two new
> abbreviations are bad enough.

Sounds good to me.

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Di, 2016-09-06 at 10:57 +0100, Robert Kern wrote:
> On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg <sebastian@sipsolution
> s.net> wrote:
> >
> > On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > > >
> > > > Actually, on those names: an alternative to your proposal would
> be
> > > > to
> > > > introduce only one new method which can do all types of
> indexing,
> > > > depending on a keyword argument, i.e., something like
> > > > ```
> > > > def getitem(self, item, mode='outer'):
> > > >     ...
> > > > ```
> > > Have I been overthinking this, eh? Just making it
> `__getitem__(self,
> > > index, mode=...)` and then from `vindex` calling the subclasses
> > > `__getitem__(self, index, mode="vector")` or so would already
> solve
> > > the
> > > issue almost fully? Only thing I am not quite sure about:
> > >
> > > 1. Is `__getitem__` in some way special to make this difficult
> (also
> > > considering some new ideas like allowing object[a=4]?
> >
> > OK; I think the C-side slot cannot get the kwarg likely, but
> probably
> > you can find a solution for that
> 
> Well, the solution is to use a different name, I think.
> 

Yeah :). Which goes back to `__numpy_getitem__` or so, just with a
slightly different (simpler) API. Something more along the lines of:

1. If the subclass has `__numpy_getitem__`, call it with the method
   keyword. Or just add the argument to `__getitem__`, which should
   likely work as well.
2. Implement `ndarray.__numpy_getitem__`, which takes the method
   keyword; subclasses would call it instead of `ndarray.__getitem__`
   in their base class call.

The option I first mentioned would be similar, but allows passing a bit
of extra information to the subclass, which may be useful. But if no one
actually needs that information (it would be things available after
inspection of the indexing object), it just adds quite a bit of code and
thus a maintenance burden.
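A minimal sketch of option 2, assuming a hypothetical
`__numpy_getitem__` protocol (the name and keyword are purely
illustrative; nothing like this exists in numpy):

```python
import numpy as np

class LoggingArray(np.ndarray):
    # Hypothetical single entry point receiving the indexing method as
    # a keyword. A real subclass would do its extra work (e.g. indexing
    # a mask) here instead of just recording the method name.
    def __numpy_getitem__(self, index, method="plain"):
        self.last_method = method  # subclass-specific bookkeeping
        # Defer the actual indexing to the base class, as in option 2.
        return np.ndarray.__getitem__(self, index)

a = np.arange(6).view(LoggingArray)
sub = a.__numpy_getitem__(slice(1, 4), method="oindex")
```

The point is that one method covers all four indexing flavours, so the
subclass no longer needs a separate override per attribute.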

Such a new method could of course do things slightly differently (such
as the scalar cases; I really have to understand that wrapping thing,
and I am always worried about the array-of-arrays case as well. Or the
annoying fact that setitem calls getitem. Or it could maybe not wrap the
array at all).

- Sebastian


> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > 
> > Actually, on those names: an alternative to your proposal would be
> > to
> > introduce only one new method which can do all types of indexing,
> > depending on a keyword argument, i.e., something like
> > ```
> > def getitem(self, item, mode='outer'):
> > ...
> > ```
> Have I been overthinking this, eh? Just making it `__getitem__(self,
> index, mode=...)` and then from `vindex` calling the subclasses
> `__getitem__(self, index, mode="vector")` or so would already solve
> the
> issue almost fully? Only thing I am not quite sure about:
> 
> 1. Is `__getitem__` in some way special to make this difficult (also
> considering some new ideas like allowing object[a=4]?

OK; I suspect the C-side slot cannot receive the kwarg, but you can
probably find a solution for that.

> 2. Do we currently have serious deficiencies we want to fix, and
> could
> maybe not fix like that.
> 
> 
> > 
> > -- Marten
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> Actually, on those names: an alternative to your proposal would be to
> introduce only one new method which can do all types of indexing,
> depending on a keyword argument, i.e., something like
> ```
> def getitem(self, item, mode='outer'):
> ...
> ```

Have I been overthinking this, eh? Just making it `__getitem__(self,
index, mode=...)` and then from `vindex` calling the subclass's
`__getitem__(self, index, mode="vector")` or so would already solve the
issue almost fully? The only things I am not quite sure about:

1. Is `__getitem__` in some way special, making this difficult (also
considering some new ideas like allowing `object[a=4]`)?
2. Do we currently have serious deficiencies we want to fix, but could
maybe not fix like that?


> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Mo, 2016-09-05 at 21:02 -0400, Marten van Kerkwijk wrote:
> p.s. Just to be clear: personally, I think we should have neither
> `__numpy_getitem__` nor a mixin; we should just get the quite
> wonderful new indexing methods!

Hehe, yes, but see MaskedArrays. They need logic to also index the mask,
so `__getitem__` etc. actually do a lot of things. Without more involved
changes (such as implementing a unified method within that class), the
new indexing attributes simply cannot work right. And I think at least a
warning might be in order (from the numpy side) for such a subclass.

But maybe MaskedArrays are pretty much the most complex subclass
available in that regard.
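To illustrate why, here is a toy stand-in for a masked array (not the
real `np.ma.MaskedArray`, just a sketch), whose `__getitem__` has to
index the data and the mask together -- which is exactly what a plain
`oindex` attribute on ndarray would bypass:

```python
import numpy as np

class MiniMasked:
    # Toy masked-array-like container: every indexing operation must be
    # applied to BOTH the data and the mask, consistently.
    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __getitem__(self, index):
        # Index both arrays with the same index object.
        return MiniMasked(self.data[index], self.mask[index])

m = MiniMasked([1, 2, 3, 4], [False, True, False, True])
sub = m[1:3]
```

Any new indexing attribute that only knows about `self.data` would
silently return an array with a stale mask.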

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> Actually, on those names: an alternative to your proposal would be to
> introduce only one new method which can do all types of indexing,
> depending on a keyword argument, i.e., something like
> ```
> def getitem(self, item, mode='outer'):
> ...
> ```

Yeah, we can do that easily. The only disadvantage is losing the
square-bracket notation.


> 
> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Mo, 2016-09-05 at 18:19 -0500, Nathan Goldbaum wrote:
> 
> 
> > On Monday, September 5, 2016, Marten van Kerkwijk
> > <m.h.vankerkwijk@gmail.com> wrote:
> > Hi Sebastian,
> > 
> > It would seem to me that any subclass has to keep up to date with
> > new
> > features in ndarray, and while I think ndarray has a responsibility
> > not to break backward compatibility, I do not think it has to
> > protect
> > against new features possibly not working as expected in
> > subclasses.
> > In particular, I think it is overly complicated (and an unnecessary
> > maintenance burden) to error out if a subclass has `__getitem__`
> > overwritten, but not `oindex`.
> > 
> > For somewhat similar reasons, I'm not too keen on a new
> > `__numpy_getitem__` method; I realise it might reduce complexity
> > for
> > some ndarray subclasses eventually, but it also is an additional
> > maintenance burden. If you really think it is useful, I think it
> > might
> > be more helpful to define a new mixin class which provides a
> > version
> > of all indexing methods that just call `__numpy_getitem__` if that
> > is
> > provided by the class that uses the mixin. I would *not* put it in
> > `ndarray` proper.
> I disagree that multiple inheritance (i.e. with your proposed mixin
> and ndarray) is something that numpy should enshrine in its API for
> subclasses. As the maintainer of an ndarray subclass, I'd much rather
> just implement a new dunder method that older numpy versions
> will ignore.
>  

Hmm, OK, so that would be a plus for the method solution, even without
any of the extra capabilities that may be possible.

> > 
> > Indeed, the above might even be handier for subclasses, since they
> > can
> > choose, if they wish, to implement a similar mixin for older numpy
> > versions, so that all the numpy version stuff can be moved to a
> > single
> > location. (E.g., I can imagine doing the same for
> > `__numpy_ufunc__`.)
> > 
> > Overall, my sense would be to keep your PR to just implementing the
> > various new index methods (which are great -- I still don't really
> > like the names, but sadly don't have better suggestions...).
> > 
> > But it might be good if others pipe in here too, in particular
> > those
> > maintaining ndarray subclasses!
> > 
> > All the best,
> > 
> > Marten
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Sebastian Berg
On Mo, 2016-09-05 at 18:24 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> It would seem to me that any subclass has to keep up to date with new
> features in ndarray, and while I think ndarray has a responsibility
> not to break backward compatibility, I do not think it has to protect
> against new features possibly not working as expected in subclasses.
> In particular, I think it is overly complicated (and an unnecessary
> maintenance burden) to error out if a subclass has `__getitem__`
> overwritten, but not `oindex`.
> 

It is not complicated, implementation-wise, to check for the existence
of `__getitem__`. However, I am starting to agree that a warning could
be the better option. It might work after all.

> For somewhat similar reasons, I'm not too keen on a new
> `__numpy_getitem__` method; I realise it might reduce complexity for
> some ndarray subclasses eventually, but it also is an additional
> maintenance burden. If you really think it is useful, I think it
> might
> be more helpful to define a new mixin class which provides a version
> of all indexing methods that just call `__numpy_getitem__` if that is
> provided by the class that uses the mixin. I would *not* put it in
> `ndarray` proper.
> 

Yes, that is maybe the simpler option (in the sense of maintainability);
the other would have a bit of extra information available. If this extra
information is unnecessary, a mixin is probably a bit simpler.

> Indeed, the above might even be handier for subclasses, since they
> can
> choose, if they wish, to implement a similar mixin for older numpy
> versions, so that all the numpy version stuff can be moved to a
> single
> location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)
> 

You can always implement a mixin for older versions if you do all the
logic yourself, but I would prefer not to duplicate that logic (Jaime
wrote a Python class that does it for normal arrays -- not sure if it's
100% the same as mine, but you could probably use it in a subclass).

So a numpy-provided mixin would not help with supporting this in old
numpy versions, I think.
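For illustration, such a mixin could look roughly like this (all names
are hypothetical; `__numpy_getitem__` was never added to numpy -- this
only shows how a mixin could funnel every indexing entry point into one
method the subclass provides):

```python
import numpy as np

class _ModeIndexer:
    # Helper bound by the mixin so `arr.oindex[...]` routes through
    # `__numpy_getitem__` with the right method name.
    def __init__(self, arr, method):
        self.arr, self.method = arr, method

    def __getitem__(self, index):
        return self.arr.__numpy_getitem__(index, method=self.method)

class IndexingMixin:
    # Hypothetical mixin: all entry points call the single
    # `__numpy_getitem__` defined by the subclass.
    def __getitem__(self, index):
        return self.__numpy_getitem__(index, method="plain")

    @property
    def oindex(self):
        return _ModeIndexer(self, "oindex")

class MyArray(IndexingMixin, np.ndarray):
    def __numpy_getitem__(self, index, method="plain"):
        self.last_method = method      # subclass-specific work goes here
        return np.ndarray.__getitem__(self, index)

a = np.arange(5).view(MyArray)
plain = a[1:4]        # routed with method="plain"
outer = a.oindex[1:3]  # routed with method="oindex"
```

Since it is pure Python, a subclass could indeed ship a copy of such a
mixin itself for older numpy versions, as Marten suggests.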

> Overall, my sense would be to keep your PR to just implementing the
> various new index methods (which are great -- I still don't really
> like the names, but sadly don't have better suggestions...).
> 

Well... The thing is that we have to fix the subclasses within numpy as
well (most importantly MaskedArrays). Of course you could delay things
a bit, but in the end whatever we use internally could likely also be
whatever subclasses might end up using.

> But it might be good if others pipe in here too, in particular those
> maintaining ndarray subclasses!
> 

Yeah :).

> All the best,
> 
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Sebastian Berg
On Mo, 2016-09-05 at 14:54 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Indeed, having the scalar pass through `__array_wrap__` would have
> been useful (`__array_finalize__` is too late, since one cannot change
> the class any more, just set attributes).  But that is water under the
> bridge, since we're stuck with people not expecting that.
> 
> I think the slightly larger question, but one somewhat orthogonal to
> your suggestion of a new dundermethod, is whether one cannot avoid
> more such methods by the new indexing routines returning array
> scalars
> instead of regular ones.
> 
> Obviously, though, this has larger scope, as it might be part of the
> merging of the now partially separate code paths for scalar and array
> arithmetic, etc.

Thanks for the input. I am not quite sure about all of these things.
Calling `__array_wrap__` for the scalar returns does not sound like a
problem (it would also affect other code paths). Calling it only for the
new methods creates a bit of branching, but is not a big deal.

Would it help you, though? You could avoid implementing all the new
indexing methods for many/most subclasses, but how do you tell numpy
that you are supporting them? Right now I thought it would make sense to
give an error if you try `subclass.vindex[...]` but the subclass has
`__getitem__` implemented (and has not overridden `vindex`).

The dunder method gives a way to tell numpy: you know what to do. For
the sake of masked arrays it is also convenient (you can use the indexer
also on the mask), but masked arrays are rather special. It would be
interesting if there are more complex subclasses out there which
implement `__getitem__` or `__setitem__`. Maybe all we need is some new
trick for the scalars, and most subclasses can just remove their
`__getitem__` methods.

- Sebastian


> 
> All the best,
> 
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] Correct error of invalid axis arguments.

2016-09-05 Thread Sebastian Berg
On Mo, 2016-09-05 at 11:54 -0600, Charles R Harris wrote:
> Hi All,
> 
> At the moment there are two error types raised when invalid axis
> arguments are encountered: IndexError and ValueError. I prefer
> ValueError for arguments, IndexError seems more appropriate when the
> bad axis value is used as an index. In any case, having mixed error
> types is inconvenient, but also inconvenient to change. Should we
> worry about that? If so, what should the error be? Note that some of
> the mixup arises because the axis values are not checked before use,
> in which case IndexError is raised.
> 

I am not too bothered about it myself, but yes, it is a bit annoying.
My gut feeling would be not to worry about it much, unless we implement
some more general input validator for the Python side (which could
possibly do even more, like validating/converting input arrays all in
one go). Putting explicit guards into every single Python-side function
is of course possible too, but I am not quite convinced it's worth the
trouble.
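Such a shared validator could be as small as this sketch, which settles
on a single error type (plain `ValueError` here, for illustration;
numpy later added `np.AxisError`, a subclass of both `ValueError` and
`IndexError`, for exactly this reason):

```python
def normalize_axis(axis, ndim):
    # Minimal sketch of a unified axis validator: one error type for
    # every out-of-range axis, instead of the historical mix of
    # IndexError and ValueError depending on where the check happens.
    if not -ndim <= axis < ndim:
        raise ValueError(
            "axis %d is out of bounds for array of dimension %d"
            % (axis, ndim))
    return axis % ndim  # map negative axes to their positive equivalent

last = normalize_axis(-1, 3)  # the last axis of a 3-d array
```

Every Python-side function taking an `axis` argument could call this
once up front, which is the "general input validator" idea above.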

- Sebastian


> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On So, 2016-09-04 at 11:20 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> I haven't given this as much thought as it deserves, but thought I
> would comment from the astropy perspective, where we both have direct
> subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
> classes that store their data internally as ndarray (subclass)
> instances (`Time`, `SkyCoord`, ...).
> 
> One comment would be that if one were to introduce a special method,
> one should perhaps think a bit more broadly, and capture more than
> the
> indexing methods with it. I wonder about this because for the
> array-holding classes mentioned above, we initially just had
> `__getitem__`, which got the relevant items from the underlying
> arrays, and then constructed a new instance with those. But recently
> we realised that methods like `reshape`, `transpose`, etc., require
> essentially the same steps, and so we constructed a new
> `ShapedLikeNDArray` mixin, which provides all of those [1] as long as
> one defines a single `_apply` method. (Indeed, it turns out that the
> same technique works without any real change for some numpy functions
> such as `np.broadcast_to`.)
> 
> That said, in the actual ndarray subclasses, we have not found a need
> to overwrite any of the reshaping methods, since those methods are
> all
> handled OK via `__array_finalize__`. We do overwrite `__getitem__`
> (and `item`) as we need to take care of scalars. And we would
> obviously have to overwrite `oindex`, etc., as well, for the same
> reason, so in that respect a common method might be useful.
> 
> However, perhaps it is worth considering that the only reason we need
> to overwrite them in the first place, unlike what is the case for all
> the shape-changing methods, is that scalar output does not get put
> through `__array_finalize__`. Might it be an idea to have the new
> indexing methods return array scalars instead of normal ones so we
> can
> get rid of this?

I did not realize the new numpys are special with the scalar handling?
The indexing (already before 1.9, I believe) always goes through
`PyArray_ScalarReturn` or so, which I thought was used by almost all
functions.

If you mean the attributes (`oindex`, etc.), they could behave a bit
differently of course, though I am not sure to what extent that actually
helps, since it would also create disparity. If we implement a new
special method (`__numpy_getitem__`), they definitely should behave
slightly differently in some places. One option might be to not even do
the wrapping, but leave it to the subclass.

However, if you have an array with arrays inside, knowing whether to
return a scalar correctly would require inspecting the index object,
which is why I suggested having the indexer provide a few extra pieces
of information (such as this one).

Of course, since the scalar return goes through a ScalarReturn function,
that function could maybe also be taught to indicate the scalar to
`__array_finalize__`/`__array_wrap__` (not sure what exactly applies).

- Sebastian


> All the best,
> 
> Marten
> 
> [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On So, 2016-09-04 at 14:10 +0200, Sebastian Berg wrote:
> On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> > 
> > Hi all,
> > 
> > not that I am planning to spend much time on this right now,
> > however,
> > I
> > did a small rebase of the stuff I had (did not push yet) on oindex
> > and
> > remembered the old problem ;).
> > 
> > The one remaining issue I have with adding things like (except
> > making
> > the code prettier and writing tests):
> > 
> > arr.oindex[...]  # outer/orthogonal indexing
> > arr.vindex[...]  # Picking of elements (much like current)
> > arr.lindex[...]  # current behaviour for backward compat
> > 
> > is what to do about subclasses. Now what I can do (and have
> > currently
> > in my branch) is to tell someone on `subclass.oindex[...]`: This
> > won't
> > work, the subclass implements `__getitem__` or `__setitem__` so I
> > don't
> > know if the result would be correct (its a bit annoying if you also
> > warn about using those attributes, but...).
> > 
> Hmm, I am considering to expose a new indexing helper object. So that
> subclasses could implement something like `__numpy_getitem__` and
> `__numpy_setitem__` and if they do (and preferably nothing else) they
> would get back passed a small object with some information about the
> indexing operation. So that the subclass would implement:
> 
> ```
> def __numpy_setitem__(self, indexer, values):
>     indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
>     indexer.scalar  # Will the result be a scalar?
>     indexer.view    # Will the result be a view or a copy?
>     # More information might be possible (note that not all checks
>     # are done at this point, just basic checks will have happened
>     # already).
>
>     # Do some code that prepares self or values; could also use
>     # indexer for another array (e.g. mask) of the same shape.
>
>     result = indexer(self, values)
>
>     # Do some code to fix up the result if necessary.
>     # Should discuss whether result is first a plain ndarray or
>     # already wrapped.
> ```

Hmm, field access is a bit annoying, but I guess it can/has to be
included.

> 
> This could be implemented in the C-side without much hassle, I think.
> Of course it adds some new API which we would have to support
> indefinitely. But it seems something like this would also fix the
> hassle of identifying e.g. if the result should be a scalar for a
> subclass (which may even be impossible in some cases).
> 
> Would be very happy about feedback from subclassers!
> 
> - Sebastian
> 
> 
> > 
> > However, with or without such error, we need a nice way for
> > subclasses
> > to define these attributes! This is important even within numpy at
> > least for masked arrays (possibly also matrix and memmap).
> > 
> > They (typically) do some stuff before or after the plain indexing
> > operation, so how do we make it convenient to allow them to do the
> > same
> > stuff for the special indexing attributes without weird code
> > duplication? I can think of things, but nothing too great yet so
> > maybe
> > you guys got an elegant idea.
> > 
> > - Sebastian
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> Hi all,
> 
> not that I am planning to spend much time on this right now, however,
> I
> did a small rebase of the stuff I had (did not push yet) on oindex
> and
> remembered the old problem ;).
> 
> The one remaining issue I have with adding things like (except making
> the code prettier and writing tests):
> 
> arr.oindex[...]  # outer/orthogonal indexing
> arr.vindex[...]  # Picking of elements (much like current)
> arr.lindex[...]  # current behaviour for backward compat
> 
> is what to do about subclasses. Now what I can do (and have currently
> in my branch) is to tell someone on `subclass.oindex[...]`: This
> won't
> work, the subclass implements `__getitem__` or `__setitem__` so I
> don't
> know if the result would be correct (its a bit annoying if you also
> warn about using those attributes, but...).
> 

Hmm, I am considering exposing a new indexing helper object, so that
subclasses could implement something like `__numpy_getitem__` and
`__numpy_setitem__`; if they do (and preferably nothing else), they
would get passed a small object with some information about the
indexing operation. The subclass would implement:

```
def __numpy_setitem__(self, indexer, values):
    indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
    indexer.scalar  # Will the result be a scalar?
    indexer.view    # Will the result be a view or a copy?
    # More information might be possible (note that not all checks
    # are done at this point, just basic checks will have happened
    # already).

    # Do some code that prepares self or values; could also use
    # indexer for another array (e.g. mask) of the same shape.

    result = indexer(self, values)

    # Do some code to fix up the result if necessary.
    # Should discuss whether result is first a plain ndarray or
    # already wrapped.
```

This could be implemented on the C side without much hassle, I think.
Of course, it adds some new API which we would have to support
indefinitely. But it seems something like this would also fix the hassle
of identifying, e.g., whether the result should be a scalar for a
subclass (which may even be impossible in some cases).
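As a pure-Python simulation of the proposed protocol (nothing here
exists in numpy; the `Indexer` class only mimics the attributes sketched
above, and the attribute computations are deliberately naive):

```python
import numpy as np

class Indexer:
    # Toy model of the proposed helper object; `method`, `scalar` and
    # `view` mirror the attributes in the sketch above.
    def __init__(self, index, method="plain"):
        self.index = index
        self.method = method
        self.scalar = isinstance(index, (int, np.integer))
        self.view = isinstance(index, slice)

    def __call__(self, arr, values):
        # Perform the actual (already checked) assignment.
        arr[self.index] = values
        return arr

class MaskedLike(np.ndarray):
    def __numpy_setitem__(self, indexer, values):
        # A real subclass could reuse `indexer` on a companion array,
        # e.g. a mask of the same shape, before/after the assignment.
        self.touched = indexer.method
        return indexer(np.asarray(self), values)

a = np.zeros(4).view(MaskedLike)
a.__numpy_setitem__(Indexer(slice(1, 3)), 5.0)
```

The payoff is that the subclass never has to re-derive facts like "is
this a scalar result?" from the raw index object itself.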

Would be very happy about feedback from subclassers!

- Sebastian


> However, with or without such error, we need a nice way for
> subclasses
> to define these attributes! This is important even within numpy at
> least for masked arrays (possibly also matrix and memmap).
> 
> They (typically) do some stuff before or after the plain indexing
> operation, so how do we make it convenient to allow them to do the
> same
> stuff for the special indexing attributes without weird code
> duplication? I can think of things, but nothing too great yet so
> maybe
> you guys got an elegant idea.
> 
> - Sebastian
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



[Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-03 Thread Sebastian Berg
Hi all,

not that I am planning to spend much time on this right now; however, I
did a small rebase of the stuff I had on oindex (did not push yet) and
remembered the old problem ;).

The one remaining issue I have with adding things like the following
(besides making the code prettier and writing tests):

arr.oindex[...]  # outer/orthogonal indexing
arr.vindex[...]  # Picking of elements (much like current)
arr.lindex[...]  # current behaviour for backward compat

is what to do about subclasses. Now, what I can do (and currently have
in my branch) is to tell someone using `subclass.oindex[...]`: this
won't work, the subclass implements `__getitem__` or `__setitem__`, so I
don't know if the result would be correct (it's a bit annoying if you
also warn about using those attributes, but...).

However, with or without such an error, we need a nice way for
subclasses to define these attributes! This is important even within
numpy, at least for masked arrays (possibly also matrix and memmap).

They (typically) do some stuff before or after the plain indexing
operation, so how do we make it convenient for them to do the same stuff
for the special indexing attributes without weird code duplication? I
can think of things, but nothing too great yet, so maybe you guys have
an elegant idea.
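For reference, the difference between the first two attributes can be
expressed with indexing that exists today (`vindex` roughly matches
current fancy indexing; `oindex` is what `np.ix_` spells out by hand):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows, cols = [0, 2], [1, 3]

# vindex-style (today's fancy indexing): the index arrays are broadcast
# against each other and elements are picked pairwise: a[0, 1], a[2, 3].
pairwise = a[rows, cols]

# oindex-style (outer/orthogonal): the cross product of the index
# arrays, which currently has to be spelled with np.ix_.
block = a[np.ix_(rows, cols)]
```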

- Sebastian



Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

2016-09-02 Thread Sebastian Haase
How do these two relate to each other?!
- Sebastian


On Fri, Sep 2, 2016 at 12:33 PM, Carl Kleffner <cmkleff...@gmail.com> wrote:

> maybe https://bitbucket.org/memotype/cffiwrap or
> https://github.com/andrewleech/cfficloak helps?
>
> C.
>
>
> 2016-09-02 11:16 GMT+02:00 Nathaniel Smith <n...@pobox.com>:
>
>> On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey
>> <p.e.creasey...@googlemail.com> wrote:
>> >> Date: Wed, 31 Aug 2016 13:28:21 +0200
>> >> From: Michael Bieri <mibi...@gmail.com>
>> >>
>> >> I'm not quite sure which approach is state-of-the-art as of 2016. How
>> would
>> >> you do it if you had to make a C/C++ library available in Python right
>> now?
>> >>
>> >> In my case, I have a C library with some scientific functions on
>> matrices
>> >> and vectors. You will typically call a few functions to configure the
>> >> computation, then hand over some pointers to existing buffers
>> containing
>> >> vector data, then start the computation, and finally read back the
>> data.
>> >> The library also can use MPI to parallelize.
>> >>
>> >
>> > Depending on how minimal and universal you want to keep things, I use
>> > the ctypes approach quite often, i.e. treat your numpy inputs and
>> > outputs as arrays of doubles etc. using the ndpointer(...) syntax. I
>> > find it works well if you have a small number of well-defined
>> > functions (not too many options) which are numerically very heavy.
>> > With this approach I usually wrap each method in python to check the
>> > inputs for contiguity, pass in the sizes etc. and allocate the numpy
>> > array for the result.
>>
>> FWIW, the broader Python community seems to have largely deprecated
>> ctypes in favor of cffi. Unfortunately I don't know if anyone has
>> written helpers like numpy.ctypeslib for cffi...
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
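The ctypes/ndpointer pattern described in the quoted message can be
sketched like this, using libc's `memcpy` as a stand-in for a real
scientific C routine (`ctypes.CDLL(None)` loads the current process's
symbols on POSIX systems; a real project would load its own shared
library instead):

```python
import ctypes

import numpy as np
from numpy.ctypeslib import ndpointer

# Declare the C function's signature once; ndpointer makes ctypes check
# dtype and contiguity of the numpy arguments automatically.
libc = ctypes.CDLL(None)
libc.memcpy.restype = ctypes.c_void_p
libc.memcpy.argtypes = [
    ndpointer(ctypes.c_double, flags="C_CONTIGUOUS"),  # dst
    ndpointer(ctypes.c_double, flags="C_CONTIGUOUS"),  # src
    ctypes.c_size_t,
]

def c_copy(src):
    # The Python wrapper enforces dtype/contiguity and allocates the
    # result array, as suggested in the quoted message.
    src = np.ascontiguousarray(src, dtype=np.float64)
    dst = np.empty_like(src)
    libc.memcpy(dst, src, src.nbytes)
    return dst

copied = c_copy(np.array([1.0, 2.0, 3.0]))
```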


Re: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?

2016-08-31 Thread Sebastian Berg
On Di, 2016-08-30 at 12:17 -0700, Stefan van der Walt wrote:
> On Mon, Aug 29, 2016, at 04:43, m...@telenczuk.pl wrote:
> > 
> > The documentation is not consistent and it mixes both NumPy and
> > Numpy.
> > For example, the reference manual uses both spellings in the
> > introduction
> > paragraph (http://docs.scipy.org/doc/numpy/reference/):
> > 
> > "This reference manual details functions, modules, and objects
> > included in Numpy, describing what they are and what they do.
> > For
> > learning how to use NumPy, see also NumPy User Guide."
> That's technically a bug: the official spelling is NumPy.  But, no
> one
> really cares :)
> 

I like the fact that this is all posted in: [Numpy-discussion] ;).

- Sebastian

> Stéfan
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Indexing issue with ndarrays

2016-08-26 Thread Sebastian Berg
On Fr, 2016-08-26 at 09:57 -0400, Joseph Fox-Rabinovitz wrote:
> 
> 
> On Thu, Aug 25, 2016 at 4:37 PM, Sebastian Berg
> <sebastian@sipsolutions.net> wrote:
> > On Do, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
> > > This issue recently came up on Stack Overflow:
> > > http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array
> > > The
> > > poster attempted to index an ndarray with a pandas boolean Series
> > > object (all False), but the result was as if he had indexed with
> > an
> > > array of integer zeros.
> > >
> > > Can someone explain this behavior? I can see two obvious
> > > possibilities:
> > > 1. ndarray checks if the input to __getitem__ is of exactly the
> > >    right type, not using isinstance.
> > > 2. pandas actually uses a wider datatype than boolean internally,
> > >    so indexing with the series is in fact indexing with an integer
> > >    array.
> > 
> > You are overthinking it ;). The reason is quite simply that the
> > logic
> > used to be:
> > 
> >  * Boolean array? -> think about boolean indexing.
> >  * Everything "array-like" (not caught earlier) -> convert to
> >    `intp` array and do integer indexing.
> > 
> > Now you might wonder why, but probably it is quite simply because
> > boolean indexing was tagged on later.
> > 
> > - Sebastian
> > 
> > 
> > > In my attempt to reproduce the poster's results, I got the
> > following
> > > warning:
> > > FutureWarning: in the future, boolean array-likes will be handled
> > as
> > > a boolean array index
> > > This indicates that the issue is probably #1 and that a fix is
> > > already on the way. Please correct me if I am wrong. Also, where
> > does
> > > the code for ndarray.__getitem__ live?
> > > Thanks,
> > >     -Joe
> > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> This makes perfect sense. I would like to help fix it if a fix is
> desired and has not been done already. Could you point me to where
> the "Boolean array?, etc." decision happens? I have had trouble
> navigating to `__getitem__` (which I assume is somewhere in
> np.core.multiarray C code.
> 

As the warning says, it is already fixed in a sense (we just have to
move forward with the deprecation, which you could maybe actually do at
this time). This all lives in mapping.c; without checking, there is a
function called something like "prepare index" which goes through all
the different types of indexing objects. It should be pretty
straightforward to find the warning.

The code base where this behaviour originated was completely different,
though (you would have to check out some pre-1.9 version of numpy if
you are interested in archeology).

- Sebastian




>     -Joe
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Indexing issue with ndarrays

2016-08-25 Thread Sebastian Berg
On Do, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
> This issue recently came up on Stack Overflow: http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array. The
> poster attempted to index an ndarray with a pandas boolean Series
> object (all False), but the result was as if he had indexed with an
> array of integer zeros.
> 
> Can someone explain this behavior? I can see two obvious
> possibilities:
> ndarray checks if the input to __getitem__ is of exactly the right
> type, not using instanceof.
> pandas actually uses a wider datatype than boolean internally, so
> indexing with the series is in fact indexing with an integer array.

You are overthinking it ;). The reason is quite simply that the logic
used to be:

 * Boolean array? -> think about boolean indexing.
 * Everything "array-like" (not caught earlier) -> convert to `intp`
array and do integer indexing.

Now you might wonder why, but probably it is quite simply because
boolean indexing was tacked on later.
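To make the two code paths concrete, here is a quick sketch (using an
explicit `astype` to show what the old integer-conversion fallback
amounted to):

```python
import numpy as np

a = np.arange(10, 20)
mask = np.zeros(10, dtype=bool)       # all-False boolean array

# Boolean indexing: selects nothing.
print(a[mask])                        # []

# The old fallback for non-bool-array inputs: convert to intp, giving
# an array of zeros, i.e. integer indexing that picks a[0] repeatedly.
print(a[mask.astype(np.intp)])        # [10 10 10 10 10 10 10 10 10 10]
```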

- Sebastian


> In my attempt to reproduce the poster's results, I got the following
> warning:
> FutureWarning: in the future, boolean array-likes will be handled as
> a boolean array index
> This indicates that the issue is probably #1 and that a fix is
> already on the way. Please correct me if I am wrong. Also, where does
> the code for ndarray.__getitem__ live?
> Thanks,
>     -Joe
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] How to trigger warnings for integer division in python 2

2016-08-19 Thread Sebastian Berg
On Fr, 2016-08-19 at 11:29 -0400, Stuart Berg wrote:
> Hi,
> 
> To help people migrate their code bases from Python 2 to Python 3,
> the python interpreter has a handy option '-3' that issues warnings
> at runtime.  One of the warnings is for integer division:
> 
> $ echo "print 3/2" > /tmp/foo.py
> $ python -3 /tmp/foo.py
> /tmp/foo.py:1: DeprecationWarning: classic int division
>   print 3/2
> 1
> 
> But no warnings are shown for division of numpy arrays, e.g. for a
> statement like this:
> print np.array([3]) / np.array([2])
> 
> I see that np.seterr can be used to issue certain types of division
> warnings, but not this one.  Is there a way to activate integer
> division warnings?  It would really help me migrate my application to
> Python 3.
> 

I don't think numpy implements any py3k warnings. It seems that it
would be possible, though. In newer versions we got more strict about
using floats instead of ints, so some of these places might follow up
with a warning quickly.
I guess the question is whether we should aim to add at least some of
these warnings, and whether someone is willing to put in the effort (I
suppose it is likely only a few places). I am not sure how easy they
are on the C side, but probably not difficult at all.
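For reference, the behaviour in question, assuming Python 3 semantics
(or Python 2 with `from __future__ import division`):

```python
import numpy as np

# True division always returns floats, matching Python 3's `/`:
print(np.array([3]) / np.array([2]))    # [1.5]

# Floor division is spelled `//` for both Python ints and NumPy arrays:
print(np.array([3]) // np.array([2]))   # [1]
```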

- Sebastian




> Thanks,
> Stuart
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Views and Increments

2016-08-08 Thread Sebastian Berg
On Mo, 2016-08-08 at 15:11 +0200, Anakim Border wrote:
> Dear List,
> 
> I'm experimenting with views and array indexing. I have written two
> code blocks that I was expecting to produce the same result.
> 
> First try:
> 
> >>> a = np.arange(10)
> >>> b = a[np.array([1,6,5])]
> >>> b += 1
> >>> a
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> 
> 
> Alternative version:
> 
> >>> a = np.arange(10)
> >>> a[np.array([1,6,5])] += 1
> >>> a
> array([0, 2, 2, 3, 4, 6, 7, 7, 8, 9])
> 
> 
> I understand what is happening in the first case. In fact, the
> documentation is quite clear on the subject:
> 
> For all cases of index arrays, what is returned is a copy of the
> original data, not a view as one gets for slices.
> 
> What about the second case? There, I'm not keeping a reference to the
> intermediate copy (b, in the first example). Still, I don't see why
> the update (to the copy) is propagating to the original array. Is
> there any implementation detail that I'm missing?
> 

The second case translates to:

tmp = a[np.array([1,6,5])] + 1
a[np.array([1,6,5])] = tmp

This expansion is done by Python itself, without any involvement from
NumPy. That is different from `arr += 1`, which is specifically defined
and translates to `np.add(arr, 1, out=arr)`.
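For completeness: if the intent is an unbuffered in-place increment at
(possibly repeated) fancy indices, `np.add.at` does exactly that:

```python
import numpy as np

a = np.arange(10)
np.add.at(a, [1, 6, 5], 1)   # operates on `a` directly, no temporary copy
print(a)                     # [0 2 2 3 4 6 7 7 8 9]
```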

- Sebastian


> Best,
>   ab
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



[Numpy-discussion] Euroscipy

2016-08-02 Thread Sebastian Berg
Hi all,

I am still pondering whether or not (and if which days) to go to
EuroScipy. Who else is there and would like to discuss a bit or
whatever else?

- Sebastian



Re: [Numpy-discussion] Is there any official position on PEP484/mypy?

2016-07-29 Thread Sebastian Berg
On Mi, 2016-07-27 at 20:07 +0100, Daniel Moisset wrote:
> 
> Hi, 
> 
> I work at Machinalis were we use a lot of numpy (and the pydata stack
> in general). Recently we've also been getting involved with mypy,
> which is a tool to type check (not on runtime, think of it as a
> linter) annotated python code (the way of annotating python types has
> been recently standarized in PEP 484).
> 
> As part of that involvement we've started creating type annotations
> for the Python libraries we use most, which include numpy. Mypy
> provides a way to specify types with annotations in separate files in
> case you don't have control over a library, so we have created an
> initial proof of concept at [1], and we are actively improving it.
> You can find some additional information about it and some problems
> we've found on the way at this blogpost [2].
> 
> What I wanted to ask is if the people involved on the numpy project
> are aware of PEP484 and if you have some interest in starting using
> them. The main benefit is that annotations serve as clear (and
> automatically testable) documentation for users, and secondary
> benefits is that users discovers bugs more quickly and that some IDEs
> (like pycharm) are starting to use this information for smart editor
> features (autocompletion, online checking, refactoring tools);
> eventually tools like jupyter could take advantage of these
> annotations in the future. And the cost of writing and including
> these are relatively low.
> 

There is currently no plan to do it as far as I know, but with these
things it is often more of a problem that someone volunteers to
maintain it than to convince everyone that it is a good idea.
If there is enough interest we could talk about hosting it on the numpy
github group as a separate project to make it a bit more
visible/obvious that such a project exists.

For inclusion in numpy, it seems to me that currently this would
probably be better off improved separately. In the long run, would it
be possible to do something like run all numpy tests and then check
whether the definitions cover (almost) everything, or test against the
documentation or so? Otherwise it might get tricky to keep things quite
up to date, at least until these type checks are very widely used. Also
I wonder if all or most of numpy can easily be put into it.

Anyway, it seems like a great project to have as much support for type
annotations as possible. I have never used them, but with editors
picking up on these things it sounds like it could be very useful in
the future.
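For anyone who has not seen PEP 484 annotations, a minimal sketch (a
generic example, not numpy-specific) — the annotations have no runtime
effect and are only checked by tools like mypy:

```python
from typing import List

def scale(values: List[float], factor: float) -> List[float]:
    # mypy would flag e.g. scale([1.0], "x") as a type error, while at
    # runtime the annotations are simply ignored.
    return [v * factor for v in values]

print(scale([1.0, 2.0], 3.0))   # [3.0, 6.0]
```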

- Sebastian


> We're doing the work anyway, but contributing our typespecs back
> could make it easier for users to benefit from this, and for us to
> maintain it and keep it in sync with future releases.
> 
> If you've never heard about PEP484 or mypy (it happens a lot) I'll be
> happy to clarify anything about it that might help understand this
> situation.
> 
> Thanks!
> 
> D.
> 
> 
> [1] https://github.com/machinalis/mypy-data 
> [2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/
> 
> -- 
> Daniel F. Moisset - UK Country Manager
> www.machinalis.com
> Skype: @dmoisset
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Custom Dtype/Units discussion

2016-07-15 Thread Sebastian Berg
On Do, 2016-07-14 at 18:46 -0500, Nathan Goldbaum wrote:
> We are in room 203
> 

You guys were probably doing that anyway, and I know you are too busy
right now. But if there were some nice ideas/plans from this
discussion, related to NumPy or not, I would appreciate it a lot if one
of you could send a few notes.

Have some nice last days at SciPy!

- Sebastian


> On Thursday, July 14, 2016, Nathan Goldbaum <nathan12...@gmail.com>
> wrote:
> > 
> > 
> > On Thu, Jul 14, 2016 at 11:05 AM, Nathaniel Smith <n...@pobox.com>
> > wrote:
> > > Where is "the downstairs lobby"? I can think of 4 places that I
> > > might describe that way depending on context :-)
> > > (Alternative: meet by the conference registration desk?)
> > > 
> > Sorry, that's what I meant.
> >  
> > > 
> > > On Jul 14, 2016 10:52, "Nathan Goldbaum" <nathan12...@gmail.com>
> > > wrote:
> > > > Fine with me as well. Meet in the downstairs lobby after the
> > > > lightning talks?
> > > > 
> > > > On Thu, Jul 14, 2016 at 10:49 AM, Ryan May <rma...@gmail.com>
> > > > wrote:
> > > > > Fine with me.
> > > > > 
> > > > > Ryan
> > > > > 
> > > > > > On Thu, Jul 14, 2016 at 12:48 AM, Nathaniel Smith <njs@pobox.com> wrote:
> > > > > > I have something at lunch, so dinner would be good for me
> > > > > > too.
> > > > > > > On Jul 13, 2016 7:46 PM, "Charles R Harris" <s...@gmail.com> wrote:
> > > > > > > Evening would work for me. Dinner?
> > > > > > > On Jul 13, 2016 2:43 PM, "Ryan May" <rma...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > On Mon, Jul 11, 2016 at 12:39 PM, Chris Barker <ar...@noaa.gov> wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum <han12...@gmail.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > Maybe this can be an informal BOF session?
> > > > > > > > > > 
> > > > > > > > > or  maybe a formal BoF? after all, how formal do they
> > > > > > > > > get?
> > > > > > > > > 
> > > > > > > > > Anyway, it was my understanding that we really needed
> > > > > > > > > to do some significant refactoring of how numpy deals
> > > > > > > > > with dtypes in order to do this kind of thing cleanly
> > > > > > > > > -- so where has that gone since last year? 
> > > > > > > > > 
> > > > > > > > > Maybe this conversation should be about how to build
> > > > > > > > > a more flexible dtype system generally, rather than
> > > > > > > > > specifically about unit support. (though unit support
> > > > > > > > > is a great use-case to focus on)
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > So Thursday's options seem to be in the standard BOF
> > > > > > > > slot (up against the Numfocus BOF), or doing something
> > > > > > > > that evening, which would overlap at least part of
> > > > > > > > multiple happy hour events. I lean towards evening.
> > > > > > > > Thoughts?
> > > > > > > > 
> > > > > > > > Ryan
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Ryan May
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ___
> > > > > > > > NumPy-Discussion mailing list
> > > > > > > > NumPy-Discussion@scipy.org
> > > > > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussio
> > > > > > > > n
> > > > > > > > 
> > > > > > > ___
> > > > > > > NumPy-Discussion mailing list
> > > > > > > NumPy-Discussion@scipy.org
> > > > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > > > > 
> > > > > > ___
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion@scipy.org
> > > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > > > 
> > > > > 
> > > > > 
> > > > > -- 
> > > > > Ryan May
> > > > > 
> > > > > 
> > > > > ___
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion@scipy.org
> > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > > 
> > > > 
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@scipy.org
> > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > 
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-07 Thread Sebastian Berg
On Mi, 2016-07-06 at 15:30 -0400, Benjamin Root wrote:
> I don't see how one could define a spec that would take an arbitrary
> array of indices at which to place new dimensions. By definition, you
> 

You just give a reordered range, so that (1, 0, 2) would be the current
3D version. If 1D, fill in `1` and `2`; if 2D, fill in only `2` (for
0D, add everything, of course).
However, I have my doubts that it is actually easier to understand than
to write yourself ;).

- Sebastian
 

> don't know how many dimensions are going to be added. If you knew,
> then you wouldn't be calling this function. I can only imagine simple
> rules such as 'left' or 'right' or maybe something akin to what
> at_least3d() implements.
> 
> On Wed, Jul 6, 2016 at 3:20 PM, Joseph Fox-Rabinovitz  @gmail.com> wrote:
> > On Wed, Jul 6, 2016 at 2:57 PM, Eric Firing <efir...@hawaii.edu>
> > wrote:
> > > On 2016/07/06 8:25 AM, Benjamin Root wrote:
> > >>
> > >> I wouldn't have the keyword be "where", as that collides with the
> > >> notion of "where" elsewhere in numpy.
> > >
> > >
> > > Agreed.  Maybe "side"?
> > 
> > I have tentatively changed it to "pos". The reason that I don't like
> > "side" is that it implies only a subset of the possible ways that the
> > position of the new dimensions can be specified. The current
> > implementation only puts things on one side or the other, but I have
> > considered also allowing an array of indices at which to place new
> > dimensions, and/or a dictionary keyed by the starting ndims. I do not
> > think "side" would be appropriate for these extended cases, even if
> > they are very unlikely to ever materialize.
> > 
> >     -Joe
> > 
> > > (I find atleast_1d and atleast_2d to be very helpful for handling
> > > inputs, as Ben noted; I'm skeptical as to the value of atleast_3d
> > > and atleast_nd.)
> > >
> > > Eric
> > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread Sebastian Berg
On Mi, 2016-07-06 at 10:22 -0400, Marten van Kerkwijk wrote:
> Hi All,
> 
> I'm with Nathaniel here, in that I don't really see the point of
> these routines in the first place: broadcasting takes care of many of
> the initial use cases one might think of, and others are generally
> not all that well served by them: the examples from scipy to me do
> not really support `at_least?d`, but rather suggest that little
> thought has been put into higher-dimensional objects which should be
> treated as stacks of row or column vectors. My sense is that we're
> better off developing the direction started with `matmul`, perhaps
> adding `matvecmul` etc.
> 
> More to the point of the initial inquiry: what is the advantage of
> having a general `np.at_leastnd` routine over doing


There is another wonky reason for using the atleast_?d functions, in
that they use reshape to be fully duck typed ;) (in newer versions at
least, probably mostly for sparse arrays, not sure).

Tend to agree though, especially considering the confusing order of 3d,
which I suppose is likely due to some linalg considerations. Of course
you could supply something like an insertion order of (1, 0, 2) to
denote the current 3D case in the nd one, but frankly it seems to me
likely harder to understand how it works than to write your own
functions to just do it.

Scipy uses the 3D case exactly never (well, once, in a test). I have my
doubts many would notice if we deprecated the 3D case, but then it is
likely more trouble than gain.
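The odd axis placement of `atleast_3d`, next to the `np.array(...,
ndmin=n)` spelling quoted below (note the actual keyword is `ndmin`,
not `ndim`, which only ever prepends axes):

```python
import numpy as np

v = np.arange(3)
print(np.atleast_2d(v).shape)        # (1, 3)    -- prepends an axis
print(np.atleast_3d(v).shape)        # (1, 3, 1) -- the confusing one
print(np.array(v, ndmin=3).shape)    # (1, 1, 3) -- only prepends axes
```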

- Sebastian



> ```
> np.array(a, copy=False, ndim=n)
> ```
> or, for a list of inputs,
> ```
> [np.array(a, copy=False, ndim=n) for a in input_list]
> ```
> 
> All the best,
> 
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?

2016-06-29 Thread Sebastian Berg
On Mi, 2016-06-29 at 02:03 -0700, Nathaniel Smith wrote:
> As a general rule I wouldn't worry too much about test speed. Speed
> is
> extremely dependent on exact workloads. And this is doubly so for
> test
> suites -- production workloads tend to do a small number of normal
> things over and over, while a good test suite never does the same
> thing twice and spends most of its time exercising weird edge
> conditions. So unless your actual workload is running the numpy test
> suite :-), it's probably not worth trying to track down.
> 

Agreed; the test suite, and likely also the few tests which might take
the most time in the end, could be arbitrarily weird and skewed. I
could for example imagine IO speed being a big factor. Also, depending
on system configuration (or numpy version), a different number of tests
may be run sometimes.

What might make somewhat more sense would be to compare some of the
benchmarks (`python runtests.py --bench`, if you have airspeed velocity
installed). While not extensive, a lot of those things at least do test
more typical use cases. Though in any case I think the user should
probably just test some other thing.

- Sebastian


> And yeah, numpy does not in general do automatic multithreading --
> the
> only automatic multithreading you should see is when using linear
> algebra functions (matrix multiply, eigenvalue calculations, etc.)
> that dispatch to the BLAS.
> 
> -n
> 
> On Wed, Jun 29, 2016 at 12:07 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> > 
> > 
> > 
> > On Wed, Jun 29, 2016 at 3:27 AM, Chris Barker - NOAA Federal
> > <chris.bar...@noaa.gov> wrote:
> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > > Now the user is writing back to say, "my test code is fast now,
> > > > but
> > > > numpy.test() is still about three times slower than <server we
> > > > don't manage>".  When I watch htop as numpy.test() executes,
> > > > sure enough,
> > > > it's using one core
> > > 
> > > > 
> > > > * if numpy.test() is supposed to be using multiple cores, why
> > > > isn't it,
> > > > when we've established with other test code that it's now using
> > > > multiple
> > > > cores?
> > > 
> > > Some numpy.linalg functions (like np.dot) will be using multiple
> > > cores,
> > > but np.linalg.test() takes only ~1% of the time of the full test
> > > suite.
> > > Everything else will be running single core. So your observations
> > > are not
> > > surprising.
> > > 
> > > 
> > > Though why it would run slower on one box than another comparable
> > > box is a
> > > mystery...
> > 
> > Maybe just hardware config? I see a similar difference between how
> > long the
> > test suite runs on TravisCI vs my linux desktop (the latter is
> > slower,
> > surprisingly).
> > 
> > Ralf
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> 
> 



Re: [Numpy-discussion] Support of '@='?

2016-06-22 Thread Sebastian Berg
On Mi, 2016-06-22 at 02:38 +0200, Hans Larsen wrote:
> I have Python 3-5-1 and NumPy 1-11! windows 64bits!
> When will by side 'M=M@P' be supported with 'M@=P'???:-(
> 

When someone gets around to making it a well-defined operation? ;) Just
to be clear, `M @= P` is probably not the same as `M = M @ P`; its
result should probably be `tmp = M @ P; M[...] = tmp`.
That operation needs a copy back into the original array from a
temporary array (you can't do a true in-place matrix product, because
you still need the old values of M while computing the result).

In case you are curious why it is an error at the moment: we can't have
Python fill it in as the not-in-place `M = M @ P` meaning, but copying
over the result is a bit annoying and nobody was quite sure about it,
so it was delayed.
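Spelled out, the copy-back version that `M @= P` would presumably have
to perform looks like this:

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.eye(2)

tmp = M @ P    # full product into a temporary; old M values still needed
M[...] = tmp   # then copy the result back into the original array
print(M)       # unchanged here, since P is the identity
```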

- Sebastian



Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-20 Thread Sebastian Berg
On Mo, 2016-06-20 at 15:15 -0700, Nathaniel Smith wrote:
> On Mon, Jun 20, 2016 at 3:09 PM,  <josef.p...@gmail.com> wrote:
> > 
> > On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac <alan.is...@gmail.com>
> > wrote:
> > > 
> > > On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
> > > > 
> > > > 
> > > > 1. What in principle is the best return type for int ** int
> > > > (which
> > > > Josef I think most properly rephrased as whether `**` should be
> > > > thought of as a float operator, like `/` in python3 and `sqrt`
> > > > etc.);
> > > 
> > > 
> > > Perhaps the question is somewhat different.  Maybe it is: what
> > > type
> > > should a user expect when the exponent is a Python int?  The
> > > obvious
> > > choices seem to be an object array of Python ints, or an array of
> > > floats.  So far, nobody has proposed the former, and concerns
> > > have
> > > been expressed about the latter.  More important, either would
> > > break
> > > the rule that the scalar type is not important in array
> > > operations,
> > > which seems like a good general rule (useful and easy to
> > > remember).
> > > 
> > > How much commitment is there to such a rule?  E.g.,
> > > np.int64(2**7)*np.arange(5,dtype=np.int8)
> > > violates this.  One thing that has come out of this
> > > discussion for me is that the actual rules in play are
> > > hard to keep track of.  Are they all written down in
> > > one place?
> > > 
> > > I suspect there is general support for the idea that if someone
> > > explicitly specifies the same dtype for the base and the
> > > exponent then the result should also have that dtype.
> > > I think this is already true for array exponentiation
> > > and for scalar exponentiation.
> > > 
> > > One other thing that a user might expect, I believe, is that
> > > any type promotion rules for scalars and arrays will be the same.
> > > This is not currently the case, and that feels like an
> > > inconsistency.  But is it an inconsistency?  If the rule is that
> > > that array type dominates the scalar type, that may
> > > be understandable, but then it should be a firm rule.
> > > In this case, an exponent that is a Python int should not
> > > affect the dtype of the (array) result.
> > > 
> > > In sum, as a user, I've come around to Chuck's original proposal:
> > > integers raised to negative integer powers raise an error.
> > > My reason for coming around is that I believe it meshes
> > > well with a general rule that in binary operations the
> > > scalar dtypes should not influence the dtype of an array result.
> > > Otoh, it is unclear to me how much commitment there is to that
> > > rule.
> > > 
> > > Thanks in advance to anyone who can help me understand better
> > > the issues in play.
> > the main thing I get out of the discussion in this thread is that
> > this
> > is way to complicated.
> > 
> > which ints do I have?
> > 
> > is it python or one of the many numpy int types, or two different
> > (u)int types or maybe one is a scalar so it shouldn't count?
> > 
> > 
> > scalar dominates here
> > 
> > 
> > >>> (np.ones(5, np.int8) * 1.0).dtype
> > dtype('float64')
> > 
> > otherwise a huge amount of code would be broken that uses the *1.
> > trick
> I *think* the documented rule is that scalar *kind* matters (so we
> pay
> attention to it being a float) but scalar *type* doesn't (we ignore
> whether it's float64 versus float32) and scalar *value* doesn't (we
> ignore whether it's 1.0 or 2.0**53). Obviously even this is not 100%
> true, but I think it is the original intent.
> 

Except for int types, which force a result type large enough to hold
the input value.


> My suspicion is that a better rule would be: *Python* types (int,
> float, bool) are treated as having an unspecified width, but all
> numpy
> types/dtypes are treated the same regardless of whether they're a
> scalar or not. So np.int8(2) * 2 would return an int8, but np.int8(2)
> * np.int64(2) would return an int64. But this is totally separate
> from
> the issues around **, and would require a longer discussion and
> larger
> overhaul of the typing system.
> 

I agree with that. The rule makes sense for python types, but it
creates some oddities for numpy scalar types, which could probably just
be made to behave more like arrays.

- Sebastian


> -n
> 



Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-11 Thread Sebastian Berg
On Fr, 2016-06-10 at 20:16 +, Ian Henriksen wrote:
> On Fri, Jun 10, 2016 at 12:01 PM Nathaniel Smith <n...@pobox.com>
> wrote:
> > On Jun 10, 2016 10:50, "Alan Isaac" <alan.is...@gmail.com> wrote:
> > >
> > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
> > >>
> > >> You keep pounding on this example. It's a fine example, but,
> > >> c'mon. **2 is probably at least 100x more common in real source
> > >> code. Maybe 1000x more common. Why should we break the
> > >> common case for your edge case?
> > >
> > >
> > >
> > > It is hardly an "edge case".
> > > Again, **almost all** integer combinations overflow: that's the
> > > point.
> > When you say "almost all", you're assuming inputs that are
> > uniformly sampled integers. I'm much more interested in what
> > proportion of calls to the ** operator involve inputs that can
> > overflow, and in real life those inputs are very heavily biased
> > towards small numbers.
> > (I also think we should default to raising an error on overflow in
> > general, with a seterr switch to turn it off when desired. But
> > that's another discussion...)
> > -n
> > 
> Another thing that would need separate discussion...
> Making 64 bit integers default in more cases would help here.
> Currently arange gives 32 bit integers on 64 bit Windows, but
> 64 bit integers on 64 bit Linux/OSX. Using size_t (or even
> int64_t) as the default size would help with overflows in
> the more common use cases. It's a hefty backcompat
> break, but 64 bit systems are much more common now,
> and using 32 bit integers on 64 bit windows is a bit odd.
> Anyway, hopefully that's not too off-topic.
> Best,

I agree, at least on python 3 (the reason is that on python 3 the int
subclass thingy goes away, so it is less likely to break anything). I
think we could have a shot at this. It is quirky, but the current
inconsistency is pretty bad too (and probably has a lot of bugs out in
the wild, because of tests on systems where long is 64 bits).

A different issue, though; I wouldn't mind if someone ponders this a
bit more and maybe creates a pull request.
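To check what a given platform does (historically the default integer
followed C `long`: 32-bit on 64-bit Windows, 64-bit on 64-bit
Linux/OSX, while `intp` is always pointer-sized):

```python
import numpy as np

print(np.arange(5).dtype)   # the platform's default integer dtype
print(np.dtype(np.intp))    # pointer-sized: 64-bit on any 64-bit system
```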

- Sebastian

> Ian Henriksen
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-06 Thread Sebastian Berg
On Di, 2016-06-07 at 00:32 +0200, Jaime Fernández del Río wrote:
> On Mon, Jun 6, 2016 at 9:35 AM, Sebastian Berg <sebastian@sipsolution
> s.net> wrote:
> > On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> > >
> > >
> > > On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer <sho...@gmail.com>
> > > wrote:
> > > > If possible, I'd love to add new functions for "generalized
> > ufunc"
> > > > linear algebra, and then deprecate (or at least discourage)
> > using
> > > > the older versions with inferior broadcasting rules. Adding a
> > new
> > > > keyword arg means we'll be stuck with an awkward API for a long
> > > > time to come.
> > > >
> > > > There are three types of matrix/vector products for which
> > ufuncs
> > > > would be nice:
> > > > 1. matrix-matrix product (covered by matmul)
> > > > 2. matrix-vector product
> > > > 3. vector-vector (inner) product
> > > >
> > > > It's straightforward to implement either of the latter two
> > options by
> > > > inserting dummy dimensions and then calling matmul, but that's
> > a
> > > > pretty awkward API, especially for inner products.
> > Unfortunately,
> > > > we already use the two most obvious one word names for vector
> > inner
> > > > products (inner and dot). But on the other hand, one word names
> > are
> > > > not very descriptive, and the short name "dot" probably mostly
> > > > exists because of the lack of an infix operator.
> > > >
> > > > So I'll start by throwing out some potential new names:
> > > >
> > > > For matrix-vector products:
> > > > matvecmul (if it's worth making a new operator)
> > > >
> > > > For inner products:
> > > > vecmul (similar to matmul, but probably too ambiguous)
> > > > dot_product
> > > > inner_prod
> > > > inner_product
> > > >
> > > I was using mulmatvec, mulvecmat, mulvecvec back when I was
> > looking
> > > at this. I suppose the mul could also go in the middle, or maybe
> > > change it to x and put it in the middle: matxvec, vecxmat,
> > vecxvec.
> > >
> > 
> > Were not some of this part of the gufunc linalg functions and we
> > just
> > removed it because we were not sure about the API? Not sure
> > anymore,
> > but might be worth to have a look.
> We have
> from numpy.core.umath_tests import inner1d
> which does vectorized vector-vector multiplication, but it's
> undocumented.  There is also a matrix_multiply in that same module
> that does the obvious thing.
> And when gufuncs were introduced in linalg, there were a bunch of
> functions doing all sorts of operations without intermediate storage,
> e.g. sum3(a, b, c) -> a + b + c, that were removed before merging the
> PR. Wasn't involved at the time, so not sure what the rationale was.

I think it was probably just that the API was not thought out much.
Adding sum3 to linalg does seem a bit funny ;). I would not mind it in
numpy as such I guess, if it is quite a bit faster anyway, but maybe in
its own submodule for this kind of performance optimization.
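As a sketch of the dummy-dimension workaround Stephan describes (nothing here is a proposed API, just current numpy): stacked vector-vector products can be had by inserting length-1 axes before calling matmul, or more directly with einsum.

```python
import numpy as np

# Stacked vector-vector (inner) products, i.e. one scalar a[i] @ b[i]
# per row, via the dummy-dimension trick: matmul sees
# (n, 1, k) @ (n, k, 1) -> (n, 1, 1), which is then squeezed.
a = np.arange(12.).reshape(4, 3)
b = np.ones((4, 3))

inner = np.matmul(a[:, None, :], b[:, :, None])[:, 0, 0]

# einsum expresses the same contraction more directly.
inner_es = np.einsum('ij,ij->i', a, b)
```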

- Sebastian


> Since we are at it, should quadratic/bilinear forms get their own
> function too?  That is, after all, what the OP was asking for.
> Jaime
> -- 
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
> planes de dominación mundial.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-06 Thread Sebastian Berg
On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> 
> 
> On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer <sho...@gmail.com>
> wrote:
> > If possible, I'd love to add new functions for "generalized ufunc"
> > linear algebra, and then deprecate (or at least discourage) using
> > the older versions with inferior broadcasting rules. Adding a new
> > keyword arg means we'll be stuck with an awkward API for a long
> > time to come.
> > 
> > There are three types of matrix/vector products for which ufuncs
> > would be nice:
> > 1. matrix-matrix product (covered by matmul)
> > 2. matrix-vector product
> > 3. vector-vector (inner) product
> > 
> > It's straightforward to implement either of the latter two options by
> > inserting dummy dimensions and then calling matmul, but that's a
> > pretty awkward API, especially for inner products. Unfortunately,
> > we already use the two most obvious one word names for vector inner
> > products (inner and dot). But on the other hand, one word names are
> > not very descriptive, and the short name "dot" probably mostly
> > exists because of the lack of an infix operator.
> > 
> > So I'll start by throwing out some potential new names:
> > 
> > For matrix-vector products:
> > matvecmul (if it's worth making a new operator)
> > 
> > For inner products:
> > vecmul (similar to matmul, but probably too ambiguous)
> > dot_product
> > inner_prod
> > inner_product
> > 
> I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> at this. I suppose the mul could also go in the middle, or maybe
> change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
> 

Was not some of this part of the gufunc linalg functions, and we just
removed it because we were not sure about the API? Not sure anymore,
but might be worth having a look.

- Sebastian


> Chuck  
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Developer Meeting at EuroScipy?

2016-05-31 Thread Sebastian Berg
Hi all,

since we had decided to do a regular developers meeting last year, how
would EuroScipy (Aug. 23.-27., Erlangen, Germany) look as a possible
place and time to have one?
I believe EuroScipy would include a few people who were not able to
come to SciPy last year, and SciPy itself already seems to keep some of
the devs quite busy anyway.

Not having checked back for room availability at the conference or
anything, one option may be:

August 24. (parallel to the second/last day of Tutorials)

Does this seem like a good plan or do you have alternative suggestions
or scheduling difficulties with this?

Regards,

Sebastian

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-28 Thread Sebastian Berg
On Fr, 2016-05-27 at 22:51 +0200, Lion Krischer wrote:
> Hi all,
> 
> I was told to take this to the mailing list. Relevant pull request:
> https://github.com/numpy/numpy/pull/7686
> 
> 
> NumPy's FFT implementation caches some form of execution plan for
> each
> encountered input data length. This is currently implemented as a
> simple
> dictionary which can grow without bounds. Calculating lots of
> different
> FFTs thus causes a memory leak from the users' perspective. We
> encountered a real world situation where this is an issue.
> 
> The PR linked above proposes to replace the simple dictionary with an
> LRU (least recently used) cache. It will remove the least recently
> used
> pieces of data if it grows beyond a specified size (currently an
> arbitrary limit of 100 MB per cache). Thus almost all users will
> still
> benefit from the caches but their total memory size is now limited.
> 
> 
> Things to consider:
> 
> * This adds quite some additional complexity but I could not find a
> simple way to achieve the same result.
> * What is a good limit on cache size? I used 100 MB because it works
> for
> my use cases.
> 

I am +1 in principle, since I don't like that the cache can just grow
forever; personally, I see that as a bug right now. One that rarely
hits maybe, but a bug.

I guess if you have the time, the perfect thing would be if you could
time how big the cache's difference actually is, etc.?
The cache mostly holds a working array I think, so does it even help for
large arrays, or is the time spent on the allocation negligible anyway
then?
We also have a small array cache in numpy anyway nowadays (not sure how
small small is here). Maybe this already achieves everything that the
fftcache was designed for and we could even just disable it as default?

The complexity addition is a bit annoying, I must admit; on Python 3,
functools.lru_cache could be another option, but only there.
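A bounded LRU cache along these lines can be sketched with an `OrderedDict`; note this toy version bounds the number of entries, whereas the actual PR bounds total memory (roughly 100 MB per cache).

```python
from collections import OrderedDict

# Minimal sketch of a bounded LRU cache for FFT work buffers, keyed by
# input length.  Evicts the least recently used entry once full.
class LRUCache:
    def __init__(self, maxsize=8):
        self._data = OrderedDict()
        self._maxsize = maxsize

    def get(self, key):
        value = self._data.pop(key)   # KeyError if absent, like a dict
        self._data[key] = value       # re-insert as most recently used
        return value

    def put(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # drop least recently used

cache = LRUCache(maxsize=2)
cache.put(16, "plan for n=16")
cache.put(32, "plan for n=32")
cache.get(16)                    # touch 16, so 32 becomes the LRU entry
cache.put(64, "plan for n=64")   # evicts 32
```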

- Sebastian


> 
> Cheers!
> 
> Lion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] (no subject)

2016-04-27 Thread Sebastian Berg
On Mi, 2016-04-27 at 22:11 +0530, Saumyajit Dey wrote:
> Hi,
> 
> Thanks a lot for the reply. I am looking into the documentation
> already. Also is there any guide as to how the source code of Numpy
> is organised?
> 
> For example, when i write 
> 
> > np.power(2,3) 
> what is the workflow in terms of functions in different modules being
> called?
> 

No, there is not much. There are different paths/possibilities,
sometimes even intermingled. These are some:

1. Pure python functions, e.g. np.stack, np.delete, ...
   They are not hard to find/figure out (though some details may be);
   frankly, I usually just use the ?? magic in ipython.
2. Python shims for attributes, attributes usually go to the
   methods.c in numpy/core/src/multiarray
3. General C-Functions (Reshape, etc.) are usually in some specialized
   file in numpy/core/src/multiarray but wrapped by
   multiarraymodule.c, so you can backtrace from there. The exact calls
   can get pretty complex (even call back to python).
4. One important category (also tricky) are ufuncs. They form their own
   building block in the code base. The whole interplay of things is
   quite complex, so unless you need something specific I would be
   happy to understand all the things they can do for you and that
   they wrap C-functions working on a single axis in some sense.
   (code in numpy/core/src/umath)

Frankly, I often just "git grep ..." to find the right place to look
for something. It is not impossible to understand the logic behind the
files and what calls what, but I would claim it is usually faster to
grep for it if you are interested in something specific. There are some
more arcane things, such as code generation, but that is easier to ask
for/figure out for a specific problem. Things such as what happens when
a ufunc is called, and how the ufunc is created in the first place are
non-trivial.

NumPy has a few rather distinct building blocks. Ufuncs, the iterator,
general C-based shape/container functions, general python functions,
linear algebra, fft, polynomials, ...

I would argue that finding something that interests you and trying to
figure that out and asking us about it explicitly is probably best.
Honestly, I think all of us devs have at least two things in the above
list we know almost nothing about. E.g. you don't need to understand
details of the FFT implementation unless you want to actually change
something there.

There are some "easy issues" marked in the GitHub issue list, which may be
worth a shot if you like to just dive in. You could poke at one you
find interesting and then ask us (we might have tagged something as
"easy" but I would not guarantee all of them are, sometimes there are
unexpected difficulties or it is easy if you already know where to
look).

- Sebastian



> Regards,
> Saumyajit
> 
> Saumyajit Dey
> Junior Undergraduate Student:
> Department of Computer Science and Engineering
> National Institute of Technology
> Warangal (NITW), India
> Cell: +91-8885847028
> 
> On Wed, Apr 27, 2016 at 6:05 PM, Maniteja Nandana <
> maniteja.modesty...@gmail.com> wrote:
> > Hi,
> > Welcome! It would be a good exercise to look at the documentation
> > and tutorial for Numpy at http://docs.scipy.org/doc/
> > Also the lectures at the lectures at www.scipy-lectures.org might
> > be a interesting introduction to scientific python in numpy stack.
> > Hope it helps.
> > Happy learning !
> > Cheers,
> > Maniteja.
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do getitem/setitem already have GIL?

2016-04-21 Thread Sebastian Berg
This is for a custom dtype? getitem and setitem work with objects and
must have the GIL in any case, so yes, you can safely assume this. I
think you probably have to set the flags correctly for some things to
work right. So that the PyDataType_REFCHK macro gives the right result.
Though frankly, I am just poking at it here, could be all wrong.

- Sebastian



On Mi, 2016-04-20 at 19:22 +, Steve Mitchell wrote:
> When writing custom PyArray_ArrFuncs getitem() and setitem(), do I
> need to acquire the GIL, or has it been done for me already by the
> caller?
>  
>   --Steve
>  
> http://docs.scipy.org/doc/numpy/reference/c-api.array.html?highlight=
> allow_c_api#group-2
> http://docs.scipy.org/doc/numpy/reference/internals.code-explanations
> .html?highlight=gil#function-call
> http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.
> html
> https://docs.python.org/2/c-api/init.html#thread-state-and-the-global
> -interpreter-lock
>  
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] f2py: ram usage

2016-04-10 Thread Sebastian Berg
On So, 2016-04-10 at 12:04 +0200, Vasco Gervasi wrote:
> Hi all,
> I am trying to write some code to do calculation onto an array: for
> each row I need to do some computation and have a number as return.
> To speed up the process I wrote a fortran subroutine that is called
> from python [using f2py] for each row of the array, so the input of
> this subroutine is a row and the output is a number.
> This method works but I saw some speed advantage if I pass the entire
> array to fortran and then, inside fortran, call the subroutine that
> does the math; so in this case I pass an array and return a vector.
> But I noticed that when python pass the array to fortran, the array
> is copied and the RAM usage double.

I expect that the fortran code needs your arrays to be fortran
contiguous, so the wrappers need to copy them.

The easiest solution may be to create your array in python with the
`order="F"` flag. NumPy does have a tendency to prefer C-order, though,
and may use it as the default when doing something with an "F" ordered
array.

That said, I have never used f2py, so these are just well founded
guesses.
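As a minimal sketch of the `order="F"` suggestion (assuming the wrapper only copies to obtain Fortran contiguity; I have no f2py-specific code here):

```python
import numpy as np

# Create the array Fortran-contiguous from the start, so a wrapper
# that needs F-order has nothing to copy.
a = np.zeros((1000, 3), order="F")
print(a.flags['F_CONTIGUOUS'])   # True

# For an existing C-ordered array, np.asfortranarray converts with at
# most one explicit copy, instead of a hidden copy on every call.
c = np.arange(12.).reshape(3, 4)
f = np.asfortranarray(c)
print(f.flags['F_CONTIGUOUS'])   # True
```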

- Sebastian



> Is there a way to "move" the array to fortran, I don't care if the
> array is lost after the call to fortran.
> The pyd module is generated using: python f2py.py -c --opt="-ffree
> -form -Ofast" -m F2PYMOD F2PYMOD.f90
> 
> Thanks
> Vasco
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread Sebastian Berg
On Do, 2016-04-07 at 13:29 -0400, josef.p...@gmail.com wrote:
> 
> 
> On Thu, Apr 7, 2016 at 1:20 PM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > On Do, 2016-04-07 at 11:56 -0400, josef.p...@gmail.com wrote:
> > >
> > >
> > 
> > 
> > 
> > >
> > > I don't think numpy treats 1d arrays as row vectors. numpy has C
> > > -order for axis preference which coincides in many cases with row
> > > vector behavior.
> > >
> > 
> > Well, the broadcasting rules are that (n,) should typically behave
> > similarly
> > to (1, n). However, for dot/matmul and @ the rules are stretched to
> > mean "the one dimensional thing that gives an inner product" (using
> > matmul since my python has no @ yet):
> > 
> > In [12]: a = np.arange(20)
> > In [13]: b = np.arange(20)
> > 
> > In [14]: np.matmul(a, b)
> > Out[14]: 2470
> > 
> > In [15]: np.matmul(a, b[:, None])
> > Out[15]: array([2470])
> > 
> > In [16]: np.matmul(a[None, :], b)
> > Out[16]: array([2470])
> > 
> > In [17]: np.matmul(a[None, :], b[:, None])
> > Out[17]: array([[2470]])
> > 
> > which indeed gives us a fun thing, because if you look at the last
> > line, the outer product equivalent would be:
> > 
> > outer = np.matmul(a[None, :].T, b[:, None].T)
> > 
> > Now if I go back to the earlier example:
> > 
> > a.T @ b
> > 
> > Does not achieve the outer product at all with using T2, since
> > 
> > a.T2 @ b.T2  # only correct for a, but not for b
> > a.T2 @ b  # b attempts to be "inner", so does not work
> >  
> > It almost seems to me that the example is a counter example,
> > because on
> > first sight the `T2` attribute would still leave you with no
> > shorthand
> > for `b`.
> a.T2 @ b.T2.T
> 

Actually, better would be:

  a.T2 @ b.T2.T2  # Aha?

And true enough, that works, but is it still reasonably easy to find
and understand?
Or is it just fiddling around, the same as you would try `a[:, None]`
before finding `a[None, :]`, maybe worse?

- Sebastian

> 
> (T2 as shortcut for creating a[:, None] that's neat, except if a is
> already 2D)
> 
> Josef
>  
> >  
> > I understand the pain of having to write (and parse) things like
> > `arr[:, np.newaxis]` or reshape. I also understand
> > the
> > idea of a shorthand for vectorized matrix operations. That is, an
> > argument for a T2 attribute which errors on 1D arrays (not sure I
> > like
> > it, but that is a different issue).
> > 
> > However, it seems that implicit adding of an axis which only works
> > half
> > the time does not help too much? I have to admit I don't write
> > these
> > things too much, but I wonder if it would not help more if we just
> > provided some better information/link to longer examples in the
> > "dimension mismatch" error message?
> > 
> > In the end it is quite simple: like Nathaniel, I think I would like
> > to
> > see some example code where the result obviously looks easier than
> > before. With the `@` operator that was the case, with the
> > "dimension
> > adding logic" I am not so sure, plus it seems it may add other
> > pitfalls.
> > 
> > - Sebastian
> > 
> > 
> > 
> > 
> > > >>> np.concatenate(([[1,2,3]], [4,5,6]))
> > > Traceback (most recent call last):
> > >   File "<pyshell#63>", line 1, in 
> > > np.concatenate(([[1,2,3]], [4,5,6]))
> > > ValueError: arrays must have same number of dimensions
> > >
> > > It's not an uncommon exception for me.
> > >
> > > Josef
> > >
> > > >
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@scipy.org
> > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread Sebastian Berg
On Do, 2016-04-07 at 11:56 -0400, josef.p...@gmail.com wrote:
> 
> 



> 
> I don't think numpy treats 1d arrays as row vectors. numpy has C
> -order for axis preference which coincides in many cases with row
> vector behavior.
> 

Well, the broadcasting rules are that (n,) should typically behave similarly
to (1, n). However, for dot/matmul and @ the rules are stretched to
mean "the one dimensional thing that gives an inner product" (using
matmul since my python has no @ yet):

In [12]: a = np.arange(20)
In [13]: b = np.arange(20)

In [14]: np.matmul(a, b)
Out[14]: 2470

In [15]: np.matmul(a, b[:, None])
Out[15]: array([2470])

In [16]: np.matmul(a[None, :], b)
Out[16]: array([2470])

In [17]: np.matmul(a[None, :], b[:, None])
Out[17]: array([[2470]])

which indeed gives us a fun thing, because if you look at the last
line, the outer product equivalent would be:

outer = np.matmul(a[None, :].T, b[:, None].T)

Now if I go back to the earlier example:

a.T @ b

Does not achieve the outer product at all with using T2, since

a.T2 @ b.T2  # only correct for a, but not for b
a.T2 @ b  # b attempts to be "inner", so does not work

It almost seems to me that the example is a counter example, because on
first sight the `T2` attribute would still leave you with no shorthand
for `b`.
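For comparison, the plain outer product already has short spellings in current numpy that need no transpose tricks at all:

```python
import numpy as np

a = np.arange(20)
b = np.arange(20)

# Broadcasting a column against a row gives the outer product without
# any transposes, as does np.outer.
outer1 = a[:, None] * b[None, :]
outer2 = np.outer(a, b)

# The matmul spelling needs explicit 2D column/row operands.
outer3 = np.matmul(a[:, None], b[None, :])
```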

I understand the pain of having to write (and parse) things like
`arr[:, np.newaxis]` or reshape. I also understand the
idea of a shorthand for vectorized matrix operations. That is, an
argument for a T2 attribute which errors on 1D arrays (not sure I like
it, but that is a different issue).

However, it seems that implicit adding of an axis which only works half
the time does not help too much? I have to admit I don't write these
things too much, but I wonder if it would not help more if we just
provided some better information/link to longer examples in the
"dimension mismatch" error message?

In the end it is quite simple: like Nathaniel, I think I would like to
see some example code where the result obviously looks easier than
before. With the `@` operator that was the case, with the "dimension
adding logic" I am not so sure, plus it seems it may add other
pitfalls.

- Sebastian




> >>> np.concatenate(([[1,2,3]], [4,5,6]))
> Traceback (most recent call last):
>   File "<pyshell#63>", line 1, in 
> np.concatenate(([[1,2,3]], [4,5,6]))
> ValueError: arrays must have same number of dimensions
> 
> It's not an uncommon exception for me.
> 
> Josef
> 
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multidimension array access in C via Python API

2016-04-05 Thread Sebastian Berg
On Di, 2016-04-05 at 20:19 +0200, Sebastian Berg wrote:
> On Di, 2016-04-05 at 09:48 -0700, mpc wrote:
> > The idea is that I want to thin a large 2D buffer of x,y,z points
> > to
> > a given
> > resolution by dividing the data into equal sized "cubes" (i.e.
> > resolution is
> > number of cubes along each axis) and averaging the points inside
> > each
> > cube
> > (if any).
> > 
> 
> Another point is timing your actual code, in this case you could have
> noticed that all time is spent in the while loops and little time in
> those min/max calls before.
> 
> Algorithms, or what you do is the other thing. In the end, it seems
> your code is just a high dimensional histogram. Though I am not sure
> if
> numpy's histogram is fast, I am sure it vastly outperforms this and
> if

Hmm, well maybe not quite, but it seems similar to a weighted
histogram.


> you are interested in how it does this, you could even check its
> code,
> it is just in python (though numpy internally always has quite a lot
> of
> fun boilerplate to make sure of corner cases).
> 
> And if you search for what you want to do first, you may find faster
> solutions easily, batteries included and all, there are a lot of
> tools
> out there. The other point is, don't optimize much if you don't know
> exactly what you need to optimize.
> 
> - Sebastian
> 
> > 
> > *# Fill up buffer data for demonstration purposes with initial
> > buffer of
> > size 10,000,000 to reduce to 1,000,000
> > size = 1000
> > buffer = np.ndarray(shape=(size,3), dtype=np.float)
> > # fill it up
> > buffer[:, 0] = np.random.ranf(size)
> > buffer[:, 1] = np.random.ranf(size)
> > buffer[:, 2] = np.random.ranf(size)
> > 
> > # Create result buffer to size of cubed resolution (i.e. 100 ^
> > 3
> > =
> > 1,000,000)
> > resolution = 100
> > thinned_buffer = np.ndarray(shape=(resolution ** 3,3),
> > dtype=np.float)
> > 
> > # Trying to convert the following into C to speed it up
> > x_buffer = buffer[:, 0]
> > y_buffer = buffer[:, 1]
> > z_buffer = buffer[:, 2]
> > min_x = x_buffer.min()
> > max_x = x_buffer.max()
> > min_y = y_buffer.min()
> > max_y = y_buffer.max()
> > min_z = z_buffer.min()
> > max_z = z_buffer.max()
> > z_block = (max_z - min_z) / resolution
> > x_block = (max_x - min_x) / resolution
> > y_block = (max_y - min_y) / resolution
> > 
> > current_idx = 0
> > x_idx = min_x
> > while x_idx < max_x:
> > y_idx = min_y
> > while y_idx < max_y:
> > z_idx = min_z
> > while z_idx < max_z:
> > inside_block_points = np.where((x_buffer >= x_idx)
> > &
> > 
> >  (x_buffer <=
> > x_idx + x_block) &
> > 
> >  (y_buffer >=
> > y_idx) &
> > 
> >  (y_buffer <=
> > y_idx + y_block) &
> > 
> >  (z_buffer >=
> > z_idx) &
> > 
> >  (z_buffer <=
> > z_idx + z_block))
> > if inside_block_points[0].size > 0:
> > mean_point =
> > buffer[inside_block_points[0]].mean(axis=0)
> > thinned_buffer[current_idx] = mean_point
> > current_idx += 1
> > z_idx += z_block
> > y_idx += y_block
> > x_idx += x_block
> > return thin_buffer
> > *
> > 
> > 
> > 
> > --
> > View this message in context: 
> > http://numpy-discussion.10968.n7.nabble
> > .com/Multidimension-array-access-in-C-via-Python-API
> > -tp42710p42726.html
> > Sent from the Numpy-discussion mailing list archive at Nabble.com.
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multidimension array access in C via Python API

2016-04-05 Thread Sebastian Berg
On Di, 2016-04-05 at 09:48 -0700, mpc wrote:
> The idea is that I want to thin a large 2D buffer of x,y,z points to
> a given
> resolution by dividing the data into equal sized "cubes" (i.e.
> resolution is
> number of cubes along each axis) and averaging the points inside each
> cube
> (if any).
> 

Another point is timing your actual code, in this case you could have
noticed that all time is spent in the while loops and little time in
those min/max calls before.

Algorithms, or what you do is the other thing. In the end, it seems
your code is just a high dimensional histogram. Though I am not sure if
numpy's histogram is fast, I am sure it vastly outperforms this and if
you are interested in how it does this, you could even check its code,
it is just in python (though numpy internally always has quite a lot of
fun boilerplate to take care of corner cases).

And if you search for what you want to do first, you may find faster
solutions easily, batteries included and all, there are a lot of tools
out there. The other point is, don't optimize much if you don't know
exactly what you need to optimize.
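A sketch of the weighted-histogram idea with `np.histogramdd`, as a hypothetical replacement for the triple while loop (the sizes, unit cube range, and bin edges are assumptions for illustration):

```python
import numpy as np

np.random.seed(0)
size, resolution = 100000, 10     # toy numbers
points = np.random.rand(size, 3)  # points in the unit cube

# Counts per cube plus per-cube coordinate sums, from which the mean
# point of each non-empty cube follows -- no Python-level loops over
# the grid.
edges = [np.linspace(0.0, 1.0, resolution + 1)] * 3
counts, _ = np.histogramdd(points, bins=edges)
sums = np.stack(
    [np.histogramdd(points, bins=edges, weights=points[:, k])[0]
     for k in range(3)],
    axis=-1,
)

mask = counts > 0
means = sums[mask] / counts[mask, None]  # shape (n_nonempty, 3)
```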

- Sebastian

> 
> *# Fill up buffer data for demonstration purposes with initial
> buffer of
> size 10,000,000 to reduce to 1,000,000
> size = 1000
> buffer = np.ndarray(shape=(size,3), dtype=np.float)
> # fill it up
> buffer[:, 0] = np.random.ranf(size)
> buffer[:, 1] = np.random.ranf(size)
> buffer[:, 2] = np.random.ranf(size)
> 
> # Create result buffer to size of cubed resolution (i.e. 100 ^ 3
> =
> 1,000,000)
> resolution = 100
> thinned_buffer = np.ndarray(shape=(resolution ** 3,3),
> dtype=np.float)
> 
> # Trying to convert the following into C to speed it up
> x_buffer = buffer[:, 0]
> y_buffer = buffer[:, 1]
> z_buffer = buffer[:, 2]
> min_x = x_buffer.min()
> max_x = x_buffer.max()
> min_y = y_buffer.min()
> max_y = y_buffer.max()
> min_z = z_buffer.min()
> max_z = z_buffer.max()
> z_block = (max_z - min_z) / resolution
> x_block = (max_x - min_x) / resolution
> y_block = (max_y - min_y) / resolution
> 
> current_idx = 0
> x_idx = min_x
> while x_idx < max_x:
> y_idx = min_y
> while y_idx < max_y:
> z_idx = min_z
> while z_idx < max_z:
> inside_block_points = np.where((x_buffer >= x_idx) &
> 
>  (x_buffer <=
> x_idx + x_block) &
> 
>  (y_buffer >=
> y_idx) &
> 
>  (y_buffer <=
> y_idx + y_block) &
> 
>  (z_buffer >=
> z_idx) &
> 
>  (z_buffer <=
> z_idx + z_block))
> if inside_block_points[0].size > 0:
> mean_point =
> buffer[inside_block_points[0]].mean(axis=0)
> thinned_buffer[current_idx] = mean_point
> current_idx += 1
> z_idx += z_block
> y_idx += y_block
> x_idx += x_block
> return thin_buffer
> *
> 
> 
> 
> --
> View this message in context: http://numpy-discussion.10968.n7.nabble
> .com/Multidimension-array-access-in-C-via-Python-API
> -tp42710p42726.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multi-dimensional array of splitted array

2016-03-23 Thread Sebastian Berg
On Mi, 2016-03-23 at 10:02 -0400, Joseph Fox-Rabinovitz wrote:
> On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI
>  wrote:
> > Thanks Eric. I already checked that. It's not what I want. I think
> > I wasn't
> > clear about what I wanted.
> > 
> > I want to split each column but I want to do it for each column and
> > end up
> > with an array. Here's the result I wish to have:
> > 
> > array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]],
> >[[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]],
> >[[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]],
> > dtype=object)
> > 
> 
> Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/g
> enerated/numpy.stack.html#numpy.stack)
> to the result. It will merge the arrays the way you want.

Oh sorry, nvm. As an object array, it works of course.


> -Joe
> 
> > 
> > Sincerely Yours,
> > Bob
> > 
> > 
> > 
> > On 23/03/2016 14:17, Eric Moore wrote:
> > 
> > Try just calling np.array_split on the full 2D array.  It splits
> > along a
> > particular axis, which is selected using the axis argument of
> > np.array_split.  The axis to split along defaults to the first so
> > the two
> > calls to np.array_split below are exactly equivalent.
> > 
> > In [16]: a = np.c_[:10,10:20,20:30]
> > 
> > 
> > In [17]: np.array_split(a, [2,5,8])
> > 
> > Out[17]:
> > 
> > [array([[ 0, 10, 20],
> > 
> > [ 1, 11, 21]]), array([[ 2, 12, 22],
> > 
> > [ 3, 13, 23],
> > 
> > [ 4, 14, 24]]), array([[ 5, 15, 25],
> > 
> > [ 6, 16, 26],
> > 
> > [ 7, 17, 27]]), array([[ 8, 18, 28],
> > 
> > [ 9, 19, 29]])]
> > 
> > 
> > In [18]: np.array_split(a, [2,5,8], 0)
> > 
> > Out[18]:
> > 
> > [array([[ 0, 10, 20],
> > 
> > [ 1, 11, 21]]), array([[ 2, 12, 22],
> > 
> > [ 3, 13, 23],
> > 
> > [ 4, 14, 24]]), array([[ 5, 15, 25],
> > 
> > [ 6, 16, 26],
> > 
> > [ 7, 17, 27]]), array([[ 8, 18, 28],
> > 
> > [ 9, 19, 29]])]
> > 
> > 
> > Eric
> > 
> > 
> > 
> > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI <
> > bobmerh...@gmail.com>
> > wrote:
> > > 
> > > Hello,
> > > 
> > > I have a multi-dimensional array whose columns I would like to
> > > split.
> > > 
> > > For example consider,
> > > 
> > > dat = np.array([np.arange(10),np.arange(10,20),
> > > np.arange(20,30)]).T
> > > 
> > > array([[ 0, 10, 20],
> > >[ 1, 11, 21],
> > >[ 2, 12, 22],
> > >[ 3, 13, 23],
> > >[ 4, 14, 24],
> > >[ 5, 15, 25],
> > >[ 6, 16, 26],
> > >[ 7, 17, 27],
> > >[ 8, 18, 28],
> > >[ 9, 19, 29]])
> > > 
> > > 
> > > I already can split one column at a time:
> > > 
> > > np.array_split(dat[:,0], [2,5,8])
> > > 
> > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8,
> > > 9])]
> > > 
> > > 
> > > How can I extend this for all columns and (overwrite or) have a
> > > new
> > > multi-dimensional array?
> > > 
> > > Thank you,
> > > Bob
> > > 
> > > 
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > 
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] Multi-dimensional array of splitted array

2016-03-23 Thread Sebastian Berg
On Mi, 2016-03-23 at 10:02 -0400, Joseph Fox-Rabinovitz wrote:
> On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI
> <bobmerh...@gmail.com> wrote:
> > Thanks Eric. I already checked that. It's not what I want. I think
> > I wasn't
> > clear about what I wanted.
> > 
> > I want to split each column but I want to do it for each column and
> > end up
> > with an array. Here's the result I wish to have:
> > 
> > array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]],
> >[[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]],
> >[[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]],
> > dtype=object)
> > 
> 
> Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/g
> enerated/numpy.stack.html#numpy.stack)
> to the result. It will merge the arrays the way you want.
> 


It is simply impossible to stack those arrays like he wants; the result
is not a valid (rectangular) array.

- Sebastian
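For reference, since `np.stack` cannot produce the ragged result, the object array Bob asks for can be built explicitly instead; a minimal sketch using only the data and split points from the thread:

```python
import numpy as np

# Example data from the thread: a (10, 3) array.
dat = np.array([np.arange(10), np.arange(10, 20), np.arange(20, 30)]).T

# Split every column at the same indices and collect the pieces
# into a (n_columns, n_pieces) object array.
splits = [np.array_split(dat[:, j], [2, 5, 8]) for j in range(dat.shape[1])]
result = np.empty((dat.shape[1], len(splits[0])), dtype=object)
for j, pieces in enumerate(splits):
    for k, piece in enumerate(pieces):
        result[j, k] = piece

print(result.shape)   # (3, 4)
print(result[0, 1])   # second chunk of the first column
```

Filling a pre-allocated `dtype=object` array elementwise avoids NumPy trying (and failing) to broadcast the unequal-length pieces into a regular array.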


> -Joe
> 
> > 
> > Sincerely Yours,
> > Bob
> > 
> > 
> > 
> > On 23/03/2016 14:17, Eric Moore wrote:
> > 
> > Try just calling np.array_split on the full 2D array.  It splits
> > along a
> > particular axis, which is selected using the axis argument of
> > np.array_split.  The axis to split along defaults to the first so
> > the two
> > calls to np.array_split below are exactly equivalent.
> > 
> > In [16]: a = np.c_[:10,10:20,20:30]
> > 
> > 
> > In [17]: np.array_split(a, [2,5,8])
> > 
> > Out[17]:
> > 
> > [array([[ 0, 10, 20],
> > 
> > [ 1, 11, 21]]), array([[ 2, 12, 22],
> > 
> > [ 3, 13, 23],
> > 
> > [ 4, 14, 24]]), array([[ 5, 15, 25],
> > 
> > [ 6, 16, 26],
> > 
> > [ 7, 17, 27]]), array([[ 8, 18, 28],
> > 
> > [ 9, 19, 29]])]
> > 
> > 
> > In [18]: np.array_split(a, [2,5,8], 0)
> > 
> > Out[18]:
> > 
> > [array([[ 0, 10, 20],
> > 
> > [ 1, 11, 21]]), array([[ 2, 12, 22],
> > 
> > [ 3, 13, 23],
> > 
> > [ 4, 14, 24]]), array([[ 5, 15, 25],
> > 
> > [ 6, 16, 26],
> > 
> > [ 7, 17, 27]]), array([[ 8, 18, 28],
> > 
> > [ 9, 19, 29]])]
> > 
> > 
> > Eric
> > 
> > 
> > 
> > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI <
> > bobmerh...@gmail.com>
> > wrote:
> > > 
> > > Hello,
> > > 
> > > I have a multi-dimensional array whose columns I would like to
> > > split.
> > > 
> > > For example consider,
> > > 
> > > dat = np.array([np.arange(10),np.arange(10,20),
> > > np.arange(20,30)]).T
> > > 
> > > array([[ 0, 10, 20],
> > >[ 1, 11, 21],
> > >[ 2, 12, 22],
> > >[ 3, 13, 23],
> > >[ 4, 14, 24],
> > >[ 5, 15, 25],
> > >[ 6, 16, 26],
> > >[ 7, 17, 27],
> > >[ 8, 18, 28],
> > >[ 9, 19, 29]])
> > > 
> > > 
> > > I already can split one column at a time:
> > > 
> > > np.array_split(dat[:,0], [2,5,8])
> > > 
> > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8,
> > > 9])]
> > > 
> > > 
> > > How can I extend this for all columns and (overwrite or) have a
> > > new
> > > multi-dimensional array?
> > > 
> > > Thank you,
> > > Bob
> > > 
> > > 
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > 
> > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] GSoC?

2016-03-06 Thread Sebastian Berg
On Fr, 2016-03-04 at 21:20 +, Pauli Virtanen wrote:
> Thu, 11 Feb 2016 00:02:52 +0100, Ralf Gommers kirjoitti:
> [clip]
> > OK first version:
> > https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas I kept
> > some
> > of the ideas from last year, but removed all potential mentors as
> > the
> > same people may not be available this year - please re-add
> > yourselves
> > where needed.
> > 
> > And to everyone who has a good idea, and preferably is willing to
> > mentor
> > for that idea: please add it to that page.
> 
> I probably don't have bandwidth for mentoring, but as the Numpy 
> suggestions seem to be mostly "hard" problems, we can add another 
> one:
> 
> ## Dealing with overlapping input/output data
> 
> Numpy operations where output arrays overlap with 
> input arrays can produce unexpected results.
> A simple example is
> ```
> x = np.arange(100*100).reshape(100,100)
> x += x.T# <- undefined result!
> ```
> The task is to change Numpy so that the results
> here become similar to as if the input arrays
> overlapping with output were separate (here: `x += x.T.copy()`).
> The challenge here lies in doing this without sacrificing 
> too much performance or memory efficiency.
> 
> Initial steps toward solving this problem were taken in
> https://github.com/numpy/numpy/pull/6166
> where a simplest available algorithm for detecting
> if arrays overlap was added. However, this is not yet
> utilized in ufuncs. An initial attempt to sketch what 
> should be done is at https://github.com/numpy/numpy/issues/6272
> and issues referenced therein.
> 

Since I like the idea, I copy pasted it into the GSoC project ideas
wiki.

- Sebastian
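The project idea above is easy to demonstrate; a small sketch assuming NumPy >= 1.11, where the overlap check from gh-6166 is exposed as `np.shares_memory`:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
y = x.T  # a transposed view: same memory, different strides

# shares_memory does an exact (worst-case slow) overlap check;
# may_share_memory is the cheap, conservative bounds check.
print(np.shares_memory(x, y))     # True
print(np.may_share_memory(x, y))  # True

# The safe way to do the in-place update from the example above:
# hand the ufunc a non-overlapping copy of the input.
expected = x + x.T.copy()
x += x.T.copy()
print(np.array_equal(x, expected))  # True
```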


> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] How to check for memory leaks?

2016-02-23 Thread Sebastian Berg
On Di, 2016-02-23 at 12:36 -0700, Charles R Harris wrote:
> Hi All,
> 
> I'm suspecting a possible memory leak in 1.11.x, what is the best way
> to check for that?
> 

I would like to learn better methods, but I tried valgrind with
--track-origins and a full leak check, just thinking maybe it shows
something. Unfortunately, I got the error below midway; I think I ran
it before successfully (with only minor obvious leaks due to things
like module-wide strings). My guess is that the error does not say much
at all, but I have no clue :) (running without --track-origins now,
maybe it helps).

- Sebastian

Error:

VEX temporary storage exhausted.
Pool = TEMP,  start 0x38f91668 curr 0x39456190 end 0x394561a7 (size
500)

vex: the `impossible' happened:
   VEX temporary storage exhausted.



> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] reshaping empty array bug?

2016-02-23 Thread Sebastian Berg
On Di, 2016-02-23 at 21:06 +0100, Sebastian Berg wrote:
> On Di, 2016-02-23 at 14:57 -0500, Benjamin Root wrote:
> > I'd be more than happy to write up the patch. I don't think it
> > would
> > be quite like make zeros be ones, but it would be along those
> > lines.
> > One case I need to wrap my head around is to make sure that a 0
> > would
> > happen if the following was true:
> > 
> > > > > a = np.ones((0, 5*64))
> > > > > a.shape = (-1, 5, 64)
> > 
> > EDIT: Just tried the above, and it works as expected (zero in the
> > first dim)!
> > 
> > Just tried out a couple of other combos:
> > > > > a.shape = (-1,)
> > > > > a.shape
> > (0,)
> > > > > a.shape = (-1, 5, 64)
> > > > > a.shape
> > (0, 5, 64)
> > 
> 
> Seems right to me on first sight :). (I don't like shape assignments
> though, who cares for one extra view). Well, maybe 1 instead of 0
> (ignore 0s), but if the result for -1 is to use 1 and the shape is 0
> convert the 1 back to 0. But it is starting to sound a bit tricky,
> though I think it might be straightforward (i.e. no real traps and
> when it works it always is what you expect).
> The main point is, whether you can design cases where the conversion
> back to 0 hides bugs by not failing when it should. And whether that
> would be a tradeoff we are willing to accept.
> 

Another thought. Maybe you can figure out the -1 correctly, if there is
no *other* 0 involved. If there is any other 0, I could imagine
problems.
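The "ignore the zeros" rule under discussion can be prototyped outside NumPy; `resolve_reshape` below is a hypothetical helper for illustration, not a NumPy function:

```python
def resolve_reshape(old_shape, new_shape):
    """Resolve a single -1 in new_shape, treating 0-sized axes as 1
    for the division (the rule discussed in the thread)."""
    known = 1
    for n in new_shape:
        if n > 0:
            known *= n
    # Total size with zeros replaced by 1, so 0-size arrays still work.
    total = 1
    for n in old_shape:
        if n > 0:
            total *= n
    if total % known != 0:
        raise ValueError("cannot resolve -1")
    missing = total // known
    # If the original array was empty, the inferred axis becomes 0.
    if 0 in old_shape:
        missing = 0
    return tuple(missing if n == -1 else n for n in new_shape)

print(resolve_reshape((0, 320), (-1, 5, 64)))   # (0, 5, 64)
print(resolve_reshape((10, 320), (-1, 5, 64)))  # (10, 5, 64)
```

The open question from the thread remains whether converting the inferred value back to 0 can hide real size mismatches that should have raised.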

> - Sebastian
> 
> 
> > 
> > This is looking more and more like a bug to me.
> > 
> > Ben Root
> > 
> > 
> > On Tue, Feb 23, 2016 at 1:58 PM, Sebastian Berg <
> > sebast...@sipsolutions.net> wrote:
> > > On Di, 2016-02-23 at 11:45 -0500, Benjamin Root wrote:
> > > > but, it isn't really ambiguous, is it? The -1 can only refer to
> > > > a single dimension, and if you ignore the zeros in the original
> > > > and new shape, the -1 is easily solvable, right?
> > > 
> > > I think if there is a simple logic (like using 1 for all zeros in
> > > both
> > > input and output shape for the -1 calculation), maybe we could do
> > > it. I
> > > would like someone to think about it carefully that it would not
> > > also
> > > allow some unexpected generalizations. And at least I am getting
> > > a
> > > BrainOutOfResourcesError right now trying to figure that out :).
> > > 
> > > - Sebastian
> > > 
> > > 
> > > > Ben Root
> > > > 
> > > > On Tue, Feb 23, 2016 at 11:41 AM, Warren Weckesser <
> > > > warren.weckes...@gmail.com> wrote:
> > > > > 
> > > > > 
> > > > > On Tue, Feb 23, 2016 at 11:32 AM, Benjamin Root <
> > > > > ben.v.r...@gmail.com> wrote:
> > > > > > Not exactly sure if this should be a bug or not. This came
> > > > > > up in a fairly general function of mine to process satellite
> > > > > > data. Unexpectedly, one of the satellite files had no scans
> > > > > > in it, triggering an exception when I tried to reshape the
> > > > > > data from it.
> > > > > > 
> > > > > > > > > import numpy as np
> > > > > > > > > a = np.zeros((0, 5*64))
> > > > > > > > > a.shape
> > > > > > (0, 320)
> > > > > > > > > a.shape = (0, 5, 64)
> > > > > > > > > a.shape
> > > > > > (0, 5, 64)
> > > > > > > > > a.shape = (0, 5*64)
> > > > > > > > > a.shape = (0, 5, -1)
> > > > > > Traceback (most recent call last):
> > > > > >   File "", line 1, in 
> > > > > > ValueError: total size of new array must be unchanged
> > > > > > 
> > > > > > So, if I know all of the dimensions, I can reshape just
> > > > > > fine. But if I wanted to use the nifty -1 semantic, it
> > > > > > completely falls apart. I can see arguments going either way
> > > > > > for whether this is a bug or not.

Re: [Numpy-discussion] reshaping empty array bug?

2016-02-23 Thread Sebastian Berg
On Di, 2016-02-23 at 14:57 -0500, Benjamin Root wrote:
> I'd be more than happy to write up the patch. I don't think it would
> be quite like make zeros be ones, but it would be along those lines.
> One case I need to wrap my head around is to make sure that a 0 would
> happen if the following was true:
> 
> >>> a = np.ones((0, 5*64))
> >>> a.shape = (-1, 5, 64)
> 
> EDIT: Just tried the above, and it works as expected (zero in the
> first dim)!
> 
> Just tried out a couple of other combos:
> >>> a.shape = (-1,)
> >>> a.shape
> (0,)
> >>> a.shape = (-1, 5, 64)
> >>> a.shape
> (0, 5, 64)
> 

Seems right to me on first sight :). (I don't like shape assignments
though, who cares for one extra view). Well, maybe 1 instead of 0
(ignore 0s), but if the result for -1 is to use 1 and the shape is 0
convert the 1 back to 0. But it is starting to sound a bit tricky,
though I think it might be straightforward (i.e. no real traps and
when it works it always is what you expect).
The main point is, whether you can design cases where the conversion
back to 0 hides bugs by not failing when it should. And whether that
would be a tradeoff we are willing to accept.

- Sebastian


> 
> This is looking more and more like a bug to me.
> 
> Ben Root
> 
> 
> On Tue, Feb 23, 2016 at 1:58 PM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > On Di, 2016-02-23 at 11:45 -0500, Benjamin Root wrote:
> > > but, it isn't really ambiguous, is it? The -1 can only refer to a
> > > single dimension, and if you ignore the zeros in the original and
> > > new shape, the -1 is easily solvable, right?
> > 
> > I think if there is a simple logic (like using 1 for all zeros in
> > both
> > input and output shape for the -1 calculation), maybe we could do
> > it. I
> > would like someone to think about it carefully that it would not
> > also
> > allow some unexpected generalizations. And at least I am getting a
> > BrainOutOfResourcesError right now trying to figure that out :).
> > 
> > - Sebastian
> > 
> > 
> > > Ben Root
> > >
> > > On Tue, Feb 23, 2016 at 11:41 AM, Warren Weckesser <
> > > warren.weckes...@gmail.com> wrote:
> > > >
> > > >
> > > > On Tue, Feb 23, 2016 at 11:32 AM, Benjamin Root <
> > > > ben.v.r...@gmail.com> wrote:
> > > > > Not exactly sure if this should be a bug or not. This came up
> > in
> > > > > a fairly general function of mine to process satellite data.
> > > > > Unexpectedly, one of the satellite files had no scans in it,
> > > > > triggering an exception when I tried to reshape the data from
> > it.
> > > > >
> > > > > >>> import numpy as np
> > > > > >>> a = np.zeros((0, 5*64))
> > > > > >>> a.shape
> > > > > (0, 320)
> > > > > >>> a.shape = (0, 5, 64)
> > > > > >>> a.shape
> > > > > (0, 5, 64)
> > > > > >>> a.shape = (0, 5*64)
> > > > > >>> a.shape = (0, 5, -1)
> > > > > Traceback (most recent call last):
> > > > >   File "", line 1, in 
> > > > > ValueError: total size of new array must be unchanged
> > > > >
> > > > > So, if I know all of the dimensions, I can reshape just fine.
> > But
> > > > > if I wanted to use the nifty -1 semantic, it completely falls
> > > > > apart. I can see arguments going either way for whether this
> > is a
> > > > > bug or not.
> > > > >
> > > >
> > > > When you try `a.shape = (0, 5, -1)`, the size of the third
> > > > dimension is ambiguous.  From the Zen of Python:  "In the face
> > of
> > > > ambiguity, refuse the temptation to guess."
> > > >
> > > > Warren
> > > >
> > > >
> > > >
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Ben Root
> > > > >
> > > > > ___
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion@scipy.org
> > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > >
> > > >
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@scipy.org
> > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] reshaping empty array bug?

2016-02-23 Thread Sebastian Berg
On Di, 2016-02-23 at 11:45 -0500, Benjamin Root wrote:
> but, it isn't really ambiguous, is it? The -1 can only refer to a
> single dimension, and if you ignore the zeros in the original and new
> shape, the -1 is easily solvable, right?

I think if there is a simple logic (like using 1 for all zeros in both
input and output shape for the -1 calculation), maybe we could do it. I
would like someone to think about it carefully that it would not also
allow some unexpected generalizations. And at least I am getting a
BrainOutOfResourcesError right now trying to figure that out :).

- Sebastian


> Ben Root
> 
> On Tue, Feb 23, 2016 at 11:41 AM, Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
> > 
> > 
> > On Tue, Feb 23, 2016 at 11:32 AM, Benjamin Root <
> > ben.v.r...@gmail.com> wrote:
> > > Not exactly sure if this should be a bug or not. This came up in
> > > a fairly general function of mine to process satellite data.
> > > Unexpectedly, one of the satellite files had no scans in it,
> > > triggering an exception when I tried to reshape the data from it.
> > > 
> > > >>> import numpy as np
> > > >>> a = np.zeros((0, 5*64))
> > > >>> a.shape
> > > (0, 320)
> > > >>> a.shape = (0, 5, 64)
> > > >>> a.shape
> > > (0, 5, 64)
> > > >>> a.shape = (0, 5*64)
> > > >>> a.shape = (0, 5, -1)
> > > Traceback (most recent call last):
> > >   File "", line 1, in 
> > > ValueError: total size of new array must be unchanged
> > > 
> > > So, if I know all of the dimensions, I can reshape just fine. But
> > > if I wanted to use the nifty -1 semantic, it completely falls
> > > apart. I can see arguments going either way for whether this is a
> > > bug or not.
> > > 
> > 
> > When you try `a.shape = (0, 5, -1)`, the size of the third
> > dimension is ambiguous.  From the Zen of Python:  "In the face of
> > ambiguity, refuse the temptation to guess."
> > 
> > Warren
> > 
> > 
> > 
> > > 
> > > Thoughts?
> > > 
> > > Ben Root
> > > 
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Sebastian Berg
On Mi, 2016-02-17 at 21:53 +, G Young wrote:
> "Explicit is better than implicit" - can't argue with that.  It
> doesn't seem like the PR has gained much traction, so I'll close it.
> 

Thanks for the effort though! Sometimes we get a bit carried away with
doing fancy stuff, and I guess the idea is likely a bit too fancy for
wide application.

- Sebastian


> On Wed, Feb 17, 2016 at 9:27 PM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > On Mi, 2016-02-17 at 22:10 +0100, Sebastian Berg wrote:
> > > On Mi, 2016-02-17 at 20:48 +, Robert Kern wrote:
> > > > On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoun...@gmail.com>
> > > > wrote:
> > > >
> > > > > Josef: I don't think we are making people think more. 
> > They're
> > > > > all
> > > > keyword arguments, so if you don't want to think about them,
> > then
> > > > you
> > > > leave them as the defaults, and everyone is happy.
> > > >
> > > > I believe that Josef has the code's reader in mind, not the
> > code's
> > > > writer. As a reader of other people's code (and I count
> > > > 6-months-ago-me as one such "other people"), I am sure to eventually
> > encounter
> > > > all of the different variants, so I will need to know all of
> > them.
> > > >
> > >
> > > Completely agree. Greg, if you need more than a few minutes to
> > > explain
> > > it in this case, there seems little point. It seems to me even
> > the
> > > worst cases of your examples would be covered by writing code
> > like:
> > >
> > > np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
> > >
> > > And *everyone* will immediately know what is meant with just
> > minor
> > > extra effort for writing it. We should keep the analogy to
> > "range" as
> > > much as possible. Anything going far beyond that, can be
> > confusing.
> > > On
> > > first sight I am not convinced that there is a serious
> > convenience
> > > gain
> > > by doing magic here, but this is a simple case:
> > >
> > > "Explicit is better than implicit"
> > >
> > > since writing the explicit code is easy. It might also create
> > weird
> > > bugs if the completely unexpected (most users would probably not
> > even
> > > realize it existed) happens and you get huge numbers because you
> > > happened to have a `low=0` in there. Especially your point 2)
> > seems
> > > confusing. As for 3) if I see `np.random.randint(high=3)` I think
> > I
> > > would assume [0, 3)
> > >
> > 
> > OK, that was silly, that is what happens of course. So it is
> > explicit
> > in the sense that you have to pass in at least one `None` explicitly.
> > 
> > But I am still not sure that the added convenience is big and easy
> > to understand [1]. If it were always lowest for low and highest for
> > high, I could remember it, but it seems more complex (though None
> > does also look a bit like "default", and "default" is 0 for low).
> > 
> > - Sebastian
> > 
> > [1] As in the trade-off between added complexity vs. added
> > convenience.
> > 
> > 
> > > Additionally, I am not sure the maximum int range is such a
> > common
> > > need
> > > anyway?
> > >
> > > - Sebastian
> > >
> > >
> > > > --
> > > > Robert Kern
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@scipy.org
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Sebastian Berg
On Mi, 2016-02-17 at 22:10 +0100, Sebastian Berg wrote:
> On Mi, 2016-02-17 at 20:48 +, Robert Kern wrote:
> > On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoun...@gmail.com>
> > wrote:
> > 
> > > Josef: I don't think we are making people think more.  They're
> > > all
> > keyword arguments, so if you don't want to think about them, then
> > you
> > leave them as the defaults, and everyone is happy. 
> > 
> > I believe that Josef has the code's reader in mind, not the code's
> > writer. As a reader of other people's code (and I count 6-months
> > -ago
> > -me as one such "other people"), I am sure to eventually encounter
> > all of the different variants, so I will need to know all of them.
> > 
> 
> Completely agree. Greg, if you need more than a few minutes to
> explain
> it in this case, there seems little point. It seems to me even the
> worst cases of your examples would be covered by writing code like:
> 
> np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
> 
> And *everyone* will immediately know what is meant with just minor
> extra effort for writing it. We should keep the analogy to "range" as
> much as possible. Anything going far beyond that, can be confusing.
> On
> first sight I am not convinced that there is a serious convenience
> gain
> by doing magic here, but this is a simple case:
> 
> "Explicit is better than implicit"
> 
> since writing the explicit code is easy. It might also create weird
> bugs if the completely unexpected (most users would probably not even
> realize it existed) happens and you get huge numbers because you
> happened to have a `low=0` in there. Especially your point 2) seems
> confusing. As for 3) if I see `np.random.randint(high=3)` I think I
> would assume [0, 3)
> 

OK, that was silly, that is what happens of course. So it is explicit
in the sense that you have to pass in at least one `None` explicitly.

But I am still not sure that the added convenience is big and easy to
understand [1]. If it were always lowest for low and highest for high, I
could remember it, but it seems more complex (though None does also look
a bit like "default", and "default" is 0 for low).

- Sebastian

[1] As in the trade-off between added complexity vs. added convenience.


> Additionally, I am not sure the maximum int range is such a common
> need
> anyway?
> 
> - Sebastian
> 
> 
> > --
> > Robert Kern
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Sebastian Berg
On Mi, 2016-02-17 at 20:48 +, Robert Kern wrote:
> On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoun...@gmail.com> wrote:
> 
> > Josef: I don't think we are making people think more.  They're all
> keyword arguments, so if you don't want to think about them, then you
> leave them as the defaults, and everyone is happy. 
> 
> I believe that Josef has the code's reader in mind, not the code's
> writer. As a reader of other people's code (and I count 6-months-ago
> -me as one such "other people"), I am sure to eventually encounter
> all of the different variants, so I will need to know all of them.
> 

Completely agree. Greg, if you need more than a few minutes to explain
it in this case, there seems little point. It seems to me even the
worst cases of your examples would be covered by writing code like:

np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)

And *everyone* will immediately know what is meant with just minor
extra effort for writing it. We should keep the analogy to "range" as
much as possible. Anything going far beyond that, can be confusing. On
first sight I am not convinced that there is a serious convenience gain
by doing magic here, but this is a simple case:

"Explicit is better than implicit"

since writing the explicit code is easy. It might also create weird
bugs if the completely unexpected (most users would probably not even
realize it existed) happens and you get huge numbers because you
happened to have a `low=0` in there. Especially your point 2) seems
confusing. As for 3) if I see `np.random.randint(high=3)` I think I
would assume [0, 3)

Additionally, I am not sure the maximum int range is such a common need
anyway?

- Sebastian
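The explicit spelling advocated above looks like this in practice, assuming NumPy >= 1.11 where `randint` gained its `dtype` argument:

```python
import numpy as np

info = np.iinfo(np.uint8)

# Bounds written out explicitly: low is inclusive, high is exclusive,
# matching the semantics of range().
sample = np.random.randint(info.min, info.max + 1, size=1000,
                           dtype=np.uint8)

print(sample.dtype)        # uint8
print(info.min, info.max)  # 0 255
```

Everyone reading this immediately sees that the full uint8 range is intended, at the cost of one extra `np.iinfo` lookup.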


> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] PyData Madrid

2016-02-17 Thread Sebastian Berg
On Mi, 2016-02-17 at 20:59 +0100, Jaime Fernández del Río wrote:
> Hi all,
> 
> I just found out there is a PyData Madrid happening in early April,
> and it would feel wrong not to go, it being my hometown and all.
> 
> Aside from the usual "Who else is going? We should meet!" I was also
> thinking of submitting a proposal for a talk.  My idea was to put
> something together on "The future of NumPy indexing" and use it as an
> opportunity to raise awareness and hopefully gather feedback from
> users on the proposed changes, in sort of a "if the mountain won't
> come to Muhammad" type of thing.
> 

I guess you do know my last name means mountain in German? But if
Muhammad might come, I should really improve my Arabic ;).

In any case sounds good to me if you like to do it, I don't think I
will go, though it sounds nice.
There are probably some other, bigger things for "the future of NumPy",
both impact- and work-wise, such as the dtype ideas, which might be
nice to mention on such an occasion.

Of course I like feedback on indexing, though (not sure if your ideas
for indexing go further than what I think of right now). That NEP and
code is sitting there after all with a decent chunk done and pretty
much working (though relatively far from finished with testing and
subclasses). Plus we have to make sure we get the details right, and
there a talk may really help too :).

- Sebastian


> Thoughts? Comments? Anyone else going or thinking about going?
> 
> Jaime
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
> planes de dominación mundial.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-16 Thread Sebastian Berg
On Di, 2016-02-16 at 00:13 -0500, josef.p...@gmail.com wrote:
> 
> 
> On Tue, Feb 16, 2016 at 12:09 AM, <josef.p...@gmail.com> wrote:
> > 
> > 

> > 
> > 
> > Or, it forces everyone to watch out for the color of the ducks :)
> > 
> > It's just a number, whether it's python scalar, numpy scalar, 1D or
> > 2D.
> > And once we squeeze, we cannot iterate over it anymore. 
> > 
> > 
> > This looks like the last problem with have in statsmodels master.
> > Part of the reason that 0.10 hurt quite a bit is that we are using
> > in statsmodels some of the grey zones so we don't have to commit to
> > a specific usage. Even if a user or developer tries a "weird" case,
> > it works for most of the results, but breaks in some unknown
> > places. 
> > 
> > 
> I meant 1.11 here.
>  

The reason for this part is that `arr[np.array([1])]` is very different
from `arr[np.array(1)]`. If you allowed
`operator.index(np.array([1]))`, then `list[np.array([1])]` would not
give equivalent results for lists and arrays.

The normal array result cannot work for lists. We had open bug reports
about it. Of course I dislike it in any case ;), but that is the
reasoning behind being a bit more restrictive for `__index__`.
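As an aside (an editor's sketch, not from the original mail), the difference between the 0-d and the 1-element 1-d case can be seen directly; names here are illustrative:

```python
import numpy as np

arr = np.arange(5) * 10          # array([ 0, 10, 20, 30, 40])
lst = list(range(0, 50, 10))     # the equivalent plain list

# A 0-d integer array behaves as a plain scalar index: both containers agree.
i0 = np.array(1)
assert int(arr[i0]) == lst[int(i0)] == 10

# A 1-element 1-d array is *fancy indexing* for ndarrays: it selects a
# length-1 subarray, something a list simply cannot mirror.
i1 = np.array([1])
assert arr[i1].shape == (1,)
assert arr[i1].tolist() == [10]
```

Treating `np.array([1])` as an index scalar would therefore make list and array indexing disagree, which is the restriction the mail describes.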

- Sebastian


> > (In the current case a cryptic exception would be raised if the
> > user has two constant columns in the regression. Which is fine for
> > some usecases but not for every result.)
> > 
> > Josef
> >  
> > > 
> > > Chuck
> > > 


Re: [Numpy-discussion] Subclassing ma.masked_array, code broken after version 1.9

2016-02-15 Thread Sebastian Berg
On Mo, 2016-02-15 at 17:06 +, Gutenkunst, Ryan N - (rgutenk) wrote:
> Thank Jonathan,
> 
> Good to confirm this isn't something inappropriate I'm doing. I give
> up transparency here in my application, so I'll just work around it.
> I leave it up to wiser numpy heads as to whether it's worth altering
> these numpy.ma functions to enable subclassing.
> 

Frankly, when it comes to masked array stuff, at least I am puzzled
most of the time, so input is very welcome.
Most of the people currently contributing barely use masked arrays as
far as I know, and sometimes it is hard to make good calls. It is not
the easiest code base, and any feedback or nudging is important. A new
release is about to come out, and if you feel there is a serious
regression, we may want to push for fixing it (or even better, you may
have time to suggest a fix yourself).

- Sebastian


> Best,
> Ryan
> 
> On Feb 13, 2016, at 11:48 AM, Jonathan Helmus <jjhel...@gmail.com>
> wrote:
> 
> > 
> > 
> > On 2/12/16 6:06 PM, Gutenkunst, Ryan N - (rgutenk) wrote:
> > > Hello all,
> > > 
> > > In 2009 I developed an application that uses a subclass of masked
> > > arrays as a central data object. My subclass Spectrum possesses
> > > additional attributes along with many custom methods. It was very
> > > convenient to be able to use standard numpy functions for doing
> > > arithmetic on these objects. However, my code broke with numpy
> > > 1.10. I've finally had a chance to track down the problem, and I
> > > am hoping someone can suggest a workaround.
> > > 
> > > See below for an example, which is as minimal as I could concoct.
> > > In this case, I have a Spectrum object that I'd like to take the
> > > logarithm of using numpy.ma.log, while preserving the value of
> > > the "folded" attribute. Up to numpy 1.9, this worked as expected,
> > > but in numpy 1.10 and 1.11 the attribute is not preserved.
> > > 
> > > The change in behavior appears to be driven by a commit made on
> > > Jun 16th, 2015 by Marten van Kerkwijk. In particular, the commit
> > > changed _MaskedUnaryOperation.__call__ so that the result array's
> > > update_from method is no longer called with the input array as
> > > the argument, but rather the result of the numpy UnaryOperation
> > > (old line 889, new line 885). Because that UnaryOperation doesn't
> > > carry my new attribute, it's not present for update_from to
> > > access. I notice that similar changes were made to
> > > MaskedBinaryOperation, although I haven't tested those. It's not
> > > clear to me from the commit message why this particular change
> > > was made, so I don't know whether this new behavior is
> > > intentional.
> > > 
> > > I know that subclassing arrays isn't widely encouraged, but it
> > > has been very convenient in my code. Is it still possible to
> > > subclass masked_array in such a way that functions like
> > > numpy.ma.log preserve additional attributes? If so, can someone
> > > point me in the right direction?
> > > 
> > > Thanks!
> > > Ryan
> > > 
> > > *** Begin example
> > > 
> > > import numpy
> > > print 'Working with numpy {0}'.format(numpy.__version__)
> > > 
> > > class Spectrum(numpy.ma.masked_array):
> > > def __new__(cls, data, mask=numpy.ma.nomask,
> > > data_folded=None):
> > > subarr = numpy.ma.masked_array(data, mask=mask,
> > > keep_mask=True,
> > >shrink=True)
> > > subarr = subarr.view(cls)
> > > subarr.folded = data_folded
> > > 
> > > return subarr
> > > 
> > > def __array_finalize__(self, obj):
> > > if obj is None:
> > > return
> > > numpy.ma.masked_array.__array_finalize__(self, obj)
> > > self.folded = getattr(obj, 'folded', 'unspecified')
> > > 
> > > def _update_from(self, obj):
> > > print('Input to update_from: {0}'.format(repr(obj)))
> > > numpy.ma.masked_array._update_from(self, obj)
> > > self.folded = getattr(obj, 'folded', 'unspecified')
> > > 
> > > def __repr__(self):
> > > return 'Spectrum(%s, folded=%s)'\
> > > % (str(self), str(self.folded))
> > > 
> > > fs1 = Spectrum([2,3,4.], data_folded=True)
> > > fs2 = numpy.ma.log(fs1)
> > > print('fs2.folded status: {0

Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-15 Thread Sebastian Berg
On So, 2016-02-14 at 23:41 -0800, Antony Lee wrote:
> I wonder whether numpy is using the "old" iteration protocol
> (repeatedly calling x[i] for increasing i until StopIteration is
> reached?)  A quick timing shows that it is indeed slower.
> ... actually it's not even clear to me what qualifies as a sequence
> for `np.array`:
> 
> class C:   
> def __iter__(self):   
> return iter(range(10)) # [0... 9] under the new iteration
> protocol
> def __getitem__(self, i):
> raise IndexError # [] under the old iteration protocol
> 

Numpy currently uses PySequence_Fast, but it has to do a two-pass
algorithm (to find the dtype and dimensions), so the range is converted
to a list twice by this call. That explains the speed advantage of
converting to a list manually.
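As an editor's side note (not from the original mail): when the length of the iterable is known up front, `np.fromiter` with an explicit `count` consumes it in a single pass with no intermediate list; a small sketch:

```python
import numpy as np

n = 10
r = range(n)

# fromiter walks the iterable once; count=n lets numpy preallocate the
# result, so no intermediate list is ever built.
a = np.fromiter(r, dtype=np.intp, count=n)

assert a.shape == (n,)
assert a.tolist() == list(range(n))
```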

- Sebastian


> np.array(C())
> ===> array(<__main__.C object at 0x7f3f2128>, dtype=object)
> 
> So how can np.array(range(...)) even work?
> 
> 2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:
> > 
> > 
> > On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris <
> > charlesr.har...@gmail.com> wrote:
> > > 
> > > 
> > > On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <
> > > ralf.gomm...@gmail.com> wrote:
> > > > 
> > > > 
> > > > On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <
> > > > antony@berkeley.edu> wrote:
> > > > > re: no reason why...
> > > > > This has nothing to do with Python2/Python3 (I personally
> > > > > stopped using Python2 at least 3 years ago.)  Let me put it
> > > > > this way instead: if Python3's "range" (or Python2's
> > > > > "xrange") was not a builtin type but a type provided by
> > > > > numpy, I don't think it would be controversial at all to
> > > > > provide an `__array__` special method to efficiently convert
> > > > > it to a ndarray.  It would be the same if `np.array` used a
> > > > > `functools.singledispatch` dispatcher rather than an
> > > > > `__array__` special method (which is obviously not possible
> > > > > for chronological reasons).
> > > > > 
> > > > > re: iterable vs iterator: check for the presence of the
> > > > > __next__ special method (or isinstance(x, Iterable) vs.
> > > > > isinstance(x, Iterator) and not isinstance(x, Iterable))
> > > > > 
> > > > I think it's good to do something about this, but it's not
> > > > clear what the exact proposal is. I could image one or both of:
> > > > 
> > > >   - special-case the range() object in array (and
> > > > asarray/asanyarray?) such that array(range(N)) becomes as fast
> > > > as arange(N).
> > > >   - special-case all iterators, such that array(range(N))
> > > > becomes as fast as deque(range(N))
> > > > 
> > > I think the last wouldn't help much, as numpy would still need to
> > > determine dimensions and type.  I assume that is one of the
> > > reason sparse itself doesn't do that.
> > > 
> > Not orders of magnitude, but this shows that there's something to
> > optimize for iterators:
> > 
> > In [1]: %timeit np.array(range(10))
> > 100 loops, best of 3: 14.9 ms per loop
> > 
> > In [2]: %timeit np.array(list(range(10)))
> > 100 loops, best of 3: 9.68 ms per loop
> > 
> > Ralf
> >  
> > 


Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-06 Thread Sebastian Berg
On Sa, 2016-02-06 at 16:56 -0600, Elliot Hallmark wrote:
> Hi all,
> 
> I have a program that uses resize-able arrays.  I already over
> -provision the arrays and use slices, but every now and then the data
> outgrows that array and it needs to be resized.  
> 
> Now, I would like to have these arrays shared between processes
> spawned via multiprocessing (for fast interprocess communication
> purposes, not for parallelizing work on an array).  I don't care
> about mapping to a file on disk, and I don't want disk I/O happening.
>   I don't care (really) about data being copied in memory on resize. 
> I *do* want the array to be resized "in place", so that the child
> processes can still access the arrays from the object they were
> initialized with.
> 
> 
> I can share arrays easily using arrays that are backed by memmap. 
> Ie:
> 
> ```
> #Source: http://github.com/rainwoodman/sharedmem 
> 
> 
> class anonymousmemmap(numpy.memmap):
> def __new__(subtype, shape, dtype=numpy.uint8, order='C'):
> 
> descr = numpy.dtype(dtype)
> _dbytes = descr.itemsize
> 
> shape = numpy.atleast_1d(shape)
> size = 1
> for k in shape:
> size *= k
> 
> bytes = int(size*_dbytes)
> 
> if bytes > 0:
> mm = mmap.mmap(-1,bytes)
> else:
> mm = numpy.empty(0, dtype=descr)
> self = numpy.ndarray.__new__(subtype, shape, dtype=descr,
> buffer=mm, order=order)
> self._mmap = mm
> return self
> 
> def __array_wrap__(self, outarr, context=None):
> return
> numpy.ndarray.__array_wrap__(self.view(numpy.ndarray), outarr,
> context)
> ```
> 
> This cannot be resized because it does not own it's own data
> (ValueError: cannot resize this array: it does not own its data). 
> (numpy.memmap has this same issue [0], even if I set refcheck to
> False and even though the docs say otherwise [1]). 
> 
> arr._mmap.resize(x) fails because it is annonymous (error: [Errno 9]
> Bad file descriptor).  If I create a file and use that fileno to
> create the memmap, then I can resize `arr._mmap` but the array itself
> is not resized.
> 
> Is there a way to accomplish what I want?  Or, do I just need to
> figure out a way to communicate new arrays to the child processes?
> 

I guess the answer is no, but the first question should be whether you
can create a new array viewing the same data that is just larger? Since
you have the mmap, that would be creating a new view into it.

I.e. your "array" would be the memmap, and to use it, you always rewrap
it into a new numpy array.
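A minimal sketch of that idea (an editor's addition; `as_array` is a hypothetical helper, and the buffer is simply over-provisioned rather than resized):

```python
import mmap
import numpy as np

ITEM = np.dtype(np.float64).itemsize

# Allocate an anonymous shared mmap larger than currently needed; the
# object processes hold on to is the mmap itself, not an ndarray.
buf = mmap.mmap(-1, 16 * ITEM)

def as_array(buf, n):
    # Re-wrap the shared buffer into a fresh ndarray view of length n
    # each time it is used; growing just means asking for a larger view.
    return np.frombuffer(buf, dtype=np.float64, count=n)

a = as_array(buf, 4)
a[:] = [1.0, 2.0, 3.0, 4.0]

# A later, larger view sees the same underlying memory.
b = as_array(buf, 8)
assert b[0] == 1.0 and b[3] == 4.0
```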

Other than that, you would have to mess with the internal ndarray
structure, since these kinds of operations appear rather unsafe.

- Sebastian


> Thanks,
>   Elliot
> 
> [0] https://github.com/numpy/numpy/issues/4198.
> 
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.
> resize.html
> 
> 


Re: [Numpy-discussion] Numpy 1.11.0b1 is out

2016-01-31 Thread Sebastian Berg
On Sa, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
> On 27 Jan 2016, at 1:10 pm, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > 
> > On Mi, 2016-01-27 at 11:19 +, Nadav Horesh wrote:
> > > Why the dot function/method is slower than @ on python 3.5.1?
> > > Tested
> > > from the latest 1.11 maintenance branch.
> > > 
> > 
> > The explanation I think is that you do not have a blas
> > optimization. In
> > which case the fallback mode is probably faster in the @ case
> > (since it
> > has SSE2 optimization by using einsum, while np.dot does not do
> > that).
> 
> I am a bit confused now, as A @ c is just short for A.__matmul__(c)
> or equivalent
> to np.matmul(A,c), so why would these not use the optimised blas?
> Also, I am getting almost identical results on my Mac, yet I thought
> numpy would
> by default build against the VecLib optimised BLAS. If I build
> explicitly against
> ATLAS, I am actually seeing slightly slower results.
> But I also saw these kind of warnings on the first timeit runs:
> 
> %timeit A.dot(c)
> The slowest run took 6.91 times longer than the fastest. This could
> mean that an intermediate result is being cached
> 
> and when testing much larger arrays, the discrepancy between matmul
> and dot rather
> increases, so perhaps this is more an issue of a less memory
> -efficient implementation
> in np.dot?

Sorry, I missed the fact that one of the arrays was 3D. In that case I
am not even sure which of the functions calls into BLAS, or what else
they have to do; I would have to check. Note that `np.dot` uses a
different way of combining high-dimensional arrays. @/matmul
broadcasts extra axes, while np.dot will do the outer combination of
them, so that the result shape is:

As = list(A.shape)
As.pop(-1)
cs = list(c.shape)
cs.pop(-2)  # if possible
result_shape = As + cs

which happens to be identical when only A.ndim > 2 and c.ndim <= 2.
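The difference in stacking behavior is quick to demonstrate (an editor's example, not from the mail):

```python
import numpy as np

A = np.ones((2, 3, 4))
c = np.ones((2, 4, 5))

# matmul broadcasts the leading axes: (2, 3, 4) @ (2, 4, 5) -> (2, 3, 5)
assert (A @ c).shape == (2, 3, 5)

# np.dot combines the stacks as an outer product instead:
# shape[:-1] of A + shape[:-2] of c + shape[-1:] of c -> (2, 3, 2, 5)
assert np.dot(A, c).shape == (2, 3, 2, 5)

# With a 2-d right-hand operand the two agree, as the mail notes.
c2 = np.ones((4, 5))
assert (A @ c2).shape == np.dot(A, c2).shape == (2, 3, 5)
```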

- Sebastian


> 
> Cheers,
>   Derek
> 


Re: [Numpy-discussion] Bump warning stacklevel

2016-01-28 Thread Sebastian Berg
On Mi, 2016-01-27 at 17:12 -0500, Benjamin Root wrote:
> I like the idea of bumping the stacklevel in principle, but I am not 
> sure it is all that practical. For example, if a warning came up when 
> doing "x / y", I am assuming that it is emitted from within the ufunc 
> np.divide(). So, you would need two stacklevels based on whether the 
> entry point was the operator or a direct call to np.divide()? Also, I 
> would imagine it might get weird for numpy functions called within 
> other numpy functions. Or perhaps I am not totally understanding how 
> this would be done?

You are right of course, and the answer is that I was never planning on
fixing that. This is only for warnings issued by functions directly
called by the user (or those which always go through one more level,
though I did not check the C-API for such functions). Also, C-API
warnings are already correct in this regard automatically.

Anyway, I still think it is worth it, even if in practice a lot of
warnings are things such as ufunc warnings from inside a python
function. And there is no real way to change that, that I am aware of,
unless maybe we add a warning_stacklevel argument to those C-API
functions ;).

- Sebastian


> Ben Root
> 
> 
> On Wed, Jan 27, 2016 at 5:07 PM, Ralf Gommers <ralf.gomm...@gmail.com
> > wrote:
> > 
> > 
> > On Wed, Jan 27, 2016 at 11:02 PM, sebastian <
> > sebast...@sipsolutions.net> wrote:
> > > On 2016-01-27 21:01, Ralf Gommers wrote:
> > > >  
> > > > One issue will be how to keep this consistent. `stacklevel` is
> > > > used so
> > > > rarely that new PRs will always omit it for new warnings. Will
> > > > we just
> > > > rely on code review, or would a private wrapper around `warn`
> > > > to use
> > > > inside numpy plus a test that checks that the wrapper is used
> > > > everywhere be helpful here?
> > > > 
> > > > 
> > > Yeah, I mean you could add tests for the individual functions in
> > > principle.
> > > I am not sure if adding an alias helps much, how are we going to
> > > test that
> > > warnings.warn is not being used? Seems like quite a bit of voodoo
> > > necessary
> > > for that.
> > > 
> > I was thinking something along these lines, but with a regexp
> > checking for warnings.warn: https://github.com/scipy/scipy/blob/mas
> > ter/scipy/fftpack/tests/test_import.py
> > 
> > Probably more trouble than it's worth though.
> > 
> > Ralf
> >  
> > 


[Numpy-discussion] Bump warning stacklevel

2016-01-27 Thread Sebastian Berg
Hi all,

in my PR about warnings suppression, I currently also have a commit
which bumps the warning stacklevel to two (or three), i.e. use:

warnings.warn(..., stacklevel=2)

(almost) everywhere. This means that for example (take only the empty
warning):

np.mean([])

would not print:

/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55:
RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)

but instead print the actual `np.mean([])` code line (the repetition of
the warning command is always a bit funny).

The advantage is nicer printing for the user.
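A tiny sketch of what `stacklevel` changes (an editor's illustration; `empty_mean` is a hypothetical stand-in for a numpy function):

```python
import warnings

def empty_mean(values):
    # stacklevel=2 attributes the warning to the *caller's* line,
    # mirroring what the proposed change does inside numpy.
    if not values:
        warnings.warn("Mean of empty slice.", RuntimeWarning, stacklevel=2)
        return float("nan")
    return sum(values) / len(values)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_mean([])  # the reported location is this call site, not the helper

assert len(caught) == 1
assert issubclass(caught[0].category, RuntimeWarning)
```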

The disadvantage would probably mostly be that existing warning filters
that use the `module` keyword argument will fail.

Any objections/thoughts about doing this change to try to better report
the offending code line? Frankly, I am not sure whether there is a
Python standard about this, but I would expect that for a library such
as numpy it makes sense to change. But if downstream uses warning
filters with modules, we might want to reconsider, for example.

- Sebastian



Re: [Numpy-discussion] Bump warning stacklevel

2016-01-27 Thread sebastian

On 2016-01-27 21:01, Ralf Gommers wrote:

On Wed, Jan 27, 2016 at 7:26 PM, Sebastian Berg
<sebast...@sipsolutions.net> wrote:


Hi all,

in my PR about warnings suppression, I currently also have a commit
which bumps the warning stacklevel to two (or three), i.e. use:

warnings.warn(..., stacklevel=2)

(almost) everywhere. This means that for example (take only the empty
warning):

np.mean([])

would not print:

/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55:
RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)

but instead print the actual `np.mean([])` code line (the repetition of
the warning command is always a bit funny).

The advantage is nicer printing for the user.

The disadvantage would probably mostly be that existing warning filters
that use the `module` keyword argument, will fail.

Any objections/thoughts about doing this change to try to better report
the offending code line?


This has annoyed me for a long time, it's hard now to figure out where
warnings really come from. Especially when running something large
like scipy.test(). So +1.


Frankly, I am not sure whether there might be
a python standard about this, but I would expect that for a library
such as numpy, it makes sense to change. But, if downstream uses
warning filters with modules, we might want to reconsider for
example.


There probably are usages of `module`, but I'd expect that it's used a
lot less than `category` or `message`. A quick search through the
scipy repo gave me only a single case where `module` was used, and
that's in deprecated weave code so soon the count is zero.  Also, even
for relevant usage, nothing will break in a bad way - some more noise
or a spurious test failure in numpy-using code isn't the end of the
world I'd say.

One issue will be how to keep this consistent. `stacklevel` is used so
rarely that new PRs will always omit it for new warnings. Will we just
rely on code review, or would a private wrapper around `warn` to use
inside numpy plus a test that checks that the wrapper is used
everywhere be helpful here?



Yeah, I mean you could add tests for the individual functions in
principle.
I am not sure if adding an alias helps much, how are we going to test
that warnings.warn is not being used? Seems like quite a bit of voodoo
necessary for that.

- Sebastian




Ralf




Re: [Numpy-discussion] Numpy 1.11.0b1 is out

2016-01-27 Thread Sebastian Berg
On Mi, 2016-01-27 at 11:19 +, Nadav Horesh wrote:
> Why the dot function/method is slower than @ on python 3.5.1? Tested
> from the latest 1.11 maintenance branch.
> 

The explanation I think is that you do not have a blas optimization. In
which case the fallback mode is probably faster in the @ case (since it
has SSE2 optimization by using einsum, while np.dot does not do that).

Btw. thanks for all the work getting this done Chuck!

- Sebastian


> 
> 
> np.__version__
> Out[39]: '1.11.0.dev0+Unknown'
> 
> 
> %timeit A @ c
> 1 loops, best of 3: 185 µs per loop
> 
> 
> %timeit A.dot(c)
> 1000 loops, best of 3: 526 µs per loop
> 
> 
> %timeit np.dot(A,c)
> 1000 loops, best of 3: 527 µs per loop
> 
> 
> A.dtype, A.shape, A.flags
> Out[43]: 
> (dtype('float32'), (100, 100, 3),   C_CONTIGUOUS : True
>F_CONTIGUOUS : False
>OWNDATA : True
>WRITEABLE : True
>ALIGNED : True
>UPDATEIFCOPY : False)
> 
> 
> c.dtype, c.shape, c.flags
> Out[44]: 
> (dtype('float32'), (3, 3),   C_CONTIGUOUS : True
>F_CONTIGUOUS : False
>OWNDATA : True
>WRITEABLE : True
>ALIGNED : True
>UPDATEIFCOPY : False)
> 
> 
> 
> 
> 
> From: NumPy-Discussion <numpy-discussion-boun...@scipy.org> on behalf
> of Charles R Harris <charlesr.har...@gmail.com>
> Sent: 26 January 2016 22:49
> To: numpy-discussion; SciPy Developers List; SciPy Users List
> Subject: [Numpy-discussion] Numpy 1.11.0b1 is out
>   
> 
> 
> 
> Hi All,
> 
>  I'm pleased to announce that  Numpy 1.11.0b1 is now available on
> sourceforge. This is a source release as the mingw32 toolchain is
> broken. Please test it out and report any errors that you discover.
> Hopefully we can do better with 1.11.0 than we did with 1.10.0 ;)
> 
>  Chuck
> 


Re: [Numpy-discussion] Inconsistent behavior for ufuncs in numpy v1.10.X

2016-01-26 Thread Sebastian Berg
On Di, 2016-01-26 at 17:27 +, Solbrig,Jeremy wrote:
> Hello Chuck,
> 
> I receive the same result with 1.10.4.  I agree that it looks like
> __array_prepare__, __array_finalize__, and __array_wrap__ have not
> been changed.  I’m starting to dig into the source again, but
> focusing on the _MaskedBinaryOperation class to try to understand
> what is going on there.
> 

Well, there was definitely a change in that code that will cause this,
i.e. the code block:

if isinstance(b, MaskedArray):
if isinstance(a, MaskedArray):
result._update_from(a)
else:
result._update_from(b)
elif isinstance(a, MaskedArray):
result._update_from(a)

was changed to something like:

masked_result._update_from(result)

My guess is that this fixed some other bug (but you should probably
check git blame to see why it was put in). It might be that _optinfo
should be handled more specifically. It seems like a very weird feature
to me, though, when the info is always copied from the left operand...

Is _optinfo even *documented* to exist? Because frankly, unless it is
used far more widely, and given that it is hidden away with an
underscore, I am not sure we should prioritize keeping it working,
considering that I am unsure it ever did anything very elegantly.

- Sebastian



> Jeremy
> 
> From: NumPy-Discussion <numpy-discussion-boun...@scipy.org> on behalf
> of Charles R Harris <charlesr.har...@gmail.com>
> Reply-To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Date: Tuesday, January 26, 2016 at 10:17 AM
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Subject: Re: [Numpy-discussion] Inconsistent behavior for ufuncs in
> numpy v1.10.X
> 
> 
> 
> On Tue, Jan 26, 2016 at 10:11 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
> > 
> > 
> > On Mon, Jan 25, 2016 at 10:43 PM, Solbrig,Jeremy <
> > jeremy.solb...@colostate.edu> wrote:
> > > Hello,
> > > 
> > > Much of what is below was copied from this stack overflow
> > > question.
> > > I am attempting to subclass numpy.ma.MaskedArray.  I am currently
> > > using Python v2.7.10.  The problem discussed below does not occur
> > > in Numpy v1.9.2, but does occur in all versions of Numpy v1.10.x.
> > > In all versions of Numpy v1.10.x, using mathematical operators on
> > > my subclass behaves differently than using the analogous ufunc.
> > > When using the ufunc directly (e.g. np.subtract(arr1,
> > >  arr2)), __array_prepare__, __array_finalize__, and
> > > __array_wrap__ are all called as expected, however, when using
> > > the symbolic operator (e.g. arr1-arr2) only __array_finalize__ is
> > > called. As a consequence, I lose any information stored in
> > >  arr._optinfo when a mathematical operator is used.
> > > Here is a code snippet that illustrates the issue.
> > > 
> > > #!/bin/env python
> > > import numpy as np
> > > from numpy.ma import MaskedArray, nomask
> > > 
> > > class InfoArray(MaskedArray):
> > > def __new__(cls, info=None, data=None, mask=nomask,
> > > dtype=None, 
> > > copy=False, subok=True, ndmin=0, fill_value=None,
> > > keep_mask=True, hard_mask=None, shrink=True,
> > > **kwargs):
> > > obj = super(InfoArray, cls).__new__(cls, data=data,
> > > mask=mask,
> > >   dtype=dtype, copy=copy, subok=subok,
> > > ndmin=ndmin, 
> > >   fill_value=fill_value, hard_mask=hard_mask,
> > >   shrink=shrink, **kwargs)
> > > obj._optinfo['info'] = info
> > > return obj
> > > 
> > > def __array_prepare__(self, out, context=None):
> > > print '__array_prepare__'
> > > return super(InfoArray, self).__array_prepare__(out,
> > > context)
> > > 
> > > def __array_wrap__(self, out, context=None):
> > > print '__array_wrap__'
> > > return super(InfoArray, self).__array_wrap__(out,
> > > context)
> > > 
> > > def __array_finalize__(self, obj):
> > > print '__array_finalize__'
> > > return super(InfoArray, self).__array_finalize__(obj)
> > > 
> > > if __name__ == "__main__":
> > > arr1 = InfoArray('test', data=[1,2,3,4,5,6])
> > > arr2 = InfoArray(data=[0,1,2,3,4,5])
> > > 
> > > diff1 = np.subtract(arr1, arr2)
> > > print diff1._optinfo
> >

[Numpy-discussion] Testing warnings

2016-01-26 Thread Sebastian Berg
Hi all,

so I have been thinking about this a little more, and I do not think
there is a truly nice solution to the Python bug
http://bugs.python.org/issue4180 (it does not create problems for newer
Pythons).

However, I have been so annoyed by trying to test FutureWarnings or
DeprecationWarnings in the past that I want *some* improvement. You can
do quite a lot by adding some new features, but there are also some
limitations.

I think that we must be able to:

 o Filter out warnings on the global test run level.
 o Find all not explicitly filtered warnings during development easily.
 o We should be able to test any (almost any?) warnings, even those
that would be filtered globally.

The next line of considerations for me is, whether we want:

 o To be able to *print* warnings during test runs? (in release mode)
 o Be able to not repeat filtering of globally filtered warnings when
filtering additional warnings in an individual test?
 o Be able to count warnings, but ignore other warnings (not the global
ones, though).
 o Filter warnings by module? (might be hard to impossible)

And one further option:
 o Could we accept that testing warnings is *only* possible reliable in
Python 3.4+? It would however even mean that we have to fully *skip*
tests that would ensure specific warnings to be given.

The first things can be achieved by setting all warnings to errors on
the global level and trying to make the local tests as specific as
possible. I could go ahead with it. There will likely be some uglier
points, but it should work; they do not require funnier new hacks.
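A sketch of that first approach (an editor's illustration; `function_under_test` is hypothetical): errors globally, with each test opting in to exactly the warning it expects.

```python
import warnings

def function_under_test():
    # Stand-in for a numpy function that emits a deprecation warning.
    warnings.warn("old behaviour", DeprecationWarning, stacklevel=2)
    return 42

# Globally, turn every warning into an error so warnings that nobody
# explicitly filtered surface immediately during the test run.
warnings.simplefilter("error")

# An individual test then allows and records the one warning it expects.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = function_under_test()

assert result == 42
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```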

For all I can see, the second bunch of things requires new features
such as in my current PR. So, I want to know whether we can/want to go
ahead with this kind of idea [1].

For me personally, I cannot accept we do not provide the first points.

The second bunch I would like some of (I do not know about printing
warnings in release mode?), and skipping tests on Python 2 seems to me
even worse than ugly hacks.
Getting there is a bit uglier (it requires a new context manager, for
all I can see), and I tend to think it is worth the trouble, but I
don't think it is vital.

In other words, I don't care too much about those points, but I want to
get somewhere, because I have been bitten often enough by the annoying
and, in my opinion, simply unacceptable (on Python 2) use of "ignore"
warning filters in tests. The current state makes finding warnings
given in our own tests almost impossible; in the best case they will
have to be fixed much, much later when the change actually occurs, and
in the worst case we never find our own real bugs.

So where to go? :)

- Sebastian


[1] I need to fix the module filtering point, the module filtering does
not work reliably currently, I think it can be fixed at least 99.5%,
but it is not too pretty (not that the user should notice).





Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-25 Thread Sebastian Berg
On Mo, 2016-01-25 at 16:11 +0100, Sturla Molden wrote:
> On 23/01/16 22:25, Sebastian Berg wrote:
> 
> > Do you agree with this, or would it be a major inconvenience?
> 
> I think any user of as_strided should be considered a power user.
> This 
> is an inherently dangerous function, that can easily segfault the 
> process. Anyone who uses as_strided should be assumed to have taken
> all 
> precautions.
> 

I am ready to accept this notion (and I guess just put repeated
warnings in the doc string).
However, two things about it: first, my impression is that for a lot of
"not really power users" this function sometimes seems like a big
hammer to solve all their problems.
Second, I think even power users sometimes have wrong ideas about
numpy's ufuncs and memory overlap. This starts with operations such as
`arr[1:] += arr[:-1]` (see [1]; there is at least a start on fixing
this), and only gets worse with self-overlapping arrays.

That said, I guess I could agree with you insofar as there are so many
*other* awful ways to use as_strided that maybe it really is just so
bad that improving one thing doesn't actually help anyway ;).
I was actually considering adding a UserWarning when it is likely that
invalid memory is being addressed.

I still dislike that it returns something writable *by default*,
though. Unless you write a single element, writing to such an array
will in many cases be unpredictable, no matter how much of a power user
you are.

- Sebastian


[1] WARNING: This is a dangerous example, we ideally want it to be
identical to arr[1:] += arr[:-1].copy() always:

In [7]: arr = np.arange(10).reshape(5, 2)
In [8]: arr[1:] += arr[:-1]
In [9]: arr
Out[9]: 
array([[ 0,  1],
       [ 2,  4],
       [ 6,  9],
       [12, 16],
       [20, 25]])

In [10]: arr = np.arange(10)[::-1].copy()[::-1].reshape(5, 2)
In [11]: arr[1:] += arr[:-1]  # happens to be "correct"
In [12]: arr
Out[12]: 
array([[ 0,  1],
       [ 2,  4],
       [ 6,  8],
       [10, 12],
       [14, 16]])
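
For illustration, here is a minimal sketch of my own (not from this
thread) of the self-overlap that makes writes unpredictable; `base` and
`view` are assumed example names:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(4)
# Zero stride along axis 0: all three "rows" alias the same 4 elements,
# so the view claims 12 elements but actually owns only 4.
view = as_strided(base, shape=(3, 4), strides=(0, base.strides[0]))

view[0, 0] = 99               # one scalar write...
print(view[2, 0], base[0])    # ...is visible in every row: 99 99
# A bulk operation like `view += 1` over such aliased memory is exactly
# the unpredictable case discussed above: elements may be updated once
# or several times depending on how the ufunc loop iterates.
```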


> -1
> 
> Sturla
> 



Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-25 Thread Sebastian Berg
On Mon Jan 25 01:46:55 2016 GMT+0100, Juan Nunez-Iglesias wrote:
> > Yeah, that is a real use case. I am not planning to remove the option,
> > but it would be as a `readonly` keyword argument, which means you would
> > need to make the code depend on the numpy version if you require a
> > writable array [1].
> >
> > [1] as_strided does not currently support arr.flags.writable = True for
> its result array.
> 
> Can you explain this in more detail? I'm writing to the result without
> problem. Anyway, it only occurred to me after your response that the
> deprecation path would be quite disruptive to downstream libraries,
> including scikit-image. My feeling is that unless someone has been bitten
> by this (?), the benefit does not outweigh the cost of deprecation. Perhaps
> something to push to 2.0?
>

Setting the writeable flag back to true, when it was set to false
before, does not work with the way as_strided currently works, with a
DummyArray. I guess it is because its base is somehow not an ndarray.
It should be possible to try/except creating it with np.ndarray; then
it would be possible in your use case.

I have not been badly bitten myself. Would you think it is fine if
setting the flag back to true would work in your case?

 
> On Sun, Jan 24, 2016 at 8:17 PM, Sebastian Berg <sebast...@sipsolutions.net>
> wrote:
> 
> > On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
> > > I've used as_strided before to create an "endless" output array when
> > > I didn't care about the result of an operation, just the side effect.
> > > See eg here. So I would certainly like option to remain to get a
> > > writeable array. In general, I'm sceptical about whether the benefits
> > > outweigh the costs.
> >
> > Yeah, that is a real use case. I am not planning to remove the option,
> > but it would be as a `readonly` keyword argument, which means you would
> > need to make the code depend on the numpy version if you require a
> > writable array [1].
> > This actually somewhat defeats the purpose of all of this, but
> > `np.ndarray` can do this dummy thing for you I think, so you could get
> > around that, but
> >
> > The purpose is that if you actually would use an as_strided array in
> > your operation, the result is unpredictable (not just complicated). And
> > while as_strided is IMO designed to be used by people who know what
> > they are doing, I have a feeling it is being used quite a lot in
> > general.
> >
> > We did a similar thing for the new `broadcast_to`, though I think there
> > we decided to skip the readonly until complains happen.
> >
> > Actually there is one more thing I might do. And that is issue a
> > UserWarning when new array quite likely points to invalid memory.
> >
> > - Sebastian
> >
> >
> > [1] as_strided does not currently support arr.flags.writable = True for
> > its result array.
> >
> >
> > > On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <n...@pobox.com>
> > > wrote:
> > > > On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg
> > > > <sebast...@sipsolutions.net> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I have just opened a PR, to make as_strided writeonly (as
> > > > default). The
> > > >
> > > > I think you meant readonly :-)
> > > >
> > > > > reasoning for this change is that an `as_strided` array often
> > > > have self
> > > > > overlapping memory. However, writing to an array where multiple
> > > > > elements have the identical memory address can be confusing, and
> > > > the
> > > > > results are typically unpredictable.
> > > > >
> > > > > Considering the danger, the proposal is to add a `readonly=True`.
> > > > A
> > > > > poweruser (who that function is designed for anyway), could thus
> > > > still
> > > > > get a writeable array.
> > > > >
> > > > > For the moment, writing to the result would raise a FutureWarning
> > > > with
> > > > > `readonly="warn"`.
> > > >
> > > > This should just be a deprecation warning, right? (Because
> > > > switching
> > > > an array from writeable->readonly might cause previously correct
> > > > code
> > > > to error out, but not to silently start returning different
> > > > results.)
> > > >
> > > >

Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-24 Thread Sebastian Berg
On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
> I've used as_strided before to create an "endless" output array when
> I didn't care about the result of an operation, just the side effect.
> See eg here. So I would certainly like option to remain to get a
> writeable array. In general, I'm sceptical about whether the benefits
> outweigh the costs.

Yeah, that is a real use case. I am not planning to remove the option,
but it would be as a `readonly` keyword argument, which means you would
need to make the code depend on the numpy version if you require a
writable array [1].
This actually somewhat defeats the purpose of all of this, but
`np.ndarray` can do this dummy thing for you I think, so you could get
around that.

The purpose is that if you actually would use an as_strided array in
your operation, the result is unpredictable (not just complicated). And
while as_strided is IMO designed to be used by people who know what
they are doing, I have a feeling it is being used quite a lot in
general.

We did a similar thing for the new `broadcast_to`, though I think there
we decided to skip the readonly until complaints happen.

Actually there is one more thing I might do: issue a UserWarning when
the new array quite likely points to invalid memory.

- Sebastian


[1] as_strided does not currently support arr.flags.writable = True for
its result array.
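
For comparison, a small sketch of the `broadcast_to` behaviour referred
to above (as I understand it in current releases): the returned view is
read-only, so aliased writes are refused up front:

```python
import numpy as np

row = np.arange(3)
b = np.broadcast_to(row, (4, 3))   # all 4 rows alias `row`'s 3 elements
print(b.flags.writeable)           # False
try:
    b[0, 0] = 1                    # write to aliased memory is refused
except ValueError as exc:
    print("write refused:", exc)
```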


> On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <n...@pobox.com>
> wrote:
> > On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg
> > <sebast...@sipsolutions.net> wrote:
> > >
> > > Hi all,
> > >
> > > I have just opened a PR, to make as_strided writeonly (as
> > default). The
> > 
> > I think you meant readonly :-)
> > 
> > > reasoning for this change is that an `as_strided` array often
> > have self
> > > overlapping memory. However, writing to an array where multiple
> > > elements have the identical memory address can be confusing, and
> > the
> > > results are typically unpredictable.
> > >
> > > Considering the danger, the proposal is to add a `readonly=True`.
> > A
> > > poweruser (who that function is designed for anyway), could thus
> > still
> > > get a writeable array.
> > >
> > > For the moment, writing to the result would raise a FutureWarning
> > with
> > > `readonly="warn"`.
> > 
> > This should just be a deprecation warning, right? (Because
> > switching
> > an array from writeable->readonly might cause previously correct
> > code
> > to error out, but not to silently start returning different
> > results.)
> > 
> > > Do you agree with this, or would it be a major inconvenience?
> > 
> > AFAIK the only use cases for as_strided involve self-overlap (for
> > non-self-overlap you can generally use reshape / indexing / etc.
> > and
> > it's much simpler). So +1 from me.
> > 
> > -n
> > 
> > --
> > Nathaniel J. Smith -- https://vorpus.org
> > 



[Numpy-discussion] Make as_strided result writeonly

2016-01-23 Thread Sebastian Berg
Hi all,

I have just opened a PR, to make as_strided writeonly (as default). The
reasoning for this change is that an `as_strided` array often has
self-overlapping memory. However, writing to an array where multiple
elements have the identical memory address can be confusing, and the
results are typically unpredictable.

Considering the danger, the proposal is to add a `readonly=True`
keyword argument. A power user (for whom the function is designed
anyway) could thus still get a writeable array.

For the moment, writing to the result would raise a FutureWarning with
`readonly="warn"`.

Do you agree with this, or would it be a major inconvenience?

- Sebastian




Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-21 Thread Sebastian Berg
On Do, 2016-01-21 at 09:38 +, Robert Kern wrote:
> On Tue, Jan 19, 2016 at 5:35 PM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> >
> > On Di, 2016-01-19 at 16:28 +, G Young wrote:
> > > In rand range, it raises an exception if low >= high.
> > >
> > > I should also add that AFAIK enforcing low >= high with floats is
> a
> > > lot trickier than it is for integers.  I have been knee-deep in
> > > corner cases for some time with randint where numbers that are
> > > visually different are cast as the same number by numpy due to
> > > rounding and representation issues.  That situation only gets
> worse
> > > with floats.
> > >
> >
> > Well, actually random.uniform docstring says:
> >
> > Get a random number in the range [a, b) or [a, b] depending on
> > rounding.
> 
> Which docstring are you looking at? The current one says [low, high)
> 

Sorry, I was referring to the python random.uniform function. And as
far as I now understand, the current numpy equivalent (intentionally or
not) seems to do the same thing, which suits me fine.
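
To make the rounding point concrete (my own example, not from the
thread): if `uniform` is computed as `low + (high - low) * u` with `u`
in [0, 1), as is common, floating-point rounding can still return
`high` exactly:

```python
low = 1.0
high = 1.0 + 2.0 ** -52     # exactly one ulp above `low`
u = 1.0 - 2.0 ** -53        # the largest double strictly below 1.0
x = low + (high - low) * u  # mathematically < high, but...
print(x == high)            # True: rounding closed the interval
```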

- Sebastian



> http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.unif
> orm.html#numpy.random.uniform
> 
> --
> Robert Kern



[Numpy-discussion] Set FutureWarnings to error in (dev) tests?

2016-01-21 Thread Sebastian Berg
Hi all,

should we try to set FutureWarnings to errors in the dev tests? I am
seriously annoyed by FutureWarnings getting lost all over, for two
reasons. First, it is hard, if not impossible, to find even our own
errors from our own FutureWarning changes. Secondly, we currently would
not even see any FutureWarnings from someone else. For numpy that may
not be a big issue, but still.

So, I started to try it, and it is obviously annoying. The only serious
issue, though, actually seems to be MA [1].
Sometimes one would add a filter for a whole test file (for my start I
did this for the NaT stuff) [2].

Anyway, should we attempt to do this? I admit that trying to make it
work is annoying; even *with* the change, FutureWarnings suddenly pop
up when you make the warnings be given less often (I would guess the
warning was already issued at import time somewhere).

- Sebastian


[1] And at that a brand new FutureWarning there, which seems too
aggressive in any case, though that won't change much.

[2] One annoying thing about this: the filter might never be removed.
One could maybe add a canary to error out when the filter is not needed
anymore.
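
A minimal sketch of the idea (the placement and names are mine, not an
actual numpy test-suite change):

```python
import warnings

# Escalate FutureWarning to an error for the duration of a test, so our
# own pending changes cannot be silently swallowed by "ignore" filters.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        warnings.warn("behaviour will change", FutureWarning)
        caught = False
    except FutureWarning:
        caught = True  # the warning surfaced as a hard test failure
print(caught)  # True
```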



Re: [Numpy-discussion] Set FutureWarnings to error in (dev) tests?

2016-01-21 Thread Sebastian Berg
On Do, 2016-01-21 at 16:15 -0800, Nathaniel Smith wrote:
> On Thu, Jan 21, 2016 at 4:05 PM, Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> > Hi all,
> > 
> > should we try to set FutureWarnings to errors in dev tests? I am
> > seriously annoyed by FutureWarnings getting lost all over for two
> > reasons. First, it is hard to impossible to find even our own
> > errors
> > for our own FutureWarning changes. Secondly, we currently would not
> > even see any Futurewarnings from someone else. For numpy that may
> > not
> > be a big issue, but still.
> 
> Yeah, I noticed this recently too :-(. Definitely it is the right
> thing to do, I think. And this is actually more true the more
> annoying
> it is, because if we're triggering lots of FutureWarnings then we
> should fix that :-).
> 

Yeah, the problem is that some FutureWarnings are given in the dozens.
Injecting the filter on the module level is possible, but not quite
correct. Maybe one could do evil things similar to a "module decorator"
to add the warning context + filter to every single function in a
module starting with "test_".

Another method could be to abuse `__warningregistry__`, but there are
at least two reasons why this is probably not viable (never mind that
it would be ugly as well).

Doesn't nose maybe provide *something*? I mean seriously, testing
warnings tends to be hell breaking loose: change one thing and suddenly
dozens appear from nowhere; you are never sure you found all cases,
etc.

- Sebastian


> -n
> 



Re: [Numpy-discussion] Set FutureWarnings to error in (dev) tests?

2016-01-21 Thread Sebastian Berg
On Do, 2016-01-21 at 17:07 -0800, Nathaniel Smith wrote:
> Warnings filters can be given a regex matching the warning text, I
> think?

That doesn't cut it, because you need to set the warning to "always",
and then, if you don't want to print it, you are stuck.

I wrote a context manager + function decorator + class decorator which
can do it (it overrides how the warning gets printed). You still have
to decorate every weird test, or at least every test class, or escalate
it to the global level. It needs some more work, but that suppression
functionality (in whatever exact form it ends up) is much more
reasonable than our typical warning context, since that drops even the
printing (though since we would use Error during development, it should
not matter much).

The stuff is basically a safe version of:
with warnings.catch_warnings():
warnings.filterwarnings("ignore", )  # oh noe not ignore!

which also still prints other warnings.
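
Roughly, the sketch I mean (a simplified stand-in of my own, not the
actual code; `suppress_warning` and its arguments are assumed names):
record everything, drop only the one targeted warning, and re-issue the
rest on exit:

```python
import contextlib
import warnings

@contextlib.contextmanager
def suppress_warning(category, message=""):
    # Ignore one specific warning, but re-issue everything else after
    # the block, so unrelated warnings are still printed (or escalated).
    with warnings.catch_warnings(record=True) as log:
        warnings.simplefilter("always")
        yield
    for w in log:
        if not (issubclass(w.category, category)
                and message in str(w.message)):
            warnings.warn_explicit(w.message, w.category,
                                   w.filename, w.lineno)

with suppress_warning(FutureWarning, "old behaviour"):
    warnings.warn("old behaviour", FutureWarning)  # swallowed
    warnings.warn("unrelated", UserWarning)        # re-issued on exit
```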

- sebastian


> On Jan 21, 2016 5:00 PM, "Sebastian Berg" <sebast...@sipsolutions.net
> > wrote:
> > On Do, 2016-01-21 at 16:51 -0800, Nathaniel Smith wrote:
> > > On Thu, Jan 21, 2016 at 4:44 PM, Sebastian Berg
> > > <sebast...@sipsolutions.net> wrote:
> > > > On Do, 2016-01-21 at 16:15 -0800, Nathaniel Smith wrote:
> > > > > On Thu, Jan 21, 2016 at 4:05 PM, Sebastian Berg
> > > > > <sebast...@sipsolutions.net> wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > should we try to set FutureWarnings to errors in dev tests?
> > I
> > > > > > am
> > > > > > seriously annoyed by FutureWarnings getting lost all over
> > for
> > > > > > two
> > > > > > reasons. First, it is hard to impossible to find even our
> > own
> > > > > > errors
> > > > > > for our own FutureWarning changes. Secondly, we currently
> > would
> > > > > > not
> > > > > > even see any Futurewarnings from someone else. For numpy
> > that
> > > > > > may
> > > > > > not
> > > > > > be a big issue, but still.
> > > > >
> > > > > Yeah, I noticed this recently too :-(. Definitely it is the
> > right
> > > > > thing to do, I think. And this is actually more true the more
> > > > > annoying
> > > > > it is, because if we're triggering lots of FutureWarnings
> > then we
> > > > > should fix that :-).
> > > > >
> > > >
> > > > Yeah, the problem is that some FutureWarnings that are given in
> > the
> > > > dozens. Injecting the filter on the module level is possible,
> > but
> > > > not
> > > > quite correct. Maybe one could do evil things similar to a
> > "module
> > > > decorator" to add the warning context + filter to every single
> > > > function
> > > > in a module starting with "test_".
> > >
> > > Can we remove the FutureWarnings by making whatever change
> > they're
> > > warning about? :-)
> > >
> > 
> > That would be equivalent of not issueing the futurewarning at all
> > ;).
> > Well, you can write a context manager and put this stuff into the
> > 
> > if __name__ == '__main__':
> > 
> > block actually. It is still annoying, since ideally we want to
> > filter a
> > single warning, which means the necessaty to install our own
> > warning
> > printer.
> > 
> > 
> > > > Another method could be to abuse `__warningregistry__`, but
> > there
> > > > are
> > > > at least two reasons why this is probably not viable (nevermind
> > > > that it
> > > > would be ugly as well).
> > > >
> > > > Doesn't nose maybe provide *something*? I mean seriously,
> > testing
> > > > warnings tends to be hell broke lose? Change one thing,
> > suddenly
> > > > dozens
> > > > appear from nowhere, never sure you found all cases, etc.
> > >
> > > AFAICT nose doesn't provide much of anything for working with
> > > warnings. :-/
> > >
> > > -n
> > > 
> > 


