Re: [Numpy-discussion] PyData Barcelona this May

2017-03-23 Thread Jaime Fernández del Río
Thanks for the suggestion!

That abstract is limited to 400 characters, and it's already at 350+, so
there isn't much room left.
If it gets accepted I will eventually need to fill out an extended
abstract, where I will make sure to explain that better.

Jaime

On Tue, Mar 21, 2017 at 3:45 PM, Daπid <davidmen...@gmail.com> wrote:

> On 20 March 2017 at 19:58, Jaime Fernández del Río <jaime.f...@gmail.com>
> wrote:
> >
> > Taking NumPy In Stride
> > This workshop is aimed at users already familiar with NumPy. We will
> dissect
> > the NumPy memory model with the help of a very powerful abstraction:
> > strides.
> > Participants will learn how to create different views out of the same
> data,
> > including multidimensional ones, get a new angle on how and why
> broadcasting
> > works, and explore related techniques to write faster, more efficient
> code.
>
> I think I only understand this abstract because I know what views are.
> Maybe you could add a line explaining what they are? (I cannot think
> of one myself).





Re: [Numpy-discussion] PyData Barcelona this May

2017-03-21 Thread Jaime Fernández del Río
On Mon, Mar 20, 2017 at 10:13 PM, Chris Barker <chris.bar...@noaa.gov>
wrote:

> On Mon, Mar 20, 2017 at 11:58 AM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>>  I have just submitted a workshop proposal with the following short
>> description:
>>
>> Taking NumPy In Stride
>> This workshop is aimed at users already familiar with NumPy. We will
>> dissect
>> the NumPy memory model with the help of a very powerful abstraction:
>> strides.
>> Participants will learn how to create different views out of the same
>> data,
>> including multidimensional ones, get a new angle on how and why
>> broadcasting
>> works, and explore related techniques to write faster, more efficient
>> code.
>>
>
> I'd go!
>
> And nice title :-)
>
> Any thoughts on a similar one for SciPy in Austin?
>

I'll be more than happy to share presentations, notebooks and whatnot with
someone wanting to run the tutorial over there. But Austin is a looong way
from Zürich, and the dates conflict with my son's birthday, so I don't
think I will be going...

Jaime


>
> -CHB
>
>
>
>
>
>> Let's see what the organizers think of it...
>>
>> Jaime
>>
>>
>> On Fri, Mar 17, 2017 at 10:59 PM, Ralf Gommers <ralf.gomm...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Mar 18, 2017 at 8:41 AM, Chris Barker <chris.bar...@noaa.gov>
>>> wrote:
>>>
>>>> On Fri, Mar 17, 2017 at 4:37 AM, Jaime Fernández del Río <
>>>> jaime.f...@gmail.com> wrote:
>>>>
>>>>>
>>>>>- many people that use numpy in their daily work don't know what
>>>>>strides are, this was a BIG surprise for me.
>>>>>
>>>>> I'm not surprised at all. To start with, the majority of users are
>>> self-taught programmers that never used something lower level than Python
>>> or Matlab. Even talking to them about memory layout presents challenges.
>>>
>>>
>>>>
>>>>>-
>>>>>
>>>>> Based on that experience, I was thinking that maybe a good topic for a
>>>>> workshop would be NumPy's memory model: views, reshaping, strides, some
>>>>> hints of buffering in the iterator...
>>>>>
>>>>
>>> This material has been used multiple times in EuroScipy tutorials and
>>> may be of use: http://www.scipy-lectures.org/
>>> advanced/advanced_numpy/index.html
>>>
>>> Ralf
>>>
>>>
>>>
>>>> I think this is a great idea. In fact, when I do an intro to numpy, I
>>>> spend a bit of time on those issues, 'cause I think it's key to "Getting"
>>>> numpy, and not something that people end up learning on their own from
>>>> tutorials, etc. However, in my  case, I try to jam it into a low-level
>>>> intro, and I think that fails :-(
>>>>
>>>> So doing it on it's own with the assumption that participant already
>>>> know the basics of the high level python interface is a great idea.
>>>>
>>>> Maybe a "advanced" numpy tutorial for SciPy 2017 in Austin also???
>>>>
>>>> Here is my last talk -- maybe it'll be helpful.
>>>>
>>>> http://uwpce-pythoncert.github.io/SystemDevelopment/scipy.html#scipy
>>>>
>>>> the strides stuff is covered in a notebook here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/m
>>>> aster/Examples/numpy/stride_tricks.ipynb
>>>>
>>>> other notebooks here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/tree/m
>>>> aster/Examples/numpy
>>>>
>>>> and the source for the whole thing is here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/m
>>>> aster/slides_sources/source/scipy.rst
>>>>
>>>>
>>>> All licensed under: Creative Commons Attribution-ShareAlike -- so
>>>> please use anything you find useful.
>>>>
>>>> -CHB
>>>>
>>>>
>>>>
>>>> And Julian's temporary work lends itself to a very nice talk, more on
>>>>> Python internals than on NumPy, but it's a very cool subject nonetheless.
>>>>>
>>>>> So my thinking is that I am going to propose those two, as a workshop
>>>>> and a talk. Thoughts?
>>>>>

Re: [Numpy-discussion] PyData Barcelona this May

2017-03-20 Thread Jaime Fernández del Río
Thanks for the links, Chris and Ralf; they will be very helpful eventually. I
have just submitted a workshop proposal with the following short
description:

Taking NumPy In Stride
This workshop is aimed at users already familiar with NumPy. We will dissect
the NumPy memory model with the help of a very powerful abstraction:
strides.
Participants will learn how to create different views out of the same data,
including multidimensional ones, get a new angle on how and why broadcasting
works, and explore related techniques to write faster, more efficient code.
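For anyone curious what that looks like in code, a minimal sketch of the kind
of trick the workshop would build on (using np.lib.stride_tricks.as_strided;
not part of the abstract itself):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.arange(4, dtype=np.int64)                    # strides: (8,)
    # A 3x4 "broadcast" view of the same four numbers: a row stride of 0
    # revisits the same memory for every row, so nothing is copied.
    b = as_strided(a, shape=(3, 4), strides=(0, a.itemsize))
    a[0] = 100
    print(b[2, 0])      # 100 -- every row shares a's memory
    print(b.strides)    # (0, 8)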

Let's see what the organizers think of it...

Jaime


On Fri, Mar 17, 2017 at 10:59 PM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:

>
>
> On Sat, Mar 18, 2017 at 8:41 AM, Chris Barker <chris.bar...@noaa.gov>
> wrote:
>
>> On Fri, Mar 17, 2017 at 4:37 AM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>>
>>>- many people that use numpy in their daily work don't know what
>>>strides are, this was a BIG surprise for me.
>>>
>>> I'm not surprised at all. To start with, the majority of users are
> self-taught programmers that never used something lower level than Python
> or Matlab. Even talking to them about memory layout presents challenges.
>
>
>>
>>>-
>>>
>>> Based on that experience, I was thinking that maybe a good topic for a
>>> workshop would be NumPy's memory model: views, reshaping, strides, some
>>> hints of buffering in the iterator...
>>>
>>
> This material has been used multiple times in EuroScipy tutorials and may
> be of use: http://www.scipy-lectures.org/advanced/advanced_numpy/index.
> html
>
> Ralf
>
>
>
>> I think this is a great idea. In fact, when I do an intro to numpy, I
>> spend a bit of time on those issues, 'cause I think it's key to "Getting"
>> numpy, and not something that people end up learning on their own from
>> tutorials, etc. However, in my  case, I try to jam it into a low-level
>> intro, and I think that fails :-(
>>
>> So doing it on it's own with the assumption that participant already know
>> the basics of the high level python interface is a great idea.
>>
>> Maybe a "advanced" numpy tutorial for SciPy 2017 in Austin also???
>>
>> Here is my last talk -- maybe it'll be helpful.
>>
>> http://uwpce-pythoncert.github.io/SystemDevelopment/scipy.html#scipy
>>
>> the strides stuff is covered in a notebook here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/
>> master/Examples/numpy/stride_tricks.ipynb
>>
>> other notebooks here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/tree/
>> master/Examples/numpy
>>
>> and the source for the whole thing is here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/
>> master/slides_sources/source/scipy.rst
>>
>>
>> All licensed under: Creative Commons Attribution-ShareAlike -- so please
>> use anything you find useful.
>>
>> -CHB
>>
>>
>>
>> And Julian's temporary work lends itself to a very nice talk, more on
>>> Python internals than on NumPy, but it's a very cool subject nonetheless.
>>>
>>> So my thinking is that I am going to propose those two, as a workshop
>>> and a talk. Thoughts?
>>>
>>> Jaime
>>>
>>> On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg <
>>> sebast...@sipsolutions.net> wrote:
>>>
>>>> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
>>>> > There will be a PyData conference in Barcelona this May:
>>>> >
>>>> > http://pydata.org/barcelona2017/
>>>> >
>>>> > I am planning on attending, and was thinking of maybe proposing to
>>>> > organize a numpy-themed workshop or tutorial.
>>>> >
>>>> > My personal inclination would be to look at some advanced topic that
>>>> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
>>>> > a more run of the mill thing. Anyone has any thoughts or experiences
>>>> > on what has worked well in similar situations? Any specific topic you
>>>> > always wanted to attend a workshop on, but were afraid to ask?
>>>> >
>>>> > Alternatively, or on top of the workshop, I could propose to do a
>>>> > talk: talking last year at PyData Madrid about the new indexing was a
>>>> > lot of fun! Thing is, I have been quite disconn

Re: [Numpy-discussion] PyData Barcelona this May

2017-03-17 Thread Jaime Fernández del Río
Last night I gave a short talk to the PyData Zürich meetup on Julian's
temporary elision PR, and Pauli's overlapping memory one. My learnings from
that experiment are:

   - there is no way to talk about both things in a 30 minute talk: I
   barely scraped the surface and ended up needing 25 minutes.
   - many people that use numpy in their daily work don't know what strides
   are, this was a BIG surprise for me.

Based on that experience, I was thinking that maybe a good topic for a
workshop would be NumPy's memory model: views, reshaping, strides, some
hints of buffering in the iterator...
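To make the topic concrete, a minimal sketch of the kind of thing I mean
(nothing new for readers of this list, just to show the level I am aiming at):

    import numpy as np

    a = np.arange(12, dtype=np.int64).reshape(3, 4)
    print(a.strides)               # (32, 8): bytes to the next row / next column
    b = a.T                        # the transpose is just another view of the same buffer
    print(b.strides)               # (8, 32): same data, strides swapped
    a[0, 1] = 99                   # writes through one view are visible through the other
    print(b[1, 0])                 # 99
    print(np.shares_memory(a, b))  # True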

And Julian's temporary work lends itself to a very nice talk, more on
Python internals than on NumPy, but it's a very cool subject nonetheless.

So my thinking is that I am going to propose those two, as a workshop and a
talk. Thoughts?

Jaime

On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:

> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
> > There will be a PyData conference in Barcelona this May:
> >
> > http://pydata.org/barcelona2017/
> >
> > I am planning on attending, and was thinking of maybe proposing to
> > organize a numpy-themed workshop or tutorial.
> >
> > My personal inclination would be to look at some advanced topic that
> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
> > a more run of the mill thing. Anyone has any thoughts or experiences
> > on what has worked well in similar situations? Any specific topic you
> > always wanted to attend a workshop on, but were afraid to ask?
> >
> > Alternatively, or on top of the workshop, I could propose to do a
> > talk: talking last year at PyData Madrid about the new indexing was a
> > lot of fun! Thing is, I have been quite disconnected from the project
> > this past year, and can't really think of any worthwhile topic. Is
> > there any message that we as a project would like to get out to the
> > larger community?
> >
>
> Francesc already pointed out the temporary optimization. From what I
> remember, my personal highlight would probably be Pauli's work on the
> memory overlap detection. Though both are rather passive improvements I
> guess (you don't really have to learn them to use them), its very cool!
> And if its about highlighting new stuff, these can probably easily fill
> a talk.
>
> > And if you are planning on attending, please give me a shout.
> >
>
> Barcelona :). Maybe I should think about it, but probably not.
>
>
> > Thanks,
> >
> > Jaime
> >




[Numpy-discussion] PyData Barcelona this May

2017-03-09 Thread Jaime Fernández del Río
There will be a PyData conference in Barcelona this May:

http://pydata.org/barcelona2017/

I am planning on attending, and was thinking of maybe proposing to organize
a numpy-themed workshop or tutorial.

My personal inclination would be to look at some advanced topic that I know
well, like writing gufuncs in Cython, but wouldn't mind doing a more run of
the mill thing. Anyone has any thoughts or experiences on what has worked
well in similar situations? Any specific topic you always wanted to attend
a workshop on, but were afraid to ask?

Alternatively, or on top of the workshop, I could propose to do a talk:
talking last year at PyData Madrid about the new indexing was a lot of fun!
Thing is, I have been quite disconnected from the project this past year,
and can't really think of any worthwhile topic. Is there any message that
we as a project would like to get out to the larger community?

And if you are planning on attending, please give me a shout.

Thanks,

Jaime



[Numpy-discussion] StackOverflow documentation

2016-07-21 Thread Jaime Fernández del Río
StackOverflow now also has documentation, and there already is a NumPy tag:

http://stackoverflow.com/documentation/numpy

Not sure what, if anything, we want to do with this, nor how to avoid ending
up with two different sources of the same information. Any thoughts?

Jaime



Re: [Numpy-discussion] Python 3 dict support (issue #5718)

2016-07-21 Thread Jaime Fernández del Río
On Wed, Jul 20, 2016 at 9:39 PM, Hannah <story...@gmail.com> wrote:

> I second (and third & fourth &...) this
>
> Thanks so much for this, Jaime, it's exactly what I was looking for and
> couldn't find. Maybe this can be linked to in the contribute docs as an
> "orienting yourself in the codebase" page?
>
Glad to know it helped; realizing how this works was one of the big leaps
in understanding the code base for me too.  We (my wife and I) have
sent the kids on vacation with their grandparents, so when we are done
enjoying our newly recovered freedom hunting for Pokémon around Zurich,
I'll try to find time to write this up in a form suitable for the docs,
hopefully over the next couple of weeks.

I have been known to not keep commitments like this in the past, so don't
hold your breath just in case...

Jaime


>
> On Wed, Jul 20, 2016 at 11:07 AM, Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> Jaime,
>>
>> This is a great intro for people looking to jump into the C side of
>> things. I have been trying to figure out which bits are the important
>> ones from looking at the code and the docs. Your post cut out most of
>> the confusion. Is there some way you would consider adding something
>> like this this to the docs?
>>
>> -Joe
>>
>>
>> On Wed, Jul 20, 2016 at 8:52 AM, Jaime Fernández del Río
>> <jaime.f...@gmail.com> wrote:
>> > On Wed, Jul 20, 2016 at 4:28 AM, Hannah <story...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >> I started venturing down the rabbit hole of trying to write a patch to
>> add
>> >> support for numpy to convert python 3 dictionary keys
>> >> (collections.abc.ViewMapping objects), which is open issue #5718 and am
>> >> having trouble orienting myself. I'm unclear as to where the python
>> entry
>> >> point into array is (basically, what function np.array drops into and
>> if
>> >> this is in Python or C) and where/what language (fine with writing
>> either) a
>> >> patch that supports MappingViews would go. Any help getting oriented
>> would
>> >> be much appreciated.
>> >
>> >
>> > Hi Hannah,
>> >
>> > np.array is written in C, and is part of the multiarray module that is
>> > defined here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c
>> >
>> > The "array" name is mapped here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L4093
>> >
>> > to the function _array_fromobject defined here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L1557
>> >
>> > That function does some checking and has a couple of fast paths for the
>> > case where the input is already an array or a subclass, but for the
>> general
>> > case it relies on PyArray_CheckFromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1848
>> >
>> > which in turn calls PyArray_FromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1674
>> >
>> > You will also have to take a look at what goes on in
>> > PyArray_GetArrayParamsFromObject, which gets called by PyArray_FromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1428
>> >
>> > as well as several other places, but I think they are all (or most of
>> them)
>> > in ctors.c.
>> >
>> > You may also want to take a look at PyArray_FromIter, which is the
>> function
>> > that ultimately takes care of calls to np.fromiter:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L3657
>> >
>> > It's messy, but not that bad once you get used to it: good luck!
>> >
>> > Jaime
>> >
>> >>
>> >>
>> >> The reasoning for the patch is s that dicts are one of the most common
>> >> Python datatypes and this specifically is because of an upstream issue
>> of
>> >> wanting dict support in matplotlib.
>> >>
>> >> Thanks,
>> >> Hannah
>> >>

Re: [Numpy-discussion] Python 3 dict support (issue #5718)

2016-07-20 Thread Jaime Fernández del Río
On Wed, Jul 20, 2016 at 4:28 AM, Hannah  wrote:

> Hi,
> I started venturing down the rabbit hole of trying to write a patch to add
> support for numpy to convert python 3 dictionary keys
> (collections.abc.ViewMapping objects), which is open issue #5718 and am
> having trouble orienting myself. I'm unclear as to where the python entry
> point into array is (basically, what function np.array drops into and if
> this is in Python or C) and where/what language (fine with writing either)
> a patch that supports MappingViews would go. Any help getting oriented
> would be much appreciated.
>

Hi Hannah,

np.array is written in C, and is part of the multiarray module that is
defined here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c

The "array" name is mapped here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L4093

to the function _array_fromobject defined here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L1557

That function does some checking and has a couple of fast paths for the
case where the input is already an array or a subclass, but for the general
case it relies on PyArray_CheckFromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1848

which in turn calls PyArray_FromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1674

You will also have to take a look at what goes on in
PyArray_GetArrayParamsFromObject, which gets called by PyArray_FromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1428

as well as several other places, but I think they are all (or most of them)
in ctors.c.

You may also want to take a look at PyArray_FromIter, which is the function
that ultimately takes care of calls to np.fromiter:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L3657

It's messy, but not that bad once you get used to it: good luck!
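For reference, this is the Python-level behavior the patch is after (a sketch
of the current behavior; the exact repr and key order may vary between
versions):

    import numpy as np

    d = {'a': 1, 'b': 2, 'c': 3}
    # Today the keys view is not recognized as a sequence, so np.array falls
    # back to a 0-d object array wrapping it:
    np.array(d.keys())              # array(dict_keys(['a', 'b', 'c']), dtype=object)
    # Current workarounds, which the patch would make unnecessary:
    np.array(list(d.keys()))        # array(['a', 'b', 'c'], dtype='<U1')
    np.fromiter(d.values(), dtype=np.intp)   # array([1, 2, 3])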

Jaime


>
> The reasoning for the patch is s that dicts are one of the most common
> Python datatypes and this specifically is because of an upstream issue of
> wanting dict support in matplotlib.
>
> Thanks,
> Hannah
>




[Numpy-discussion] numpy.dtype has the wrong size, try recompiling

2016-07-05 Thread Jaime Fernández del Río
This question on Stack Overflow:

http://stackoverflow.com/questions/38197086/sklearn-numpy-dtype-has-the-wrong-size-try-recompiling-in-both-pycharm-and-te

If I remember correctly this has something to do with Cython, right? I can't
remember the details though; can someone help that poor soul out?

Jaime



Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-06 Thread Jaime Fernández del Río
On Mon, Jun 6, 2016 at 9:35 AM, Sebastian Berg 
wrote:

> On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> >
> >
> > On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer 
> > wrote:
> > > If possible, I'd love to add new functions for "generalized ufunc"
> > > linear algebra, and then deprecate (or at least discourage) using
> > > the older versions with inferior broadcasting rules. Adding a new
> > > keyword arg means we'll be stuck with an awkward API for a long
> > > time to come.
> > >
> > > There are three types of matrix/vector products for which ufuncs
> > > would be nice:
> > > 1. matrix-matrix product (covered by matmul)
> > > 2. matrix-vector product
> > > 3. vector-vector (inner) product
> > >
> > > It's straightforward to implement either of the latter two options by
> > > inserting dummy dimensions and then calling matmul, but that's a
> > > pretty awkward API, especially for inner products. Unfortunately,
> > > we already use the two most obvious one word names for vector inner
> > > products (inner and dot). But on the other hand, one word names are
> > > not very descriptive, and the short name "dot" probably mostly
> > > exists because of the lack of an infix operator.
> > >
> > > So I'll start by throwing out some potential new names:
> > >
> > > For matrix-vector products:
> > > matvecmul (if it's worth making a new operator)
> > >
> > > For inner products:
> > > vecmul (similar to matmul, but probably too ambiguous)
> > > dot_product
> > > inner_prod
> > > inner_product
> > >
> > I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> > at this. I suppose the mul could also go in the middle, or maybe
> > change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
> >
>
> Were not some of this part of the gufunc linalg functions and we just
> removed it because we were not sure about the API? Not sure anymore,
> but might be worth to have a look.
>

We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's undocumented.
There is also a matrix_multiply in that same module that does the obvious
thing.
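For anyone who has not run into it, a quick sketch of what inner1d does (its
gufunc signature is (i),(i)->()):

    import numpy as np
    from numpy.core.umath_tests import inner1d

    a = np.random.rand(5, 3)
    b = np.random.rand(5, 3)
    # Contracts the last axis and broadcasts over the rest, without building
    # the (5, 5) intermediate that a.dot(b.T) would.
    print(np.allclose(inner1d(a, b), (a * b).sum(axis=-1)))   # True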

And when gufuncs were introduced in linalg, there were a bunch of functions
doing all sorts of operations without intermediate storage, e.g. sum3(a, b,
c) -> a + b + c, that were removed before merging the PR. I wasn't involved
at the time, so I'm not sure what the rationale was.

While we are at it, should quadratic/bilinear forms get their own function
too?  That is, after all, what the OP was asking for.

Jaime


Re: [Numpy-discussion] Developer Meeting at EuroScipy?

2016-05-31 Thread Jaime Fernández del Río
On Tue, May 31, 2016 at 8:37 AM, Sebastian Berg 
wrote:

> Hi all,
>
> since we had decided to do a regular developers meeting last year, how
> would EuroScipy (Aug. 23.-27., Erlangen, Germany) look as a possible
> place and time to have one?
> I believe EuroScipy would include a few people who were not able to
> come to SciPy last year, and it seems SciPy itself already sees some of
> the devs quite busy anyway.
>

I will not be able to attend the full conference, but it's only a 10 hour
train ride from what I now call home, so I am positive I can be there for a
full day meeting, no matter what day of that week it is.

Jaime


>
> Not having checked back for room availability at the conference or
> anything, one option may be:
>
> August 24. (parallel to the second/last day of Tutorials)
>
> Does this seem like a good plan or do you have alternative suggestions
> or scheduling difficulties with this?
>
> Regards,
>
> Sebastian




Re: [Numpy-discussion] Cross-correlation PR stuck in limbo

2016-05-27 Thread Jaime Fernández del Río
I did an overall review of the code a couple of weeks ago (see the PR for
details), and there is quite some work to be done before we can merge
Honi's code. But if he can find the time to work on the coding, I'll try to
be more diligent about the reviewing.

Jaime

On Fri, May 27, 2016 at 12:51 PM, Elliot Hallmark 
wrote:

> +1
>
> This would really help with large data sets in certain situations.
>
> Is there still disagreement about whether this should be included? Or are
> there some minor details still? Or just lost in the shuffle?
>
> Hopefully,
>   Elliot
>
> On Wed, May 4, 2016 at 7:07 AM, Pierre Haessig 
> wrote:
>
>> Hi,
>>
>> I don't know how to push the PR forward, but all I can say is that this
>> maxlag feature would be a major improvement for using Numpy in time
>> series analysis! Immediate benefits downstream for Matplotlib and
>> statsmodel.
>>
>> Thanks Honi for having taken the time to implement this!
>>
>> best,
>> Pierre
>>




Re: [Numpy-discussion] Starting work on ufunc rewrite

2016-04-01 Thread Jaime Fernández del Río
On Thu, Mar 31, 2016 at 10:14 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> There is certainly good precedent for the approach you suggest.
> Shortly after Nathaniel mentioned the rewrite to me, I looked up
> d-pointers as a possible technique: https://wiki.qt.io/D-Pointer.
>

Yes, the idea is similar, although somewhat simpler since we are doing C,
not C++.


>
> If we allow arbitrary kwargs for the new functions, is that something
> you would want to note in the public structure? I was thinking
> something along the lines of adding a hook to process additional
> kwargs and return a void * that would then be passed to the loop.
>

I'm not sure I understand what you mean... But I also don't think it is
very relevant at this point? What I intend to do is simply to hide the guts
of ufuncs, breaking everyone's code once... so that we can later change
whatever we want without breaking anything else. PyUFunc_GenericFunction
already takes *args and **kwargs, and the internal logic of how these get
processed can be modified at will. If what you are proposing is to create a
PyUFunc_FromFuncAndDataAndSignatureAndKwargProcessor API function that
would provide a customized function to process extra kwargs and somehow
pass them into the actual ufunc loop, that would just be an API extension,
and there shouldn't be any major problem in introducing that whenever,
especially once we are free to modify the internal representation of ufuncs
without breaking ABI compatibility.


> To do this incrementally, perhaps opening a special development branch
> on the main repository is in order?
>

Yes, something like that seems like the right thing to do indeed. I would
like someone with more git foo than me to spell out the details of how we
would create and eventually merge that branch.


>
> I would love to join in the fun as time permits. Unfortunately, it is
> not especially permissive right about now. I will at least throw in
> some ideas that I have been mulling over.
>

Please do!

Jaime


>
>     -Joe
>
>
> On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río
> <jaime.f...@gmail.com> wrote:
> > I have started discussing with Nathaniel the implementation of the ufunc
> ABI
> > break that he proposed in a draft NEP a few months ago:
> >
> > http://thread.gmane.org/gmane.comp.python.numeric.general/61270
> >
> > His original proposal was to make the public portion of PyUFuncObject be:
> >
> > typedef struct {
> > PyObject_HEAD
> > int nin, nout, nargs;
> > } PyUFuncObject;
> >
> > Of course the idea is that internally we would use a much larger struct
> that
> > we could change at will, as long as its first few entries matched those
> of
> > PyUFuncObject. My problem with this, and I may very well be missing
> > something, is that in PyUFunc_Type we need to set the tp_basicsize to the
> > size of the extended struct, so we would end up having to expose its
> > contents. This is somewhat similar to what now happens with
> PyArrayObject:
> > anyone can #include "ndarraytypes.h", cast PyArrayObject* to
> > PyArrayObjectFields*, and access the guts of the struct without using the
> > supplied API inline functions. Not the end of the world, but if you want
> to
> > make something private, you might as well make it truly private.
> >
> > I think it would be to have something similar to what NpyIter does::
> >
> > typedef struct {
> > PyObject_HEAD
> > NpyUFunc *ufunc;
> > } PyUFuncObject;
> >
> > where NpyUFunc would, at this level, be an opaque type of which nothing
> > would be known. We could have some of the NpyUFunc attributes cached on
> the
> > PyUFuncObject struct for easier access, as is done in
> NewNpyArrayIterObject.
> > This would also give us more liberty in making NpyUFunc be whatever we
> want
> > it to be, including a variable-sized memory chunk that we could use and
> > access at will. NpyIter is again a good example, where rather than
> storing
> > pointers to strides and dimensions arrays, these are made part of the
> > NpyIter memory chunk, effectively being equivalent to having variable
> sized
> > arrays as part of the struct. And I think we will probably no longer
> trigger
> > the Cython warnings about size changes either.
> >
> > Any thoughts on this approach? Is there anything fundamentally wrong with
> > what I'm proposing here?
> >
> > Also, this is probably going to end up being a rewrite of a pretty large
> and
> > complex codebase. I am not sure that working on this on my own and
> > eventually sending a humongous PR is the 

[Numpy-discussion] Starting work on ufunc rewrite

2016-03-31 Thread Jaime Fernández del Río
I have started discussing with Nathaniel the implementation of the ufunc
ABI break that he proposed in a draft NEP a few months ago:

http://thread.gmane.org/gmane.comp.python.numeric.general/61270

His original proposal was to make the public portion of PyUFuncObject be:

typedef struct {
    PyObject_HEAD
    int nin, nout, nargs;
} PyUFuncObject;

Of course the idea is that internally we would use a much larger struct
that we could change at will, as long as its first few entries matched
those of PyUFuncObject. My problem with this, and I may very well be
missing something, is that in PyUFunc_Type we need to set the tp_basicsize
to the size of the extended struct, so we would end up having to expose its
contents. This is somewhat similar to what now happens with PyArrayObject:
anyone can #include "ndarraytypes.h", cast PyArrayObject* to
PyArrayObjectFields*, and access the guts of the struct without using the
supplied API inline functions. Not the end of the world, but if you want to
make something private, you might as well make it truly private.

I think it would be better to have something similar to what NpyIter does::

typedef struct {
    PyObject_HEAD
    NpyUFunc *ufunc;
} PyUFuncObject;

where NpyUFunc would, at this level, be an opaque type of which nothing
would be known. We could have some of the NpyUFunc attributes cached on the
PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
This would also give us more liberty in making NpyUFunc be whatever we want
it to be, including a variable-sized memory chunk that we could use and
access at will. NpyIter is again a good example, where rather than storing
pointers to strides and dimensions arrays, these are made part of the
NpyIter memory chunk, effectively being equivalent to having variable sized
arrays as part of the struct. And I think we will probably no longer
trigger the Cython warnings about size changes either.

Any thoughts on this approach? Is there anything fundamentally wrong with
what I'm proposing here?

Also, this is probably going to end up being a rewrite of a pretty large
and complex codebase. I am not sure that working on this on my own and
eventually sending a humongous PR is the best approach. Any thoughts on how
best to handle turning this into a collaborative, incremental effort?
Anyone who would like to join in the fun?

Jaime



Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-28 Thread Jaime Fernández del Río
Have modified the PR to do the "promote integers to at least long" we do in
np.sum.

Jaime

On Mon, Mar 28, 2016 at 9:55 PM, CJ Carey <perimosocord...@gmail.com> wrote:

> Another +1 for Josef's interpretation from me. Consistency with np.sum
> seems like the best option.
>
> On Sat, Mar 26, 2016 at 11:12 PM, Juan Nunez-Iglesias <jni.s...@gmail.com>
> wrote:
>
>> Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
>> np.bincount to behave like np.sum with regards to promoting weights dtypes.
>> Including bool.
>>
>> On Sun, Mar 27, 2016 at 1:58 PM, <josef.p...@gmail.com> wrote:
>>
>>> On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz
>>> <jfoxrabinov...@gmail.com> wrote:
>>> > Would it make sense to just make the output type large enough to hold
>>> the
>>> > cumulative sum of the weights?
>>> >
>>> >
>>> > - Joseph Fox-Rabinovitz
>>> >
>>> > -- Original message--
>>> >
>>> > From: Jaime Fernández del Río
>>> >
>>> > Date: Sat, Mar 26, 2016 16:16
>>> >
>>> > To: Discussion of Numerical Python;
>>> >
>>> > Subject:[Numpy-discussion] Make np.bincount output same dtype as
>>> weights
>>> >
>>> > Hi all,
>>> >
>>> > I have just submitted a PR (#7464) that fixes an enhancement request
>>> > (#6854), making np.bincount return an array of the same type as the
>>> weights
>>> > parameter.  This is an important deviation from current behavior, which
>>> > always casts weights to double, and always returns a double array, so I
>>> > would like to hear what others think about the worthiness of this.
>>> Main
>>> > discussion points:
>>> >
>>> > np.bincount now works with complex weights (yay!), I guess this should
>>> be a
>>> > pretty uncontroversial enhancement.
>>> > The return is of the same type as weights, which means that small
>>> integers
>>> > are very likely to overflow.  This is exactly what #6854 requested, but
>>> > perhaps we should promote the output for integers to a long, as we do
>>> in
>>> > np.sum?
>>>
>>> I always thought of bincount with weights just as a group-by sum. So
>>> it would be easier to remember and have fewer surprises if it matches
>>> the behavior of np.sum.
>>>
>>> > Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
>>> this
>>> > what one would want? If we decide that integer promotion is the way to
>>> go,
>>> > perhaps booleans should go in the same pack?
>>>
>>> Isn't this calculating the sum, i.e. count of True by group, already?
>>> Based on a quick example with numpy 1.9.2, I don't think I ever used
>>> bool weights before.
>>>
>>>
>>> > This new implementation currently supports all of the reasonable native
>>> > types, but has no fallback for user defined types.  I guess we should
>>> > attempt to cast the array to double as before if no native loop can be
>>> > found? It would be good to have a way of testing this though, any
>>> thoughts
>>> > on how to go about this?
>>> > Does a behavior change like this require some deprecation period? What
>>> would
>>> > that look like?
>>> > I have also added broadcasting of weights to the full size of list, so
>>> that
>>> > one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to
>>> tile
>>> > the single weight to the size of the bins list.
>>> >
>>> > Any other thoughts are very welcome as well!
>>>
>>> (2-D weights ?)
>>>
>>>
>>> Josef
>>>
>>>
>>> >
>>> > Jaime
>>> >




[Numpy-discussion] Behavior of .reduceat()

2016-03-27 Thread Jaime Fernández del Río
Two of the oldest issues in the tracker (#834 and #835) are about how .reduceat()
handles its indices parameter. I have been taking a look at the source
code, and it would be relatively easy to modify, the hardest part being to
figure out what the exact behavior should be.

Current behavior is that np.ufunc.reduceat(x, ind) returns
[np.ufunc.reduce(x[ind[i]:ind[i+1]]) for i in range(len(ind))], with a couple
of caveats (see the short example after the list):

   1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
   reduction over an empty slice.
   2. an index equal to len(x) is appended to the indices argument, to be used
   as the endpoint of the last slice to reduce over.
   3. aside from this last case, the indices are required to be strictly in
   bounds, 0 <= index < len(x), or an error is raised.
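A quick concrete example of the current behavior:

    import numpy as np

    x = np.arange(8)
    ind = [0, 4, 1, 5]
    np.add.reduceat(x, ind)
    # -> array([ 6,  4, 10, 18])
    #    x[0:4].sum() ->  6
    #    x[4] alone   ->  4    (caveat 1: ind[1] >= ind[2])
    #    x[1:5].sum() -> 10
    #    x[5:8].sum() -> 18    (caveat 2: the appended len(x) closes the last slice)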

The proposed new behavior, with some optional behaviors, would be:

   1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e. the
   ufunc identity, is returned. This includes raising an error if the ufunc
   does not have an identity, e.g. np.minimum.
   2. to fully support the "reduction over slices" idea, some form of out
   of bounds indices should be allowed. This could mean either that:
      2.1. only index = len(x) is allowed without raising an error, to allow
      computing the full reduction anywhere, not just as the last entry of
      the return, or
      2.2. any index in -len(x) <= index <= len(x) is allowed, with the usual
      meaning given to negative values, or
      2.3. any index is allowed, with reduction results clipped to existing
      values (and the usual meaning for negative values).
   3. Regarding the appending of that last index equal to len(x) to indices,
   we could:
      3.1. keep appending it, or
      3.2. never append it, since you can now request it without an error
      being raised, or
      3.3. only append it if the last index is smaller than len(x).

My thoughts on the options:

   - The minimal, more conservative approach would go with 2.1 and 3.1. And
   of course 1; if we don't implement that, none of this makes sense.
   - I kind of think 2.2 or even 2.3 are a nice enhancement that shouldn't
   break too much stuff.
   - 3.2 I'm not sure about, probably hurts more than it helps at this
   point, although in a brand new design you probably would either not append
   the last index or also prepend a zero, as in np.split.
   - And 3.3 seems too magical, probably not a good idea, only listed it
   for completeness.

Any other thoughts or votes on what, if anything, we should implement, and
what the deprecation of the current behavior should look like?

Jaime



[Numpy-discussion] Some thoughts on ufunc reduction methods

2016-03-26 Thread Jaime Fernández del Río
The following write-up was triggered by issue #7265; you may want to review
that discussion for context:

In the docs, ufunc reduction operations are explained as "cumulatively
applying the ufunc along an axis." This basically limits just about any
reduction operation to ufuncs with two inputs and one output. Another take
on the same reduction idea is that, if the inputs are a and b and the
output is c, c is the result of updating a with the value in b. With this
idea in mind, in e.g.

add(a, b) -> c = a + b,

c would be thought of as the updated value of a, after applying b to it.
One would expect that, for any registered loop suitable for this
interpretation, the types of a and c would be identical, but b could be of
some other type. As an example consider

count(a, b) -> c = a + 1,

where a and c would typically be intp, but b could have just about any type.

The power of this description of ufuncs suited for reduction is that it is
very easy to generalize beyond the current "two inputs, one output"
restriction. E.g. a "sum of squared differences" ufunc defined as:

ssqd(sd2, x, y) -> sd2 += (x - y)**2,

could be used to compute squared euclidean distances between vectors doing:

ssqd.reduce((x, y)),

without the memory overhead of current available approaches.
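To spell out what that would buy us over what is expressible today (a sketch;
ssqd itself does not exist):

    import numpy as np

    x = np.random.rand(10**6)
    y = np.random.rand(10**6)
    # Today, the subtraction and the squaring each materialize a full-sized
    # temporary before the reduction even starts:
    d2 = np.add.reduce((x - y) ** 2)
    # The hypothetical ssqd.reduce((x, y)) would produce the same number by
    # updating a single scalar accumulator as it walks x and y, with no
    # intermediate arrays.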

In general, a reduction of a ufunc with m inputs and n outputs, m > n,
would produce n results out of m - n inputs. Such a ufunc would have to
define a suitable identity to initialize each of those n outputs at the
beginning of any reduction, rather than the single identity ufuncs now hold.

It can be argued that this generalization of reduction is just a
redefinition of a subset of what gufuncs can do, e.g. a "sum of squared
differences" gufunc with signature (n),(n)->() would do the same as the
above reduction, probably faster. And it is not clear that getting
accumulate and reduceat for free, or reduction over multiple axes,
justifies the extra complexity. Especially since it is likely that
something similarly powerful could be built on top of gufuncs.

Where an approach such as this would shine is if we had a reduction method
that did not require a single strided memory segment to act on. Setting
aside the generalization of reduction described above, a groupby or reduceby
method would take an array of values and an array of groups, and compute a
reduction value for each of the unique groups. With the above
generalizations, one could compute, e.g. the variance over each group by
doing something like:

# how exactly keepdims works here is not clear at all!
counts = count.reduceby(values, groups, keepdims=True)
mean = np.add.reduceby(values, groups, keepdims=True) / counts
var = np.ssqd.reduceby((values, mean), groups) / counts

I am not fully sure of whether this is just useful for this particular
little example of computing variance using a ufunc method that doesn't
exist, or if it is valid in other settings too. Implementing this would add
complexity to an already complex system, so it better be good for something!

One of the weakest points I see is the need to rely on a set of predefined
identities to start the reduction. Think of trying to generalize reduction
to gufuncs, another worthy effort, and take matrix multiplication with
signature (i,j),(j,k)->(i,k). In the current paradigm you would impose that
both inputs and the output parameters be interchangeable, which leads
rather naturally to the condition i = j = k for this reduction to be
doable. But with the new paradigm you only need that the first input and
output be interchangeable, which only provides you with j = k as a
condition. And while this is a more general behavior, the questions of what
value to set i to in the output, and what to init the output to, when
trying to do a reduction over a stack of square arrays, make it fairly
unusable. So a backup to the "two inputs, one output, initialize to some
item in the reduction array" would have to be kept in place

It is also important to note that expanding reduction to gufuncs would
probably require that we impose some iteration order guarantees on
ourselves, as the poster child for this (matrix multiplication) is not in
general a commutative operation. Would we want to do this to ourselves?
Would the features today justify the burden for ever after?

Any thoughts on where, if anywhere, to go with this, are very welcome!

Jaime



[Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread Jaime Fernández del Río
Hi all,

I have just submitted a PR (#7464) that fixes an enhancement request (#6854),
making np.bincount return an array of the same type as the weights parameter.
This is an important deviation from current behavior, which always casts
weights to double, and always returns a double array, so I would like to hear
what others think about the worthiness of this.  Main discussion points (a
short illustration follows the list):

   - np.bincount now works with complex weights (yay!), I guess this should
   be a pretty uncontroversial enhancement.
   - The return is of the same type as weights, which means that small
   integers are very likely to overflow.  This is exactly what #6854
   requested, but perhaps we should promote the output for integers to a
   long, as we do in np.sum?
   - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
   this what one would want? If we decide that integer promotion is the way to
   go, perhaps booleans should go in the same pack?
   - This new implementation currently supports all of the reasonable
   native types, but has no fallback for user defined types.  I guess we
   should attempt to cast the array to double as before if no native loop can
   be found? It would be good to have a way of testing this though, any
   thoughts on how to go about this?
   - Does a behavior change like this require some deprecation period? What
   would that look like?
   - I have also added broadcasting of weights to the full size of list, so
   that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having
   to tile the single weight to the size of the bins list.
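To make the main point concrete, the current behavior versus what the PR
proposes (a sketch; the output shown is for the current release):

    import numpy as np

    x = [0, 1, 1, 2]
    w = np.array([1, 2, 3, 4], dtype=np.int8)
    # Today the weights are cast to double and the result is always float64:
    np.bincount(x, weights=w)          # array([ 1.,  5.,  4.])
    np.bincount(x, weights=w).dtype    # dtype('float64')
    # With the PR, the result would instead take its type from the weights
    # (possibly promoted for small integers, as discussed above).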

Any other thoughts are very welcome as well!

Jaime



Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking

2016-03-20 Thread Jaime Fernández del Río
On Thu, Mar 17, 2016 at 10:41 PM, Stephan Hoyer  wrote:

> On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant 
> wrote:
>
>> I think that is a good idea.Let the user decide if scalar
>> broadcasting is acceptable for their function.
>>
>> Here is a simple concrete example where scalar broadcasting makes sense:
>>
>>
>> A 1-d dot product (the core of np.inner)   (k), (k) -> ()
>>
>> A user would assume they could call this function with a scalar in either
>> argument and have it broadcast to a 1-d array.Of course, if both
>> arguments are scalars, then it doesn't make sense.
>>
>> Having a way for the user to allow scalar broadcasting seems sensible and
>> a nice compromise.
>>
>> -Travis
>>
>
> To generalize a little bit, consider the entire family of weighted
> statistical function (mean, std, median, etc.). For example, the gufunc
> version of np.average is basically equivalent to np.inner with a bit of
> preprocessing.
>
> Arguably, it *could* make sense to broadcast weights when given a scalar:
> np.average(values, weights=1.0 / len(values)) is pretty unambiguous.
>
> That said, adding an explicit "scalar broadcasting OK flag" seems like a
> hack that will need even more special logic (e.g., so we can error if both
> arguments to np.inner are scalars).
>
> Multiple dispatch for gufunc core signatures seems like the cleaner
> solution. If you want np.inner to handle scalars, you need to supply core
> signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is the
> similar to vision of three core signatures for np.matmul: (i),(i,j)->(j),
> (i,j),(j)->(i) and (i,j),(j,k)->(i,k).
>

Would the logic for such a thing be consistent? E.g. how do you decide if
the user is requesting (k),(k)->(), or (k),()->() with broadcasting over a
non-core dimension of size k in the second argument? What if your
signatures are (m, k),(k)->(m) and (k),(n,k)->(n) and your two inputs are
(m,k) and (n,k), how do you decide which one to call? Or alternatively, how
do you detect and forbid such ambiguous designs? Figuring out the dispatch
rules for the general case seems like a non-trivial problem to me.

Jaime



Re: [Numpy-discussion] PyData Madrid

2016-02-20 Thread Jaime Fernández del Río
On Sat, Feb 20, 2016 at 8:13 PM, David Cournapeau <courn...@gmail.com>
wrote:

>
>
> On Sat, Feb 20, 2016 at 5:26 PM, Kiko <kikocorre...@gmail.com> wrote:
>
>>
>>
>> 2016-02-20 17:58 GMT+01:00 Ralf Gommers <ralf.gomm...@gmail.com>:
>>
>>>
>>>
>>> On Wed, Feb 17, 2016 at 9:46 PM, Sebastian Berg <
>>> sebast...@sipsolutions.net> wrote:
>>>
>>>> On Mi, 2016-02-17 at 20:59 +0100, Jaime Fernández del Río wrote:
>>>> > Hi all,
>>>> >
>>>> > I just found out there is a PyData Madrid happening in early April,
>>>> > and it would feel wrong not to go, it being my hometown and all.
>>>> >
>>>> > Aside from the usual "Who else is going? We should meet!" I was also
>>>> > thinking of submitting a proposal for a talk.  My idea was to put
>>>> > something together on "The future of NumPy indexing" and use it as an
>>>> > opportunity to raise awareness and hopefully gather feedback from
>>>> > users on the proposed changes, in sort of a "if the mountain won't
>>>> > come to Muhammad" type of thing.
>>>> >
>>>>
>>>> I guess you do know my last name means mountain in german? But if
>>>> Muhammed might come, I should really improve my arabic ;).
>>>>
>>>> In any case sounds good to me if you like to do it, I don't think I
>>>> will go, though it sounds nice.
>>>>
>>>
>>> Sounds like a good idea to me too. I like both the concrete topic, as
>>> well as just having a talk on Numpy at a PyData conference. In general
>>> there are too few (if any) talks on Numpy and other core libraries at
>>> PyData and Scipy confs I think.
>>>
>>
>> +1.
>>
>> It would be great a numpy talk from a core developer. BTW, C4P closes
>> tomorrow!!!
>>
>
With a full day to spare, I have submitted a talk proposal:

Brief Description
Advanced (a.k.a. "fancy") indexing is one of NumPy's greatest features.
It is also well known for its ability to trip and confuse beginners and
experts alike. This talk will review how it works and why it is great,
give some insight on why it is how it is, explore some of its darkest
corners, and go over some recent proposals to rationalize it.

Detailed Abstract
Advanced (a.k.a. _fancy_) indexing is one of NumPy's greatest features.
Once past the rather steep learning curve, it enables a very expressive
and powerful syntax, and makes coding a wide range of complex operations
a breeze. But this versatility comes with a dark side of surprising
results for some seemingly simple cases, and conflicts with the design
choices of more recent data analysis packages. This has led to a
viewpoint with growing support among the community that fancy indexing
may be too fancy for its own good. This talk will review the workings of
advanced indexing, highlighting where it excels, and where it falls
short, and give some context on the logic behind some design decisions.
It will also cover the existing
[NumPy Enhancement Proposal (NEP)](https://github.com/numpy/numpy/pull/6256)
to "implement an intuitive and fully featured advanced indexing."
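For the curious, one pair of the "surprising results" the talk would walk
through (a quick sketch):

    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)
    # Advanced indices are paired up, not crossed:
    a[[0, 1], [0, 1]].shape   # (2, 4): rows a[0, 0] and a[1, 1], not a 2x2 block
    # And when advanced indices are separated by a slice, the broadcast
    # dimension jumps to the front of the result:
    a[0, :, [0, 1]].shape     # (2, 3), not (3, 2)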


>
>> Jaime, if you come to Madrid you know you have some beers waiting for you.
>>
>
Talk or not, I'm really looking forward to those beers and getting to meet
Juan Luis and you!

Jaime



[Numpy-discussion] PyData Madrid

2016-02-17 Thread Jaime Fernández del Río
Hi all,

I just found out there is a PyData Madrid happening in early April, and it
would feel wrong not to go, it being my hometown and all.

Aside from the usual "Who else is going? We should meet!" I was also
thinking of submitting a proposal for a talk.  My idea was to put something
together on "The future of NumPy indexing" and use it as an opportunity to
raise awareness and hopefully gather feedback from users on the proposed
changes, in sort of a "if the mountain won't come to Muhammad" type of
thing.

Thoughts? Comments? Anyone else going or thinking about going?

Jaime



Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-20 Thread Jaime Fernández del Río
There doesn't seem to be much of a consensus on the way to go, so leaving
things as they are and have been seems the wisest choice for now; thanks
for all the feedback. I will work with Greg on documenting the status quo
properly.

We probably want to follow the lead of the stdlib's random.uniform on how
the openness of the interval actually results in practice:

https://docs.python.org/3.6/library/random.html#random.uniform
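To see what that means in practice, a quick sketch (the values are chosen on
purpose so that the rounding in a + (b - a) * random() becomes visible):

>>> import random
>>> a, b = 1.0, 1.0 + 2**-52   # one ulp apart
>>> # multiplying by (b - a) is exact, but the final addition rounds, so
>>> # roughly half of the draws land on the nominally excluded endpoint b
>>> any(random.uniform(a, b) == b for _ in range(1000))
True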


On Thu, Jan 21, 2016 at 2:55 AM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> >> I would also point out that requiring open vs closed intervals (in
> >> doubles) is already an extremely specialised use case. In terms of
> >> *sampling the reals*, there is no difference between the intervals
> >> (a,b) and [a,b], because the endpoints have measure 0, and even with
> >> double-precision arithmetic, you are going to have to make several
> >> petabytes of random data before you hit an endpoint...
> >>
> > Petabytes ain't what they used to be ;) I remember testing some hardware
> > which, due to grounding/timing issues would occasionally goof up a
> readable
> > register. The hardware designers never saw it because they didn't test
> for
> > hours and days at high data rates. But it was there, and it would show up
> > in the data. Measure zero is about as real as real numbers...
> >
> > Chuck
>
> Actually, your point is well taken and I am quite mistaken. If you
> pick some values like uniform(low, low * (1+2**-52)) then you can hit
> your endpoints pretty easily. I  am out of practice making
> pathological tests for double precision arithmetic.
>
> I guess my suggestion would be to add the deprecation warning and
> change the docstring to warn that the interval is not guaranteed to be
> right-open.
>
> Peter
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PR #7083: ENH: Added 'doane' and 'sqrt' estimators to np.histogram

2016-01-20 Thread Jaime Fernández del Río
The tests are not passing; it seems like you are taking the sqrt of a
negative number, so you may want to check the inputs and raise a more
informative error (and add a test for it).
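Something along these lines, perhaps (a sketch of the kind of guard and test
being asked for; the names and details are illustrative, not the code from
the PR):

import numpy as np
from numpy.testing import assert_raises

def doane_bin_count(x):
    # validate the input up front, so bad input fails with a clear message
    # rather than with a sqrt-of-negative-number warning further down
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        # sqrt(6*(n - 2) / ((n + 1)*(n + 3))) is not defined below 3 points
        raise ValueError("doane estimator requires at least 3 data points")
    sg1 = np.sqrt(6.0 * (n - 2) / ((n + 1.0) * (n + 3)))
    sd = x.std()
    g1 = 0.0 if sd == 0 else np.mean(((x - x.mean()) / sd) ** 3)
    return int(np.ceil(1 + np.log2(n) + np.log2(1 + abs(g1) / sg1)))

# ...and the matching test:
assert_raises(ValueError, doane_bin_count, [1.0, 2.0])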

Jaime

On Thu, Jan 21, 2016 at 7:51 AM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Please let me know if there is anything wrong or missing. I have added
> a couple of estimators that I find useful sometimes.
>
> -Joe
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Jaime Fernández del Río
Hi all,

There is a PR (#7026 ) that
documents the current behavior of np.random.uniform when the low and high
parameters it takes do not conform to the expected low < high. Basically:

   - if low < high, random numbers are drawn from [low, high),
   - if low = high, all random numbers will be equal to low, and
   - if low > high, random numbers are drawn from (high, low] (notice the
   change in the open side of the interval.)

My only worry is that, once we document this, we can no longer claim that
it is a bug.  So I would like to hear from others what they think.  The
other more or less obvious options would be to:

   - Raise an error, but this would require a deprecation cycle, as people
   may be relying on the current undocumented behavior.
   - Check the inputs and draw numbers from [min(low, high), max(low, high)),
   which is minimally different from current behavior.

I will be merging the current documentation changes in the next few days,
so it would be good if any concerns were voiced before that.
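For reference, a quick interactive check of the three cases listed above
(this is the behavior being documented, not a proposal):

>>> import numpy as np
>>> a = np.random.uniform(0.0, 1.0, 100000)   # low < high: [low, high)
>>> (a >= 0.0).all(), (a < 1.0).all()
(True, True)
>>> np.random.uniform(5.0, 5.0, 3)            # low == high: all equal to low
array([ 5.,  5.,  5.])
>>> b = np.random.uniform(1.0, 0.0, 100000)   # low > high: (high, low]
>>> (b > 0.0).all(), (b <= 1.0).all()
(True, True)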

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Building master issues on Windows

2016-01-04 Thread Jaime Fernández del Río
On Tue, Jan 5, 2016 at 3:15 AM, G Young  wrote:

> Hello all,
>
> I've recently encountered issues building numpy off master on Windows in
> which setup.py complains that it can't find the Advapi library in any
> directories (which is an empty list).  I scanned the DLL under System32 and
> ran /sfc scannow as Administrator, and both came up clean.  Is anyone else
> encountering (or can reproduce) this issue, or is it just me?  Any
> suggestions about how to resolve this problem?
>

Can you give more details on your setup?

Jaime

(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PR for complex np.interp, question about assert_almost_equal

2015-12-22 Thread Jaime Fernández del Río
On Tue, Dec 22, 2015 at 7:42 AM, Ralf Gommers 
wrote:

>
>
> On Tue, Dec 22, 2015 at 12:55 AM, Peter Creasey <
> p.e.creasey...@googlemail.com> wrote:
>
>> Hi all,
>> I submitted a PR (#6872) for using complex numbers in np.lib.interp.
>>
>> The tests pass on my machine, but I see that the TravisCI builds are
>> giving assertion fails (on my own test) with python 3.3 and 3.5 of the
>> form:
>> > assert_almost_equal
>> > TypeError: Cannot cast array data from dtype('complex128') to
>> dtype('float64') according to the rule 'safe'
>>
>> When I was writing the test I used np.testing.assert_almost_equal with
>> complex128 as it works in my python 2.7, however having checked the
>> docstring I cannot tell what the expected behaviour should be (complex
>> or no complex allowed). Should my test be changed or the
>> assert_almost_equal?
>>
>
> Hi Peter, that error is unrelated to assert_almost_equal. What happens is
> that when you pass in a complex argument `fp` to your modified
> `compiled_interp`, you're somewhere doing a cast that's not safe and
> trigger the error at
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L1930.
> For what "safe casting" means, see
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.can_cast.html
>

The problem then is probably here.

You may want to throw in a PyErr_Clear() when the conversion of the fp array
to NPY_DOUBLE fails before trying with NPY_CDOUBLE, and check if it goes away.
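Something like this, roughly (a sketch of the pattern only, not the actual
compiled_interp source; the variable names are illustrative):

    afp = (PyArrayObject *)PyArray_ContiguousFromAny(fp, NPY_DOUBLE, 1, 1);
    if (afp == NULL) {
        /* forget the "cannot cast safely" error before retrying */
        PyErr_Clear();
        afp = (PyArrayObject *)PyArray_ContiguousFromAny(fp, NPY_CDOUBLE, 1, 1);
        if (afp == NULL) {
            return NULL;  /* neither dtype worked, propagate the error */
        }
        /* ...continue on the complex code path... */
    }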

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A minor clarification no why count_nonzero is faster for boolean arrays

2015-12-17 Thread Jaime Fernández del Río
On Thu, Dec 17, 2015 at 7:37 PM, CJ Carey  wrote:

> I believe this line is the reason:
>
> https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110
>

The magic actually happens in  count_nonzero_bytes_384, a few lines before
that (line 1986).

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Build broken

2015-12-14 Thread Jaime Fernández del Río
Hi,

Travis is repeatedly being unable to complete one of our test builds. It
all started after I merged a very simple PR that changed a single word in a
docstring, so I have a hard time believing that is the actual cause. Can
anyone who actually knows what Travis is doing take a look at the log:

https://travis-ci.org/numpy/numpy/builds/96836128

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where is Jaime?

2015-12-06 Thread Jaime Fernández del Río
On Sun, Dec 6, 2015 at 4:15 AM, Charles R Harris <charlesr.har...@gmail.com>
wrote:

>
>
> On Sat, Dec 5, 2015 at 4:49 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> I'm alive and well: trying to stay afloat on a sea of messaging
>> protocols, Java and Swiss bureaucracy, but doing great aside from that.
>>
>> Jaime
>>
>>
> Glad to hear it. I was beginning to worry...
>
> Java? Poor soul. Anything special about the Swiss bureaucracy? Reminds me
> of the old joke
>

Well, if you don't have a suitcase full of $100 bills, opening a bank
account in Switzerland is surprisingly difficult, especially if you are
moving here from the U.S. If immigration then decides to request from you
documents they already have, thus delaying your residence permit by a few
more weeks, the bank ends up returning your first salary, just about the
same time you have to pay a 3 month rental deposit for a small and
ridiculously expensive apartment. Everything is slowly falling into place,
but it has been an interesting ride.


> *Heaven and Hell*
>
> Heaven Is Where:
>
> The French are the chefs
> The Italians are the lovers
> The British are the police
> The Germans are the mechanics
> And the Swiss make everything run on time
>
>
> Hell is Where:
>
> The British are the chefs
> The Swiss are the lovers
> The French are the mechanics
> The Italians make everything run on time
> And the Germans are the police
>

The trains and trams do seem to run remarkably on time, but I don't think
Eva would be too happy about me setting out to test how good lovers the
Swiss are...

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array of random numbers fails to construct

2015-12-06 Thread Jaime Fernández del Río
On Sun, Dec 6, 2015 at 10:07 PM, Matthew Brett 
wrote:

> Hi,
>
> On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student)
>  wrote:
> > This works. A big array of eight bit random numbers is constructed:
> >
> > import numpy as np
> >
> > spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)
> >
> >
> >
> > This fails. It eats up all 64GBy of RAM:
> >
> > spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)
> >
> >
> > The difference is a factor of two, 2**21 rather than 2**20, for the
> extent
> > of the first axis.
>
> I think what's happening is that this:
>
> np.random.randint(0,255, (2**21,2**12))
>
> creates 2**33 random integers, which (on 64-bit) will be of dtype
> int64 = 8 bytes, giving total size 2 ** (21 + 12 + 6) = 2 ** 39 bytes
> = 512 GiB.
>

8 is only 2**3, so it is "just" 64 GiB, which also explains why the half
sized array does work, but yes, that is most likely what's happening.
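As a workaround (just a sketch, not something from the original thread), the
uint8 array can be filled in chunks, so that the int64 intermediate produced
by randint never grows beyond a manageable size:

import numpy as np

shape = (2**21, 2**12)
spectrumArray = np.empty(shape, dtype=np.uint8)  # 8 GiB, the final array
rows = 2**15                                     # per-chunk int64 is ~1 GiB
for start in range(0, shape[0], rows):
    stop = min(start + rows, shape[0])
    spectrumArray[start:stop] = np.random.randint(0, 255, (stop - start, shape[1]))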

Jaime

>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where is Jaime?

2015-12-05 Thread Jaime Fernández del Río
I'm alive and well: trying to stay afloat on a sea of messaging protocols,
Java and Swiss bureaucracy, but doing great aside from that.

Jaime

On Sat, Dec 5, 2015 at 9:43 PM, Charles R Harris 
wrote:

> Anyone hear from Jaime lately? He seems to have gone missing.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Commit rights for Jonathan J. Helmus

2015-10-31 Thread Jaime Fernández del Río
"Gruetzi!", as I just found out we say in Switzerland...
On Oct 30, 2015 8:20 AM, "Jonathan Helmus"  wrote:

> On 10/28/2015 09:43 PM, Allan Haldane wrote:
> > On 10/28/2015 05:27 PM, Nathaniel Smith wrote:
> >> Hi all,
> >>
> >> Jonathan J. Helmus (@jjhelmus) has been given commit rights -- let's all
> >> welcome him aboard.
> >>
> >> -n
> >
> > Welcome Jonathan, happy to have you on the team!
> >
> > Allan
> >
>
> Thank you everyone for the kind welcome.  I'm looking forward to
> being part of the team.
>
> - Jonathan Helmus
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy Generalized Ufuncs: Pointer Arithmetic and Segmentation Faults (Debugging?)

2015-10-25 Thread Jaime Fernández del Río
HI Eleanore,

Thanks for the kind words, you are very welcome!

As for your issues, I think they are coming from the handling of the
strides you are doing in the slow_dtw_dist function.  The strides are the
number of bytes you have to advance your pointer to get to the next item.
In your code, you end up doing something akin to:

dtype *v_i = v0;
...
for (...) {
...
v_i += stride_v;
}

This, rather than increase the v_i pointer by stride_v bytes, increases it
by stride_v * sizeof(dtype), and with the npy_double you seem to be using
as dtype, sends you out of your allocated memory at a rate 8x too fast.

What you increase by stride_v has to be of char* type, so one simple
solution would be to do something like:

char *v_ptr = (char *)v0;
...
for (...) {
dtype v_val = *(dtype *)v_ptr;
...
v_ptr += stride_v;
}

And use v_val directly wherever you were dereferencing v_i before.
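Putting it together, a minimal self-contained sketch of the pattern (the
function name and the trivial sum body are illustrative, not the code from
the pastebin):

static void
sum_along_axis(char *v0, npy_intp stride_v, npy_intp n, npy_double *out)
{
    char *v_ptr = v0;          /* walk the input buffer as raw bytes */
    npy_double acc = 0.0;
    npy_intp i;

    for (i = 0; i < n; i++) {
        npy_double v_val = *(npy_double *)v_ptr;  /* read one element */
        acc += v_val;
        v_ptr += stride_v;     /* advance by stride_v bytes, not elements */
    }
    *out = acc;
}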

Jaime


On Sun, Oct 25, 2015 at 5:06 AM,  wrote:

> Dear Numpy maintainers and developers,
>
> Thanks for providing such a great numerical library!
>
> I’m currently trying to implement the Dynamic Time Warping metric as a set
> of generalised numpy ufuncs, but unfortunately, I have lasting issues with
> pointer arithmetic and segmentation faults. Is there any way that I can
> use GDB or some such to debug a python/numpy extension? Furthermore: is it
> necessary to use pointer arithmetic to access the function arguments (as
> seen on http://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html)
> or is element access (operator[]) also permissible?
>
> To break it down quickly, I need to have a fast DTW distance function
> dist_dtw() with two vector inputs (broadcasting should be possible), two
> scalar parameters and one scalar output (signature: (i), (j), (), () -> ())
> usable in python for a 1-Nearest Neighbor classification algorithm. The
> extension also implements two functions compute_envelope() and
> piecewise_mean_reduction() which are used for lower-bounding based on Keogh
> and Ratanamahatana, 2005. The source code is available at
> http://pastebin.com/MunNaP7V and the prominent segmentation fault happens
> somewhere in the chain dist_dtw() —> meta_dtw_dist() —> slow_dtw_dist(),
> but I fail to pin it down.
>
> Aside from my primary questions, I wonder how to approach
> errors/exceptions and unit testing when developing numpy ufuncs. Are there
> any examples apart from the numpy manual that I could use as reference
> implementations of generalised numpy ufuncs?
>
> I would greatly appreciate some insight into properly developing
> generalised ufuncs.
>
> Best,
> Eleanore
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
Hi all,

As mentioned a few weeks ago, I am organizing a "Become an Open Source
Contributor" workshop tonight, for the Data Science Student Society at UCSD.

During this morning I will be creating a few ridiculously simple issues,
e.g. "missing space, arrayobject --> array object", for participants to
work on as part of the workshop. So there may also be a surge in simple PRs
starting at around 7 PM PST.

Please, bear with us. And refrain from fixing those issues if you are not a
workshop participant!

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
On Mon, Oct 19, 2015 at 9:11 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Hi all,
>
> As mentioned a few weeks ago, I am organizing a "Become an Open Source
> Contributor" workshop tonight, for the Data Science Student Society at UCSD.
>
> During this morning I will be creating a few ridiculously simple issues,
> e.g. "missing space, arrayobject --> array object", for participants to
> work on as part of the workshop. So there may also be a surge in simple PRs
> starting at around 7 PM PST.
>

Ok, so issues 6515 up to and including 6525 are mine, should have them all
fixed and closed by the end of today.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Numpy for data manipulation

2015-10-01 Thread Jaime Fernández del Río
On Thu, Oct 1, 2015 at 11:46 AM, Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hi, I have written some numpy tips and tricks I am using, which may be
> interesting to you.
> This is quite long reading, so I've splitted it into two parts:
>
> http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html


The recommendation of inverting a permutation by argsort'ing it, while it
works, is suboptimal, as it takes O(n log(n)) time, and you can do it in
linear time:

In [14]: import numpy as np

In [15]: arr = np.random.rand(10)

In [16]: perm = arr.argsort()

In [17]: perm
Out[17]: array([5, 0, 9, 4, 2, 8, 6, 7, 1, 3])

In [18]: inv_perm = np.empty_like(perm)

In [19]: inv_perm[perm] = np.arange(len(perm))

In [20]: np.all(inv_perm == perm.argsort())
Out[20]: True

It does require two lines of code, so for small stuff it is probably good
enough to argsort, but it gave e.g. np.unique a nice boost on larger arrays
when we applied it there.

Jaime


>
> http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html
>
> Comments are welcome, specially if you know any other ways to make this
> code faster (or better).
>
> Regards,
> Alex.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] "Become an Open Source Contributor" workshop

2015-09-23 Thread Jaime Fernández del Río
Apologies for the cross-posting.

The Data Science Student Society of the University of California San Diego,
or DS3 @ UCSD as they like to call themselves, will be holding biweekly
Python themed workshops starting this fall.  On the week of October 19th,
they will be having yours truly doing a "Become an Open Source Contributor"
piece.  It will be a shortish event, 60-90 minutes, so my idea was to cover
the following:

   1. (15 min) An introduction to the Python data science landscape.
   2. (30 min) An overview of the GitHub workflow that most (all?) of the
   projects follow.
   3. (30-45 min) A hands on session, where we would make sure everyone
   gets set up in GitHub, and forks and clones their favorite project.  Time
   and participant willingness permitting, I would like to take advantage of
   my commit bits, and have some of the participants submit a simple PR, e.g.
   fixing a documentation typo, to NumPy or SciPy, and hit the green button
   right there, so that they get to leave as knighted FOSS contributors.

And this is what I am hoping to get from you, the community:

   1. If anyone in the area would like to get involved, please contact me.
   I have recruited a couple of volunteers from PySanDiego, but could use more
   help.
   2. I'm also hoping to get some help, especially with the introductory
   part.  Given that the crowd will mostly be university students and some
   faculty, it would be great if someone who actually knew what they were
   talking about could deliver a short, 10 minute talk, on Python, data
   science, and academia.  I'm sure we could arrange it to have someone join
   by video conference.
   3. If you have organized anything similar in the past, and have material
   that I could use to, ahem, draw inspiration from, or recommendations to
   make, or whatever, I'd love to hear from you.

Thanks for reading!

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)

2015-09-23 Thread Jaime Fernández del Río
For what it is worth, I'm +1 on including any of the "founding fathers"
(and uncles!) that wish to be included in the committee, or council, or
however we ended up calling it.  I'm also OK with singling out Travis as
creator of NumPy in the wording of the document.

Especially since the prerogative of members is not to **VOTE**, but to
**VETO**, I don't see adding more people having any effect on the
individual power of members, so my +1 stands regardless of what the total
member count is.

Jaime


On Wed, Sep 23, 2015 at 4:08 PM, Charles R Harris  wrote:

>
>
> On Wed, Sep 23, 2015 at 3:21 PM, Travis Oliphant 
> wrote:
>
>>
>>
>>> Regarding the seed council, I just tried to pick an objective
>>> criterion and an arbitrary date that seemed generally in keeping with
>>> idea of "should be active in the last 1-to-2-years-ish". Fiddling with
>>> the exact date in particular makes very little difference -- between
>>> pushing it back to 2 years ago today or forward to 1 year ago today,
>>> the only thing that changes is whether Pauli makes the list or not.
>>> (And Pauli is obviously a great council candidate, though I don't know
>>> whether he even wants to be on it.)
>>>
>>> > Personally, I have no idea how big the council should be. Too big, and
>>> > there is no point, consensus is harder to reach the larger the group,
>>> > and the main (only?) role of the council is to resolve issues where
>>> > consensus has not been reached in the larger community. But what is
>>> > too big?
>>>
>>>
>>> > As for make-up of the council, I think we need to expand beyond people
>>> > who have recently contributed core code.
>>> >
>>> > Yes, the council does need to have expertise to make technical
>>> > decisions, but if you think about the likely contentious issues like
>>> > ABI breakage, a core-code focused view is incomplete. So there should
>>> > be representation by:
>>> >
>>> > Someone(s) with a long history of working with the code -- that
>>> > institutional memory of why decisions were made the way they were
>>> > could be key.
>>>
>>> Sure -- though I can't really imagine any way of framing a rule like
>>> this that *wouldn't* be satisfied by Chuck + Ralf + Pauli, so my guess
>>> is that such a rule would not actually have any effect on the council
>>> membership in practice.
>>>
>>
>> As the original author of NumPy, I would like to be on the seed council
>> as long as it is larger than 7 people.That is my proposal.I don't
>> need to be a permanent member, but I do believe I have enough history that
>> I can understand issues even if I haven't been working on code directly.
>>
>>
>> I think I do bring history and information that provides all of the
>> history that could be helpful on occasion. In addition, if a matter is
>> important enough to even be brought to the attention of this council, I
>> would like to be involved in the discussion about it.
>>
>> It's a simple change to the text --- basically an explanation that Travis
>> requested to be on the seed council.
>>
>
> I too would like you to be a member. We could either write it into the
> text in recognition of your status as the Numpy creator, or it could be the
> first order of business. I would only ask that you give yourself some time
> to become familiar with how things work and the people involved in the
> current community. It has been some years since you have been active in
> code development.
>
> Chuck
>
>>
>>
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Commit rights for Allan Haldane

2015-09-22 Thread Jaime Fernández del Río
Congrats Allan!

Jaime

On Tue, Sep 22, 2015 at 11:54 AM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

> Hi All,
>
> Allan Haldane has been given commit rights. Here's to the new member of
> the team.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] facebook, twitter, and g+

2015-09-22 Thread Jaime Fernández del Río
+1 for twitter
+0 for the others

On Tue, Sep 22, 2015 at 8:14 PM, Bryan Van de Ven 
wrote:

>
> > On Sep 22, 2015, at 9:58 PM, Charles R Harris 
> wrote:
> >
> > Hi All,
> >
> > Just posting to elicit thoughts about scipy.org having a presence in
> social media for announcements.
>
> Of the ones listed in the subject, I would suggest Twitter is the most
> valuable. It has been great for release and other announcements. I find
> Facebook and G+ a little odd for software projects personally, though I
> guess they could be useful if you wanted to have somewhat longer "posts".
> But of course that's also yet more for someone to have maintain.
>
> One additional consideration is that some people will inevitably attempt
> to elicit technical support over these channels. I generally have to steer
> people to the mailing list immediately.
>
> Bryan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Jaime Fernández del Río
We have the PyArrayObject vs PyArrayObject_fields definition in
ndarraytypes.h that is used to enforce access to the members through inline
functions rather than directly,  which seems to me like the right way to
go: don't leave stones unturned, hide everything and provide PyUFunc_NIN,
PyUFunc_NOUT and friends to handle those too.

On Sun, Sep 20, 2015 at 9:13 PM, Nathaniel Smith  wrote:

> Hi all,
>
> Here's a first draft NEP for comments.
>
> --
>
> Synopsis
> 
>
> Improving numpy's dtype system requires that ufunc loops start having
> access to details of the specific dtype instance they are acting on:
> e.g. an implementation of np.equal for strings needs access to the
> dtype object in order to know what "n" to pass to strncmp. Similar
> issues arise with variable length strings, missing values, categorical
> data, unit support, datetime with timezone support, etc. -- this is a
> major blocker for improving numpy.
>
> Unfortunately, the current ufunc inner loop function signature makes
> it very difficult to provide this information. We might be able to
> wedge it in there, but it'd be ugly.
>
> The other option would be to change the signature. What would happen
> if we did this? For most common uses of the C API/ABI, we could do
> this easily while maintaining backwards compatibility. But there are
> also some rarely-used parts  of the API/ABI that would be
> prohibitively difficult to preserve.
>
> In addition, there are other potential changes to ufuncs on the
> horizon (e.g. extensions of gufuncs to allow them to be used more
> generally), and the current API exposure is so massive that any such
> changes will be difficult to make in a fully compatible way. This NEP
> thus considers the possibility of closing down the ufunc API to a
> minimal, maintainable subset of the current API.
>
> To better understand the consequences of this potential change, I
> performed an exhaustive analysis of all the code on Github, Bitbucket,
> and Fedora, among others. The results make me highly confident that of
> all the publically available projects in the world, the only ones
> which touch the problematic parts of the ufunc API are: Numba,
> dynd-python, and `gulinalg `_
> (with the latter's exposure being trivial).
>
> Given this, I propose that for 1.11 we:
> 1) go ahead and hide/disable the problematic parts of the ABI/API,
> 2) coordinate with the known affected projects to minimize disruption
> to their users (which is made easier since they are all projects that
> are almost exclusively distributed via conda, which enforces strict
> NumPy ABI versioning),
> 3) publicize these changes widely so as to give any private code that
> might be affected a chance to speak up or adapt, and
> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
> of the vast majority of projects that will be unaffected by these
> changes.
>
> This NEP defers the question of exactly what the improved API should
> be, since there's no point in trying to nail down the details until
> we've decided whether it's even possible to change.
>
>
> Details
> ===
>
> The problem
> ---
>
> Currently, a ufunc inner loop implementation is called via the
> following function prototype::
>
> typedef void (*PyUFuncGenericFunction)
> (char **args,
>  npy_intp *dimensions,
>  npy_intp *strides,
>  void *innerloopdata);
>
> Here ``args`` is an array of pointers to 1-d buffers of input/output
> data, ``dimensions`` is a pointer to the number of entries in these
> buffers, ``strides`` is an array of integers giving the strides for
> each input/output array, and ``innerloopdata`` is an arbitrary void*
> supplied by whoever registered the ufunc loop. (For gufuncs, extra
> shape and stride information about the core dimensions also gets
> packed into the ends of these arrays in a somewhat complicated way.)
>
> There are 4 key items that define a NumPy array: data, shape, strides,
> dtype. Notice that this function only gets access to 3 of them. Our
> goal is to fix that. For example, a better signature would be::
>
> typedef void (*PyUFuncGenericFunction_NEW)
> (char **data,
>  npy_intp *shapes,
>  npy_intp *strides,
>  PyArray_Descr *dtypes,   /* NEW */
>  void *innerloopdata);
>
> (In practice I suspect we might want to make some more changes as
> well, like upgrading gufunc core shape/strides to proper arguments
> instead of tacking it onto the existing arrays, and adding an "escape
> valve" void* reserved for future extensions. But working out such
> details is outside the scope of this NEP; the above will do for
> illustration.)
>
> The goal of this NEP is to clear the ground so that we can start
> supporting ufunc inner loops that take dtype arguments, and make other
> enhancements to ufunc functionality going 

Re: [Numpy-discussion] np.sign and object comparisons

2015-09-01 Thread Jaime Fernández del Río
On Mon, Aug 31, 2015 at 11:49 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Sun, Aug 30, 2015 at 9:09 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>>
>> There are three ways of fixing this that I see:
>>
>>1. Arbitrarily choose a value to set the return to. This is
>>equivalent to choosing a default return for `cmp` for comparisons. This
>>preserves behavior, but feels wrong.
>>2. Similarly to how np.sign of a floating point array with nans
>>returns nan for those values, return e,g, None for these cases. This
>>is my preferred option.
>>3. Raise an error, along the lines of the TypeError: unorderable types
>>that 3.x produces for some comparisons.
>>
>> Having read the other replies so far -- given that no-one seems to have
> any clear intuition or use cases, I guess I find option 3 somewhat
> tempting... it keeps our options open until someone who actually cares
> comes along with a use case to hone our intuition on, and is very safe in
> the mean time.
>
> (This was noticed in the course of routine code cleanups, right, not an
> external bug report? For all we know right now, no actual user has ever
> even tried to apply np.sign to an object array?)
>

We do have a user that tried np.sign on an object array, and discovered
that our Py3K object comparison was crap:
https://github.com/numpy/numpy/issues/6229

No report of anyone trying np.sign on anything other than numbers that we
know of, though.

I'm starting to think that, given the lack of agreement, I am going to agree
with you that raising an error may be the better option,
because it's the least likely to break people's code if we later find we
need to change it.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.sign and object comparisons

2015-08-30 Thread Jaime Fernández del Río
There's been some work going on recently on Py2 vs Py3 object comparisons.
If you want all the background, see gh-6265
https://github.com/numpy/numpy/issues/6265 and follow the links there.

There is a half baked PR in the works, gh-6269
https://github.com/numpy/numpy/pull/6269, that tries to unify behavior
and fix some bugs along the way, by replacing all 2.x uses of
PyObject_Compare with several calls to PyObject_RichCompareBool, which is
available on 2.6, the oldest Python version we support.

The poster child for this example is computing np.sign on an object array
that has an np.nan entry. 2.x will just make up an answer for us:

>>> cmp(np.nan, 0)
-1

even though none of the relevant compares succeeds:

>>> np.nan < 0
False
>>> np.nan > 0
False
>>> np.nan == 0
False

The current 3.x is buggy, so the fact that it produces the same made up
result as in 2.x is accidental:

>>> np.sign(np.array([np.nan], 'O'))
array([-1], dtype=object)

Looking at the code, it seems that the original intention was for the
answer to be `0`, which is equally made up but perhaps makes a little more
sense.

There are three ways of fixing this that I see:

   1. Arbitrarily choose a value to set the return to. This is equivalent
   to choosing a default return for `cmp` for comparisons. This preserves
   behavior, but feels wrong.
   2. Similarly to how np.sign of a floating point array with nans returns
   nan for those values, return e.g. None for these cases. This is my
   preferred option.
   3. Raise an error, along the lines of the TypeError: unorderable types
   that 3.x produces for some comparisons.

Thoughts anyone?

Jaime
-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpty FFT.FFT slow with certain samples

2015-08-28 Thread Jaime Fernández del Río
On Fri, Aug 28, 2015 at 11:02 AM, Joseph Codadeen jdm...@hotmail.com
wrote:

 Hi,

 I am a numpy newbie.

 I have two wav files, one of which numpy takes a long time to process with FFT.
 They were created within Audacity using white noise and silence for gaps.


1. my_1_minute_noise_with_gaps.wav
2. my_1_minute_noise_with_gaps_truncated.wav


 The files are very similar in the following way;


- 1. is white noise with silence gaps on every 15 second interval.
- 2. is 1. but slightly shorter, i.e. I trimmed some ms off the end
but it still has the last gap at 60s.


 The code I am using processes the file like this;

 framerate, data = scipy.io.wavfile.read(filepath)
 right = data[:, 0]
 # Align it to be efficient.
 if len(right) % 2 != 0:
     right = right[range(len(right) - 1)]
 noframes = len(right)
 fftout = np.fft.fft(right) / noframes  #  I am timing this cmd

 Using timeit...


- my_1_minute_noise_with_gaps_truncated took *30.75620985s* to process.
- my_1_minute_noise_with_gaps took *22307.13917s* to process.


 Could someone tell me why this behaviour is happening please?

 Sorry I can't attach the files as this email gets bounced but you could
 easily create the files yourself.
 E.g my last gap width is 59.9995 - 1:00.0005, I repeat this every 15
 seconds.
 My truncated file is 1:00.0015s long, non-truncated is 1:00.0833s long


It is almost certainly caused by the number of samples in your signals,
i.e. look at what noframes is in one case and the other.

You will get the best performance when noframes is a power of two, or when it
factorizes into small primes (2, 3, 5, perhaps also 7), and the worst
performance when it is a large prime.
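A common workaround (a sketch only, not part of the original exchange) is to
zero-pad the signal up to a friendlier length before transforming, keeping in
mind that padding slightly changes the frequency resolution:

import numpy as np

def fft_padded(signal):
    n = len(signal)
    nfast = 1 << (n - 1).bit_length()       # next power of two >= n
    return np.fft.fft(signal, n=nfast) / n  # fft() zero-pads up to nfast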

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-28 Thread Jaime Fernández del Río
On Fri, Aug 28, 2015 at 2:40 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Fr, 2015-08-28 at 09:46 +0100, Matthew Brett wrote:
  Hi,
 
  On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fernández del Río
  jaime.f...@gmail.com wrote:
   On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett 
 matthew.br...@gmail.com
   wrote:
  
   Hi,
  
   On Thu, Aug 27, 2015 at 6:23 PM,  josef.p...@gmail.com wrote:
   
   
On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett
matthew.br...@gmail.com
wrote:
   
Hi
   
On Thu, Aug 27, 2015 at 5:11 PM,  josef.p...@gmail.com wrote:


 On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett
 matthew.br...@gmail.com
 wrote:

 Hi,

 On Thu, Aug 27, 2015 at 3:34 PM,  josef.p...@gmail.com wrote:
 [snip]
  I don't really see a problem with codifying the status quo.

 That's an excellent point.If we believe that the current
 situation
 is the best possible, both now and in the future, then
 codifying the
 status quo is an excellent idea.

 So, we should probably first start by asking ourselves:

 * what numpy is doing well;
 * what numpy could do better;

 and then ask, is there some way we could make it more likely we
 will
 improve over time.

 [snip]

  As the current debate shows it's possible to have a public
  discussion
  about
  the direction of the project without having to delegate
 providing
  a
  vision
  to a president.

 The idea of a president that I had in mind, was not someone who
 makes
 all decisions, but the person who holds themselves responsible
 for
 the
 performance of the project.  If the project has a coherent
 vision
 already, the president has no need to provide one, but it's the
 president's job to worry about whether we have vision or not,
 and do
 what they need to, to make sure we don't lose track of that.
  If
 you
 don't know it already, I highly recommend Jim Collins' work on
 'level
 5 leadership' [1]


 Still doesn't sound like the need for a president to me

  the person who holds themselves responsible for the
 performance of the project

 sounds more like the role of the core group (adding plural to
 persons)
 to
 me, and cannot be pushed of to an official president.
   
Except that, in the past, having multiple people taking decisions
 has
led to the situation where no-one feels themselves accountable for
 the
result, hence this situation tends to lead to stagnation.
   
   
Is there any evidence for this?
  
   Oh - dear - that's the key point, but I'm obviously not making it
   clearly enough.  Yes there is, and that was the evidence I was
   pointing to before.
  
   But anyway - Sebastian is right - this discussion isn't going anywhere
   useful.
  
   So - let's step back.
  
   In thinking about governance, we first need to ask what we want to
   achieve.  This includes considering the risks ahead for the project.
  
   So, in the spirit of fruitful discussion, can I ask what y'all
   consider to be the current problems with working on numpy (other than
   the technical ones).   What is numpy doing well, and what is it doing
   badly? What risks do we have to plan for in the future?
  
   joke
   Are you trying to prove the point that consensus doesn't work by
 making it
   impossible to reach a consensus on this? ;-)
   /joke
 
  Forgive me if I use this joke to see if I can get us any further.
 
  If this was code, I think this joke would not be funny, because we
  wouldn't expect to reach consensus without considering all the
  options, and discussing their pros and cons.
 
  Why would that not be useful in the case of forms of governance?
 

 Oh, it is true. I think we (those in the room in Austin) just have
 thought about it a bit already, so now we have to be a bit patient with
 everyone who just saw the plans the first time. But I hope we can agree
 that we should decide on some form of governance in the next few weeks,
 even if it may not be perfect.

 My personal problem with your ideas is not that I do not care for the
 warnings, but having already spend some time trying to put together this
 (and this is nothing weird, this is very common practice in open
 source), I personally do not want to spend time inventing something
 completely new.

 We must discuss improvements to the document, and even whole different
 approaches. But for me at least, I need something a little more
 specific. Maybe I am daft, but I hear this is a bad idea without also
 providing another approach (that seems doable).
 And I do not buy that it is *that* bad, it is a very common governance
 structure for open source. The presidency suggestions may be another
 approach and certainly something we can pick up ideas from, but to me it
 is so vague that I cannot even start comprehending what it would mean

Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread Jaime Fernández del Río
On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Aug 27, 2015 at 6:23 PM,  josef.p...@gmail.com wrote:
 
 
  On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett matthew.br...@gmail.com
 
  wrote:
 
  Hi
 
  On Thu, Aug 27, 2015 at 5:11 PM,  josef.p...@gmail.com wrote:
  
  
   On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
  
   Hi,
  
   On Thu, Aug 27, 2015 at 3:34 PM,  josef.p...@gmail.com wrote:
   [snip]
I don't really see a problem with codifying the status quo.
  
   That's an excellent point.If we believe that the current
 situation
   is the best possible, both now and in the future, then codifying the
   status quo is an excellent idea.
  
   So, we should probably first start by asking ourselves:
  
   * what numpy is doing well;
   * what numpy could do better;
  
   and then ask, is there some way we could make it more likely we will
   improve over time.
  
   [snip]
  
As the current debate shows it's possible to have a public
 discussion
about
the direction of the project without having to delegate providing a
vision
to a president.
  
   The idea of a president that I had in mind, was not someone who makes
   all decisions, but the person who holds themselves responsible for
 the
   performance of the project.  If the project has a coherent vision
   already, the president has no need to provide one, but it's the
   president's job to worry about whether we have vision or not, and do
   what they need to, to make sure we don't lose track of that.   If you
   don't know it already, I highly recommend Jim Collins' work on 'level
   5 leadership' [1]
  
  
   Still doesn't sound like the need for a president to me
  
the person who holds themselves responsible for the
   performance of the project
  
   sounds more like the role of the core group (adding plural to
 persons)
   to
   me, and cannot be pushed of to an official president.
 
  Except that, in the past, having multiple people taking decisions has
  led to the situation where no-one feels themselves accountable for the
  result, hence this situation tends to lead to stagnation.
 
 
  Is there any evidence for this?

 Oh - dear - that's the key point, but I'm obviously not making it
 clearly enough.  Yes there is, and that was the evidence I was
 pointing to before.

 But anyway - Sebastian is right - this discussion isn't going anywhere
 useful.

 So - let's step back.

 In thinking about governance, we first need to ask what we want to
 achieve.  This includes considering the risks ahead for the project.

 So, in the spirit of fruitful discussion, can I ask what y'all
 consider to be the current problems with working on numpy (other than
 the technical ones).   What is numpy doing well, and what is it doing
 badly? What risks do we have to plan for in the future?


joke
Are you trying to prove the point that consensus doesn't work by making it
impossible to reach a consensus on this? ;-)
/joke

One thing we are doing very badly is leveraging resources outside of
contributions of work and time from individuals.  Getting sponsors to
finance work on what is the cornerstone of just about any Python package
that has to add two numbers together shouldn't be too hard, especially
seeing success stories like Jupyter's, who I believe has several paid
developers working full time.  That requires formalizing governance,
because apparently sponsors are a little wary of giving money to people on
the internet. ;-)  Fernando Pérez was extremely emphatic about the size of
the opportunity NumPy was letting slip by not formalizing *any* governance
model.  And it is a necessary first step so that e.g. we have the money to,
say a year from now, get the right people together for a couple of days to
figure out a better governance model.  I'd argue that money would be better
spent financing a talented developer to advance e.g. Nathaniel's new dtype
system to end all dtype systems, but that's a different story.

Largely because of the above, even if Nathaniel's document involved tossing
a coin to resolve disputes, I'd rather have that now than something much
better never. Because there really is no alternative to Nathaniel's
write-up of the status quo, other than the status quo without a write-up:
it has taken him two months to put this draft together, **after** we agreed
over several hours of face to face discussion on what the model should be.
And I'm sure he has hated every minute he has had to put into it.  So if we
keep going around this in circles, after a few days we will all grow tired
and go back to fighting over whether indexing should transpose subspaces or
not, and all that other cool stuff we really enjoy. And a year from now we
will be in the same place we are now, only a year older and deeper in
(technical) debt.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.

Re: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Jaime Fernández del Río
On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu rainwood...@gmail.com wrote:

 Dear list,

 This is forwarded from issue 6217
 https://github.com/numpy/numpy/issues/6217

 What is the way to implement DESC ordering in the sorting routines of
 numpy?

 (I am borrowing DESC/ASC from the SQL notation)

 For a stable DESC ordering sort, one can not  revert the sorted array via
 argsort()[::-1] .

 I propose the following API change to argsorts/sort. (haven't thought
 about lexsort yet) I will use argsort as an example.

 Currently, argsort supports sorting by keys ('order') and by 'axis'.
 These two somewhat orthonal interfaces need to be treated differently.

 1. by axis.

 Since there is just one sorting key, a single 'reversed' keyword
 argument is sufficient:

 a.argsort(axis=0, kind='merge', reversed=True)

 Jaime suggested this can be implemented efficiently as a
 post-processing step.
 (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
 there a reference to the algorithm?


My thinking was that, for native types, you can stably reverse a sorted
permutation in-place by first reversing item-by-item, then reversing every
chunk of repeated entries. Sort of the way you would reverse the words in a
sentence in-place: first reverse every letter, then reverse everything
bounded by spaces:

TURN ME AROUND - DNUORA EM NRUT - AROUND EM NRUT - AROUND ME NRUT -
AROUND ME TURN

We could even add a type-specific function to do this for each of the
native types in the npy_sort library.
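In pure NumPy, the whole post-processing step looks roughly like this (a
sketch of the idea above, not the proposed npy_sort C helper):

import numpy as np

def stable_descending_argsort(a):
    a = np.asarray(a)
    # reverse the stable ascending permutation item by item...
    perm = np.argsort(a, kind='mergesort')[::-1].copy()
    # ...then reverse every run of repeated keys to restore stability
    keys = a[perm]
    bounds = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    starts = np.concatenate(([0], bounds))
    ends = np.concatenate((bounds, [a.size]))
    for s, e in zip(starts, ends):
        perm[s:e] = perm[s:e][::-1].copy()
    return perm

>>> stable_descending_argsort(np.array([3, 1, 3, 2]))
array([0, 2, 3, 1])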

As I mentioned in Yu's very nice PR
https://github.com/numpy/numpy/pull/6222, probably it is best to leave
the signature of the function alone, and have something like order='desc'
be the trigger for the proposed reversed=True.

Jaime



 Because all of the sorting algorithms for 'atomic' dtypes are using
 _LT functions, a post processing step seems to be the only viable
 solution, if possible.



 2. by fields, ('order' kwarg)

 A single 'reversed' keyword argument will not work, because some keys
 are ASC but others are DESC, for example, sorting my LastName ASC,
 then Salary DESC.

 a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
 ('Salary', 'desc'))])

 The parsing rule of order is:

 - if an item is tuple, the first item is the fieldname, the second
 item is DESC/ASC
 - if an item is scalar, the fieldname is the item, the ordering is ASC.

 This part of the code already goes to VOID_compare, which walks a
 temporary copy of a.dtype to call the comparison functions.

 If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
 ASC/DESC flags, offsets, and comparison functions can all be
 pre-compiled and passed into VOID_compare via c_metadata of the
 temporary type-descriptor.

 By just looking this will actually make VOID_compare faster by
 avoiding calling several Python C-API functions. negating the return
 value of cmp seems to be a negligable overhead in such a complex
 function.


 3. If both reversed and order is given, the ASC/DESC fields in 'order'
 are effectively reversed.

 Any comments?

 Best,

 Yu
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-13 Thread Jaime Fernández del Río
On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum nathan12...@gmail.com
wrote:

 Hi all,

 I've been testing the package I spend most of my time on, yt, under numpy
 1.10b1 since the announcement went out.

 I think I've narrowed down and fixed all of the test failures that cropped
 up except for one last issue. It seems that the behavior of np.digitize
 with respect to ndarray subclasses has changed since the NumPy 1.9 series.
 Consider the following test script:

 ```python
 import numpy as np


 class MyArray(np.ndarray):
 def __new__(cls, *args, **kwargs):
 return np.ndarray.__new__(cls, *args, **kwargs)

 data = np.arange(100)

 bins = np.arange(100) + 0.5

 data = data.view(MyArray)

 bins = bins.view(MyArray)

 digits = np.digitize(data, bins)

 print type(digits)
 ```

 Under NumPy 1.9.2, this prints type 'numpy.ndarray', but under the
 1.10 beta, it prints class '__main__.MyArray'

 I'm curious why this change was made. Since digitize outputs index arrays,
 it doesn't make sense to me why it should return anything but a plain
 ndarray. I see in the release notes that digitize now uses searchsorted
 under the hood. Is this related?


It is indeed searchsorted's fault, as it returns an object of the same type
as the needle (the items to search for):

>>> import numpy as np
>>> class A(np.ndarray): pass
>>> class B(np.ndarray): pass
>>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
B([0, 1, 2, 3, 4])

I am all for making index-returning functions always return a base ndarray,
and will be more than happy to send a PR fixing this if there is some
agreement.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-13 Thread Jaime Fernández del Río
On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum nathan12...@gmail.com
wrote:



 On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum nathan12...@gmail.com
 wrote:

 Hi all,

 I've been testing the package I spend most of my time on, yt, under
 numpy 1.10b1 since the announcement went out.

 I think I've narrowed down and fixed all of the test failures that
 cropped up except for one last issue. It seems that the behavior of
 np.digitize with respect to ndarray subclasses has changed since the NumPy
 1.9 series. Consider the following test script:

 ```python
 import numpy as np


 class MyArray(np.ndarray):
 def __new__(cls, *args, **kwargs):
 return np.ndarray.__new__(cls, *args, **kwargs)

 data = np.arange(100)

 bins = np.arange(100) + 0.5

 data = data.view(MyArray)

 bins = bins.view(MyArray)

 digits = np.digitize(data, bins)

 print type(digits)
 ```

 Under NumPy 1.9.2, this prints type 'numpy.ndarray', but under the
 1.10 beta, it prints class '__main__.MyArray'

 I'm curious why this change was made. Since digitize outputs index
 arrays, it doesn't make sense to me why it should return anything but a
 plain ndarray. I see in the release notes that digitize now uses
 searchsorted under the hood. Is this related?


 It is indeed searchsorted's fault, as it returns an object of the same
 type as the needle (the items to search for):

  >>> import numpy as np
  >>> class A(np.ndarray): pass
  >>> class B(np.ndarray): pass
  >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
  B([0, 1, 2, 3, 4])

 I am all for making index-returning functions always return a base
 ndarray, and will be more than happy to send a PR fixing this if there is
 some agreement.


 I think that is the right thing to do.


 Awesome, I'd appreciate having a PR to fix this. Arguably the return type
 *could* be the same type as the inputs, but given that it's a behavior
 change I agree that it's best to add a patch so the output of serachsorted
 is sanitized to be an ndarray before it's returned by digitize.


It is relatively simple to do, just replace Py_TYPE(ap2) with PyArray_Type
in this line:

https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725

Then fix all the tests that expect searchsorted to return something other
than a base ndarray. We have already modified nonzero to return base
ndarrays in this release (see the release notes), so this will go with the
same theme.

For 1.11 I think we should try to extend this rule (if it returns an index,
it will be a base ndarray) to all other functions that don't follow it right
now. Then sit back and watch AstroPy come down in flames... ;-)))

Seriously, I think this makes a lot of sense, and should be documented as
the way NumPy handles index arrays.

Anyway, I will try to find time tonight to put this PR together, unless
someone beats me to it, which I would be totally fine with.

Jaime



 To answer Nathaniel's question, I opened an issue on yt's bitbucket page
 to record the test failures:


 https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-numpy-110-beta

 I've fixed two of the classes of errors in that bug in yt itself, since it
 looks like we were relying on buggy or deprecated behavior in NumPy. Here
 are the PRs for those fixes:


 https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start-index-to-int-arrays/diff

 https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_units-like/diff


 Chuck


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-13 Thread Jaime Fernández del Río
On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum nathan12...@gmail.com
 wrote:



 On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum nathan12...@gmail.com
  wrote:

 Hi all,

 I've been testing the package I spend most of my time on, yt, under
 numpy 1.10b1 since the announcement went out.

 I think I've narrowed down and fixed all of the test failures that
 cropped up except for one last issue. It seems that the behavior of
 np.digitize with respect to ndarray subclasses has changed since the NumPy
 1.9 series. Consider the following test script:

 ```python
 import numpy as np


 class MyArray(np.ndarray):
 def __new__(cls, *args, **kwargs):
 return np.ndarray.__new__(cls, *args, **kwargs)

 data = np.arange(100)

 bins = np.arange(100) + 0.5

 data = data.view(MyArray)

 bins = bins.view(MyArray)

 digits = np.digitize(data, bins)

 print type(digits)
 ```

 Under NumPy 1.9.2, this prints <type 'numpy.ndarray'>, but under the
 1.10 beta, it prints <class '__main__.MyArray'>

 I'm curious why this change was made. Since digitize outputs index
 arrays, it doesn't make sense to me why it should return anything but a
 plain ndarray. I see in the release notes that digitize now uses
 searchsorted under the hood. Is this related?


 It is indeed searchsorted's fault, as it returns an object of the same
 type as the needle (the items to search for):

  import numpy as np
  class A(np.ndarray): pass
  class B(np.ndarray): pass
  np.arange(10).view(A).searchsorted(np.arange(5).view(B))
 B([0, 1, 2, 3, 4])

 I am all for making index-returning functions always return a base
 ndarray, and will be more than happy to send a PR fixing this if there is
 some agreement.


 I think that is the right thing to do.


 Awesome, I'd appreciate having a PR to fix this. Arguably the return type
 *could* be the same type as the inputs, but given that it's a behavior
 change I agree that it's best to add a patch so the output of searchsorted
  is sanitized to be an ndarray before it's returned by digitize.


 It is relatively simple to do, just replace Py_TYPE(ap2) with
 PyArray_Type in this line:


 https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725

 Then fix all the tests that are expecting searchsorted to return something
 else than a base ndarray. We already have modified nonzero to return base
 ndarray's in this release, see the release notes, so it will go with the
 same theme.

 For 1.11 I think we should try to extend this if it returns an index, it
 will be a base ndarray to all other functions that don't right now. Then
 sit back and watch AstroPy come down in flames... ;-)))

 Seriously, I think this makes a lot of sense, and should be documented as
 the way NumPy handles index arrays.

 Anyway, I will try to find time tonight to put this PR together, unless
 someone beats me to it, which I would be totally fine with.


PR #6206 it is: https://github.com/numpy/numpy/pull/6206

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem.

2015-08-07 Thread Jaime Fernández del Río
On Fri, Aug 7, 2015 at 2:33 AM, David Cournapeau courn...@gmail.com wrote:

 Which command exactly did you run to have that error ? Normally, the code
 in msvc9compiler should not be called if you call the setup.py with the
 mingw compiler as expected by distutils


FWIW, the incantation that works for me to compile numpy on Windows with
mingw is:

python setup.py config --compiler=mingw32 build --compiler=mingw32 install

but I am not sure I have ever tried it with Python 3.

I think my source for this was:

http://nipy.sourceforge.net/nipy/devel/devel/install/windows_scipy_build.html

Jaime



 On Fri, Aug 7, 2015 at 12:19 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Thu, Aug 6, 2015 at 5:11 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau courn...@gmail.com
 wrote:

 Sorry if that's obvious, but do you have Visual Studio 2010 installed ?

 On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Anyone know how to fix this? I've run into it before and never got it
 figured out.

 [192.168.121.189:22] out:   File
 C:\Python34\lib\distutils\msvc9compiler.py, line 259, in query_vcvarsall
 [192.168.121.189:22] out:
 [192.168.121.189:22] out: raise DistutilsPlatformError(Unable to
 find vcvarsall.bat)
 [192.168.121.189:22] out:
 [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError:
 Unable to find vcvarsall.bat

 Chuck



 I'm running numpy-vendor, which is running wine. I think it is all mingw
 with a few installed dll's. The error is coming from the Python distutils
 as part of `has_cblas`.


 It's not impossible that we have changed the build somewhere along the
 line.

 Chuck


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Jaime Fernández del Río
On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted fal...@gmail.com wrote:

 Hi,

 I have stumbled into this:

 In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
 np.int64), ('f1', np.int32)])

 In [63]: %timeit sa['f0'].sum()
 100 loops, best of 3: 4.52 ms per loop

 In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
 np.int64), ('f1', np.int64)])

 In [65]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 896 µs per loop

 The first structured array is made of 12-byte records, while the second is
 made by 16-byte records, but the latter performs 5x faster.  Also, using an
 structured array that is made of 8-byte records is the fastest (expected):

 In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
 np.int64)])

 In [67]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 567 µs per loop

 Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
 quite well on unaligned data:


 http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

 So, if 4 years-old Intel architectures do not have a penalty for unaligned
 access, why I am seeing that in NumPy?  That strikes like a quite strange
 thing to me.


I believe that the way numpy is set up, it never does unaligned access,
regardless of the platform, in case it gets run on one that would go up in
flames if you tried to. So my guess would be that you are seeing chunked
copies into a buffer, as opposed to bulk copying or no copying at all, and
that would explain your timing differences. But Julian or Sebastian can
probably give you a more informed answer.
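
A quick way to see what is going on from Python (a sketch; the exact flag
values depend on the platform's alignment requirements, this is what a
typical 64-bit build reports):

```python
import numpy as np

# 12-byte records: the int64 field ends up on a stride that is not a
# multiple of 8, so numpy treats it as unaligned
sa12 = np.zeros(4, dtype=[('f0', np.int64), ('f1', np.int32)])
# 16-byte records: everything stays aligned
sa16 = np.zeros(4, dtype=[('f0', np.int64), ('f1', np.int64)])

print(sa12['f0'].strides, sa12['f0'].flags['ALIGNED'])  # (12,) False, typically
print(sa16['f0'].strides, sa16['f0'].flags['ALIGNED'])  # (16,) True
```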

Jaime



 Thanks,
 Francesc

 --
 Francesc Alted

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Open CV 3.0 + NPY_RELAXED_STRIDES

2015-06-11 Thread Jaime Fernández del Río
On Thu, Jun 11, 2015 at 2:44 AM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On Thu, Jun 11, 2015 at 11:39 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Mi, 2015-06-10 at 21:03 -0600, Charles R Harris wrote:
 
 
  snip
 
 
* Relaxed stride checking will be the default in 1.10.0
  Is this still the plan?
 
 
  Yes, but it won't be quite the same as the master branch.  Currently
  an unusual value for the stride (?) is used in order to smoke out
  misuse, but that value will be more rational in the release.
 
 
  +1, it should not be as bad/common in practice once rolled out. That
  said, I do not mind delaying things beyond 1.10, it might be better for
  compatibility if someone gets a new numpy on top of oldish other
  packages.
  So I am good with planning to go ahead for the moment. But if anyone
  complains, I would back down for 1.10 probably.
 
  - Sebastian
 

 With beside scipy.ndimage also opencv being broken I think we will
 have to delay it beyond 1.10, though we should have at least an alpha,
 maybe even a beta with it enabled to induce some panic that hopefully
 will spur some fixes.


OpenCV shouldn't be broken any more if they merge this:

https://github.com/Itseez/opencv/pull/4117

I would appreciate a second set of eyes looking over the logic in that PR.
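
For reference, a small sketch of the stride pattern that such a check trips
over (what gets reported for the length-1 axis depends on the build and on
whether relaxed stride checking is enabled):

```python
import numpy as np

a = np.ones((1, 10), dtype=np.float64)
print(a.strides)
# On a relaxed-strides (development) build the stride of the length-1 axis
# may be reported as an arbitrary value, not a multiple of the 8-byte
# itemsize, which is exactly what a "stride % itemsize == 0" check chokes on.
```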

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Open CV 3.0 + NPY_RELAXED_STRIDES

2015-06-10 Thread Jaime Fernández del Río
On Wed, Jun 10, 2015 at 5:53 PM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 I'm in the midst of a Python 3.5 + MSVS 2015 compilation frenzy. Today it
 was time for Open CV 3.0, where I found a nasty bug that I have eventually
 tracked down to using a development version of NumPy, and Open CV 3.0
  choking on relaxed strides, as it does a check that every stride is a
 multiple of the itemsize.

 I was thinking of submitting a patch to opencv to fix this, but was
 wondering whether we have plans to eventually have relaxed strides out in
 the wild in user releases, or is it just a testing tool for development?


I see that in the release notes of 1.9 we had the following:


   - Relaxed stride checking will be the default in 1.10.0

Is this still the plan?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NumPy + Python 3.5 + Windows + VS2015

2015-06-08 Thread Jaime Fernández del Río
I have just unsuccessfully tried to build numpy under Windows for Python
3.5, using the latest release candidate for Visual Studio 2015.

A very early failure with a:

RuntimeError: Broken toolchain: cannot link a simple C program

even though repeating the sequence of commands that lead to the failure
manually seems to work.

Anyway, before diving deeper into this, has anyone tried this out already
and have some success or failure stories to share?

Thanks,

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] binary wheels for numpy?

2015-05-15 Thread Jaime Fernández del Río
On Fri, May 15, 2015 at 6:56 PM, josef.p...@gmail.com wrote:



 On Fri, May 15, 2015 at 4:07 PM, Chris Barker chris.bar...@noaa.gov
 wrote:

 Hi folks.,

 I did a little intro to scipy session as part of a larger Python class
 the other day, and was dismayed to find that pip install numpy still
 dosn't work on Windows.

 Thanks mostly to Matthew Brett's work, the whole scipy stack is
 pip-installable on OS-X, it would be really nice if we had that for Windows.

 And no, saying you should go get Python(x,y) or Anaconda, or Canopy,
 or...) is really not a good solution. That is indeed the way to go if
 someone is primarily focusing on computational programming, but if you have
 a web developer, or someone new to Python for general use, they really
 should be able to just grab numpy and play around with it a bit without
 having to start all over again.


 Unrelated to the pip/wheel discussion.

 In my experience by far the easiest to get something running to play with
 is using Winpython. Download and unzip (and maybe add to system path) and
 most of the data analysis stack is available.

 I haven't even bothered yet to properly install a full system python on
 my Windows machine. I'm just working with 3 winpython. (One even has Julia
 and IJulia included after following the installation instructions for a
 short time.)


+1 on WinPython. I have half a dozen installations of it, none registered
with Windows.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-12 Thread Jaime Fernández del Río
On Tue, May 12, 2015 at 1:17 AM, Stefan Otte stefan.o...@gmail.com wrote:

 Hello,

 indeed I was looking for the cartesian product.

 I timed the two stackoverflow answers and the winner is not quite as clear:

 n_elements:10  cartesian  0.00427 cartesian2  0.00172
 n_elements:   100  cartesian  0.02758 cartesian2  0.01044
 n_elements:  1000  cartesian  0.97628 cartesian2  1.12145
 n_elements:  5000  cartesian 17.14133 cartesian2 31.12241

 (This is for two arrays as parameters: np.linspace(0, 1, n_elements))
 cartesian2 seems to be slower for bigger.


On my system, the following variation on Pauli's answer is 2-4x faster than
his for your test cases:

def cartesian4(arrays, out=None):
    arrays = [np.asarray(x).ravel() for x in arrays]
    dtype = np.result_type(*arrays)

    n = np.prod([arr.size for arr in arrays])
    if out is None:
        out = np.empty((len(arrays), n), dtype=dtype)
    else:
        out = out.T

    for j, arr in enumerate(arrays):
        n /= arr.size
        out.shape = (len(arrays), -1, arr.size, n)
        out[j] = arr[np.newaxis, :, np.newaxis]
    out.shape = (len(arrays), -1)

    return out.T
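
For example, continuing from the definition above, with inputs like the ones
timed above (note that, as written, the function relies on Python 2 integer
division; on Python 3 the `n /= arr.size` line would need to be
`n //= arr.size`):

```python
a = np.linspace(0, 1, 3)
b = np.linspace(0, 1, 2)
grid = cartesian4([a, b])
print(grid.shape)   # (6, 2): one row per point of the product grid
```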


 I'd really appreciate if this was be part of numpy. Should I create a pull
 request?


There hasn't been any opposition, quite the contrary, so yes, I would go
ahead and create that PR. I somehow feel this belongs with the set
operations, rather than with the indexing ones. Other thoughts?

Also for consideration: should it work on flattened arrays? or should we
give it an axis argument, and then broadcast on the rest, a la
generalized ufunc?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-10 Thread Jaime Fernández del Río
On Sun, May 10, 2015 at 4:40 AM, Stefan Otte stefan.o...@gmail.com wrote:

 Hey,

 quite often I want to evaluate a function on a grid in a n-D space.
 What I end up doing (and what I really dislike) looks something like this:

   x = np.linspace(0, 5, 20)
   M1, M2 = np.meshgrid(x, x)
   X = np.column_stack([M1.flatten(), M2.flatten()])
   X.shape  # (400, 2)

   fancy_function(X)

 I don't think I ever used `meshgrid` in any other way.
 Is there a better way to create such a grid space?

 I wrote myself a little helper function:

   def gridspace(linspaces):
       return np.column_stack([space.flatten()
                               for space in np.meshgrid(*linspaces)])

 But maybe something like this should be part of numpy?


Isn't what you are trying to build a cartesian product function? There is a
neat, efficient implementation of such a function on StackOverflow, by our
own pv:

http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363

Perhaps we could make this part of numpy.lib.arraysetops? Is there room for
other combinatoric generators, e.g. combinations, permutations... as in
itertools?
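
For reference, the semantics being discussed are those of itertools.product,
just vectorized; a tiny sketch of the equivalence:

```python
import itertools
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20])
ref = np.array(list(itertools.product(a, b)))
# ref.shape == (6, 2); a NumPy cartesian-product helper would return the
# same rows, only without the Python-level loop
```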

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-10 Thread Jaime Fernández del Río
On Sun, May 10, 2015 at 7:05 AM, Stefan Otte stefan.o...@gmail.com wrote:

 I just drafted different versions of the `gridspace` function:

 https://tmp23.tmpnb.org/user/1waoqQ8PJBJ7/notebooks/2015-05%20gridspace.ipynb


The link seems to be broken...

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Jaime Fernández del Río
There is a reported bug (issue #5837
https://github.com/numpy/numpy/issues/5837) regarding different returns
from np.nonzero with 1-D vs higher dimensional arrays. A full summary of
the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()

>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base

>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of
higher dimensional array returns, but there are more differences: 1-D
always returns an ndarray that owns its memory and is writeable, but higher
dimensional arrays return views, of the type of the original array, that
are non-writeable.

I have a branch that attempts to fix this by making both 1-D and n-D arrays:

   1. return a view, never the base array,
   2. return an ndarray, never a subclass, and
   3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change
breaks a few tests. I nevertheless think that all of the index returning
functions (nonzero, argsort, argmin, argmax, argpartition) should always
return a bare ndarray, not a subclass. I'd be happy to be corrected, but I
can't think of any situation in which preserving the subclass would be
needed for these functions.

Since we are changing the returns of a few other functions in 1.10
(diagonal, diag, ravel), it may be a good moment to revisit the behavior
for these other functions. Any thoughts?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to set a fixed sized dtype suitable for bitwise operations

2015-04-28 Thread Jaime Fernández del Río
On Tue, Apr 28, 2015 at 7:00 AM, Benjamin Root ben.r...@ou.edu wrote:

 I have a need to have a numpy array of 17 byte (more specifically, at
 least 147 bits) values that I would be doing some bit twiddling on. I have
 found that doing a dtype of i17 yields a dtype of int32, which is
 completely not what I intended. Doing 'u17' gets an data type not
 understood. I have tried 'a17', but then bitwise_or() and left_shift() do
 not work (returns NotImplemented).


 How should I be going about this?


The correct type to use would be a void dtype:

>>> dt = np.dtype('V17')
>>> dt.itemsize
17

Unfortunately, it does not support bitwise operations either, which seems
like an oddity to me:

>>> a = np.empty(2, dt)
>>> a[0] = 'abcdef'
>>> a[1] = bytearray([64, 56, 78])
>>> a[0] | a[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'numpy.void' and 'numpy.void'

Any fundamental reason for this?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com wrote:

 Yeah, I'm not arguing, I'm just curious about your reasoning.  That
 explains why not C++.  Why would you want to do this in C and not Python?


Well, the algorithm has to iterate over all the inputs, updating the
estimated percentile positions at every iteration. Because the estimated
percentiles may change in every iteration, I don't think there is an easy
way of vectorizing the calculation with numpy. So I think it would be very
slow if done in Python.

Looking at this in some more detail, how is this typically used? It gives
you approximate values that should split your sample into similarly filled
bins, but because the values are approximate, to compute a proper histogram
you would still need to do the binning to get the exact results, right?
Even with this drawback P^2 does have an algorithmic advantage, so for huge
inputs and many bins it should come out ahead. But for many medium sized
problems it may be faster to simply use np.partition, which gives you the
whole thing in a single go. And it would be much simpler to implement.
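
A rough sketch of that partition-based approach (the function name and
details here are made up, just to illustrate the idea):

```python
import numpy as np

def equal_count_edges(data, nbins):
    # bin edges that split `data` into nbins similarly filled bins,
    # found with a single np.partition pass over the data
    data = np.asarray(data).ravel()
    idx = (np.arange(1, nbins) * data.size) // nbins
    interior = np.partition(data, idx)[idx]
    return np.concatenate(([data.min()], interior, [data.max()]))

# counts, edges = np.histogram(data, bins=equal_count_edges(data, 10))
```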

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com wrote:

 You got it.  I remember this from when I worked at Google and we would
 process (many many) logs.  With enough bins, the approximation is still
 really close.  It's great if you want to make an automatic plot of data.
 Calling numpy.partition a hundred times is probably slower than calling P^2
 with n=100 bins.  I don't think it does O(n) computations per point.  I
 think it's more like O(log(n)).


Looking at it again, it probably is O(n) after all: it does a binary
search, which is O(log n), but it then goes on to update all the n bin
counters and estimates, so O(n) I'm afraid. So there is no algorithmic
advantage over partition/percentile: if there are m samples and n bins, P^2
does O(n) work m times, while partition does O(m) work n times, so both end
up being O(m n). It seems to me that the big selling point of P^2 is not
having to hold the full dataset in memory. Online statistics (is that the
name for this?), even if they only give estimates, are a cool thing, but I
am not sure numpy is the place for them. That's not to say that we couldn't
eventually have P^2 implemented for histogram, but I would start off with a
partition based one.

Would SciPy have a place for online statistics? Perhaps there's room for
yet another scikit?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore e...@redtetrahedron.org wrote:

 This blog post, and the links within also seem relevant.  Appears to have
 python code available to try things out as well.


 https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest


Very cool indeed... The original work is licensed under an Apache 2.0
license (https://github.com/tdunning/t-digest/blob/master/LICENSE). I am
not fluent in legalese, so I am not sure whether that means we can use it or
not; it seems awfully more complicated than what we normally use.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar mistersh...@gmail.com wrote:

 If you're going to C, is there a reason not to go to C++ and include the
 already-written Boost code?  Otherwise, why not use Python?


I think we have an explicit rule against C++, although I may be wrong. Not
sure how much of boost we would have to make part of numpy to use that, the
whole accumulators lib I'm guessing? Seems like an awful lot given what we
are after.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Can I suggest that we instead add the P-square algorithm for the dynamic
  calculation of histograms?
  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )
 
  This is already implemented in C++'s boost library
  (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )
 
  I implemented it in Boost Python as a module, which I'm happy to share.
  This is much better than fixed-width histograms in practice.  Rather than
  adjusting the number of bins, it adjusts what you really want, which is
 the
  resolution of the bins throughout the domain.

 This definitely sounds like a useful thing to have in numpy or scipy
 (though if it's possible to do without using Boost/C++ that would be
 nice). But yeah, we should leave the existing histogram alone (in this
 regard) and add a new name for this like adaptive_histogram or
 something. Then you can set about convincing matplotlib and friends to
 use it by default :-)


Would having a negative number of bins mean "this many, but with optimized
boundaries" be too clever an interface?

I have taken a look at the paper linked, and the P-2 algorithm would not be
too complicated to implement from scratch, although it would require
writing some C code I'm afraid.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Jaime Fernández del Río
On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such as
 matplotlib, or  nearly all of them) have poor default behaviour as I have
 to
 constantly play around with  the number of bins to get a good idea of what
 I'm
 looking at. The bins=10 works ok for  up to 1000 points or very normal
 data,
 but has poor performance for anything else, and  doesn't account for
 variability either. I don't have a method easily available to scale the
 number
 of bins given the data.

 R doesn't suffer from these problems and provides methods for use with it's
 hist  method. I would like to provide similar functionality for
 matplotlib, to
 at least provide  some kind of good starting point, as histograms are very
 useful for initial data discovery.

 The notebook above provides an explanation of the problem as well as some
 proposed  alternatives. Use different datasets (type and size) to see the
 performance of the  suggestions. All of the methods proposed exist in R and
 literature.

 I've put together an implementation to add this new functionality, but am
 hesitant to  make a pull request as I would like some feedback from a
 maintainer before doing so.


+1 on the PR.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Jaime Fernández del Río
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 I have an all-Python implementation of an OrthogonalIndexer class, loosely
 based on Stephan's code plus some axis remapping, that provides all the
 needed functionality for getting and setting with orthogonal indices.

 Would those interested rather see it as a gist to play around with, or as
 a PR adding an orthogonally indexable `.ix_` argument to ndarray?


A PR it is, #5749 https://github.com/numpy/numpy/pull/5749 to be precise.
I think it has all the bells and whistles: integers, boolean and integer
1-D arrays, slices, ellipsis, and even newaxis, both for getting and
setting. No tests yet, so correctness of the implementation is dubious at
best. As a small example:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_
<numpy.core._indexer.OrthogonalIndexer at 0x1027979d0>
>>> a.ix_[[0, 1], :, [True, False, True, False, True]]
array([[[ 0,  2,  4],
[ 5,  7,  9],
[10, 12, 14],
[15, 17, 19]],

   [[20, 22, 24],
[25, 27, 29],
[30, 32, 34],
[35, 37, 39]]])
>>> a.ix_[[0, 1], :, [True, False, True, False, True]] = 0
>>> a
array([[[ 0,  1,  0,  3,  0],
[ 0,  6,  0,  8,  0],
[ 0, 11,  0, 13,  0],
[ 0, 16,  0, 18,  0]],

   [[ 0, 21,  0, 23,  0],
[ 0, 26,  0, 28,  0],
[ 0, 31,  0, 33,  0],
[ 0, 36,  0, 38,  0]],

   [[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Jaime Fernández del Río
I have an all-Python implementation of an OrthogonalIndexer class, loosely
based on Stephan's code plus some axis remapping, that provides all the
needed functionality for getting and setting with orthogonal indices.

Would those interested rather see it as a gist to play around with, or as a
PR adding an orthogonally indexable `.ix_` argument to ndarray?

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
  On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
  On 2015/04/02 1:14 PM, Hanno Klemm wrote:
  Well, I have written quite a bit of code that relies on fancy
  indexing, and I think the question, if the behaviour of the []
  operator should be changed has sailed with numpy now at version 1.9.
  Given the amount packages that rely on numpy, changing this
  fundamental behaviour would not be a clever move.
 
  Are you *positive* that there is no clever way to make a transition?
  It's not worth any further thought?
 
  I guess it would be similar to python 3 string versus bytes, but
  without the overwhelming benefits.
 
  I don't think I would be in favor of deprecating fancy indexing even
  if it were possible. In general, my impression is that if there is a
  trade-off in numpy between powerful machinery versus easy to learn and
  teach, then the design philosophy when in favor of power.
 
  I think numpy indexing is not too difficult and follows a consistent
  pattern, and I completely avoid mixing slices and index arrays with
  ndim  2.

 I'm sure y'all are totally on top of this, but for myself, I would
 like to distinguish:

 * fancy indexing with boolean arrays - I use it all the time and don't
 get confused;
 * fancy indexing with non-boolean arrays - horrendously confusing,
 almost never use it, except on a single axis when I can't confuse it
 with orthogonal indexing:

 In [3]: a = np.arange(24).reshape(6, 4)

 In [4]: a
 Out[4]:
 array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])

 In [5]: a[[1, 2, 4]]
 Out[5]:
 array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[16, 17, 18, 19]])

 I also remember a discussion with Travis O where he was also saying
 that this indexing was confusing and that it would be good if there
 was some way to transition to what he called outer product indexing (I
 think that's the same as 'orthogonal' indexing).

  I think it should be DOA, except as a discussion topic for numpy 3000.

 I think there are two proposals here:

 1) Add some syntactic sugar to allow orthogonal indexing of numpy
 arrays, no backward compatibility break.

 That seems like a very good idea to me - were there any big objections to
 that?

 2) Over some long time period, move the default behavior of np.array
 non-boolean indexing from the current behavior to the orthogonal
 behavior.

 That is going to be very tough, because it will cause very confusing
 breakage of legacy code.

 On the other hand, maybe it is worth going some way towards that, like
 this:

 * implement orthogonal indexing as a method arr.sensible_index[...]
 * implement the current non-boolean fancy indexing behavior as a
 method - arr.crazy_index[...]
 * deprecate non-boolean fancy indexing as standard arr[...] indexing;
 * wait a long time;
 * remove non-boolean fancy indexing as standard arr[...] (errors are
 preferable to change in behavior)

 Then if we are brave we could:

 * wait a very long time;
 * make orthogonal indexing the default.

 But the not-brave steps above seem less controversial, and fairly
 reasonable.

 What about that as an approach?


Your option 1 was what was being discussed before the posse was assembled
to bring fancy indexing before justice... ;-)

My background is in image processing, and I have used fancy indexing in all
its fanciness far more often than orthogonal or outer product indexing. I
actually have a vivid memory of the moment I fell in love with NumPy: after
seeing a code snippet that ran a huge image through a look-up table by
indexing the LUT with the image. Beautifully simple. And here
http://stackoverflow.com/questions/12014186/fancier-fancy-indexing-in-numpy
is a younger me, learning to ride NumPy without the training wheels.

Another obvious use case that you can find all over the place in
scikit-image is drawing a curve on an image from the coordinates.

If there is such strong agreement on an orthogonal indexer, we might as
well go ahead an implement it. But before considering any bolder steps, we
should probably give it a couple of releases to see how many people out
there really use it.

Jaime

P.S. As an aside on the remapping of axes when arrays and slices are mixed,
there really is no better way. Once you realize that the array indexing a
dimension does not have to be 1-D, it becomes clear that what seems like the
obvious way does not generalize. E.g.:

One may rightfully think that:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)

should not reorder the axes, and return an array of shape (1, 2, 3). But
what do you do in the following case?

 idx0 = np.random.randint(3, size=(10, 1, 10))
 idx2 = np.random.randint(5, size=(1, 20, 

Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer sho...@gmail.com wrote:

 On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 Is there any other package implementing non-orthogonal indexing aside
 from numpy?


 I think we can safely say that NumPy's implementation of broadcasting
 indexing is unique :).

 The issue is that many other packages rely on numpy for implementation of
 custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
 immediately obvious what sort of indexing these objects represent.

 If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all
 for improving that to provide the full functionality of orthogonal
 indexing. I just need a little more convincing that those new
 attributes/indexers are going to ever see any real use.


 Orthogonal indexing is close to the norm for packages that implement
 labeled data structures, both because it's easier to understand and
 implement, and because it's difficult to maintain associations with labels
 through complex broadcasting indexing.

 Unfortunately, the lack of a full featured implementation of orthogonal
 indexing has lead to that wheel being reinvented at least three times (in
 Iris, xray [1] and pandas). So it would be nice to have a canonical
 implementation that supports slices and integers in numpy for that reason
 alone. This could be done by building on the existing `np.ix_` function,
 but a new indexer seems more elegant: there's just much less noise with
 `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.

 It's also well known that indexing with __getitem__ can be much slower
 than np.take. It seems plausible to me that a careful implementation of
 orthogonal indexing could close or eliminate this speed gap, because the
 model for orthogonal indexing is so much simpler than that for broadcasting
 indexing: each element of the key tuple can be applied separately along the
 corresponding axis.

 So I think there could be a real benefit to having the feature in numpy.
 In particular, if somebody is up for implementing it in C or Cython, I
 would be very pleased.

  Cheers,
 Stephan

 [1] Here is my implementation of remapping from orthogonal to broadcasting
 indexing. It works, but it's a real mess, especially because I try to
 optimize by minimizing the number of times slices are converted into arrays:

 https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68


I believe you can leave all slices unchanged if you later reshuffle your
axes. Basically all the fancy-indexed axes go in the front of the shape in
order, and the subspace follows, e.g.:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)

So you would need to swap the second and last axes and be done. You would
not get a contiguous array without a copy, but that's a different story.
Assigning to an orthogonally indexed subarray is an entirely different
beast, not sure if there is a use case for that.
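
A minimal sketch of that remapping for the example above (it only handles
this particular pattern; the general case needs the full bookkeeping):

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)
rows = np.array([1])[:, None]      # fancy index for axis 0
cols = np.array([1, 2, 3])         # fancy index for axis 2
b = a[rows, ::2, cols]             # shape (1, 3, 2): fancy axes come first
b_orth = b.swapaxes(1, 2)          # shape (1, 2, 3): orthogonal-style order
```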

We probably need more traction on the "should this be done?" discussion
than on the "can this be done?" one. The need for a reordering of the axes
swings me slightly in favor, but I mostly don't see it yet. Nathaniel
usually has good insights on the "who we are, where do we come from, where
are we going" type of questions, so it would be good to have him chime in.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-01 Thread Jaime Fernández del Río
On Wed, Apr 1, 2015 at 2:17 AM, R Hattersley rhatters...@gmail.com wrote:

 There are two different interpretations in common use of how to handle
 multi-valued (array/sequence) indexes. The numpy style is to consider all
 multi-valued indices together which allows arbitrary points to be
 extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to
 consider each multi-valued index independently.

 For example:

 >>> type(v)
 <type 'netCDF4.Variable'>
 >>> v.shape
 (240, 37, 49)
 >>> v[(0, 1), (0, 2, 3)].shape
 (2, 3, 49)
 >>> np.array(v)[(0, 1), (0, 2, 3)].shape
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 IndexError: shape mismatch: indexing arrays could not be broadcast
 together with shapes (2,) (3,)


 In a netcdf4-python GitHub issue
 https://github.com/Unidata/netcdf4-python/issues/385 the authors of
 various orthogonal indexing packages have been discussing how to
 distinguish the two behaviours and have currently settled on a boolean
 __orthogonal_indexing__ attribute.

 1. Is there any appetite for adding that attribute (with the value
 `False`) to ndarray?

 2. As suggested by shoyer
 https://github.com/Unidata/netcdf4-python/issues/385#issuecomment-87775034,
 is there any appetite for adding an alternative indexer to ndarray where
 __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]


Is there any other package implementing non-orthogonal indexing aside from
numpy? I understand that it would be nice to do:

if x.__orthogonal_indexing__:
    return x[idx]
else:
    return x.ix_[idx]

But I think you would get the exact same result doing:

if isinstance(x, np.ndarray):
    return x[np.ix_(*idx)]
else:
    return x[idx]

If `not x.__orthogonal_indexing__` is going to be a proxy for
`isinstance(x, ndarray)` I don't really see the point of disguising it,
explicit is better than implicit and all that.

If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all
for improving that to provide the full functionality of orthogonal
indexing. I just need a little more convincing that those new
attributes/indexers are going to ever see any real use.
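
To make the lacking functionality concrete, a small sketch of what np.ix_
covers today and the slice case that has to be expanded by hand:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
print(x[np.ix_([0, 2], [1, 3])])        # orthogonal selection, shape (2, 2)

# a slice such as 1:3 is not accepted by np.ix_ directly; it has to be
# expanded into an explicit range first
cols = np.arange(*slice(1, 3).indices(x.shape[1]))
print(x[np.ix_([0, 2], cols)])          # shape (2, 2)
```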

Jaime


 Richard

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Adding 'where' to ufunc methods?

2015-04-01 Thread Jaime Fernández del Río
This question on StackOverflow:

http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal

Got me thinking that I had finally found a use for the 'where' kwarg of
ufuncs. Unfortunately it is only provided for the ufunc itself, but not for
any of its methods.

Is there any fundamental reason these were not implemented back in the day?
Any frontal opposition to having them now?
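
For context, a small sketch of the difference; the minimum-ignoring-the-
diagonal case is the one from the StackOverflow question above, and the
reduce call in the final comment is the missing piece being asked about:

```python
import numpy as np

a = np.arange(16, dtype=float).reshape(4, 4)
mask = ~np.eye(4, dtype=bool)          # True everywhere off the diagonal

# 'where' on a plain ufunc call works: positions where mask is False keep
# whatever value is already in 'out'
out = np.full_like(a, np.inf)
np.minimum(a, np.inf, out=out, where=mask)
print(out.min())                       # minimum ignoring the diagonal

# what is being asked for is the same kwarg on the ufunc *methods*, e.g.
# np.minimum.reduce(a, where=mask), to get the masked minimum directly
```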

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rename arguments to np.clip and np.put

2015-03-30 Thread Jaime Fernández del Río
On Mon, Mar 30, 2015 at 3:59 PM, Allan Haldane allanhald...@gmail.com
wrote:

 Hello everyone,

 What does the list think of renaming the arguments of np.clip and np.put
 to match those of ndarray.clip/put? Currently the signatures are

 np.clip(a, a_min, a_max, out=None)
 ndarray.clip(a, min=None, max=None, out=None)

 np.put(a, ind, v, mode='raise')
 ndarray.put(indices, values, mode='raise')

 (The docstring for ndarray.clip is incorrect, too).

 I suggest the signatures might be changed to this:

 np.clip(a, min=None, max=None, out=None, **kwargs)
 np.put(a, indices, values, mode='raise')

 We can still take care of the old argument names for np.clip using
 **kwards, while showing a deprecation warning. I think that would be
 fully back-compatible. Note this makes np.clip more flexible as only one
 of min or max are needed now, just like ndarray.clip.

 np.put is trickier to keep back-compatible as it has two positional
 arguments. Someone who called `np.put(a, v=0, ind=1)` would be in
 trouble with this change, although I didn't find anyone on github doing
 so. I suppose to maintain back-compatibility we could make indices and
 values keyword args, and use the same kwargs trick as in np.clip, but
 that might be confusing since they're both required args.


Ideally we would want the signature to show up as you describe it in the
documentation, but during the deprecation period to be something like, e.g.

np.put(a, indices=None, values=None, mode='raise', **kwargs)

Not sure if that is even possible, maybe with functools.update_wrapper?
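
For what it's worth, a sketch of the kind of shim that could handle the old
kwarg names during the deprecation period (purely illustrative, not the
actual numpy implementation):

```python
import warnings
import numpy as np

def clip(a, min=None, max=None, out=None, **kwargs):
    # accept the old names during the deprecation period
    if 'a_min' in kwargs:
        warnings.warn("'a_min' is deprecated, use 'min'", DeprecationWarning)
        min = kwargs.pop('a_min')
    if 'a_max' in kwargs:
        warnings.warn("'a_max' is deprecated, use 'max'", DeprecationWarning)
        max = kwargs.pop('a_max')
    if kwargs:
        raise TypeError("unexpected keyword arguments %r" % sorted(kwargs))
    return np.asanyarray(a).clip(min=min, max=max, out=out)
```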

Jaime



 Any opinions or suggestions?

 Allan
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] flags attribute of ndarrays`

2015-03-26 Thread Jaime Fernández del Río
Hi,

I have just submitted a longish issue on the github repo, #5721
https://github.com/numpy/numpy/issues/5721. After going in quite some
detail over the flags attribute of ndarray's I have found several
inconsistencies with the C API, and would like to make some changes. The
details are in gh, but as a high level summary:

   1. arr.flags.farray is almost certainly broken, and returns some useless
   result. I would like to make it behave like the C API's PyArray_ISFARRAY,
   even though that doesn't seem to be the intention of the original coder.
   2. arr.flags.fortran is inconsistent with the C API's PyArray_ISFORTRAN.
   I think it should be modified to match it, but understand this may be too
   much of a backwards compatibility breach.
   3. I would like for `arr.flags` to truly behave as a mutable property of
   `arr`. An explanation can be found on another thread's discussion with
   Benjamin Root earlier today.
   4. I would like to match the Python and C version's of these as much as
   possible, and avoid future deviation, by actually using the C versions in
   the Python ones, even if this may introduce subtle behavior changes.

Feedback is very welcome.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Do you find this behavior surprising?

2015-03-25 Thread Jaime Fernández del Río
>>> import numpy as np
>>> a = np.arange(10)
>>> flags = a.flags
>>> flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> flags.writeable = False
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : False  <--- WTF!!??
  ALIGNED : True
  UPDATEIFCOPY : False

I understand why this is happening, and that there is no other obvious way
to make

a.flags.writeable = False

work than to have the return of a.flags linked to a under the hood.

But I don't think this is documented anywhere, and wonder if perhaps it
should.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do you find this behavior surprising?

2015-03-25 Thread Jaime Fernández del Río
On Wed, Mar 25, 2015 at 1:45 PM, Benjamin Root ben.r...@ou.edu wrote:

 I fail to see the wtf.


 flags = a.flags

 So, flags at this point is just an alias to a.flags, just like any
 other variable in python

 flags.writeable = False would then be equivalent to a.flags.writeable =
 False. There is nothing numpy-specific here. a.flags is mutable object.
 This is how Python works.

 Ben Root


Ah, yes indeed. If you think of it that way it does make all the sense in
the world.

But of course that is not what is actually going on: flags is a single C
int in the PyArrayObject struct, a.flags is just a proxy object built from
it, and great coding contortions have to be made to have changes to the
proxy written back into the owner array.

I guess then the surprising behavior is this other one, which was the one I
(wrongly) expected intuitively:

>>> a = np.arange(10)
>>> flags = a.flags
>>> a.flags.writeable = False
>>> flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This could be fixed to work properly, although it is probably not worth
worrying about much.

Properties of properties are weird...

Jaime
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-16 Thread Jaime Fernández del Río
On Sun, Mar 15, 2015 at 11:06 PM, Robert McGibbon rmcgi...@gmail.com
wrote:

 It might make sense to dispatch to different C implementations if the bins
 are equally spaced (as created by using an integer for the np.histogram bins
 argument), vs. non-equally-spaced bins.


Dispatching to a different method seems like a no brainer indeed. The
question is whether we really need to do this in C. Maybe for some very
specific case or cases it makes sense to have a super fast C path, e,g. no
weights and bins is an integer. Even then, rather than rewriting the whole
thing in C, it may be a better idea to leave the parsing of the inputs in
Python, and have a C helper function wrapped and privately exposed,
similarly to how `np.core.multiarray.interp` is used by `np.interp`.

But I would still first give it a try in Python...
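
As a rough illustration of the kind of Python-level fast path meant here
(equal-width bins only; the name and details are made up):

```python
import numpy as np

def uniform_histogram(a, nbins, limits):
    # equal-width bins: the bin index is a single multiply, no searchsorted
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = limits
    a = a[(a >= lo) & (a <= hi)]
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx[idx == nbins] = nbins - 1          # the right edge is inclusive
    counts = np.bincount(idx, minlength=nbins)
    edges = np.linspace(lo, hi, nbins + 1)
    return counts, edges
```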

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-16 Thread Jaime Fernández del Río
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer jerome.kief...@esrf.fr
wrote:

 On Mon, 16 Mar 2015 06:56:58 -0700
 Jaime Fernández del Río jaime.f...@gmail.com wrote:

  Dispatching to a different method seems like a no brainer indeed. The
  question is whether we really need to do this in C.

 I need to do both unweighted  weighted histograms and we got a factor 5
 using (simple) cython:
 it is in the proceedings of Euroscipy, last year.
 http://arxiv.org/pdf/1412.6367.pdf


If I read your paper and code properly, you got 5x faster, mostly because
you combined the weighted and unweighted histograms into a single search of
the array, and because you used an algorithm that can only be applied to
equal-sized bins, similarly to the 10x speed-up Robert was reporting.

I think that having a special path for equal sized bins is a great idea:
let's do it, PRs are always welcome!
Similarly, getting the counts together with the weights seems like a very
good idea.

I also think that writing it in Python is going to take us 80% of the way
there: most of the improvements both of you have reported are not likely to
be coming from the language chosen, but from the algorithm used. And if C
proves to be sufficiently faster to warrant using it, it should be confined
to the number crunching: I don't think there is any point in rewriting
argument parsing in C.

Also, keep in mind `np.histogram` can now handle arrays of just about
**any** dtype. Handling that complexity in C is not a ride in the park.
Other functions like `np.bincount` and `np.digitize` cheat by only handling
`double` typed arrays, a luxury that histogram probably can't afford at
this point in time.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-16 Thread Jaime Fernández del Río
On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon rmcgi...@gmail.com wrote:

 Hi,

 Numpy.histogram is implemented in python, and is a little sluggish. This
 has been discussed previously on the mailing list, [1, 2]. It came up in a
 project that I maintain, where a new feature is bottlenecked by
 numpy.histogram, and one developer suggested a faster implementation in
 cython [3].

 Would it make sense to reimplement this function in c? or cython? Is
 moving functions like this from python to c to improve performance within
 the scope of the development roadmap for numpy? I started implementing this
 a little bit in c, [4] but I figured I should check in here first.


Where do you think the performance gains will come from? The PR in your
project that claims a 10x speed-up uses a method that is only fit for
equally spaced bins. I want to think that implementing that exact same
algorithm in Python with NumPy would be comparably fast, say within 2x.

For the general case, NumPy is already doing most of the heavy lifting (the
sorting and the searching) in C: simply replicating the same algorithmic
approach entirely in C is unlikely to provide any major speed-up. And if
the change is to the algorithm, then we should first try it out in Python.

That said, if you can speed things up 10x, I don't think there is going to
be much opposition to moving it to C!

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.10

2015-03-13 Thread Jaime Fernández del Río
On Thu, Mar 12, 2015 at 10:16 PM, Charles R Harris 
charlesr.har...@gmail.com wrote:



 On Sun, Mar 8, 2015 at 3:43 PM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:



 On Sat, Mar 7, 2015 at 12:40 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Hi All,

 Time to start thinking about numpy 1.10.


 Sounds good. Do we have a volunteer for release manager already?


 I guess it is my turn, unless someone else wants the experience.


What does a release manager do? I will eventually want to be able to tell
my grandchildren that I once managed a numpy release, but am not sure if I
can successfully handle it on my own right now. I will probably need to up
my git foo, which is nothing to write home about...

Maybe for this one I can sign up for release minion, so you have someone to
offload menial tasks?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy array casting ruled not safe

2015-03-07 Thread Jaime Fernández del Río
On Sat, Mar 7, 2015 at 1:52 PM, Charles R Harris charlesr.har...@gmail.com
wrote:



 On Sat, Mar 7, 2015 at 2:45 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Sat, Mar 7, 2015 at 2:02 PM, Dinesh Vadhia dineshbvad...@hotmail.com
 wrote:

   This was originally posted on SO (
 https://stackoverflow.com/questions/28853740/numpy-array-casting-ruled-not-safe)
 and it was suggested it is probably a bug in numpy.take.

 Python 2.7.8 |Anaconda 2.1.0 (32-bit)| (default, Jul  2 2014, 15:13:35)
 [MSC v.1500 32 bit (Intel)] on win32
 Type copyright, credits or license() for more information.

  >>> import numpy
  >>> numpy.__version__
 '1.9.2'

  >>> a = numpy.array([9, 7, 5, 4, 3, 1], dtype=numpy.uint32)
  >>> b = numpy.array([1, 3], dtype=numpy.uint32)
  >>> c = a.take(b)

 Traceback (most recent call last):
    File "<pyshell#5>", line 1, in <module>
 c = a.take(b)
 TypeError: Cannot cast array data from dtype('uint32') to dtype('int32')
 according to the rule 'safe'


 This actually looks correct for 32-bit windows. Numpy indexes with a
 signed type big enough to hold a pointer to void, which in this case is an
 int32, and the uint32 cannot be safely cast to that type.

 Chuck


 I note that on SO Jaime made the suggestion that take use unsafe casting
 and throw an error on out of bounds indexes. That sounds reasonable,
 although for sufficiently large integer types an index could wrap around to
 a good value. Maybe make it work only for npy_uintp.

 Chuck


It is mostly about consistency, and having take match what indexing already
does, which is to unsafely cast all integers:

In [11]: np.arange(10)[np.uint64(2**64-1)]
Out[11]: 9

I think no one has ever complained about that obviously wrong behavior, but
people do get annoyed if they cannot use their perfectly valid uint64 array
because we want to protect them from themselves. Sebastian has probably
given this more thought than anyone else, it would be interesting to hear
his thoughts on this.
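
To see the inconsistency side by side (64-bit build, so the unsafe cast is
uint64 to int64; the error text is the analogue of the uint32/int32 one
reported above):

>>> import numpy as np
>>> a = np.arange(10)
>>> idx = np.array([1, 3], dtype=np.uint64)
>>> a[idx]         # fancy indexing casts the uint64 indices unsafely
array([1, 3])
>>> a.take(idx)    # take insists on a 'safe' cast and rejects them
Traceback (most recent call last):
  ...
TypeError: Cannot cast array data from dtype('uint64') to dtype('int64')
according to the rule 'safe'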

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ufuncs now take a tuple of arrays as 'out' kwarg

2015-03-05 Thread Jaime Fernández del Río
Hi all,

There is a PR, ready to be merged, that adds the possibility of passing a
tuple of arrays in the 'out' kwarg to ufuncs with multiple outputs:

https://github.com/numpy/numpy/pull/5621

The new functionality is as follows:

* If the ufunc has a single output, then the 'out' kwarg can either be a
single array (or None) like today, or a tuple holding a single array (or
None).

* If the ufunc has more than one output, then the 'out' kwarg must be a
tuple with one array (or None) per output argument. The old behavior, where
only the first output could be specified, is now deprecated, will raise a
deprecation warning, and potentially be changed to an error in the future.

* In both cases, positional and keyword output arguments are incompatible.
This has been made a little more strict, as the following is valid in <=
1.9.x but will now raise an error:

np.add(2, 2, None, out=arr)

There seemed to be a reasonable amount of agreement on the goodness of this
change from the discussions on github, but I wanted to inform the larger
audience, in case there are any addressable concerns.
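
To make the new behavior concrete, a quick usage sketch (np.modf is a ufunc
with two outputs; the array names are arbitrary):

import numpy as np

x = np.linspace(0.0, 2.5, 6)
frac = np.empty_like(x)
intg = np.empty_like(x)

# Multi-output ufunc: pass one array (or None) per output in a tuple.
np.modf(x, out=(frac, intg))

# Single-output ufuncs accept either a bare array or a 1-tuple.
y = np.empty_like(x)
np.add(x, 1.0, out=y)
np.add(x, 1.0, out=(y,))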

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] linalg.norm probems

2015-03-03 Thread Jaime Fernández del Río
On Tue, Mar 3, 2015 at 4:11 PM, Charles R Harris charlesr.har...@gmail.com
wrote:

 Hi All,

 This is with reference to issue  #5626
 https://github.com/numpy/numpy/issues/5626. Currently linalg.norm
 converts the input like so `x = asarray(x)`. This can produce integer
 arrays, which in turn may create problems of overflow, or the failure of
 the abs functions for minimum values of signed integer types. I propose to
 convert the input to a minimum precision of float32. However, this will be
 a change in behavior. I'd guess that that might not be much of a problem,
 as otherwise it is likely that this problem would have been reported
 earlier.

 Thoughts?


Not sure if it makes sense here, but elsewhere (I think it was polyval) we
let object arrays through unchanged.
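
As a small illustration of the integer trouble being described (the int32
minimum is its own absolute value, so a norm computed on the raw integers can
silently come out wrong, while converting to float first behaves):

>>> import numpy as np
>>> a = np.array([-2**31], dtype=np.int32)
>>> np.abs(a)
array([-2147483648], dtype=int32)
>>> np.sqrt(np.sum(np.abs(a.astype(np.float64))**2))
2147483648.0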

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSoC'15 - mentors ideas

2015-02-26 Thread Jaime Fernández del Río
On Thu, Feb 26, 2015 at 2:54 AM, Todd toddr...@gmail.com wrote:

 I am not able to mentor, but I have some ideas about easier projects.
 These may be too easy, too hard, or not even desirable so take them or
 leave them as you please.

 scipy:

 Implement a set of circular statistics functions comparable to those in R
 or MATLAB circular statistics toolbox.

 Either implement some window functions that only apply to the beginning
 and end of an array, or implement a wrapper that takes a window function
 and some parameters and creates a new window that only applies to the
 beginning and end of an array.

 numpy:

 Integrate the bottleneck project optimizations into numpy proper.

 Not sure how much of the bottleneck optimizations can be fitted into the
ufunc machinery. But I'd be more than happy to mentor or co-mentor an
implementation in numpy of the moving window functions. I have already
contributed some work on some of those in scipy.ndimage and pandas, and
find the subject fascinating.

 Integrate as much as possible the matplotlib.mlab functionality into numpy
 (and, optionally, also scipy).

 In many places different approaches to the same task have substantially
 different performance (such as indexing vs. take) and check for one
 approach being substantially slower.  If it is, fix the performance problem
 if possible (perhaps by using the same implementation), and if not document
 the difference.

The take performance advantage is no longer there since seberg's rewrite of
indexing. Are there any other obvious examples?

 Modify ufuncs so their documentation appears in help() in addition to
 numpy.info().

To add one of my own: the old iterator is still being used in many, many
places throughout the numpy code base. Wouldn't it make sense to port those
to the new one? In doing so, it would probably lead to producing simplified
interfaces to the new iterator, e.g. reproducing the old PyIter_AllButAxis
is infinitely more verbose with the new iterator.



 Hi all,


 On Fri, Feb 20, 2015 at 10:05 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:

 Hi all,

 It's time to start preparing for this year's Google Summer of Code. There
 is actually one urgent thing to be done (before 19.00 UTC today), which is
 to get our ideas page in decent shape. It doesn't have to be final, but
 there has to be enough on there for the organizers to judge it. This page
 is here: https://github.com/scipy/scipy/wiki/GSoC-project-ideas. I'll be
 reworking it and linking it from the PSF page today, but if you already
 have new ideas please add them there. See
 https://wiki.python.org/moin/SummerOfCode/OrgIdeasPageTemplate for this
 year's template for adding a new idea.


 The ideas page is now in pretty good shape. More ideas are very welcome
 though, especially easy or easy/intermediate ideas. Numpy right now has
 zero easy ones and Scipy only one and a half.

 What we also need is mentors. All ideas already have a potential mentor
 listed, however some ideas are from last year and I'm not sure that all
 those mentors really are available this year. And more than one potential
 mentor per idea is always good. So can everyone please add/remove his or
 her name on that page?

 I'm happy to take care of most of the organizational aspects this year,
 however I'll be offline for two weeks in July and from the end of August
 onwards, so I'll some help in those periods. Any volunteers?

 Thanks,
 Ralf




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Objects exposing the array interface

2015-02-25 Thread Jaime Fernández del Río
An issue was raised yesterday in github, regarding np.may_share_memory when
run on a class exposing an array using the __array__ method. You can check
the details here:

https://github.com/numpy/numpy/issues/5604

Looking into it, I found out that NumPy doesn't really treat objects
exposing __array__, __array_interface__, or __array_struct__ as if they
were proper arrays:

   1. When converting these objects to arrays using PyArray_Converter, if
   the array returned by any of the array interfaces is not C contiguous,
   aligned, and writeable, a copy that meets those requirements will be made.
   Proper arrays and subclasses are passed unchanged. This is the source of
   the error reported above.
   2. When converting these objects using PyArray_OutputConverter, as well
   as in similar code in the ufunc machinery, anything other than a proper
   array or subclass raises an error. This means that, contrary to what the
   docs on subclassing say (see below), you cannot use an object exposing the
   array interface as an output parameter to a ufunc.

The following classes can be used to test this behavior:

class Foo:
def __init__(self, arr):
self.arr = arr
def __array__(self):
return self.arr

class Bar:
def __init__(self, arr):
self.arr = arr
self.__array_interface__ = arr.__array_interface__

class Baz:
def __init__(self, arr):
self.arr = arr
self.__array_struct__ = arr.__array_struct__

They all behave the same with these examples:

>>> a = Foo(np.ones(5))
>>> np.add(a, a)
array([ 2.,  2.,  2.,  2.,  2.])
>>> np.add.accumulate(a)
array([ 1.,  2.,  3.,  4.,  5.])
>>> np.add(a, a, out=a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: return arrays must be of ArrayType
>>> np.add.accumulate(a, out=a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: output must be an array

I think this should be changed, and whatever gets handed back by these
methods/interfaces should be treated as if it were an array or subclass of it.
This is actually what the docs on subclassing say about __array__ here:

http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__

This also seems to contradict a rather cryptic comment in the code of
PyArray_GetArrayParamsFromObject, which is part of the call sequence of
this whole mess, see here:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/numpy/core/src/multiarray/ctors.c#L1495

/*
 * If op supplies the __array__ function.
 * The documentation says this should produce a copy, so
 * we skip this method if writeable is true, because the intent
 * of writeable is to modify the operand.
 * XXX: If the implementation is wrong, and/or if actual
 *  usage requires this behave differently,
 *  this should be changed!
 */

There has already been some discussion in the issue linked above, but I
would appreciate any other thoughts on the idea of treating objects with
some form of array interface as if they were arrays. Does it need a
deprecation cycle? Is there some case I am not considering where this could
go horribly wrong?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objects exposing the array interface

2015-02-25 Thread Jaime Fernández del Río
On Wed, Feb 25, 2015 at 1:56 PM, Stephan Hoyer sho...@gmail.com wrote:



 On Wed, Feb 25, 2015 at 1:24 PM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 1. When converting these objects to arrays using PyArray_Converter, if
 the arrays returned by any of the array interfaces is not C contiguous,
 aligned, and writeable, a copy that is will be made. Proper arrays and
 subclasses are passed unchanged. This is the source of the error reported
 above.



 When converting these objects to arrays using PyArray_Converter, if the
 arrays returned by any of the array interfaces is not C contiguous,
 aligned, and writeable, a copy that is will be made. Proper arrays and
 subclasses are passed unchanged. This is the source of the error reported
 above.

 I'm not entirely sure I understand this -- how is PyArray_Convert used in
 numpy? For example, if I pass a non-contiguous array to your class Foo,
 np.asarray does not do a copy:


It is used by many (all?) C functions that take an array as input. This
follows a different path than what np.asarray or np.asanyarray do, which
are calls to np.array, which maps to the C function _array_fromobject which
can be found here:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/numpy/core/src/multiarray/multiarraymodule.c#L1592

And ufuncs have their own conversion code, which doesn't really help
either. Not sure it would be possible to have them all use a common code
base, but it is certainly well worth trying.



 In [25]: orig = np.zeros((3, 4))[:2, :3]

 In [26]: orig.flags
 Out[26]:
   C_CONTIGUOUS : False
   F_CONTIGUOUS : False
   OWNDATA : False
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False

 In [27]: subclass = Foo(orig)

 In [28]: np.asarray(subclass)
 Out[28]:
 array([[ 0.,  0.,  0.],
[ 0.,  0.,  0.]])

 In [29]: np.asarray(subclass)[:] = 1

 In [30]: np.asarray(subclass)
 Out[30]:
 array([[ 1.,  1.,  1.],
[ 1.,  1.,  1.]])


 But yes, this is probably a bug.

 2. When converting these objects using PyArray_OutputConverter, as well as
 in similar code in the ufunc machinery, anything other than a proper array
 or subclass raises an error. This means that, contrary to what the docs on
 subclassing say, see below, you cannot use an object exposing the array
 interface as an output parameter to a ufunc


 Here it might be a good idea to distinguish between objects that define
 __array__ vs __array_interface__/__array_struct__. A class that defines
 __array__ might not be very ndarray-like at all, but rather be something
 that can be *converted* to an ndarray. For example, objects in pandas
 define __array__, but updating the return value of df.__array__() in-place
 will not necessarily update the DataFrame (e.g., if the frame had
 inhomogeneous dtypes).


I am not really sure what the behavior of __array__ should be. The link to
the subclassing docs I gave before indicates that it should be possible to
write to it if it is writeable (and probably pandas should set the
writeable flag to False if it cannot be reliably written to), but the
obscure comment I mentioned seems to point to the opposite, that it should
never be written to. This is probably a good moment in time to figure out
what the proper behavior should be and document it.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-23 Thread Jaime Fernández del Río
On Mon, Feb 23, 2015 at 12:12 PM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 23.02.2015 08:52, Jaime Fernández del Río wrote:
  This was raised in SO today:
 
 
 http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009
 
  np.nonzero (and np.where for boolean arrays) behave differently for 1-D
  and higher dimensional arrays:
 
  In the first case, a tuple with a single behaved base ndarray is
 returned:
 
  In the second, a tuple with as many arrays as dimensions in the passed
  array is returned, but the arrays are not base ndarrays, but of the same
  subtype as was passed to the function. These arrays are also set as
  non-writeable:
 


 The non-writeable looks like a bug to me, it should probably just use
 PyArray_FLAGS(self) instead of 0. We had a similar one with the new
 indexing, its easy to forget this.

 Concerning subtypes, I don't think there is a good reason to preserve
 them here and it should just return an ndarray.
 where with one argument returns a new object that indexes the input
 object so it is not really related anymore to what it indexes and there
 is no information that numpy could reasonably propagate.


That was my thinking when I sent that message last night: add the
PyArray_FLAGS argument, and pass the type of the return array rather than
the input array when creating the views.

I tried to put that in a PR, but it fails a number of tests, as the return
of np.nonzero is specifically checked to return the subtype of the passed
in array, both in matrixlib, as well as in core/test_regression.py, related
to Trac #791:

https://github.com/numpy/numpy/issues/1389

So it seems that 7 years ago they had a different view on this, perhaps
Chuck remembers what the rationale was, but this seems like a weird
requirement for index returning functions: nonzero, argmin/max, argsort,
argpartition and the like.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-22 Thread Jaime Fernández del Río
This was raised in SO today:

http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009

np.nonzero (and np.where for boolean arrays) behave differently for 1-D and
higher dimensional arrays:

In the first case, a tuple with a single behaved base ndarray is returned:

>>> a = np.ma.array(range(6))
>>> np.where(a > 3)
(array([4, 5]),)
>>> np.where(a > 3)[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In the second, a tuple with as many arrays as dimensions in the passed
array is returned, but the arrays are not base ndarrays, but of the same
subtype as was passed to the function. These arrays are also set as
non-writeable:

>>> np.where(a.reshape(2, 3) > 3)
(masked_array(data = [1 1],
 mask = False,
   fill_value = 99)
, masked_array(data = [1 2],
 mask = False,
   fill_value = 99)
)
>>> np.where(a.reshape(2, 3) > 3)[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False

I can't think of any reason that justifies this difference, and believe
they should be made to return similar results. My feeling is that the
proper behavior is the 1-D one, and that the behavior for multidimensional
arrays should match it. Can anyone think of a reason that justifies the
current behavior?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix Class

2015-02-14 Thread Jaime Fernández del Río
On Sat, Feb 14, 2015 at 5:21 PM, josef.p...@gmail.com wrote:

 On Sat, Feb 14, 2015 at 4:27 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sat, Feb 14, 2015 at 12:36 PM, josef.p...@gmail.com wrote:
 
  On Sat, Feb 14, 2015 at 12:05 PM, cjw c...@ncf.ca wrote:
  
   On 14-Feb-15 11:35 AM, josef.p...@gmail.com wrote:
  
   On Wed, Feb 11, 2015 at 4:18 PM, Ryan Nelson rnelsonc...@gmail.com
   wrote:
  
   Colin,
  
   I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test
   conda
   environment with Python2.7 and Numpy 1.7.0, and I get the same:
  
   
   Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014,
   16:57:52)
   [MSC v
   .1500 64 bit (AMD64)]
   Type copyright, credits or license for more information.
  
   IPython 2.3.1 -- An enhanced Interactive Python.
   Anaconda is brought to you by Continuum Analytics.
   Please check out: http://continuum.io/thanks and
 https://binstar.org
   ? - Introduction and overview of IPython's features.
   %quickref - Quick reference.
   help  - Python's own help system.
   object?   - Details about 'object', use 'object??' for extra
 details.
  
   In [1]: import numpy as np
  
   In [2]: np.__version__
   Out[2]: '1.7.0'
  
   In [3]: np.mat([4,'5',6])
   Out[3]:
   matrix([['4', '5', '6']],
   dtype='|S1')
  
   In [4]: np.mat([4,'5',6], dtype=int)
   Out[4]: matrix([[4, 5, 6]])
   ###
  
   As to your comment about coordinating with Statsmodels, you should
 see
   the
   links in the thread that Alan posted:
   http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
   http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
   Josef's comments at the time seem to echo the issues the devs (and
   others)
   have with the matrix class. Maybe things have changed with
   Statsmodels.
  
   Not changed, we have a strict policy against using np.matrix.
  
   generic efficient versions for linear operators, kronecker or sparse
   block matrix styly operations would be useful, but I would use array
   semantics, similar to using dot or linalg functions on ndarrays.
  
   Josef
   (long reply canceled because I'm writing too much that might only be
   of tangential interest or has been in some of the matrix discussion
   before.)
  
   Josef,
  
   Many thanks.  I have gained the impression that there is some
 antipathy
   to
   np.matrix, perhaps this is because, as others have suggested, the
 array
   doesn't provide an appropriate framework.
 
  It's not directly antipathy, it's cost-benefit analysis.
 
  np.matrix has few advantages, but makes reading and maintaining code
  much more difficult.
  Having to watch out for multiplication `*` is a lot of extra work.
 
  Checking shapes and fixing bugs with unexpected dtypes is also a lot
  of work, but we have large benefits.
  For a long time the policy in statsmodels was to keep pandas out of
  the core of functions (i.e. out of the actual calculations) and
  restrict it to inputs and returns. However, pandas is becoming more
  popular and can do some things much better than plain numpy, so it is
  slowly moving inside some of our core calculations.
  It's still an easy source of bugs, but we do gain something.
 
 
  Any bits of Pandas that might be good for numpy/scipy to steal?

 I'm not a Pandas expert.
 Some of it comes into statsmodels because we need the data handling
 also inside a function, e.g. keeping track of labels, indices, and so
 on. Another reason is that contributors are more familiar with
 pandas's way of solving a problems, even if I suspect numpy would be
 more efficient.

 However, a recent change, replaces where I would have used np.unique
 with pandas.factorize which is supposed to be faster.
 https://github.com/statsmodels/statsmodels/pull/2213


Numpy could use some form of hash table for its arraysetops, which is where
pandas is getting its advantage from. It is a tricky thing though, see e.g.
these timings:

a = np.random.randint(10, size=1000)
srs = pd.Series(a)

%timeit np.unique(a)

10 loops, best of 3: 13.2 µs per loop

%timeit srs.unique()

10 loops, best of 3: 15.6 µs per loop


%timeit pd.factorize(a)

1 loops, best of 3: 25.6 µs per loop

%timeit np.unique(a, return_inverse=True)

1 loops, best of 3: 82.5 µs per loop

These last timings are with 1.9.0 and 0.14.0, so numpy doesn't have
https://github.com/numpy/numpy/pull/5012 yet, which makes the operation in
which numpy is slower about 2x faster. And if you need your unique values
sorted, then things are more even, especially if numpy runs 2x faster:

%timeit pd.factorize(a, sort=True)

1 loops, best of 3: 36.4 µs per loop

The algorithms scale differently though, so for sufficiently large data
Pandas is going to win almost certainly. Not sure if they support all
dtypes, nor how efficient their use of memory is.
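
For the curious, the core of the hash-based approach is tiny when sketched in
pure Python (illustrative only; the real pandas version lives in C on top of
khash):

def factorize(values):
    # One linear pass, no sorting: give each distinct value a label in order
    # of first appearance.
    table = {}
    labels = []
    uniques = []
    for v in values:
        if v not in table:
            table[v] = len(uniques)
            uniques.append(v)
        labels.append(table[v])
    return labels, uniques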

I did a toy implementation of a hash table, mimicking Python's dictionary,
for numpy some time 

Re: [Numpy-discussion] New function: np.stack?

2015-02-05 Thread Jaime Fernández del Río
On Thu, Feb 5, 2015 at 11:10 AM, Benjamin Root ben.r...@ou.edu wrote:

 +1! I could never keep straight which stack function I needed anyway.

 Wasn't there a proposal a while back for a more generic stacker, like
 tetrix or something that allowed one to piece together tiles of different
 sizes?

 Ben Root

 On Thu, Feb 5, 2015 at 2:06 PM, Stephan Hoyer sho...@gmail.com wrote:

 There are two usual ways to combine a sequence of arrays into a new array:
 1. concatenated along an existing axis
 2. stacked along a new axis

 For 1, we have np.concatenate. For 2, we have np.vstack, np.hstack,
 np.dstack and np.column_stack. For arrays with arbitrary dimensions, there
 is the np.array constructor, possibly with transpose to get the result in
 the correct order. (I've used this last option in the past but haven't been
 especially happy with it -- it takes some trial and error to get the axis
 swapping or transpose right for higher dimensional input.)

 These methods are similar but subtly distinct, and none of them generalize
 well to n-dimensional input. It seems like the function we are missing is
 the plain np.stack, which takes the axis to stack along as a keyword
 argument. The exact desired functionality is clearest to understand by
 example:

  X = [np.random.randn(100, 200) for i in range(10)]
  stack(X, axis=0).shape
 (10, 100, 200)
  stack(X, axis=1).shape
 (100, 10, 200)
  stack(X, axis=2).shape
 (100, 200, 10)

 So I'd like to propose this new function for numpy. The desired signature
 would be simply np.stack(arrays, axis=0). Ideally, the confusing mess of
 other stacking functions could then be deprecated, though we could probably
 never remove them.


Leaving aside error checking, once you have a positive axis, I think this
can be implemented in 2 lines of code:

sl = (slice(None),)*axis + (np.newaxis,)
return np.concatenate(arr[sl] for arr in arrays)
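
Fleshed out just a little (handling negative axes and wrapping the generator
in a list), that sketch becomes:

import numpy as np

def stack(arrays, axis=0):
    # Give every input a new size-1 axis at the requested position, then
    # concatenate along that axis.
    arrays = [np.asarray(a) for a in arrays]
    if axis < 0:
        axis += arrays[0].ndim + 1
    sl = (slice(None),) * axis + (np.newaxis,)
    return np.concatenate([a[sl] for a in arrays], axis=axis)

X = [np.random.randn(100, 200) for i in range(10)]
stack(X, axis=1).shape   # -> (100, 10, 200)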

I don't have an opinion either way, and I guess if the hstacks and company
have a place in numpy, this does as well.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-02-04 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:52 PM, Ian Henriksen 
insertinterestingnameh...@gmail.com wrote:

 On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
 snip
 
 
 
  Do you have a concrete example of what a non (1, 1) array that fails
  with relaxed strides would look like?
 
 
  If we used, as right now, the array flags as a first choice point, and
  only if none is set try to determine it from the strides/dimensions
  information, I fail to imagine any situation where the end result
  would be worse than now. I don't think that a little bit of
  predictable surprising in an advanced functionality is too bad. We
  could start raising on the face of ambiguity, we refuse to guess
  errors, even for the current behavior you show above, but that is more
  likely to trip people by not giving them any simple workaround, that
  it seems to me would be add a .T if all dimensions are 1 in some
  particular situations. Or are you thinking of something more serious
  than a shape mismatch when you write about breaking current code?
 

 Yes, I am talking only about wrongly shaped results for some fortran
 order arrays. A (20, 1) fortran order complex array being viewed as
 float, will with relaxed strides become a (20, 2) array instead of a
 (40, 1) one.


 That is a limitation of the current implementation too, and happens
 already whenever relaxed strides are in place. Which is the default for
 1.10, right?

 Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
 all? It should probably be more of a hint of what to do (fortran vs c) when
 in doubt. C would prioritize last axis, F the first, and we could even
 add a raise option to have it fail if the axis cannot be inferred from
 the strides and shape. Current behavior is equivalent to what C
 would do.

 Jaime



 IMHO, the best option would be something like this:

 - When changing to a type with smaller itemsize, add a new axis after the
 others so the resulting array is C contiguous (unless a different axis is
 specified by a keyword argument). The advantage here is that if you index
 the new view using the old indices for an entry, you get an array showing
 its representation in the new type.
 - No shape change for views with the same itemsize
 - When changing to a type with a larger itemsize, collapse along the last
 axis unless a different axis is specified, throwing an error if the axis
 specified does not match the axis specified.


My only concern with adding a new axis, backwards compatibility aside, is
that you would not know whether to keep or discard the resulting size 1
dimension when taking a view of the view. We could reuse the keepdims
terminology from ufuncs, though.

So the current behavior would remain unchanged if you set axis=None,
keepdims=True, and we could transition to a more reasonable default with
axis=-1, keepdims=False over a few releases with adequate warnings.

In my mind, when expanding, the axis would not indicate where to place the
new axis, but which axis to collapse over, that is the hard part to figure
out! If you want something else, it is only a call to rollaxis away. Perhaps
we could also give keepdims a meaning when expanding, as to whether to add
a new axis at the end, or change the size of the chosen dimension.

I don't know, there may be an actual interface hidden somewhere here, but
it still needs some cooking to fully figure it out.

Jaime



 The last point essentially is just adding an axis argument. I like that
 idea because it gives users the ability to do all that the array's memory
 layout allows in a clear and concise way. Throwing an error if the default
 axis doesn't work would be a good way to prevent strange bugs from
 happening when the default behavior is expected.

 The first point would be a break in backwards compatibility, so I'm not
 sure if it's feasible at this point. The advantage would be that all all
 arrays returned when using this functionality would be contiguous along the
 last axis. The shape of the new array would be independent of the memory
 layout of the original one. This would also be a much cleaner way to ensure
 that views of a different type are always reversible while still allowing
 for relaxed strides.

 Either way, thanks for looking into this. It's a great feature to have
 available.

 -Ian Henriksen





 
  If there are any real loopholes in expanding this functionality, then
  lets not do it, but we know we have at least one user unsatisfied with
  the current performance, so I really think it is worth trying. Plus,
  I'll admit to that, messing around with some of these stuff deep
  inside the guts of the beast is lots of fun! ;)
 

 I do not think there are loopholes with expanding this functionality. I
 think there have regressions when we put relaxed strides

Re: [Numpy-discussion] Any interest in a 'heaviside' ufunc?

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 12:58 PM, Warren Weckesser 
warren.weckes...@gmail.com wrote:

 I have an implementation of the Heaviside function as numpy ufunc.  Is
 there any interest in adding this to numpy?  The function is simply:

                 0    if x < 0
 heaviside(x) =  0.5  if x == 0
                 1    if x > 0


I don't think there's anything like it in numpy. Wouldn't scipy.special be
a better home for it?
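
Either way, for anyone needing it right now it is a one-liner on top of
existing ufuncs (a plain sketch, not a proper ufunc with its own loops, and
NaNs would need extra handling):

import numpy as np

def heaviside(x):
    # 0 for x < 0, 0.5 at x == 0, 1 for x > 0, evaluated elementwise.
    x = np.asanyarray(x)
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))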

Jaime





 Warren



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:28 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
  On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río
  wrote:
   On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith
  n...@pobox.com
   wrote:
   On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández
  del Río
   jaime.f...@gmail.com wrote:
   [...]
 
  snip
 
  
   Could we make it more like: check to see if the last
  dimension
   works.
   If not, raise an error (and let the user transpose
  some other
   dimension there if that's what they wanted)? Or
  require the
   user to
   specify which dimension will absorb the shape
  change? (If we
   were
   doing this from scratch, then it would be tempting
  to just say
   that we
   always add a new dimension at the end with
  newtype.itemsize /
   oldtype.itemsize entries, or absorb such a dimension
  if
   shrinking. As
   a bonus, this would always work, regardless of
  contiguity!
   Except that
   when shrinking the last dimension would have to be
  contiguous,
   of
   course.)
  
  
   When we roll @ in and people start working with stacks of
  matrices, we
   will probably find ourselves having to create an alias,
  similar to .T,
   for .swapaxes(-1, -2). Searching for the smallest stride
  allows to
   take views of such arrays, which does not work right now
  because the
   array is no longer contiguous globally.
  
 
  That is true, but I agree with Nathaniel at least as far as
  that I would
  prefer a user to be able to safely use `view` even he has not
  even an
  inkling about what his memory layout is. One option would be
  an
  `axis=-1` default (maybe FutureWarn this from `axis=None`
  which would
  look at order, see below -- or maybe have axis='A', 'C' and
  'F' and
  default to 'A' for starters).
 
  This even now could start creating bugs when enabling relaxed
  strides :(, because your good old fortran order complex array
  being
  viewed as a float one could expand along the wrong axis, and
  even
  without such arrays swap order pretty fast when operating on
  them, which
  can create impossibly to find bugs, because even a poweruser
  is likely
  to forget about such things.
 
  Of course you could argue that view is a poweruser feature and
  a user
  using it should keep these things in mind Though if you
  argue that,
  you can almost just use `np.ndarray` directly ;) -- ok, not
  really
  considering how cumbersome it is, but still.
 
 
  I have been giving this some thought, and am willing to concede that
  my first proposal may have been too ambitious. So even though the knob
  goes to 11, we can always do things incrementally. I am also wary of
  adding new keywords when it seems obvious that we do not have the
  functionality completely figured out, so here's my new proposal:
 
 
* The objective is that a view of an array that is the result of
  slicing a contiguous array should be possible, if it remains
  contiguous (meaning stride == itemsize) along its original
  contiguous (first or last) dimension. This eliminates axis
  transposition from the previous proposal, although reversing
  the axes completely would also work.
* To verify this, unless the C contiguous or Fortran contiguous
  flags are set, we would still need to look at the strides. An
  array would be C contiguous if, starting from the last stride
  it is equal to the itemsize, and working backwards every next
  stride is larger or equal than the product of the previous
  stride by the previous dimension. dimensions of size 1 would
  be ignored for these, except for the last one, which would be
  taken to have stride = itemsize. The Fortran case is of course
  the same in reverse.
* I think the above combined with the current preference of C
  contiguousness over Fortran, would actually allow the views to
  always be reversible, which is also a nice thing to have.
  This eliminates most of the weirdness, but extends

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
 snip
 
 
 
  Do you have a concrete example of what a non (1, 1) array that fails
  with relaxed strides would look like?
 
 
  If we used, as right now, the array flags as a first choice point, and
  only if none is set try to determine it from the strides/dimensions
  information, I fail to imagine any situation where the end result
  would be worse than now. I don't think that a little bit of
  predictable surprising in an advanced functionality is too bad. We
  could start raising on the face of ambiguity, we refuse to guess
  errors, even for the current behavior you show above, but that is more
  likely to trip people by not giving them any simple workaround, that
  it seems to me would be add a .T if all dimensions are 1 in some
  particular situations. Or are you thinking of something more serious
  than a shape mismatch when you write about breaking current code?
 

 Yes, I am talking only about wrongly shaped results for some fortran
 order arrays. A (20, 1) fortran order complex array being viewed as
 float, will with relaxed strides become a (20, 2) array instead of a
 (40, 1) one.


That is a limitation of the current implementation too, and happens already
whenever relaxed strides are in place. Which is the default for 1.10,
right?

Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
all? It should probably be more of a hint of what to do (fortran vs c) when
in doubt. C would prioritize last axis, F the first, and we could even
add a raise option to have it fail if the axis cannot be inferred from
the strides and shape. Current behavior is equivalent to what C
would do.

Jaime



 
  If there are any real loopholes in expanding this functionality, then
  lets not do it, but we know we have at least one user unsatisfied with
  the current performance, so I really think it is worth trying. Plus,
  I'll admit to that, messing around with some of these stuff deep
  inside the guts of the beast is lots of fun! ;)
 

 I do not think there are loopholes with expanding this functionality. I
 think there have regressions when we put relaxed strides to on, because
 suddenly the fortran order array might be expanded along a 1-sized axis,
 because it is also C order. So I wonder if we can fix these regressions
 and at the same time maybe provide a more intuitive approach than using
 the memory order blindly


 - Sebastian

 
  Jaime
 
 
  - Sebastian
 
 
  
   Jaime
  
  
  
   - Sebastian
  
   
I guess the main consideration for this is
  that we
   may be
stuck with
stuff b/c of backwards compatibility. Can
  you maybe
   say a
little bit
about what is allowed now, and what
  constraints that
   puts on
things?
E.g. are we already grovelling around in
  strides and
   picking
random
dimensions in some cases?
   
   
Just to restate it: right now we only allow new
  views if the
   array is
globally contiguous, so either along the first or
  last
   dimension.
   
   
Jaime
   
   
-n
   
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics -
  University
   of
Edinburgh
http://vorpus.org
   
   ___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
   
  
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
   
   
   
   
--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma
  y ayúdale
   en sus
planes de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
   
  http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Views of a different dtype

2015-01-30 Thread Jaime Fernández del Río
On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 [...]
  With all these in mind, my proposal for the new behavior is that taking a
  view of an array with a different dtype would require:
 
  That the newtype and oldtype be compatible, as defined by the algorithm
  checking object offsets linked above.
  If newtype.itemsize == oldtype.itemsize no more checks are needed, make
 it
  happen!
  If the array is C/Fortran contiguous, check that the size in bytes of the
  last/first dimension is evenly divided by newtype.itemsize. If it does,
 go
  for it.
  For non-contiguous arrays:
 
  Ignoring dimensions of size 1, check that no stride is smaller than
 either
  oldtype.itemsize or newtype.itemsize. If any is found this is an
 as_strided
  product, sorry, can't do it!
  Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride
 ==
  oldtype.itemsize
 
  If found, check that it is the only one with that stride, that it is the
  minimal stride, and that the size in bytes of that dimension is evenly
  divided by newtype.itemsize.
  If none is found, check if there is a size 1 dimension that is also
 unique
  (unless we agree on a default, as mentioned above) and that
 newtype.itemsize
  evenly divides oldtype.itemsize.

 I'm really wary of this idea that we go grovelling around looking for
 some suitable dimension somewhere to absorb the new items. Basically
 nothing in numpy produces semantically different arrays (e.g., ones
 with different shapes) depending on the *strides* of the input array.


In a convoluted way, changing the dtype already does different things
depending on the strides, as right now the expansion/contraction happens
along the last/first axis depending if the array is C/Fortran contiguous,
and those flags are calculated from the strides:

>>> a = np.ones((2, 2), dtype=complex)
>>> a.view(float).shape
(2, 4)
>>> a.T.view(float).shape
(4, 2)

A user unaware that transposition has changed the memory layout will be
surprised to find his complex values being unpacked along the first, not
the last dimension. But that's the way it already is.

With my proposal above, the intention is that the same happens not only for
the standard reverse axes order transposition, but with any other one,
even if you have sliced the array.



 Could we make it more like: check to see if the last dimension works.
 If not, raise an error (and let the user transpose some other
 dimension there if that's what they wanted)? Or require the user to
 specify which dimension will absorb the shape change? (If we were
 doing this from scratch, then it would be tempting to just say that we
 always add a new dimension at the end with newtype.itemsize /
 oldtype.itemsize entries, or absorb such a dimension if shrinking. As
 a bonus, this would always work, regardless of contiguity! Except that
 when shrinking the last dimension would have to be contiguous, of
 course.)


When we roll @ in and people start working with stacks of matrices, we will
probably find ourselves having to create an alias, similar to .T, for
.swapaxes(-1, -2). Searching for the smallest stride allows to take views
of such arrays, which does not work right now because the array is no
longer contiguous globally.



 I guess the main consideration for this is that we may be stuck with
 stuff b/c of backwards compatibility. Can you maybe say a little bit
 about what is allowed now, and what constraints that puts on things?
 E.g. are we already grovelling around in strides and picking random
 dimensions in some cases?


Just to restate it: right now we only allow new views if the array is
globally contiguous, so either along the first or last dimension.

Jaime



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Views of a different dtype

2015-01-28 Thread Jaime Fernández del Río
Hi all,

There has been some recent discussion going on about the limitations that
numpy imposes on taking views of an array with a different dtype.

As of right now, you can basically only take a view of an array if it has
no Python objects and neither the old nor the new dtype are structured.
Furthermore, the array has to be either C or Fortran contiguous.

This seems to be way too strict, but the potential for disaster in getting a
loosening of the restrictions wrong is big, so it should be handled with
care.

Allan Haldane and myself have been looking into this separately and
discussing some of the details over at github, and we both think that the
only true limitation that has to be imposed is that the offsets of Python
objects within the new and old dtypes remain compatible. I have expanded
Allan's work from here:

https://github.com/ahaldane/numpy/commit/e9ca367

to make it as flexible as I have been able. An implementation of the
algorithm in Python, with a few tests, can be found here:

https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py

I would appreciate getting some eyes on it for correctness, and to make
sure that it won't break with some weird dtype.

I am also trying to figure out what the ground rules for stride and shape
conversions when taking a view with a different dtype should be. I
submitted a PR (gh-5508) a couple of days ago working on that, but I am
not so sure that the logic is completely sound. Again, to get more eyes on
it, I am going to reproduce my thoughts here on the hope of getting some
feedback.

The objective would be to always allow a view of a different dtype (given
that the dtypes be compatible as described above) to be taken if:

   - The itemsize of the dtype doesn't change.
   - The itemsize changes, but the array being viewed is the result of
   slicing and transposing axes of a contiguous array, and it is still
   contiguous, defined as stride == dtype.itemsize, along its smallest-strided
   dimension, and the itemsize of the newtype exactly divides the size of that
   dimension.
   - Ideally taking a view should be a reversible process, i.e. if oldtype
   = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you
   back a view of arr with the same original shape, strides and dtype.

This last point can get tricky if the minimal stride dimension has size 1,
as there could be several of those, e.g.:

>>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
>>> a.flags.contiguous
False
>>> a.shape
(3, 1, 2)
>>> a.strides  # the stride of the size 1 dimension could be anything, ignore it!
(32, 8, 8)

>>> b = a.view(complex)  # this fails right now, but should work
>>> b.flags.contiguous
False
>>> b.shape
(3, 1, 1)
>>> b.strides  # the stride of the size 1 dimensions could be anything, ignore them!
(32, 16, 16)

>>> c = b.view(float)  # which of the two size 1 dimensions should we expand?


"In the face of ambiguity, refuse the temptation to guess" dictates that the
last view should raise an error, unless we agree on and document some default.
Any thoughts?

Then there is the endless complication one could get into with arrays
created with as_strided. I'm not smart enough to figure out when and when not
those could work, but am willing to retake the discussion if someone wiser
is interested.

With all these in mind, my proposal for the new behavior is that taking a
view of an array with a different dtype would require:

   1. That the newtype and oldtype be compatible, as defined by the
   algorithm checking object offsets linked above.
   2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
   make it happen!
   3. If the array is C/Fortran contiguous, check that the size in bytes of
   the last/first dimension is evenly divided by newtype.itemsize. If it does,
   go for it.
   4. For non-contiguous arrays:
  1. Ignoring dimensions of size 1, check that no stride is smaller
  than either oldtype.itemsize or newtype.itemsize. If any is found this is
  an as_strided product, sorry, can't do it!
  2. Ignoring dimensions of size 1, find a contiguous dimension, i.e.
  stride == oldtype.itemsize
 1. If found, check that it is the only one with that stride, that
 it is the minimal stride, and that the size in bytes of that
dimension is
 evenly divided by newtype.itemsize.
 2. If none is found, check if there is a size 1 dimension that is
 also unique (unless we agree on a default, as mentioned
above) and that
 newtype.itemsize evenly divides oldtype.itemsize.

Apologies for the long, dense content, but any thoughts or comments are very
welcome.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Float view of complex array

2015-01-27 Thread Jaime Fernández del Río
On Mon, Jan 26, 2015 at 10:28 PM, Jens Jørgen Mortensen je...@fysik.dtu.dk
wrote:

 On 01/26/2015 11:02 AM, Jaime Fernández del Río wrote:
  On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg
  sebast...@sipsolutions.net mailto:sebast...@sipsolutions.net wrote:
 
  On Mo, 2015-01-26 at 09:24 +0100, Jens Jørgen Mortensen wrote:
   Hi!
  
   I have a view of a 2-d complex array that I would like to view
  as a 2-d
   float array.  This works OK:
  
  >>> np.ones((2, 4), complex).view(float)
   array([[ 1.,  0.,  1.,  0.,  1.,  0.,1.,  0.],
   [ 1.,  0.,  1.,  0.,  1.,  0.,  1.,  0.]])
  
   but this doesn't:
  
  >>> np.ones((2, 4), complex)[:, :2].view(float)
   Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   ValueError: new type not compatible with array.
  >>> np.__version__
   '1.9.0'
  
   and I don't understand why.  When looking at the memory layout,
  I think
   it should be possible.
  
 
  Yes, it should be possible, but it is not :). You could hack it by
  using
  `np.ndarray` (or stride tricks). Or maybe you are interested
  making the
  checks whether it makes sense or not less strict.
 
 
  How would it be possible? He goes from an array with 16 byte strides
  along the last axis:
 
  r0i0, r1i1, r2i2, r3i3
 
  to one with 32 byte strides, which is OK
 
  r0i0, , r2i2, 
 
  but everything breaks down when he wants to have alternating strides
  of 8 and 24 bytes:
 
  r0, i0, , r2, i2, 

 No, that is not what I want.  I want this:

 r0, i0, r1, i1, , 

 with stride 8 on the last axis - which should be fine.  My current
 workaround is to do a copy() before view() - thanks Maniteja.


My bad, you are absolutely right, Jens...

I have put together a quick PR (https://github.com/numpy/numpy/pull/5508)
that fixes your use case, by relaxing the requirements for views of
different dtypes. I'd appreciate if you could take a look at the logic in
the code (it is profusely commented), and see if you can think of other
cases that can be viewed as another dtype that I may have overlooked.

Thanks,

Jaime



Jens Jørgen

 
  which cannot be hacked in any sensible way.
 
  What I think could be made to work, but also fails, is this:
 
  np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float)
 
  Here the original strides are (64, 16, xx) and the resulting view
  should have strides (64, 32, 8), not sure what trips this.
 
  Jaime
 
 
  - Sebastian
 
   Jens Jørgen
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
  --
  (\__/)
  ( O.o)
  ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
  planes de dominación mundial.
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Sorting refactor

2015-01-16 Thread Jaime Fernández del Río
On Fri, Jan 16, 2015 at 4:19 AM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Fri, Jan 16, 2015 at 5:24 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
  Hi all,
 
  I have been taking a deep look at the sorting functionality in numpy, and
  I think it could use a face lift in the form of a big code refactor, to
  get rid of some of the ugliness in the code and make it easier to
  maintain. What I have in mind basically amounts to:
 
  1. Refactor _new_argsortlike to get rid of code duplication (there are
     two branches, one with buffering, one without, virtually identical,
     that could be merged into a single one).
  2. Modify _new_argsortlike so that it can properly handle byte-swapped
     inputs of any dtype, see gh-5441. Add proper handling of types with
     references, in preparation for the rest of the changes.
  3. Add three functions to the npy_sort library: npy_aquicksort,
     npy_aheapsort, npy_amergesort, with a signature compatible with
     PyArray_ArgSortFunc, i.e. (char* start, npy_intp* result, npy_intp
     length, PyArrayObject *arr). These turn out to be almost identical to
     the string and unicode sort functions, but using the dtype's compare
     function to handle comparisons.
  4. Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call
     _new_argsortlike, even when there is no type-specific argsort
     function, by using the newly added npy_axxx functions. This simplifies
     PyArray_ArgSort a lot, gets rid of some of the global variable
     ugliness in the current code, and makes argsorting over a
     non-contiguous axis more memory efficient.
  5. Refactor _new_sortlike similarly to _new_argsortlike.
  6. Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in
     npy_sort to have a signature compatible with PyArray_SortFunc, i.e.
     (char* start, npy_intp length, PyArrayObject *arr). npy_quicksort
     will no longer rely on libc's qsort, but be very similar to the
     string or unicode quicksort functions.
  7. Modify PyArray_Sort (and PyArray_Partition) to always call
     _new_sortlike, similarly to what was done with PyArray_ArgSort. This
     allows completing the removal of the remaining global variable
     ugliness, as well as similar benefits as for argsort before.
 
  These changes will make it easier for me to add a Timsort generic-type
  function to numpy's arsenal of sorting routines, and I think they have
  value on their own by cleaning up the source code. So my questions,
  mostly to the poor souls that will have to code review changes to
  several hundred lines of code:
 
  1. Does this make sense, or is it better left alone? A subset of 1, 2
     and 5 are a must to address the issues in gh-5441; the rest could
     arguably be left as is.
  2. Would you rather see it submitted as one ginormous PR? Or split into
     4 or 5 incremental ones?

 Do you think it would be possible to split this into several PRs, with
 the initial one being the refactoring, and the subsequent ones being
 additions to sorting functionality?


Just to be clear, nothing in the long list of changes I posted earlier is
truly a change in the sorting functionality, except in the case of
quicksort (and argquicksort) for generic types, which would no longer rely
on qsort, but on our own implementation of it, just like every other sort
in numpy.
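
As a rough illustration of what "generic type" means here: dtypes without a
type-specific sort, such as structured (void) dtypes, already go through a
comparison-callback path, where every comparison is done by the dtype's
compare function rather than by specialized code. A small, assumed example at
the Python level (the refactor itself is in C):

    import numpy as np

    # A structured dtype has no type-specific sort, so sorting it relies on
    # the dtype's compare function -- the same hook that generic
    # npy_a{quick,heap,merge}sort routines would use instead of libc qsort.
    a = np.array([(2, 0.5), (1, 2.0), (1, 1.0)],
                 dtype=[('key', 'i8'), ('val', 'f8')])
    print(np.sort(a, order=['key', 'val']))
    # [(1, 1.0) (1, 2.0) (2, 0.5)]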

So yes, the refactor PR can precede the new functionality. And it can even
be split into 3 or 4 incremental PRs, to make the reviewers' lives easier.
Since it is most likely Charles and/or Julian who are going to have to
swallow the review pill, I'd like to hear from them which they would prefer.

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Sorting refactor

2015-01-16 Thread Jaime Fernández del Río
On Fri, Jan 16, 2015 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck larsm...@gmail.com
 wrote:

 2015-01-16 11:55 GMT+01:00  numpy-discussion-requ...@scipy.org:
  Message: 2
  Date: Thu, 15 Jan 2015 21:24:00 -0800
  From: Jaime Fernández del Río jaime.f...@gmail.com
  Subject: [Numpy-discussion] Sorting refactor
  To: Discussion of Numerical Python numpy-discussion@scipy.org
 
  These changes will make it easier for me to add a Timsort generic-type
  function to numpy's arsenal of sorting routines, and I think they have
  value on their own by cleaning up the source code.

 Yes, they do. I've been looking at the sorting functions as well and
 I've found the following:

 * The code is generally hard to read because it prefers pointers over
 indices. I'm wondering if it would get slower using indices. The closer
 these algorithms are to the textbook, the easier it is to insert fancy
 optimizations.


 They are harder to read, but so cute to look at! C code just wouldn't
 feel the same without some magical pointer arithmetic thrown in here and
 there. ;-)


  Pointers were faster than indexing. That advantage can be hardware
  dependent, but it is typical for small numbers of pointers.




 * The heap sort exploits undefined behavior by using a pointer that
 points before the start of the array. However, rewriting it to always
 point within the array made it slower. I haven't tried rewriting it
 using indices.


 Fortran uses the same pointer trick for one based indexing, or at least
 the old DEC compilers did ;) There is no reason to avoid it.
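
For readers who have not seen the trick: textbook heapsort is written with
1-based indices (the children of node i are 2i and 2i+1), and pointing one
element before the buffer lets the C code keep that arithmetic unchanged. A
sketch of the same idea in Python, simply subtracting 1 on each access
(illustration only, not NumPy's implementation):

    import numpy as np

    def heapsort_one_based(values):
        # Textbook max-heap heapsort with 1-based indices; the C code gets
        # the same effect by offsetting the data pointer by one element.
        a = np.array(values)
        n = len(a)

        def sift_down(root, end):
            while 2 * root <= end:
                child = 2 * root                      # left child (1-based)
                if child + 1 <= end and a[child - 1] < a[child]:
                    child += 1                        # pick the larger child
                if a[root - 1] < a[child - 1]:
                    a[root - 1], a[child - 1] = a[child - 1], a[root - 1]
                    root = child
                else:
                    return

        for start in range(n // 2, 0, -1):            # build the heap
            sift_down(start, n)
        for end in range(n, 1, -1):                   # pop the max to the back
            a[0], a[end - 1] = a[end - 1], a[0]
            sift_down(1, end - 1)
        return a

    print(heapsort_one_based([3, 1, 4, 1, 5]))        # [1 1 3 4 5]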



 * Quicksort has a quadratic time worst case. I think it should be
 turned into an introsort [1] for O(n log n) worst case; we have the
 heapsort needed to do that.

 * Quicksort is robust to repeated elements, but doesn't exploit them.
 It can be made to run in linear time if the input array has only O(1)
 distinct elements [2]. This may come at the expense of some
 performance on arrays with no repeated elements.


  Java famously changed its library implementation of quicksort to a
  dual-pivot one invented by Vladimir Yaroslavskiy [1], claiming
  substantial performance gains. I tried to implement that for numpy [2],
  but couldn't get it to work any faster than the current code.


 For sorting, simple often beats fancy.
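
Both of the earlier points are easy to sketch at a high level: cap the
recursion depth and fall back to heapsort so the worst case stays
O(n log n), and partition three ways so runs of equal keys drop out of the
recursion. A toy Python version of the idea (not in-place, and not the
proposed C implementation):

    import heapq
    import math

    def introsort(values, depth=None):
        # Quicksort until the depth budget (~2*log2(n)) is spent, then
        # heapsort, so the worst case stays O(n log n).  The three-way
        # partition also makes arrays with few distinct keys cheap.
        a = list(values)
        if len(a) <= 1:
            return a
        if depth is None:
            depth = 2 * int(math.log2(len(a)))
        if depth == 0:
            heapq.heapify(a)                       # heapsort fallback
            return [heapq.heappop(a) for _ in range(len(a))]
        pivot = a[len(a) // 2]
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return introsort(less, depth - 1) + equal + introsort(greater, depth - 1)

    print(introsort([5, 1, 5, 5, 2, 5]))           # [1, 2, 5, 5, 5, 5]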



 * Using optimal sorting networks instead of insertion sort as the base
 case can speed up quicksort on float arrays by 5-10%, but only if NaNs
 are moved out of the way first so that comparisons become cheaper [3].
 This has consequences for the selection algorithms that I haven't
 fully worked out yet.



 I expect the gains here would be for small sorts, which tend to be
 dominated by call overhead.


  Even if we stick with insertion sort, we should spin it off into an
 inline smallsort function within the npy_sort library, and have quicksort
 and mergesort call the same function, instead of each implementing their
 own. It would make optimizations like the sorting networks easier to
 implement for all sorts. We could even expose it outside npy_sort, as there
 are a few places around the code base that have ad-hoc implementations of
 sorting.
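
For reference, the kind of base case being discussed: an optimal
5-comparator network for 4 elements is a fixed sequence of compare-exchange
steps, cheap and branch-friendly, but, as noted above, it mishandles NaNs
unless they are moved aside first. A sketch (illustration only):

    def sort4(a, b, c, d):
        # Optimal 5-comparator sorting network for 4 keys.  A plain '>'
        # compare-exchange does not order NaNs correctly, which is why NaNs
        # would have to be moved out of the way beforehand.
        if a > b: a, b = b, a
        if c > d: c, d = d, c
        if a > c: a, c = c, a
        if b > d: b, d = d, b
        if b > c: b, c = c, b
        return a, b, c, d

    print(sort4(3.0, 1.0, 4.0, 2.0))   # (1.0, 2.0, 3.0, 4.0)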


 Good idea, I've thought of doing it myself.



 * Using Cilk Plus to parallelize merge sort can make it significantly
 faster than quicksort at the expense of only a few lines of code, but
 I haven't checked whether Cilk Plus plays nicely with multiprocessing
 and other parallelism options (remember the trouble with OpenMP-ified
 OpenBLAS?).

 This isn't really an answer to your questions, more like a brain dump
 from someone who's stared at the same code for a while and did some
 experiments. I'm not saying we should implement all of this, but keep
 in mind that there are some interesting options besides implementing
 timsort.


 Timsort came up in a discussion several months ago, where I proposed
 adding a mergesorted function (which I have mostly ready, by the way, [3])
  to speed up some operations in arraysetops. I have serious doubts that it
 will perform comparably to the other sorts unless comparisons are terribly
 expensive, which they typically aren't in numpy, but it has been an
 interesting learning exercise so far, and I don't mind taking it all the
 way.
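
The primitive behind such a mergesorted helper is a single linear pass that
merges two already-sorted inputs, which is what would let arraysetops-style
operations skip a full re-sort of the concatenated arrays. A minimal sketch
of that merge (an assumed illustration, not the actual branch):

    import numpy as np

    def merge_sorted(a, b):
        # Single-pass merge of two sorted arrays into one sorted output.
        a, b = np.asarray(a), np.asarray(b)
        out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
        i = j = k = 0
        while i < len(a) and j < len(b):
            if b[j] < a[i]:
                out[k] = b[j]
                j += 1
            else:
                out[k] = a[i]
                i += 1
            k += 1
        out[k:] = a[i:] if i < len(a) else b[j:]
        return out

    print(merge_sorted([1, 3, 5], [2, 3, 6]))   # [1 2 3 3 5 6]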

 Most of my proposed original changes do not affect the core sorting
 functionality, just the infrastructure around it. But if we agree that
 sorting has potential for being an actively developed part of the code
 base, then cleaning up its surroundings for clarity makes sense, so I'm
 taking your brain dump as an aye for my proposal. ;-)


 I have a generic quicksort with standard interface sitting around
 somewhere in an ancient branch. Sorting objects
