Re: [Numpy-discussion] PyData Barcelona this May

2017-03-23 Thread Jaime Fernández del Río
Thanks for the suggestion!

That abstract is limited to 400 characters, and it's already at 350+, so
there isn't much room left.
If it gets accepted I will eventually need to fill out an extended
abstract, where I will make sure to explain that better.

Jaime

On Tue, Mar 21, 2017 at 3:45 PM, Daπid  wrote:

> On 20 March 2017 at 19:58, Jaime Fernández del Río 
> wrote:
> >
> > Taking NumPy In Stride
> > This workshop is aimed at users already familiar with NumPy. We will
> dissect
> > the NumPy memory model with the help of a very powerful abstraction:
> > strides.
> > Participants will learn how to create different views out of the same
> data,
> > including multidimensional ones, get a new angle on how and why
> broadcasting
> > works, and explore related techniques to write faster, more efficient
> code.
>
> I think I only understand this abstract because I know what views are.
> Maybe you could add a line explaining what they are? (I cannot think
> of one myself).



-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him in his
plans for world domination.


Re: [Numpy-discussion] PyData Barcelona this May

2017-03-21 Thread Jaime Fernández del Río
On Mon, Mar 20, 2017 at 10:13 PM, Chris Barker 
wrote:

> On Mon, Mar 20, 2017 at 11:58 AM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>>  I have just submitted a workshop proposal with the following short
>> description:
>>
>> Taking NumPy In Stride
>> This workshop is aimed at users already familiar with NumPy. We will
>> dissect
>> the NumPy memory model with the help of a very powerful abstraction:
>> strides.
>> Participants will learn how to create different views out of the same
>> data,
>> including multidimensional ones, get a new angle on how and why
>> broadcasting
>> works, and explore related techniques to write faster, more efficient
>> code.
>>
>
> I'd go!
>
> And nice title :-)
>
> Any thoughts on a similar one for SciPy in Austin?
>

I'll be more than happy to share presentations, notebooks and whatnot with
someone wanting to run the tutorial over there. But Austin is a looong way
from Zürich, and the dates conflict with my son's birthday, so I don't
think I will be going...

Jaime


>
> -CHB
>
>
>
>
>
>> Let's see what the organizers think of it...
>>
>> Jaime
>>
>>
>> On Fri, Mar 17, 2017 at 10:59 PM, Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Sat, Mar 18, 2017 at 8:41 AM, Chris Barker 
>>> wrote:
>>>
>>>> On Fri, Mar 17, 2017 at 4:37 AM, Jaime Fernández del Río <
>>>> jaime.f...@gmail.com> wrote:
>>>>
>>>>>
>>>>>- many people that use numpy in their daily work don't know what
>>>>>strides are, this was a BIG surprise for me.
>>>>>
>>> I'm not surprised at all. To start with, the majority of users are
>>> self-taught programmers that never used something lower level than Python
>>> or Matlab. Even talking to them about memory layout presents challenges.
>>>
>>>
>>>>
>>>>>
>>>>> Based on that experience, I was thinking that maybe a good topic for a
>>>>> workshop would be NumPy's memory model: views, reshaping, strides, some
>>>>> hints of buffering in the iterator...
>>>>>
>>>>
>>> This material has been used multiple times in EuroScipy tutorials and
>>> may be of use:
>>> http://www.scipy-lectures.org/advanced/advanced_numpy/index.html
>>>
>>> Ralf
>>>
>>>
>>>
>>>> I think this is a great idea. In fact, when I do an intro to numpy, I
>>>> spend a bit of time on those issues, 'cause I think it's key to "Getting"
>>>> numpy, and not something that people end up learning on their own from
>>>> tutorials, etc. However, in my case, I try to jam it into a low-level
>>>> intro, and I think that fails :-(
>>>>
>>>> So doing it on its own with the assumption that participants already
>>>> know the basics of the high-level Python interface is a great idea.
>>>>
>>>> Maybe an "advanced" numpy tutorial for SciPy 2017 in Austin also???
>>>>
>>>> Here is my last talk -- maybe it'll be helpful.
>>>>
>>>> http://uwpce-pythoncert.github.io/SystemDevelopment/scipy.html#scipy
>>>>
>>>> the strides stuff is covered in a notebook here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/master/Examples/numpy/stride_tricks.ipynb
>>>>
>>>> other notebooks here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/tree/master/Examples/numpy
>>>>
>>>> and the source for the whole thing is here:
>>>>
>>>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/master/slides_sources/source/scipy.rst
>>>>
>>>>
>>>> All licensed under: Creative Commons Attribution-ShareAlike -- so
>>>> please use anything you find useful.
>>>>
>>>> -CHB
>>>>
>>>>
>>>>
>>>> And Julian's temporary work lends itself to a very nice talk, more on
>>>>> Python internals than on NumPy, but it's a very cool subject nonetheless.
>>>>>
>>>>> So my thinking is that I am going to propose those two, as a workshop
>>>>> and a talk. Thoughts?
>>>>>
>>>>> Jaime

Re: [Numpy-discussion] PyData Barcelona this May

2017-03-20 Thread Jaime Fernández del Río
Thanks for the links, Chris and Ralf, will be very helpful eventually. I
have just submitted a workshop proposal with the following short
description:

Taking NumPy In Stride
This workshop is aimed at users already familiar with NumPy. We will dissect
the NumPy memory model with the help of a very powerful abstraction:
strides.
Participants will learn how to create different views out of the same data,
including multidimensional ones, get a new angle on how and why broadcasting
works, and explore related techniques to write faster, more efficient code.
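
For anyone wondering what the abstract is hinting at, here is the flavor of
thing the workshop would demo (an illustration only, not part of the
submitted text):

import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
a.strides                # (32, 8): bytes to step along each axis
b = a[::2, ::-1]         # no copy, just new strides: (64, -8)
b.base is a              # True: b is a view into a's buffer

from numpy.lib.stride_tricks import as_strided
as_strided(a, shape=(5, 4), strides=(0, 8))  # row 0 repeated 5x, still no copy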

Let's see what the organizers think of it...

Jaime


On Fri, Mar 17, 2017 at 10:59 PM, Ralf Gommers 
wrote:

>
>
> On Sat, Mar 18, 2017 at 8:41 AM, Chris Barker 
> wrote:
>
>> On Fri, Mar 17, 2017 at 4:37 AM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>>
>>>- many people that use numpy in their daily work don't know what
>>>strides are, this was a BIG surprise for me.
>>>
> I'm not surprised at all. To start with, the majority of users are
> self-taught programmers that never used something lower level than Python
> or Matlab. Even talking to them about memory layout presents challenges.
>
>
>>
>>>
>>> Based on that experience, I was thinking that maybe a good topic for a
>>> workshop would be NumPy's memory model: views, reshaping, strides, some
>>> hints of buffering in the iterator...
>>>
>>
> This material has been used multiple times in EuroScipy tutorials and may
> be of use: http://www.scipy-lectures.org/advanced/advanced_numpy/index.html
>
> Ralf
>
>
>
>> I think this is a great idea. In fact, when I do an intro to numpy, I
>> spend a bit of time on those issues, 'cause I think it's key to "Getting"
>> numpy, and not something that people end up learning on their own from
>> tutorials, etc. However, in my case, I try to jam it into a low-level
>> intro, and I think that fails :-(
>>
>> So doing it on its own with the assumption that participants already know
>> the basics of the high-level Python interface is a great idea.
>>
>> Maybe an "advanced" numpy tutorial for SciPy 2017 in Austin also???
>>
>> Here is my last talk -- maybe it'll be helpful.
>>
>> http://uwpce-pythoncert.github.io/SystemDevelopment/scipy.html#scipy
>>
>> the strides stuff is covered in a notebook here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/master/Examples/numpy/stride_tricks.ipynb
>>
>> other notebooks here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/tree/master/Examples/numpy
>>
>> and the source for the whole thing is here:
>>
>> https://github.com/UWPCE-PythonCert/SystemDevelopment/blob/master/slides_sources/source/scipy.rst
>>
>>
>> All licensed under: Creative Commons Attribution-ShareAlike -- so please
>> use anything you find useful.
>>
>> -CHB
>>
>>
>>
>> And Julian's temporary work lends itself to a very nice talk, more on
>>> Python internals than on NumPy, but it's a very cool subject nonetheless.
>>>
>>> So my thinking is that I am going to propose those two, as a workshop
>>> and a talk. Thoughts?
>>>
>>> Jaime
>>>
>>> On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg <
>>> sebast...@sipsolutions.net> wrote:
>>>
>>>> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
>>>> > There will be a PyData conference in Barcelona this May:
>>>> >
>>>> > http://pydata.org/barcelona2017/
>>>> >
>>>> > I am planning on attending, and was thinking of maybe proposing to
>>>> > organize a numpy-themed workshop or tutorial.
>>>> >
>>>> > My personal inclination would be to look at some advanced topic that
>>>> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
>>>> > a more run-of-the-mill thing. Does anyone have thoughts or experiences
>>>> > on what has worked well in similar situations? Any specific topic you
>>>> > always wanted to attend a workshop on, but were afraid to ask?
>>>> >
>>>> > Alternatively, or on top of the workshop, I could propose to do a
>>>> > talk: talking last year at PyData Madrid about the new indexing was a
>>>> > lot of fun! Thing is, I have been quite disconnected from the project
>>>>

Re: [Numpy-discussion] PyData Barcelona this May

2017-03-17 Thread Jaime Fernández del Río
Last night I gave a short talk to the PyData Zürich meetup on Julian's
temporary elision PR, and Pauli's overlapping memory one. My learnings from
that experiment are:

   - there is no way to talk about both things in a 30-minute talk: I
   barely scraped the surface and ended up needing 25 minutes.
   - many people that use numpy in their daily work don't know what strides
   are; this was a BIG surprise for me.

Based on that experience, I was thinking that maybe a good topic for a
workshop would be NumPy's memory model: views, reshaping, strides, some
hints of buffering in the iterator...
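
As a minimal taste of the views part of that topic (illustration only):

import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # a view: same buffer, new shape and strides
b[0, 0] = 42
a[0]                  # 42 -- nothing was copied
b.base is a           # True: views keep a reference to their base array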

And Julian's temporary-elision work lends itself to a very nice talk, more on
Python internals than on NumPy, but it's a very cool subject nonetheless.

So my thinking is that I am going to propose those two, as a workshop and a
talk. Thoughts?

Jaime

On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg 
wrote:

> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
> > There will be a PyData conference in Barcelona this May:
> >
> > http://pydata.org/barcelona2017/
> >
> > I am planning on attending, and was thinking of maybe proposing to
> > organize a numpy-themed workshop or tutorial.
> >
> > My personal inclination would be to look at some advanced topic that
> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
> > a more run-of-the-mill thing. Does anyone have thoughts or experiences
> > on what has worked well in similar situations? Any specific topic you
> > always wanted to attend a workshop on, but were afraid to ask?
> >
> > Alternatively, or on top of the workshop, I could propose to do a
> > talk: talking last year at PyData Madrid about the new indexing was a
> > lot of fun! Thing is, I have been quite disconnected from the project
> > this past year, and can't really think of any worthwhile topic. Is
> > there any message that we as a project would like to get out to the
> > larger community?
> >
>
> Francesc already pointed out the temporary optimization. From what I
> remember, my personal highlight would probably be Pauli's work on the
> memory overlap detection. Though both are rather passive improvements I
> guess (you don't really have to learn them to use them), it's very cool!
> And if it's about highlighting new stuff, these can probably easily fill
> a talk.
>
> > And if you are planning on attending, please give me a shout.
> >
>
> Barcelona :). Maybe I should think about it, but probably not.
>
>
> > Thanks,
> >
> > Jaime
> >




[Numpy-discussion] PyData Barcelona this May

2017-03-09 Thread Jaime Fernández del Río
There will be a PyData conference in Barcelona this May:

http://pydata.org/barcelona2017/

I am planning on attending, and was thinking of maybe proposing to organize
a numpy-themed workshop or tutorial.

My personal inclination would be to look at some advanced topic that I know
well, like writing gufuncs in Cython, but wouldn't mind doing a more
run-of-the-mill thing. Does anyone have thoughts or experiences on what has worked
well in similar situations? Any specific topic you always wanted to attend
a workshop on, but were afraid to ask?

Alternatively, or on top of the workshop, I could propose to do a talk:
talking last year at PyData Madrid about the new indexing was a lot of fun!
Thing is, I have been quite disconnected from the project this past year,
and can't really think of any worthwhile topic. Is there any message that
we as a project would like to get out to the larger community?

And if you are planning on attending, please give me a shout.

Thanks,

Jaime



[Numpy-discussion] StackOverflow documentation

2016-07-21 Thread Jaime Fernández del Río
StackOverflow now also has documentation, and there already is a NumPy tag:

http://stackoverflow.com/documentation/numpy

Not sure what, if anything, we want to do with this, nor how to handle
having two different sources with the same information. Any thoughts?

Jaime



Re: [Numpy-discussion] Python 3 dict support (issue #5718)

2016-07-21 Thread Jaime Fernández del Río
On Wed, Jul 20, 2016 at 9:39 PM, Hannah  wrote:

> I second (and third & fourth &...) this
>
> Thanks so much for this, Jaime, it's exactly what I was looking for and
> couldn't find. Maybe this can be linked to in the contribute docs as an
> "orienting yourself in the codebase" page?
>
Glad to know it helped; realizing how this works was one of the big leaps
in understanding the code base for me too.  We (my wife and I) have
sent the kids on vacation with their grandparents, so when we are done
enjoying our newly recovered freedom hunting for Pokemons around Zurich,
I'll try to find time to write this up in a form suitable for the docs,
hopefully over the next couple of weeks.

I have been known to not keep commitments like this in the past, so don't
hold your breath just in case...

Jaime


>
> On Wed, Jul 20, 2016 at 11:07 AM, Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> Jaime,
>>
>> This is a great intro for people looking to jump into the C side of
>> things. I have been trying to figure out which bits are the important
>> ones from looking at the code and the docs. Your post cut out most of
>> the confusion. Is there some way you would consider adding something
>> like this this to the docs?
>>
>> -Joe
>>
>>
>> On Wed, Jul 20, 2016 at 8:52 AM, Jaime Fernández del Río
>>  wrote:
>> > On Wed, Jul 20, 2016 at 4:28 AM, Hannah  wrote:
>> >>
>> >> Hi,
>> >> I started venturing down the rabbit hole of trying to write a patch to
>> add
>> >> support for numpy to convert python 3 dictionary keys
>> >> (collections.abc.MappingView objects), which is open issue #5718 and am
>> >> having trouble orienting myself. I'm unclear as to where the python
>> entry
>> >> point into array is (basically, what function np.array drops into and
>> if
>> >> this is in Python or C) and where/what language (fine with writing
>> either) a
>> >> patch that supports MappingViews would go. Any help getting oriented
>> would
>> >> be much appreciated.
>> >
>> >
>> > Hi Hannah,
>> >
>> > np.array is written in C, and is part of the multiarray module that is
>> > defined here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c
>> >
>> > The "array" name is mapped here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L4093
>> >
>> > to the function _array_fromobject defined here:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L1557
>> >
>> > That function does some checking and has a couple of fast paths for the
>> > case where the input is already an array or a subclass, but for the
>> general
>> > case it relies on PyArray_CheckFromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1848
>> >
>> > which in turn calls PyArray_FromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1674
>> >
>> > You will also have to take a look at what goes on in
>> > PyArray_GetArrayParamsFromObject, which gets called by PyArray_FromAny:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1428
>> >
>> > as well as several other places, but I think they are all (or most of
>> them)
>> > in ctors.c.
>> >
>> > You may also want to take a look at PyArray_FromIter, which is the
>> function
>> > that ultimately takes care of calls to np.fromiter:
>> >
>> >
>> https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L3657
>> >
>> > It's messy, but not that bad once you get used to it: good luck!
>> >
>> > Jaime
>> >
>> >>
>> >>
>> >> The reasoning for the patch is that dicts are one of the most common
>> >> Python datatypes and this specifically is because of an upstream issue
>> of
>> >> wanting dict support in matplotlib.
>> >>
>> >> Thanks,
>> >> Hannah
>> >>

Re: [Numpy-discussion] Python 3 dict support (issue #5718)

2016-07-20 Thread Jaime Fernández del Río
On Wed, Jul 20, 2016 at 4:28 AM, Hannah  wrote:

> Hi,
> I started venturing down the rabbit hole of trying to write a patch to add
> support for numpy to convert python 3 dictionary keys
> (collections.abc.MappingView objects), which is open issue #5718 and am
> having trouble orienting myself. I'm unclear as to where the python entry
> point into array is (basically, what function np.array drops into and if
> this is in Python or C) and where/what language (fine with writing either)
> a patch that supports MappingViews would go. Any help getting oriented
> would be much appreciated.
>

Hi Hannah,

np.array is written in C, and is part of the multiarray module that is
defined here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c

The "array" name is mapped here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L4093

to the function _array_fromobject defined here:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/multiarraymodule.c#L1557

That function does some checking and has a couple of fast paths for the
case where the input is already an array or a subclass, but for the general
case it relies on PyArray_CheckFromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1848

which in turn calls PyArray_FromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1674

You will also have to take a look at what goes on in
PyArray_GetArrayParamsFromObject, which gets called by PyArray_FromAny:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L1428

as well as several other places, but I think they are all (or most of them)
in ctors.c.

You may also want to take a look at PyArray_FromIter, which is the function
that ultimately takes care of calls to np.fromiter:

https://github.com/numpy/numpy/blob/maintenance/1.11.x/numpy/core/src/multiarray/ctors.c#L3657

It's messy, but not that bad once you get used to it: good luck!
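
By the way, a quick illustration of the behavior the patch would fix (the
second line is today's workaround):

import numpy as np

d = {'a': 1, 'b': 2}
np.array(d.keys())        # 0-d object array wrapping dict_keys: issue #5718
np.array(list(d.keys()))  # array(['a', 'b'], dtype='<U1')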

Jaime


>
> The reasoning for the patch is that dicts are one of the most common
> Python datatypes and this specifically is because of an upstream issue of
> wanting dict support in matplotlib.
>
> Thanks,
> Hannah
>




[Numpy-discussion] numpy.dtype has the wrong size, try recompiling

2016-07-05 Thread Jaime Fernández del Río
This question on Stack Overflow:

http://stackoverflow.com/questions/38197086/sklearn-numpy-dtype-has-the-wrong-size-try-recompiling-in-both-pycharm-and-te

If I remember correctly this has something to do with Cython, right? Can't
remember the details though; can someone help that poor soul out?

Jaime



Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-06 Thread Jaime Fernández del Río
On Mon, Jun 6, 2016 at 9:35 AM, Sebastian Berg 
wrote:

> On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> >
> >
> > On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer 
> > wrote:
> > > If possible, I'd love to add new functions for "generalized ufunc"
> > > linear algebra, and then deprecate (or at least discourage) using
> > > the older versions with inferior broadcasting rules. Adding a new
> > > keyword arg means we'll be stuck with an awkward API for a long
> > > time to come.
> > >
> > > There are three types of matrix/vector products for which ufuncs
> > > would be nice:
> > > 1. matrix-matrix product (covered by matmul)
> > > 2. matrix-vector product
> > > 3. vector-vector (inner) product
> > >
> > > It's straightforward to implement either of the latter two options by
> > > inserting dummy dimensions and then calling matmul, but that's a
> > > pretty awkward API, especially for inner products. Unfortunately,
> > > we already use the two most obvious one word names for vector inner
> > > products (inner and dot). But on the other hand, one word names are
> > > not very descriptive, and the short name "dot" probably mostly
> > > exists because of the lack of an infix operator.
> > >
> > > So I'll start by throwing out some potential new names:
> > >
> > > For matrix-vector products:
> > > matvecmul (if it's worth making a new operator)
> > >
> > > For inner products:
> > > vecmul (similar to matmul, but probably too ambiguous)
> > > dot_product
> > > inner_prod
> > > inner_product
> > >
> > I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> > at this. I suppose the mul could also go in the middle, or maybe
> > change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
> >
>
> Wasn't some of this part of the gufunc linalg functions, and we just
> removed it because we were not sure about the API? Not sure anymore,
> but it might be worth having a look.
>

We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's undocumented.
There is also a matrix_multiply in that same module that does the obvious
thing.
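
For instance (keeping in mind that, being undocumented, the import is not a
stable API):

import numpy as np
from numpy.core.umath_tests import inner1d

a = np.random.rand(5, 3)
b = np.random.rand(5, 3)
inner1d(a, b)  # shape (5,): one inner product per row,
               # same as (a * b).sum(axis=-1) in a single gufunc call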

And when gufuncs were introduced in linalg, there were a bunch of functions
doing all sorts of operations without intermediate storage, e.g. sum3(a, b,
c) -> a + b + c, that were removed before merging the PR. I wasn't involved
at the time, so I'm not sure what the rationale was.

While we are at it, should quadratic/bilinear forms get their own function
too?  That is, after all, what the OP was asking for.

Jaime


Re: [Numpy-discussion] Developer Meeting at EuroScipy?

2016-05-31 Thread Jaime Fernández del Río
On Tue, May 31, 2016 at 8:37 AM, Sebastian Berg 
wrote:

> Hi all,
>
> since we had decided to do a regular developers meeting last year, how
> would EuroScipy (Aug. 23.-27., Erlangen, Germany) look as a possible
> place and time to have one?
> I believe EuroScipy would include a few people who were not able to
> come to SciPy last year, and it seems SciPy itself already sees some of
> the devs quite busy anyway.
>

I will not be able to attend the full conference, but it's only a 10-hour
train ride from what I now call home, so I am positive I can be there for a
full-day meeting, no matter what day of that week it is.

Jaime


>
> Not having checked back for room availability at the conference or
> anything, one option may be:
>
> August 24. (parallel to the second/last day of Tutorials)
>
> Does this seem like a good plan or do you have alternative suggestions
> or scheduling difficulties with this?
>
> Regards,
>
> Sebastian




Re: [Numpy-discussion] Cross-correlation PR stuck in limbo

2016-05-27 Thread Jaime Fernández del Río
I did an overall review of the code a couple of weeks ago (see the PR for
details), and there is quite some work to be done before we can merge
Honi's code. But if he can find the time to work on the coding, I'll try to
be more diligent about the reviewing.

Jaime

On Fri, May 27, 2016 at 12:51 PM, Elliot Hallmark 
wrote:

> +1
>
> This would really help with large data sets in certain situations.
>
> Is there still disagreement about whether this should be included? Or are
> there some minor details still? Or just lost in the shuffle?
>
> Hopefully,
>   Elliot
>
> On Wed, May 4, 2016 at 7:07 AM, Pierre Haessig 
> wrote:
>
>> Hi,
>>
>> I don't know how to push the PR forward, but all I can say is that this
>> maxlag feature would be a major improvement for using Numpy in time
>> series analysis! Immediate benefits downstream for Matplotlib and
>> statsmodel.
>>
>> Thanks Honi for having taken the time to implement this!
>>
>> best,
>> Pierre
>>




Re: [Numpy-discussion] Starting work on ufunc rewrite

2016-04-01 Thread Jaime Fernández del Río
On Thu, Mar 31, 2016 at 10:14 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> There is certainly good precedent for the approach you suggest.
> Shortly after Nathaniel mentioned the rewrite to me, I looked up
> d-pointers as a possible technique: https://wiki.qt.io/D-Pointer.
>

Yes, the idea is similar, although somewhat simpler since we are doing C,
not C++.


>
> If we allow arbitrary kwargs for the new functions, is that something
> you would want to note in the public structure? I was thinking
> something along the lines of adding a hook to process additional
> kwargs and return a void * that would then be passed to the loop.
>

I'm not sure I understand what you mean... But I also don't think it is
very relevant at this point? What I intend to do is simply to hide the guts
of ufuncs, breaking everyone's code once... so that we can later change
whatever we want without breaking anything else. PyUFunc_GenericFunction
already takes *args and **kwargs, and the internal logic of how these get
processed can be modified at will. If what you are proposing is to create a
PyUFunc_FromFuncAndDataAndSignatureAndKwargProcessor API function that
would provide a customized function to process extra kwargs and somehow
pass them into the actual ufunc loop, that would just be an API extension,
and there shouldn't be any major problem in introducing that whenever,
especially once we are free to modify the internal representation of ufuncs
without breaking ABI compatibility.


> To do this incrementally, perhaps opening a special development branch
> on the main repository is in order?
>

Yes, something like that seems like the right thing to do indeed. I would
like someone with more git foo than me to spell out the details of how we
would create and eventually merge that branch.


>
> I would love to join in the fun as time permits. Unfortunately, it is
> not especially permissive right about now. I will at least throw in
> some ideas that I have been mulling over.
>

Please do!

Jaime


>
> -Joe
>
>
> On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río
>  wrote:
> > I have started discussing with Nathaniel the implementation of the ufunc
> ABI
> > break that he proposed in a draft NEP a few months ago:
> >
> > http://thread.gmane.org/gmane.comp.python.numeric.general/61270
> >
> > His original proposal was to make the public portion of PyUFuncObject be:
> >
> > typedef struct {
> > PyObject_HEAD
> > int nin, nout, nargs;
> > } PyUFuncObject;
> >
> > Of course the idea is that internally we would use a much larger struct
> that
> > we could change at will, as long as its first few entries matched those
> of
> > PyUFuncObject. My problem with this, and I may very well be missing
> > something, is that in PyUFunc_Type we need to set the tp_basicsize to the
> > size of the extended struct, so we would end up having to expose its
> > contents. This is somewhat similar to what now happens with
> PyArrayObject:
> > anyone can #include "ndarraytypes.h", cast PyArrayObject* to
> > PyArrayObjectFields*, and access the guts of the struct without using the
> > supplied API inline functions. Not the end of the world, but if you want
> to
> > make something private, you might as well make it truly private.
> >
> > I think it would be better to have something similar to what NpyIter does::
> >
> > typedef struct {
> > PyObject_HEAD
> > NpyUFunc *ufunc;
> > } PyUFuncObject;
> >
> > where NpyUFunc would, at this level, be an opaque type of which nothing
> > would be known. We could have some of the NpyUFunc attributes cached on
> the
> > PyUFuncObject struct for easier access, as is done in
> NewNpyArrayIterObject.
> > This would also give us more liberty in making NpyUFunc be whatever we
> want
> > it to be, including a variable-sized memory chunk that we could use and
> > access at will. NpyIter is again a good example, where rather than
> storing
> > pointers to strides and dimensions arrays, these are made part of the
> > NpyIter memory chunk, effectively being equivalent to having variable
> sized
> > arrays as part of the struct. And I think we will probably no longer
> trigger
> > the Cython warnings about size changes either.
> >
> > Any thoughts on this approach? Is there anything fundamentally wrong with
> > what I'm proposing here?
> >
> > Also, this is probably going to end up being a rewrite of a pretty large
> and
> > complex codebase. I am not sure that working on this on my own and
> > eventually sending a humongous PR is the best approach...

[Numpy-discussion] Starting work on ufunc rewrite

2016-03-31 Thread Jaime Fernández del Río
I have started discussing with Nathaniel the implementation of the ufunc
ABI break that he proposed in a draft NEP a few months ago:

http://thread.gmane.org/gmane.comp.python.numeric.general/61270

His original proposal was to make the public portion of PyUFuncObject be:

typedef struct {
PyObject_HEAD
int nin, nout, nargs;
} PyUFuncObject;

Of course the idea is that internally we would use a much larger struct
that we could change at will, as long as its first few entries matched
those of PyUFuncObject. My problem with this, and I may very well be
missing something, is that in PyUFunc_Type we need to set the tp_basicsize
to the size of the extended struct, so we would end up having to expose its
contents. This is somewhat similar to what now happens with PyArrayObject:
anyone can #include "ndarraytypes.h", cast PyArrayObject* to
PyArrayObjectFields*, and access the guts of the struct without using the
supplied API inline functions. Not the end of the world, but if you want to
make something private, you might as well make it truly private.

I think it would be better to have something similar to what NpyIter does::

typedef struct {
PyObject_HEAD
NpyUFunc *ufunc;
} PyUFuncObject;

where NpyUFunc would, at this level, be an opaque type of which nothing
would be known. We could have some of the NpyUFunc attributes cached on the
PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
This would also give us more liberty in making NpyUFunc be whatever we want
it to be, including a variable-sized memory chunk that we could use and
access at will. NpyIter is again a good example, where rather than storing
pointers to strides and dimensions arrays, these are made part of the
NpyIter memory chunk, effectively being equivalent to having variable sized
arrays as part of the struct. And I think we will probably no longer
trigger the Cython warnings about size changes either.

Any thoughts on this approach? Is there anything fundamentally wrong with
what I'm proposing here?

Also, this is probably going to end up being a rewrite of a pretty large
and complex codebase. I am not sure that working on this on my own and
eventually sending a humongous PR is the best approach. Any thoughts on how
best to handle turning this into a collaborative, incremental effort?
Anyone who would like to join in the fun?

Jaime



Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-28 Thread Jaime Fernández del Río
Have modified the PR to do the "promote integers to at least long" we do in
np.sum.
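
For illustration, the np.sum rule in action (the exact promoted type is
platform dependent, a long at minimum):

import numpy as np

w = np.array([1, 2, 3], dtype=np.int8)
w.sum().dtype                      # int64 on 64-bit Linux: no easy overflow
np.bincount([0, 0, 1], weights=w)  # the PR now promotes these the same way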

Jaime

On Mon, Mar 28, 2016 at 9:55 PM, CJ Carey  wrote:

> Another +1 for Josef's interpretation from me. Consistency with np.sum
> seems like the best option.
>
> On Sat, Mar 26, 2016 at 11:12 PM, Juan Nunez-Iglesias 
> wrote:
>
>> Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
>> np.bincount to behave like np.sum with regards to promoting weights dtypes.
>> Including bool.
>>
>> On Sun, Mar 27, 2016 at 1:58 PM,  wrote:
>>
>>> On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz
>>>  wrote:
>>> > Would it make sense to just make the output type large enough to hold
>>> the
>>> > cumulative sum of the weights?
>>> >
>>> >
>>> > - Joseph Fox-Rabinovitz
>>> >
>>> > -- Original message--
>>> >
>>> > From: Jaime Fernández del Río
>>> >
>>> > Date: Sat, Mar 26, 2016 16:16
>>> >
>>> > To: Discussion of Numerical Python;
>>> >
>>> > Subject:[Numpy-discussion] Make np.bincount output same dtype as
>>> weights
>>> >
>>> > Hi all,
>>> >
>>> > I have just submitted a PR (#7464) that fixes an enhancement request
>>> > (#6854), making np.bincount return an array of the same type as the
>>> weights
>>> > parameter.  This is an important deviation from current behavior, which
>>> > always casts weights to double, and always returns a double array, so I
>>> > would like to hear what others think about the worthiness of this.
>>> Main
>>> > discussion points:
>>> >
>>> > np.bincount now works with complex weights (yay!), I guess this should
>>> be a
>>> > pretty uncontroversial enhancement.
>>> > The return is of the same type as weights, which means that small
>>> integers
>>> > are very likely to overflow.  This is exactly what #6854 requested, but
>>> > perhaps we should promote the output for integers to a long, as we do
>>> in
>>> > np.sum?
>>>
>>> I always thought of bincount with weights just as a group-by sum. So
>>> it would be easier to remember and have fewer surprises if it matches
>>> the behavior of np.sum.
>>>
>>> > Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
>>> this
>>> > what one would want? If we decide that integer promotion is the way to
>>> go,
>>> > perhaps booleans should go in the same pack?
>>>
>>> Isn't this calculating the sum, i.e. count of True by group, already?
>>> Based on a quick example with numpy 1.9.2, I don't think I ever used
>>> bool weights before.
>>>
>>>
>>> > This new implementation currently supports all of the reasonable native
>>> > types, but has no fallback for user defined types.  I guess we should
>>> > attempt to cast the array to double as before if no native loop can be
>>> > found? It would be good to have a way of testing this though, any
>>> thoughts
>>> > on how to go about this?
>>> > Does a behavior change like this require some deprecation period? What
>>> would
>>> > that look like?
>>> > I have also added broadcasting of weights to the full size of the list, so
>>> that
>>> > one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to
>>> tile
>>> > the single weight to the size of the bins list.
>>> >
>>> > Any other thoughts are very welcome as well!
>>>
>>> (2-D weights ?)
>>>
>>>
>>> Josef
>>>
>>>
>>> >
>>> > Jaime
>>> >




[Numpy-discussion] Behavior of .reduceat()

2016-03-27 Thread Jaime Fernández del Río
Two of the oldest issues in the tracker (#834 and #835) are about how
.reduceat() handles its indices parameter. I have been taking a look at the
source
code, and it would be relatively easy to modify, the hardest part being to
figure out what the exact behavior should be.

Current behavior is that np.ufunc.reduceat(x, ind) returns
[np.ufunc.reduce(x[ind[i]:ind[i+1]]) for i in range(len(ind))]
with a couple of caveats:

   1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
   reduction over an empty slice.
   2. an index of len(x) is appended to the indices argument, to be used
   as the endpoint of the last slice to reduce over.
   3. aside from this last case, the indices are required to be strictly
   in bounds, 0 <= index < len(x), or an error is raised.

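To make those rules concrete, using the example from the np.add.reduceat
docs:

import numpy as np

x = np.arange(8)
np.add.reduceat(x, [0, 4, 1, 5, 2])
# -> array([ 6,  4, 10,  5, 27])
# 6 = reduce(x[0:4]); 4 = x[4], since 4 >= 1; 10 = reduce(x[1:5]);
# 5 = x[5], since 5 >= 2; 27 = reduce(x[2:]), via the appended endpoint
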
The proposed new behavior, with some optional behaviors, would be:

   1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e. the
   ufunc identity, is returned. This includes raising an error if the ufunc
   does not have an identity, e.g. np.minimum.
   2. to fully support the "reduction over slices" idea, some form of out
   of bounds indices should be allowed. This could mean either that:
  1. only index = len(x) is allowed without raising an error, to allow
  computing the full reduction anywhere, not just as the last entry of the
  return, or
  2. allow any index in -len(x) <= index <= len(x), with the usual
  meaning given to negative values, or
  3. any index is allowed, with reduction results clipped to existing
  values (and the usual meaning for negative values).
   3. Regarding the appending of that last index of len(x) to indices, we
   could:
  1. keep appending it, or
  2. never append it, since you can now request it without an error
  being raised, or
  3. only append it if the last index is smaller than len(x).

My thoughts on the options:

   - The minimal, more conservative approach would go with 2.1 and 3.1. And
   of course 1, if we don't implement that none of this makes sense.
   - I kind of think 2.2 or even 2.3 are a nice enhancement that shouldn't
   break too much stuff.
   - 3.2 I'm not sure about, probably hurts more than it helps at this
   point, although in a brand new design you probably would either not append
   the last index or also prepend a zero, as in np.split.
   - And 3.3 seems too magical, probably not a good idea, only listed it
   for completeness.

Any other thoughts or votes on what, if anything, should we implement, and
what the deprecation of current behavior should look like?

Jaime



Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread Jaime Fernández del Río
On Sat, Mar 26, 2016 at 11:10 PM, Juan Nunez-Iglesias 
wrote:

> Just to clarify, this will only affect weighted bincounts, right? I can't
> tell you in how many places my code depends on the return type being
> integer!!!
>

Indeed! Unweighted bincounts still return, as all counting operations, a
np.intp array. Sorry for the noise!

Jaime



>
>
> On 27 Mar 2016, 7:16 AM +1100, Jaime Fernández del Río <
> jaime.f...@gmail.com>, wrote:
>
> Hi all,
>
> I have just submitted a PR (#7464
> <https://github.com/numpy/numpy/pull/7464>) that fixes an enhancement
> request (#6854 <https://github.com/numpy/numpy/issues/6854>), making
> np.bincount return an array of the same type as the weights parameter.
> This is an important deviation from current behavior, which always casts
> weights to double, and always returns a double array, so I would like to
> hear what others think about the worthiness of this.  Main discussion
> points:
>
>- np.bincount now works with complex weights (yay!), I guess this
>should be a pretty uncontroversial enhancement.
>- The return is of the same type as weights, which means that small
>integers are very likely to overflow.  This is exactly what #6854
>requested, but perhaps we should promote the output for integers to a
>long, as we do in np.sum?
>- Boolean arrays stay boolean, and OR, rather than sum, the weights.
>Is this what one would want? If we decide that integer promotion is the way
>to go, perhaps booleans should go in the same pack?
>- This new implementation currently supports all of the reasonable
>native types, but has no fallback for user defined types.  I guess we
>should attempt to cast the array to double as before if no native loop can
>be found? It would be good to have a way of testing this though, any
>thoughts on how to go about this?
>- Does a behavior change like this require some deprecation period?
>What would that look like?
>- I have also added broadcasting of weights to the full size of the list,
>so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without
>having to tile the single weight to the size of the bins list.
>
> Any other thoughts are very welcome as well!
>
> Jaime
>




[Numpy-discussion] Some thoughts on ufunc reduction methods

2016-03-26 Thread Jaime Fernández del Río
The following write-up was triggered by issue #7265; you may want to review
that
discussion for context:

In the docs, ufunc reduction operations are explained as "cumulatively
applying the ufunc along an axis." This basically limits just about any
reduction operation to ufuncs with two inpus and one output. Another take
on the same reduction idea is that, if the inputs are a and b and the
output is c, c is the result of updating a with the value in b. With this
idea in mind, in e.g.

add(a, b) -> c = a + b,

c would be thought of as the updated value of a, after applying b to it.
One would expect that, for any registered loop suitable for this
interpretation, the types of a and c would be identical, but b could be of
some other type. As an example consider

count(a, b) -> c = a + 1,

where a and c would typically be intp, but b could have just about any type.

The power of this description of ufuncs suited for reduction is that it is
very easy to generalize beyond the current "two inputs, one output"
restriction. E.g. a "sum of squared differences" ufunc defined as:

ssqd(sd2, x, y) -> sd2 += (x - y)**2,

could be used to compute squared euclidean distances between vectors doing:

ssqd.reduce((x, y)),

without the memory overhead of current available approaches.
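
Spelled out with today's tools, the computation that the hypothetical
ssqd.reduce((x, y)) would perform is, temporaries included:

import numpy as np

x = np.random.rand(1000)
y = np.random.rand(1000)
sd2 = ((x - y)**2).sum()  # builds full-size temporaries for x - y and its square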

In general, a reduction of a ufunc with m inputs and n outputs, m > n,
would produce n results out of m - n inputs. Such a ufunc would have to
define a suitable identity to initialize each of those n outputs at the
beginning of any reduction, rather than the single identity ufuncs now hold.

It can be argued that this generalization of reduction is just a
redefinition of a subset of what gufuncs can do, e.g. a "sum of squared
differences" gufunc with signature (n),(n)->() would do the same as the
above reduction, probably faster. And it is not clear that getting
accumulate and reduceat for free, or reduction over multiple axes,
justifies the extra complexity. Especially since it is likely that
something similarly powerful could be built on top of gufuncs.

Where an approach such as this would shine is if we had a reduction method
that did not require a single strided memory segment to act on. Setting
aside the generalization of reduction described above, a groupby or reduceby
method would take an array of values and an array of groups, and compute a
reduction value for each of the unique groups. With the above
generalizations, one could compute, e.g. the variance over each group by
doing something like:

# how exactly does keepdims work here is not clear at all!
counts = count.reduceby(values, groups, keepdims=True)
mean = np.add.reduceby(values, groups, keepdims=True) / counts
var = np.ssqd.reduceby((values, mean), groups) / counts

I am not fully sure of whether this is just useful for this particular
little example of computing variance using a ufunc method that doesn't
exist, or if it is valid in other settings too. Implementing this would add
complexity to an already complex system, so it better be good for something!

One of the weakest points I see is the need to rely on a set of predefined
identities to start the reduction. Think of trying to generalize reduction
to gufuncs, another worthy effort, and take matrix multiplication with
signature (i,j),(j,k)->(i,k). In the current paradigm you would impose that
both inputs and the output parameters be interchangeable, which leads
rather naturally to the condition i = j = k for this reduction to be
doable. But with the new paradigm you only need that the first input and
output be interchangeable, which only provides you with j = k as a
condition. And while this is a more general behavior, the questions of what
value to set i to in the output, and what to init the output to, when
trying to do a reduction over a stack of square arrays, make it fairly
unusable. So a backup to the "two inputs, one output, initialize to some
item in the reduction array" behavior would have to be kept in place.

It is also important to note that expanding reduction to gufuncs would
probably require that we impose some iteration order guarantees on
ourselves, as the poster child for this (matrix multiplication) is not in
general a commutative operation. Would we want to do this to ourselves?
Would the features today justify the burden for ever after?

Any thoughts on where, if anywhere, to go with this, are very welcome!

Jaime



[Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread Jaime Fernández del Río
Hi all,

I have just submitted a PR (#7464) that fixes an enhancement request
(#6854), making np.bincount return an
array of the same type as the weights parameter.  This is an important
deviation from current behavior, which always casts weights to double, and
always returns a double array, so I would like to hear what others think
about the worthiness of this.  Main discussion points:

   - np.bincount now works with complex weights (yay!), I guess this should
   be a pretty uncontroversial enhancement.
   - The return is of the same type as weights, which means that small
   integers are very likely to overflow.  This is exactly what #6854
   requested, but perhaps we should promote the output for integers to a
   long, as we do in np.sum?
   - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
   this what one would want? If we decide that integer promotion is the way to
   go, perhaps booleans should go in the same pack?
   - This new implementation currently supports all of the reasonable
   native types, but has no fallback for user defined types.  I guess we
   should attempt to cast the array to double as before if no native loop can
   be found? It would be good to have a way of testing this though, any
   thoughts on how to go about this?
   - Does a behavior change like this require some deprecation period? What
   would that look like?
   - I have also added broadcasting of weights to the full size of the list, so
   that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having
   to tile the single weight to the size of the bins list.

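To make the change concrete (a small before/after sketch):

import numpy as np

x = [0, 1, 1, 2]
w = np.array([1, 2, 3, 4], dtype=np.float32)
np.bincount(x).dtype             # intp: unweighted counts are unchanged
np.bincount(x, weights=w).dtype  # float64 today; float32 with this PR
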
Any other thoughts are very welcome as well!

Jaime



Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking

2016-03-20 Thread Jaime Fernández del Río
On Thu, Mar 17, 2016 at 10:41 PM, Stephan Hoyer  wrote:

> On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant 
> wrote:
>
>> I think that is a good idea.Let the user decide if scalar
>> broadcasting is acceptable for their function.
>>
>> Here is a simple concrete example where scalar broadcasting makes sense:
>>
>>
>> A 1-d dot product (the core of np.inner)   (k), (k) -> ()
>>
>> A user would assume they could call this function with a scalar in either
>> argument and have it broadcast to a 1-d array.Of course, if both
>> arguments are scalars, then it doesn't make sense.
>>
>> Having a way for the user to allow scalar broadcasting seems sensible and
>> a nice compromise.
>>
>> -Travis
>>
>
> To generalize a little bit, consider the entire family of weighted
> statistical function (mean, std, median, etc.). For example, the gufunc
> version of np.average is basically equivalent to np.inner with a bit of
> preprocessing.
>
> Arguably, it *could* make sense to broadcast weights when given a scalar:
> np.average(values, weights=1.0 / len(values)) is pretty unambiguous.
>
> That said, adding an explicit "scalar broadcasting OK flag" seems like a
> hack that will need even more special logic (e.g., so we can error if both
> arguments to np.inner are scalars).
>
> Multiple dispatch for gufunc core signatures seems like the cleaner
> solution. If you want np.inner to handle scalars, you need to supply core
> signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is the
> similar to vision of three core signatures for np.matmul: (i),(i,j)->(j),
> (i,j),(j)->(i) and (i,j),(j,k)->(i,k).
>

Would the logic for such a thing be consistent? E.g. how do you decide if
the user is requesting (k),(k)->(), or (k),()->() with broadcasting over a
non-core dimension of size k in the second argument? What if your
signatures are (m, k),(k)->(m) and (k),(n,k)->(n) and your two inputs are
(m,k) and (n,k), how do you decide which one to call? Or alternatively, how
do you detect and forbid such ambiguous designs? Figuring out the dispatch
rules for the general case seems like a non-trivial problem to me.

Jaime



Re: [Numpy-discussion] PyData Madrid

2016-02-20 Thread Jaime Fernández del Río
On Sat, Feb 20, 2016 at 8:13 PM, David Cournapeau 
wrote:

>
>
> On Sat, Feb 20, 2016 at 5:26 PM, Kiko  wrote:
>
>>
>>
>> 2016-02-20 17:58 GMT+01:00 Ralf Gommers :
>>
>>>
>>>
>>> On Wed, Feb 17, 2016 at 9:46 PM, Sebastian Berg <
>>> sebast...@sipsolutions.net> wrote:
>>>
>>>> On Mi, 2016-02-17 at 20:59 +0100, Jaime Fernández del Río wrote:
>>>> > Hi all,
>>>> >
>>>> > I just found out there is a PyData Madrid happening in early April,
>>>> > and it would feel wrong not to go, it being my hometown and all.
>>>> >
>>>> > Aside from the usual "Who else is going? We should meet!" I was also
>>>> > thinking of submitting a proposal for a talk.  My idea was to put
>>>> > something together on "The future of NumPy indexing" and use it as an
>>>> > opportunity to raise awareness and hopefully gather feedback from
>>>> > users on the proposed changes, in sort of a "if the mountain won't
>>>> > come to Muhammad" type of thing.
>>>> >
>>>>
>>>> I guess you do know my last name means mountain in German? But if
>>>> Muhammed might come, I should really improve my Arabic ;).
>>>>
>>>> In any case sounds good to me if you like to do it, I don't think I
>>>> will go, though it sounds nice.
>>>>
>>>
>>> Sounds like a good idea to me too. I like both the concrete topic, as
>>> well as just having a talk on Numpy at a PyData conference. In general
>>> there are too few (if any) talks on Numpy and other core libraries at
>>> PyData and Scipy confs I think.
>>>
>>
>> +1.
>>
>> It would be great a numpy talk from a core developer. BTW, C4P closes
>> tomorrow!!!
>>
>
With a full day to spare, I have submitted a talk proposal:

Brief Description
Advanced (a.k.a. "fancy") indexing is one of NumPy's greatest features.
It is also well known for its ability to trip and confuse beginners and
experts alike. This talk will review how it works and why it is great,
give some insight on why it is how it is, explore some of its darkest
corners, and go over some recent proposals to rationalize it.

Detailed Abstract
Advanced (a.k.a. _fancy_) indexing is one of NumPy's greatest features.
Once past the rather steep learning curve, it enables a very expressive
and powerful syntax, and makes coding a wide range of complex operations
a breeze. But this versatility comes with a dark side of surprising
results for some seemingly simple cases, and conflicts with the design
choices of more recent data analysis packages. This has led to a
viewpoint with growing support among the community that fancy indexing
may be too fancy for its own good. This talk will review the workings of
advanced indexing, highlighting where it excels, and where it falls
short, and give some context on the logic behind some design decisions.
It will also cover the existing
[NumPy Enhancement Proposal (NEP)](https://github.com/numpy/numpy/pull/6256)
to "implement an intuitive and fully featured advanced indexing."


>
>> Jaime, if you come to Madrid you know you have some beers waiting for you.
>>
>
Talk or not, I'm really looking forward to those beers and getting to meet
Juan Luis and you!

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] PyData Madrid

2016-02-17 Thread Jaime Fernández del Río
Hi all,

I just found out there is a PyData Madrid happening in early April, and it
would feel wrong not to go, it being my hometown and all.

Aside from the usual "Who else is going? We should meet!" I was also
thinking of submitting a proposal for a talk.  My idea was to put something
together on "The future of NumPy indexing" and use it as an opportunity to
raise awareness and hopefully gather feedback from users on the proposed
changes, in sort of a "if the mountain won't come to Muhammad" type of
thing.

Thoughts? Comments? Anyone else going or thinking about going?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PR #7083: ENH: Added 'doane' and 'sqrt' estimators to np.histogram

2016-01-20 Thread Jaime Fernández del Río
The tests are not passing; it seems like you are taking the sqrt of a
negative number. You may want to check the inputs and raise a more
informative error (and add a test for it).

Jaime

On Thu, Jan 21, 2016 at 7:51 AM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Please let me know if there is anything wrong or missing. I have added
> a couple of estimators that I find useful sometimes.
>
> -Joe
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-20 Thread Jaime Fernández del Río
There doesn't seem to be much of a consensus on the way to go, so leaving
things as they are and have been seems the wisest choice for now. Thanks
for all the feedback. I will work with Greg on documenting the status quo
properly.

We probably want to follow the lead of the stdlib's random.uniform on how
the openness of the interval actually plays out in practice:

https://docs.python.org/3.6/library/random.html#random.uniform


On Thu, Jan 21, 2016 at 2:55 AM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> >> I would also point out that requiring open vs closed intervals (in
> >> doubles) is already an extremely specialised use case. In terms of
> >> *sampling the reals*, there is no difference between the intervals
> >> (a,b) and [a,b], because the endpoints have measure 0, and even with
> >> double-precision arithmetic, you are going to have to make several
> >> petabytes of random data before you hit an endpoint...
> >>
> > Petabytes ain't what they used to be ;) I remember testing some hardware
> > which, due to grounding/timing issues would occasionally goof up a
> readable
> > register. The hardware designers never saw it because they didn't test
> for
> > hours and days at high data rates. But it was there, and it would show up
> > in the data. Measure zero is about as real as real numbers...
> >
> > Chuck
>
> Actually, your point is well taken and I am quite mistaken. If you
> pick some values like uniform(low, low * (1+2**-52)) then you can hit
> your endpoints pretty easily. I am out of practice making
> pathological tests for double precision arithmetic.
>
> I guess my suggestion would be to add the deprecation warning and
> change the docstring to warn that the interval is not guaranteed to be
> right-open.
>
> Peter
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Jaime Fernández del Río
Hi all,

There is a PR (#7026 ) that
documents the current behavior of np.random.uniform when the low and high
parameters it takes do not conform to the expected low < high. Basically:

   - if low < high, random numbers are drawn from [low, high),
   - if low = high, all random numbers will be equal to low, and
   - if low > high, random numbers are drawn from (high, low] (notice the
   change in the open side of the interval.)
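
In code, the three cases look like this (a quick illustrative sketch; the
draws themselves are random, but the bounds behave as described):

import numpy as np

np.random.uniform(0.0, 1.0, 3)  # drawn from [0.0, 1.0)
np.random.uniform(2.5, 2.5, 3)  # array([ 2.5,  2.5,  2.5])
np.random.uniform(1.0, 0.0, 3)  # drawn from (0.0, 1.0]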

My only worry is that, once we document this, we can no longer claim that
it is a bug.  So I would like to hear from others what they think.  The
other more or less obvious options would be to:

   - Raise an error, but this would require a deprecation cycle, as people
   may be relying on the current undocumented behavior.
   - Check the inputs and draw numbers from [min(low, high), max(low, high)),
   which is minimally different from current behavior.

I will be merging the current documentation changes in the next few days,
so it would be good if any concerns were voiced before that.

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Building master issues on Windows

2016-01-04 Thread Jaime Fernández del Río
On Tue, Jan 5, 2016 at 3:15 AM, G Young  wrote:

> Hello all,
>
> I've recently encountered issues building numpy off master on Windows in
> which setup.py complains that it can't find the Advapi library in any
> directories (which is an empty list).  I scanned the DLL under System32 and
> ran /sfc scannow as Administrator, and both came up clean.  Is anyone else
> encountering (or can reproduce) this issue, or is it just me?  Any
> suggestions about how to resolve this problem?
>

Can you give more details on your setup?

Jaime

(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PR for complex np.interp, question about assert_almost_equal

2015-12-22 Thread Jaime Fernández del Río
On Tue, Dec 22, 2015 at 7:42 AM, Ralf Gommers 
wrote:

>
>
> On Tue, Dec 22, 2015 at 12:55 AM, Peter Creasey <
> p.e.creasey...@googlemail.com> wrote:
>
>> Hi all,
>> I submitted a PR (#6872) for using complex numbers in np.lib.interp.
>>
>> The tests pass on my machine, but I see that the TravisCI builds are
>> giving assertion fails (on my own test) with python 3.3 and 3.5 of the
>> form:
>> > assert_almost_equal
>> > TypeError: Cannot cast array data from dtype('complex128') to
>> dtype('float64') according to the rule 'safe'
>>
>> When I was writing the test I used np.testing.assert_almost_equal with
>> complex128 as it works in my python 2.7, however having checked the
>> docstring I cannot tell what the expected behaviour should be (complex
>> or no complex allowed). Should my test be changed or the
>> assert_almost_equal?
>>
>
> Hi Peter, that error is unrelated to assert_almost_equal. What happens is
> that when you pass in a complex argument `fp` to your modified
> `compiled_interp`, you're somewhere doing a cast that's not safe and
> trigger the error at
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L1930.
> For what "safe casting" means, see
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.can_cast.html
>

The problem then is probably here.

You may want to throw in a PyErr_Clear() when the conversion of the fp
array to NPY_DOUBLE fails before trying with NPY_CDOUBLE, and check if it
goes away.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A minor clarification on why count_nonzero is faster for boolean arrays

2015-12-17 Thread Jaime Fernández del Río
On Thu, Dec 17, 2015 at 7:37 PM, CJ Carey  wrote:

> I believe this line is the reason:
>
> https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110
>

The magic actually happens in count_nonzero_bytes_384, a few lines before
that (line 1986).

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Build broken

2015-12-14 Thread Jaime Fernández del Río
Hi,

Travis is repeatedly being unable to complete one of our test builds. It
all started after I merged a very simple PR that changed a single word in a
docstring, so I have a hard time believing that is the actual cause. Can
anyone who actually knows what Travis is doing take a look at the log:

https://travis-ci.org/numpy/numpy/builds/96836128

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array of random numbers fails to construct

2015-12-06 Thread Jaime Fernández del Río
On Sun, Dec 6, 2015 at 10:07 PM, Matthew Brett 
wrote:

> Hi,
>
> On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student)
>  wrote:
> > This works. A big array of eight bit random numbers is constructed:
> >
> > import numpy as np
> >
> > spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)
> >
> >
> >
> > This fails. It eats up all 64GBy of RAM:
> >
> > spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)
> >
> >
> > The difference is a factor of two, 2**21 rather than 2**20, for the
> extent
> > of the first axis.
>
> I think what's happening is that this:
>
> np.random.randint(0,255, (2**21,2**12))
>
> creates 2**33 random integers, which (on 64-bit) will be of dtype
> int64 = 8 bytes, giving total size 2 ** (21 + 12 + 6) = 2 ** 39 bytes
> = 512 GiB.
>

8 is only 2**3, so it is "just" 64 GiB, which also explains why the
half-sized array does work, but yes, that is most likely what's happening.
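
For what it is worth, a sketch of one workaround (assuming the goal is just
a large buffer of uniform random bytes) that avoids the 64 GiB
intermediate:

import numpy as np

# note: this yields values in 0..255, whereas randint(0, 255) yields 0..254
shape = (2**21, 2**12)
buf = np.random.bytes(shape[0] * shape[1])  # 8 GiB of random bytes
spectrumArray = np.frombuffer(buf, dtype=np.uint8).reshape(shape)
# frombuffer returns a read-only view; add a .copy() if you need to write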

Jaime

>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where is Jaime?

2015-12-06 Thread Jaime Fernández del Río
On Sun, Dec 6, 2015 at 4:15 AM, Charles R Harris 
wrote:

>
>
> On Sat, Dec 5, 2015 at 4:49 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> I'm alive and well: trying to stay afloat on a sea of messaging
>> protocols, Java and Swiss bureaucracy, but doing great aside from that.
>>
>> Jaime
>>
>>
> Glad to hear it. I was beginning to worry...
>
> Java? Poor soul. Anything special about the Swiss bureaucracy? Reminds me
> of the old joke
>

Well, if you don't have a suitcase full of $100 bills, opening a bank
account in Switzerland is surprisingly difficult, especially if you are
moving here from the U.S. If immigration then decides to request from you
documents they already have, thus delaying your residence permit by a few
more weeks, the bank ends up returning your first salary, just about the
same time you have to pay a 3 month rental deposit for a small and
ridiculously expensive apartment. Everything is slowly falling into place,
but it has been an interesting ride.


> *Heaven and Hell*
>
> Heaven Is Where:
>
> The French are the chefs
> The Italians are the lovers
> The British are the police
> The Germans are the mechanics
> And the Swiss make everything run on time
>
>
> Hell is Where:
>
> The British are the chefs
> The Swiss are the lovers
> The French are the mechanics
> The Italians make everything run on time
> And the Germans are the police
>

The trains and trams do seem to run remarkably on time, but I don't think
Eva would be too happy about me setting out to test how good lovers the
Swiss are...

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where is Jaime?

2015-12-05 Thread Jaime Fernández del Río
I'm alive and well: trying to stay afloat on a sea of messaging protocols,
Java and Swiss bureaucracy, but doing great aside from that.

Jaime

On Sat, Dec 5, 2015 at 9:43 PM, Charles R Harris 
wrote:

> Anyone hear from Jaime lately? He seems to have gone missing.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Commit rights for Jonathan J. Helmus

2015-10-31 Thread Jaime Fernández del Río
"Gruetzi!", as I just found out we say in Switzerland...
On Oct 30, 2015 8:20 AM, "Jonathan Helmus"  wrote:

> On 10/28/2015 09:43 PM, Allan Haldane wrote:
> > On 10/28/2015 05:27 PM, Nathaniel Smith wrote:
> >> Hi all,
> >>
> >> Jonathan J. Helmus (@jjhelmus) has been given commit rights -- let's all
> >> welcome him aboard.
> >>
> >> -n
> >
> > Welcome Jonathan, happy to have you on the team!
> >
> > Allan
> >
>
> Thank you everyone for the kind welcome.  I'm looking forward to
> being part of the team.
>
> - Jonathan Helmus
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy Generalized Ufuncs: Pointer Arithmetic and Segmentation Faults (Debugging?)

2015-10-25 Thread Jaime Fernández del Río
HI Eleanore,

Thanks for the kind words, you are very welcome!

As for your issues, I think they are coming from the handling of the
strides you are doing in the slow_dtw_dist function.  The strides are the
number of bytes you have to advance your pointer to get to the next item.
In your code, you end up doing something akin to:

dtype *v_i = v0;
...
for (...) {
...
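/* BUG: pointer arithmetic on a dtype* makes this advance
   stride_v * sizeof(dtype) bytes per iteration, not stride_v bytes */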
v_i += stride_v;
}

This, rather than increasing the v_i pointer by stride_v bytes, increases
it by stride_v * sizeof(dtype) bytes, and with the npy_double you seem to
be using as dtype, sends you out of your allocated memory at a rate 8x too
fast.

What you increase by stride_v has to be of char* type, so one simple
solution would be to do something like:

char *v_ptr = (char *)v0;
...
for (...) {
dtype v_val = *(dtype *)v_ptr;
...
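/* char* arithmetic: this advances exactly stride_v bytes, as intended */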
v_ptr += stride_v;
}

And use v_val directly wherever you were dereferencing v_i before.

Jaime


On Sun, Oct 25, 2015 at 5:06 AM,  wrote:

> Dear Numpy maintainers and developers,
>
> Thanks for providing such a great numerical library!
>
> I’m currently trying to implement the Dynamic Time Warping metric as a set
> of generalised numpy ufuncs, but unfortunately, I have lasting issues with
> pointer arithmetic and segmentation faults. Is there any way that I can
> use GDB or some such to debug a python/numpy extension? Furthermore: is it
> necessary to use pointer arithmetic to access the function arguments (as
> seen on http://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html)
> or is element access (operator[]) also permissible?
>
> To break it down quickly, I need to have a fast DTW distance function
> dist_dtw() with two vector inputs (broadcasting should be possible), two
> scalar parameters and one scalar output (signature: (i), (j), (), () -> ())
> usable in python for a 1-Nearest Neighbor classification algorithm. The
> extension also implements two functions compute_envelope() and
> piecewise_mean_reduction() which are used for lower-bounding based on Keogh
> and Ratanamahatana, 2005. The source code is available at
> http://pastebin.com/MunNaP7V and the prominent segmentation fault happens
> somewhere in the chain dist_dtw() —> meta_dtw_dist() —> slow_dtw_dist(),
> but I fail to pin it down.
>
> Aside from my primary questions, I wonder how to approach
> errors/exceptions and unit testing when developing numpy ufuncs. Are there
> any examples apart from the numpy manual that I could use as reference
> implementations of generalised numpy ufuncs?
>
> I would greatly appreciate some insight into properly developing
> generalised ufuncs.
>
> Best,
> Eleanore
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
On Mon, Oct 19, 2015 at 9:11 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Hi all,
>
> As mentioned a few weeks ago, I am organizing a "Become an Open Source
> Contributor" workshop tonight, for the Data Science Student Society at UCSD.
>
> During this morning I will be creating a few ridiculously simple issues,
> e.g. "missing space, arrayobject --> array object", for participants to
> work on as part of the workshop. So there may also be a surge in simple PRs
> starting at around 7 PM PST.
>

Ok, so issues 6515 up to and including 6525 are mine; I should have them
all fixed and closed by the end of today.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
Hi all,

As mentioned a few weeks ago, I am organizing a "Become an Open Source
Contributor" workshop tonight, for the Data Science Student Society at UCSD.

During this morning I will be creating a few ridiculously simple issues,
e.g. "missing space, arrayobject --> array object", for participants to
work on as part of the workshop. So there may also be a surge in simple PRs
starting at around 7 PM PST.

Please, bear with us. And refrain from fixing those issues if you are not a
workshop participant!

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Numpy for data manipulation

2015-10-01 Thread Jaime Fernández del Río
On Thu, Oct 1, 2015 at 11:46 AM, Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hi, I have written some numpy tips and tricks I am using, which may be
> interesting to you.
> This is quite long reading, so I've splitted it into two parts:
>
> http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html


The recommendation of inverting a permutation by argsort'ing it, while it
works, is suboptimal, as it takes O(n log(n)) time, and you can do it in
linear time:

In [14]: import numpy as np

In [15]: arr = np.random.rand(10)

In [16]: perm = arr.argsort()

In [17]: perm
Out[17]: array([5, 0, 9, 4, 2, 8, 6, 7, 1, 3])

In [18]: inv_perm = np.empty_like(perm)

In [19]: inv_perm[perm] = np.arange(len(perm))

In [20]: np.all(inv_perm == perm.argsort())
Out[20]: True

It does require two lines of code, so for small stuff it is probably good
enough to argsort, but it gave e.g. np.unique a nice boost on larger arrays
when we applied it there.

Jaime


>
> http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html
>
> Comments are welcome, specially if you know any other ways to make this
> code faster (or better).
>
> Regards,
> Alex.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)

2015-09-23 Thread Jaime Fernández del Río
For what it is worth, I'm +1 on including any of the "founding fathers"
(and uncles!) that wish to be included in the committee, or council, or
whatever we end up calling it.  I'm also OK with singling out Travis as
creator of NumPy in the wording of the document.

Especially since the prerogative of members is not to **VOTE**, but to
**VETO**, I don't see adding more people having any effect on the
individual power of members, so my +1 stands regardless of what the total
member count is.

Jaime


On Wed, Sep 23, 2015 at 4:08 PM, Charles R Harris  wrote:

>
>
> On Wed, Sep 23, 2015 at 3:21 PM, Travis Oliphant 
> wrote:
>
>>
>>
>>> Regarding the seed council, I just tried to pick an objective
>>> criterion and an arbitrary date that seemed generally in keeping with
>>> idea of "should be active in the last 1-to-2-years-ish". Fiddling with
>>> the exact date in particular makes very little difference -- between
>>> pushing it back to 2 years ago today or forward to 1 year ago today,
>>> the only thing that changes is whether Pauli makes the list or not.
>>> (And Pauli is obviously a great council candidate, though I don't know
>>> whether he even wants to be on it.)
>>>
>>> > Personally, I have no idea how big the council should be. Too big, and
>>> > there is no point, consensus is harder to reach the larger the group,
>>> > and the main (only?) role of the council is to resolve issues where
>>> > consensus has not been reached in the larger community. But what is
>>> > too big?
>>>
>>>
>>> > As for make-up of the council, I think we need to expand beyond people
>>> > who have recently contributed core code.
>>> >
>>> > Yes, the council does need to have expertise to make technical
>>> > decisions, but if you think about the likely contentious issues like
>>> > ABI breakage, a core-code focused view is incomplete. So there should
>>> > be representation by:
>>> >
>>> > Someone(s) with a long history of working with the code -- that
>>> > institutional memory of why decisions were made the way they were
>>> > could be key.
>>>
>>> Sure -- though I can't really imagine any way of framing a rule like
>>> this that *wouldn't* be satisfied by Chuck + Ralf + Pauli, so my guess
>>> is that such a rule would not actually have any effect on the council
>>> membership in practice.
>>>
>>
>> As the original author of NumPy, I would like to be on the seed council
>> as long as it is larger than 7 people.That is my proposal.I don't
>> need to be a permanent member, but I do believe I have enough history that
>> I can understand issues even if I haven't been working on code directly.
>>
>>
>> I think I do bring history and information that provides all of the
>> history that could be helpful on occasion. In addition, if a matter is
>> important enough to even be brought to the attention of this council, I
>> would like to be involved in the discussion about it.
>>
>> It's a simple change to the text --- basically an explanation that Travis
>> requested to be on the seed council.
>>
>
> I too would like you to be a member. We could either write it into the
> text in recognition of your status as the Numpy creator, or it could be the
> first order of business. I would only ask that you give yourself some time
> to become familiar with how things work and the people involved in the
> current community. It has been some years since you have been active in
> code development.
>
> Chuck
>
>>
>>
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] "Become an Open Source Contributor" workshop

2015-09-23 Thread Jaime Fernández del Río
Apologies for the cross-posting.

The Data Science Student Society of the University of California San Diego,
or DS3 @ UCSD as they like to call themselves, will be holding biweekly
Python themed workshops starting this fall.  On the week of October 19th,
they will be having yours truly doing a "Become an Open Source Contributor"
piece.  It will be a shortish event, 60-90 minutes, so my idea was to cover
the following:

   1. (15 min) An introduction to the Python data science landscape.
   2. (30 min) An overview of the GitHub workflow that most (all?) of the
   projects follow.
   3. (30-45 min) A hands on session, where we would make sure everyone
   gets set up in GitHub, and forks and clones their favorite project.  Time
   and participant willingness permitting, I would like to take advantage of
   my commit bits, and have some of the participants submit a simple PR, e.g.
   fixing a documentation typo, to NumPy or SciPy, and hit the green button
   right there, so that they get to leave as knighted FOSS contributors.

And this is what I am hoping to get from you, the community:

   1. If anyone in the area would like to get involved, please contact me.
   I have recruited a couple of volunteers from PySanDiego, but could use more
   help.
   2. I'm also hoping to get some help, especially with the introductory
   part.  Given that the crowd will mostly be university students and some
   faculty, it would be great if someone who actually knew what they were
   talking about could deliver a short, 10 minute talk, on Python, data
   science, and academia.  I'm sure we could arrange it to have someone join
   by video conference.
   3. If you have organized anything similar in the past, and have material
   that I could use to, ahem, draw inspiration from, or recommendations to
   make, or whatever, I'd love to hear from you.

Thanks for reading!

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] facebook, twitter, and g+

2015-09-22 Thread Jaime Fernández del Río
+1 for twitter
+0 for the others

On Tue, Sep 22, 2015 at 8:14 PM, Bryan Van de Ven 
wrote:

>
> > On Sep 22, 2015, at 9:58 PM, Charles R Harris 
> wrote:
> >
> > Hi All,
> >
> > Just posting to elicit thoughts about scipy.org having a presence in
> social media for announcements.
>
> Of the ones listed in the subject, I would suggest Twitter is the most
> valuable. It has been great for release and other announcements. I find
> Facebook and G+ a little odd for software projects personally, though I
> guess they could be useful if you wanted to have somewhat longer "posts".
> But of course that's also yet more for someone to have maintain.
>
> One additional consideration is that some people will inevitably attempt
> to elicit technical support over these channels. I generally have to steer
> people to the mailing list immediately.
>
> Bryan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Commit rights for Allan Haldane

2015-09-22 Thread Jaime Fernández del Río
Congrats Allan!

Jaime

On Tue, Sep 22, 2015 at 11:54 AM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

> Hi All,
>
> Allan Haldane has been given commit rights. Here's to the new member of
> the team.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Jaime Fernández del Río
We have the PyArrayObject vs PyArrayObject_fields definition in
ndarraytypes.h that is used to enforce access to the members through inline
functions rather than directly,  which seems to me like the right way to
go: don't leave stones unturned, hide everything and provide PyUFunc_NIN,
PyUFunc_NOUT and friends to handle those too.

On Sun, Sep 20, 2015 at 9:13 PM, Nathaniel Smith  wrote:

> Hi all,
>
> Here's a first draft NEP for comments.
>
> --
>
> Synopsis
> ========
>
> Improving numpy's dtype system requires that ufunc loops start having
> access to details of the specific dtype instance they are acting on:
> e.g. an implementation of np.equal for strings needs access to the
> dtype object in order to know what "n" to pass to strncmp. Similar
> issues arise with variable length strings, missing values, categorical
> data, unit support, datetime with timezone support, etc. -- this is a
> major blocker for improving numpy.
>
> Unfortunately, the current ufunc inner loop function signature makes
> it very difficult to provide this information. We might be able to
> wedge it in there, but it'd be ugly.
>
> The other option would be to change the signature. What would happen
> if we did this? For most common uses of the C API/ABI, we could do
> this easily while maintaining backwards compatibility. But there are
> also some rarely-used parts  of the API/ABI that would be
> prohibitively difficult to preserve.
>
> In addition, there are other potential changes to ufuncs on the
> horizon (e.g. extensions of gufuncs to allow them to be used more
> generally), and the current API exposure is so massive that any such
> changes will be difficult to make in a fully compatible way. This NEP
> thus considers the possibility of closing down the ufunc API to a
> minimal, maintainable subset of the current API.
>
> To better understand the consequences of this potential change, I
> performed an exhaustive analysis of all the code on Github, Bitbucket,
> and Fedora, among others. The results make me highly confident that of
> all the publically available projects in the world, the only ones
> which touch the problematic parts of the ufunc API are: Numba,
> dynd-python, and gulinalg
> (with the latter's exposure being trivial).
>
> Given this, I propose that for 1.11 we:
> 1) go ahead and hide/disable the problematic parts of the ABI/API,
> 2) coordinate with the known affected projects to minimize disruption
> to their users (which is made easier since they are all projects that
> are almost exclusively distributed via conda, which enforces strict
> NumPy ABI versioning),
> 3) publicize these changes widely so as to give any private code that
> might be affected a chance to speak up or adapt, and
> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
> of the vast majority of projects that will be unaffected by these
> changes.
>
> This NEP defers the question of exactly what the improved API should
> be, since there's no point in trying to nail down the details until
> we've decided whether it's even possible to change.
>
>
> Details
> =======
>
> The problem
> -----------
>
> Currently, a ufunc inner loop implementation is called via the
> following function prototype::
>
> typedef void (*PyUFuncGenericFunction)
> (char **args,
>  npy_intp *dimensions,
>  npy_intp *strides,
>  void *innerloopdata);
>
> Here ``args`` is an array of pointers to 1-d buffers of input/output
> data, ``dimensions`` is a pointer to the number of entries in these
> buffers, ``strides`` is an array of integers giving the strides for
> each input/output array, and ``innerloopdata`` is an arbitrary void*
> supplied by whoever registered the ufunc loop. (For gufuncs, extra
> shape and stride information about the core dimensions also gets
> packed into the ends of these arrays in a somewhat complicated way.)
>
> There are 4 key items that define a NumPy array: data, shape, strides,
> dtype. Notice that this function only gets access to 3 of them. Our
> goal is to fix that. For example, a better signature would be::
>
> typedef void (*PyUFuncGenericFunction_NEW)
> (char **data,
>  npy_intp *shapes,
>  npy_intp *strides,
>  PyArray_Descr *dtypes,   /* NEW */
>  void *innerloopdata);
>
> (In practice I suspect we might want to make some more changes as
> well, like upgrading gufunc core shape/strides to proper arguments
> instead of tacking it onto the existing arrays, and adding an "escape
> valve" void* reserved for future extensions. But working out such
> details is outside the scope of this NEP; the above will do for
> illustration.)
>
> The goal of this NEP is to clear the ground so that we can start
> supporting ufunc inner loops that take dtype arguments, and make other
> enhancements to ufunc functionality going forward.
>
>
> Pr

Re: [Numpy-discussion] np.sign and object comparisons

2015-09-01 Thread Jaime Fernández del Río
On Mon, Aug 31, 2015 at 11:49 PM, Nathaniel Smith  wrote:

> On Sun, Aug 30, 2015 at 9:09 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>>
>> There are three ways of fixing this that I see:
>>
>>1. Arbitrarily choose a value to set the return to. This is
>>equivalent to choosing a default return for `cmp` for comparisons. This
>>preserves behavior, but feels wrong.
>>2. Similarly to how np.sign of a floating point array with nans
>>    returns nan for those values, return e.g. None for these cases. This
>>is my preferred option.
>>3. Raise an error, along the lines of the TypeError: unorderable types
>>that 3.x produces for some comparisons.
>>
>> Having read the other replies so far -- given that no-one seems to have
> any clear intuition or use cases, I guess I find option 3 somewhat
> tempting... it keeps our options open until someone who actually cares
> comes along with a use case to hone our intuition on, and is very safe in
> the mean time.
>
> (This was noticed in the course of routine code cleanups, right, not an
> external bug report? For all we know right now, no actual user has ever
> even tried to apply np.sign to an object array?)
>

We do have a user that tried np.sign on an object array, and discovered
that our Py3K object comparison was crap:
https://github.com/numpy/numpy/issues/6229

No report of anyone trying np.sign on anything other than numbers that we
know of, though.

I'm starting to think that, given the lack of agreement, I am going to
agree with you that raising an error may be the better option,
because it's the least likely to break people's code if we later find we
need to change it.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.sign and object comparisons

2015-08-30 Thread Jaime Fernández del Río
There's been some work going on recently on Py2 vs Py3 object comparisons.
If you want all the background, see gh-6265 and follow the links there.

There is a half-baked PR in the works, gh-6269, that tries to unify behavior
and fix some bugs along the way, by replacing all 2.x uses of
PyObject_Compare with several calls to PyObject_RichCompareBool, which is
available on 2.6, the oldest Python version we support.

The poster child for this example is computing np.sign on an object array
that has an np.nan entry. 2.x will just make up an answer for us:

>>> cmp(np.nan, 0)
-1

even though none of the relevant compares succeeds:

>>> np.nan < 0
False
>>> np.nan > 0
False
>>> np.nan == 0
False

The current 3.x is buggy, so the fact that it produces the same made up
result as in 2.x is accidental:

>>> np.sign(np.array([np.nan], 'O'))
array([-1], dtype=object)

Looking at the code, it seems that the original intention was for the
answer to be `0`, which is equally made up but perhaps makes a little more
sense.

There are three ways of fixing this that I see:

   1. Arbitrarily choose a value to set the return to. This is equivalent
   to choosing a default return for `cmp` for comparisons. This preserves
   behavior, but feels wrong.
   2. Similarly to how np.sign of a floating point array with nans returns
   nan for those values, return e.g. None for these cases. This is my
   preferred option (sketched below).
   3. Raise an error, along the lines of the TypeError: unorderable types
   that 3.x produces for some comparisons.
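
For concreteness, a minimal sketch of what option 2 could look like for a
single object entry (hypothetical, not what NumPy currently does):

def object_sign(x):
    if x < 0:
        return -1
    if x > 0:
        return 1
    if x == 0:
        return 0
    return None  # none of the comparisons held, e.g. x = float('nan')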

Thoughts anyone?

Jaime
-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-28 Thread Jaime Fernández del Río
On Fri, Aug 28, 2015 at 2:40 AM, Sebastian Berg 
wrote:

> On Fr, 2015-08-28 at 09:46 +0100, Matthew Brett wrote:
> > Hi,
> >
> > On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fernández del Río
> >  wrote:
> > > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett <
> matthew.br...@gmail.com>
> > > wrote:
> > >>
> > >> Hi,
> > >>
> > >> On Thu, Aug 27, 2015 at 6:23 PM,   wrote:
> > >> >
> > >> >
> > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett
> > >> > 
> > >> > wrote:
> > >> >>
> > >> >> Hi
> > >> >>
> > >> >> On Thu, Aug 27, 2015 at 5:11 PM,   wrote:
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett
> > >> >> > 
> > >> >> > wrote:
> > >> >> >>
> > >> >> >> Hi,
> > >> >> >>
> > >> >> >> On Thu, Aug 27, 2015 at 3:34 PM,   wrote:
> > >> >> >> [snip]
> > >> >> >> > I don't really see a problem with "codifying" the status quo.
> > >> >> >>
> > >> >> >> That's an excellent point.If we believe that the current
> > >> >> >> situation
> > >> >> >> is the best possible, both now and in the future, then
> codifying the
> > >> >> >> status quo is an excellent idea.
> > >> >> >>
> > >> >> >> So, we should probably first start by asking ourselves:
> > >> >> >>
> > >> >> >> * what numpy is doing well;
> > >> >> >> * what numpy could do better;
> > >> >> >>
> > >> >> >> and then ask, is there some way we could make it more likely we
> will
> > >> >> >> improve over time.
> > >> >> >>
> > >> >> >> [snip]
> > >> >> >>
> > >> >> >> > As the current debate shows it's possible to have a public
> > >> >> >> > discussion
> > >> >> >> > about
> > >> >> >> > the direction of the project without having to delegate
> providing
> > >> >> >> > a
> > >> >> >> > vision
> > >> >> >> > to a president.
> > >> >> >>
> > >> >> >> The idea of a president that I had in mind, was not someone who
> > >> >> >> makes
> > >> >> >> all decisions, but the person who holds themselves responsible
> for
> > >> >> >> the
> > >> >> >> performance of the project.  If the project has a coherent
> vision
> > >> >> >> already, the president has no need to provide one, but it's the
> > >> >> >> president's job to worry about whether we have vision or not,
> and do
> > >> >> >> what they need to, to make sure we don't lose track of that.
>  If
> > >> >> >> you
> > >> >> >> don't know it already, I highly recommend Jim Collins' work on
> > >> >> >> 'level
> > >> >> >> 5 leadership' [1]
> > >> >> >
> > >> >> >
> > >> >> > Still doesn't sound like the need for a president to me
> > >> >> >
> > >> >> > " the person who holds themselves responsible for the
> > >> >> > performance of the project"
> > >> >> >
> > >> >> > sounds more like the role of the "core" group (adding plural to
> > >> >> > persons)
> > >> >> > to
> > >> >> > me, and cannot be pushed of to an official president.
> > >> >>
> > >> >> Except that, in the past, having multiple people taking decisions
> has
> > >> >> led to the situation where no-one feels themselves accountable for
> the
> > >> >> result, hence this situation tends to lead to stagnation.
> > >> >
> > >> >
> > >> > Is there any evidence for this?
> > >>
> > >> Oh -

Re: [Numpy-discussion] Numpy FFT.FFT slow with certain samples

2015-08-28 Thread Jaime Fernández del Río
On Fri, Aug 28, 2015 at 11:02 AM, Joseph Codadeen 
wrote:

> Hi,
>
> I am a numpy newbie.
>
> I have two wav files, one of which numpy takes a long time to FFT.
> They were created within Audacity using white noise and silence for gaps.
>
>
>1. my_1_minute_noise_with_gaps.wav
>2. my_1_minute_noise_with_gaps_truncated.wav
>
>
> The files are very similar in the following way;
>
>
>- 1. is white noise with silence gaps on every 15 second interval.
>- 2. is 1. but slightly shorter, i.e. I trimmed some ms off the end
>but it still has the last gap at 60s.
>
>
> The code I am using processes the file like this;
>
> framerate, data = scipy.io.wavfile.read(filepath)
> right = data[:, 0]
> # Align it to be efficient.
> if len(right) % 2 != 0:
> right = right[range(len(right) - 1)]
> noframes = len(right)
> fftout = np.fft.fft(right) / noframes# <<< I am timing this cmd
>
> Using timeit...
>
>
>- my_1_minute_noise_with_gaps_truncated took *30.75620985s* to process.
>- my_1_minute_noise_with_gaps took *22307.13917s* to process.
>
>
> Could someone tell me why this behaviour is happening please?
>
> Sorry I can't attach the files as this email gets bounced but you could
> easily create the files yourself.
> E.g. my last gap width is 59.9995 - 1:00.0005, I repeat this every 15
> seconds.
> My truncated file is 1:00.0015s long, non-truncated is 1:00.0833s long
>

It is almost certainly caused by the number of samples in your signals,
i.e. look at what noframes is in one case and the other.

You will get best performance when noframes is a power of two, or has a
factorization that includes many small integers (2, 3, 5, perhaps also 7),
and the worst if the size is a prime number.
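
A quick sketch of a workaround (using a dummy signal here in place of your
'right' array): zero-pad to the next power of two, which restores the fast
path at the cost of slightly changing the spectrum, since padding
interpolates the frequency bins:

import numpy as np

right = np.random.rand(2646037)            # stand-in for your signal
noframes = len(right)
nfast = 1 << (noframes - 1).bit_length()   # next power of two >= noframes
fftout = np.fft.fft(right, n=nfast) / noframes  # fft() zero-pads to nfast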

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread Jaime Fernández del Río
On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett 
wrote:

> Hi,
>
> On Thu, Aug 27, 2015 at 6:23 PM,   wrote:
> >
> >
> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett  >
> > wrote:
> >>
> >> Hi
> >>
> >> On Thu, Aug 27, 2015 at 5:11 PM,   wrote:
> >> >
> >> >
> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett
> >> > 
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Thu, Aug 27, 2015 at 3:34 PM,   wrote:
> >> >> [snip]
> >> >> > I don't really see a problem with "codifying" the status quo.
> >> >>
> >> >> That's an excellent point.If we believe that the current
> situation
> >> >> is the best possible, both now and in the future, then codifying the
> >> >> status quo is an excellent idea.
> >> >>
> >> >> So, we should probably first start by asking ourselves:
> >> >>
> >> >> * what numpy is doing well;
> >> >> * what numpy could do better;
> >> >>
> >> >> and then ask, is there some way we could make it more likely we will
> >> >> improve over time.
> >> >>
> >> >> [snip]
> >> >>
> >> >> > As the current debate shows it's possible to have a public
> discussion
> >> >> > about
> >> >> > the direction of the project without having to delegate providing a
> >> >> > vision
> >> >> > to a president.
> >> >>
> >> >> The idea of a president that I had in mind, was not someone who makes
> >> >> all decisions, but the person who holds themselves responsible for
> the
> >> >> performance of the project.  If the project has a coherent vision
> >> >> already, the president has no need to provide one, but it's the
> >> >> president's job to worry about whether we have vision or not, and do
> >> >> what they need to, to make sure we don't lose track of that.   If you
> >> >> don't know it already, I highly recommend Jim Collins' work on 'level
> >> >> 5 leadership' [1]
> >> >
> >> >
> >> > Still doesn't sound like the need for a president to me
> >> >
> >> > " the person who holds themselves responsible for the
> >> > performance of the project"
> >> >
> >> > sounds more like the role of the "core" group (adding plural to
> persons)
> >> > to
> >> > me, and cannot be pushed of to an official president.
> >>
> >> Except that, in the past, having multiple people taking decisions has
> >> led to the situation where no-one feels themselves accountable for the
> >> result, hence this situation tends to lead to stagnation.
> >
> >
> > Is there any evidence for this?
>
> Oh - dear - that's the key point, but I'm obviously not making it
> clearly enough.  Yes there is, and that was the evidence I was
> pointing to before.
>
> But anyway - Sebastian is right - this discussion isn't going anywhere
> useful.
>
> So - let's step back.
>
> In thinking about governance, we first need to ask what we want to
> achieve.  This includes considering the risks ahead for the project.
>
> So, in the spirit of fruitful discussion, can I ask what y'all
> consider to be the current problems with working on numpy (other than
> the technical ones).   What is numpy doing well, and what is it doing
> badly? What risks do we have to plan for in the future?
>


Are you trying to prove the point that consensus doesn't work by making it
impossible to reach a consensus on this? ;-)


One thing we are doing very badly is leveraging resources outside of
contributions of work and time from individuals.  Getting sponsors to
finance work on what is the cornerstone of just about any Python package
that has to add two numbers together shouldn't be too hard, especially
seeing success stories like Jupyter's, who I believe has several paid
developers working full time.  That requires formalizing governance,
because apparently sponsors are a little wary of giving money to "people on
the internet". ;-)  Fernando Pérez was extremely emphatic about the size of
the opportunity NumPy was letting slip by not formalizing *any* governance
model.  And it is a necessary first step so that e.g. we have the money to,
say a year from now, get the right people together for a couple of days to
figure out a better governance model.  I'd argue that money would be better
spent financing a talented developer to advance e.g. Nathaniel's new dtype
system to end all dtype systems, but that's a different story.

Largely because of the above, even if Nathaniel's document involved tossing
a coin to resolve disputes, I'd rather have that now than something much
better never. Because there really is no alternative to Nathaniel's
write-up of the status quo, other than the status quo without a write-up:
it has taken him two months to put this draft together, **after** we agreed
over several hours of face to face discussion on what the model should be.
And I'm sure he has hated every minute he has had to put into it.  So if we
keep going around this in circles, after a few days we will all grow tired
and go back to fighting over whether indexing should transpose subspaces or
not, and all that other cool stuff we really enjoy. And a year from now we
will be in the same place we are now.

Re: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Jaime Fernández del Río
On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu  wrote:

> Dear list,
>
> This is forwarded from issue 6217
> https://github.com/numpy/numpy/issues/6217
>
> "What is the way to implement DESC ordering in the sorting routines of
> numpy?"
>
> (I am borrowing DESC/ASC from the SQL notation)
>
> For a stable DESC ordering sort, one can not  revert the sorted array via
> argsort()[::-1] .
>
> I propose the following API change to argsorts/sort. (haven't thought
> about lexsort yet) I will use argsort as an example.
>
> Currently, argsort supports sorting by keys ('order') and by 'axis'.
> These two somewhat orthonal interfaces need to be treated differently.
>
> 1. by axis.
>
> Since there is just one sorting key, a single 'reversed' keyword
> argument is sufficient:
>
> a.argsort(axis=0, kind='merge', reversed=True)
>
> Jaime suggested this can be implemented efficiently as a
> post-processing step.
> (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
> there a reference to the algorithm?
>

My thinking was that, for native types, you can stably reverse a sorted
permutation in-place by first reversing item-by-item, then reversing every
chunk of repeated entries. Sort of the way you would reverse the words in a
sentence in-place: first reverse every letter, then reverse everything
bounded by spaces:

TURN ME AROUND -> DNUORA EM NRUT -> AROUND EM NRUT -> AROUND ME NRUT ->
AROUND ME TURN
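
In NumPy terms, a rough sketch of that post-processing for a stable
descending argsort (illustrative only, not the proposed C implementation):

import numpy as np

def stable_descending_argsort(a):
    # "reverse every letter": flip a stable ascending argsort
    perm = np.argsort(a, kind='mergesort')[::-1]
    keys = a[perm]
    # "reverse every word": flip each run of repeated keys back,
    # restoring the original (stable) order within ties
    starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])
    ends = np.r_[starts[1:], len(a)]
    out = perm.copy()
    for s, e in zip(starts, ends):
        out[s:e] = perm[s:e][::-1]
    return out

print(stable_descending_argsort(np.array([3, 1, 3, 2])))  # [0 2 3 1]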

We could even add a type-specific function to do this for each of the
native types in the npy_sort library.

As I mentioned in Yu's very nice PR, it is probably best to leave the
signature of the function alone, and have something like order='desc' be
the trigger for the proposed reversed=True.

Jaime


>
> Because all of the sorting algorithms for 'atomic' dtypes are using
> _LT functions, a post processing step seems to be the only viable
> solution, if possible.


>
> 2. by fields, ('order' kwarg)
>
> A single 'reversed' keyword argument will not work, because some keys
> are ASC but others are DESC, for example, sorting my LastName ASC,
> then Salary DESC.
>
> a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
> ('Salary', 'desc'))])
>
> The parsing rule of order is:
>
> - if an item is a tuple, the first item is the fieldname, the second
> item is DESC/ASC
> - if an item is a scalar, the fieldname is the item, the ordering is ASC.
>
> This part of the code already goes to VOID_compare, which walks a
> temporary copy of a.dtype to call the comparison functions.
>
> If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
> ASC/DESC flags, offsets, and comparison functions can all be
> pre-compiled and passed into VOID_compare via c_metadata of the
> temporary type-descriptor.
>
> Just by looking at it, this will actually make VOID_compare faster by
> avoiding calls to several Python C-API functions. Negating the return
> value of cmp seems to be a negligible overhead in such a complex
> function.


> 3. If both reversed and order is given, the ASC/DESC fields in 'order'
> are effectively reversed.
>
> Any comments?
>
> Best,
>
> Yu
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-13 Thread Jaime Fernández del Río
On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum 
> wrote:
>
>>
>>
>> On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río <
>>> jaime.f...@gmail.com> wrote:
>>>
>>>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've been testing the package I spend most of my time on, yt, under
>>>>> numpy 1.10b1 since the announcement went out.
>>>>>
>>>>> I think I've narrowed down and fixed all of the test failures that
>>>>> cropped up except for one last issue. It seems that the behavior of
>>>>> np.digitize with respect to ndarray subclasses has changed since the NumPy
>>>>> 1.9 series. Consider the following test script:
>>>>>
>>>>> ```python
>>>>> import numpy as np
>>>>>
>>>>>
>>>>> class MyArray(np.ndarray):
>>>>> def __new__(cls, *args, **kwargs):
>>>>> return np.ndarray.__new__(cls, *args, **kwargs)
>>>>>
>>>>> data = np.arange(100)
>>>>>
>>>>> bins = np.arange(100) + 0.5
>>>>>
>>>>> data = data.view(MyArray)
>>>>>
>>>>> bins = bins.view(MyArray)
>>>>>
>>>>> digits = np.digitize(data, bins)
>>>>>
>>>>> print type(digits)
>>>>> ```
>>>>>
>>>>> Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the
>>>>> 1.10 beta, it prints "<class '__main__.MyArray'>"
>>>>>
>>>>> I'm curious why this change was made. Since digitize outputs index
>>>>> arrays, it doesn't make sense to me why it should return anything but a
>>>>> plain ndarray. I see in the release notes that digitize now uses
>>>>> searchsorted under the hood. Is this related?
>>>>>
>>>>
>>>> It is indeed searchsorted's fault, as it returns an object of the same
>>>> type as the needle (the items to search for):
>>>>
>>>> >>> import numpy as np
>>>> >>> class A(np.ndarray): pass
>>>> >>> class B(np.ndarray): pass
>>>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
>>>> B([0, 1, 2, 3, 4])
>>>>
>>>> I am all for making index-returning functions always return a base
>>>> ndarray, and will be more than happy to send a PR fixing this if there is
>>>> some agreement.
>>>>
>>>
>>> I think that is the right thing to do.
>>>
>>
>> Awesome, I'd appreciate having a PR to fix this. Arguably the return type
>> *could* be the same type as the inputs, but given that it's a behavior
>> change I agree that it's best to add a patch so the output of serachsorted
>> is "sanitized" to be an ndarray before it's returned by digitize.
>>
>
> It is relatively simple to do, just replace Py_TYPE(ap2) with
> &PyArray_Type in this line:
>
>
> https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725
>
> Then fix all the tests that are expecting searchsorted to return something
> else than a base ndarray. We already have modified nonzero to return base
> ndarray's in this release, see the release notes, so it will go with the
> same theme.
>
> For 1.11 I think we should try to extend this "if it returns an index, it
> will be a base ndarray" to all other functions that don't right now. Then
> sit back and watch AstroPy come down in flames... ;-)))
>
> Seriously, I think this makes a lot of sense, and should be documented as
> the way NumPy handles index arrays.
>
> Anyway, I will try to find time tonight to put this PR together, unless
> someone beats me to it, which I would be totally fine with.
>

PR #6206 it is: https://github.com/numpy/numpy/pull/6206

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-13 Thread Jaime Fernández del Río
On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum 
wrote:

>
>
> On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been testing the package I spend most of my time on, yt, under
>>>> numpy 1.10b1 since the announcement went out.
>>>>
>>>> I think I've narrowed down and fixed all of the test failures that
>>>> cropped up except for one last issue. It seems that the behavior of
>>>> np.digitize with respect to ndarray subclasses has changed since the NumPy
>>>> 1.9 series. Consider the following test script:
>>>>
>>>> ```python
>>>> import numpy as np
>>>>
>>>>
>>>> class MyArray(np.ndarray):
>>>> def __new__(cls, *args, **kwargs):
>>>> return np.ndarray.__new__(cls, *args, **kwargs)
>>>>
>>>> data = np.arange(100)
>>>>
>>>> bins = np.arange(100) + 0.5
>>>>
>>>> data = data.view(MyArray)
>>>>
>>>> bins = bins.view(MyArray)
>>>>
>>>> digits = np.digitize(data, bins)
>>>>
>>>> print type(digits)
>>>> ```
>>>>
>>>> Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the
>>>> 1.10 beta, it prints "<class '__main__.MyArray'>"
>>>>
>>>> I'm curious why this change was made. Since digitize outputs index
>>>> arrays, it doesn't make sense to me why it should return anything but a
>>>> plain ndarray. I see in the release notes that digitize now uses
>>>> searchsorted under the hood. Is this related?
>>>>
>>>
>>> It is indeed searchsorted's fault, as it returns an object of the same
>>> type as the needle (the items to search for):
>>>
>>> >>> import numpy as np
>>> >>> class A(np.ndarray): pass
>>> >>> class B(np.ndarray): pass
>>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
>>> B([0, 1, 2, 3, 4])
>>>
>>> I am all for making index-returning functions always return a base
>>> ndarray, and will be more than happy to send a PR fixing this if there is
>>> some agreement.
>>>
>>
>> I think that is the right thing to do.
>>
>
> Awesome, I'd appreciate having a PR to fix this. Arguably the return type
> *could* be the same type as the inputs, but given that it's a behavior
> change I agree that it's best to add a patch so the output of searchsorted
> is "sanitized" to be an ndarray before it's returned by digitize.
>

It is relatively simple to do, just replace Py_TYPE(ap2) with &PyArray_Type
in this line:

https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725

Then fix all the tests that expect searchsorted to return something other
than a base ndarray. We have already modified nonzero to return base
ndarrays in this release (see the release notes), so this goes with the
same theme.
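
For illustration, a minimal Python-level sketch of that sanitizing idea (the
helper name is hypothetical and it assumes monotonically increasing bins; the
real fix is the one-line change in the C source above):

```python
import numpy as np

def digitize_sanitized(x, bins, right=False):
    # For increasing bins, digitize is equivalent to this searchsorted call.
    side = 'left' if right else 'right'
    indices = np.searchsorted(bins, x, side=side)  # may be an ndarray subclass
    return np.asarray(indices)  # always hand back a base ndarray
```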

For 1.11 I think we should try to extend this "if it returns an index, it
will be a base ndarray" to all other functions that don't right now. Then
sit back and watch AstroPy come down in flames... ;-)))

Seriously, I think this makes a lot of sense, and should be documented as
the way NumPy handles index arrays.

Anyway, I will try to find time tonight to put this PR together, unless
someone beats me to it, which I would be totally fine with.

Jaime


>
> To answer Nathaniel's question, I opened an issue on yt's bitbucket page
> to record the test failures:
>
>
> https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-numpy-110-beta
>
> I've fixed two of the classes of errors in that bug in yt itself, since it
> looks like we were relying on buggy or deprecated behavior in NumPy. Here
> are the PRs for those fixes:
>
>
> https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start-index-to-int-arrays/diff
>
> https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_units-like/diff
>
>>
>> Chuck
>>
>>
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-12 Thread Jaime Fernández del Río
On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum 
wrote:

> Hi all,
>
> I've been testing the package I spend most of my time on, yt, under numpy
> 1.10b1 since the announcement went out.
>
> I think I've narrowed down and fixed all of the test failures that cropped
> up except for one last issue. It seems that the behavior of np.digitize
> with respect to ndarray subclasses has changed since the NumPy 1.9 series.
> Consider the following test script:
>
> ```python
> import numpy as np
>
>
> class MyArray(np.ndarray):
> def __new__(cls, *args, **kwargs):
> return np.ndarray.__new__(cls, *args, **kwargs)
>
> data = np.arange(100)
>
> bins = np.arange(100) + 0.5
>
> data = data.view(MyArray)
>
> bins = bins.view(MyArray)
>
> digits = np.digitize(data, bins)
>
> print type(digits)
> ```
>
> Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the
> 1.10 beta, it prints "<class '__main__.MyArray'>"
>
> I'm curious why this change was made. Since digitize outputs index arrays,
> it doesn't make sense to me why it should return anything but a plain
> ndarray. I see in the release notes that digitize now uses searchsorted
> under the hood. Is this related?
>

It is indeed searchsorted's fault, as it returns an object of the same type
as the needle (the items to search for):

>>> import numpy as np
>>> class A(np.ndarray): pass
>>> class B(np.ndarray): pass
>>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
B([0, 1, 2, 3, 4])

I am all for making index-returning functions always return a base ndarray,
and will be more than happy to send a PR fixing this if there is some
agreement.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem.

2015-08-07 Thread Jaime Fernández del Río
On Fri, Aug 7, 2015 at 2:33 AM, David Cournapeau  wrote:

> Which command exactly did you run to have that error ? Normally, the code
> in msvc9compiler should not be called if you call the setup.py with the
> mingw compiler as expected by distutils
>

FWIW, the incantation that works for me to compile numpy on Windows with
mingw is:

python setup.py config --compiler=mingw32 build --compiler=mingw32 install

but I am not sure I have ever tried it with Python 3.

I think my source for this was:

http://nipy.sourceforge.net/nipy/devel/devel/install/windows_scipy_build.html

Jaime


>
> On Fri, Aug 7, 2015 at 12:19 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 6, 2015 at 5:11 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau 
>>> wrote:
>>>
 Sorry if that's obvious, but do you have Visual Studio 2010 installed ?

 On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris <
 charlesr.har...@gmail.com> wrote:

> Anyone know how to fix this? I've run into it before and never got it
> figured out.
>
> [192.168.121.189:22] out:   File
> "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall
> [192.168.121.189:22] out:
> [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to
> find vcvarsall.bat")
> [192.168.121.189:22] out:
> [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError:
> Unable to find vcvarsall.bat
>
> Chuck
>
>
>
>>> I'm running numpy-vendor, which is running wine. I think it is all mingw
>>> with a few installed dll's. The error is coming from the Python distutils
>>> as part of `has_cblas`.
>>>
>>>
>> It's not impossible that we have changed the build somewhere along the
>> line.
>>
>> Chuck
>>
>>
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Jaime Fernández del Río
On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted  wrote:

> Hi,
>
> I have stumbled into this:
>
> In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
> np.int64), ('f1', np.int32)])
>
> In [63]: %timeit sa['f0'].sum()
> 100 loops, best of 3: 4.52 ms per loop
>
> In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
> np.int64), ('f1', np.int64)])
>
> In [65]: %timeit sa['f0'].sum()
> 1000 loops, best of 3: 896 µs per loop
>
> The first structured array is made of 12-byte records, while the second is
> made of 16-byte records, but the latter performs 5x faster.  Also, using a
> structured array that is made of 8-byte records is the fastest (expected):
>
> In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
> np.int64)])
>
> In [67]: %timeit sa['f0'].sum()
> 1000 loops, best of 3: 567 µs per loop
>
> Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
> quite well on unaligned data:
>
>
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>
> So, if 4 years-old Intel architectures do not have a penalty for unaligned
> access, why I am seeing that in NumPy?  That strikes like a quite strange
> thing to me.
>

I believe that the way numpy is set up, it never does unaligned access,
regardless of the platform, in case it gets run on one that would go up in
flames if you tried to. So my guess would be that you are seeing chunked
copies into a buffer, as opposed to bulk copying or no copying at all, and
that would explain your timing differences. But Julian or Sebastian can
probably give you a more informed answer.
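
As a quick check of that buffering hypothesis, here is a sketch (my addition,
not from the original exchange) that inspects the ALIGNED flag of the field
views:

```python
import numpy as np

# int64 field inside 12-byte records: the 12-byte stride is not a multiple of 8
sa12 = np.zeros(10, dtype=[('f0', np.int64), ('f1', np.int32)])
# int64 field inside 16-byte records: the 16-byte stride is a multiple of 8
sa16 = np.zeros(10, dtype=[('f0', np.int64), ('f1', np.int64)])

print(sa12['f0'].flags.aligned)  # False: reductions go through buffered copies
print(sa16['f0'].flags.aligned)  # True: reductions can run on the raw data
```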

Jaime


>
> Thanks,
> Francesc
>
> --
> Francesc Alted
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Open CV 3.0 + NPY_RELAXED_STRIDES

2015-06-11 Thread Jaime Fernández del Río
On Thu, Jun 11, 2015 at 2:44 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On Thu, Jun 11, 2015 at 11:39 AM, Sebastian Berg
>  wrote:
> > On Mi, 2015-06-10 at 21:03 -0600, Charles R Harris wrote:
> >>
> >>
> > 
> >>
> >>
> >>   * Relaxed stride checking will be the default in 1.10.0
> >> Is this still the plan?
> >>
> >>
> >> Yes, but it won't be quite the same as the master branch.  Currently
> >> an unusual value for the stride (?) is used in order to smoke out
> >> misuse, but that value will be more rational in the release.
> >>
> >
> > +1, it should not be as bad/common in practice once rolled out. That
> > said, I do not mind delaying things beyond 1.10, it might be better for
> > compatibility if someone gets a new numpy on top of oldish other
> > packages.
> > So I am good with planning to go ahead for the moment. But if anyone
> > complains, I would back down for 1.10 probably.
> >
> > - Sebastian
> >
>
> With opencv broken besides scipy.ndimage, I think we will have to delay it
> beyond 1.10, though we should have at least an alpha, maybe even a beta,
> with it enabled to induce some panic that hopefully will spur some fixes.
>

OpenCV shouldn't be broken any more if they merge this:

https://github.com/Itseez/opencv/pull/4117

I would appreciate a second set of eyes looking over the logic in that PR.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Open CV 3.0 + NPY_RELAXED_STRIDES

2015-06-10 Thread Jaime Fernández del Río
On Wed, Jun 10, 2015 at 5:53 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> I'm in the midst of a Python 3.5 + MSVS 2015 compilation frenzy. Today it
> was time for Open CV 3.0, where I found a nasty bug that I have eventually
> tracked down to using a development version of NumPy, and Open CV 3.0
>  choking on relaxed strides, as it does a check that every stride is a
> multiple of the itemsize.
>
> I was thinking of submitting a patch to opencv to fix this, but was
> wondering whether we have plans to eventually have relaxed strides out in
> the wild in user releases, or is it just a testing tool for development?
>

I see that in the release notes of 1.9 we had the following:


   - Relaxed stride checking will be the default in 1.10.0

Is this still the plan?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Open CV 3.0 + NPY_RELAXED_STRIDES

2015-06-10 Thread Jaime Fernández del Río
I'm in the midst of a Python 3.5 + MSVS 2015 compilation frenzy. Today it
was time for Open CV 3.0, where I found a nasty bug that I have eventually
tracked down to using a development version of NumPy, and Open CV 3.0
 choking on relaxed strides, as it does a check that every stride is a
multiple of the itemsize.

I was thinking of submitting a patch to opencv to fix this, but was
wondering whether we have plans to eventually have relaxed strides out in
the wild in user releases, or is it just a testing tool for development?
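
To make the failure mode concrete, a small sketch (my addition) of the kind of
array that can trip a `stride % itemsize == 0` check under relaxed stride
checking:

```python
import numpy as np

a = np.ones((1, 10))            # the first axis has length 1...
print(a.flags['C_CONTIGUOUS'])  # ...so the array is C-contiguous regardless
print(a.strides)                # of what that axis's stride happens to be
# Under relaxed stride checking (and especially in debug builds, which plant
# odd sentinel values there), the stride of a length-1 axis carries no
# information, so consumers must not assume it is a multiple of the itemsize.
```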

Jaime
-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy + Python 3.5 + Windows + VS2015

2015-06-09 Thread Jaime Fernández del Río
On Mon, Jun 8, 2015 at 3:11 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> I have just unsuccessfully tried to build numpy under Windows for Python
> 3.5, using the latest release candidate for Visual Studio 2015.
>
> A very early failure with a:
>
> RuntimeError: Broken toolchain: cannot link a simple C program
>
> even though repeating the sequence of commands that lead to the failure
> manually seems to work.
>
> Anyway, before diving deeper into this, has anyone tried this out already
> and have some success or failure stories to share?
>

I have finally managed to get this to compile. There are two places at
which things go wrong:

   1. The call to check_long_double_representation in numpy/core/setup.py.
   This tries to figure out the representation used by the compiler for long
   double by compiling C code declaring a struct with a char array, a long
   double, and another char array, initializing them to specific values, then
   parsing the obj file byte by byte to detect the sequence in the first and
   second char array, The sequences are there, but not in contiguous bytes,
   for some reason the compiler is adding 3 bytes between each of the bytes in
   the sequence. I bypassed this hardcoding the long double representation to
   IEEE_DOUBLE_LE.
   2. The call to generate_libraries in numpy/random/setup.py. This is
   supposed to compile and run a small c program to check if _WIN32 is defined
   by the compiler, in which case the 'Advapi32' library is linked. Haven't
   gone into the details, but that compile and run also fails, so the library
   was never getting added. I simply unconditionally added the library to get
   it working.

Once compiled there are something like 20 or 30 test failures, which I
haven't looked into in any detail. I was also getting a handful of
segfaults while running the tests, but it has stopped segfaulting now, even
though I have run the tests in a loop 100 times.

Not sure if we want to try to fix any of this for 1.10. It will probably be
the first release that people try to make work with Python 3.5 when the
final release comes out in September. But it is also hard to figure out how
many of these problems are caused by Python 3.5 itself, or by MSVC 2015,
which is still in RC phase.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NumPy + Python 3.5 + Windows + VS2015

2015-06-08 Thread Jaime Fernández del Río
I have just unsuccessfully tried to build numpy under Windows for Python
3.5, using the latest release candidate for Visual Studio 2015.

A very early failure with a:

RuntimeError: Broken toolchain: cannot link a simple C program

even though repeating the sequence of commands that lead to the failure
manually seems to work.

Anyway, before diving deeper into this, has anyone tried this out already
and have some success or failure stories to share?

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] binary wheels for numpy?

2015-05-15 Thread Jaime Fernández del Río
On Fri, May 15, 2015 at 6:56 PM,  wrote:

>
>
> On Fri, May 15, 2015 at 4:07 PM, Chris Barker 
> wrote:
>
>> Hi folks.,
>>
>> I did a little "intro to scipy" session as part of a larger Python class
>> the other day, and was dismayed to find that "pip install numpy" still
>> dosn't work on Windows.
>>
>> Thanks mostly to Matthew Brett's work, the whole scipy stack is
>> pip-installable on OS-X, it would be really nice if we had that for Windows.
>>
>> And no, saying "you should go get Python(x,y) or Anaconda, or Canopy,
>> or...) is really not a good solution. That is indeed the way to go if
>> someone is primarily focusing on computational programming, but if you have
>> a web developer, or someone new to Python for general use, they really
>> should be able to just grab numpy and play around with it a bit without
>> having to start all over again.
>>
>
> Unrelated to the pip/wheel discussion.
>
> In my experience by far the easiest to get something running to play with
> is using Winpython. Download and unzip (and maybe add to system path) and
> most of the data analysis stack is available.
>
> I haven't even bothered yet to properly install a full "system python" on
> my Windows machine. I'm just working with 3 winpython. (One even has Julia
> and IJulia included after following the installation instructions for a
> short time.)
>

+1 on WinPython. I have half a dozen "installations" of it, none registered
with Windows.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-12 Thread Jaime Fernández del Río
On Tue, May 12, 2015 at 1:17 AM, Stefan Otte  wrote:

> Hello,
>
> indeed I was looking for the cartesian product.
>
> I timed the two stackoverflow answers and the winner is not quite as clear:
>
> n_elements:10  cartesian  0.00427 cartesian2  0.00172
> n_elements:   100  cartesian  0.02758 cartesian2  0.01044
> n_elements:  1000  cartesian  0.97628 cartesian2  1.12145
> n_elements:  5000  cartesian 17.14133 cartesian2 31.12241
>
> (This is for two arrays as parameters: np.linspace(0, 1, n_elements))
> cartesian2 seems to be slower for bigger inputs.
>

On my system, the following variation on Pauli's answer is 2-4x faster than
his for your test cases:

def cartesian4(arrays, out=None):
    arrays = [np.asarray(x).ravel() for x in arrays]
    dtype = np.result_type(*arrays)

    n = np.prod([arr.size for arr in arrays])
    if out is None:
        out = np.empty((len(arrays), n), dtype=dtype)
    else:
        out = out.T

    for j, arr in enumerate(arrays):
        n //= arr.size  # floor division keeps n an integer for the reshape
        out.shape = (len(arrays), -1, arr.size, n)
        out[j] = arr[np.newaxis, :, np.newaxis]
    out.shape = (len(arrays), -1)

    return out.T
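
A quick sanity check of the sketch above:

```python
>>> cartesian4([np.array([1, 2]), np.array([10, 20, 30])])
array([[ 1, 10],
       [ 1, 20],
       [ 1, 30],
       [ 2, 10],
       [ 2, 20],
       [ 2, 30]])
```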


> I'd really appreciate if this was be part of numpy. Should I create a pull
> request?
>

There hasn't been any opposition, quite the contrary, so yes, I would go
ahead an create that PR. I somehow feel this belongs with the set
operations, rather than with the indexing ones. Other thoughts?

Also for consideration: should it work on flattened arrays? or should we
give it an axis argument, and then "broadcast on the rest", a la
generalized ufunc?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-10 Thread Jaime Fernández del Río
On Sun, May 10, 2015 at 4:40 AM, Stefan Otte  wrote:

> Hey,
>
> quite often I want to evaluate a function on a grid in a n-D space.
> What I end up doing (and what I really dislike) looks something like this:
>
>   x = np.linspace(0, 5, 20)
>   M1, M2 = np.meshgrid(x, x)
>   X = np.column_stack([M1.flatten(), M2.flatten()])
>   X.shape  # (400, 2)
>
>   fancy_function(X)
>
> I don't think I ever used `meshgrid` in any other way.
> Is there a better way to create such a grid space?
>
> I wrote myself a little helper function:
>
>   def gridspace(linspaces):
>       return np.column_stack([space.flatten()
>                               for space in np.meshgrid(*linspaces)])
>
> But maybe something like this should be part of numpy?
>

Isn't what you are trying to build a cartesian product function? There is a
neat, efficient implementation of such a function on StackOverflow, by our
own pv.:

http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363

Perhaps we could make this part of numpy.lib.arraysetops? Is there room for
other combinatoric generators, i.e. combinations, permutations... as in
itertools?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create a n-D grid; meshgrid alternative

2015-05-10 Thread Jaime Fernández del Río
On Sun, May 10, 2015 at 7:05 AM, Stefan Otte  wrote:

> I just drafted different versions of the `gridspace` function:
>
> https://tmp23.tmpnb.org/user/1waoqQ8PJBJ7/notebooks/2015-05%20gridspace.ipynb


The link seems to be broken...

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Jaime Fernández del Río
There is a reported bug (issue #5837) regarding different returns from
np.nonzero with 1-D vs higher dimensional arrays. A full summary of
the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()

>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base

>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
   [0, 2],
   [1, 0],
   [1, 1],
   [1, 2]])

The original bug report was only concerned with the non-writeability of
higher dimensional array returns, but there are more differences: 1-D
always returns an ndarray that owns its memory and is writeable, but higher
dimensional arrays return views, of the type of the original array, that
are non-writeable.

I have a branch that attempts to fix this by making both 1-D and n-D arrays:

   1. return a view, never the base array,
   2. return an ndarray, never a subclass, and
   3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change
breaks a few tests. I nevertheless think that all of the index returning
functions (nonzero, argsort, argmin, argmax, argpartition) should always
return a bare ndarray, not a subclass. I'd be happy to be corrected, but I
can't think of any situation in which preserving the subclass would be
needed for these functions.

Since we are changing the returns of a few other functions in 1.10
(diagonal, diag, ravel), it may be a good moment to revisit the behavior
for these other functions. Any thoughts?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to set a fixed sized dtype suitable for bitwise operations

2015-04-28 Thread Jaime Fernández del Río
On Tue, Apr 28, 2015 at 7:00 AM, Benjamin Root  wrote:

> I have a need to have a numpy array of 17 byte (more specifically, at
> least 147 bits) values that I would be doing some bit twiddling on. I have
> found that doing a dtype of "i17" yields a dtype of int32, which is
> completely not what I intended. Doing 'u17' gets an "data type not
> understood". I have tried 'a17', but then bitwise_or() and left_shift() do
> not work (returns "NotImplemented").
>

> How should I be going about this?
>

The correct type to use would be a void dtype:

>>> dt = np.dtype('V17')
>>> dt.itemsize
17

Unfortunately, it does not support bitwise operations either, which seems
like an oddity to me:

>>> a = np.empty(2, dt)
>>> a[0] = 'abcdef'
>>> a[1] = bytearray([64, 56, 78])
>>> a[0] | a[1]
Traceback (most recent call last):
  File "", line 1, in 
TypeError: unsupported operand type(s) for |: 'numpy.void' and 'numpy.void'

Any fundamental reason for this?
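
In the meantime, a workaround sketch (my addition): view the records as rows
of bytes and twiddle those. Carry-free operations like OR and AND translate
directly, while shifts would need extra work to propagate bits across byte
boundaries:

```python
import numpy as np

a = np.zeros(2, np.dtype('V17'))
u = a.view(np.uint8).reshape(len(a), 17)  # one 17-byte row per record
u[0] |= u[1]   # bytewise OR of record 1 into record 0
u[0] &= 0x0F   # bytewise AND against a mask
```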

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore  wrote:

> This blog post, and the links within also seem relevant.  Appears to have
> python code available to try things out as well.
>
>
> https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest
>

Very cool indeed... The original works is licensed under an Apache 2.0
license (https://github.com/tdunning/t-digest/blob/master/LICENSE). I am
not fluent in legalese, so I am not sure whether that means we can use it or
not; it seems awfully more complicated than what we normally use.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar  wrote:

> You got it.  I remember this from when I worked at Google and we would
> process (many many) logs.  With enough bins, the approximation is still
> really close.  It's great if you want to make an automatic plot of data.
> Calling numpy.partition a hundred times is probably slower than calling P^2
> with n=100 bins.  I don't think it does O(n) computations per point.  I
> think it's more like O(log(n)).
>

Looking at it again, it probably is O(n) after all: it does a binary
search, which is O(log n), but it then goes on to update all the n bin
counters and estimations, so O(n) I'm afraid. So there is no algorithmic
advantage over partition/percentile: if there are m samples and n bins, P^2
does that O(n) update m times, while partition does O(m) work n times, so
both end up being O(m n). It seems to me that the big draw of P^2 is not
having to hold the
full dataset in memory. Online statistics (is that the name for this?),
even if only estimations, is a cool thing, but I am not sure numpy is the
place for them. That's not to say that we couldn't eventually have P^2
implemented for histogram, but I would start off with a partition based one.

Would SciPy have a place for online statistics? Perhaps there's room for
yet another scikit?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar  wrote:

> Yeah, I'm not arguing, I'm just curious about your reasoning.  That
> explains why not C++.  Why would you want to do this in C and not Python?
>

Well, the algorithm has to iterate over all the inputs, updating the
estimated percentile positions at every iteration. Because the estimated
percentiles may change in every iteration, I don't think there is an easy
way of vectorizing the calculation with numpy. So I think it would be very
slow if done in Python.

Looking at this in some more detail, how is this typically used? Because
it gives you approximate values that should split your sample into
similarly filled bins, but because the values are approximate, to compute a
proper histogram you would still need to do the binning to get the exact
results, right? Even with this drawback P^2 does have an algorithmic
advantage, so for huge inputs and many bins it should come out ahead. But for
many medium sized problems it may be faster to simply use np.partition,
which gives you the whole thing in a single go. And it would be much
simpler to implement.
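
For concreteness, the partition-based alternative takes only a few lines (a
sketch, my addition; np.percentile partitions under the hood):

```python
import numpy as np

data = np.random.randn(100000)
nbins = 100
# exact equal-frequency bin edges, computed in one shot
edges = np.percentile(data, np.linspace(0, 100, nbins + 1))
counts, _ = np.histogram(data, bins=edges)
```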

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar  wrote:

> If you're going to C, is there a reason not to go to C++ and include the
> already-written Boost code?  Otherwise, why not use Python?
>

I think we have an explicit rule against C++, although I may be wrong. I am
not sure how much of Boost we would have to make part of numpy to use that:
the whole accumulators lib, I'm guessing? That seems like an awful lot given
what we are after.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Jaime Fernández del Río
On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith  wrote:

> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar 
> wrote:
> > Can I suggest that we instead add the P-square algorithm for the dynamic
> > calculation of histograms?
> > (
> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
> )
> >
> > This is already implemented in C++'s boost library
> > (
> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
> )
> >
> > I implemented it in Boost Python as a module, which I'm happy to share.
> > This is much better than fixed-width histograms in practice.  Rather than
> > adjusting the number of bins, it adjusts what you really want, which is
> the
> > resolution of the bins throughout the domain.
>
> This definitely sounds like a useful thing to have in numpy or scipy
> (though if it's possible to do without using Boost/C++ that would be
> nice). But yeah, we should leave the existing histogram alone (in this
> regard) and add a new name for this like "adaptive_histogram" or
> something. Then you can set about convincing matplotlib and friends to
> use it by default :-)
>

Would having a negative number of bins mean "this many, but with optimized
boundaries" be too clever an interface?

I have taken a look at the paper linked, and the P-2 algorithm would not be
too complicated to implement from scratch, although it would require
writing some C code I'm afraid.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Jaime Fernández del Río
On Sun, Apr 12, 2015 at 12:19 AM, Varun  wrote:

>
> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb
>
> Long story short, histogram visualisations that depend on numpy (such as
> matplotlib, or nearly all of them) have poor default behaviour, as I have
> to constantly play around with the number of bins to get a good idea of
> what I'm looking at. The bins=10 works ok for up to 1000 points or very
> normal data, but has poor performance for anything else, and doesn't
> account for variability either. I don't have a method easily available to
> scale the number of bins given the data.
>
> R doesn't suffer from these problems and provides methods for use with its
> hist method. I would like to provide similar functionality for matplotlib,
> to at least provide some kind of good starting point, as histograms are
> very useful for initial data discovery.
>
> The notebook above provides an explanation of the problem as well as some
> proposed alternatives. Use different datasets (type and size) to see the
> performance of the suggestions. All of the methods proposed exist in R and
> the literature.
>
> I've put together an implementation to add this new functionality, but am
> hesitant to make a pull request as I would like some feedback from a
> maintainer before doing so.
>

+1 on the PR.
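
For readers without the notebook at hand, one of the standard rules from that
literature (Freedman-Diaconis, also offered by R's hist) looks roughly like
this sketch (my addition, helper name hypothetical):

```python
import numpy as np

def fd_nbins(data):
    data = np.asarray(data)
    iqr = np.subtract(*np.percentile(data, [75, 25]))  # interquartile range
    width = 2.0 * iqr * data.size ** (-1.0 / 3.0)      # FD bin width
    if width <= 0:
        return 10  # degenerate data: fall back to the old default
    return int(np.ceil((data.max() - data.min()) / width))
```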

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

2015-04-05 Thread Jaime Fernández del Río
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> I have an all-Python implementation of an OrthogonalIndexer class, loosely
> based on Stephan's code plus some axis remapping, that provides all the
> needed functionality for getting and setting with orthogonal indices.
>
> Would those interested rather see it as a gist to play around with, or as
> a PR adding an orthogonally indexable `.ix_` attribute to ndarray?
>

A PR it is, #5749 <https://github.com/numpy/numpy/pull/5749> to be precise.
I think it has all the bells and whistles: integers, boolean and integer
1-D arrays, slices, ellipsis, and even newaxis, both for getting and
setting. No tests yet, so correctness of the implementation is dubious at
best. As a small example:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_

>>> a.ix_[[0, 1], :, [True, False, True, False, True]]
array([[[ 0,  2,  4],
[ 5,  7,  9],
[10, 12, 14],
[15, 17, 19]],

   [[20, 22, 24],
[25, 27, 29],
[30, 32, 34],
[35, 37, 39]]])
>>> a.ix_[[0, 1], :, [True, False, True, False, True]] = 0
>>> a
array([[[ 0,  1,  0,  3,  0],
[ 0,  6,  0,  8,  0],
[ 0, 11,  0, 13,  0],
[ 0, 16,  0, 18,  0]],

   [[ 0, 21,  0, 23,  0],
[ 0, 26,  0, 28,  0],
[ 0, 31,  0, 33,  0],
[ 0, 36,  0, 38,  0]],

   [[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

2015-04-03 Thread Jaime Fernández del Río
I have an all-Python implementation of an OrthogonalIndexer class, loosely
based on Stephan's code plus some axis remapping, that provides all the
needed functionality for getting and setting with orthogonal indices.

Would those interested rather see it as a gist to play around with, or as a
PR adding an orthogonally indexable `.ix_` attribute to ndarray?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett 
wrote:

> Hi,
>
> On Thu, Apr 2, 2015 at 6:09 PM,   wrote:
> > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing  wrote:
> >> On 2015/04/02 1:14 PM, Hanno Klemm wrote:
> >>> Well, I have written quite a bit of code that relies on fancy
> >>> indexing, and I think the question, if the behaviour of the []
> >>> operator should be changed has sailed with numpy now at version 1.9.
> >>> Given the amount packages that rely on numpy, changing this
> >>> fundamental behaviour would not be a clever move.
> >>
> >> Are you *positive* that there is no clever way to make a transition?
> >> It's not worth any further thought?
> >
> > I guess it would be similar to python 3 string versus bytes, but
> > without the overwhelming benefits.
> >
> > I don't think I would be in favor of deprecating fancy indexing even
> > if it were possible. In general, my impression is that if there is a
> > trade-off in numpy between powerful machinery versus easy to learn and
> > teach, then the design philosophy when in favor of power.
> >
> > I think numpy indexing is not too difficult and follows a consistent
> > pattern, and I completely avoid mixing slices and index arrays with
> > ndim > 2.
>
> I'm sure y'all are totally on top of this, but for myself, I would
> like to distinguish:
>
> * fancy indexing with boolean arrays - I use it all the time and don't
> get confused;
> * fancy indexing with non-boolean arrays - horrendously confusing,
> almost never use it, except on a single axis when I can't confuse it
> with orthogonal indexing:
>
> In [3]: a = np.arange(24).reshape(6, 4)
>
> In [4]: a
> Out[4]:
> array([[ 0,  1,  2,  3],
>[ 4,  5,  6,  7],
>[ 8,  9, 10, 11],
>[12, 13, 14, 15],
>[16, 17, 18, 19],
>[20, 21, 22, 23]])
>
> In [5]: a[[1, 2, 4]]
> Out[5]:
> array([[ 4,  5,  6,  7],
>[ 8,  9, 10, 11],
>[16, 17, 18, 19]])
>
> I also remember a discussion with Travis O where he was also saying
> that this indexing was confusing and that it would be good if there
> was some way to transition to what he called outer product indexing (I
> think that's the same as 'orthogonal' indexing).
>
> > I think it should be DOA, except as a discussion topic for numpy 3000.
>
> I think there are two proposals here:
>
> 1) Add some syntactic sugar to allow orthogonal indexing of numpy
> arrays, no backward compatibility break.
>
> That seems like a very good idea to me - were there any big objections to
> that?
>
> 2) Over some long time period, move the default behavior of np.array
> non-boolean indexing from the current behavior to the orthogonal
> behavior.
>
> That is going to be very tough, because it will cause very confusing
> breakage of legacy code.
>
> On the other hand, maybe it is worth going some way towards that, like
> this:
>
> * implement orthogonal indexing as a method arr.sensible_index[...]
> * implement the current non-boolean fancy indexing behavior as a
> method - arr.crazy_index[...]
> * deprecate non-boolean fancy indexing as standard arr[...] indexing;
> * wait a long time;
> * remove non-boolean fancy indexing as standard arr[...] (errors are
> preferable to change in behavior)
>
> Then if we are brave we could:
>
> * wait a very long time;
> * make orthogonal indexing the default.
>
> But the not-brave steps above seem less controversial, and fairly
> reasonable.
>
> What about that as an approach?
>

Your option 1 was what was being discussed before the posse was assembled
to bring fancy indexing before justice... ;-)

My background is in image processing, and I have used fancy indexing in all
its fanciness far more often than orthogonal or outer product indexing. I
actually have a vivid memory of the moment I fell in love with NumPy: after
seeing a code snippet that ran a huge image through a look-up table by
indexing the LUT with the image. Beautifully simple. And here is a younger
me, learning to ride NumPy without the training wheels.

Another obvious use case that you can find all over the place in
scikit-image is drawing a curve on an image from the coordinates.

If there is such strong agreement on an orthogonal indexer, we might as
well go ahead and implement it. But before considering any bolder steps, we
should probably give it a couple of releases to see how many people out
there really use it.

Jaime

P.S. As an aside on the remapping of axes when arrays and slices are mixed,
there really is no better way. Once you realize that the array indexing a
dimension does not have to be 1-D, it becomes clear that what seems like
the obvious way does not generalize. E.g.:

One may rightfully think that:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)

should not reorder the axes, and return an array of shape (1, 2, 3). But
what do you do in the following case?

>>> idx0 

Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer  wrote:

> On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> Is there any other package implementing non-orthogonal indexing aside
>> from numpy?
>>
>
> I think we can safely say that NumPy's implementation of broadcasting
> indexing is unique :).
>
> The issue is that many other packages rely on numpy for implementation of
> custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
> immediately obvious what sort of indexing these objects represent.
>
>> If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all
>> for improving that to provide the full functionality of "orthogonal
>> indexing". I just need a little more convincing that those new
>> attributes/indexers are going to ever see any real use.
>>
>
> Orthogonal indexing is close to the norm for packages that implement
> labeled data structures, both because it's easier to understand and
> implement, and because it's difficult to maintain associations with labels
> through complex broadcasting indexing.
>
> Unfortunately, the lack of a full featured implementation of orthogonal
> indexing has lead to that wheel being reinvented at least three times (in
> Iris, xray [1] and pandas). So it would be nice to have a canonical
> implementation that supports slices and integers in numpy for that reason
> alone. This could be done by building on the existing `np.ix_` function,
> but a new indexer seems more elegant: there's just much less noise with
> `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
>
> It's also well known that indexing with __getitem__ can be much slower
> than np.take. It seems plausible to me that a careful implementation of
> orthogonal indexing could close or eliminate this speed gap, because the
> model for orthogonal indexing is so much simpler than that for broadcasting
> indexing: each element of the key tuple can be applied separately along the
> corresponding axis.
>
> So I think there could be a real benefit to having the feature in numpy.
> In particular, if somebody is up for implementing it in C or Cython, I
> would be very pleased.
>
>  Cheers,
> Stephan
>
> [1] Here is my implementation of remapping from orthogonal to broadcasting
> indexing. It works, but it's a real mess, especially because I try to
> optimize by minimizing the number of times slices are converted into arrays:
>
> https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68
>
>
I believe you can leave all slices unchanged if you later reshuffle your
axes. Basically all the fancy-indexed axes go in the front of the shape in
order, and the subspace follows, e.g.:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)

So you would need to swap the second and last axes and be done. You would
not get a contiguous array without a copy, but that's a different story.
Assigning to an orthogonally indexed subarray is an entirely different
beast, not sure if there is a use case for that.

We probably need more traction on the "should this be done?" discussion
than on the "can this be done?" one. The need for a reordering of the axes
swings me slightly in favor, but I mostly don't see it yet. Nathaniel
usually has good insights on who we are, where do we come from, where are
we going to, type of questions, would be good to have him chime in.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Adding 'where' to ufunc methods?

2015-04-01 Thread Jaime Fernández del Río
This question on StackOverflow:

http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal

Got me thinking that I had finally found a use for the 'where' kwarg of
ufuncs. Unfortunately it is only provided for the ufunc itself, but not for
any of its methods.

Is there any fundamental reason these were not implemented back in the day?
Any frontal opposition to having them now?
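
To make the use case concrete (my addition; the reduce call is exactly the
missing method form being asked about, so it is hypothetical here):

```python
import numpy as np

a = np.arange(16, dtype=float).reshape(4, 4)
mask = ~np.eye(4, dtype=bool)  # True everywhere except the diagonal

# What the question is after, if ufunc methods accepted 'where':
# np.minimum.reduce(a, axis=None, where=mask, initial=np.inf)

# The workaround with plain boolean indexing:
print(a[mask].min())  # 1.0
```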

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

2015-04-01 Thread Jaime Fernández del Río
On Wed, Apr 1, 2015 at 2:17 AM, R Hattersley  wrote:

> There are two different interpretations in common use of how to handle
> multi-valued (array/sequence) indexes. The numpy style is to consider all
> multi-valued indices together which allows arbitrary points to be
> extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to
> consider each multi-valued index independently.
>
> For example:
>
> >>> type(v)
> <class 'netCDF4.Variable'>
> >>> v.shape
> (240, 37, 49)
> >>> v[(0, 1), (0, 2, 3)].shape
> (2, 3, 49)
> >>> np.array(v)[(0, 1), (0, 2, 3)].shape
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: shape mismatch: indexing arrays could not be broadcast
> together with shapes (2,) (3,)
>
>
> In a netcdf4-python GitHub issue, the authors of
> various orthogonal indexing packages have been discussing how to
> distinguish the two behaviours and have currently settled on a boolean
> __orthogonal_indexing__ attribute.
>
> 1. Is there any appetite for adding that attribute (with the value
> `False`) to ndarray?
>
> 2. As suggested by shoyer, is there any appetite for adding an alternative
> indexer to ndarray where
> __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]
>

Is there any other package implementing non-orthogonal indexing aside from
numpy? I understand that it would be nice to do:

if x.__orthogonal_indexing__:
return x[idx]
else:
return x.ix_[idx]

But I think you would get the exact same result doing:

if isinstance(x, np.ndarray):
return x[np.ix_(*idx)]
else:
return x[idx]

If `not x.__orthogonal_indexing__` is going to be a proxy for
`isinstance(x, ndarray)` I don't really see the point of disguising it,
explicit is better than implicit and all that.

If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all
for improving that to provide the full functionality of "orthogonal
indexing". I just need a little more convincing that those new
attributes/indexers are going to ever see any real use.
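
That said, extending `np.ix_` to accept slices is not much work at the Python
level; a rough sketch (my addition, helper name hypothetical):

```python
import numpy as np

def ix_with_slices(arr, *keys):
    expanded = []
    for key, n in zip(keys, arr.shape):
        if isinstance(key, slice):
            expanded.append(np.arange(*key.indices(n)))  # materialize the slice
        else:
            expanded.append(np.atleast_1d(key))  # note: scalars keep their axis
    return arr[np.ix_(*expanded)]

a = np.arange(60).reshape(3, 4, 5)
print(ix_with_slices(a, slice(None, 1), 2, [3]).shape)  # (1, 1, 1)
```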

Jaime

>
> Richard
>
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rename arguments to np.clip and np.put

2015-03-30 Thread Jaime Fernández del Río
On Mon, Mar 30, 2015 at 3:59 PM, Allan Haldane 
wrote:

> Hello everyone,
>
> What does the list think of renaming the arguments of np.clip and np.put
> to match those of ndarray.clip/put? Currently the signatures are
>
> np.clip(a, a_min, a_max, out=None)
> ndarray.clip(a, min=None, max=None, out=None)
>
> np.put(a, ind, v, mode='raise')
> ndarray.put(indices, values, mode='raise')
>
> (The docstring for ndarray.clip is incorrect, too).
>
> I suggest the signatures might be changed to this:
>
> np.clip(a, min=None, max=None, out=None, **kwargs)
> np.put(a, indices, values, mode='raise')
>
> We can still take care of the old argument names for np.clip using
> **kwargs, while showing a deprecation warning. I think that would be
> fully back-compatible. Note this makes np.clip more flexible as only one
> of min or max are needed now, just like ndarray.clip.
>
> np.put is trickier to keep back-compatible as it has two positional
> arguments. Someone who called `np.put(a, v=0, ind=1)` would be in
> trouble with this change, although I didn't find anyone on github doing
> so. I suppose to maintain back-compatibility we could make indices and
> values keyword args, and use the same kwargs trick as in np.clip, but
> that might be confusing since they're both required args.
>

Ideally we would want the signature to show as you describe it in the
documentation, but during the deprecation period be something like e.g.

np.put(a, indices=None, values=None, mode='raise', **kwargs)

Not sure if that is even possible, maybe with functools.update_wrapper?
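
For what it is worth, a minimal sketch of the **kwargs trick for np.clip (my
addition; the warning text and the final dispatch are illustrative only):

```python
import warnings
import numpy as np

def clip(a, min=None, max=None, out=None, **kwargs):
    if 'a_min' in kwargs:
        warnings.warn("a_min is deprecated, use min instead",
                      DeprecationWarning, stacklevel=2)
        min = kwargs.pop('a_min')
    if 'a_max' in kwargs:
        warnings.warn("a_max is deprecated, use max instead",
                      DeprecationWarning, stacklevel=2)
        max = kwargs.pop('a_max')
    if kwargs:
        raise TypeError("clip() got unexpected keyword arguments %r"
                        % sorted(kwargs))
    return np.asanyarray(a).clip(min, max, out=out)
```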

Jaime


>
> Any opinions or suggestions?
>
> Allan
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] flags attribute of ndarrays`

2015-03-26 Thread Jaime Fernández del Río
Hi,

I have just submitted a longish issue on the github repo, #5721. After
going over the flags attribute of ndarrays in quite some detail, I have
found several inconsistencies with the C API and would like to make some
changes. The details are in the issue, but as a high-level summary:

   1. arr.flags.farray is almost certainly broken, and returns some useless
   result. I would like to make it behave like the C API's PyArray_ISFARRAY,
   even though that doesn't seem to be the intention of the original coder.
   2. arr.flags.fortran is inconsistent with the C API's PyArray_ISFORTRAN.
   I think it should be modified to match it, but understand this may be too
   much of a backwards compatibility breach.
   3. I would like for `arr.flags` to truly behave as a mutable property of
   `arr`. An explanation can be found on another thread's discussion with
   Benjamin Root earlier today.
   4. I would like to match the Python and C version's of these as much as
   possible, and avoid future deviation, by actually using the C versions in
   the Python ones, even if this may introduce subtle behavior changes.

Feedback is very welcome.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do you find this behavior surprising?

2015-03-25 Thread Jaime Fernández del Río
On Wed, Mar 25, 2015 at 1:45 PM, Benjamin Root  wrote:

> I fail to see the wtf.
>

> flags = a.flags
>
> So, "flags" at this point is just an alias to "a.flags", just like any
> other variable in python
>
> "flags.writeable = False" would then be equivalent to "a.flags.writeable =
> False". There is nothing numpy-specific here. a.flags is mutable object.
> This is how Python works.
>
> Ben Root
>

Ah, yes indeed. If you think of it that way it does make all the sense in
the world.

But of course that is not what is actually going on: flags is a single C
int in the PyArrayObject struct, and a.flags is just a proxy built from it.
Great coding contortions have to be made to have changes to the proxy
written back into the owner array.

I guess then the surprising behavior is this other one, which was the one I
(wrongly) expected intuitively:

>>> a = np.arange(10)
>>> flags = a.flags
>>> a.flags.writeable = False
>>> flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This could be fixed to work properly, although it is probably not worth
worrying much.

Properties of properties are weird...

Jaime
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Do you find this behavior surprising?

2015-03-25 Thread Jaime Fernández del Río
>>> import numpy as np
>>> a = np.arange(10)
>>> flags = a.flags
>>> flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> flags.writeable = False
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : False  <--- WTF!!??
  ALIGNED : True
  UPDATEIFCOPY : False

I understand why this is happening, and that there is no other obvious way
to make

a.flags.writeable = False

work than to have the return of a.flags linked to a under the hood.

But I don't think this is documented anywhere, and wonder if perhaps it
should.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-16 Thread Jaime Fernández del Río
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer 
wrote:

> On Mon, 16 Mar 2015 06:56:58 -0700
> Jaime Fernández del Río  wrote:
>
> > Dispatching to a different method seems like a no brainer indeed. The
> > question is whether we really need to do this in C.
>
> I need to do both unweighted & weighted histograms and we got a factor 5
> using (simple) cython:
> it is in the proceedings of Euroscipy, last year.
> http://arxiv.org/pdf/1412.6367.pdf


If I read your paper and code properly, you got 5x faster, mostly because
you combined the weighted and unweighted histograms into a single search of
the array, and because you used an algorithm that can only be applied to
equal-sized bins, similarly to the 10x speed-up Robert was reporting.

I think that having a special path for equal sized bins is a great idea:
let's do it, PRs are always welcome!
Similarly, getting the counts together with the weights seems like a very
good idea.

I also think that writing it in Python is going to take us 80% of the way
there: most of the improvements both of you have reported are not likely to
be coming from the language chosen, but from the algorithm used. And if C
proves to be sufficiently faster to warrant using it, it should be confined
to the number crunching: I don't think there is any point in rewriting
argument parsing in C.
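
Roughly, such a fast path might look like this in pure NumPy (a minimal
sketch for float data, ignoring the dtype generality discussed next;
`hist_uniform` is a made-up name, not an existing function):

import numpy as np

def hist_uniform(a, nbins, lo, hi, weights=None):
    # Equal-width bins: each value's bin comes from one vectorized
    # multiply instead of a per-element binary search.
    a = np.asarray(a, dtype=np.float64)
    keep = (a >= lo) & (a <= hi)          # histogram range is inclusive
    idx = ((a[keep] - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx[idx == nbins] = nbins - 1         # values equal to hi go in the last bin
    w = None if weights is None else np.asarray(weights, np.float64)[keep]
    return np.bincount(idx, weights=w, minlength=nbins)

Most of the reported gains seem to come from replacing the sort/search pass
with that single bincount, not from the implementation language.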

Also, keep in mind `np.histogram` can now handle arrays of just about
**any** dtype. Handling that complexity in C is not a ride in the park.
Other functions like `np.bincount` and `np.digitize` cheat by only handling
`double` typed arrays, a luxury that histogram probably can't afford at
this point in time.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-16 Thread Jaime Fernández del Río
On Sun, Mar 15, 2015 at 11:06 PM, Robert McGibbon 
wrote:

> It might make sense to dispatch to difference c implements if the bins are
> equally spaced (as created by using an integer for the np.histogram bins
> argument), vs. non-equally-spaced bins.
>

Dispatching to a different method seems like a no brainer indeed. The
question is whether we really need to do this in C. Maybe for some very
specific case or cases it makes sense to have a super fast C path, e.g. no
weights and bins is an integer. Even then, rather than rewriting the whole
thing in C, it may be a better idea to leave the parsing of the inputs in
Python, and have a C helper function wrapped and privately exposed,
similarly to how `np.core.multiarray.interp` is used by `np.interp`.

But I would still first give it a try in Python...

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-15 Thread Jaime Fernández del Río
On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon  wrote:

> Hi,
>
> Numpy.histogram is implemented in python, and is a little sluggish. This
> has been discussed previously on the mailing list, [1, 2]. It came up in a
> project that I maintain, where a new feature is bottlenecked by
> numpy.histogram, and one developer suggested a faster implementation in
> cython [3].
>
> Would it make sense to reimplement this function in c? or cython? Is
> moving functions like this from python to c to improve performance within
> the scope of the development roadmap for numpy? I started implementing this
> a little bit in c, [4] but I figured I should check in here first.
>

Where do you think the performance gains will come from? The PR in your
project that claims a 10x speed-up uses a method that is only fit for
equally spaced bins. I want to think that implementing that exact same
algorithm in Python with NumPy would be comparably fast, say within 2x.

For the general case, NumPy is already doing most of the heavy lifting (the
sorting and the searching) in C: simply replicating the same algorithmic
approach entirely in C is unlikely to provide any major speed-up. And if
the change is to the algorithm, then we should first try it out in Python.

That said, if you can speed things up 10x, I don't think there is going to
be much opposition to moving it to C!

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.10

2015-03-12 Thread Jaime Fernández del Río
On Thu, Mar 12, 2015 at 10:16 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Sun, Mar 8, 2015 at 3:43 PM, Ralf Gommers 
> wrote:
>
>>
>>
>> On Sat, Mar 7, 2015 at 12:40 AM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Time to start thinking about numpy 1.10.
>>>
>>
>> Sounds good. Do we have a volunteer for release manager already?
>>
>
> I guess it is my turn, unless someone else wants the experience.
>

What does a release manager do? I will eventually want to be able to tell
my grandchildren that I once managed a numpy release, but am not sure if I
can successfully handle it on my own right now. I will probably need to up
my git foo, which is nothing to write home about...

Maybe for this one I can sign up for release minion, so you have someone to
offload menial tasks?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy array casting ruled not safe

2015-03-07 Thread Jaime Fernández del Río
On Sat, Mar 7, 2015 at 1:52 PM, Charles R Harris 
wrote:

>
>
> On Sat, Mar 7, 2015 at 2:45 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sat, Mar 7, 2015 at 2:02 PM, Dinesh Vadhia 
>> wrote:
>>
>>>   This was originally posted on SO (
>>> https://stackoverflow.com/questions/28853740/numpy-array-casting-ruled-not-safe)
>>> and it was suggested it is probably a bug in numpy.take.
>>>
>>> Python 2.7.8 |Anaconda 2.1.0 (32-bit)| (default, Jul  2 2014, 15:13:35)
>>> [MSC v.1500 32 bit (Intel)] on win32
>>> Type "copyright", "credits" or "license()" for more information.
>>>
>>> >>> import numpy
>>> >>> numpy.__version__
>>> '1.9.2'
>>>
>>> >>> a = numpy.array([9, 7, 5, 4, 3, 1], dtype=numpy.uint32)
>>> >>> b = numpy.array([1, 3], dtype=numpy.uint32)
>>> >>> c = a.take(b)
>>>
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>> c = a.take(b)
>>> TypeError: Cannot cast array data from dtype('uint32') to dtype('int32')
>>> according to the rule 'safe'
>>>
>>
>> This actually looks correct for 32-bit windows. Numpy indexes with a
>> signed type big enough to hold a pointer to void, which in this case is an
>> int32, and the uint32 cannot be safely cast to that type.
>>
>> Chuck
>>
>
> I note that on SO Jaime made the suggestion that take use unsafe casting
> and throw an error on out of bounds indexes. That sounds reasonable,
> although for sufficiently large integer types an index could wrap around to
> a good value. Maybe make it work only for npy_uintp.
>
> Chuck
>

It is mostly about consistency, and having take match what indexing already
does, which is to unsafely cast all integers:

In [11]: np.arange(10)[np.uint64(2**64-1)]
Out[11]: 9

I think no one has ever complained about that obviously wrong behavior, but
people do get annoyed if they cannot use their perfectly valid uint64 array
because we want to protect them from themselves. Sebastian has probably
given this more thought than anyone else, it would be interesting to hear
his thoughts on this.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ufuncs now take a tuple of arrays as 'out' kwarg

2015-03-05 Thread Jaime Fernández del Río
Hi all,

There is a PR, ready to be merged, that adds the possibility of passing a
tuple of arrays in the 'out' kwarg to ufuncs with multiple outputs:

https://github.com/numpy/numpy/pull/5621

The new functionality is as follows:

* If the ufunc has a single output, then the 'out' kwarg can either be a
single array (or None) like today, or a tuple holding a single array (or
None).

* If the ufunc has more than one output, then the 'out' kwarg must be a
tuple with one array (or None) per output argument. The old behavior, where
only the first output could be specified, is now deprecated, will raise a
deprecation warning, and potentially be changed to an error in the future.

* In both cases, positional and keyword output arguments are incompatible.
This has been made a little more strict, as the following is valid in <=
1.9.x but will now raise an error:

np.add(2, 2, None, out=arr)
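
With a two-output ufunc such as np.modf, usage would look something like
this (a sketch assuming the semantics described above):

import numpy as np

x = np.linspace(0.5, 2.5, 5)
frac = np.empty_like(x)
whole = np.empty_like(x)
np.modf(x, out=(frac, whole))   # one tuple entry per output

out = np.empty_like(x)
np.add(x, x, out=(out,))        # single-output ufunc: a tuple of one...
np.add(x, x, out=out)           # ...or a plain array, as before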

There seemed to be a reasonable amount of agreement on the goodness of this
change from the discussions on github, but I wanted to inform the larger
audience, in case there are any addressable concerns.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] linalg.norm probems

2015-03-03 Thread Jaime Fernández del Río
On Tue, Mar 3, 2015 at 4:11 PM, Charles R Harris 
wrote:

> Hi All,
>
> This is with reference to issue  #5626
> . Currently linalg.norm
> converts the input like so `x = asarray(x)`. This can produce integer
> arrays, which in turn may create problems of overflow, or the failure of
> the abs functions for minimum values of signed integer types. I propose to
> convert the input to a minimum precision of float32. However, this will be
> a change in behavior. I'd guess that that might not be much of a problem,
> as otherwise it is likely that this problem would have been reported
> earlier.
>
> Thoughts?
>

Not sure if it makes sense here, but elsewhere (I think it was polyval) we
let object arrays through unchanged.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSoC'15 - mentors & ideas

2015-02-26 Thread Jaime Fernández del Río
On Thu, Feb 26, 2015 at 2:54 AM, Todd  wrote:

> I am not able to mentor, but I have some ideas about easier projects.
> These may be too easy, too hard, or not even desirable so take them or
> leave them as you please.
>
> scipy:
>
> Implement a set of circular statistics functions comparable to those in R
> or MATLAB circular statistics toolbox.
>
> Either implement some window functions that only apply to the beginning
> and end of an array, or implement a wrapper that takes a window function
> and some parameters and creates a new window that only applies to the
> beginning and end of an array.
>
> numpy:
>
> Integrate the bottleneck project optimizations into numpy proper.
>
Not sure how much of the bottleneck optimizations can be fitted into the
ufunc machinery. But I'd be more than happy to mentor or co-mentor an
implementation in numpy of the moving window functions. I have already
contributed some work on some of those in scipy.ndimage and pandas, and
find the subject fascinating.

> Integrate as much as possible the matplotlib.mlab functionality into numpy
> (and, optionally, also scipy).
>
> In many places different approaches to the same task have substantially
> different performance (such as indexing vs. take) and check for one
> approach being substantially slower.  If it is, fix the performance problem
> if possible (perhaps by using the same implementation), and if not document
> the difference.
>
The take performance advantage is no longer there since seberg's rewrite of
indexing. Are there any other obvious examples?

> Modify ufuncs so their documentation appears in help() in addition to
> numpy.info().
>
To add one of my own: the old iterator is still being used in many, many
places throughout the numpy code base. Wouldn't it make sense to port those
to the new one? In doing so, it would probably lead to producing simplified
interfaces to the new iterator, e.g. reproducing the old PyArray_IterAllButAxis
is infinitely more verbose with the new iterator.



> Hi all,
>
>
> On Fri, Feb 20, 2015 at 10:05 AM, Ralf Gommers 
> wrote:
>
>> Hi all,
>>
>> It's time to start preparing for this year's Google Summer of Code. There
>> is actually one urgent thing to be done (before 19.00 UTC today), which is
>> to get our ideas page in decent shape. It doesn't have to be final, but
>> there has to be enough on there for the organizers to judge it. This page
>> is here: https://github.com/scipy/scipy/wiki/GSoC-project-ideas. I'll be
>> reworking it and linking it from the PSF page today, but if you already
>> have new ideas please add them there. See
>> https://wiki.python.org/moin/SummerOfCode/OrgIdeasPageTemplate for this
>> year's template for adding a new idea.
>>
>
> The ideas page is now in pretty good shape. More ideas are very welcome
> though, especially easy or easy/intermediate ideas. Numpy right now has
> zero easy ones and Scipy only one and a half.
>
> What we also need is mentors. All ideas already have a potential mentor
> listed, however some ideas are from last year and I'm not sure that all
> those mentors really are available this year. And more than one potential
> mentor per idea is always good. So can everyone please add/remove his or
> her name on that page?
>
> I'm happy to take care of most of the organizational aspects this year,
> however I'll be offline for two weeks in July and from the end of August
> onwards, so I'll some help in those periods. Any volunteers?
>
> Thanks,
> Ralf
>
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objects exposing the array interface

2015-02-25 Thread Jaime Fernández del Río
On Wed, Feb 25, 2015 at 1:56 PM, Stephan Hoyer  wrote:

>
>
> On Wed, Feb 25, 2015 at 1:24 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> 1. When converting these objects to arrays using PyArray_Converter, if
>> the arrays returned by any of the array interfaces is not C contiguous,
>> aligned, and writeable, a copy that is all three will be made. Proper arrays and
>> subclasses are passed unchanged. This is the source of the error reported
>> above.
>>
>
>
> I'm not entirely sure I understand this -- how is PyArray_Converter used in
> numpy? For example, if I pass a non-contiguous array to your class Foo,
> np.asarray does not do a copy:
>

It is used by many (all?) C functions that take an array as input. This
follows a different path than what np.asarray or np.asanyarray do, which
are calls to np.array, which maps to the C function _array_fromobject which
can be found here:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/numpy/core/src/multiarray/multiarraymodule.c#L1592

And ufuncs have their own conversion code, which doesn't really help
either. Not sure it would be possible to have them all use a common code
base, but it is certainly well worth trying.


>
> In [25]: orig = np.zeros((3, 4))[:2, :3]
>
> In [26]: orig.flags
> Out[26]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : False
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> In [27]: subclass = Foo(orig)
>
> In [28]: np.asarray(subclass)
> Out[28]:
> array([[ 0.,  0.,  0.],
>[ 0.,  0.,  0.]])
>
> In [29]: np.asarray(subclass)[:] = 1
>
> In [30]: np.asarray(subclass)
> Out[30]:
> array([[ 1.,  1.,  1.],
>[ 1.,  1.,  1.]])
>
>
> But yes, this is probably a bug.
>
> 2. When converting these objects using PyArray_OutputConverter, as well as
>> in similar code in the ufunc machinery, anything other than a proper array
>> or subclass raises an error. This means that, contrary to what the docs on
>> subclassing say, see below, you cannot use an object exposing the array
>> interface as an output parameter to a ufunc
>>
>
> Here it might be a good idea to distinguish between objects that define
> __array__ vs __array_interface__/__array_struct__. A class that defines
> __array__ might not be very ndarray-like at all, but rather be something
> that can be *converted* to an ndarray. For example, objects in pandas
> define __array__, but updating the return value of df.__array__() in-place
> will not necessarily update the DataFrame (e.g., if the frame had
> inhomogeneous dtypes).
>

I am not really sure what the behavior of __array__ should be. The link to
the subclassing docs I gave before indicates that it should be possible to
write to it if it is writeable (and probably pandas should set the
writeable flag to False if it cannot be reliably written to), but the
obscure comment I mentioned seems to point to the opposite, that it should
never be written to. This is probably a good moment in time to figure out
what the proper behavior should be and document it.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Objects exposing the array interface

2015-02-25 Thread Jaime Fernández del Río
An issue was raised yesterday in github, regarding np.may_share_memory when
run on a class exposing an array using the __array__ method. You can check
the details here:

https://github.com/numpy/numpy/issues/5604

Looking into it, I found out that NumPy doesn't really treat objects
exposing __array__, __array_interface__, or __array_struct__ as if they
were proper arrays:

   1. When converting these objects to arrays using PyArray_Converter, if
   the arrays returned by any of the array interfaces is not C contiguous,
   aligned, and writeable, a copy that is all three will be made. Proper arrays and
   subclasses are passed unchanged. This is the source of the error reported
   above.
   2. When converting these objects using PyArray_OutputConverter, as well
   as in similar code in the ufunc machinery, anything other than a proper
   array or subclass raises an error. This means that, contrary to what the
   docs on subclassing say, see below, you cannot use an object exposing the
   array interface as an output parameter to a ufunc.

The following classes can be used to test this behavior:

class Foo:
def __init__(self, arr):
self.arr = arr
def __array__(self):
return self.arr

class Bar:
def __init__(self, arr):
self.arr = arr
self.__array_interface__ = arr.__array_interface__

class Baz:
def __init__(self, arr):
self.arr = arr
self.__array_struct__ = arr.__array_struct__

They all behave the same with these examples:

>>> a = Foo(np.ones(5))
>>> np.add(a, a)
array([ 2.,  2.,  2.,  2.,  2.])
>>> np.add.accumulate(a)
array([ 1.,  2.,  3.,  4.,  5.])
>>> np.add(a, a, out=a)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: return arrays must be of ArrayType
>>> np.add.accumulate(a, out=a)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: output must be an array

I think this should be changed, and whatever gets handed back by these
methods/interfaces be treated as if it were an array or a subclass of it.
This is actually what the docs on subclassing say about __array__ here:

http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__

This also seems to contradict a rather cryptic comment in the code of
PyArray_GetArrayParamsFromObject, which is part of the call sequence of
this whole mess, see here:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/numpy/core/src/multiarray/ctors.c#L1495

/*
 * If op supplies the __array__ function.
 * The documentation says this should produce a copy, so
 * we skip this method if writeable is true, because the intent
 * of writeable is to modify the operand.
 * XXX: If the implementation is wrong, and/or if actual
 *  usage requires this behave differently,
 *  this should be changed!
 */

There has already been some discussion in the issue linked above, but I
would appreciate any other thoughts on the idea of treating objects with
some form of array interface as if they were arrays. Does it need a
deprecation cycle? Is there some case I am not considering where this could
go horribly wrong?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-23 Thread Jaime Fernández del Río
On Mon, Feb 23, 2015 at 12:12 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On 23.02.2015 08:52, Jaime Fernández del Río wrote:
> > This was raised in SO today:
> >
> >
> http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009
> >
> > np.nonzero (and np.where for boolean arrays) behave differently for 1-D
> > and higher dimensional arrays:
> >
> > In the first case, a tuple with a single behaved base ndarray is
> returned:
> >
> > In the second, a tuple with as many arrays as dimensions in the passed
> > array is returned, but the arrays are not base ndarrays, but of the same
> > subtype as was passed to the function. These arrays are also set as
> > non-writeable:
> >
>
>
> The non-writeable looks like a bug to me, it should probably just use
> PyArray_FLAGS(self) instead of 0. We had a similar one with the new
> indexing, its easy to forget this.
>
> Concerning subtypes, I don't think there is a good reason to preserve
> them here and it should just return an ndarray.
> where with one argument returns a new object that indexes the input
> object so it is not really related anymore to what it indexes and there
> is no information that numpy could reasonably propagate.
>

That was my thinking when I sent that message last night: add the
PyArray_FLAGS argument, and pass the type of the return array rather than
the input array when creating the views.

I tried to put that in a PR, but it fails a number of tests, as the return
of np.nonzero is specifically checked to return the subtype of the passed
in array, both in matrixlib, as well as in core/test_regression.py, related
to Trac #791:

https://github.com/numpy/numpy/issues/1389

So it seems that 7 years ago they had a different view on this, perhaps
Chuck remembers what the rationale was, but this seems like a weird
requirement for index returning functions: nonzero, argmin/max, argsort,
argpartition and the like.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-22 Thread Jaime Fernández del Río
This was raised in SO today:

http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009

np.nonzero (and np.where for boolean arrays) behave differently for 1-D and
higher dimensional arrays:

In the first case, a tuple with a single behaved base ndarray is returned:

>>> a = np.ma.array(range(6))
>>> np.where(a > 3)
(array([4, 5]),)
>>> np.where(a > 3)[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In the second, a tuple with as many arrays as dimensions in the passed
array is returned, but the arrays are not base ndarrays, but of the same
subtype as was passed to the function. These arrays are also set as
non-writeable:

>>> np.where(a.reshape(2, 3) > 3)
(masked_array(data = [1 1],
 mask = False,
   fill_value = 99)
, masked_array(data = [1 2],
 mask = False,
   fill_value = 99)
)
>>> np.where(a.reshape(2, 3) > 3)[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False

I can't think of any reason that justifies this difference, and believe
they should be made to return similar results. My feeling is that the
proper behavior is the 1-D one, and that the behavior for multidimensional
arrays should match it. Anyone can think of any reason that justifies the
current behavior?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix Class

2015-02-14 Thread Jaime Fernández del Río
On Sat, Feb 14, 2015 at 5:21 PM,  wrote:

> On Sat, Feb 14, 2015 at 4:27 PM, Charles R Harris
>  wrote:
> >
> >
> > On Sat, Feb 14, 2015 at 12:36 PM,  wrote:
> >>
> >> On Sat, Feb 14, 2015 at 12:05 PM, cjw  wrote:
> >> >
> >> > On 14-Feb-15 11:35 AM, josef.p...@gmail.com wrote:
> >> >>
> >> >> On Wed, Feb 11, 2015 at 4:18 PM, Ryan Nelson 
> >> >> wrote:
> >> >>>
> >> >>> Colin,
> >> >>>
> >> >>> I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test
> >> >>> conda
> >> >>> environment with Python2.7 and Numpy 1.7.0, and I get the same:
> >> >>>
> >> >>> 
> >> >>> Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014,
> >> >>> 16:57:52)
> >> >>> [MSC v
> >> >>> .1500 64 bit (AMD64)]
> >> >>> Type "copyright", "credits" or "license" for more information.
> >> >>>
> >> >>> IPython 2.3.1 -- An enhanced Interactive Python.
> >> >>> Anaconda is brought to you by Continuum Analytics.
> >> >>> Please check out: http://continuum.io/thanks and
> https://binstar.org
> >> >>> ? -> Introduction and overview of IPython's features.
> >> >>> %quickref -> Quick reference.
> >> >>> help  -> Python's own help system.
> >> >>> object?   -> Details about 'object', use 'object??' for extra
> details.
> >> >>>
> >> >>> In [1]: import numpy as np
> >> >>>
> >> >>> In [2]: np.__version__
> >> >>> Out[2]: '1.7.0'
> >> >>>
> >> >>> In [3]: np.mat([4,'5',6])
> >> >>> Out[3]:
> >> >>> matrix([['4', '5', '6']],
> >> >>> dtype='|S1')
> >> >>>
> >> >>> In [4]: np.mat([4,'5',6], dtype=int)
> >> >>> Out[4]: matrix([[4, 5, 6]])
> >> >>> ###
> >> >>>
> >> >>> As to your comment about coordinating with Statsmodels, you should
> see
> >> >>> the
> >> >>> links in the thread that Alan posted:
> >> >>> http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
> >> >>> http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
> >> >>> Josef's comments at the time seem to echo the issues the devs (and
> >> >>> others)
> >> >>> have with the matrix class. Maybe things have changed with
> >> >>> Statsmodels.
> >> >>
> >> >> Not changed, we have a strict policy against using np.matrix.
> >> >>
> >> >> generic efficient versions for linear operators, kronecker or sparse
> >> >> block matrix styly operations would be useful, but I would use array
> >> >> semantics, similar to using dot or linalg functions on ndarrays.
> >> >>
> >> >> Josef
> >> >> (long reply canceled because I'm writing too much that might only be
> >> >> of tangential interest or has been in some of the matrix discussion
> >> >> before.)
> >> >
> >> > Josef,
> >> >
> >> > Many thanks.  I have gained the impression that there is some
> antipathy
> >> > to
> >> > np.matrix, perhaps this is because, as others have suggested, the
> array
> >> > doesn't provide an appropriate framework.
> >>
> >> It's not directly antipathy, it's cost-benefit analysis.
> >>
> >> np.matrix has few advantages, but makes reading and maintaining code
> >> much more difficult.
> >> Having to watch out for multiplication `*` is a lot of extra work.
> >>
> >> Checking shapes and fixing bugs with unexpected dtypes is also a lot
> >> of work, but we have large benefits.
> >> For a long time the policy in statsmodels was to keep pandas out of
> >> the core of functions (i.e. out of the actual calculations) and
> >> restrict it to inputs and returns. However, pandas is becoming more
> >> popular and can do some things much better than plain numpy, so it is
> >> slowly moving inside some of our core calculations.
> >> It's still an easy source of bugs, but we do gain something.
> >
> >
> > Any bits of Pandas that might be good for numpy/scipy to steal?
>
> I'm not a Pandas expert.
> Some of it comes into statsmodels because we need the data handling
> also inside a function, e.g. keeping track of labels, indices, and so
> on. Another reason is that contributors are more familiar with
> pandas's way of solving a problems, even if I suspect numpy would be
> more efficient.
>
> However, a recent change, replaces where I would have used np.unique
> with pandas.factorize which is supposed to be faster.
> https://github.com/statsmodels/statsmodels/pull/2213


Numpy could use some form of hash table for its arraysetops, which is where
pandas is getting its advantage from. It is a tricky thing though, see e.g.
these timings:

a = np.random.randint(10, size=1000)
srs = pd.Series(a)

%timeit np.unique(a)

10 loops, best of 3: 13.2 µs per loop

%timeit srs.unique()

10 loops, best of 3: 15.6 µs per loop


%timeit pd.factorize(a)

1 loops, best of 3: 25.6 µs per loop

%timeit np.unique(a, return_inverse=True)

1 loops, best of 3: 82.5 µs per loop

These last timings are with 1.9.0 and 0.14.0, so numpy doesn't have
https://github.com/numpy/numpy/pull/5012 yet, which makes the operation in
which numpy is slower about 2x faster. And if you need your unique values
sorted, then things are more even, especially if numpy runs 2x faster:

Re: [Numpy-discussion] converting a list of tuples into an array of tuples?

2015-02-09 Thread Jaime Fernández del Río
On Mon, Feb 9, 2015 at 8:31 AM, Benjamin Root  wrote:

> I am trying to write up some code that takes advantage of np.tile() on
> arbitrary array-like objects. I only want to tile along the first axis. Any
> other axis, if they exist, should be left alone. I first coerce the object
> using np.asanyarray(), tile it, and then coerce it back to the original
> type.
>
> The problem seems to be that some of my array-like objects are being
> "over-coerced", particularly the list of tuples. I tried doing
> "np.asanyarray(a, dtype='O')", but that still turns it into a 2-D array.
>

The default constructors will drill down until they find a scalar or a
non-matching shape. So you get an array full of Python ints, or floats, but
still 2-D:

>>> a = [tuple(range(j, j+3)) for j in range(5)]
>>> a
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
>>> np.asarray(a, dtype=object)
array([[0, 1, 2],
   [1, 2, 3],
   [2, 3, 4],
   [3, 4, 5],
   [4, 5, 6]], dtype=object)

If you add a non-matching item, e.g. an empty tuple, then all works fine
for your purposes:

>>> a.append(())
>>> np.asarray(a, dtype=object)
array([(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), ()],
dtype=object)

But you would then have to discard that item before tiling. The only other
way is to first create the object array, then assign your array-like object
to it:

>>> a.pop()
()
>>> a
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
>>> b = np.empty(len(a), object)
>>> b[:] = a
>>> b
array([(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)], dtype=object)

Not sure if this has always worked, or if it breaks down in some corner
case, but Wes may not have had to write that function after all! At least
not in Cython.
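
Wrapped up, the trick might look like this (`tile_rows` is a hypothetical
helper, not a NumPy function):

import numpy as np

def tile_rows(seq, reps):
    # Build the 1-D object array first, so the tuples are never unpacked,
    # then tile along the single (first) axis.
    b = np.empty(len(seq), dtype=object)
    b[:] = seq
    return np.tile(b, reps)

tile_rows([(0, 1, 2), (1, 2, 3)], 2)
# -> array([(0, 1, 2), (1, 2, 3), (0, 1, 2), (1, 2, 3)], dtype=object)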

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New function: np.stack?

2015-02-05 Thread Jaime Fernández del Río
On Thu, Feb 5, 2015 at 11:10 AM, Benjamin Root  wrote:

> +1! I could never keep straight which stack function I needed anyway.
>
> Wasn't there a proposal a while back for a more generic stacker, like
> "tetrix" or something that allowed one to piece together tiles of different
> sizes?
>
> Ben Root
>
> On Thu, Feb 5, 2015 at 2:06 PM, Stephan Hoyer  wrote:
>
>> There are two usual ways to combine a sequence of arrays into a new array:
>> 1. concatenated along an existing axis
>> 2. stacked along a new axis
>>
>> For 1, we have np.concatenate. For 2, we have np.vstack, np.hstack,
>> np.dstack and np.column_stack. For arrays with arbitrary dimensions, there
>> is the np.array constructor, possibly with transpose to get the result in
>> the correct order. (I've used this last option in the past but haven't been
>> especially happy with it -- it takes some trial and error to get the axis
>> swapping or transpose right for higher dimensional input.)
>>
>> These methods are similar but subtly distinct, and none of them generalize
>> well to n-dimensional input. It seems like the function we are missing is
>> the plain np.stack, which takes the axis to stack along as a keyword
>> argument. The exact desired functionality is clearest to understand by
>> example:
>>
>> >>> X = [np.random.randn(100, 200) for i in range(10)]
>> >>> stack(X, axis=0).shape
>> (10, 100, 200)
>> >>> stack(X, axis=1).shape
>> (100, 10, 200)
>> >>> stack(X, axis=2).shape
>> (100, 200, 10)
>>
>> So I'd like to propose this new function for numpy. The desired signature
>> would be simply np.stack(arrays, axis=0). Ideally, the confusing mess of
>> other stacking functions could then be deprecated, though we could probably
>> never remove them.
>>
>
Leaving aside error checking, once you have a positive axis, I think this
can be implemented in 2 lines of code:

sl = (slice(None),)*axis + (np.newaxis,)
return np.concatenate([arr[sl] for arr in arrays], axis=axis)
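
Wrapped as a function (a sketch that assumes a validated, non-negative
axis), that reproduces the shapes from the proposal:

import numpy as np

def stack(arrays, axis=0):
    # Give every array a new length-1 axis at `axis`, then join along it.
    sl = (slice(None),) * axis + (np.newaxis,)
    return np.concatenate([arr[sl] for arr in arrays], axis=axis)

X = [np.random.randn(100, 200) for i in range(10)]
assert stack(X, axis=1).shape == (100, 10, 200)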

I don't have an opinion either way, and I guess if the hstacks and company
have a place in numpy, this does as well.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-02-04 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:52 PM, Ian Henriksen <
insertinterestingnameh...@gmail.com> wrote:

> On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
>>> >
>>> 
>>> >
>>> >
>>> >
>>> > Do you have a concrete example of what a non (1, 1) array that fails
>>> > with relaxed strides would look like?
>>> >
>>> >
>>> > If we used, as right now, the array flags as a first choice point, and
>>> > only if none is set try to determine it from the strides/dimensions
>>> > information, I fail to imagine any situation where the end result
>>> > would be worse than now. I don't think that a little bit of
>>> > predictable surprising in an advanced functionality is too bad. We
>>> > could start raising "on the face of ambiguity, we refuse to guess"
>>> > errors, even for the current behavior you show above, but that is more
>>> > likely to trip people by not giving them any simple workaround, that
>>> > it seems to me would be "add a .T if all dimensions are 1" in some
>>> > particular situations. Or are you thinking of something more serious
>>> > than a shape mismatch when you write about "breaking current code"?
>>> >
>>>
>>> Yes, I am talking only about wrongly shaped results for some fortran
>>> order arrays. A (20, 1) fortran order complex array being viewed as
>>> float, will with relaxed strides become a (20, 2) array instead of a
>>> (40, 1) one.
>>>
>>
>> That is a limitation of the current implementation too, and happens
>> already whenever relaxed strides are in place. Which is the default for
>> 1.10, right?
>>
>> Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
>> all? It should probably be more of a hint of what to do (fortran vs c) when
>> in doubt. "C" would prioritize last axis, "F" the first, and we could even
>> add a "raise" option to have it fail if the axis cannot be inferred from
>> the strides and shape. Current behavior is equivalent to what "C"
>> would do.
>>
>> Jaime
>>
>
>
> IMHO, the best option would be something like this:
>
> - When changing to a type with smaller itemsize, add a new axis after the
> others so the resulting array is C contiguous (unless a different axis is
> specified by a keyword argument). The advantage here is that if you index
> the new view using the old indices for an entry, you get an array showing
> its representation in the new type.
> - No shape change for views with the same itemsize
> - When changing to a type with a larger itemsize, collapse along the last
> axis unless a different axis is specified, throwing an error if the axis
> specified does not match the axis specified.
>

My only concern with adding a new axis, backwards compatibility aside, is
that you would not know whether to keep or discard the resulting size 1
dimension when taking a view of the view. We could reuse the keepdims
terminology from ufuncs, though.

So the current behavior would remain unchanged if you set axis=None,
keepdims=True, and we could transition to a more reasonable default with
axis=-1, keepdims=False over a few releases with adequate warnings.

In my mind, when expanding, the axis would not indicate where to place the
new axis, but which axis to collapse over, that is the hard part to figure
out! If you want something else, it is only a call to rollaxis away. Perhaps
we could also give keepdims a meaning when expanding, as to whether to add
a new axis at the end, or change the size of the chosen dimension.

I don't know, there may be an actual interface hidden somewhere here, but
it still needs some cooking to fully figure it out.

Jaime


>
> The last point essentially is just adding an axis argument. I like that
> idea because it gives users the ability to do all that the array's memory
> layout allows in a clear and concise way. Throwing an error if the default
> axis doesn't work would be a good way to prevent strange bugs from
> happening when the default behavior is expected.
>
> The first point would be a break in backwards compatibility, so I'm not
> sure if it's feasible at this point. The advantage would be that all
> arrays returned when using this functionality would be contiguous along the
> last axis.

Re: [Numpy-discussion] Any interest in a 'heaviside' ufunc?

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 12:58 PM, Warren Weckesser <
warren.weckes...@gmail.com> wrote:

> I have an implementation of the Heaviside function as numpy ufunc.  Is
> there any interest in adding this to numpy?  The function is simply:
>
> 0if x < 0
> heaviside(x) =  0.5  if x == 0
> 1if x > 0
>

I don't think there's anything like it in numpy. Wouldn't scipy.special be
a better home for it?
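
For reference, a plain-NumPy one-liner already produces those values (a
sketch, not the proposed ufunc):

import numpy as np

def heaviside(x):
    # np.sign(x) is -1, 0, or 1, which maps linearly onto 0, 0.5, 1.
    return 0.5 * (np.sign(x) + 1)

heaviside(np.array([-2.0, 0.0, 3.0]))
# -> array([ 0. ,  0.5,  1. ])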

Jaime


>
>
>
> Warren
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg 
wrote:

> On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
> >
> 
> >
> >
> >
> > Do you have a concrete example of what a non (1, 1) array that fails
> > with relaxed strides would look like?
> >
> >
> > If we used, as right now, the array flags as a first choice point, and
> > only if none is set try to determine it from the strides/dimensions
> > information, I fail to imagine any situation where the end result
> > would be worse than now. I don't think that a little bit of
> > predictable surprising in an advanced functionality is too bad. We
> > could start raising "on the face of ambiguity, we refuse to guess"
> > errors, even for the current behavior you show above, but that is more
> > likely to trip people by not giving them any simple workaround, that
> > it seems to me would be "add a .T if all dimensions are 1" in some
> > particular situations. Or are you thinking of something more serious
> > than a shape mismatch when you write about "breaking current code"?
> >
>
> Yes, I am talking only about wrongly shaped results for some fortran
> order arrays. A (20, 1) fortran order complex array being viewed as
> float, will with relaxed strides become a (20, 2) array instead of a
> (40, 1) one.
>

That is a limitation of the current implementation too, and happens already
whenever relaxed strides are in place. Which is the default for 1.10,
right?

Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
all? It should probably be more of a hint of what to do (fortran vs c) when
in doubt. "C" would prioritize last axis, "F" the first, and we could even
add a "raise" option to have it fail if the axis cannot be inferred from
the strides and shape. Current behavior is equivalent to what "C"
would do.

Jaime



> >
> > If there are any real loopholes in expanding this functionality, then
> > lets not do it, but we know we have at least one user unsatisfied with
> > the current performance, so I really think it is worth trying. Plus,
> > I'll admit to that, messing around with some of these stuff deep
> > inside the guts of the beast is lots of fun! ;)
> >
>
> I do not think there are loopholes with expanding this functionality. I
> think there have been regressions when we turned relaxed strides on, because
> suddenly the fortran order array might be expanded along a 1-sized axis,
> because it is also C order. So I wonder if we can fix these regressions
> and at the same time maybe provide a more intuitive approach than using
> the memory order blindly.
>

> - Sebastian
>
> >
> > Jaime
> >
> >
> > - Sebastian
> >
> >
> > >
> > > Jaime
> > >
> > >
> > >
> > > - Sebastian
> > >
> > > >
> > > > I guess the main consideration for this is
> > that we
> > > may be
> > > > stuck with
> > > > stuff b/c of backwards compatibility. Can
> > you maybe
> > > say a
> > > > little bit
> > > > about what is allowed now, and what
> > constraints that
> > > puts on
> > > > things?
> > > > E.g. are we already grovelling around in
> > strides and
> > > picking
> > > > random
> > > > dimensions in some cases?
> > > >
> > > >
> > > > Just to restate it: right now we only allow new
> > views if the
> > > array is
> > > > globally contiguous, so either along the first or
> > last
> > > dimension.
> > > >
> > > >
> > > > Jaime
> > > >
> > > >
> > > > -n
> > > >
> > > > --
> > > > Nathaniel J. Smith
> > > > Postdoctoral researcher - Informatics -
> > 

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:28 AM, Sebastian Berg 
wrote:

> On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
> > On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg
> >  wrote:
> > On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río
> > wrote:
> > > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith
> > 
> > > wrote:
> >     > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández
> > del Río
> > >  wrote:
> > > [...]
> >
> > 
> >
> > >
> > Could we make it more like: check to see if the last dimension
> > works. If not, raise an error (and let the user transpose some other
> > dimension there if that's what they wanted)? Or require the user to
> > specify which dimension will absorb the shape change? (If we were
> > doing this from scratch, then it would be tempting to just say that
> > we always add a new dimension at the end with newtype.itemsize /
> > oldtype.itemsize entries, or absorb such a dimension if shrinking.
> > As a bonus, this would always work, regardless of contiguity! Except
> > that when shrinking the last dimension would have to be contiguous,
> > of course.)
> >
> >
> > When we roll @ in and people start working with stacks of matrices,
> > we will probably find ourselves having to create an alias, similar
> > to .T, for .swapaxes(-1, -2). Searching for the smallest stride
> > allows to take views of such arrays, which does not work right now
> > because the array is no longer contiguous globally.
> >
> >
> That is true, but I agree with Nathaniel at least as far as that I
> would prefer a user to be able to safely use `view` even he has not
> even an inkling about what his memory layout is. One option would be
> an `axis=-1` default (maybe FutureWarn this from `axis=None` which
> would look at order, see below -- or maybe have axis='A', 'C' and 'F'
> and default to 'A' for starters).
>
> This even now could start creating bugs when enabling relaxed
> strides :(, because your good old fortran order complex array being
> viewed as a float one could expand along the wrong axis, and even
> without such arrays swap order pretty fast when operating on them,
> which can create impossibly to find bugs, because even a poweruser is
> likely to forget about such things.
>
> Of course you could argue that view is a poweruser feature and a user
> using it should keep these things in mind Though if you argue that,
> you can almost just use `np.ndarray` directly ;) -- ok, not really
> considering how cumbersome it is, but still.
> >
> >
> > I have been giving this some thought, and am willing to concede that
> > my first proposal may have been too ambitious. So even though the knob
> > goes to 11, we can always do things incrementally. I am also wary of
> > adding new keywords when it seems obvious that we do not have the
> > functionality completely figured out, so here's my new proposal:
> >
> >
> >   * The objective is that a view of an array that is the result of
> > slicing a contiguous array should be possible, if it remains
> > "contiguous" (meaning stride == itemsize) along its original
> > contiguous (first or last) dimension. This eliminates axis
> > transposition from the previous proposal, although reversing
> > the axes com

Re: [Numpy-discussion] Views of a different dtype

2015-02-02 Thread Jaime Fernández del Río
On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg 
wrote:

> On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote:
> > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith 
> > wrote:
> > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río
> >  wrote:
> > [...]
>
> 
>
> >
> > Could we make it more like: check to see if the last dimension
> > works.
> > If not, raise an error (and let the user transpose some other
> > dimension there if that's what they wanted)? Or require the
> > user to
> > specify which dimension will absorb the shape change? (If we
> > were
> > doing this from scratch, then it would be tempting to just say
> > that we
> > always add a new dimension at the end with newtype.itemsize /
> > oldtype.itemsize entries, or absorb such a dimension if
> > shrinking. As
> > a bonus, this would always work, regardless of contiguity!
> > Except that
> > when shrinking the last dimension would have to be contiguous,
> > of
> > course.)
> >
> >
> > When we roll @ in and people start working with stacks of matrices, we
> > will probably find ourselves having to create an alias, similar to .T,
> > for .swapaxes(-1, -2). Searching for the smallest stride allows to
> > take views of such arrays, which does not work right now because the
> > array is no longer contiguous globally.
> >
>
> That is true, but I agree with Nathaniel at least as far as that I would
> prefer a user to be able to safely use `view` even he has not even an
> inkling about what his memory layout is. One option would be an
> `axis=-1` default (maybe FutureWarn this from `axis=None` which would
> look at order, see below -- or maybe have axis='A', 'C' and 'F' and
> default to 'A' for starters).
>
> This even now could start creating bugs when enabling relaxed
> strides :(, because your good old fortran order complex array being
> viewed as a float one could expand along the wrong axis, and even
> without such arrays swap order pretty fast when operating on them, which
> can create impossibly to find bugs, because even a poweruser is likely
> to forget about such things.
>
> Of course you could argue that view is a poweruser feature and a user
> using it should keep these things in mind Though if you argue that,
> you can almost just use `np.ndarray` directly ;) -- ok, not really
> considering how cumbersome it is, but still.
>

I have been giving this some thought, and am willing to concede that my
first proposal may have been too ambitious. So even though the knob goes to
11, we can always do things incrementally. I am also wary of adding new
keywords when it seems obvious that we do not have the functionality
completely figured out, so here's my new proposal:


   - The objective is that a view of an array that is the result of slicing
   a contiguous array should be possible, if it remains "contiguous" (meaning
   stride == itemsize) along its original contiguous (first or last)
   dimension. This eliminates axis transposition from the previous proposal,
   although reversing the axes completely would also work.
   - To verify this, unless the C contiguous or Fortran contiguous flags
   are set, we would still need to look at the strides. An array would be C
   contiguous if its last stride is equal to the itemsize and, working
   backwards, every next stride is larger than or equal to the product of
   the previous stride and the previous dimension. Dimensions of size 1
   would be ignored for this, except for the last one, which would be taken
   to have stride = itemsize. The Fortran case is of course the same in
   reverse; a sketch of the C-side check follows this list.
   - I think the above combined with the current preference of C
   contiguousness over Fortran, would actually allow the views to always be
   reversible, which is also a nice thing to have.
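
In code, the C-side check sketched in the second bullet might read (a rough
sketch; `view_ok_c` is a made-up name, and for simplicity all size-1
dimensions are ignored rather than special-casing the last one):

import numpy as np

def view_ok_c(arr):
    # Drop size-1 dimensions; a scalar-like layout always qualifies.
    dims = [(n, s) for n, s in zip(arr.shape, arr.strides) if n != 1]
    if not dims:
        return True
    n_prev, s_prev = dims[-1]
    if s_prev != arr.itemsize:
        return False          # last axis must be packed
    for n, s in reversed(dims[:-1]):
        if s < s_prev * n_prev:
            return False      # axes would overlap or interleave
        n_prev, s_prev = n, s
    return True

a = np.zeros((3, 8))
assert view_ok_c(a) and view_ok_c(a[:, :4])  # slicing keeps the last axis packed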

This eliminates most of the weirdness, but extends current functionality to
cover cases like Jens reported a few days back.

Does this sound better?

Jaime


>
> - Sebastian
>
> >
> > I guess the main consideration for this is that we may be stuck with
> > stuff b/c of backwards compatibility. Can you maybe say a little bit
> > about what is allowed now, and what constraints that puts on things?
> > E.g. are we already grovelling around in strides and picking random
> > dimensions in some cases?
> >
> >
> > Just to restate it: right now we only allow new views if the array is
> > globally contiguous, so either along the first or last dimension.
