Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Charles R Harris
On Thu, May 31, 2018 at 5:50 PM, Matti Picus  wrote:

> At the recent NumPy sprint at BIDS (thanks to those who made the trip) we
> spent some time brainstorming about a roadmap for NumPy, in the spirit of
> similar work that was done for Jupyter. The idea is that a document with
> wide community acceptance can guide the work of the full-time developer(s),
> and be a source of ideas for expanding development efforts.
>
> I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap,
> and hope to discuss it at a BOF session during SciPy in the middle of July
> in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti
>

Under maintenance we could add something about the transition to Python 3,
in particular cleaning up the code and updating the documentation examples.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-01 Thread Marten van Kerkwijk
Hi Nathaniel,

On Matt's prompting, I added release notes to the frozen/flexible PR [1];
see text attached below.

Having done that, I felt the examples actually justified the frozen
dimensions quite well. Given that you're the one who expressed the most doubts
about them, could you have a look? Ideally, I'd avoid having to write a NEP
for this, and the examples do seem to make it quite obvious that this
change to the signature is the way to go, as its meaning is dead obvious.
And the implementation is super-straightforward...

For the broadcasted core dimensions, I do agree the case is less strong and
the meaning perhaps less obvious (implementation is relatively simple), and
I think a short NEP may be called for (unless others on the list have
super-convincing use cases...). I will add here, though, that even if we
implement `all_equal` as a method on `equal`, it would still be useful to
have a signature that can actually describe it.

-- Marten

[1] https://github.com/numpy/numpy/pull/11175/files

Generalized ufunc signatures now allow fixed-size dimensions
------------------------------------------------------------

By using a numerical value in the signature of a generalized ufunc, one can
indicate that the given function requires input or output to have dimensions
with the given size. E.g., the signature of a function that converts a polar
angle to a two-dimensional cartesian unit vector would be ``()->(2)``; that
for one that converts two spherical angles to a three-dimensional unit vector
would be ``(),()->(3)``; and that for the cross product of two
three-dimensional vectors would be ``(3),(3)->(3)``.

Note that the elementary function does not treat these dimensions any
differently from variable ones indicated with a letter; the loop is still
passed the corresponding size, but it can now count on that size being equal
to the fixed size given in the signature.
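To make the three example signatures concrete, here is a minimal pure-NumPy sketch of what the elementary functions compute. These are ordinary Python functions with hypothetical names (`polar_to_unit`, `spherical_to_unit`, `cross3`), not real generalized ufuncs; they only mimic looping over the leading axes while producing or requiring a fixed-size core dimension. The spherical convention (polar angle `theta` measured from the z axis, azimuth `phi`) is an assumption.

```python
import numpy as np


def polar_to_unit(phi):
    """Mimics a gufunc with signature ()->(2): angle -> 2-D unit vector."""
    phi = np.asarray(phi, dtype=float)
    # Stacking along a new last axis creates the fixed-size output core dimension.
    return np.stack([np.cos(phi), np.sin(phi)], axis=-1)


def spherical_to_unit(theta, phi):
    """Mimics (),()->(3); assumes theta is the polar angle from the z axis."""
    theta, phi = np.broadcast_arrays(np.asarray(theta, dtype=float),
                                     np.asarray(phi, dtype=float))
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)


def cross3(a, b):
    """Mimics (3),(3)->(3): reject inputs whose core dimension is not 3."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.shape[-1:] != (3,) or b.shape[-1:] != (3,):
        raise ValueError("core dimension must have size 3")
    return np.cross(a, b)


angles = np.array([0.0, np.pi / 2, np.pi])
print(polar_to_unit(angles).shape)           # loop dims (3,) plus core dim (2,)
print(spherical_to_unit(0.3, angles).shape)  # loop dims (3,) plus core dim (3,)
print(cross3([1, 0, 0], [0, 1, 0]))          # the z unit vector
```

In a real gufunc the size check in `cross3` would be done by the ufunc machinery before the loop runs, which is exactly what lets the inner loop "count on" the fixed size.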

Generalized ufunc signatures now allow flexible dimensions
----------------------------------------------------------

Some functions, in particular numpy's implementation of ``@`` as ``matmul``,
are very similar to generalized ufuncs in that they operate over core
dimensions, but one could not present them as such because they were able to
deal with inputs in which a dimension is missing. To support this, it is now
allowed to postfix a dimension name with a question mark to indicate that that
dimension does not necessarily have to be present.

With this addition, the signature for ``matmul`` can be expressed as
``(m?,n),(n,p?)->(m?,p?)``.  This indicates that if, e.g., the second operand
has only one dimension, for the purposes of the elementary function it will be
treated as if that input has core shape ``(n, 1)``, and the output has the
corresponding core shape of ``(m, 1)``. The actual output array, however, has
the flexible dimension removed, i.e., it will have shape ``(..., m)``.
Similarly, if both arguments have only a single dimension, the inputs will be
presented as having shapes ``(1, n)`` and ``(n, 1)`` to the elementary
function, and the output as ``(1, 1)``, while the actual output array returned
will have shape ``()``. In this way, the signature allows one to use a single
elementary function for four related but different signatures,
``(m,n),(n,p)->(m,p)``, ``(n),(n,p)->(p)``, ``(m,n),(n)->(m)`` and
``(n),(n)->()``.
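The pad-then-drop handling of a flexible dimension can be sketched in a few lines of Python. `matmul_flexible` is a hypothetical helper, and `np.einsum("...mn,...np->...mp", ...)` merely stands in for the elementary ``(m,n),(n,p)->(m,p)`` loop with broadcast outer dimensions; the sketch is only meant to reproduce the output shapes described above, compared against ``np.matmul`` itself.

```python
import numpy as np


def matmul_flexible(a, b):
    """Hypothetical sketch of (m?,n),(n,p?)->(m?,p?) handling.

    A missing flexible dimension is filled in with a length-1 axis, the full
    (m,n),(n,p)->(m,p) core runs, and the inserted axis is removed again.
    """
    a, b = np.asarray(a), np.asarray(b)
    a_missing = a.ndim == 1              # first operand lacks m
    b_missing = b.ndim == 1              # second operand lacks p
    if a_missing:
        a = a[np.newaxis, :]             # (n,) -> (1, n)
    if b_missing:
        b = b[:, np.newaxis]             # (n,) -> (n, 1)
    # Stand-in for the elementary function; '...' broadcasts the loop dims.
    out = np.einsum("...mn,...np->...mp", a, b)
    if b_missing:
        out = out[..., 0]                # (..., m, 1) -> (..., m)
        if a_missing:
            out = out[..., 0]            # (..., 1) -> (...)
    elif a_missing:
        out = out[..., 0, :]             # (..., 1, p) -> (..., p)
    return out


A = np.arange(12.0).reshape(4, 3)        # (m, n) with m=4, n=3
B = np.arange(15.0).reshape(3, 5)        # (n, p) with p=5
v = np.arange(3.0)                       # (n,)

for x, y in [(A, B), (v, B), (A, v), (v, v)]:
    print(np.shape(matmul_flexible(x, y)), np.shape(x @ y))
```

The four pairs exercise the four signatures listed above; the printed shapes from the sketch should agree with those from ``@``.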


Re: [Numpy-discussion] Python 3 compatible examples

2018-06-01 Thread Jarrod Millman
+1

On Fri, Jun 1, 2018 at 1:43 PM, Juan Nunez-Iglesias  wrote:
>
> On Sat, Jun 2, 2018, at 6:22 AM, Pauli Virtanen wrote:
>> For Scipy, we converted the examples in the documentation to Python 3,
>> and have essentially ignored Python 2 compatibility. So far, I remember
>> no complaints about it.
>
> I vote for what Pauli said.


Re: [Numpy-discussion] Python 3 compatible examples

2018-06-01 Thread Juan Nunez-Iglesias


On Sat, Jun 2, 2018, at 6:22 AM, Pauli Virtanen wrote:
> For Scipy, we converted the examples in the documentation to Python 3,
> and have essentially ignored Python 2 compatibility. So far, I remember
> no complaints about it.

I vote for what Pauli said.


Re: [Numpy-discussion] Python 3 compatible examples

2018-06-01 Thread Pauli Virtanen
On Fri, 2018-06-01 at 14:17 -0600, Charles R Harris wrote:
> This post is prompted by this PR  /11222>.
> It would be good to come up with a timeline and plan for rewriting the
> examples to be Python 3 compatible. When we do so, we should also make it
> assumed that `from __future__ import print_function` has been executed when
> the examples are executed in Python 2.7. Might want to include `division`
> in that future import as well.
> 
> Anyway, wanted to raise the subject. Thoughts?

For Scipy, we converted the examples in the documentation to Python 3,
and have essentially ignored Python 2 compatibility. So far, I remember
no complaints about it.

Pauli



[Numpy-discussion] Python 3 compatible examples

2018-06-01 Thread Charles R Harris
Hi All,

This post is prompted by this PR .
It would be good to come up with a timeline and plan for rewriting the
examples to be Python 3 compatible. When we do so, we should also make it a
stated assumption that `from __future__ import print_function` has been
executed when the examples are run under Python 2.7. We might want to include
`division` in that future import as well.
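As a sketch of the convention being proposed (not an agreed NumPy docs policy), an example written under that assumption prints the same thing on Python 2.7 and Python 3; on Python 3 the `__future__` import is a harmless no-op:

```python
from __future__ import division, print_function  # no-ops on Python 3

import numpy as np

a = np.arange(4)
print(3 / 2)    # 1.5: true division (Python 3, or 2.7 with the import)
print(7 // 2)   # 3: floor division has to be spelled out explicitly
print(a / 2)    # true division of an integer array gives floats
print("done", "examples", sep=", ")  # keyword argument: print must be a function
```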

Anyway, wanted to raise the subject. Thoughts?

Chuck


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Ralf Gommers
On Fri, Jun 1, 2018 at 9:57 AM, Stefan van der Walt 
wrote:

> Hi Ralf,
>
> On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> > - "internal refactorings": MaskedArray yes, but the other ones no.
> > numpy.distutils and f2py are very hard to test, a big refactor pretty much
> > guarantees breakage. there's also not much need for refactoring, because
> > those things are not coupled to the numpy.core internals. numpy.financial
> > is simply uninteresting - we wish it wasn't there but it is, so now it
> > simply stays where it is.
>
> I want to clarify that in the current notes we put down ideas that
> prompted active discussion, even if they weren't necessarily feasible.
> I feel it is important to keep the conversation open to run its course
> until we have a good understanding of the various issues at hand.
>
> You may find that, in person, people are more willing to admit to their
> support for some "heretical" ideas than they are here on the list.
>

Thanks Stefan, good points. I totally agree that anything can be discussed.


>
> E.g., you say that the financial functions "now simply stay", but that
> promises a future of a NumPy that never shrinks, while there is
> certainly some support for allowing NumPy to contract so that we can
> release maintenance burden and allow development of other core areas
> that have been neglected for a long time.
>
> You will *always* have small, vocal proponents of any specific piece of
> functionality; that doesn't necessarily mean that such functionality
> contributes to the health of a project as a whole.
>
> So, I gently urge us to carefully reconsider the narrative that nothing can
> change/be removed, and evaluate each suggestion carefully, not weighing
> only the very evident negatives but also the longer term positives.
>

I don't think there's such a narrative - e.g. the removal of np.matrix that
we've planned and getting rid of MaskedArray at some point once we have a
better new masked array implementation are *major* removals. We do plan
those things because they have major benefits. Imho "major benefits" is a
bar that needs to be passed before listing features as up for removal on a
roadmap (even a draft one).

It would maybe be helpful to find a form for the roadmap where the
essentials of such discussions (key pros/cons) can be captured. Or at least
split it into good/desirable/planned items and "wild ideas".

Re `financial`, there isn't much of a pro as far as I can tell - there's
almost zero maintenance cost now, and it doesn't hinder any of the proposed
new features. Plus it's a discussion we've had a couple of times before.

I know that the current roadmap doc is only draft, but it still says "NumPy
Roadmap" and it's the best thing we have now, so I'd prefer to not have
things there (or have them in a separate random/controversial ideas
section) that are unlikely to happen or for which it's unclear if they're
good ideas.

Cheers,
Ralf


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Gael Varoquaux
While we are on the crazy wish-list: having dtypes that are universal
enough for pandas to use them and export their columns with them would be
my crazy wish. I hope that it would help add more uniform support for
things like categorical variables in the pydata ecosystem.

Gaël


Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-01 Thread Matthew Harrigan
Stephan, good point about use cases.  I think it's still an odd fit.  For
example, I think np.array_equal(np.zeros((3,3)), np.zeros((2,2))) or
np.array_equal([1], ['foo']) would be difficult or impossible to replicate
with a potential all_equal gufunc.
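To make the contrast concrete: `np.array_equal` simply returns False when shapes differ, whereas anything with a gufunc-style signature `(n),(n)->()` rejects mismatched core dimensions outright. The sketch below emulates the hypothetical `all_equal` with `np.vectorize` and its `signature` argument; a real gufunc would be written in C, but I would expect its shape checking to behave similarly (raising rather than returning False).

```python
import numpy as np

# Hypothetical all_equal with signature (n),(n)->(), emulated via np.vectorize.
all_equal = np.vectorize(lambda a, b: bool(np.all(a == b)),
                         signature="(n),(n)->()", otypes=[bool])

x = np.array([[1, 2, 3],
              [1, 2, 4]])
y = np.array([1, 2, 3])
print(all_equal(x, y))   # loop dimension is broadcast: one result per row of x

# array_equal short-cuts on a shape mismatch and just reports False ...
print(np.array_equal(np.zeros((3, 3)), np.zeros((2, 2))))

# ... whereas the (n),(n)->() signature refuses mismatched core dimensions.
try:
    all_equal(np.zeros(3), np.zeros(2))
except ValueError as exc:
    print("ValueError:", exc)
```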

On Thu, May 31, 2018 at 2:00 PM, Stephan Hoyer  wrote:

> On Wed, May 30, 2018 at 5:01 PM Matthew Harrigan <
> harrigan.matt...@gmail.com> wrote:
>
>> "short-cut to automatically return False if m != n", that seems like a
>> silent bug
>>
>
> I guess it depends on the use-cases. This is how np.array_equal() works:
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html
>
> We could even imagine incorporating this hypothetical "equality along some
> axes with broadcasting" functionality into axis/axes arguments for
> array_equal() if we choose this behavior.
>


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Matthew Harrigan
I would love to see gufuncs become more general.  Specifically I would like
an optional prologue and epilogue function. The prologue could potentially
1) inspect parameterized dtypes 2) kwargs 3) set non-trivial output array
sizes 4) initialize data structures 5) defer processing to other functions
(BLAS).  The epilogue function could do any clean up of data structures.
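A toy pure-Python sketch of how such hooks might wrap the inner loop. Everything here is hypothetical and not an existing NumPy API: the driver `run_with_hooks`, the contract `prologue(x, **kwargs) -> (out_core_shape, state)`, and `epilogue(state)`. The example uses a keyword argument `k` to size the output (a "top k" per row), which a fixed gufunc signature cannot express.

```python
import numpy as np


def run_with_hooks(core, x, prologue=None, epilogue=None, **kwargs):
    """Hypothetical gufunc driver; the input core dimension is the last axis.

    prologue(x, **kwargs) -> (out_core_shape, state): inspect dtypes/kwargs,
        choose the output core size, set up scratch data structures.
    core(row, state) -> one output core slice.
    epilogue(state): clean up whatever the prologue created.
    """
    out_core_shape, state = x.shape[-1:], None
    if prologue is not None:
        out_core_shape, state = prologue(x, **kwargs)
    loop_shape = x.shape[:-1]
    out = np.empty(loop_shape + tuple(out_core_shape), dtype=x.dtype)
    try:
        for idx in np.ndindex(*loop_shape):  # outer loop over the loop dims
            out[idx] = core(x[idx], state)
    finally:
        if epilogue is not None:
            epilogue(state)                  # runs even if the loop raises
    return out


def topk_prologue(x, k):
    return (k,), {"k": k}                    # output core size comes from a kwarg


def topk_core(row, state):
    return np.sort(row)[::-1][: state["k"]]  # the k largest values, descending


def topk_epilogue(state):
    state.clear()


data = np.array([[3, 1, 2],
                 [9, 7, 8]])
print(run_with_hooks(topk_core, data, topk_prologue, topk_epilogue, k=2))
```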

On Fri, Jun 1, 2018 at 12:57 PM, Stefan van der Walt 
wrote:

> Hi Ralf,
>
> On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> > - "internal refactorings": MaskedArray yes, but the other ones no.
> > numpy.distutils and f2py are very hard to test, a big refactor pretty
> much
> > guarantees breakage. there's also not much need for refactoring, because
> > those things are not coupled to the numpy.core internals. numpy.financial
> > is simply uninteresting - we wish it wasn't there but it is, so now it
> > simply stays where it is.
>
> I want to clarify that in the current notes we put down ideas that
> prompted active discussion, even if they weren't necessarily feasible.
> I feel it is important to keep the conversation open to run its course
> until we have a good understanding of the various issues at hand.
>
> You may find that, in person, people are more willing to admit to their
> support for some "heretical" ideas than they are here on the list.
>
> E.g., you say that the financial functions "now simply stay", but that
> promises a future of a NumPy that never shrinks, while there is
> certainly some support for allowing NumPy to contract so that we can
> release maintenance burden and allow development of other core areas
> that have been neglected for a long time.
>
> You will *always* have small, vocal proponents of any specific piece of
> functionality; that doesn't necessarily mean that such functionality
> contributes to the health of a project as a whole.
>
> So, I gently urge us to carefully reconsider the narrative that nothing can
> change/be removed, and evaluate each suggestion carefully, not weighing
> only the very evident negatives but also the longer term positives.
>
> Best regards,
> Stéfan


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 9:46 AM, Chris Barker  wrote:

> numpy is also quite a bit slower than raw python for math with (very)
> small arrays:
>

Doing a bit more experimentation: pure Python still had the advantage at
over 10 elements (I got bored...). But I noticed that the time for the numpy
computation is pretty much constant from 2 up to around 100 elements, which
implies that the bulk of the issue is with "startup" costs, rather than
fancy indexing or anything like that. So maybe a shortcut wouldn't be
helpful.

Note that if you use a list comp (the pythonic translation of an array
operation), the crossover point is about 15 elements (in my tests, on my
machine...)

In [90]: % timeit t2 = [x * 10 for x in t]

920 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
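For anyone who wants to repeat the comparison outside IPython, here is a sketch using the standard-library `timeit` module with the same two-element point, (3.4, 5.6). Absolute numbers depend on the machine and NumPy version, so none are claimed here; the function names are just for illustration.

```python
import timeit

import numpy as np

t = (3.4, 5.6)      # a 2-D point as a plain tuple
a = np.array(t)     # the same point as a NumPy array


def scale_tuple():
    return (t[0] * 10, t[1] * 10)


def scale_listcomp():
    return [x * 10 for x in t]


def scale_array():
    return a * 10


N = 10_000
for fn in (scale_tuple, scale_listcomp, scale_array):
    # Best of 5 repeats, converted to nanoseconds per call.
    per_call = min(timeit.repeat(fn, number=N, repeat=5)) / N
    print(f"{fn.__name__:>15}: {per_call * 1e9:7.0f} ns per call")
```

Growing `t` (and `a`) to 10, 15 or 100 elements is a quick way to look for the crossover point described above on a given machine.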

-CHB




> In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
> 162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> In [32]: a
> Out[32]: array([ 3.4,  5.6])
>
> In [33]: % timeit a2 = a * 10
> 941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
>
> (I often want to do this sort of thing, not for performance, but for ease
> of computation -- say you have two or three coordinates that represent a
> point -- it's really nice to be able to scale or shift with array
> operations, rather than all that indexing -- but it is pretty slow with
> numpy.)
>
> I've wondered if numpy could be optimized for small 1D arrays, and maybe
> even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
> special-casing / short-cutting those cases.
>
> It would require some careful profiling to see if it would help, but it
> sure seems possible.
>
> And maybe scalars could be fit into the same system.
>
> -CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Stefan van der Walt
Hi Ralf,

On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> - "internal refactorings": MaskedArray yes, but the other ones no.
> numpy.distutils and f2py are very hard to test, a big refactor pretty much
> guarantees breakage. there's also not much need for refactoring, because
> those things are not coupled to the numpy.core internals. numpy.financial
> is simply uninteresting - we wish it wasn't there but it is, so now it
> simply stays where it is.

I want to clarify that in the current notes we put down ideas that
prompted active discussion, even if they weren't necessarily feasible.
I feel it is important to keep the conversation open to run its course
until we have a good understanding of the various issues at hand.

You may find that, in person, people are more willing to admit to their
support for some "heretical" ideas than they are here on the list.

E.g., you say that the financial functions "now simply stay", but that
promises a future of a NumPy that never shrinks, while there is
certainly some support for allowing NumPy to contract so that we can
release maintenance burden and allow development of other core areas
that have been neglected for a long time.

You will *always* have small, vocal proponents of any specific piece of
functionality; that doesn't necessarily mean that such functionality
contributes to the health of a project as a whole.

So, I gently urge us to carefully reconsider the narrative that nothing can
change/be removed, and evaluate each suggestion carefully, not weighing
only the very evident negatives but also the longer term positives.

Best regards,
Stéfan


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 4:43 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:


>  one thing that always slightly annoyed me is that numpy math is way
> slower for scalars than python math
>

numpy is also quite a bit slower than raw python for math with (very) small
arrays:

In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [32]: a
Out[32]: array([ 3.4,  5.6])

In [33]: % timeit a2 = a * 10
941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


(I often want to do this sort of thing, not for performance, but for ease
of computation -- say you have two or three coordinates that represent a
point -- it's really nice to be able to scale or shift with array
operations, rather than all that indexing -- but it is pretty slow with
numpy.)

I've wondered if numpy could be optimized for small 1D arrays, and maybe
even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
special-casing / short-cutting those cases.

It would require some careful profiling to see if it would help, but it
sure seems possible.

And maybe scalars could be fit into the same system.

-CHB





Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Todd
On Fri, Jun 1, 2018, 11:27 Todd  wrote:

>
>
> On Thu, May 31, 2018, 19:50 Matti Picus  wrote:
>
>> At the recent NumPy sprint at BIDS (thanks to those who made the trip)
>> we spent some time brainstorming about a roadmap for NumPy, in the
>> spirit of similar work that was done for Jupyter. The idea is that a
>> document with wide community acceptance can guide the work of the
>> full-time developer(s), and be a source of ideas for expanding
>> development efforts.
>>
>> I put the document up at
>> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
>> it at a BOF session during SciPy in the middle of July in Austin.
>>
>> Eventually it could become a NEP or formalized in another way.
>>
>> Matti
>>
>
>
> Some things I have seen mentioned but don't know the current plans for:
>
> * Categorical arrays
> * Releasing the GIL wherever possible
> * Using multithreading internally
> * Making use of the next-generation BLAS when available, and staying involved
> in planning to make sure it supports our needs
> * Figure out where to use Cython and where not to
>

Also:

* Figure out the best way to handle strings.  This may involve multiple
approaches for different situations but the current approach may not be the
best default approach.
* Decimal and/or rational arrays
* If the answer on labeled arrays is yes, then there should probably be a PEP
about label-based indexing
* A decision about how to handle numpy 2.0

>


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Todd
On Thu, May 31, 2018, 19:50 Matti Picus  wrote:

> At the recent NumPy sprint at BIDS (thanks to those who made the trip)
> we spent some time brainstorming about a roadmap for NumPy, in the
> spirit of similar work that was done for Jupyter. The idea is that a
> document with wide community acceptance can guide the work of the
> full-time developer(s), and be a source of ideas for expanding
> development efforts.
>
> I put the document up at
> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
> it at a BOF session during SciPy in the middle of July in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti
>


Some things I have seen mentioned but don't know the current plans for:

* Categorical arrays
* Releasing the GIL wherever possible
* Using multithreading internally
* Making use of the next-generation BLAS when available, and staying involved
in planning to make sure it supports our needs
* Figure out where to use Cython and where not to

>


[Numpy-discussion] Change in default behavior of np.polyfit

2018-06-01 Thread Andreas Nußbaumer
Hi,

In [1] the scaling factor for the covariance matrix of `np.polyfit` was
discussed. The conclusion was that it is non-standard and a patch might be
in order to correct this. Pull request [2] changes the factor from
chisq(popt)/(M-N-2) to chisq(popt)/(M-N) (with M = number of points, N = number
of parameters), essentially removing the "-2". Clearly, this changes the
result for the covariance matrix (but not the result for the polynomial
coefficients) and therefore the current behavior if `cov=True` is set.

It should be noted that `scipy.optimize.curve_fit` also uses
chisq(popt)/(M-N) as the scaling factor (without the "-2"). Therefore, the
change would remove a discrepancy.

Additionally, patch [2] adds an option that sets the scaling factor of the
covariance matrix to 1. This can be useful in cases where the weights are
given by 1/sigma, with sigma being the (known) standard errors of
(Gaussian-distributed) data points, in which case the unscaled matrix is
already a correct estimate of the covariance matrix.
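To make the two scaling factors concrete, the sketch below recomputes the covariance matrix by hand from the Vandermonde matrix on synthetic data (made up for illustration, not from [1] or [2]), with M data points and N = deg + 1 parameters. As far as I know, current NumPy releases use the chisq(popt)/(M-N) factor for `cov=True` and accept `cov='unscaled'` to skip the scaling; on an older NumPy with the "-2" the first comparison would come out the other way.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # synthetic

deg = 2
M, N = x.size, deg + 1                 # number of points, number of parameters

coef, cov = np.polyfit(x, y, deg, cov=True)

V = np.vander(x, N)                    # columns x**2, x, 1: same order as coef
base = np.linalg.inv(V.T @ V)          # unscaled covariance (unit weights)
resid = y - V @ coef
chisq = resid @ resid                  # chi-squared at the best-fit parameters

cov_new = base * chisq / (M - N)       # factor proposed in [2], as in curve_fit
cov_old = base * chisq / (M - N - 2)   # previous factor with the extra "-2"

print(np.allclose(cov, cov_new), np.allclose(cov, cov_old))
print(np.allclose(np.polyfit(x, y, deg, cov="unscaled")[1], base))
```

The two factors differ by (M-N)/(M-N-2), which only matters noticeably when M is not much larger than N.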

Best,
Andreas

[1] http://numpy-discussion.10968.n7.nabble.com/Inconsistent-results-for-the-covariance-matrix-between-scipy-optimize-curve-fit-and-numpy-polyfit-td45582.html
[2] https://github.com/numpy/numpy/pull/11197


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Marten van Kerkwijk
Hi Matti,

Thanks for sharing the roadmap. Overall, it looks very nice. A practical
question is whether you want input via the mailing list, or whether one
should just edit the wiki and add questions there?

As the roadmap mentioned interaction with python proper (and a possible
PEP): one thing that always slightly annoyed me is that numpy math is way
slower for scalars than python math - and duplicates all the function
names. It would seem to make sense to allow python's math module to be
overridden for non-python input, including arrays. That could be another
PEP...
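A small illustration of the duplication mentioned here, as things stand today (nothing below is an override mechanism): the standard-library `math` functions take only scalars (a NumPy scalar is fine, a multi-element array is not), while the NumPy namesakes accept arrays but carry more per-call overhead on plain Python floats. Timings vary by machine, so none are claimed.

```python
import math
import timeit

import numpy as np

x = 2.0
arr = np.array([1.0, 4.0, 9.0])

print(math.sqrt(x), np.sqrt(x))      # the same value from both namesakes
print(math.sqrt(np.float64(x)))      # NumPy scalars convert to float fine
print(np.sqrt(arr))                  # only the NumPy version handles arrays

try:
    math.sqrt(arr)                   # math.* cannot take a multi-element array
except TypeError as exc:
    print("TypeError:", exc)

# Per-call cost on a plain Python float.
for name, fn in [("math.sqrt", math.sqrt), ("np.sqrt", np.sqrt)]:
    best = min(timeit.repeat(lambda: fn(x), number=10_000, repeat=5))
    print(f"{name}: {best / 10_000 * 1e9:.0f} ns per call")
```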

All the best,

Marten