[Numpy-discussion] Re: NEP 56: array API standard support in the main numpy namespace

2024-01-16 Thread Stephan Hoyer
On Sun, Jan 7, 2024 at 8:08 AM Ralf Gommers  wrote:

> This NEP will supersede the following NEPs:
>
> - :ref:`NEP30` (never implemented)
> - :ref:`NEP31` (never implemented)
> - :ref:`NEP37` (never implemented; the ``__array_module__`` idea is
> basically
>   the same as ``__array_namespace__``)
> - :ref:`NEP47` (implemented with an experimental label in
> ``numpy.array_api``,
>   will be removed)
>

Thanks Ralf, Mateusz and Nathan for putting this together.

I just wanted to comment briefly to voice my strong support for this
proposal, and especially for marking these other NEPs as superseded. This
will go a long way towards clarifying NumPy's support for generic array
interfaces.


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-22 Thread Stephan Hoyer
On Fri, Dec 22, 2023 at 12:34 PM Martin Ling  wrote:

> Hi folks,
>
> I don't follow numpy development in much detail these days but I see
> that there is a 2.0 release planned soon.
>
> Would this be an opportunity to change the behaviour of 'reduceat'?
>
> This issue has been open in some form since 2006!
> https://github.com/numpy/numpy/issues/834
>
> The current behaviour was originally inherited from Numeric, and makes
> reduceat often unusable in practice, even where it should be the
> perfect, concise, efficient solution. But it has been impossible to
> change it without breaking compatibility with existing code.
>
> As a result, horrible hacks are needed instead, e.g. my answer here:
> https://stackoverflow.com/questions/57694003
>
> Is this something that could finally be fixed in 2.0?


The reduceat API is certainly problematic, but I don't think fixing it is
really a NumPy 2.0 thing.

As discussed in that issue, the right way to fix that is to add a new API
with the correct behavior, and then we can think about deprecating (and
maybe eventually removing) the current reduceat method. If the new
reducebins() method were available, I would say removing reduceat() would
be appropriate to consider for NumPy 2, but we don't have the new method
with fixed behavior yet, which is the bigger blocker.
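To make the problem concrete, here is a sketch of the current quirk alongside
a slow pure-Python reference for the fixed semantics. The name reducebins()
and its exact API are only the ones floated in the linked issue, not anything
NumPy provides:

```python
import numpy as np

x = np.arange(8)

# Today: indices are segment starts, the last segment always runs to the
# end of the array, and a "bin" where indices[i] >= indices[i+1] silently
# yields the single element x[indices[i]] instead of an empty reduction:
print(np.add.reduceat(x, [0, 4, 4]))  # [ 6  4 22] -- the middle 4 is just x[4]

def reducebins(ufunc, a, bins):
    """Reference for the proposed semantics: bins are [start, end) edges,
    and an empty bin yields the ufunc identity."""
    a = np.asarray(a)
    return np.array([ufunc.reduce(a[lo:hi])
                     for lo, hi in zip(bins[:-1], bins[1:])])

print(reducebins(np.add, x, [0, 4, 4]))  # [6 0] -- empty bin -> identity
```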


>
>
> Martin
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Stephan Hoyer
On Mon, Oct 9, 2023 at 2:29 PM Nathan  wrote:

> However, one thing we can do now, for that one particular symbol that
> we know is going to be in every pickle file and probably never elsewhere,
> is intercept that one import and, instead of generating a generic warning
> about np.core being deprecated, make that specific version of the
> deprecation warning mention NumpyUnpickler. I'll make sure this gets
> done.
>
> We *could* just allow that import to happen without a warning, but then
> we're stuck keeping np.core around even longer and we also will still
> generate a deprecation warning for an import from np.core if the pickle
> file happens to include any other numpy types that might generate imports
> in np.core.
>

My preferred option would be to keep restoring old NumPy pickles working
indefinitely, and also to preserve backwards compatibility for pickles
written in newer versions of NumPy. We can still do the rest of the
numpy.core cleanup, but it's OK if we keep a bit of compatibility code in
NumPy indefinitely.

I don't think warnings would help much in this case, because if somebody is
currently distributing pickled numpy arrays despite all of our warnings not
to do so, they are unlikely to go back and update their old files.

We could keep around numpy.core.multiarray as a minimal stub for only this
purpose, or potentially only define the object
numpy.core.multiarray._reconstruct.
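The latter could be as small as a one-line module. A purely hypothetical
sketch (the import path assumes the planned post-rename layout; this is not
actual NumPy source):

```python
# numpy/core/multiarray.py -- hypothetical minimal stub.
# Old pickle files record the dotted path
# "numpy.core.multiarray._reconstruct", so resolving that single name is
# all that pickle.load() needs from this module:
from numpy._core.multiarray import _reconstruct  # noqa: F401
```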


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-08-31 Thread Stephan Hoyer
On Wed, Aug 30, 2023 at 4:25 AM Ralf Gommers  wrote:

>
>
> On Tue, Aug 29, 2023 at 4:08 PM Nathan  wrote:
>
>> The NEP was merged in draft form, see below.
>>
>> https://numpy.org/neps/nep-0055-string_dtype.html
>>
>
> This is a really nice NEP, thanks Nathan! I see that questions and
> constructive feedback is still coming in on GitHub, but for now it seems
> like everyone is pretty happy with moving forward with implementing this
> new dtype in NumPy.
>
> Cheers,
> Ralf
>

To echo Ralf's comments, thank you for this very well-written proposal! I
particularly appreciate the detailed consideration of how to handle
different models of missing values.

Overall, I am very excited about this work. A UTF8 dtype in NumPy is long
overdue, and will bring significant benefits to the entire scientific
Python ecosystem.
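For later readers: the dtype as it eventually shipped in NumPy 2.0 looks
roughly like this, including the configurable missing-value sentinel the NEP
discusses (a sketch; exact printing may differ):

```python
import numpy as np

# StringDType with a nan-like missing-value object, per NEP 55:
dt = np.dtypes.StringDType(na_object=np.nan)
arr = np.array(["short", "a much longer string", np.nan], dtype=dt)
print(np.isnan(arr))  # [False False  True]
```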


[Numpy-discussion] Re: Giving deprecation of e.g. `float(np.array([1]))` a shot (not 0-d)

2023-04-20 Thread Stephan Hoyer
On Thu, Apr 20, 2023 at 9:12 AM Sebastian Berg 
wrote:

> Hi all,
>
> Unlike conversions of 0-d arrays via:
>
> float(np.array(1))
>
> conversions of 1-D or higher dimensional arrays with a single element
> are a bit strange:
>
> float(np.array([1]))
>
> And deprecating it has come up often enough with many in favor, but
> also many worried about the possible annoyance to users.
> I decided to give the PR a shot; I may have misread the room on it,
> though:
>
> https://github.com/numpy/numpy/pull/10615
>
> So if this turns out noisy (or you may simply disagree), I am happy to
> revert!
>

This looks like a great clean-up to me, thanks for giving this a try!
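Concretely, what the PR deprecates and what keeps working (a sketch):

```python
import numpy as np

float(np.array(1.0))       # 0-d: fine, stays supported
float(np.array([1.0]))     # ndim > 0, size 1: deprecated by the PR
float(np.array([1.0])[0])  # explicit element access keeps working
np.array([1.0]).item()     # as does .item()
```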


[Numpy-discussion] Re: removing NUMPY_EXPERIMENTAL_ARRAY_FUNCTION env var

2023-03-10 Thread Stephan Hoyer
+1 for removing this environment variable. It was never intended to stick
around this long.
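For context, what the variable gates is NEP 18 dispatch; a minimal
implementer looks something like this (illustrative sketch only):

```python
import numpy as np

class Diagonal:
    """Toy duck array storing only the diagonal of a scaled identity."""
    def __init__(self, n, value):
        self.n, self.value = n, value

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            return self.n * self.value
        return NotImplemented  # NumPy then raises TypeError

print(np.sum(Diagonal(5, 2.0)))  # 10.0, dispatched via __array_function__
```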

On Fri, Mar 10, 2023 at 6:48 AM Ralf Gommers  wrote:

> Hi all,
>
> In https://github.com/numpy/numpy/pull/23364 we touched on the
> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION environment variable. This was a
> temporary feature during the introduction of `__array_function__` (see NEP
> 18), but we never removed it. I propose we do so now, since it is
> cumbersome to have around (see gh-23364 for one reason why). GitHub code
> search shows some usages, but that's mostly old code to explicitly enable
> it or print diagnostic info, it looks like - none of it seemed relevant.
>
> In case there is any need for this functionality to disable
> `__array_function__`, then please speak up. In that case it probably
> applies to `__array_ufunc__` as well, and there should be a better way to
> do this than an undocumented environment variable with "experimental" in
> the name.
>
> Cheers,
> Ralf
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: status of long double support and what to do about it

2022-11-17 Thread Stephan Hoyer
On Thu, Nov 17, 2022 at 5:29 PM Scott Ransom  wrote:

> A quick response from one of the leaders of a team that requires 80bit
> extended precision for
> astronomical work...
>
> "extended precision is pretty useless" unless you need it. And the
> high-precision pulsar timing
> community needs it. Standard double precision (64-bit) values do not
> contain enough precision for us
> to pass relative astronomical times via a single float without extended
> precision (the precision
> ends up being at the ~1 microsec level over decades of time differences,
> and we need it at the
> ~1-10ns level) nor can we store the measured spin frequencies (or do
> calculations on them) of our
> millisecond pulsars with enough precision. Those spin frequencies can have
> 16-17 digits of base-10
> precision (i.e. we measure them to that precision). This is why we use
> 80-bit floats (usually via
> Linux, but also on non-M1 Mac hardware if you use the correct compilers)
> extensively.
>
> Numpy is a key component of the PINT software to do high-precision pulsar
> timing, and we use it
> partly *because* it has long double support (with 80-bit extended
> precision):
> https://github.com/nanograv/PINT
> And see the published paper here, particularly Sec 3.3.1 and footnote #42:
> https://ui.adsabs.harvard.edu/abs/2021ApJ...911...45L/abstract
>
> Going to software quad precision would certainly work, but it would
> definitely make things much
> slower for our matrix and vector math.
>
> We would definitely love to see a solution for this that allows us to get
> the extra precision we
> need on other platforms besides Intel/AMD64+Linux (primarily), but giving
> up extended precision on
> those platforms would *definitely* hurt. I can tell you that the pulsar
> community would definitely
> be against option "B". And I suspect that there are other users out there
> as well.
>

Hi Scott,

Thanks for sharing your feedback!

Would you or some of your colleagues be open to helping maintain a library
that adds the 80-bit extended precision dtype into NumPy? This would be a
variation of Ralf's "option A."

Best,
Stephan
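(For anyone wanting to check what their own platform provides, a quick
sketch; the values in the comments are for x86 extended precision and will
differ elsewhere:)

```python
import numpy as np

print(np.finfo(np.float64).precision)     # 15 decimal digits
print(np.finfo(np.longdouble).precision)  # 18 on x86 80-bit extended
print(np.finfo(np.longdouble).nmant)      # 63 mantissa bits (vs 52)
```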


>
> Scott
> NANOGrav Chair
> www.nanograv.org
>
>
> --
> Scott M. Ransom  Address:  NRAO
> Phone:  (434) 296-0320   520 Edgemont Rd.
> email:  sran...@nrao.edu Charlottesville, VA 22903 USA
> GPG Fingerprint: A40A 94F2 3F48 4136 3AC4  9598 92D5 25CB 22A6 7B65
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: NEP 51: Changing the Representation of NumPy Scalars

2022-10-31 Thread Stephan Hoyer
On Mon, Oct 31, 2022 at 12:51 PM Aaron Meurer  wrote:

> I like this. NumPy scalar printing is confusing to new users, who
> might think they are Python scalars. And even if you understand them,
> it's always been annoying that you have to do further introspection to
> see the dtype. I also like the longdouble change (the name float128
> has misled me in the past), and the decision to make everything
> copy-paste round-trippable.
>

Agreed, I am strongly supportive of the proposal in this NEP!


> Are there also plans to add "np." to array() and the string forms of
> other objects?
>
> Aaron Meurer
>
> On Fri, Oct 28, 2022 at 2:55 AM Sebastian Berg
>  wrote:
> >
> > Hi all,
> >
> > As mentioned earlier, I would like to propose changing the
> > representation of scalars in NumPy.  Discussion and ideas on changes
> > are much appreciated!
> >
> > The main change is to show scalars as:
> >
> > * `np.float64(3.0)` instead of just `3.0`
> > * `np.True_` instead of `True`
> > * `np.void((3, 5), dtype=[('a', '<i8'), ('b', '<i8')])` instead of
> >   `(3, 5)`
> > * Use `np.` rather than `numpy.` for datetime/timedelta.
> >
> > This way it is clear for users that they are dealing with NumPy scalars
> > which behave different from Python scalars.
> > The `str()` that is given when using `print()` and the way arrays are
> > shown will be unchanged.
> >
> > The NEP draft can be found here:
> >
> > https://numpy.org/neps/nep-0051-scalar-representation.html
> >
> > and it includes more details and related changes.
> >
> > The implementation is largely finished and can be found here:
> >
> >https://github.com/numpy/numpy/pull/22449
> >
> > We are fairly late in the release cycle and the change should not block
> > other things.  So, the aim is to merge it early in the next release
> > cycle.  That way downstream has time to fix documentation if wanted.
> >
> > Depending on how discussion goes, I hope to formally propose the NEP
> > fairly soon, so that merging the implementation doesn't need to
> > wait on NEP approval.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> >
> >
> > On Thu, 2022-09-08 at 11:38 +0200, Sebastian Berg wrote:
> > >
> > > TL;DR:  NumPy scalars' representation is e.g. `34.3` instead of
> > > `float32(34.3)`.  So the representation is missing the type
> > > information.  What are your thoughts on changing that?
> > >
> > >
> > > Hi all,
> > >
> > > I am thinking about the next steps for NEP 50 (The NEP wants to fix
> > > the
> > > NumPy promotion rules, especially with respect to scalars):
> > >
> > > https://numpy.org/neps/nep-0050-scalar-promotion.html
> > >
> > > In relation to that, there was one point that Stéfan brought up
> > > previously.
> > >
> > > The NumPy scalars (representation) currently print as numbers:
> > >
> > > >>> np.float32(34.3)
> > > 34.3
> > > >>> np.uint8(5)
> > > 5
> > >
> > > That can already be confusing now.  However, it gets more problematic
> > > if NEP 50 is introduced since the behavior between a Python `34.3`
> > > and
> > > `np.float32(34.3)` would differ more than it does now (please refer
> > > to
> > > the NEP).
> > >
> > > The change would be that we should print as:
> > >
> > > float64(34.3)  (or similar?)
> > >
> > > This Email is mainly to ask for any feedback or concern on such a
> > > change.  I suspect we may have to write a very brief NEP about it.
> > >
> > > If there is little concern, maybe we could move forward such a change
> > > promptly.  Otherwise it could be moved forward together with NEP 50
> > > and
> > > take effect in a "major" release [1].
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > >
> > >
> > > [1] Note that for me, even a major release would hopefully not affect
> > > the majority of users or be very disruptive.
> > >
> > > ___
> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > Member address: sebast...@sipsolutions.net
> >
> >
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: asmeu...@gmail.com
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Representation of NumPy scalars

2022-09-08 Thread Stephan Hoyer
On Thu, Sep 8, 2022 at 3:41 AM Stefano Miccoli 
wrote:

> On 8 Sep 2022, at 11:39, numpy-discussion-requ...@python.org wrote:
>
> TL;DR:  NumPy scalars' representation is e.g. `34.3` instead of
> `float32(34.3)`.  So the representation is missing the type
> information.  What are your thoughts on changing that?
>
>
> This would be a VERY welcome change!
>

+1 this would be very welcome!

The current behavior is a major source of confusion. Users end up using
NumPy scalars accidentally all over the place without even realizing it,
leading to all sorts of surprising and challenging to debug bugs.
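A typical way this happens in practice (today's reprs shown):

```python
>>> import numpy as np
>>> val = np.array([1.5, 2.5])[0]
>>> val    # current repr -- indistinguishable from a Python float
1.5
>>> type(val)  # ...but it is actually a NumPy scalar
<class 'numpy.float64'>
```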


[Numpy-discussion] Re: Exporting numpy arrays to binary JSON (BJData) for better portability

2022-08-27 Thread Stephan Hoyer
On Sat, Aug 27, 2022 at 9:17 AM Qianqian Fang  wrote:

> 2. a key belief of the NeuroJSON project is that "human readability" is
> the single most
> important factor to decide the longevity of both codes and data. The
> human-readability of codes have been well addressed and reinforced by
> open-source/free/libre software licenses (specifically, Freedom 1), but not
> many people have been paying attention to the "readability" of data.
>

Hi Qianqian,

I think you might be interested in the Zarr storage format, for exactly
this same reason: https://zarr.dev/

Zarr is focused more on "big data" but one of its fundamental strengths is
that the format is extremely simple. All the metadata is in JSON, with
arrays divided up into smaller "chunks" stored as files on disk or in cloud
object stores.

Cheers,
Stephan
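A minimal sketch with the zarr-python v2 API, to show how little there is to
the format:

```python
import zarr  # third-party package

# All metadata is plain JSON on disk; the data lives in per-chunk files.
z = zarr.open("example.zarr", mode="w", shape=(1000, 1000),
              chunks=(100, 100), dtype="f8")
z[:] = 42.0
# example.zarr/.zarray is now human-readable JSON describing
# shape, chunks, dtype, and compressor.
```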


[Numpy-discussion] Re: Proposal: Indexing by callables

2022-07-31 Thread Stephan Hoyer
On Sat, Jul 30, 2022 at 5:51 PM Matteo Santamaria 
wrote:

> Hi all,
>
>
>
> I’d like to open a discussion on supporting callables within
> `np.ndarray.__getitem__`. The intent is to make long function-chaining
> expressions more ergonomic by removing the need for an intermediary,
> temporary value.
>
>
>
> Instead of
>
>
>
> ```
>
> tmp = long_and_complicated_expression(arr)
>
> return tmp[tmp > 0]
>
> ```
>
>
>
> we would allow
>
>
>
> ```
>
> return long_and_complicated_expression(arr)[lambda x: x > 0]
>
> ```
>
>
>
> This feature has long been supported by pandas’ .loc accessor, where I’ve
> personally found it very valuable. In accordance with the pandas
> implementation, the callable would be required to take only a single
> argument.
>
>
>
> In terms of semantics, it should always be the case that `arr[fn] ==
> arr[fn(arr)]`.
>
>
>
> I do realize that expanding the API and supporting additional indexing
> methods is not without cost, so I open the floor to anyone who’d like to
> weigh in for/against the proposal.
>
>
Matteo, thanks for bringing up this proposal!

In my opinion, this would not be a good idea. The main reason why this
makes sense in pandas is because the pandas API is designed for "method
chaining," so being able to chain indexing is also important.

In contrast, only some NumPy functions have equivalents as methods, so
method chaining doesn't really work. More broadly, I don't really see how
it could be made to work even if we did add methods for everything, because
you almost always need to work with multiple NumPy arrays (in contrast to
the multiple arrays that can live in a single pandas DataFrame).

Best,
Stephan
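For reference, the semantics under discussion (`arr[fn] == arr[fn(arr)]`) fit
in a tiny helper that users who want the pattern can define themselves -- a
sketch:

```python
import numpy as np

def ix(arr, key):
    """arr[key(arr)] when key is callable, plain indexing otherwise."""
    return arr[key(arr) if callable(key) else key]

x = np.array([-2.0, 1.0, 3.0])
print(ix(x, lambda a: a > 0))  # [1. 3.]
```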


> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: copy="never" discussion and no deprecation cycle?

2022-07-21 Thread Stephan Hoyer
On Mon, Jul 5, 2021 at 11:19 AM Stefan van der Walt 
wrote:

> The reason why Gagandeep started working on this is so we can have the
> never-copy behavior in the `numpy.array_api` namespace. For the `asarray`
> function there, the `copy` keyword is still boolean, with description:
>
> Whether or not to make a copy of the input. If True, always copies.
> If False, never copies for input which supports DLPack or the buffer
> protocol,
> and raises ValueError in case that would be necessary.
> If None, reuses existing memory buffer if possible, copies
> otherwise.
> Default: None.
>
> In the end I think that's better than strings, and way better than enums -
> we just can't have that in the main namespace, because we can't change what
> `False` does.
>
>
> I agree that this is a good API (although not everybody else does).
>
> W.r.t. NumPy's API: it could be okay to change the behavior of copy=False
> to make it more strict (no copies ever), because then at least errors will
> be raised and we can provide a message with instructions on how to fix it.
>

Resurfacing this discussion, since Sebastian asked me to comment.

After some reflection, I think my favorite solution now is True/False/None,
including a deprecation cycle to migrate existing users of copy=False to
use copy=None. This is the simplest adaptation of the existing argument,
and in many cases where users are writing copy=False they may actually not
be intending the current "maybe copy" behavior.

Strings would be appropriate if we were starting from scratch, but breaking
backwards compatibility is very problematic.

I do like enums, but I recognize that they are not currently used in
NumPy/SciPy, so they feel a little out of place, and expanding NumPy's
namespace to add more enums also has a cost. I don't think the meme I
linked to is entirely appropriate, because these aren't just three
arbitrary modes -- two of the cases here really are "yes" or "no" copy, and
the other is "maybe", which is a pretty common meaning for "None" as a
default value (and users will rarely be writing copy=None themselves).
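Spelled out, the three behaviors under the True/False/None scheme (matching
what NumPy 2.0 eventually adopted):

```python
import numpy as np

a = np.arange(3)
np.array(a, copy=True)   # always copy
np.array(a, copy=None)   # "maybe": copy only if needed
np.array(a, copy=False)  # never copy; raises ValueError if one is required
```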


[Numpy-discussion] Re: Speeding up isin1d and adding a "method" or similar

2022-06-17 Thread Stephan Hoyer
I think this is a great idea! I don't see any downsides here.

As for the method name, I would lean towards calling it "kind" and using a
default value of None for automatic selection, for consistency with np.sort.
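(For later readers: the kwarg eventually shipped in NumPy 1.24 under the name
`kind`, with the look-up-table path called 'table' rather than 'dictionary':)

```python
import numpy as np

print(np.isin([1, 2, 3], [2, 3, 4], kind="sort"))   # mergesort-based path
print(np.isin([1, 2, 3], [2, 3, 4], kind="table"))  # look-up-table path
print(np.isin([1, 2, 3], [2, 3, 4]))                # kind=None: automatic
```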

On Thu, Jun 16, 2022 at 6:14 AM Sebastian Berg 
wrote:

> Hi all,
>
> there is a PR to add a faster path to `np.isin`, that uses a look-up-
> table for all the elements that are included in the haystack
> (`test_elements`):
>
> https://github.com/numpy/numpy/pull/12065/files
>
> Such a table means that the memory overhead can be very significant,
> but the speedup as well, so there was the idea of adding an option to
> pick which version is used.
>
> The current documentation for this new `method` keyword argument is
> included below.  So the main questions are:
>
> * Is there any concern about adding such a new kwarg?
> * Is `method` the best name?  `sort` uses `kind` which may also be good
>
> There is also the smaller question of what heuristic 'auto' would use,
> but that can be tweaked at any time.
>
> ```
>method : {'auto', 'sort', 'dictionary'}, optional
>  The algorithm to use. This will not affect the final result,
>  but will affect the speed. Default is 'auto'.
>
>  - If 'sort', will use a mergesort-based approach. This will have
>a memory usage of roughly 6 times the sum of the sizes of
>`ar1` and `ar2`, not accounting for size of dtypes.
>  - If 'dictionary', will use a key-dictionary approach similar
>to a counting sort. This is only available for boolean and
>integer arrays. This will have a memory usage of the
>size of `ar1` plus the max-min value of `ar2`. This tends
>to be the faster method if the following formula is true:
>`log10(len(ar2)) > (log10(max(ar2)-min(ar2)) - 2.27) / 0.927`,
>but may use greater memory.
>  - If 'auto', will automatically choose the method which is
>expected to perform the fastest, using the above
>formula. For larger sizes or smaller range,
>'dictionary' is chosen. For larger range or smaller
>sizes, 'sort' is chosen.
> ```
>
> Cheers,
>
> Sebastian
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Dropping the pdf documentation.

2022-05-22 Thread Stephan Hoyer
On Sun, May 22, 2022 at 3:52 PM Rohit Goswami 
wrote:

> Being very hard to read should not be reason enough to stop generating
> them. In places with little to no internet connectivity often the PDF
> documentation is invaluable.
>
The HTML docs can also be downloaded for offline use.

Perhaps someone has access to analytics from numpy.org that can tell us how
often the PDF docs are viewed? I believe PDFs could be more convenient for
some use-cases, but I don't think it's worth the trouble of the separate
rendering pipeline for a relatively niche use-case.


[Numpy-discussion] Re: Dropping the pdf documentation.

2022-05-22 Thread Stephan Hoyer
+1 let’s drop the PDF docs. They are already very hard to read.

On Sun, May 22, 2022 at 1:06 PM Charles R Harris 
wrote:

> Hi All,
>
> This is a proposal to drop the generation of pdf documentation and only
> generate the html version. This is a one way change due to the difficulty
> maintaining/fixing the pdf versions. See minimal discussion here.
>
> Chuck
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Exposing `from_dlpack` to the main namespace

2022-03-08 Thread Stephan Hoyer
On Tue, Mar 8, 2022 at 8:27 AM Stefan van der Walt 
wrote:

> In other places in the ecosystem, like pandas and xarray, `from_x` and
> friends live as static methods on their respective classes.  Any reason not
> to add this as `numpy.array.from_dlpack`? We may also want to consider
> adding all the other `from*`'s there and deprecating the original usage
> (without removing it).
>

Pandas and Xarray make almost everything else a method, too, and encourage
using "method chaining" for manipulating datasets. So I'm not sure they are
great precedents here.

I think static/class methods are a fine way to write constructors, but are
not inherently superior. My vote would be to keep it as a function for
consistency with existing numpy constructors like frombuffer. It might even
make sense to call it np.fromdlpack, though the underscore really does
increase readability.


[Numpy-discussion] Re: [Job] NumPy Open Source Developer at NVIDIA

2022-03-02 Thread Stephan Hoyer
Hi Inessa -- could you share the original job description? It looks like it
got lost from your message :)

On Wed, Mar 2, 2022 at 12:28 PM Inessa Pawson  wrote:

> Hi, Mike!
> This is wonderful news! NumPy could certainly use more help.
>
> Cheers,
> Inessa
>
> Inessa Pawson
> Contributor Experience Lead | NumPy
> email: ine...@albuscode.org
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Writing an array subclass in a compiled language with proper dispatch

2022-01-10 Thread Stephan Hoyer
There are no C-level APIs for __array_function__ or __array_ufunc__, so
yes, at a high level Python methods will be invoked by NumPy.

That said, NumPy's logic for handling __array_function__ and
__array_ufunc__ methods is written in highly optimized C. If you wrote your
own __array_function__ and __array_ufunc__ methods on Quantity using C,
there should be very little overhead from Python. I would guess you might
see considerable speed-ups from moving highly dynamic Python logic into a
compiled language.
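The shape of that could be as simple as the sketch below, where the compiled
dispatcher is stood in for by a plain Python function (hypothetical and
illustrative only):

```python
import numpy as np

def _dispatch_in_c(ufunc, method, inputs, kwargs):
    # Stand-in for a C/Cython/Rust dispatcher; in the real design all of
    # this logic would live in the compiled extension.
    plain = tuple(np.asarray(x) for x in inputs)
    return getattr(ufunc, method)(*plain, **kwargs)

class Quantity(np.ndarray):
    # NumPy always enters through this Python-level hook (there is no
    # C-level protocol), but the body can hand off to compiled code.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return _dispatch_in_c(ufunc, method, inputs, kwargs)

q = np.arange(4.0).view(Quantity)
print(np.sqrt(q))  # dispatched via __array_ufunc__, then the stand-in
```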


On Mon, Jan 10, 2022 at 12:56 PM Juan Luis Cano Rodríguez <
hello@juanlu.space> wrote:

> Hi all,
>
> I am a long time user of astropy.units, which allows one to define
> quantities with physical units as follows:
>
> >>> from astropy import units as u
> >>> 10 << u.cm
> <Quantity 10. cm>
> >>> np.sqrt(4 << u.m ** 2)
> <Quantity 2. m>
> >>> ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s)
> <Quantity 10. cm m / s>
> >>> (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m ** 2 / u.s)
> <Quantity [0. , 0.1, 0. ] m2 / s>
>
> The mechanism works by subclassing numpy.ndarray and leveraging
> __array_function__ support aka NEP 18. Internally it is something like this:
>
> >>> v = np.array(10, dtype=np.float64, copy=False, order=None, subok=True,
> ndmin=0)
> >>> vu = v.view(u.Quantity)
> >>> vu._set_unit(u.cm)
> >>> vu
> <Quantity 10. cm>
>
> However, over the years I have been constantly annoyed by the fact that it
> is tremendously slow. I'm not critizing Astropy devs, the problem seems
> objectively difficult: although some code paths could be optimized at the
> cost of losing some syntactic sugar or breaking backwards compatibility,
> `isinstance` calls and introspection in general are slow.
>
> Setting aside the question of trying to make astropy.units faster (which
> may or may not be possible), I was thinking how feasible could it be to
> implement something similar, but using a compiled language instead (C,
> Cython, Rust, whatever) and leveraging "modern" dispatch mechanisms. But
> after reading about NEP 18, NEP 47, uarray, and various pull requests and
> issues here and there (
> https://labs.quansight.org/blog/2021/11/pydata-extensibility-vision/ and
> https://github.com/scipy/scipy/issues/10204#issuecomment-787067947 among
> others) I don't fully grasp the differences between the approaches, and I
> don't know if what I am proposing is feasible at all. Since IIUC the numpy
> function or ufunc is passed to __array_function__ and __array_ufunc__
> respectively, I am not sure how would that interact with the code being in
> a foreign language (I assume the NumPy C API would have to be used).
>
> If folks have advice, ideas, or suggestions for a direction, I'll be happy
> to read them.
>
> Best!
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Fwd: ndarray should offer __format__ that can adjust precision

2021-12-03 Thread Stephan Hoyer
On Fri, Dec 3, 2021 at 12:07 PM Sebastian Berg 
wrote:

> This discussion has recently surfaced again and I am wondering what the
> stance on it is for people?
>
> The PR is: https://github.com/numpy/numpy/pull/19550
>
> I.e. that things like f"{arr:.2f}" would be enabled in certain cases,
> at least for all floating point and complex values.
> I am wondering more about the general API progression here, since I do
> not think we have any prior art to compare to.
>
> * NumPy arrays are N-D objects (containers), do we want f/e formatting
>   to work for them?
>

Yes, I imagine this could be quite handy -- way nicer than figuring out the
syntax for np.array2string. I would use this functionality myself.
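A sketch of today's workaround next to what the PR would enable (the f-string
form is not merged, so it is commented out):

```python
import numpy as np

arr = np.array([1.23456, 7.5])
# Current workaround:
print(np.array2string(arr, formatter={'float_kind': lambda x: f'{x:.2f}'}))
# -> [1.23 7.50]
# Under PR #19550 this would work directly:
# f"{arr:.2f}"
```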


> * NumPy printing has a lot more options than just how to format each
>   element.  Are we happy to say that implementing `.2f` is fine without
>   unlocking other things?
>

If we want to add support for custom whole array formatting in the future,
I think it would be reasonable to constrain ourselves to backwards
compatible extensions of elementwise formatting.


> * Some formatting might come with an expectation that the result has
>   that length: `f"{3.123:30e}"` has a length of 30, but for an
>   array that is obviously not true?  Do we care about that?
>

I'm not concerned about this.

If you aren't checking the types of arguments that you are trying to format
today, you are already going to encounter surprising errors when string
formatting fails.


[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Stephan Hoyer
Hi Qianqian,

What is your concrete proposal for NumPy here?

Are you suggesting new methods or functions like to_json/from_json in NumPy
itself? As far as I can tell, reading/writing in your custom JSON format
already works with your jdata library.

Best,
Stephan
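To make the question concrete: absent built-in support, the usual pattern
today is a custom encoder. The sketch below borrows the annotation tag names
from the linked JData spec; it is illustrative, not a proposal for a specific
NumPy API:

```python
import json
import numpy as np

class JDataEncoder(json.JSONEncoder):
    """Serialize ndarray using JData-style annotation tags."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return {"_ArrayType_": obj.dtype.name,
                    "_ArraySize_": list(obj.shape),
                    "_ArrayData_": obj.ravel().tolist()}
        return super().default(obj)

print(json.dumps({"x": np.arange(4).reshape(2, 2)}, cls=JDataEncoder))
```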

On Thu, Nov 25, 2021 at 2:35 PM Qianqian Fang  wrote:

> Dear numpy developers,
>
> I would like to share a proposal on making ndarray JSON serializable by
> default, as detailed in this github issue:
>
> https://github.com/numpy/numpy/issues/20461
>
>
> briefly, my group and collaborators are working on a new NIH (National
> Institute of Health) funded initiative - NeuroJSON (http://neurojson.org)
> - to further disseminate a lightweight data annotation specification
> (JData) among the broad neuroimaging/scientific community. Python and
> numpy have been widely used in
> neuroimaging data analysis pipelines (nipy, nibabel, mne-python, PySurfer
> ... ), because N-D array is THE most important data structure used in
> scientific data. However, numpy currently does not support JSON
> serialization by default. This is one of the frequently requested features
> on github (#16432, #12481).
>
> We have developed lightweight python modules (jdata, bjdata) to help
> export/import ndarray objects to/from JSON (and a binary JSON format -
> BJData/UBJSON - to gain efficiency). The approach is to
> convert ndarray objects to a dictionary with subfields using standardized
> JData annotation tags. The JData spec can serialize complex data structures
> such as N-D arrays (solid, sparse, complex), trees, graphs, tables, etc. It
> also permits data compression. These annotations have been implemented in
> my MATLAB toolbox - JSONLab - since
> 2011 to help import/export MATLAB data types, and have been broadly used
> among MATLAB/GNU Octave users.
>
> Examples of these portable JSON annotation tags representing N-D arrays
> can be found at
>
>
> http://openjdata.org/wiki/index.cgi?JData/Examples/Basic#2_D_arrays_in_the_annotated_format
> http://openjdata.org/wiki/index.cgi?JData/Examples/Advanced
>
> and the detailed formats on N-D array annotations can be found in the spec:
>
>
> https://github.com/NeuroJSON/jdata/blob/master/JData_specification.md#annotated-storage-of-n-d-arrays
>
>
> our current python modules to encode/decode ndarray to JSON serializable
> forms are implemented in these compact functions (handling lossless
> type/data conversion and data compression)
>
>
> https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L72-L97
>
> https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L126-L160
>
> We strongly believe that enabling JSON serialization by default will
> benefit the numpy user community, making it a lot easier to share complex
> data between platforms (MATLAB/Python/C/FORTRAN/JavaScript...) via a
> standardized/NIH-backed data annotation scheme.
>
> We are happy to hear your thoughts, suggestions on how to contribute, and
> also glad to set up dedicated discussions.
>
> Cheers
>
> Qianqian
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Putting in `np.ma.ndenumerate` MaskedArray specific ndenumerate

2021-11-17 Thread Stephan Hoyer
I think a separate ndenumerate() in the masked array namespace would make a
lot of sense. This is much less risky than changing np.ndenumerate().
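The skipping behavior is easy to state as a pure-Python sketch:

```python
import numpy as np

def ma_ndenumerate(arr):
    """Yield (index, value) pairs, skipping masked elements."""
    mask = np.ma.getmaskarray(arr)
    for idx, val in np.ndenumerate(np.ma.getdata(arr)):
        if not mask[idx]:
            yield idx, val

a = np.ma.array([[1, 2], [3, 4]], mask=[[False, True], [False, False]])
print(list(ma_ndenumerate(a)))  # [((0, 0), 1), ((1, 0), 3), ((1, 1), 4)]
```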

On Wed, Nov 17, 2021 at 11:54 AM Andras Deak  wrote:

> On Wed, Nov 17, 2021 at 8:35 PM Sebastian Berg 
> wrote:
>
>> On Wed, 2021-11-17 at 19:49 +0100, Andras Deak wrote:
>> > On Wed, Nov 17, 2021 at 7:39 PM Sebastian Berg
>> > 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > the `np.ndenumerate` does not work well for masked arrays (like
>> > > many
>> > > main namespace functions, it simply ignores/drops the mask).
>> > >
>> > > There is a PR (https://github.com/numpy/numpy/pull/20020) to add a
>> > > version of it to `np.ma` (masked array specific).  And we thought
>> > > it
>> > > seemed reasonable and were planning on putting it in.
>> > >
>> > > This version skips all masked elements.  An alternative could be to
>> > > return `np.ma.masked` for masked elements?
>> > >
>> > > So if anyone thinks that may be the better solution, please send a
>> > > brief mail.
>> > >
>> >
>> > Would it be a bad idea to add a kwarg that specifies this behaviour
>> > (i.e.
>> > offering both alternatives)? Assuming people might need the masked
>> > items to
>> > be there under certain circumstances. Perhaps when zipping masked
>> > data with
>> > dense data?
>> >
>>
>> Sure, if you agree the default should be skipping, I guess we are OK
>> with adding it? ;)
>>
>
> I don't actually use masked arrays myself, nor ndenumerate, so I'm very
> forgiving in this matter...
> But if both use cases are plausible (_if_, although I can indeed imagine
> that this is the case), supporting both seems straightforward. Considering
> the pure python implementation it wouldn't be a problem to expose both
> functionalities.
>
> András
>
>
>
>> Cheers,
>>
>> Sebastian
>>
>>
>> > András
>> >
>> >
>> >
>> > > (Personally, I don't have opinions on masked arrays for the most
>> > > part.)
>> > >
>> > > Cheers,
>> > >
>> > > Sebastian
>> > > ___
>> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
>> > > To unsubscribe send an email to numpy-discussion-le...@python.org
>> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> > > Member address: deak.and...@gmail.com
>> > >
>> > ___
>> > NumPy-Discussion mailing list -- numpy-discussion@python.org
>> > To unsubscribe send an email to numpy-discussion-le...@python.org
>> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> > Member address: sebast...@sipsolutions.net
>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: deak.and...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: Idea: adding `ndarray.isin` for consistency with pandas

2021-10-11 Thread Stephan Hoyer
Thanks for the suggestion.

It isn't a hard rule, but the ndarray namespace is already very large and
cluttered, so generally we have avoided adding new methods in recent years.
(This policy might be worth codifying in a NEP.)

On Mon, Oct 11, 2021 at 7:35 AM Max Ghenis  wrote:

> `numpy.isin` is equivalent to `pandas.{Series,DataFrame}.isin`. Adding
> `numpy.ndarray.isin` would produce consistency with `pandas` and save 4
> characters (`np.isin(x, y)` vs. `x.isin(y)`).
>
> I'm adding this idea here in conjunction with filing
> https://github.com/numpy/numpy/issues/20092 per the instructions.
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


[Numpy-discussion] Re: spam on the mailing lists

2021-10-01 Thread Stephan Hoyer
On Fri, Oct 1, 2021 at 10:21 AM Ilhan Polat  wrote:

> > GitHub Discussions is more of a Q&A platform, like Stackoverflow. I
> don't think it really makes sense for free form discussion.
>
>  I don't see how it is to be honest. I'm hearing this complaint quite
> often but I can't see how that is. That's quite not my experience.
> Especially in node.js repo and other participants of the discussions beta
> are quite happy with it.
>
> Maybe I should rephrase why I am mentioning this; very often, something is
> popping up in the issues asking for whether X is suitable for Sci/NumPy and
> we lead the user here and more often than not they don't follow up. I can't
> blame them because the whole mailing list experience especially for the
> newcomers is a dreadful experience and most of the time you don't get any
> feedback. Also you can't move because in the issue we told them to come
> here and nobody is interested, then things stop unless someone nudges the
> repo issue which was the idea in the first place. So in a way we are
> putting this barrier as in "go talk to the elders in the mountain and bring
> some shiny gems on your way back" which makes not much sense. We are using
> the issues and PRs anyways to discuss stuff willingly or not, so I can't
> say I follow the argument for the holistic mailing list format. This
> doesn't mean that I ignore the convenience because that was the case in the
> last decades. I'm totally fine with it. But if we are going to move it
> let's make it count, not switch to an identical platform just for the sake
> of it. If not Github, then something that actually encourages the community
> to join and doesn't get in the way.
>

I agree, "go talk to the elders in the mountain" is not a great experience.

One of the other problems about mailing lists is that it's awkward or
impossible to ping old discussions. E.g., if you find a mailing list thread
discussing an issue from two years ago, you pretty much have to start a new
thread to discuss it.

I think GitHub discussions is a perfectly fine web-based platform and
definitely an improvement over a mailing list, but do like Discourse a
little better. It's literally one click for a user to sign up to post on
Discourse if they already have a GitHub account.



> On Fri, Oct 1, 2021 at 6:31 PM Stephan Hoyer  wrote:
>
>> On Fri, Oct 1, 2021 at 8:55 AM Matthew Brett 
>> wrote:
>>
>>> Only to say that:
>>>
>>> * I used to have a very firm preference for mail, because I'm pretty
>>> happy with Gmail as a mail interface, and I didn't want to have
>>> another channel I had to monitor, but
>>> * I've spent more time on Discourse over the last year, mainly on
>>> Jupyter, but I have also set up instances for my own projects.  I now
>>> have a fairly strong preference for Discourse, because of its very
>>> nice Markdown authoring, pleasant web interface for reviewing
>>> discussions and reasonable mailing list mode.
>>>
>>
>> +1 Markdown support, the ability to edit/delete posts, a good web
>> interface and the possibility for new-comers to jump into an ongoing
>> discussion are all major advantages to Discourse.
>>
>> I am not concerned about spam management or moderation. NumPy-Discussion
>> is not a very popular forum, and we have plenty of mature contributors to
>> help moderate.
>>
>>
>>> * I have hardly used Github Discussions, so I can't comment on them.
>>> Are there large projects that are happy with them?   How does that
>>> compare to Discourse, for example?
>>>
>>
>> GitHub Discussions is more of a Q&A platform, like Stackoverflow. I don't
>> think it really makes sense for free form discussion.
>>
>>
>>> * It will surely cause some harm if it is not clear where discussions
>>> happen, mainly (mailing list, Discourse, Github Discussions) so it
>>> seems to me better to decide on one standard place, and commit to
>>> that.
>>>
>>
>> +1 let's pick a place and stick to it!
>>
>>
>>>
>>> Cheers,
>>>
>>> Matthew
>>>
>>> On Fri, Oct 1, 2021 at 4:39 PM Rohit Goswami 
>>> wrote:
>>> >
>>> > I’m firmly against GH discussions because of the upvoting mechanism.
>>> We don’t need to be Reddit or SO. .NET had a bad experience with the
>>> discussions as well [1].
>>> >
>>> > [1] https://github.com/dotnet/aspnetcore/issues/29935
>>> >
>>> > — Rohit
>>> >
>>> > On 1 Oct 2021, at 15:04, Andras Deak wrote:
>>> >
>>> > On Fri, Oct 1, 2021 at 4:27 

[Numpy-discussion] Re: spam on the mailing lists

2021-10-01 Thread Stephan Hoyer
On Fri, Oct 1, 2021 at 8:55 AM Matthew Brett 
wrote:

> Only to say that:
>
> * I used to have a very firm preference for mail, because I'm pretty
> happy with Gmail as a mail interface, and I didn't want to have
> another channel I had to monitor, but
> * I've spent more time on Discourse over the last year, mainly on
> Jupyter, but I have also set up instances for my own projects.  I now
> have a fairly strong preference for Discourse, because of its very
> nice Markdown authoring, pleasant web interface for reviewing
> discussions and reasonable mailing list mode.
>

+1 Markdown support, the ability to edit/delete posts, a good web interface
and the possibility for new-comers to jump into an ongoing discussion are
all major advantages to Discourse.

I am not concerned about spam management or moderation. NumPy-Discussion is
not a very popular forum, and we have plenty of mature contributors to help
moderate.


> * I have hardly used Github Discussions, so I can't comment on them.
> Are there large projects that are happy with them?   How does that
> compare to Discourse, for example?
>

GitHub Discussions is more of a Q&A platform, like Stackoverflow. I don't
think it really makes sense for free form discussion.


> * It will surely cause some harm if it is not clear where discussions
> happen, mainly (mailing list, Discourse, Github Discussions) so it
> seems to me better to decide on one standard place, and commit to
> that.
>

+1 let's pick a place and stick to it!


>
> Cheers,
>
> Matthew
>
> On Fri, Oct 1, 2021 at 4:39 PM Rohit Goswami 
> wrote:
> >
> > I’m firmly against GH discussions because of the upvoting mechanism. We
> don’t need to be Reddit or SO. .NET had a bad experience with the
> discussions as well [1].
> >
> > [1] https://github.com/dotnet/aspnetcore/issues/29935
> >
> > — Rohit
> >
> > On 1 Oct 2021, at 15:04, Andras Deak wrote:
> >
> > On Fri, Oct 1, 2021 at 4:27 PM Ilhan Polat  wrote:
> >>
> >> The reason why I mentioned GH discussions is that literally everybody
> who is engaged with the code is familiar with the format, included in the
> codebase product and has replies built in, unlike the Discourse (opinion is
> mine) useless flat discussion design where replies are all over the place
> just like the mailing list in case you are not using a tree view supporting
> client. Hence topic hijacking is one of the main usability difficulties of
> emails.
> >>
> >> The goal here is to have a coherent engagement with everyone not just
> within a small circle, such that there is indeed a discussion happening
> rather than a few people chiming in. It would be a nice analytics exercise
> to have how many active users using these lists. I'd say 20-25 max for
> contribs and team members which is really not much. I know some people are
> still using IRC and mailing lists but I wouldn't argue that these are the
> modern media to have proper engaging discussions. "Who said to whom" is the
> bread and butter of such discussions. And I do think that discourse is
> exactly the same thing with mailing lists with a slightly better UI while
> virtually everyone else in the world is doing replies.
> >
> >
> > (There are probably a lot of users like myself who follow the mailing
> list discussions but rarely feel the need to speak up themselves. Not that
> this says much either way in the discussion, just pointing it out).
> >
> > I'm not intimately familiar with github discussions (I've only used it a
> few times), but as far as I can tell it only has answers (or "comments")
> and comments (or "replies") on answers, i.e. 2 levels of replies rather
> than a flat single level of replies. If this is indeed the case then I'm
> not sure it's that much better than a flat system, since when things really
> get hairy then 2 levels are probably also insufficient to ensure "who said
> to whom". The "clear replies" argument would hold stronger (in my
> peanut-gallery opinion) for a medium that supports full reply trees like
> many comment sections do on various websites.
> >
> > András
> >
> >>
> >> I would be willing to help with the objections raised since I have been
> using GH discussions for quite a while now and there are many tools
> available for administration of the discussions. For example,
> >>
> >>
> https://github.blog/changelog/2021-09-14-notification-emails-for-discussions/
> >>
> >> is a recent feature. I don't work for GitHub obviously and have nothing
> to do with them but the reasons I'm willing to hear about.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Oct 1, 2021 at 3:07 PM Matthew Brett 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Fri, Oct 1, 2021 at 1:57 PM Rohit Goswami 
> wrote:
> >>> >
> >>> > I guess then the approach overall would evolve to something like
> using the mailing list to announce discourse posts which need input. Though
> I would assume that the web interface essentially makes the mailing list
> almost like discourse, even for new users.
> >>> >
> >>> > The real issue IMO is 

[Numpy-discussion] Re: spam on the mailing lists

2021-09-30 Thread Stephan Hoyer
On Wed, Sep 29, 2021 at 7:55 PM Juan Nunez-Iglesias 
wrote:

> On scikit-image, we've moderated *subscriptions*, and only subscribers can
> post without being moderated, but still spam gets through, because it's
> hard to vet whether an email address is "genuine", so sometimes we allow
> the subscription, and immediately receive resulting spam. A "reputation"
> system in which the first, say, 3 emails from a user are moderated would be
> most useful. I don't think mailman provides this yet. (?)
>
> My personal impression is that users/newcomers (as opposed to long-time
> core developers) definitely prefer Discourse to email. I also think it is a
> better experience browsing/searching the archives than mailman, despite
> recent improvements to the latter. So, without wanting to minimise the
> downsides, I do think that putting everyone on Discourse is the best
> approach forward. From what I can tell, none of the mailing lists that have
> made the move (Numba, Bokeh, Matplotlib) have regretted it.
>
> Juan.
>

+1 I do think Discourse would be a significantly better experience for new
users, and would make it easier to get newcomers involved in NumPy. It's
simply a much more accessible tool than mailing lists.


> On Wed, 29 Sep 2021, at 5:11 PM, Matti Picus wrote:
> > On 29/9/21 9:07 pm, Stefan van der Walt wrote:
> >> On Wed, Sep 29, 2021, at 03:02, Ralf Gommers wrote:
> >>> We don't have admin access to the python.org 
> >>> lists, so this is a bit of a problem. We have never had a spam
> >>> problem, so we can ask to block this user first. If it continues to
> >>> happen, we may be able to moderate new subscriber emails, but we do
> >>> need to ask for permissions first and I'm not sure we'll get them.
> >>>
> >>> A better solution longer term is migrating to Discourse, which has
> >>> far better moderation tools than Mailman and is also more
> >>> approachable for people not used to mailing lists (which is most
> >>> newcomers to open source). Migrating is a bit of a pain, but with the
> >>> new CZI grant having a focus on improving the contributor experience,
> >>> we should be able to do this.
> >>
> >> I would like to offer the use of https://discuss.scientific-python.org.
> >> I would be happy to handle
> >> email list migration, and have created the following two categories
> >> for NumPy discussion:
> >>
> >> User discussion: https://discuss.scientific-python.org/c/user/numpy
> >> Contributor discussion:
> >> https://discuss.scientific-python.org/c/contributor/numpy
> >>
> >> We're happy to support this as part of the Scientific Python ecosystem
> >> grant, and will give admin rights to anyone on the NumPy developer
> >> team who wants to help manage / moderate discussions.
> >>
> >> Of course, we can also just delete these if the team prefers to have
> >> their discussions somewhere else.  But I think there is a benefit to
> >> bringing community discussions together in one place.
> >>
> >> Best regards,
> >> Stéfan
> >
> >
> > Thanks for the offer to host the discussions on discourse. Personally, I
> > find the email interface to discourse very clunky. I would prefer we
> > exhaust the possibilities to stay on e-mail only before moving to
> discourse.
> >
> > Matti
> >
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: j...@fastmail.com
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sho...@gmail.com
>


Re: [Numpy-discussion] deprecating float(x) for ndim > 0

2021-09-15 Thread Stephan Hoyer
On Wed, Sep 15, 2021 at 5:18 AM Nico Schlömer 
wrote:

> Hi everyone,
>
> This is seeking input on PR [1] which I've worked on with @eric-wieser
> and @seberg. It deprecates
> ```
> float(x)
> ```
> if `x` is an array of ndim > 0. (It works with all arrays of size 1
> right now.) This aligns the behavior of float() on ndarrays with
> float() on lists which already fails today. It also deprecates the
> implicit conversion to float in assignment expressions like
> ```
> a = np.array([1, 2, 3])
> a[0] = [5]  # deprecated, should be a[0] = 5
> ```
> In general, the PR makes numpy a tad bit stricter on how it treats
> scalars vs. single-item arrays.
>
> The change also prevents the #1 wrong usage of float(), namely for
> extracting the scalar value from an array. One should rather use
> `x[0]` or `x.item()` to do that, which doesn't convert the value to a
> Python float.
>

Hi Nico,

I think this is a great idea! Another good alternative to mention is
explicitly calling .squeeze() first, to remove all size 1 dimensions.
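
A minimal sketch of the alternatives in question (current NumPy APIs; the
warning itself comes from the PR above):

```
import numpy as np

x = np.array([1.5])            # ndim == 1, size 1

# float(x) is what the PR deprecates for ndim > 0. Alternatives:
v1 = x.item()                  # extract as a Python float
v2 = x[0]                      # index first; leaves a NumPy scalar
v3 = float(x.squeeze())        # squeeze to 0-d, then conversion stays valid
```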

Cheers,
Stephan


> To estimate the impact of the PR, I looked at major numpy dependents
> like matplotlib, scipy, pandas etc., and of course numpy itself.
> Except scipy, all projects were virtually clean to start with. Scipy
> needed some changes for all tests to pass without warning, and all of
> the changes were improvements. In particular, the deprecation
> motivates users to use actual scalars when scalars are needed, e.g.,
> in the case of scipy, as the return value of a goal functional.
>
> It'd be great if you could try the branch against your own project and
> let us know (here or in the PR) about any problems that you might
> have.
>
> Thanks!
> Nico
>
> [1] https://github.com/numpy/numpy/pull/10615
> [2] https://github.com/numpy/numpy/issues/10404
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-24 Thread Stephan Hoyer
On Thu, Jun 24, 2021 at 1:03 AM Ralf Gommers  wrote:

> I agree with this. Enums are nice _in theory_, but once you start using
> them you quickly figure out they're clunky, plus the all-caps looks bad
> (I'd consider ignoring that style recommendation). For API design they
> don't make all that much sense compared to "here's a list of strings we
> accept, and everything else raises an informative error". The only reasons
> I can think of to use them are:
>
> 1. Cases like never-copy, when there's a reason to have an object we can
> add a method too (`__bool__` here)
> 2. There's a long list of options and we want to give users  a way to
> explore or iterate over those, so a public object is useful. so cases where
> we'd otherwise use a class (instance) instead of documenting the string
> options. I can't think of many examples like this, padding modes for
> `scipy.ndimage.convolve` is the only one that comes to mind.
>

I think Enums are a very clean abstraction for capturing a discrete set of
options in a type-safe way, both at runtime and with static checks. You
also don't have to keep lists of strings in sync, which makes them a little
easier to document.

That said, I agree that in most cases the overall benefits are rather
marginal. I don't think it's worth a mass migration of existing NumPy
functions, which use strings for categorical options.

In this particular case, I think there is a clear advantage to using an
enum, to avoid inadvertent bugs with old versions of NumPy.


> In general I don't expect we'd need (m)any more. Hence I'd suggest adding
> a new namespace like `np.flags` is not a good idea. Right now all we need
> is a single object, if we end up going the enum route.
>
> For this one, I'd say it kinda looks like we do need one, so then  let's
> just add one and be done with it, rather than inventing odd patterns like
> tacking enum members onto an existing function.
>

I agree with both of these. If we're only going to add a couple of enums,
it's not worth worrying about a couple of extra objects polluting NumPy's
namespace. I would just add np.CopyMode, rather than inventing a new design
pattern.

At some point in the future, we might either:
(1) switch the interface to use strings, in which case we would stop
recommending/documenting CopyMode (like plenty of other top level objects
in the NumPy namespace)
(2) add many more enums, in which case we can consider assigning enums as
function attributes or putting them in a namespace. But so far the only
other enum I've heard suggested is np.ClipMode. Adding two enums to the
NumPy namespace would hardly make a difference at this point, given how
many objects are already there.
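
For concreteness, a sketch of what such an enum could look like (the member
names here are illustrative, not necessarily what NumPy will ship):

```
import enum

class CopyMode(enum.Enum):
    # Illustrative sketch only; NumPy's eventual enum may differ.
    ALWAYS = 0
    IF_NEEDED = 1
    NEVER = 2

    def __bool__(self):
        # Map onto the legacy boolean copy= argument where possible.
        if self is CopyMode.ALWAYS:
            return True
        if self is CopyMode.IF_NEEDED:
            return False
        raise ValueError("NEVER has no boolean equivalent")
```

The __bool__ hook is the point of case 1 above: an old NumPy that coerces
copy= with bool() fails loudly on CopyMode.NEVER instead of silently copying.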
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-17 Thread Stephan Hoyer
On Thu, Jun 17, 2021 at 1:29 PM Matti Picus  wrote:

> On 16/6/21 11:00 pm, Sebastian Berg wrote:
>
> > Hi all,
> >
> > (sorry for the length, details/discussion below)
> >
> > On the triage call, there seemed a preference to just try to skip the
> > deprecation and introduce `copy="never"`, `copy="if_needed"`, and
> > `copy="always"` (i.e. string options for the `copy` keyword argument).
> >
>
> Why this may be controversial: today someone could be calling
> 'np.array(..., copy="never")', which would call 'bool("never")',
> which would evaluate to 1, and would end up doing the exact opposite of
> never. So their code is wrong, and they do not know it but are used to
> the error. If we change this, it would silently fix their code to do
> what they intended.
>
>
> Is that a correct reading of the problem?
>
> If so, I am in favor of the proposal to use string options in addition
> to boolean options.
>

No, we aren't really concerned about users who write np.array(...,
copy='never') today. This currently means np.array(..., copy=True), which
is slightly unfortunate but not really a big deal.

The big concern is users who will write np.array(..., copy='never') in the
future, when it becomes supported by NumPy, but their code gets run on an
older version of NumPy, in which case it silently works in a different way.

This happens all the time. Even if we make copy='never' an error *today*,
users will be encountering existing versions of NumPy for years into the
future, so we won't be able to change the behavior of copy='never' for a
very long time. Our deprecation policy says we would need to wait at least
one year for this, but frankly I'm not sure that's enough for
the possibility of silent bugs. 3-4 years might be more realistic.

Eric's concerns about existing uses of "if copy" inside NEP 18 overloads are
another good point, though there may be relatively few users of this
feature today given that np.array() is only recently overridable (via
"like").

Overall, I think using an enum is the happiest situation. It's a slightly
awkward API, to be sure, but not very awkward in the scheme of things, and
it's better than needing to wait a long time for a final resolution.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-16 Thread Stephan Hoyer
On Wed, Jun 16, 2021 at 1:01 PM Sebastian Berg 
wrote:

> 2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"`
>as strings (all other strings will be a `TypeError`):
>
>* Problem: `copy="never"` currently means `copy=True` (the opposite)
>Which means new code has to take care when it may run on
>older NumPy versions.  And in theory could make old code
>return the wrong thing.
>

To me, this seems like a big problem.

People try to use newer NumPy features on old versions of NumPy all the
time. This works out OK if they get error messages, but we shouldn't add
new features that silently do something else on old versions -- especially
for recent old versions.

In particular, both copy='if_needed' and copy='never' would mean
copy='always' on old versions of NumPy. This seems bad -- basically the
exact opposite of what the user explicitly requested. These sort of bugs
can be quite challenging to track down.

So in my opinion (1) and (3) are the only real options.
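
A sketch of that failure mode, as it plays out on the NumPy releases current
to this thread:

```
import numpy as np

x = np.arange(3)

# A future NumPy could give copy="never" its intended meaning, but on
# existing releases the string is just a truthy object:
assert bool("never")            # True
y = np.array(x, copy="never")   # behaves like copy=True today
assert y is not x               # a copy was made: the opposite of the intent
```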


> 3. Same as 2. But we take it very slow: Make strings an error right now
>and only introduce the new options after two releases as per typical
>deprecation policy.
>
>
> ## Discussion
>
> We discussed it briefly today in the triage call and we were leaning
> towards strings.
>
> I was honestly expecting to converge to option 3 to avoid compatibility
> issues (mainly surprises with `copy="never"` on older versions).
> But considering how weird it is to currently pass `copy="never"`, the
> question was whether we should not change it with a release note.
>
> The probability of someone currently passing exactly one of those three
> (and no other) strings seems exceedingly small.
>
> Personally, I don't have a much of an opinion.  But if *nobody* voices
> any concern about just changing the meaning of the string inputs, I
> think the current default may be to just do it.
>
> Cheers,
>
> Sebastian
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: MyGrad 2.0 - Drop-in autodiff for NumPy

2021-04-18 Thread Stephan Hoyer
On Sun, Apr 18, 2021 at 9:11 AM Ryan Soklaski  wrote:

> MyGrad is not meant to "compete" with the likes of PyTorch and JAX, which
> are fantastically-fast and powerful autodiff libraries. Rather, its
> emphasis is on being lightweight and seamless to use in NumPy-centric
> workflows.
>

Thanks for sharing, this looks like an interesting project!

I would also be curious how MyGrad compares to Autograd [1], which is
currently probably the most popular package implementing "drop-in autodiff
for NumPy." From a quick look, it appears that you take a slightly
different approach for the design -- object oriented rather than functional.

[1] https://github.com/HIPS/autograd
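
For readers unfamiliar with it, a minimal sketch of Autograd's functional
style (assuming autograd is installed; this is not MyGrad code):

```
import autograd.numpy as anp   # drop-in wrappers around NumPy functions
from autograd import grad

def loss(x):
    return anp.sum(anp.sin(x) ** 2)      # scalar-valued function

dloss = grad(loss)                       # grad() returns a new function
print(dloss(anp.linspace(0.0, 1.0, 5)))  # gradient, same shape as the input
```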
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is there a defined way to "unpad" an array, and if not, should there be?

2021-04-12 Thread Stephan Hoyer
On Mon, Apr 12, 2021 at 5:12 PM Jeff Gostick  wrote:

> I guess I should have clarified that I was inquiring about proposing a
> 'feature request'.  The github site suggested I open a discussion on this
> list first.  There are several ways to effectively unpad an array as has
> been pointed out, but they all require more than a little bit of thought
> and care, are dependent on array shape, and honestly error prone.  It would
> be very valuable to me to have such a 'predefined' function, so I was
> wondering if (a) I was unaware of some function that already does this and
> (b) if I'm alone in thinking this would be useful.
>

Indeed, this is a fair question.

Given that this is not entirely trivial to write correctly, I think it
would be reasonable to add the inverse operation for pad() into NumPy. This
is generally better than encouraging users to write their own thing.

From a naming perspective, here are some possibilities:
unpad
trim
crop

I think "trim" would be pretty descriptive, probably slightly better than
"unpad."
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is there a defined way to "unpad" an array, and if not, should there be?

2021-04-12 Thread Stephan Hoyer
The easy way to unpad an array is by indexing with slices, e.g., x[20:-4]
to undo a padding of [(20, 4)]. Just be careful about unpadding "zero"
elements on the right hand side, because Python interprets an ending slice
of zero differently -- you need to write something like x[20:] to undo
padding by [(20, 0)].
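
In code, the pitfall looks like this:

```
import numpy as np

padded = np.pad(np.arange(10), (20, 4))
a = padded[20:-4]     # undoes pad_width (20, 4)

padded0 = np.pad(np.arange(10), (20, 0))
# padded0[20:-0] would be empty, because an ending index of 0 does not
# mean "through the end"; leave the stop out instead:
b = padded0[20:]      # undoes pad_width (20, 0)
```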


On Mon, Apr 12, 2021 at 1:15 PM Jeff Gostick  wrote:

> I often find myself padding an array to do some processing on it (i.e. to
> avoid edge artifacts), then I need to remove the padding.  I wish there
> was either a built in "unpad" function that accepted the same arguments as
> "pad", or that "pad" accepted negative numbers (e.g [-20, -4] would undo a
> padding of [20, 4]).  This seems like a pretty obvious feature to me so
> maybe I've just missed something, but I have looked through all the open
> and closed issues on github and don't see anything related to this.
>
>
> Jeff G
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy logo merchandise

2021-03-04 Thread Stephan Hoyer
I love your mittens!

NumPy really should be in the NumFOCUS store, but it currently isn't:
https://shop.spreadshirt.com/numfocus/all

I'll make some inquiries to see if we can sort that out :).

On Thu, Mar 4, 2021 at 2:43 PM  wrote:

> Hello. I was looking for a T-shirt with Numpy logo but didn't find
> anywhere. Anybody knows if there's a merchandise with Numpy? So I have to
> kneet mittens with Numpy logo for myself.
>
> Best regards!
> Konstantin
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 48: Spending NumPy Project funds

2021-02-22 Thread Stephan Hoyer
On Mon, Feb 22, 2021 at 4:08 AM Pearu Peterson 
wrote:

> Hi,
>
> See GH discussion starting at
> https://github.com/numpy/numpy/pull/18454#discussion_r579967791 for the
> raised issue that is now moved here.
>
> Re "Compensating fairly" section:
>
> The NEP proposes location-dependent contracts for fair pays.
>
> I think this is a contradictory approach as location is not the only
> factor that may influence fairness. As an example, contractors may have
> different levels of obligations to their families, and one might argue this
> should be taken into consideration as well because the family size and the
> required level of commitment to the family members (kids, members who need
> special care, etc) can have a huge influence on the contractors living
> standards, not just the level of average rent in the particular location.
> It would be unfair to take into account location but not the family
> situation. There may be other factors as well that may influence fairness
> and I think this will make the decision-making about contracting harder
> and, most importantly, controversial.
>
> My proposal is that factors like location, family situation, etc should be
> discarded when negotiating contract terms. The efficiency of using the
> project funding should be defined by how well and quickly a particular
> contractor is able to get the job done,  but not how the contractors are
> likely to spend their pays - it is nobody's business, IMHO, and is likely
> very hard if not impossible to verify.
>

One difference is that it is illegal (at least under US law) to consider
factors such as family situation in determining pay.

However, it is both legal and standard to consider location. I'm not saying
we should necessarily do it, but it's an accepted practice. NumPy
development is global, but prevailing wages are not.


>
> My 2cents,
> Pearu
>
> On Sun, Feb 21, 2021 at 4:52 PM Ralf Gommers 
> wrote:
>
>> Compensating fairly
>> ```
>>
>> Paying people fairly is a difficult topic. Therefore, we will only offer
>> some
>> guidance here. Final decisions will always have to be considered and
>> approved
>> by the group of people that bears this responsibility (according to the
>> current NumPy governance structure, this would be the NumPy Steering
>> Council).
>>
>> Discussions on employee compensation tend to be dominated by two
>> narratives:
>> "pay local market rates" and "same work -- same pay".
>>
>> We consider them both extreme:
>>
>> - "Same work -- same pay" is unfair to people living in locations with a
>> higher
>>   cost of living. For example, the average rent for a single family
>> apartment
>>   can differ by a large factor (from a few hundred dollar to thousands of
>>   dollars per month).
>> - "Pay local market rates" bakes in existing inequalities between
>> countries
>>   and makes fixed-cost items like a development machine or a holiday trip
>>   abroad relatively harder to afford in locations where market rates are
>> lower.
>>
>> We seek to find a middle ground between these two extremes.
>>
>> Useful points of reference include companies like GitLab and
>> Buffer who are transparent about their remuneration policies ([3]_, [4]_),
>> Google Summer of Code stipends ([5]_), other open source projects that
>> manage
>> their budget in a transparent manner (e.g., Babel and Webpack on Open
>> Collective ([6]_, [7]_)), and standard salary comparison sites.
>>
>> Since NumPy is a not-for-profit project, we also looked to the nonprofit
>> sector
>> for guidelines on remuneration policies and compensation levels. Our
>> findings
>> show that most smaller non-profits tend to pay a median salary/wage. We
>> recognize merit in this approach: applying candidates are likely to have a
>> genuine interest in open source, rather than to be motivated purely by
>> financial incentives.
>>
>> Considering all of the above, we will use the following guidelines for
>> determining compensation:
>>
>> 1. Aim to compensate people appropriately, up to a level that's expected
>> for
>>senior engineers or other professionals as applicable.
>> 2. Establish a compensation cap of $125,000 USD that cannot be exceeded
>> even
>>for the residents from the most expensive/competitive locations
>> ([#f-pay]_).
>> 3. For equivalent work and seniority,  a pay differential between
>> locations
>>should never be more than 2x.
>>For example, if we pay $110,000 USD to a senior-level developer from
>> New
>>York, for equivalent work a senior-level developer from South-East Asia
>>should be paid at least $55,000 USD. To compare locations, we will use
>>`Numbeo Cost of Living calculator <
>> https://www.numbeo.com/cost-of-living/>`__
>>(or its equivalent).
>>
>> Some other considerations:
>>
>> - Often, compensated work is offered for a limited amount of hours or
>> fixed
>>   term. In those cases, consider compensation equivalent to a remuneration
>>   package that comes with permanent 

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-18 Thread Stephan Hoyer
On Wed, Feb 17, 2021 at 2:37 AM Ralf Gommers  wrote:

>
>
> On Wed, Feb 17, 2021 at 12:26 AM Stefan van der Walt 
> wrote:
>
>> On Tue, Feb 16, 2021, at 07:49, Joseph Fox-Rabinovitz wrote:
>>
>> I'm getting a generally lukewarm, not negative, response. Should we put it
>> to a vote?
>>
>>
>> Things here don't typically get decided by vote—I think you'll have to
>> build towards consensus.  It may be overkill to write a NEP, but outlining
>> a proposed solution along with pros and cons and getting everyone on board
>> is necessary.
>>
>> The API surface is a touchy issue, and so it is difficult to get new
>> features like these added.
>>
>
> This function is less bad than most similar utility functions, because it
> starts with atleast_ so from a "function browsing" end user perspective
> it's not much additional clutter. But it does still force other libraries
> to do work because they aim to be compatible to numpy's main namespace
> (e.g. see jax.numpy).
>
> And there's 6-7 maintainers all not strongly opposed but also not
> enthusiastic.
>

I agree with Ralf's assessment.

This function feels like a natural generalization of existing NumPy
functionality, but we don't expand NumPy's API without use-cases. That's
just a waste of time for everyone involved.

I am most moved by Juan's report that he has the "very distinct impression
of needing it repeatedly," but I would still love to see concrete examples
of where users have found this be helpful.

It is not a hard function to write, so if it was useful I would expect to
see some version of it in an existing open source project or at least on
StackOverflow.
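
For reference, one rough version of the function under discussion (a sketch
only; the PR's pos argument is more general than the two positions handled
here):

```
import numpy as np

def atleast_nd(ary, ndim, pos=0):
    """Sketch: prepend (pos=0) or append (pos=-1) size-1 axes up to ndim."""
    ary = np.asanyarray(ary)
    missing = max(ndim - ary.ndim, 0)
    if pos == 0:
        shape = (1,) * missing + ary.shape
    else:  # pos == -1
        shape = ary.shape + (1,) * missing
    return ary.reshape(shape)

assert atleast_nd(np.ones(3), 4).shape == (1, 1, 1, 3)
assert atleast_nd(np.ones(3), 4, pos=-1).shape == (3, 1, 1, 1)
```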
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What to do about structured string dtype and string regression?

2021-02-16 Thread Stephan Hoyer
On Tue, Feb 16, 2021 at 3:13 PM Sebastian Berg 
wrote:

> Hi all,
>
> In https://github.com/numpy/numpy/issues/18407 it was reported that
> there is a regression for `np.array()` and friends in NumPy 1.20 for
> code such as:
>
> np.array(["1234"], dtype=("U1", 4))
> # NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1')
> # NumPy 1.19: array(['1', '2', '3', '4'], dtype='<U1')
>
> The Basics
> --
>
> This happens when you ask for a rare "subarray" dtype, ways to create
> it are:
>
> np.dtype(("U1", 4))
> np.dtype("(4)U1,")  # (does not have a field, only a subarray)
>
> Both of which give the same subarray dtype a "U1" dtype with shape 4.
> One thing to know about these dtypes is that they cannot be attached to
> an array:
>
> np.zeros(3, dtype="(4)U1,").dtype == "U1"
> np.zeros(3, dtype="(4)U1,").shape == (3, 4)
>
> I.e. the shape is moved/added into the array itself (instead of
> remaining part of the dtype).
>
> The Change
> --
>
> Now what/why did something change?  When filling subarray dtypes, NumPy
> normally fills every element with the same input. In the above case in
> most cases NumPy will give the 1.20 result because it assigns "1234" to
> every subarray element individually; maybe confusingly, this truncates
> so that only the "1" is actually assigned, we can proof it with a
> structured dtype (same result in 1.19 and 1.20):
>
> >>> np.array(["1234"], dtype="(4)U1,i")
> array([(['1', '1', '1', '1'], 1234)],
>   dtype=[('f0', '<U1', (4,)), ('f1', '<i4')])
> Another, weirder case which changed (more obviously for the better is:
>
> >>> np.array("1234", dtype="(4)U1,")
> # NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1')
> # NumPy 1.19: array(['1', '', '', ''], dtype='<U1')
> And, to point it out, we can have subarrays that are not 1-D:
>
> >>> np.array(["12"],dtype=("(2,2)U1,"))
> array([[['1', '1'],
> ['2', '2']]], dtype='<U1')
>
> The Cause
> -
>
> The cause of the 1.19 behaviour is two-fold:
>
> 1. The "subarray" part of the dtype is moved into the array after the
> dimension is found. At this point strings are always considered
> "scalars".  In most above examples, the new array shape is (1,)+(4,).
>
> 2. When filling the new array with values, it now has an _additional_
> dimension!  Because of this, the string is now suddenly considered a
> sequence, so it behaves the same as if `list("1234")`.  Although,
> normally, NumPy would never consider a string a sequence.
>
>
> The Solution?
> -
>
> I honestly don't have one.  We can consider strings as sequences in
> this weird special case.  That will probably create other weird special
> cases, but they would be even more hidden (I expect mainly odder things
> throwing an error).
>
> Should we try to document this better in the release notes or can we
> think of some better (or at least louder) solution?
>

There are way too many unsafe assumptions in this example. It's an edge
case of an edge case.

I don't think we should be beholden to continuing to support this
behavior, which was obviously never anticipated. If there was a way to
raise a warning or error in potentially ambiguous situations like this, I
would support it.




> Cheers,
>
> Sebastian
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Stephan Hoyer
On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root  wrote:

> for me, I find that the at_least{1,2,3}d functions are useful for
> sanitizing inputs. Having an at_leastnd() function can be viewed as a step
> towards cleaning up the API, not cluttering it (although, deprecations of
> the existing functions probably should be long given how long they have
> existed).
>

I would love to see examples of this -- perhaps in matplotlib?

My thinking is that in most cases it's probably a better idea to keep the
interface simpler, and raise an error for lower-dimensional arrays.
Automatic conversion is convenient (and endemic within the SciPy
ecosystem), but is also a common source of bugs.

On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>
>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>> wrote:
>>
>>> I totally agree with the namespace clutter concern, but honestly, I
>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`,
>>> for which I had no idea where the new axes would end up.
>>>
>>> So, I’m in favour of including it, and optionally deprecating
>>> `atleast_{1,2,3}d`.
>>>
>>>
>> I appreciate that `atleast_nd` feels more sensible than
>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>> on its own.
>>
>> What would be the recommended use-cases for this new function?
>> Have any libraries building on top of NumPy implemented a version of this?
>>
>>
>>> Juan.
>>>
>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
>>> wrote:
>>>
>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>>
>>> I've created PR#18386 to add a function called atleast_nd to numpy and
>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
>>> atleast_3d functions.
>>>
>>> I proposed a similar idea about four and a half years ago:
>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html
>>> ,
>>> PR#7804. The reception was ambivalent, but a couple of folks have asked
>>> me
>>> about this, so I'm bringing it back.
>>>
>>> Some pros:
>>>
>>> - This closes issue #12336
>>> - There are a couple of Stack Overflow questions that would benefit
>>> - Been asked about this a couple of times
>>> - Implementation of three existing atleast_*d functions gets easier
>>> - Looks nicer than the equivalent broadcasting and reshaping
>>>
>>> Some cons:
>>>
>>> - Cluttering up the API
>>> - Maintenance burden (but not a big one)
>>> - This is just a utility function, which can be achieved through
>>> broadcasting and reshaping
>>>
>>>
>>> My main concern would be the namespace cluttering. I can't say I use
>>> even the `atleast_2d` etc. functions personally, so I would tend to be
>>> slightly against the addition. But if others land on the "useful" side here
>>> (and it seemed a bit at least on github), I am also not opposed.  It is a
>>> clean name that lines up with existing ones, so it doesn't seem like a big
>>> "mental load" with respect to namespace cluttering.
>>>
>>> Bike shedding the API is probably a good idea in any case.
>>>
>>> I have pasted the current PR documentation (as html) below for quick
>>> reference. I wonder a bit about the reasoning for having `pos` specify a
>>> value rather than just a side?
>>>
>>>
>>>
>>> numpy.atleast_nd(ary, ndim, pos=0)
>>> View input as array with at least ndim dimensions.
>>> New unit dimensions are inserted at the index given by pos if
>>> necessary.
>>> Parameters:
>>> ary : array_like
>>>     The input array. Non-array inputs are converted to arrays. Arrays that
>>>     already have ndim or more dimensions are preserved.
>>> ndim : int
>>>     The minimum number of dimensions required.
>>> pos : int, optional
>>>     The index to insert the new dimensions. May range from -ary.ndim - 1 to
>>>     +ary.ndim (inclusive). Non-negative indices indicate locations before
>>>     the corresponding axis: pos=0 means to insert at the very beginning.
>>>     Negative indices indicate locations after the corresponding axis: pos=-1
>>>     means to insert at the very end. 0 and -1 are always guaranteed to

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-10 Thread Stephan Hoyer
On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
wrote:

> I totally agree with the namespace clutter concern, but honestly, I would
> use `atleast_nd` with its `pos` argument (I might rename it to `position`,
> `axis`, or `axis_position`) any day over `at_least{1,2,3}d`, for which I
> had no idea where the new axes would end up.
>
> So, I’m in favour of including it, and optionally deprecating
> `atleast_{1,2,3}d`.
>
>
I appreciate that `atleast_nd` feels more sensible than `at_least{1,2,3}d`,
but I don't think "better" than a pattern we would not recommend is a good
enough reason for inclusion in NumPy. It needs to stand on its own.

What would be the recommended use-cases for this new function?
Have any libraries building on top of NumPy implemented a version of this?


> Juan.
>
> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
> wrote:
>
> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>
> I've created PR#18386 to add a function called atleast_nd to numpy and
> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
> atleast_3d functions.
>
> I proposed a similar idea about four and a half years ago:
> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
> PR#7804. The reception was ambivalent, but a couple of folks have asked me
> about this, so I'm bringing it back.
>
> Some pros:
>
> - This closes issue #12336
> - There are a couple of Stack Overflow questions that would benefit
> - Been asked about this a couple of times
> - Implementation of three existing atleast_*d functions gets easier
> - Looks nicer than the equivalent broadcasting and reshaping
>
> Some cons:
>
> - Cluttering up the API
> - Maintenance burden (but not a big one)
> - This is just a utility function, which can be achieved through
> broadcasting and reshaping
>
>
> My main concern would be the namespace cluttering. I can't say I use even
> the `atleast_2d` etc. functions personally, so I would tend to be slightly
> against the addition. But if others land on the "useful" side here (and it
> seemed a bit at least on github), I am also not opposed.  It is a clean
> name that lines up with existing ones, so it doesn't seem like a big
> "mental load" with respect to namespace cluttering.
>
> Bike shedding the API is probably a good idea in any case.
>
> I have pasted the current PR documentation (as html) below for quick
> reference. I wonder a bit about the reasoning for having `pos` specify a
> value rather than just a side?
>
>
>
> numpy.atleast_nd(ary, ndim, pos=0)
> View input as array with at least ndim dimensions.
> New unit dimensions are inserted at the index given by pos if necessary.
> Parameters:
> ary : array_like
>     The input array. Non-array inputs are converted to arrays. Arrays that
>     already have ndim or more dimensions are preserved.
> ndim : int
>     The minimum number of dimensions required.
> pos : int, optional
>     The index to insert the new dimensions. May range from -ary.ndim - 1 to
>     +ary.ndim (inclusive). Non-negative indices indicate locations before the
>     corresponding axis: pos=0 means to insert at the very beginning. Negative
>     indices indicate locations after the corresponding axis: pos=-1 means to
>     insert at the very end. 0 and -1 are always guaranteed to work. Any other
>     number will depend on the dimensions of the existing array. Default is 0.
> Returns:
> res : ndarray
>     An array with res.ndim >= ndim. A view is returned for array inputs.
>     Dimensions are prepended if pos is 0, so for example, a 1-D array of
>     shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions
>     are appended if pos is -1, so for example a 2-D array of shape (M, N)
>     becomes a view of shape (M, N, 1, 1) when ndim=4.
> See also: atleast_1d, atleast_2d, atleast_3d
> Notes:
> This function does not follow the convention of the other atleast_*d functions
> in numpy in that it only accepts a single array argument. To process
> multiple arrays, use a comprehension or loop around the function call. See
> examples below.
> Setting pos=0 is equivalent to how the array would be interpreted by
> numpy’s broadcasting rules. There is no need to call this function for
> simple broadcasting. This is also roughly (but not exactly) equivalent to
> np.array(ary, copy=False, subok=True, ndmin=ndim).
> It is easy to create functions for specific dimensions similar to the other
> atleast_*d functions using Python’s functools.partial function.
> An example is shown below.
> Examples:
>
> >>> np.atleast_nd(3.0, 

Re: [Numpy-discussion] accepting NEP 23 - backwards compatibility and deprecation policy NEP

2021-01-26 Thread Stephan Hoyer
On Tue, Jan 26, 2021 at 12:45 AM Stefan van der Walt 
wrote:

> On Tue, Jan 26, 2021, at 00:25, Ralf Gommers wrote:
>
> The update PR was merged after a lot more review on GitHub. I propose we
> change the status of this NEP to Accepted. We'll merge a PR to do so unless
> there are objections within the next five days.
>
>
> Thanks for the heads-up, Ralf.  I am happy to have the NEP accepted.
>
> Stéfan
>


Thank you, Ralf!



>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Addition of new distributions: Polya-gamma

2020-12-27 Thread Stephan Hoyer
Thanks for putting together this clean implementation!

My concern is that Polya-Gamma is not popular enough to warrant inclusion
in NumPy, which tries very hard to limit scope these days. For example,
Polya-Gamma isn’t implemented in scipy.stats and doesn’t have a Wikipedia
page, both of which are generally much more inclusive than NumPy.

On Sun, Dec 27, 2020 at 3:29 AM Tom Swirly  wrote:

> I'm just a lurker, but I spent a minute or two to look at that commit,
> which looks to be high quality.  While I personally have not used this
> distribution, people I know use it all the time (for ML).
>
>
> A quibble:
>
> #define NPY_PI 3.141592653589793238462643383279502884 /* pi */
>
> and the following defines which appear
> in numpy/random/src/distributions/random_polyagamma.c are already defined
> in numpy/core/include/numpy/npy_math.h
>
> Probably it would be better to include that file instead, if it isn't
> already included.
>
>
> DISCLAIMER: I checked none of the math other than passing my eyes over it.
>
>
>
> On Sun, Dec 27, 2020 at 12:05 PM Zolisa Bleki 
> wrote:
>
>> Hi All,
>>
>> I would like to know if Numpy accepts addition of new distributions since
>> the implementation of the Generator interface. If so, what is the criteria
>> for a particular distribution to be accepted? The reason why i'm asking is
>> because I would like to propose adding the Polya-gamma distribution to
>> numpy, for the following reasons:
>>
>> 1) Polya-gamma random variables are commonly used as auxiliary variables
>> during data augmentation in Bayesian sampling algorithms, which have
>> wide-spread usage in Statistics and recently, Machine learning.
>> 2) Since this distribution is mostly useful for random sampling, it seems
>> appropriate to have it in numpy and not projects like scipy [1].
>> 3) The only python/C++ implementation of the sampler available is
>> licensed under GPLv3 which I believe limits copying into packages that
>> choose to use a different license [2].
>> 4) Numpy's random API makes adding the distribution painless.
>>
>> I have done preliminary work on this by implementing the distribution
>> sampler as decribed in [3]; see:
>> https://github.com/numpy/numpy/compare/master...zoj613:polyagamma .
>> There is a more efficient sampling algorithm described in a later paper
>> [4], but I chose not to start with that one unless I know it is worth
>> investing time in.
>>
>> I would appreciate your thoughts on this proposal.
>>
>> Regards,
>> Zolisa
>>
>>
>> Refs:
>> [1] https://github.com/scipy/scipy/issues/11009
>> [2] https://github.com/slinderman/pypolyagamma
>> [3] https://arxiv.org/pdf/1205.0310v1.pdf
>> [4] https://arxiv.org/pdf/1405.0506.pdf
>>
>>
>>
>> Disclaimer - University of Cape Town This email is subject to UCT
>> policies and email disclaimer published on our website at
>> http://www.uct.ac.za/main/email-disclaimer or obtainable from +27 21 650
>> 9111. If this email is not related to the business of UCT, it is sent by
>> the sender in an individual capacity. Please report security incidents or
>> abuse via https://csirt.uct.ac.za/page/report-an-incident.php.
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
>
> --
>  /t
>
> PGP Key: https://flowcrypt.com/pub/tom.ritchf...@gmail.com
> *https://tom.ritchford.com *
> *https://tom.swirly.com *
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Feature request: Alternative representation for arrays with many dimensions

2020-12-09 Thread Stephan Hoyer
On Wed, Dec 9, 2020 at 2:24 PM Fang Zhang  wrote:

> By default, the __repr__ and __str__ functions of NumPy arrays summarize
> long arrays (i.e. omit all items but a few at beginning and end of each
> dimension), which is a good thing because when debugging, programmers can
> call print() on arrays with millions of elements without clogging the
> output or taking up too much CPU/memory (unsurprisingly, the string
> representation of an array item usually takes more bytes than its binary
> representation).
>
> However, this mechanic does not help when an array has a lot of short
> dimensions, e.g. np.arange(2 ** 20).reshape((2,) * 20). I often encounter
> such arrays in my work, and every once in a while I would try to print such
> an array without flattening it first (usually because I didn't know what
> shape or even what type the variable I was trying to print is), which has
> caused incidents ranging from losing everything in my scrollback buffer to
> crashing my computer by using too much memory.
>
> I think it may be a good idea to change the way NumPy pretty prints arrays
> with such shapes to avoid this situation. Something like "array([ 0, 1, 2,
> ..., 1048573, 1048574, 1048575]).reshape(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2)" would be good enough for me. The condition to
> trigger such a representation can either be a fixed number of dimensions,
> or when after summarizing the pretty printer would still print more items
> than the threshold (1000 by default). Since the outputs of __repr__ and
> __str__ are meant for human eyes rather than computers, I think this should
> not cause too much of a compatibility problem.
>

+1, this could use improvement. For high dimensional arrays, the way NumPy
prints is way too verbose.

In xarray, we automatically decrease "edgeitems" for printing NumPy arrays,
to 2 for ndim=3 and 1 for ndim>3:
https://github.com/pydata/xarray/blob/9802411b35291a6149d850e8e573cde71a93bfbf/xarray/core/formatting.py#L439-L453

As a last resort, we could consider automatically limiting the maximum
number of displayed lines, adding "..." for clipped lines. It is unlikely,
for example, that anyone ever wants to print more than ~100 lines of text
to the screen, which can easily happen for very high dimensional arrays.
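
For reference, a sketch of what the existing print options can and cannot do
here:

```
import numpy as np

a = np.arange(3 ** 8).reshape((3,) * 8)

# Existing knobs: lowering edgeitems/threshold shrinks the output a lot.
with np.printoptions(edgeitems=1, threshold=10):
    print(a)

# But for np.arange(2 ** 20).reshape((2,) * 20), every axis has length 2,
# too short for "..." summarization, so the full nesting is still printed;
# that is exactly the failure mode described above.
```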


> What do you all think?
>
> Sincerely,
> Fang Zhang
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-09 Thread Stephan Hoyer
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer  wrote:

> On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
>  wrote:
> >
> > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > Regarding np.bool specifically, if you want to deprecate this, you
> > > might want to discuss this with us at the array API standard
> > > https://github.com/data-apis/array-api (which is currently in RFC
> > > stage). The spec uses bool as the name for the boolean dtype.
> > >
> > > Would it make sense for NumPy to change np.bool to just be the
> > > boolean
> > > dtype object? Unlike int and float, there is no ambiguity with bool,
> > > and NumPy clearly doesn't have any issues with shadowing builtin
> > > names
> > > in its namespace.
> >
> > We could keep the Python alias around (which for `dtype=` is the same
> > as `np.bool_`).
> >
> > I am not sure I like the idea of immediately shadowing the builtin.
> > That is a switch we can avoid flipping (without warning); `np.bool_`
> > and `bool` are fairly different beasts? [1]
>
> NumPy already shadows a lot of builtins, in many cases, in ways that
> are incompatible with existing ones. It's not something I would have
> done personally, but it's been this way for a long time.
>

It may be defensible to keep np.bool as an alias for Python's bool even
when we remove the other aliases.

np.int_ and np.float_ have fixed precision, which makes them somewhat
different from the builtin types. NumPy has a whole bunch of different
precisions for integer and floats, so this distinction matters.

In contrast, there is only one boolean dtype in NumPy, which matches
Python's bool. So we wouldn't have to worry, for example, about whether a
user has requested a specific precision explicitly. This comes up in issues
like type-promotion where libraries like JAX and PyTorch have special case
logic for most Python types vs NumPy dtypes (but booleans are the same for
both):
https://jax.readthedocs.io/en/latest/type_promotion.html
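
Concretely, with current NumPy:

```
import numpy as np

# There is exactly one boolean dtype, and Python's bool maps onto it:
assert np.dtype(bool) == np.dtype(np.bool_)

# By contrast, Python's int and float map onto one of several fixed
# precisions, and the choice is platform dependent:
print(np.dtype(int), np.dtype(float))   # e.g. int64 float64
```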



>
> Aaron Meurer
>
> > OTOH, if someone wants to entertain switching... It could be
> > interesting to see how (unfixed) downstream projects react to it.
> >
> > One approach would be:
> >
> > * Go ahead for now (deprecate)
> > * Add a FutureWarning at some point that we _will_ start to export
> >   `np.bool` again (but `from numpy import *` is a problem?)
> > * Aim to make `np.bool is np.bool_` at some point in the (far) future.
> >
> > It is multi-step (and I recall opinions that multi-step is bad).
> > Although, I think the main argument against it was to not force users
> > to modify code more than once.  And I do not think that happens here.
> >
> > Of course we could use the `FutureWarning` right away, but I don't mind
> > taking it slow.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> >
> > [1] I admit, probably almost nobody would notice. And usually using a
> > Python `bool` is better...
> >
> >
> > >
> > > Aaron Meurer
> > >
> > > On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias 
> > > wrote:
> > > > Hi all,
> > > >
> > > > At the prodding [1] of Sebastian, I’m starting a discussion on the
> > > > decision to deprecate np.{bool,float,int}. This deprecation broke
> > > > our prerelease testing in scikit-image (which, hooray for rcs!),
> > > > and resulted in a large amount of code churn to fix [2].
> > > >
> > > > To be honest, I do think *some* sort of deprecation is needed,
> > > > because for the longest time I thought that np.float was what
> > > > np.float_ actually is. I think it would be worthwhile to move to
> > > > *that*, though it’s an even more invasive deprecation than the
> > > > currently proposed one. Writing `x = np.zeros(5, dtype=int)` is
> > > > somewhat magical, because someone with a strict typing mindset
> > > > (there’s an increasing number!) might expect that this is an array
> > > > of pointers to Python ints. This is why I’ve always preferred to
> > > > write `dtype=np.int`, resulting in the current code churn.
> > > >
> > > > I don’t know what the best answer is, just sparking the discussion
> > > > Sebastian wants to see. ;) For skimage we’ve already merged a fix
> > > > (even if it is one of dubious quality, as Stéfan points out [3] ;),
> > > > so I don’t have too much stake in the outcome.
> > > >
> > > > Juan.
> > > >
> > > > [1]:
> > > >
> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739334463
> > > > [2]: https://github.com/scikit-image/scikit-image/pull/5103
> > > > [3]:
> > > >
> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739368765
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > ___
> > 

Re: [Numpy-discussion] Rules for argument parsing/forwarding in __array_function__ and __array_ufunc__

2020-12-05 Thread Stephan Hoyer
On Wed, Dec 2, 2020 at 3:39 PM Sebastian Berg 
wrote:

> 1. If an argument is invalid in NumPy it is considered and error.
>For example:
>
>np.log(arr, my_weird_argument=True)
>
>is always an error even if the `__array_function__` implementation
>of `arr` would support it.
>NEP 18 explicitly says that allowing forwarding could be done, but
>will not be done at this time.
>

From my perspective, this is a good thing: it ensures that NumPy's API is
only used for features that exist in NumPy. Otherwise I can imagine causing
considerable confusion.

If you want to use my_weird_argument, you can call my_weird_library.log()
instead.


> 2. Arguments must only be forwarded if they are passed in:
>
>np.mean(cupy_array)
>
>ends up as `cupy.mean(cupy_array)` and not:
>
>cupy.mean(cupy_array, axis=None, dtype=None, out=None,
>  keepdims=False, where=True)
>
>meaning that CuPy does not need to implement all of those kwargs and
>NumPy can add new ones without breaking anyones code.
>

My reasoning here was two-fold:
1. To avoid the unfortunate situation for functions like np.mean(), where
NumPy jumps through considerable hoops to avoid passing extra arguments in
an ad-hoc way to preserve backwards compatibility
2. To make it easy for a library to implement "incomplete" versions of
NumPy's API, by simply omitting arguments.

The idea was that NumPy's API is open to partial implementations, but not
extension.
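
As a sketch of that design, here is roughly what a deliberately partial
NEP 18 override looks like (MyArray and the helper names here are
hypothetical):

```
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(numpy_function):
    """Register an override for a NumPy function (NEP 18 pattern)."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

class MyArray:
    """Hypothetical duck array implementing a *partial* NumPy API."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # unimplemented functions raise TypeError
        if not all(issubclass(t, (np.ndarray, MyArray)) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.mean)
def _mean(a, axis=None):  # deliberately omits dtype=, out=, keepdims=, where=
    return MyArray(np.mean(a.data, axis=axis))

m = np.mean(MyArray([1.0, 2.0, 3.0]))  # only the passed-in arguments forward
```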


> 3. NumPy should not check the *validity* of the arguments. For example:
>`np.add.reduce(xarray, axis="long")` should probably work in xarray.
>(`xarray.DataArray` does not actually implement the above.)
>But a string cannot be used as an axis in NumPy.
>

I don't think libraries should be encouraged to abuse NumPy's API to mean
something else. Certainly I would not use this in xarray :).

If we could check the validity of arguments cheaply, that would be fine by
me. But I didn't think it was worth adding overhead to every function call.
Perhaps type annotations could be relied on for these sorts of checks? I am
pretty happy considering not checking the validity of arguments to be an
implementation detail for now.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-05 Thread Stephan Hoyer
On Sat, Dec 5, 2020 at 9:24 PM Mark Harfouche 
wrote:

> If the answer is to deprecate
>
> np.int(1) == int(1)
>
> then one can add a warning to the __init__ of the np.int class, but
> continue to subclass the python int class.
>
> It just doesn’t seem worthwhile to stop people from using dtype=np.int,
> which seems to read:
>
> “I want this to be a numpy integer, not necessarily a python integer”.
>
The problem is that there is assuredly code that inadvertently relies upon
this (mis)feature.

If we change the behavior of np.int() to create np.int64() objects instead
of int() objects, it is likely to result in breaking some user code. Even
with a prior warning, this breakage may be surprising and very hard to
track down. In contrast, it's much safer to simply remove np.int entirely,
because if users ignore the deprecation they end up with an error.

This is a general feature for deprecations: it's much safer to remove
functionality than it is to change behavior.

So on the whole, I think this is the right call.

>
> On Sat, Dec 5, 2020 at 10:14 PM Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias 
>> wrote:
>>
>>> Hi all,
>>>
>>> At the prodding [1] of Sebastian, I’m starting a discussion on the
>>> decision to deprecate np.{bool,float,int}. This deprecation broke our
>>> prerelease testing in scikit-image (which, hooray for rcs!), and resulted
>>> in a large amount of code churn to fix [2].
>>>
>>> To be honest, I do think *some* sort of deprecation is needed, because
>>> for the longest time I thought that np.float was what np.float_ actually
>>> is. I think it would be worthwhile to move to *that*, though it’s an even
>>> more invasive deprecation than the currently proposed one. Writing `x =
>>> np.zeros(5, dtype=int)` is somewhat magical, because someone with a strict
>>> typing mindset (there’s an increasing number!) might expect that this is an
>>> array of pointers to Python ints. This is why I’ve always preferred to
>>> write `dtype=np.int`, resulting in the current code churn.
>>>
>>> I don’t know what the best answer is, just sparking the discussion
>>> Sebastian wants to see. ;) For skimage we’ve already merged a fix (even if
>>> it is one of dubious quality, as Stéfan points out [3] ;), so I don’t have
>>> too much stake in the outcome.
>>>
>>> Juan.
>>>
>>> [1]:
>>> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739334463
>>> [2]: https://github.com/scikit-image/scikit-image/pull/5103
>>> [3]:
>>> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739368765
>>>
>>
>> I checked pandas and astropy and both have several uses of the deprecated
>> types but should be easy to fix. I suppose the question is if we want to
>> make them fix things *right now* :)
>>
>> Chuck
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

2020-11-05 Thread Stephan Hoyer
I can try to dig up the old discussions, but datetime64 used to implement
both (1) and (3), and this was updated in a very intentional way.
Datetime64 now works like Python's own time-zone naive datetime.datetime
objects. The documentation referencing "Z" should be updated -- datetime64
can be in any timezone you like.

Timezone aware datetime objects are certainly useful, but NumPy's
datetime64 was restricted to UTC. The consensus was that it was worse to
have UTC-only rather than timezone-naive-only. NumPy's datetime64 is often
used for data analysis purposes, for which automatic conversion to the
local timezone of the computer running the analysis is often
counter-productive.

If you care about timezone conversions, I would highly recommend looking
into pandas's Timestamp class for this purpose. In the future, this would
be a good use-case for a new custom NumPy dtype. (The existing
np.datetime64 code cannot easily handle multiple timezones.)
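
A short sketch of the contrast (assuming a reasonably recent pandas):

```
import numpy as np
import pandas as pd

# NumPy: timezone-naive. The "+0200" offset is applied and then dropped
# (with the DeprecationWarning discussed in this thread):
t_np = np.datetime64('2020-11-05 16:00+0200')   # -> 2020-11-05T14:00

# pandas: genuinely timezone-aware.
t_pd = pd.Timestamp('2020-11-05 16:00+0200')
print(t_pd)                     # 2020-11-05 16:00:00+02:00
print(t_pd.tz_convert('UTC'))   # 2020-11-05 14:00:00+00:00
```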

On Thu, Nov 5, 2020 at 1:04 PM Eric Wieser 
wrote:

> Without weighing in yet on how I feel about the deprecation, you can see
> some discussion about why this was originally deprecated in the PR that
> introduced the warning:
>
> https://github.com/numpy/numpy/pull/6453
>
> Eric
>
> On Thu, Nov 5, 2020, 20:13 Noam Yorav-Raphael  wrote:
>
>> Hi,
>>
>> I suggest removing the deprecation warning when constructing a datetime64
>> with a timezone. For example, this is the current behavior:
>>
>> >>> np.datetime64('2020-11-05 16:00+0200')
>> <stdin>:1: DeprecationWarning: parsing timezone aware datetimes is
>> deprecated; this will raise an error in the future
>> numpy.datetime64('2020-11-05T14:00')
>>
>> I suggest removing the deprecation warning because I find this to be a
>> useful behavior, and because it is a correct behavior. The manual says:
>> "The datetime object represents a single moment in time... Datetimes are
>> always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z."
>> So 2020-11-05T16:00+0200 is indeed the moment in time represented by
>> np.datetime64('2020-11-05T14:00').
>>
>> I just used this to restrict my data set to records created after a
>> certain moment. It was easier for me to write the moment in my local time
>> and add "+0200" than to figure out the moment representation in UTC.
>>
>> So this is my simple suggestion: remove the deprecation warning.
>>
>>
>> Beyond that, I have 3 ideas for changing the repr of datetime64 that I
>> would like to discuss.
>>
>> 1. Add "Z" at the end, for example,
>> numpy.datetime64('2020-11-05T14:00Z'). This will make it clear to which
>> moment it refers. I think this is significant - I had to dig quite a bit to
>> realize that datetime64('2020-11-05T14:00') means 14:00 UTC.
>>
>> 2. Replace the 'T' with a space. I just find it much easier to read
>> '2020-11-05 14:00Z' than '2020-11-05T14:00Z'. The long sequence of
>> characters makes it hard for my brain to parse.
>>
>> 3. This will require discussion, but will be very convenient: have the
>> repr display the time using the environment time zone, including a time
>> offset. So, in my specific time zone (+0200), I will have:
>>
>> repr(np.datetime64('2020-11-05 14:00Z')) ==
>> "numpy.datetime64('2020-11-05T16:00+0200')"
>>
>> I'm sure the pros and cons of having an environment-dependent repr should
>> be discussed. But I will list some pros:
>> 1. It's very convenient - it's immediately obvious to me to which moment
>> 2020-11-05 16:00+0200 refers.
>> 2. It's well defined - I may collect timestamps from machines with
>> different time zones, and I will be able to know to which exact moment each
>> timestamp refers.
>> 3. It's very simple - I could compare any two timestamps, I don't have to
>> worry about time zones.
>>
>> I would be happy to hear your thoughts.
>>
>> Thanks,
>> Noam
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add sliding_window_view method to numpy

2020-11-05 Thread Stephan Hoyer
On Thu, Nov 5, 2020 at 11:16 AM Ralf Gommers  wrote:

>
>
> On Thu, Nov 5, 2020 at 4:56 PM Sebastian Berg 
> wrote:
>
>> Hi all,
>>
>> just a brief note that I merged this proposal:
>>
>> https://github.com/numpy/numpy/pull/17394
>>
>> adding `np.sliding_window_view` into the 1.20 release of NumPy.
>>
>> There was only one public API change, and that is that the `shape`
>> argument is now called `window_shape`.
>>
>> This is still a good time for feedback in case you have a better idea
>> e.g. for the function or parameter names.
>>
>
> The old PR had this in the lib.stride_tricks namespace. Seeing it in the
> main namespace is unexpected and likely will lead to issues/questions,
> given that such an overlapping view is going to do behave in ways the
> average user will be surprised by. It may also lead to requests for other
> array/tensor libraries to implement this. I don't see any discussion on
> this in PR 17394, it looks like a decision by the PR author that no one
> commented on - reconsider that?
>
> Cheers,
> Ralf
>

+1 let's keep this in the lib.stride_tricks namespace.
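
For reference, a short usage sketch via the stride_tricks namespace:

```
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6)
w = sliding_window_view(x, window_shape=3)
print(w)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
# w is a read-only, overlapping view of x; that surprising aliasing is
# one reason to keep the function tucked away in stride_tricks.
```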


>
>
>
>
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>>
>> On Mon, 2020-10-12 at 08:39 +, Zimmermann Klaus wrote:
>> > Hello,
>> >
>> > I would like to draw the attention of this list to PR #17394 [1] that
>> > adds the implementation of a sliding window view to numpy.
>> >
>> > Having a sliding window view in numpy is a longstanding open issue
>> > (cf
>> > #7753 [2] from 2016). A brief summary of the discussions surrounding
>> > it
>> > can be found in the description of the PR.
>> >
>> > This PR implements a sliding window view based on stride tricks.
>> > Following the discussion in issue #7753, a first implementation was
>> > provided by Fanjin Zeng in PR #10771. After some discussion, that PR
>> > stalled and I picked up the issue in the present PR #17394. It is
>> > based
>> > on the first implementation, but follows the changed API as suggested
>> > by
>> > Eric Wieser.
>> >
>> > Code reviews have been provided by Bas van Beek, Stephan Hoyer, and
>> > Eric
>> > Wieser. Sebastian Berg added the "62 - Python API" label.
>> >
>> >
>> > Do you think this is suitable for inclusion in numpy?
>> >
>> > Do you consider the PR ready?
>> >
>> > Do you have suggestions or requests?
>> >
>> >
>> > Thanks for your time and consideration!
>> > Klaus
>> >
>> >
>> > [1] https://github.com/numpy/numpy/pull/17394
>> > [2] https://github.com/numpy/numpy/issues/7753


Re: [Numpy-discussion] NumPy 1.20.x branch in two weeks

2020-11-01 Thread Stephan Hoyer
On Sun, Nov 1, 2020 at 7:47 PM Stefan van der Walt 
wrote:

> On Sun, Nov 1, 2020, at 18:54, Jarrod Millman wrote:
> > I also misunderstood the purpose of the NEP.  I assumed it was
> > intended to encourage projects to drop old versions of Python.  Other
> > people have viewed the NEP similarly:
> > https://github.com/networkx/networkx/issues/4027
>
> Of all the packages, it makes sense for NumPy to behave most
> conservatively with deprecations. The NEP suggests allowable support
> periods, but as far as I recall does not enforce minimum support.
>
> Stephan Hoyer had a good recommendation on how we can clarify the NEP to
> be easier to intuit. Stephan, shall we make an amendment to the NEP with
> your idea?
>

For reference, here was my proposed revision:
https://github.com/numpy/numpy/pull/14086#issuecomment-649287648

Specifically, rather than saying "the latest release of NumPy supports all
versions of Python released in the 42 months before NumPy's release", it
says "NumPy will only require versions of Python that were released more
than 24 months ago". In practice, this works out to the same thing (at
least given Python's old 18 month release cycle).

This changes the definition of the support window (in a way that I think is
clearer and that works better for infrequent releases), but there is still
the question of how large that window should be for NumPy. My personal
opinion is that somewhere in the range of 24-36 months would be appropriate.


> Best regards,
> Stéfan
>


Re: [Numpy-discussion] Ndarray static typing: Order of generic types

2020-10-29 Thread Stephan Hoyer
On Wed, Oct 28, 2020 at 2:44 PM bas van beek 
wrote:

> Hey all,
>
>
>
> With the recent merging of numpy/numpy#16759, we’re at the point where
> `ndarray` can be made generic w.r.t. its dtype and shape.
>
> An open question which yet remains is the order in which these two
> parameters should appear (numpy/numpy#16547):
>
> · `ndarray[Dtype, Shape]`
>
> · `ndarray[Shape, Dtype]`
>
>
Hi Bas,

Thanks for driving this forward!

Just to speak for myself, I don't think the precise choice matters
very much. There are arguments for consistency both ways. In the end Dtype
and Shape are different enough that I doubt it will be a point of confusion.

Also, I would guess many users will define their own type aliases, so can
write something more succinct like Float64[shape] rather than
ndarray[float64, shape].  We might even consider including some of these in
numpy.typing.
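For illustration, a minimal sketch of such an alias using dtype-only
generics (numpy.typing.NDArray arrived after this thread, and the shape
parameter remains hypothetical):

    import numpy as np
    import numpy.typing as npt

    # Succinct user-defined alias over the dtype-generic ndarray.
    Float64 = npt.NDArray[np.float64]

    def normalize(x: Float64) -> Float64:
        # Type-checks with mypy; shape is not (yet) part of the type.
        return x / x.sum()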

Cheers,
Stephan



>
>
> There has been some discussion about this question in issue 16547, but a
> consensus has not yet been reached.
>
> Most people seem to slightly prefer one option over the other.
>
>
>
> Are there any further thoughts on this subject?
>
>
>
> Regards,
>
> Bas van Beek
>
>


Re: [Numpy-discussion] new function: broadcast_shapes

2020-10-15 Thread Stephan Hoyer
On Thu, Oct 15, 2020 at 11:46 AM Warren Weckesser <
warren.weckes...@gmail.com> wrote:

> On 10/15/20, Madhulika Jain Chambers  wrote:
> > Hello all,
> >
> > I opened a PR to add a function which returns the broadcasted shape from
> a
> > given set of shapes:
> > https://github.com/numpy/numpy/pull/17535
> >
> > As this is a proposed change to the API, I wanted to see if there was any
> > feedback from the list.
>
>
> Thanks, this is useful!  I've implemented something similar many times
> over the years, and could have used it in some SciPy code, where we
> currently have a private implementation in one of the `stats` modules.
>
> Warren


+1 for adding this.

There's a version of this helper function -- coincidentally with the
exact same API and name -- in JAX, too:
https://github.com/google/jax/blob/22c3684d3bac3ad0f40aca69cdbc76842fa9dfc0/jax/lax/lax.py#L73
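For concreteness, usage as proposed in the PR (a sketch, assuming the
merged function keeps this signature):

    import numpy as np

    # The shape that broadcasting these shapes together would produce:
    np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5), (8, 7, 6, 5))
    # (8, 7, 6, 5)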


Re: [Numpy-discussion] Einops cross-linking from einsum

2020-09-27 Thread Stephan Hoyer
On Sat, Sep 26, 2020 at 10:53 PM Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hello all,
>
> I'm developer of einops - python package for readable and reliable tensor
> operations.
> Einops handles different types of tensors (including numpy, pytorch, jax,
> tensorflow and others) and targeted as a verbose replacement to existing
> numpy operations.
>
> As einops now is quite mature project, I suggest linking einops from
> np.einsum function
> (which was an initial point to appearance of this new interface, and many
> einops users previously used einsum)
>
> Not sure about precise implementation - link in 'see also' works.
> Alternatively, I can suggest a single-line description like:
> Operations with similar verbose interface are provided by einops
>  package to cover additional
> operations: transpose, reshape/flatten, repeat/tile, squeeze/unsqueeze and
> reductions.
>
> Glad to hear opinions/recommendations.
>

Hi Alex,

I think this would be a nice idea! Putting a note like this in the "See
also" section seems appropriate to me.

I would also suggest adding a reference to opt-einsum, which provides more
flexible optimization routines for np.einsum.
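For readers unfamiliar with einops, a small taste of the interface in
question (a sketch, assuming einops is installed):

    import numpy as np
    from einops import rearrange, reduce

    x = np.zeros((2, 3, 4))
    rearrange(x, 'b c h -> b (c h)').shape  # (2, 12)
    reduce(x, 'b c h -> b', 'mean').shape   # (2,)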

Cheers,
Stephan


>
> Cheers,
> Alex.
>


Re: [Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments

2020-09-23 Thread Stephan Hoyer
On Wed, Sep 23, 2020 at 2:22 PM Stefano Borini 
wrote:

> Dear all,
>
> I would like to know your opinion on how to address a specific need of
> the new PEP 637:
>
> https://github.com/python/peps/blob/master/pep-0637.txt
>
> Such PEP would make a syntax like the following valid
>
> obj[2, x=23]
> obj[2, 4, x=23]
>
> Which would resolve to a call in the form
>
> type(obj).__getitem__(obj, 2, x=23)
> type(obj).__getitem__(obj, (2, 4), x=23)
>
> and similar for set and del.
>
> After discussion, we currently have one open point we are unsure how
> to address, that is what to pass when no positional index is passed,
> that is:
>
> obj[x=23]
>
> We are unsure if we should resolve this call with None or the empty
> tuple in the positional index:
>
> type(obj).__getitem__(obj, None, x=23)
> type(obj).__getitem__(obj, (), x=23)
>
Hi Stefano -- thanks for pushing this proposal forward! I am sure that
support for keyword indexing will be very welcome in the scientific Python
ecosystem.

I have not been following the full discussion on PEP 637, but I recall
seeing another suggestion earlier for what this could be resolved into:

type(obj).__getitem__(obj, x=23)

I.e., not passing a positional argument at all.

The author of a class that supports keyword indexing could check this sort
of case with a positional only argument with a default value, e.g.,

def __getitem__(self, key=MY_SENTINEL, /, **kwargs):

where MY_SENTINEL could be any desired sentinel value, including either
None or (). Is there a reason for rejecting this option? It seems like a
nice explicit alternative to prespecifying the choice of sentinel value.
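A minimal sketch of that receiving side (the indexing syntax itself is
still hypothetical, so only the method is shown; the class name is
illustrative):

    MY_SENTINEL = object()

    class Grid:
        # Positional-only `key` (Python 3.8+) with a sentinel default, so
        # a future obj[x=23] could resolve to a call with no positional
        # index at all.
        def __getitem__(self, key=MY_SENTINEL, /, **kwargs):
            if key is MY_SENTINEL:
                return ('no positional index', kwargs)
            return (key, kwargs)

    Grid()[5]                 # (5, {})
    Grid().__getitem__(x=23)  # ('no positional index', {'x': 23})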

I guess the concern might be that this would not suffice for __setitem__?


>
> You can see a detailed discussion in the PEP at L913
>
> https://github.com/python/peps/blob/1fb19ac3a57c9793669ea9532fb840861d4d7566/pep-0637.txt#L913
>
> One of the commenters on python-ideas reported that using None might
> introduce an issue in numpy, as None is used to create new axes, hence
> the proposal for rejection of None as a solution.
>
> However, we are unsure how strongly this would affect numpy and
> similar packages, as well as what developers will find more natural to
> receive in that case. We would like to hear your opinion on the topic.

I guess the case this would disallow is distinguishing between obj[None,
x=23] and obj[x=23]?

Yes, this could be a little awkward potentially. The tuple would definitely
be more natural for NumPy users, given that the first step of
__getitem__/__setitem__ methods in the broader NumPy ecosystem is typically
packing non-tuple keys into a tuple, e.g.,

def __getitem__(self, key):
    if not isinstance(key, tuple):
        key = (key,)
    ...

That said:
- NumPy itself is unlikely to support keyword indexing anytime soon.
- New packages could encourage using explicit aliases like "np.newaxis"
instead of "None", which in general is a best practice already.
- The combined use of keyword indexing *and* insertion of new axes at the
same time strikes me as something that would be unusual in practice. From
what I've seen, it is most useful to either use entirely unnamed or
entirely named axes. In the later case, I might write something like
obj[x=None] to indicate inserting a new dimension with the name "x".

I think we could definitely live with it either way. I would lean towards
using an empty tuple, but I agree that it feels a little uglier than using
None (though perhaps not surprising, given the already ugly special cases
for tuples in the indexing protocol).

Best,
Stephan


Re: [Numpy-discussion] Experimental `like=` attribute for array creation functions

2020-08-13 Thread Stephan Hoyer
On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers  wrote:

> Thanks for raising these concerns Ilhan and Juan, and for answering Peter.
> Let me give my perspective as well.
>
> To start with, this is not specifically about Peter's NEP and PR. NEP 35
> simply follows the pattern set by previous PRs, and given its tight scope
> is less difficult to understand than other NEPs on such technical topics.
> Peter has done a lot of things right, and is close to the finish line.
>
>
> On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <
> pe...@entschev.com> wrote:
>
>>
>> > I think, arriving to an agreement would be much faster if there is an
>> executive summary of who this is intended for and what the regular usage
>> is. Because with no offense, all I see is "dispatch", "_array_function_"
>> and a lot of technical details of which I am absolutely ignorant.
>>
>> This is what I intended to do in the Usage Guidance [2] section. Could
>> you elaborate on what more information you'd want to see there? Or is
>> it just a matter of reorganizing the NEP a bit to try and summarize
>> such things right at the top?
>>
>
> We adapted the NEP template [6] several times last year to try and improve
this. And specified in there as well that NEP content sent to the mailing
> list should only contain the sections: Abstract, Motivation and Scope,
> Usage and Impact, and Backwards compatibility. This to ensure we fully
> understand the "why" and "what" before the "how". Unfortunately that
> template and procedure hasn't been exercised much yet, only in NEP 38 [7]
> and partially in NEP 41 [8].
>
> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image
> (Juan) and CuPy (Leo, on the PR review) all saying they don't understand
> the goals, relevance, target audience, or how they're supposed to use a new
> feature, that indicates that the people doing the writing and having the
> discussion are doing something wrong at a very fundamental level.
>
> At this point I'm pretty disappointed in and tired of how we write and
> discuss NEPs on technical topics like dispatching, dtypes and the like.
> People literally refuse to write down concrete motivations, goals and
> non-goals, code that's problematic now and will be better/working post-NEP
> and usage examples before launching into extensive discussion of the gory
> details of the internals. I'm not sure what to do about it. Completely
> separate API and behavior proposals from implementation proposals? Make
> separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo
> on the API team which then needs to approve every API change in new NEPs?
> Offer to co-write NEPs if someone is willing but doesn't understand how to
> go about it? Keep the current structure/process but veto further approvals
> until NEP authors get it right?
>

I think the NEP template is great, and we should try to be more diligent
about following it!

My own NEP 37 (__array_module__) is probably a good example of poor
presentation due to not following the template structure. It goes pretty
deep into low-level motivation and some implementation details before usage
examples.

Speaking just for myself, I would have appreciated a friendly nudge to use
the template. Certainly I think it would be fine to require using the
template for newly submitted NEPs. I did not remember about it when I
started drafting NEP 37, and it definitely would have helped. I may still
try to do a revision at some point to use the template structure.


> I want to make an exception for merging the current NEP, for which the
> plan is to merge it as experimental to try in downstream PRs and get more
> experience. That does mean that master will be in an unreleasable state by
> the way, which is unusual and it'd be nice to get Chuck's explicit OK for
> that. But after that, I think we need a change here. I would like to hear
> what everyone thinks is the shape that change should take - any of my above
> suggestions, or something else?
>
>
>
>> > Finally as a minor point, I know we are mostly (ex-)academics but this
>> necessity of formal language on NEPs is self-imposed (probably PEPs are to
>> blame) and not quite helping. It can be a bit more descriptive in my
>> external opinion.
>>
>> TBH, I don't really know how to solve that point, so if you have any
>> specific suggestions, that's certainly welcome. I understand the
>> frustration for a reader trying to understand all the details, with
>> many being only described in NEP-18 [3], but we also strive to avoid
>> rewriting things that are written elsewhere, which would also
>> overburden those who are aware of what's being discussed.
>>
>>
>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP
>> discussion) that NEPs are getting pretty inaccessible. In a sense these are
>> difficult topics and readers should be expected to have *some* familiarity
>> with the topics being discussed, but perhaps more effort should be put into
>> the 

Re: [Numpy-discussion] Add Chebyshev (cosine) transforms implemented via FFTs

2020-08-04 Thread Stephan Hoyer
On Tue, Aug 4, 2020 at 6:10 PM Charles R Harris 
wrote:

>
>
> On Tue, Aug 4, 2020 at 4:55 AM Ralf Gommers 
> wrote:
>
>>
>>
>> On Tue, Aug 4, 2020 at 1:49 AM Chris Vavaliaris 
>> wrote:
>>
>>> PR #16999: https://github.com/numpy/numpy/pull/16999
>>>
>>> Hello all,
>>> this PR adds the two 1D Chebyshev transform functions `chebyfft` and
>>> `ichebyfft` into the `numpy.fft` module, utilizing the real FFTs `rfft`
>>> and
>>> `irfft`, respectively. As far as I understand, `pockefft` does not
>>> support
>>> cosine transforms natively; for this reason, an even extension of the
>>> input
>>> vector is constructed, whose real FFT corresponds to a cosine transform.
>>>
>>> The motivation behind these two additions is the ability to quickly
>>> perform
>>> direct and inverse Chebyshev transforms with `numpy`, without the need to
>>> write scripts that do the necessary (although minor) modifications.
>>> Chebyshev transforms are used often e.g. in the spectral integration of
>>> PDE
>>> problems; thus, I believe having them implemented in `numpy` would be
>>> useful
>>> to many people in the community.
>>>
>>> I'm happy to get comments/feedback on this feature, and on whether it's
>>> something more people would be interested in. Also, I'm not entirely sure
>>> what part of this functionality is/isn't present in `scipy`, so that the
>>> two
>>> `fft` modules remain consistent with one another.
>>>
>>
>> Hi Chris, that's a good question. scipy.fft is a superset of numpy.fft,
>> and the functionality included in NumPy is really only the basics that are
>> needed in many fields. The reason for the duplication stems from way back
>> when we had no wheels and SciPy was very hard to install. So I don't think
>> there's anything we'd add to numpy.fft at this point.
>>
>> As I commented on your PR, it would be useful to add some references and
>> applications, and then make your proposal on the scipy-dev list.
>>
>>
> Chebfun  is based around this method,
> they use series with possibly thousands of terms. Trefethen is a big fan of
> Chebyshev polynomials.
>

I am quite sure that Chebyshev transforms are useful, but it does feel like
something more directly suitable for SciPy than NumPy. The current division
for submodules like numpy.fft/scipy.fft and numpy.linalg/scipy.linalg
exists for outdated historical reasons, but at this point it is easiest for
users to understand if SciPy has a strict superset of NumPy's
functionality here.
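For the curious, the even-extension trick described in the proposal can
be sketched in a few lines (illustrative only, not the PR's
implementation):

    import numpy as np

    def chebyshev_coeffs(v):
        # v: values at the n+1 Chebyshev-Lobatto points cos(pi*k/n).
        # The even extension has a real FFT equal to a cosine transform.
        n = len(v) - 1
        ext = np.concatenate([v, v[-2:0:-1]])  # length 2*n
        c = np.fft.rfft(ext).real / n
        c[0] /= 2
        c[-1] /= 2
        return c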


> Chuck


Re: [Numpy-discussion] Augment unique method

2020-07-16 Thread Stephan Hoyer
On Thu, Jul 16, 2020 at 1:04 PM  wrote:

> I see your point. How about passing the number of significant figures
> instead of atol?
>
>
>
> In fact, that’s what I originally intended but I thought that it could be
> expressed via atol and rtol, whereas the number of significant figures
> doesn’t seem to suffer from the ambiguity you pointed out.
>

This can already be expressed clearly* with a separate function call, e.g.,
np.unique(np.round(x, 3))

In general, it's a better software design practice to have separate
composable functions rather than adding more features into a single
function. So I don't think this would be an improvement for np.unique().

* Note: this is rounding to fixed precision rather than a fixed number of
significant figures. I can see a case why adding a helper function for
rounding to a number of significant digits would be useful, but this should
be a separate change from np.unique(). You can certainly do this currently
in NumPy but it's a bit of work:
https://stackoverflow.com/questions/18915378/rounding-to-significant-figures-in-numpy
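A rough sketch of such a helper, along the lines of the link above (not
a NumPy API; the name is illustrative):

    import numpy as np

    def round_sig(x, sig=3):
        # Round to `sig` significant figures; zeros pass through as-is.
        x = np.asarray(x, dtype=float)
        exp = np.zeros_like(x)
        nonzero = x != 0
        exp[nonzero] = np.floor(np.log10(np.abs(x[nonzero])))
        scale = 10.0 ** (sig - 1 - exp)
        return np.round(x * scale) / scale

    np.unique(round_sig([0.12345, 0.12349, 7.0]))
    # array([0.123, 7.   ])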


>
>
> *From:* NumPy-Discussion <gmail....@python.org> *On Behalf Of* Stephan Hoyer
> *Sent:* Thursday, July 16, 2020 3:06 PM
> *To:* Discussion of Numerical Python 
> *Subject:* Re: [Numpy-discussion] Augment unique method
>
>
>
> On Thu, Jul 16, 2020 at 11:41 AM Roman Yurchak 
> wrote:
>
> One issue with adding a tolerance to np.unique for floats is say you have
>   [0, 0.1, 0.2, 0.3, 0.4, 0.5] with atol=0.15
>
> Should this return a single element or multiple ones? On one side, each
> consecutive float is closer than the tolerance to the next one, but the
> first one and the last one are clearly not within atol.
>
> Generally this is similar to what DBSCAN clustering algorithm does (e.g.
> in scikit-learn) and that would probably be out of scope for np.unique.
>
>
>
> I agree, I don't think there's an easy answer for selecting "approximately
> unique" floats in the case of overlap.
>
>
>
> np.unique() does actually have well defined behavior for float, comparing
> floats for exact equality. This isn't always directly useful, but it
> definitely is well defined.
>
>
>
> My suggestion for this use-case would be to round floats to the desired
> precision before passing them into np.unique().
>
>
>
>
>
> Roman
>
> On 16/07/2020 20:27, Amin Sadeghi wrote:
> > It would be handy to add "atol" and "rtol" optional arguments to the
> > "unique" method. I'm proposing this since uniqueness is a bit vague for
> > floats. This change would be clearly backwards-compatible.
> >


Re: [Numpy-discussion] Augment unique method

2020-07-16 Thread Stephan Hoyer
On Thu, Jul 16, 2020 at 11:41 AM Roman Yurchak 
wrote:

> One issue with adding a tolerance to np.unique for floats is say you have
>   [0, 0.1, 0.2, 0.3, 0.4, 0.5] with atol=0.15
>
> Should this return a single element or multiple ones? On one side, each
> consecutive float is closer than the tolerance to the next one, but the
> first one and the last one are clearly not within atol.
>
> Generally this is similar to what DBSCAN clustering algorithm does (e.g.
> in scikit-learn) and that would probably be out of scope for np.unique.
>

I agree, I don't think there's an easy answer for selecting "approximately
unique" floats in the case of overlap.

np.unique() does actually have well defined behavior for float, comparing
floats for exact equality. This isn't always directly useful, but it
definitely is well defined.

My suggestion for this use-case would be to round floats to the desired
precision before passing them into np.unique().



> Roman
>
> On 16/07/2020 20:27, Amin Sadeghi wrote:
> > It would be handy to add "atol" and "rtol" optional arguments to the
> > "unique" method. I'm proposing this since uniqueness is a bit vague for
> > floats. This change would be clearly backwards-compatible.
> >


Re: [Numpy-discussion] Improving Complex Comparison/Ordering in Numpy

2020-07-01 Thread Stephan Hoyer
On Wed, Jul 1, 2020 at 12:23 PM Sebastian Berg 
wrote:

> This is a WIP, but allows nicely to try out how the new API
> could/should look like, and see the potential impact to code.  The
> current choice is for:
>
> np.sort(arr, keys=(arr.real, arr.imag))
>
> for example.  `keys` is like the `key` argument to Python's sorts, but
> unlike Python's sorts it is not passed a function but rather a sequence
> of arrays.
>
> Alternative spellings could be `by=...`? Or maybe someone has a
> different API idea.
>

I really like the look of np.sort(arr, by=(arr.real, arr.imag)).
- This avoids adding an extra function sortby into NumPy's API. The default
behavior (by=None) would of course be to sort by the arrays being sorted,
so it's backwards compatible.
- Calling the new argument "by" instead of "key" avoids confusion with the
behavior of Python's sort/sorted (which take functions instead of
sequences).

The combination of lexsort() and take_along_axis() makes it possible to
achieve this behavior currently, but it is definitely less clear than a
single function call.
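For comparison, the current spelling of "sort complex values by (real,
imag)" for a 1-D array (take_along_axis generalizes this to other axes):

    import numpy as np

    arr = np.array([1 + 2j, 1 + 1j, 0 + 5j])
    # lexsort treats its *last* key as the primary sort key.
    order = np.lexsort((arr.imag, arr.real))
    arr[order]
    # array([0.+5.j, 1.+1.j, 1.+2.j])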


Re: [Numpy-discussion] divmod(1.0, 0.0) bug

2020-05-08 Thread Stephan Hoyer
On Fri, May 8, 2020 at 4:10 PM Brock Mendel  wrote:

> FWIW in pandas we post-process floordiv (and divmod) ops to get the
> "Expected Result" behavior from the OP.
>
>
> On Fri, May 8, 2020 at 11:56 AM Anirudh Subramanian 
> wrote:
>
>> Hi all,
>>
>> There has been a discussion about divmod (1.0, 0.0) bug here :
>> https://github.com/numpy/numpy/issues/14900 and
>> https://github.com/numpy/numpy/pull/16161 .
>>
>> *SUMMARY*
>>
>> Currently divmod(1.0, 0.0) sets the "Invalid error" and returns (nan,
>> nan). This is not consistent with the IEEE 754 standard, which says that
>> 1.0/0.0 divisions should return inf and raise a dividebyzero error. Although
>> this may not apply to divmod, it should apply to floor_divide and mod.
>> I have summarized the current and expected states in the table below.
>>
>> Operator                   Warning message  Expected warning             Result    Expected result
>> np.divmod                  Invalid error    invalid and dividebyzero ??  nan, nan  inf, nan
>> np.fmod(1.0, 0.0)          Invalid error    Invalid                      nan       nan
>> np.floor_divide(1.0, 0.0)  Invalid error    Dividebyzero                 nan       inf
>> np.remainder(1.0, 0.0)     Invalid error    Invalid                      nan       nan
>>
>>
>> For remainder and fmod above, according to the standard, it is supposed
>> to raise invalid error. We need to change the code to also raise
>> dividebyzero error for floor_divide.
>>
>> The question is what to do for np.divmod (since this is not defined by
>> standard). My opinion is in this case we need to set both dividebyzero and
>> invalid error flags since its a combination of these two operations.
>>
>> *USER IMPACT*
>>
>> This is going to cause a breaking change/silent incorrect results to
>> at least some users who are doing one or both of the following:
>> 1. expecting nans from their output and check isnan but not isinf in
>> their code and accordingly do further computations.
>> 2. who currently call raise only on invalid and not dividebyzero errors
>> for any of  the above listed operations.
>>
>> Considering this we can try one of the two things:
>> 1. Create a FutureWarning for every divmod(1.0, 0.0) path. This may be
>> very noisy and I cannot think of an action for a user to take to suppress
>> this.
>> 2. Since bug fixes are exempt from backwards compatibility policy
>>  just
>> notify in the release notes, maybe after a couple of releases. (More
>> Impactful)
>>
>
I agree, I think these behaviors could be considered bugs and fixed without
warning. (However, note that the backwards compatibility policy you link to
is only a draft, not officially accepted.)

My guess is that these code paths have been rarely exercised, because floor
division and divmod are most useful for integers.
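For concreteness, the behavior under discussion (a sketch; the results
shown are those reported in this thread and may differ in releases that
incorporate the proposed fix):

    import numpy as np

    with np.errstate(invalid='ignore', divide='ignore'):
        np.floor_divide(1.0, 0.0)  # nan here; IEEE-style result: inf
        np.divmod(1.0, 0.0)        # (nan, nan) here; proposed: (inf, nan)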


>
>> *OTHER ISSUES?*
>>
>> Depending on the compiler, and if it implements annex F of the C
>> standard, it may not support 1.0/0.0 operation and may crash. Also this is
>> the case for true_divide also, so we wont be breaking more users than we
>> currently are.
>>
>> Would like to hear your thoughts about this!
>>
>> Anirudh
>>


Re: [Numpy-discussion] Deprecate Promotion of numbers to strings?

2020-04-30 Thread Stephan Hoyer
On Thu, Apr 30, 2020 at 10:32 AM Sebastian Berg 
wrote:

> Hi all,
>
> in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> promotion of strings and numbers. I have to double check whether this
> has a large effect on pandas, but it currently seems to me that it will
> be reasonable.
>

Sebastian -- thanks for driving this forward!

Pandas and Xarray already override these casting rules, so I think this is
generally a good idea:
https://github.com/pydata/xarray/blob/3820fb77256682d909c1e41d962e29bec0edd62d/xarray/core/dtypes.py#L34-L42

Note that Xarray also overrides np.promote_types(np.bytes_, np.unicode_) to
object.

This means that `np.promote_types("S", "int8")`, etc. will lead to an
> error instead of returning `"S4"`.  For the user, I believe the two
> main visible changes are that:
>
> np.array(["string", 0])
>
> will stop creating a string array and return either an `object` array
> or give an error (object array would be the default currently).
>

In the long term, I guess this would error as part of the plan to require
explicitly writing dtype=object to get object arrays?


> Another larger visible change will be code such as:
>
> np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)
>

I agree, it is better to raise an error than inadvertently cast to
object dtype. This can make errors appear later in strange ways.

We would need to make this change slowly over several releases, e.g., by
issuing a warning first.
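To make the current behavior concrete (a sketch; results as reported in
this thread, before any deprecation takes effect):

    import numpy as np

    np.promote_types("S", "int8")    # dtype('S4') today; proposed: error
    np.array(["string", 0])          # array(['string', '0'], dtype='<U6')
    np.array(["string", 0], dtype=object)  # the explicit spelling instead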


> The alternative is to return an object array also for the concatenate
> example.  I somewhat dislike that because `object` is not homogeneously
> typed and we thus lose type information.  This also affects functions
> that wish to cast inputs to a common type (ufuncs also do this
> sometimes).
> A further example of this and discussion is at the end of the mail [1].
>
>
> So the first question is whether we can form an agreement that an error
> is the better choice for `concatenate` and `np.promote_types()`.
> I.e. there is no one dtype that can faithfully represent both strings
> and integers. (This is currently the case e.g. for datetime64 and
> float64.)
>
>
> The second question is what to do for:
>
> np.array(["string", 0])
>
> which currently always returns strings.  Arguably, it must also either
> return an `object` array, or raise an error (requiring the user to pick
> string or object using `dtype=object`).
>
> The default would be to create a FutureWarning that an `object` array
> will be returned for `np.asarray(["string", 0])` in the future.
> But if we know already that we prefer an error, it would be better to
> give a DeprecationWarning right away. (It just does not seem nice to
> change the same thing twice even if the workaround is identical.)
>
> Cheers,
>
> Sebastian
>
>
> [1]
>
> A second more in-depth point is that code such as:
>
> common_dtype = np.result_type(arr1, arr2)  # or promote_types
> arr1 = arr1.astype(common_dtype, copy=False)
> arr2 = arr2.astype(common_dtype, copy=False)
>
> will currently use `string` in this case while it would error in the
> future. This already fails with other type combinations such as
> `datetime64` and `float64` at the moment.
>
> The main alternative to this proposal is to return `object` for the
> common dtype, since an object array is not homogeneously typed, it
> arguably can represent both inputs.  I do not quite like this choice
> personally because in the above example, it may be that the next line
> is something like:
>
> return arr1 * arr2
>
> in which case, the preferred return may be `str` and not `object`.
> We currently never promote to `object` unless one of the arrays is
> already an `object` array, and that seems like the right choice to me.
>
>


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-25 Thread Stephan Hoyer
On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers 
wrote:

>
>
> On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser 
> wrote:
>
>> Perhaps worth mentioning that we've discussed this sort of API before, in
>> https://github.com/numpy/numpy/pull/11897.
>>
>> Under that proposal, the api would be something like:
>>
>> * `copy=True` - always copy, like it is today
>> * `copy=False` - copy if needed, like it is today
>> * `copy=np.never_copy` - never copy, throw an exception if not possible
>>
>
> There's a couple of issues I see with using `copy` for __array__:
> - copy is already weird (False doesn't mean no), and a [bool,
> some_obj_or_str] keyword isn't making that better
> - the behavior we're talking about can do more than copying, e.g. for
> PyTorch it would modify the autograd graph by adding detach(), and for
> sparse it's not just "make a copy" (which implies doubling memory use) but
> it densifies which can massively blow up the memory.
> - I'm -1 on adding things to the main namespace (never_copy) for something
> that can be handled differently (like a string, or a new keyword)
>
> tl;dr a new `force` keyword would be better
>

I agree, “copy” is not a good description of this desired coercion behavior.

A new keyword argument like “force” would be much clearer.


> Cheers,
> Ralf
>
>
>> I think the discussion stalled on the precise spelling of the third
>> option.
>>
>> `__array__` was not discussed there, but it seems like adding the `copy`
>> argument to `__array__` would be a perfectly reasonable extension.
>>
>> Eric
>>
>> On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> One bit of expressivity we would miss is “copy if necessary, but
 otherwise don’t bother”, but there are workarounds to this.

>>>
>>> After a side discussion with Stéfan van der Walt, we came up with
>>> `allow_copy=True`, which would express to the downstream library that we
>>> don’t mind waiting, but that zero-copy would also be ok.
>>>
>>> This sounds like the sort of thing that is use case driven. If enough
>>> projects want to use it, then I have no objections to adding the keyword.
>>> OTOH, we need to be careful about adding too many interoperability tricks
>>> as they complicate the code and make it hard for folks to determine the
>>> best solution. Interoperability is a hot topic and we need to be careful
>>> not to leave behind too many experiments in the NumPy code.  Do you
>>> have any other ideas of how to achieve the same effect?
>>>
>>>
>>> Personally, I don’t have any other ideas, but would be happy to hear
>>> some!
>>>
>>> My view regarding API/experiment creep is that `__array__` is the oldest
>>> and most basic of all the interop tricks and that this can be safely
>>> maintained for future generations. Currently it only takes `dtype=` as a
>>> keyword argument, so it is a very lean API. I think this particular use
>>> case is very natural and I’ve encountered the reluctance to implicitly copy
>>> twice, so I expect it is reasonably common.
>>>
>>> Regarding difficulty in determining the best solution, I would be happy
>>> to contribute to the dispatch basics guide together with the new kwarg. I
>>> agree that the protocols are getting quite numerous and I couldn’t find a
>>> single place that gathers all the best practices together. But, to
>>> reiterate my point: `__array__` is the simplest of these and I think this
>>> keyword is pretty safe to add.
>>>
>>> For ease of discussion, here are the API options discussed so far, as
>>> well as a few extra that I don’t like but might trigger other ideas:
>>>
>>> np.asarray(my_duck_array, allow_copy=True)  # default is False, or None
>>> -> leave it to the duck array to decide
>>> np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
>>> by the duck array, defers to it for the copy
>>> np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’,
>>> ‘force’, ’no’, True(=‘force’), False(=’no’)
>>> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
>>> concepts, but unclear what force_copy=True, allow_copy=False means!
>>> np.asarray(my_duck_array, force=True)
>>>
>>> Juan.


Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-25 Thread Stephan Hoyer
On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg 
wrote:

> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
> > > But, Stephan pointed out that it might be confusing to users for
> > > objects to only exist at typing time, so we came around to the
> > > question of whether people are open to the idea of including the
> > > type
> > > aliases in NumPy itself. Ralf's concrete proposal was to make a
> > > module
> > > numpy.types (or maybe numpy.typing) to hold the aliases so that
> > > they
> > > don't pollute the top-level namespace. The module would initially
> > > contain the types
> >
> > That sounds very sensible.  Having types available with NumPy should
> > also encourage their use, especially if we can add some documentation
> > around it.
>
> I agree, I might have a small tendency for `numpy.types` if we ever
> find any usage other than direct typing that may be the better name?


Unless we anticipate adding a long list of type aliases (more than the
three suggested so far), I would lean towards adding ArrayLike to the top
level NumPy namespace as np.ArrayLike.

Type annotations are becoming an increasingly core part of modern Python
code. We should make it easy to appropriately type check functions that act
on NumPy arrays, and a top level np.ArrayLike is definitely more convenient
than np.types.ArrayLike.

Out of curiosity, I guess `ArrayLike` would be an ABC that a
> downstream project can register with?


ArrayLike will be a typing Protocol, automatically recognizing attributes
like __array__ to indicate that something can be cast to an array.
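A minimal sketch of what that recognition could look like (illustrative
only -- not the final numpy.typing definition, which also has to cover
scalars and nested sequences):

    from typing import Protocol

    import numpy as np

    class SupportsArray(Protocol):
        # Anything with __array__ can be cast via np.asarray().
        def __array__(self) -> np.ndarray: ...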


>
> - Sebastian
>
>
> >
> > Stéfan


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Stephan Hoyer
On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg 
wrote:

> One thing to note is that `__array__` is actually asked to return a
> copy AFAIK.


The documentation on __array__ seems to quite limited, unfortunately. The
most I can find are a few sentences here:
https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__

I don't see anything about returning copies. My interpretation has always
been that __array__ can return either a copy or a view, like the
np.asarray() constructor.
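A minimal sketch of such a "forwarding" __array__ (the class name is
illustrative):

    import numpy as np

    class Wrapper:
        def __init__(self, data):
            self._data = data

        def __array__(self, dtype=None):
            # May return either a view or a copy, like np.asarray itself.
            return np.asarray(self._data, dtype=dtype)

    np.asarray(Wrapper([1, 2, 3]))  # array([1, 2, 3])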


> I doubt it always does, but if it does not I assume the
> object should and could provide `__array_interface__`.
>

Objects like xarray.DataArray and pandas.Series sometimes directly wrap
NumPy arrays and sometimes don't.

They both implement __array__ but not __array_interface__. It's very obvious
how to implement a "forwarding" __array__ method (just call `np.asarray()`
on an argument that might implement it). I guess something similar could be
done for __array_interface__, but it's not clear to me that it's right to
implement __array_interface__ when doing so might require a copy.


> Under that assumption, it would be an opt-out right now since NumPy
> allows copies by default here.
> Defining things along copy does seem sensible, though I do not know how
> it would play with some of the current array-likes choosing to refuse
> `__array__`.
>
> - Sebastian
>
>
>
> > Eric
> >
> > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > One bit of expressivity we would miss is “copy if necessary, but
> > > otherwise
> > > > don’t bother”, but there are workarounds to this.
> > > >
> > >
> > > After a side discussion with Stéfan van der Walt, we came up with
> > > `allow_copy=True`, which would express to the downstream library
> > > that we
> > > don’t mind waiting, but that zero-copy would also be ok.
> > >
> > > This sounds like the sort of thing that is use case driven. If
> > > enough
> > > projects want to use it, then I have no objections to adding the
> > > keyword.
> > > OTOH, we need to be careful about adding too many interoperability
> > > tricks
> > > as they complicate the code and makes it hard for folks to
> > > determine the
> > > best solution. Interoperability is a hot topic and we need to be
> > > careful
> > > not put too leave behind too many experiments in the NumPy
> > > code.  Do you
> > > have any other ideas of how to achieve the same effect?
> > >
> > >
> > > Personally, I don’t have any other ideas, but would be happy to
> > > hear some!
> > >
> > > My view regarding API/experiment creep is that `__array__` is the
> > > oldest
> > > and most basic of all the interop tricks and that this can be
> > > safely
> > > maintained for future generations. Currently it only takes `dtype=`
> > > as a
> > > keyword argument, so it is a very lean API. I think this particular
> > > use
> > > case is very natural and I’ve encountered the reluctance to
> > > implicitly copy
> > > twice, so I expect it is reasonably common.
> > >
> > > Regarding difficulty in determining the best solution, I would be
> > > happy to
> > > contribute to the dispatch basics guide together with the new
> > > kwarg. I
> > > agree that the protocols are getting quite numerous and I couldn’t
> > > find a
> > > single place that gathers all the best practices together. But, to
> > > reiterate my point: `__array__` is the simplest of these and I
> > > think this
> > > keyword is pretty safe to add.
> > >
> > > For ease of discussion, here are the API options discussed so far,
> > > as well
> > > as a few extra that I don’t like but might trigger other ideas:
> > >
> > > np.asarray(my_duck_array, allow_copy=True)  # default is False, or
> > > None ->
> > > leave it to the duck array to decide
> > > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > > supported
> > > by the duck array, defers to it for the copy
> > > np.asarray(my_duck_array, copy=‘allow’)  # could take values
> > > ‘allow’,
> > > ‘force’, ’no’, True(=‘force’), False(=’no’)
> > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > > separate
> > > concepts, but unclear what force_copy=True, allow_copy=False means!
> > > np.asarray(my_duck_array, force=True)
> > >
> > > Juan.


[Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Stephan Hoyer
When we started numpy-stubs [1] a few years ago, putting type annotations
in NumPy itself seemed premature. We still supported Python 2, which meant
that we would need to use awkward comments for type annotations.

Over the past few years, using type annotations has become increasingly
popular, even in the scientific Python stack. For example, off-hand I know
that at least SciPy, pandas and xarray have at least part of their APIs
type annotated. Even without annotations for shapes or dtypes, it would be
valuable to have near complete annotations for NumPy, the project at the
bottom of the scientific stack.

Unfortunately, numpy-stubs never really took off. I can think of a few
reasons for that:
1. Missing high level guidance on how to write type annotations,
particularly for how (or if) to annotate particularly dynamic parts of
NumPy (e.g., consider __array_function__), and whether we should prioritize
strictness or faithfulness [2].
2. We didn't have a good experience for new contributors. Due to the
relatively low level of interest in the project, when a contributor would
occasionally drop in, I often didn't even notice their PR for a few weeks.
3. Developing type annotations separately from the main codebase makes them
a little harder to keep in sync. This means that type annotations couldn't
serve their typical purpose of self-documenting code. Part of this may be
necessary for NumPy (due to our use of C extensions), but large parts of
NumPy's user facing APIs are written in Python. We no longer support Python
2, so at least we no longer need to worry about putting annotations in
comments.

We eventually could probably use a formal NEP (or several) on how we want
to use type annotations in NumPy, but I think a good first step would be to
think about how to start moving the annotations from numpy-stubs into numpy
proper.

Any thoughts? Anyone interested in taking the lead on this?

Cheers,
Stephan

[1] https://github.com/numpy/numpy-stubs
[2] https://github.com/numpy/numpy-stubs/issues/12


Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-23 Thread Stephan Hoyer
On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers  wrote:

>
>
> On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer  wrote:
>
>> On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>>
>>> Another thing about backward compatibility: What is our vision there
>>> actually?
>>> This NEP will *not* give the *end user* the option to opt-in! Here,
>>> opt-in is really reserved to the *library user* (e.g. sklearn). (I did
>>> not realize this clearly before)
>>>
>>> Thinking about that for a bit now, that seems like the right choice.
>>> But it also means that the library requires an easy way of giving a
>>> FutureWarning, to notify the end-user of the upcoming change. The end-
>>> user will easily be able to convert to a NumPy array to keep the old
>>> behaviour.
>>> Once this warning is given (maybe during `get_array_module()`, the
>>> array module object/context would preferably be passed around,
>>> hopefully even between libraries. That provides a reasonable way to
>>> opt-in to the new behaviour without a warning (mainly for library
>>> users, end-users can silence the warning if they wish so).
>>>
>>
>> I don't think NumPy needs to do anything about warnings. It is
>> straightforward for libraries that want to use use get_array_module() to
>> issue their own warnings before calling get_array_module(), if desired.
>>
>
>> Or alternatively, if a library is about to add a new __array_module__
>> method, it is straightforward to issue a warning inside the new
>> __array_module__ method before returning the NumPy functions.
>>
>
> I don't think this is quite enough. Sebastian points out a fairly
> important issue. One of the main rationales for the whole NEP, and the
> argument in multiple places (
> https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users)
> is that it's now opt-in while __array_function__ was opt-out. This isn't
> really true - the problem is simply *moved*, from the duck array libraries
> to the array-consuming libraries. The end user will still see the backwards
> incompatible change, with no way to turn it off. It will be easier with
> __array_module__ to warn users, but this should be expanded on in the NEP.
>

Ralf, thanks for sharing your thoughts.

I'm not quite sure I understand the concerns about backwards incompatibility:
1. The intention is that implementing a __array_module__ method should be
backwards compatible with all current uses of NumPy. This satisfies
backwards compatibility concerns for an array-implementing library like JAX.
2. In contrast, calling get_array_module() offers no guarantees about
backwards compatibility. This seems nearly impossible, because the entire
point of the protocol is to make it possible to opt-in to new behavior. So
backwards compatibility isn't solved for Scikit-Learn switching to use
get_array_module(), and after Scikit-Learn does so, adding __array_module__
to new types of arrays could potentially have backwards incompatible
consequences for Scikit-Learn (unless sklearn uses default=None).

Are you suggesting just adding something like what I'm writing here into
the NEP? Perhaps along with advice to consider issuing warnings inside
__array_module__  and falling back to legacy behavior when first
implementing it on a new type?

We could also potentially make a few changes to make backwards
compatibility even easier, by making the protocol less aggressive about
assuming that NumPy is a safe fallback. Some non-exclusive options:
a. We could switch the default value of "default" on get_array_module() to
None, so an exception is raised if nothing implements __array_module__.
b. We could includes *all* argument types in "types", not just types that
implement __array_module__. NumPy's ndarray.__array_module__ could then
recognize and refuse to return an implementation if there are other
arguments that might implement __array_module__ in the future (e.g.,
anything outside the standard library?).

The downside of making either of these choices is that it would potentially
make get_array_module() a bit less usable, because it is more likely to
fail, e.g., if called on a float, or some custom type that should be
treated as a scalar.

Also, I'm still not sure I agree with the tone of the discussion on this
> topic. It's very heavily inspired by what the JAX devs are telling you (the
> NEP still says PyTorch and scipy.sparse as well, but that's not true in
> both cases). If you ask Dask and CuPy for example, they're quite happy with
> __array_function__ and there haven't been many complaints about backwards
> compat breakage.
>

I'm linking to comments you wrote in reference to P

Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-23 Thread Stephan Hoyer
On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg 
wrote:

> > It is less clear how this could work for __array_module__, because
> > __array_module__ and get_array_module() are not generic -- they
> > refer explicitly to a NumPy-like module. If we want to extend it to
> > SciPy (for which I agree there are good use-cases), what should that
> > look like?
>
> I suppose the question is here, where should the code reside? For
> SciPy, I agree there is a good reason why you may want to "reverse" the
> implementation. The code to support JAX arrays, should live inside JAX.
>
> One, probably silly, option is to return a "global" namespace, so that:
>
> np = get_array_module(*arrays).numpy
>
>
My main concern with a "global namespace" is that it adds boilerplate to
the typical usage of fetching a duck-array version of NumPy.

I think the simplest proposal is to add a "module" argument to both
get_array_module and __array_module__, with a default value of "numpy".
This adds flexibility with minimal additional complexity.
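A sketch of how that might read in downstream code (the `module` keyword
here is the hypothetical extension under discussion, not an accepted
API):

    # Hypothetical NEP 37 usage: request a scipy.linalg-compatible
    # namespace matching the array arguments.
    def duckarray_solve(a, b):
        linalg = get_array_module(a, b, module='scipy.linalg')
        return linalg.solve(a, b)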

The main question is what the type of arguments for "module" should be:
1. Modules could be specified as strings, e.g., "numpy"
2. Module could be specified as actual namespace, e.g., numpy from import
numpy.

The advantage of (1) is that in theory you could write
np.get_array_module(*arrays, module='scipy.linalg') without the overhead of
actually importing scipy.linalg or without even needing scipy to be
installed, if all the arrays use a different scipy.linalg implementation.
But in practice, this seems a little far-fetched. All alternative
implementations of scipy that I know of (e.g., in JAX or conceivably in
Dask) import the original library.

The main downside of (1) is that it would would mean that NumPy's
ndarray.__array_module__ would need to use importlib.import_module() to
dynamically import modules. It also adds a potentially awkward asymmetry
between the "module" and "default" arguments, unless we also switched
default to specify modules with strings.

Either way, the "default" argument will probably need to be adjusted so
that by default it matches whatever value is passed into "module", instead
of always defaulting to "numpy".

Any thoughts on which of these options makes most sense? We could also put
off making any changes to the protocol now, but this change seems pretty
safe and appear to have real use-cases (e.g., for sklearn) so I am inclined
to go ahead with it now before finalizing the NEP.


> We have two distinct issues: Where should e.g. SciPy put a generic
> implementation (assuming they want to provide implementations that only
> require NumPy-API support to not require overriding)?
> And, also if a library provides generic support, should we define a
> standard of how the context/namespace may be passed in/provided?
>
> sklearn's main namespace is expected to support many array
> objects/types, but it could be nice to pass in an already known
> context/namespace (say scikit-image already found it, and then calls
> scikit-learn internally). A "generic" namespace may even require this
> to infer the correct output array object.
>
>
> Another thing about backward compatibility: What is our vision there
> actually?
> This NEP will *not* give the *end user* the option to opt-in! Here,
> opt-in is really reserved to the *library user* (e.g. sklearn). (I did
> not realize this clearly before)
>
> Thinking about that for a bit now, that seems like the right choice.
> But it also means that the library requires an easy way of giving a
> FutureWarning, to notify the end-user of the upcoming change. The end-
> user will easily be able to convert to a NumPy array to keep the old
> behaviour.
> Once this warning is given (maybe during `get_array_module()`, the
> array module object/context would preferably be passed around,
> hopefully even between libraries. That provides a reasonable way to
> opt-in to the new behaviour without a warning (mainly for library
> users, end-users can silence the warning if they wish so).
>

I don't think NumPy needs to do anything about warnings. It is
straightforward for libraries that want to use use get_array_module() to
issue their own warnings before calling get_array_module(), if desired.

Or alternatively, if a library is about to add a new __array_module__
method, it is straightforward to issue a warning inside the new
__array_module__ method before returning the NumPy functions.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-06 Thread Stephan Hoyer
On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller  wrote:

> A bit late to the NEP 37 party.
> I just wanted to say that at least from my perspective it seems a great
> solution that will help sklearn move towards more flexible compute engines.
> I think one of the biggest issues is array creation (including random
> arrays), and that's handled quite nicely with NEP 37.
>

Andreas, thanks for sharing your feedback here! Your perspective is really
appreciated.


> - We use scipy.linalg in many places, and we would need to do a separate
> dispatching to check whether we can use module.linalg instead
>  (that might be an issue for many libraries but I'm not sure).
>

This brings up a good question -- obviously the final decision here is up
to SciPy maintainers, but how should we encourage SciPy to support
dispatching?

We could pretty easily make __array_function__ cover SciPy by simply
exposing NumPy's internal utilities. SciPy could just use the
np.array_function_dispatch decorator internally and that would be enough.
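
For instance, something like this sketch (array_function_dispatch currently
lives in the private numpy.core.overrides module, and the real scipy.fft.fft
signature has more parameters):

from numpy.core.overrides import array_function_dispatch

def _fft_dispatcher(x, n=None, axis=-1):
    # yield the arguments to inspect for __array_function__ attributes
    return (x,)

@array_function_dispatch(_fft_dispatcher, module='scipy.fft')
def fft(x, n=None, axis=-1):
    ...  # SciPy's existing implementation would go here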

It is less clear how this could work for __array_module__, because
__array_module__ and get_array_module() are not generic -- they refer
explicitly to a NumPy-like module. If we want to extend it to SciPy (for
which I agree there are good use-cases), what should that look like?

The obvious choices would be to either add a new protocol, e.g.,
__scipy_module__ (but then NumPy needs to know about SciPy), or to add some
sort of "module request" parameter to np.get_array_module(), to indicate
the requested API, e.g., np.get_array_module(*arrays, matching='scipy').
This is pretty similar to the "default" argument but would need to get
passed into the __array_module__ protocol, too.


> - Some models have several possible optimization algorithms, some of which
> are pure numpy and some which are Cython. If someone provides a different
> array module,
>  we might want to choose an algorithm that is actually supported by that
> module. While this exact issue is maybe sklearn specific, a similar issue
> could appear for most downstream libs that use Cython in some places.
>  Many Cython algorithms could be implemented in pure numpy with a
> potential slowdown, but once we have NEP 37 there might be a benefit to
> having a pure NumPy implementation as an alternative code path.
>
>
> Anyway, NEP 37 seems a great step in the right direction and would enable
> sklearn to actually dispatch in some places. Dispatching just based on
> __array_function__ seems not really feasible so far.
>
> Best,
> Andreas Mueller
>
>
> On 1/6/20 11:29 PM, Stephan Hoyer wrote:
>
> I am pleased to present a new NumPy Enhancement Proposal for discussion:
> "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be
> very welcome!
>
> The full text follows. The rendered proposal can also be found online at
> https://numpy.org/neps/nep-0037-array-module.html
>
> Best,
> Stephan Hoyer
>
> ===================================================
> NEP 37 — A dispatch protocol for NumPy-like modules
> ===================================================
>
> :Author: Stephan Hoyer 
> :Author: Hameer Abbasi
> :Author: Sebastian Berg
> :Status: Draft
> :Type: Standards Track
> :Created: 2019-12-29
>
> Abstract
> --------
>
> NEP-18's ``__array_function__`` has been a mixed success. Some projects
> (e.g.,
> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others
> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new
> protocol, ``__array_module__``, that we expect could eventually subsume most
> use-cases for ``__array_function__``. The protocol requires explicit adoption
> by both users and library authors, which ensures backwards compatibility, and
> is also significantly simpler than ``__array_function__``, both of which we
> expect will make it easier to adopt.
>
> Why ``__array_function__`` hasn't been enough
> ---------------------------------------------
>
> There are two broad ways in which NEP-18 has fallen short of its goals:
>
> 1. **Maintainability concerns**. `__array_function__` has significant
>implications for libraries that use it:
>
>- Projects like `PyTorch
>  <https://github.com/pytorch/pytorch/issues/22402>`_, `JAX
>  <https://github.com/google/jax/issues/1565>`_ and even `scipy.sparse
>  <https://github.com/scipy/scipy/issues/10362>`_ have been reluctant to
>  implement `__array_function__` in part because they are concerned about
>  **breaking existing code**: users expect NumPy functions like
>  ``np.concatenate`` to return NumPy arrays. This is a fundamental
>  limitation of the ``__array_function__`` design, which we chose to allow
>  overriding the existing ``numpy`` namespace.

[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-01-06 Thread Stephan Hoyer
I am pleased to present a new NumPy Enhancement Proposal for discussion:
"NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be
very welcome!

The full text follows. The rendered proposal can also be found online at
https://numpy.org/neps/nep-0037-array-module.html

Best,
Stephan Hoyer

===================================================
NEP 37 — A dispatch protocol for NumPy-like modules
===================================================

:Author: Stephan Hoyer 
:Author: Hameer Abbasi
:Author: Sebastian Berg
:Status: Draft
:Type: Standards Track
:Created: 2019-12-29

Abstract
--------

NEP-18's ``__array_function__`` has been a mixed success. Some projects
(e.g.,
dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others
(e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new
protocol, ``__array_module__``, that we expect could eventually subsume most
use-cases for ``__array_function__``. The protocol requires explicit adoption
by both users and library authors, which ensures backwards compatibility, and
is also significantly simpler than ``__array_function__``, both of which we
expect will make it easier to adopt.

Why ``__array_function__`` hasn't been enough
---------------------------------------------

There are two broad ways in which NEP-18 has fallen short of its goals:

1. **Maintainability concerns**. `__array_function__` has significant
   implications for libraries that use it:

   - Projects like `PyTorch
 <https://github.com/pytorch/pytorch/issues/22402>`_, `JAX
 <https://github.com/google/jax/issues/1565>`_ and even `scipy.sparse
 <https://github.com/scipy/scipy/issues/10362>`_ have been reluctant to
 implement `__array_function__` in part because they are concerned about
 **breaking existing code**: users expect NumPy functions like
 ``np.concatenate`` to return NumPy arrays. This is a fundamental
 limitation of the ``__array_function__`` design, which we chose to allow
 overriding the existing ``numpy`` namespace.
   - ``__array_function__`` currently requires an "all or nothing" approach
 to implementing NumPy's API. There is no good pathway for **incremental
 adoption**, which is particularly problematic for established projects
 for which adopting ``__array_function__`` would result in breaking
 changes.
   - It is no longer possible to use **aliases to NumPy functions** within
 modules that support overrides. For example, both CuPy and JAX set
 ``result_type = np.result_type``.
   - Implementing **fall-back mechanisms** for unimplemented NumPy functions
 by using NumPy's implementation is hard to get right (but see the
 `version from dask <https://github.com/dask/dask/pull/5043>`_), because
 ``__array_function__`` does not present a consistent interface.
 Converting all arguments of array type requires recursing into generic
 arguments of the form ``*args, **kwargs``.

2. **Limitations on what can be overridden.** ``__array_function__`` has some
   important gaps, most notably array creation and coercion functions:

   - **Array creation** routines (e.g., ``np.arange`` and those in
 ``np.random``) need some other mechanism for indicating what type of
 arrays to create. `NEP 36 <https://github.com/numpy/numpy/pull/14715>`_
 proposed adding optional ``like=`` arguments to functions without
 existing array arguments. However, we still lack any mechanism to
 override methods on objects, such as those needed by
 ``np.random.RandomState``.
   - **Array conversion** can't reuse the existing coercion functions like
 ``np.asarray``, because ``np.asarray`` sometimes means "convert to an
 exact ``np.ndarray``" and other times means "convert to something
_like_
 a NumPy array." This led to the `NEP 30
 <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_ proposal
for
 a separate ``np.duckarray`` function, but this still does not resolve
how
 to cast one duck array into a type matching another duck array.

``get_array_module`` and the ``__array_module__`` protocol
----------------------------------------------------------

We propose a new user-facing mechanism for dispatching to a duck-array
implementation, ``numpy.get_array_module``. ``get_array_module`` performs the
same type resolution as ``__array_function__`` and returns a module with an API
promised to match the standard interface of ``numpy`` that can implement
operations on all provided array types.

The protocol itself is both simpler and more powerful than
``__array_function__``, because it doesn't need to worry about actually
implementing functions. We believe it resolves most of the maintainability and
functionality limitations of ``__array_function__``.

The new protocol is opt-in, explicit and with local control; see
:ref:`appendix-design-choices` for discussion.

Re: [Numpy-discussion] Adding keepdims to linspace / logspace / geomspace

2019-12-10 Thread Stephan Hoyer
I'm not sure I understand the motivation here. Is the idea that you want to
reuse min/max values that were computed with keepdims=True for some other
purpose as well?

If so, this could be achieved with squeeze(), e.g.,

np.linspace(
    arr.min(axis=ax, keepdims=True).squeeze(ax),
    arr.max(axis=ax, keepdims=True).squeeze(ax),
    axis=ax,
)

This is not the most elegant code, but it seems better than adding a new
keyword argument, and the resulting code is likely about as efficient.


On Tue, Dec 10, 2019 at 11:54 AM Sebastian Berg 
wrote:

> On Tue, 2019-12-10 at 10:43 -0800, Zijie Poh wrote:
> > Hi all,
> >
> > We've created a PR (#14922) on
> > adding keepdims to linspace / logspace / geomspace, which
> > enables linspace to directly take the output
> > of min and max with keepdims = True as the start and stop arguments.
> > That is, the following two linspace calls return the same result.
> >
> > np.linspace(
> > arr.min(axis=ax),
> > arr.max(axis=ax),
> > axis=ax
> > )
> >
> > np.linspace(
> > arr.min(axis=ax, keepdims=True),
> > arr.max(axis=ax, keepdims=True),
> > axis=ax, keepdims=True
> > )
> >
>
> I am a bit hesitant about the name `keepdims` being the best one. I
> realize it is nice to have the pattern to use the same name for
> symmetry. But on the other hand, there are no axes being "kept" here. In
> fact, `keepdims=True` returns fewer dims than `keepdims=False` :).
>
> Not sure I have a better idea though, `expand_axis` (or axis) might be
> closer to what happens?
>
> `keepdims` is currently used entirely for reduction-like operations
> (including complex reduce-like behaviour in `percentile`). However, the
> closest to an opposite of reduce-like operations are maybe broadcasts
> (they expand axis), but I cannot think of a way to use that for a
> parameter name ;).
>
> The change itself is small enough and I am good with adding it, though I
> have some doubts it will be used much. But it seems like a very natural
> thing to give input with the same number of dimensions/axes as the output
> will have.
>
> Maybe we should have had `new_axis=` and `expand_axis=` and you can
> only use one ;).
>
> - Sebastian
>
>
> > Please let me know if you have any questions / suggestions.
> >
> > Regards,
> > ZJ
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [API] (minor change) Allow multiple axes in `expand_dims`

2019-12-02 Thread Stephan Hoyer
This looks good to me!

On Mon, Dec 2, 2019 at 11:13 AM Sebastian Berg 
wrote:

> Hi all,
>
> Pull request 14051:
>
> https://github.com/numpy/numpy/pull/14051
>
> means that `np.expand_dims` now accepts multiple axes in the `axis`
> argument. As before, the axis values signal where the new axes will be in
> the output array. From the new tests:
>
> a = np.empty((3, 3, 3))
> np.expand_dims(a, axis=(0, 1, 2)).shape == (1, 1, 1, 3, 3, 3)
> np.expand_dims(a, axis=(0, -1, -2)).shape == (1, 3, 3, 3, 1, 1)
> np.expand_dims(a, axis=(0, 3, 5)).shape == (1, 3, 3, 1, 3, 1)
> np.expand_dims(a, axis=(0, -3, -5)).shape == (1, 1, 3, 1, 3, 3)
>
> We believe this is an uncontroversial generalization, but pinging the
> mailing list since it is an API change. If anyone is concerned please I
> will be happy to revert, otherwise this is expected to be included in
> 1.18.
>
> Cheers,
>
> Sebastian
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Cleanup of travisCI tests.

2019-11-15 Thread Stephan Hoyer
We still support explicitly opting-out of __array_function__, so I think we
should still keep that test configuration.

On Fri, Nov 15, 2019 at 1:50 PM Charles R Harris 
wrote:

> Hi All,
>
> I think there are some travisCI tests that we can eliminate, see tests
> for the current proposed set. I think we can eliminate the following
>
> INSTALL_PICKLE5=1  # Python3.8 has this
> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0
>
> And possibly one or both of
>
> NPY_RELAXED_STRIDES_CHECKING=0
> NPY_RELAXED_STRIDES_CHECKING_DEBUG=1
>
> Thoughts?
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-16 Thread Stephan Hoyer
On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev 
wrote:

> What would be the use case for a duck-array to implement __array__ and
> return a NumPy array? Unless I'm missing something, this seems
> redundant and one should just use array/asarray functions then. This
> would also prevent error-handling, what if the developer intentionally
> wants a NumPy-like array (e.g., the original array passed to the
> duckarray function) or an exception (instead of coercing to a NumPy
> array)?
>

Dask arrays are a good example. They will want to implement __duck_array__
(or whatever we call it) because they support duck-typed versions of NumPy
operations. They also (already) implement __array__, so they can be converted
into NumPy arrays as a fallback. This is convenient for moderately sized
dask arrays, e.g., so you can pass one into a matplotlib function.
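
A minimal sketch of that combination (using the __duckarray__ name from the
current NEP 30 draft; DaskLikeArray is just a toy stand-in):

import numpy as np

class DaskLikeArray:
    def __init__(self, data):
        self._data = np.asarray(data)

    def __duckarray__(self):
        # NEP 30 protocol: this object is already a duck array, use it as-is
        return self

    def __array__(self, dtype=None):
        # fallback coercion to a real ndarray, e.g., so the object can
        # still be passed into a matplotlib function
        return np.asarray(self._data, dtype=dtype)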



>
> On Mon, Sep 16, 2019 at 9:25 PM Chris Barker 
> wrote:
> >
> >
> >
> > On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev <
> pe...@entschev.com> wrote:
> >>
> >> Apologies for the late reply. I've opened a new PR
> >> https://github.com/numpy/numpy/pull/14257 with the changes requested
> >
> >
> > thanks!
> >
> > I've written a small PR on your PR:
> >
> > https://github.com/pentschev/numpy/pull/1
> >
> > Essentially, other than typos and copy editing, I'm suggesting that a
> duck-array could choose to implement __array__ if it so chooses -- it
> should, of course, return an actual numpy array.
> >
> > I think this could be useful, as much code does require an actual numpy
> array, and only that class itself knows how best to convert to one.
> >
> > -CHB
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R   (206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115   (206) 526-6317   main reception
> >
> > chris.bar...@noaa.gov
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-11 Thread Stephan Hoyer
On Wed, Sep 11, 2019 at 4:18 PM Ralf Gommers  wrote:

>
>
> On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer  wrote:
>
>> On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi 
>> wrote:
>>
>>> On 10.09.19 05:32, Stephan Hoyer wrote:
>>>
>>> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers 
>>> wrote:
>>>
>>>> I think we've chosen to try the former - dispatch on functions so we
>>>> can reuse the NumPy API. It could work out well, it could give some
>>>> long-term maintenance issues, time will tell. The question is now if and
>>>> how to plug the gap that __array_function__ left. It's main limitation is
>>>> "doesn't work for functions that don't have an array-like input" - that
>>>> left out ~10-20% of functions. So now we have a proposal for a structural
>>>> solution to that last 10-20%. It seems logical to want that gap plugged,
>>>> rather than go back and say "we shouldn't have gone for the first 80%, so
>>>> let's go no further".
>>>>
>>>
>>> I'm excited about solving the remaining 10-20% of use cases for flexible
>>> array dispatching,
>>>
>>> Great! I think most (but not all) of us are on the same page here.
> Actually now that Peter came up with the `like=` keyword idea for array
> creation functions I'm very interested in seeing that worked out, feels
> like that could be a nice solution for part of that 10-20% that did look
> pretty bad before.
>
>> but the unumpy interface suggested here (numpy.overridable) feels like a
>>> redundant redo of __array_function__ and __array_ufunc__.
>>>
>>>
> A bit of context: a big part of the reason I advocated for
> numpy.overridable is that library authors can use it *only* for the parts
> not already covered by the protocols we already have. If there's overlap
> there's several ways to deal with that, including only including part of
> the unumpy API surface. It does plug all the holes in one go (although you
> can then indeed argue it does too much), and there is no other coherent
> proposal/vision yet that does this. What you wrote below comes closest, and
> I'd love to see that worked out (e.g. the like= argument for array
> creation). What I don't like is an ad-hoc plugging of one hole at a time
> without visibility on how many more protocols and new workaround functions
> in the API we would need. So hopefully we can come to an apples-to-apples
> comparison of two design alternatives.
>
> Also, we just discussed this whole thread in the community call, and it's
> clear that it's a complex matter with many different angles. It's very hard
> to get a full overview. Our conclusion in the call was that this will
> benefit from an in-person discussion. The sprint in November may be a
> really good opportunity for that.
>

Sounds good, I'm looking forward to the discussion at the November sprint!


> In the meantime we can of course keep working out ideas/docs. For now I
> think it's clear that we (the NEP authors) have some homework to do - that
> may take some time.
>
>
>>> I would much rather continue to develop specialized protocols for the
>>> remaining usecases. Summarizing those I've seen in this thread, these
>>> include:
>>> 1. Overrides for customizing array creation and coercion.
>>> 2. Overrides to implement operations for new dtypes.
>>> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
>>> with MKL.
>>>
>>> (1) could mostly be solved by adding np.duckarray() and another function
>>> for duck array coercion. There is still the matter of overriding np.zeros
>>> and the like, which perhaps justifies another new protocol, but in my
>>> experience the use-cases for truly an array from scratch are quite rare.
>>>
>>> While they're rare for libraries like XArray; CuPy, Dask and
>>> PyData/Sparse need these.
>>>
>>>
>>> (2) should be tackled as part of overhauling NumPy's dtype system to
>>> better support user defined dtypes. But it should definitely be in the form
>>> of specialized protocols, e.g., which pass in preallocated arrays to into
>>> ufuncs for a new dtype. By design, new dtypes should not be able to
>>> customize the semantics of array *structure*.
>>>
>>> We already have a split in the type system with e.g. Cython's buffers,
>>> Numba's parallel type system. This is a different issue altogether, e.g.
>>> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write
>>> of unyt to cooperate with NumPy's new dtype system.

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-10 Thread Stephan Hoyer
On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi 
wrote:

> On 10.09.19 05:32, Stephan Hoyer wrote:
>
> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers 
> wrote:
>
>> I think we've chosen to try the former - dispatch on functions so we can
>> reuse the NumPy API. It could work out well, it could give some long-term
>> maintenance issues, time will tell. The question is now if and how to plug
>> the gap that __array_function__ left. Its main limitation is "doesn't work
>> for functions that don't have an array-like input" - that left out ~10-20%
>> of functions. So now we have a proposal for a structural solution to that
>> last 10-20%. It seems logical to want that gap plugged, rather than go back
>> and say "we shouldn't have gone for the first 80%, so let's go no further".
>>
>
> I'm excited about solving the remaining 10-20% of use cases for flexible
> array dispatching, but the unumpy interface suggested here
> (numpy.overridable) feels like a redundant redo of __array_function__ and
> __array_ufunc__.
>
> I would much rather continue to develop specialized protocols for the
> remaining usecases. Summarizing those I've seen in this thread, these
> include:
> 1. Overrides for customizing array creation and coercion.
> 2. Overrides to implement operations for new dtypes.
> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
> with MKL.
>
> (1) could mostly be solved by adding np.duckarray() and another function
> for duck array coercion. There is still the matter of overriding np.zeros
> and the like, which perhaps justifies another new protocol, but in my
> experience the use-cases for truly creating an array from scratch are quite rare.
>
> While they're rare for libraries like XArray; CuPy, Dask and PyData/Sparse
> need these.
>
>
> (2) should be tackled as part of overhauling NumPy's dtype system to
> better support user defined dtypes. But it should definitely be in the form
> of specialized protocols, e.g., which pass in preallocated arrays to into
> ufuncs for a new dtype. By design, new dtypes should not be able to
> customize the semantics of array *structure*.
>
> We already have a split in the type system with e.g. Cython's buffers,
> Numba's parallel type system. This is a different issue altogether, e.g.
> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write
> of unyt to cooperate with NumPy's new dtype system.
>

I guess you're proposing that operations like np.sum(numpy_array,
dtype=other_dtype) could rely on other_dtype for the implementation and
potentially return a non-NumPy array? I'm not sure this is well motivated
-- it would be helpful to discuss actual use-cases.

The most commonly used NumPy functionality related to dtypes can be found
only in methods on np.ndarray, e.g., astype() and view(). But I don't think
there's any proposal to change that.

> 4. Having default implementations that allow overrides of a large part of
> the API while defining only a small part. This holds for e.g.
> transpose/concatenate.
>
I'm not sure how unumpy solves the problems we encountered when trying to do
this with __array_function__ -- namely the way that it exposes all of
NumPy's internals, or requires rewriting a lot of internal NumPy code to
ensure it always casts inputs with asarray().

I think it would be useful to expose default implementations of NumPy
operations somewhere to make it easier to implement __array_function__, but
it doesn't make much sense to couple this to user-facing overrides. These
can be exposed as a separate package or numpy module (e.g.,
numpy.default_implementations) that uses np.duckarray(), which library
authors can make use of by calling inside their __array_function__ methods.

> 5. Generation of Random numbers (overriding RandomState). CuPy has its
> own implementation which would be nice to override.
>
I'm not sure that NumPy's random state objects make sense for duck arrays.
Because these are stateful objects, they are pretty coupled to NumPy's
implementation -- you cannot store any additional state on RandomState
objects that might be needed for a new implementation. At a bare minimum,
you will lose the reproducibility of random seeds, though this may be less
of a concern with the new random API.

> I also share Nathaniel's concern that the overrides in unumpy are too
> powerful, by allowing for control from arbitrary function arguments and
> even *non-local* control (i.e., global variables) from context managers.
> This level of flexibility can make code very hard to debug, especially in
> larger codebases.
>
> Backend switching needs global context, in any case. There isn't a good
> way around that other than the class dundermethods outlined in another
> thread, which would require rewrites of l

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-09 Thread Stephan Hoyer
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers  wrote:

> I think we've chosen to try the former - dispatch on functions so we can
> reuse the NumPy API. It could work out well, it could give some long-term
> maintenance issues, time will tell. The question is now if and how to plug
> the gap that __array_function__ left. Its main limitation is "doesn't work
> for functions that don't have an array-like input" - that left out ~10-20%
> of functions. So now we have a proposal for a structural solution to that
> last 10-20%. It seems logical to want that gap plugged, rather than go back
> and say "we shouldn't have gone for the first 80%, so let's go no further".
>

I'm excited about solving the remaining 10-20% of use cases for flexible
array dispatching, but the unumpy interface suggested here
(numpy.overridable) feels like a redundant redo of __array_function__ and
__array_ufunc__.

I would much rather continue to develop specialized protocols for the
remaining usecases. Summarizing those I've seen in this thread, these
include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with
MKL.

(1) could mostly be solved by adding np.duckarray() and another function
for duck array coercion. There is still the matter of overriding np.zeros
and the like, which perhaps justifies another new protocol, but in my
experience the use-cases for truly creating an array from scratch are quite rare.

(2) should be tackled as part of overhauling NumPy's dtype system to better
support user defined dtypes. But it should definitely be in the form of
specialized protocols, e.g., which pass in preallocated arrays to into
ufuncs for a new dtype. By design, new dtypes should not be able to
customize the semantics of array *structure*.

(3) could potentially motivate a new solution, but it should exist *inside*
of select existing NumPy implementations, after checking for overrides with
__array_function__. If the only option NumPy provides for overriding np.fft
is to implement np.overrideable.fft, I doubt that would suffice to convince
MKL developers from monkey patching it -- they already decided that a
separate namespace is not good enough for them.

I also share Nathaniel's concern that the overrides in unumpy are too
powerful, by allowing for control from arbitrary function arguments and
even *non-local* control (i.e., global variables) from context managers.
This level of flexibility can make code very hard to debug, especially in
larger codebases.

Best,
Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Add Moving Average Function

2019-08-25 Thread Stephan Hoyer
I would be very interested to see the “sliding window view” function merged
into np.lib.stride_tricks.

I don’t think it makes sense to add a suite of dedicated functions for
sliding window calculations that wrap that function. If we are going to go
down the path of adding sliding window calculations into NumPy, they
should use efficient algorithms, like those found in the “bottleneck”
package.
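
For illustration, here is what a moving average looks like on top of such a
view, assuming the function lands with the signature from the linked PR
(sliding_window_view(x, window_shape) in numpy.lib.stride_tricks):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10.0)
windows = sliding_window_view(x, 3)  # zero-copy view with shape (8, 3)
moving_avg = windows.mean(axis=-1)   # note: O(n * window) work, whereas
                                     # bottleneck's move_mean is O(n)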

Best,
Stephan

On Sun, Aug 25, 2019 at 3:33 PM Nicholas Georgescu  wrote:

> Hi all,
>
> I opened a Pull Request to include this package in numpy, along with the
> associated sliding window function in a separate PR.
>
> The function picks the fastest method to do a moving average if there is
> no weighting, but with weights it resorts to the second-fastest method
> which has an easier implementation.  It also contains a binning option
> which cuts the number of points down by a factor of n rather than by
> subtracting n.  The details are in the package documentation and PR.
>
> Thanks,
> Nicholas
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-08-07 Thread Stephan Hoyer
On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris 
wrote:

>
>
> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer  wrote:
>
>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers 
>> wrote:
>>
>>>
>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer  wrote:
>>>
>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers 
>>>> wrote:
>>>>
>>>>
>>>>> The NEP currently does not say who this is meant for. Would you expect
>>>>> libraries like SciPy to adopt it for example?
>>>>>
>>>>> The NEP also (understandably) punts on the question of when something
>>>>> is a valid duck array. If you want this to be widely used, that will need
>>>>> an answer or at least some rough guidance though. For example, we would
>>>>> expect a duck array to have a mean() method, but probably not a ptp()
>>>>> method. A library author who wants to use np.duckarray() needs to know,
>>>>> because she can't test with all existing and future duck array
>>>>> implementations.
>>>>>
>>>>
>>>> I think this is covered in NEP-22 already.
>>>>
>>>
>>> It's not really. We discussed this briefly in the community call today,
>>> Peter said he will try to add some text.
>>>
>>> We should not add new functions to NumPy without indicating who is
>>> supposed to use this, and what need it fills / problem it solves. It seems
>>> pretty clear to me that it's mostly aimed at library authors rather than
>>> end users. And also that mature libraries like SciPy may not immediately
>>> adopt it, because it's too fuzzy - so it's new libraries first, mature
>>> libraries after the dust has settled a bit (I think).
>>>
>>
>> I totally agree -- we definitely should clarify this in the docstring and
>> elsewhere in the docs. An example in the new doc page on "Writing custom
>> array containers" (https://numpy.org/devdocs/user/basics.dispatch.html)
>> would also probably be appropriate.
>>
>>
>>> As discussed there, I don't think NumPy is in a good position to
>>>> pronounce decisive APIs at this time. I would welcome efforts to try, but I
>>>> don't think that's essential for now.
>>>>
>>>
>>> There's no need to pronounce a decisive API that fully covers duck
>>> array. Note that RNumPy is an attempt in that direction (not a full one,
>>> but way better than nothing). In the NEP/docs, at least saying something
>>> along the lines of "if you implement this, we recommend the following
>>> strategy: check if a function is present in Dask, CuPy and Sparse. If so,
>>> it's reasonable to expect any duck array to work here. If not, we suggest
>>> you indicate in your docstring what kinds of duck arrays are accepted, or
>>> what properties they need to have". That's a spec by implementation, which
>>> is less than ideal but better than saying nothing.
>>>
>>
>> OK, I agree here as well -- some guidance is better than nothing.
>>
>> Two other minor notes on this NEP, concerning naming:
>> 1. We should have a brief note on why we settled on the name "duck
>> array". Namely, as discussed in NEP-22, we don't love the "duck" jargon,
>> but we couldn't come up with anything better since NumPy already uses
>> "array like" and "any array" for different purposes.
>> 2. The protocol should use *something* more clearly namespaced as NumPy
>> specific than __duckarray__. All the other special protocols NumPy defines
>> start with "__array_". That suggests either __array_duckarray__ (sounds a
>> little redundant) or __numpy_duckarray__ (which I like the look of, but is
>> a bit different from the existing protocols).
>>
>>
> `__numpy_like__` ?
>


This could work, but I think we would also want to rename the NumPy
function itself to either np.like or np.numpy_like. The latter is a little
redundant but definitely more self-descriptive than "duck array".


> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-08-07 Thread Stephan Hoyer
On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers  wrote:

>
> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer  wrote:
>
>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers 
>> wrote:
>>
>>
>>> The NEP currently does not say who this is meant for. Would you expect
>>> libraries like SciPy to adopt it for example?
>>>
>>> The NEP also (understandably) punts on the question of when something is
>>> a valid duck array. If you want this to be widely used, that will need an
>>> answer or at least some rough guidance though. For example, we would expect
>>> a duck array to have a mean() method, but probably not a ptp() method. A
>>> library author who wants to use np.duckarray() needs to know, because she
>>> can't test with all existing and future duck array implementations.
>>>
>>
>> I think this is covered in NEP-22 already.
>>
>
> It's not really. We discussed this briefly in the community call today,
> Peter said he will try to add some text.
>
> We should not add new functions to NumPy without indicating who is
> supposed to use this, and what need it fills / problem it solves. It seems
> pretty clear to me that it's mostly aimed at library authors rather than
> end users. And also that mature libraries like SciPy may not immediately
> adopt it, because it's too fuzzy - so it's new libraries first, mature
> libraries after the dust has settled a bit (I think).
>

I totally agree -- we definitely should clarify this in the docstring and
elsewhere in the docs. An example in the new doc page on "Writing custom
array containers" (https://numpy.org/devdocs/user/basics.dispatch.html)
would also probably be appropriate.


> As discussed there, I don't think NumPy is in a good position to pronounce
>> decisive APIs at this time. I would welcome efforts to try, but I don't
>> think that's essential for now.
>>
>
> There's no need to pronounce a decisive API that fully covers duck array.
> Note that RNumPy is an attempt in that direction (not a full one, but way
> better than nothing). In the NEP/docs, at least saying something along the
> lines of "if you implement this, we recommend the following strategy: check
> if a function is present in Dask, CuPy and Sparse. If so, it's reasonable
> to expect any duck array to work here. If not, we suggest you indicate in
> your docstring what kinds of duck arrays are accepted, or what properties
> they need to have". That's a spec by implementation, which is less than
> ideal but better than saying nothing.
>

OK, I agree here as well -- some guidance is better than nothing.

Two other minor notes on this NEP, concerning naming:
1. We should have a brief note on why we settled on the name "duck array".
Namely, as discussed in NEP-22, we don't love the "duck" jargon, but we
couldn't come up with anything better since NumPy already uses "array like"
and "any array" for different purposes.
2. The protocol should use *something* more clearly namespaced as NumPy
specific than __duckarray__. All the other special protocols NumPy defines
start with "__array_". That suggests either __array_duckarray__ (sounds a
little redundant) or __numpy_duckarray__ (which I like the look of, but is
a bit different from the existing protocols).
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-08-05 Thread Stephan Hoyer
On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers  wrote:

> Having __array__ give a TypeError is fine for libraries that want to
> prevent unintentional coercion with, e.g., `np.asarray(my_ducktype)`.
> However that leaves the obvious question of what the right way is to do
> this intentionally. Would be good to recommend something, for example a
> `numpy()` or `to_numpy()` method. Also, the NEP should make it clearer that
> this is not the obviously correct thing to do, it only makes sense in cases
> where coercion is very expensive, like CuPy and Sparse. For Dask for
> example, coercion to a numpy array is perfectly reasonable.
>

I agree, we need another solution for explicit array conversion, either
from duck arrays to NumPy arrays or between duck arrays.

As has come up on GitHub [1], I think this should probably be another
protocol, to allow for third-party conversions like sparse <-> dask that in
principle could be implemented by either library.

To get discussion started, here's one possible proposal for what the NumPy
API(s) could look like:
np.coerce(sparse_array)  # by default, coerce to np.ndarray
np.coerce(sparse_array, dask.array.Array)  # coerces to dask
np.coerce_like(sparse_array, dask_array)  # coerce like the second array type
np.coerce_arrays(list_of_arrays)  # coerce to first type that can handle everything

The protocol itself should probably either use __array_function__ (e.g.,
for np.coerce_like, if all the dispatched-on arguments are arrays) or a
custom protocol in the same style that allows for implementations on either
the array being converted or the type of the result [2].

[1] https://github.com/numpy/numpy/issues/13831
[2] https://github.com/numpy/numpy/pull/14170#issuecomment-517004293


> The NEP currently does not say who this is meant for. Would you expect
> libraries like SciPy to adopt it for example?
>
> The NEP also (understandably) punts on the question of when something is a
> valid duck array. If you want this to be widely used, that will need an
> answer or at least some rough guidance though. For example, we would expect
> a duck array to have a mean() method, but probably not a ptp() method. A
> library author who wants to use np.duckarray() needs to know, because she
> can't test with all existing and future duck array implementations.
>

I think this is covered in NEP-22 already. As discussed there, I don't
think NumPy is in a good position to pronounce decisive APIs at this time.
I would welcome efforts to try, but I don't think that's essential for now.

An alternative to introducing np.duckarray() would be to just modify
> np.asarray(). Of course this has backwards compatibility impact, but if
> you're going to be raising a TypeError from __array__ then that impact is
> there anyway. Note: I don't think this is necessarily a better idea,
> because it may lead to less clear errors, but it's worth putting in the
> alternatives section at least.
>
> Cheers,
> Ralf
>
>>
>> Would be great to get some comments on that.
>>
>> [1]
>> https://github.com/numpy/numpy/blob/master/doc/neps/nep-0030-duck-array-protocol.rst
>> [2] https://github.com/numpy/numpy/pull/14170
>> [3] https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
>>
>> Best,
>> Peter
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] __array_function related regression for 1.17.0rc1

2019-07-02 Thread Stephan Hoyer
On Tue, Jul 2, 2019 at 8:16 AM Ralf Gommers  wrote:

>
>
> On Tue, Jul 2, 2019 at 1:45 AM Juan Nunez-Iglesias 
> wrote:
>
>> On Tue, 2 Jul 2019, at 4:34 PM, Stephan Hoyer wrote:
>>
>> This is addressed in the NEP, see bullet 1 under "Partial implementation
>> of NumPy's API":
>>
>> http://www.numpy.org/neps/nep-0018-array-function-protocol.html#partial-implementation-of-numpy-s-api
>>
>> My concern is that fallback coercion behavior makes it difficult to
>> reliably implement "strict" overrides of NumPy's API. Fallback coercion is
>> certainly useful for interactive use, but it isn't really appropriate for
>> libraries.
>>
>>
> Do you mean "fallback coercion in NumPy itself", or "at all"? Right now
> there's lots of valid code around, e.g. `np.median(some_dask_array)`. Users
> will keep wanting to do that. Forcing everyone to write
> `np.median(np.array(some_dask_array))` serves no purpose. So the coercion
> has to be somewhere. You're arguing that that's up to Dask et al I think?
>

Yes, I'm arguing this is up to dask to maintain backwards compatibility --
or not, as the maintainers see fit.

NumPy adding dispatching with __array_function__ did not break any existing
code, until the maintainers of other libraries started adding
__array_function__ methods. I hope that the risks of implementing such
experimental methods were self-evident.


> Putting it in Dask right now still doesn't address Juan's backwards compat
> concern, but perhaps that could be bridged with a Dask bugfix release and
> some short-lived pain.
>

I really think this is the best (only?) path forward.

I'm not convinced that this shouldn't be fixed in NumPy though. Your
> concern "reliably implement "strict" overrides of NumPy's API" is a bit
> abstract. Overriding the _whole_ NumPy API is definitely undesirable. If
> we'd have a reference list somewhere about every function that is handled
> with __array_function__, then would that address your concern? Such a list
> could be auto-generated fairly easily.
>

By "reliably implement strict overrides" I mean the ability to ensure that
every operation either uses an override or raises an informative error --
making it very clear which operation needs to be implemented or avoided.

It's true that we didn't really consider "always issuing warnings" as a
long-term solution in the NEP. I can see how this would simplify the backwards
compatibility story for libraries like dask, but in general, I really don't
like warnings: Using them like exceptions can easily result in code that is
partially broken or that fails later for non-obvious reasons. There's a
reason why Python's errors stop execution flow, unlike errors in languages
like PHP or JavaScript.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] __array_function related regression for 1.17.0rc1

2019-07-02 Thread Stephan Hoyer
On Tue, Jul 2, 2019 at 1:46 AM Juan Nunez-Iglesias  wrote:

> I'm also wondering where the list of functions that must be implemented
> can be found, so that libraries like dask and CuPy can be sure that they
> have a complete implementation, and further typeerrors won't be raised with
> their arrays.
>

This is a good question. We don't have a master list currently.

In practice, I would be surprised if there is ever more than exactly one
full implementation of NumPy's full API. We added dispatch with
__array_function__ even to really obscure corners of NumPy's API, e.g.,
np.lib.scimath.

The short answer right now is "Any publicly exposed function that says it
takes array-like arguments, aside from functions specifically for coercing
to NumPy arrays and the functions in numpy.testing."
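
One quick way to check whether any particular function participates in
dispatch is to probe it with a minimal object (a sketch; Probe is just a
throwaway illustration):

import numpy as np

class Probe:
    def __array_function__(self, func, types, args, kwargs):
        return f'{func.__name__} is overridable'

print(np.mean(Probe()))              # 'mean is overridable'
print(np.lib.scimath.sqrt(Probe()))  # obscure corners dispatch too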
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] __array_function related regression for 1.17.0rc1

2019-07-02 Thread Stephan Hoyer
>
> Your suggestion on the issue to switch from typeerror to warning is, imho,
>> much better, as long as the warning contains a link to an issue/webpage
>> explaining what needs to happen. It's only because I've been vaguely aware
>> of the `__array_function__` discussions that I was able to diagnose
>> relatively quickly. The average user would be very confused by this code
>> break or by a warning, and be unsure of what they need to do to get rid of
>> the warning.
>>
>
>  This would work I think. It's not even a band-aid, it's probably the
> better design option because any sane library that implements
> __array_function__ will have a much smaller API surface than NumPy - and
> why forbid users from feeding array-like input to the rest of the NumPy
> functions?
>

This is addressed in the NEP, see bullet 1 under "Partial implementation of
NumPy's API":
http://www.numpy.org/neps/nep-0018-array-function-protocol.html#partial-implementation-of-numpy-s-api

My concern is that fallback coercion behavior makes it difficult to
reliably implement "strict" overrides of NumPy's API. Fallback coercion is
certainly useful for interactive use, but it isn't really appropriate for
libraries.

In contrast to putting this into NumPy, if a library like dask prefers to
issue warnings or even keep around fallback coercion indefinitely (not that
I would recommend it), they can do that by putting it in their
__array_function__ implementation.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Stephan Hoyer
On Tue, Jun 25, 2019 at 10:14 AM Todd  wrote:

> On Tue, Jun 25, 2019 at 11:47 AM Alan Isaac  wrote:
>
>> On 6/25/2019 11:03 AM, Todd wrote:
>> > Fair enough.  But although there are valid reasons to do a divide by
>> zero, it still causes a warning because it is a problem often enough that
>> people should be made aware of it.  I
>> > think this is a similar scenario.
>>
>>
>>
>> I side with Stephan on this, but when there are opinions on both sides,
>> I wonder what the resolution strategy is.  I suppose there is a possible
>> tension:
>>
>> 1. Existing practice should be privileged (no change for the sake of
>> change).
>> 2. Documented user issues need to be addressed.
>>
>
> Note that the behavior wouldn't change.  Transposing vectors would do
> exactly what it has always done: nothing.  But people would be made aware
> that the operation they are doing won't actually do anything.
>
> I completely agree that change for the sake of change is not a good
> thing.  But we are talking about a no-op here.  If someone is intentionally
> doing something that does nothing, I would like to think that they could
> deal with a warning that can be easily silenced.
>

I am strongly opposed to adding warnings for documented and correct
behavior that we are not going to change. Warnings are only appropriate in
rare cases that demand user's attention, i.e., code that is almost
certainly not correct, like division by 0. We have already documented use
cases for .T on 1D arrays, such as compatibility with operations also
defined on 2D arrays.

I also agree with Alan that probably it's too late to change the behavior
of .T for arrays with more than 2-dimensions. NumPy could certainly use a
more comprehensive policy around backwards compatibility, but we certainly
need to meet a *very* high bar to break backwards compatibility. I am
skeptical that the slightly cleaner code facilitated by this new definition
for .T would be worth it.


>
>
>> So what is an "in the wild" example of where numpy users create errors
>> that pass
>> silently because a 1-d array transpose did not behave as expected?
>>
>
> Part of the problem with silent errors is that we typically aren't going
> to see them, by definition.  The only way you could catch a silent error
> like that is if someone noticed the results looked different than they
> expected, but that can easily be hidden if the error is a corner case that
> is averaged out.  That is the whole point of having a warning, to make it
> not silent.  It reminds me of the old Weisert quote, "As far as we know,
> our computer has never had an undetected error."
>
> The problems I typically encounter is when people write their code
> assuming that, for example, a trial will have multiple results.  It usually
> does, but on occasion it doesn't.  This sort of thing usually results in an
> error, although it is typically an error far removed from where the problem
> actually occurs and is therefor extremely hard to debug.  I haven't seen
> truly completely silent errors, but again I wouldn't expect to.
>
> We can't really tell how common this sort of thing is until we actively
> check for it.  Remember how many silent errors in encoding were caught once
> python3 started enforcing proper encoding/decoding handling?  People
> insisted encoding was being handled properly with python2, but it wasn't
> even in massive, mature projects.  People just didn't notice the problems
> before because they were silent.
>
> At the very least, the warning could tell people coming from other
> languages why the transpose is doing something different than they expect,
> as this is not an uncommon issue on stackoverflow. [1]
>
>
>> Why would the unexpected array shape of the result not alert the user if
>> it happens?
>
>
> I think counting on the code to produce an error is really dangerous.  I
> have seen people do a lot of really bizarre things with their code.
>
>
>> In your favor, Mathematica's `Transpose` raises an error when applied to
>> 1d arrays,
>> and the Mma designs are usually carefully considered.
>
>
> Yes, numpy is really the outlier here in making this a silent no-op.
> MATLAB, Julia, R, and SAS all transpose vectors, coercing them to matrices
> if needed.  Again, I don't think we should change how the transpose works,
> it is too late for that.  But I do think that people should be warned about
> it.
>
> [1] https://stackoverflow.com/search?q=numpy+transpose+vector (not all of
> these are relevant, but there are a bunch on there)
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Stephan Hoyer
On Tue, Jun 25, 2019 at 7:20 AM Todd  wrote:

> On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias 
> wrote:
>
>> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>>
>> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
>> of `(n, 1)` the right behaviour? I.e., it should still change from what it
>> is now - which is to leave the shape at `(n,)`.
>>
>>
>> Just to chime in as a user: v.T should continue to be a silent no-op for
>> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
>> or column vector, but we often want to write .T to match the notation in a
>> paper we're implementing.
>>
>
> Why should it be silent?  This is a source of bugs.  At least in my
> experience, generally when people write v.T it is a mistake.  Either they
> are coming from another language that works differently, or they failed to
> properly check their function arguments.  And if you are doing it on
> purpose, you are doing something you know is a no-op for essentially
> documentation purposes, and I would think that is the sort of thing you
> need to make as explicit as possible.  "Errors should never pass silently.
> Unless explicitly silenced."
>

Writing v.T is also sensible if you're writing code that could apply
equally well to either a single vector or a stack of vectors. This mirrors
the behavior of @, which also allows either single vectors or stacks of
vectors (matrices) with the same syntax.
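
For example, a small sketch of that pattern:

import numpy as np

def gram(u, v):
    # u, v may be single vectors of shape (n,) or matrices of shape (n, k)
    # whose columns are vectors; u.T is a harmless no-op in the 1D case, and
    # the expression matches the textbook notation u^T v in both cases
    return u.T @ v

gram(np.ones(3), np.arange(3.0))  # scalar inner product: 3.0
gram(np.eye(3), np.eye(3))        # (3, 3) Gram matrix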
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 5:36 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
>
> On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer  wrote:
>
>> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
>> wrote:
>>
>>> I'm not at all set on that behavior and we can do something else. For
>>> now, I chose this way since it seemed to best match the "IGNORE" mask
>>> behavior.
>>>
>>> The behavior you described further above where the output row/col would
>>> be masked corresponds better to "NA" (propagating) mask behavior, which
>>> I am leaving for later implementation.
>>
>>
>> This does seem like a clean way to *implement* things, but from a user
>> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
>> masks.
>>
>> I tend to think of "IGNORE" vs "NA" as descriptions of particular
>> operations rather than the data itself. There are a spectrum of ways to
>> handle missing data, and the right way to propagating missing values is
>> often highly context dependent. The right way to set this is in functions
>> where operations are defined, not on classes that may be defined far away
>> from where the computation happen. For example, pandas has a "min_count"
>> parameter in functions for intermediate use-cases between "IGNORE" and "NA"
>> semantics, e.g., "take an average, unless the number of data points is
>> fewer than min_count."
>>
>
> Anything that specific like that is probably indeed outside of the purview
> of a MaskedArray class.
>

I agree that it doesn't make much sense to have a "min_count" attribute on
a MaskedArray class, but certainly it makes sense for operations on
MaskedArray objects, e.g., to write something like
masked_array.mean(min_count=10). This is what users do in pandas today.
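
For reference, the pandas version of that knob (this part is the actual
pandas API):

import pandas as pd

s = pd.Series([1.0, None, 3.0])
s.sum(min_count=2)  # 4.0: enough valid values, IGNORE-style result
s.sum(min_count=3)  # nan: too few valid values, NA-style result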


> But your general point is well taken: we really need to ask clearly what
> the mask means not in terms of operations but conceptually.
>
> Personally, I guess like Benjamin I have mostly thought of it as "data
> here is bad" (because corrupted, etc.) or "data here is irrelevant"
> (because of sea instead of land in a map). And I would like to proceed
> nevertheless with calculating things on the remainder. For an expectation
> value (or, less obviously, a minimum or maximum), this is mostly OK: just
> ignore the masked elements. But even for something as simple as a sum, what
> is correct is not obvious: if I ignore the count, I'm effectively assuming
> the expectation is symmetric around zero (this is why `vector.dot(vector)`
> fails); a better estimate would be `np.add.reduce(data, where=~mask) *
> N(total) / N(unmasked)`.
>

I think it's fine and logical to define default semantics for operations on
MaskedArray objects. Much of the time, replacing masked values with 0 is
the right thing to do for sum. Certainly IGNORE semantics are more useful
overall than the NA semantics.
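
Concretely, the IGNORE-style default for sum amounts to something like this
sketch:

import numpy as np

data = np.array([1.0, 2.0, 8.0, 3.0])
mask = np.array([False, False, True, False])  # True marks bad/irrelevant data
total = np.sum(np.where(mask, 0.0, data))     # 6.0: the masked 8.0 is ignored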

But even if a MaskedArray conceptually always represents "bad" or
"irrelevant" data, the way to handle those missing values will differ based
on the use case, and not everything will fall cleanly into either IGNORE or
NA buckets. I think it makes sense to provide users with functions/methods
that expose these options, rather than requiring that they convert their
data into a different type of MaskedArray.

"It is better to have 100 functions operate on one data structure than 10
functions on 10 data structures." —Alan Perlis
https://stackoverflow.com/questions/6016271/why-is-it-better-to-have-100-functions-operate-on-one-data-structure-than-10-fun


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
wrote:

> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>
> The behavior you described further above where the output row/col would
> be masked corresponds better to "NA" (propagating) mask behavior, which
> I am leaving for later implementation.


This does seem like a clean way to *implement* things, but from a user
perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
masks.

I tend to think of "IGNORE" vs "NA" as descriptions of particular
operations rather than the data itself. There is a spectrum of ways to
handle missing data, and the right way to propagate missing values is
often highly context dependent. The right way to set this is in functions
where operations are defined, not on classes that may be defined far away
from where the computation happens. For example, pandas has a "min_count"
parameter in functions for intermediate use-cases between "IGNORE" and "NA"
semantics, e.g., "take an average, unless the number of data points is
fewer than min_count."

Are there examples of existing projects that define separate user-facing
types for different styles of masks?


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane 
wrote:

>  1. Making a "no-clobber" guarantee on the underlying data
>

Hi Allan -- could you kindly clarify what you mean by "no-clobber"?

Is this referring to allowing masked arrays to mutate masked data values
in-place, even on apparently non-in-place operators? If so, that definitely
seems like a bad idea to me. I would much rather do an unnecessary copy
than have surprisingly non-thread-safe operations.


>  If we agree that masked values will contain nonsense, it seems like a
> bad idea for those values to be easily exposed.
>
> Further, in all the comments so far I have not seen an example of a need
> for unmasking that is not more easily, efficiently and safely achieved
> by simply creating a new MaskedArray with a different mask.


My understanding is that essentially every low-level MaskedArray function
is implemented by looking at the data and mask separately. If so, we should
definitely expose this API directly to users (as part of the public API for
MaskedArray), so they can write their own efficient algorithms.

As a concrete example, suppose I wanted to implement a low-level "grouped
mean" operation for masked arrays like that found in pandas. This isn't a
builtin NumPy function, so I would need to write this myself. This would be
relatively straightforward to do in Numba or Cython with raw NumPy arrays
(e.g., see my example here for a NaN skipping version:
https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py), but to
do it efficiently you definitely don't want to make an unnecessary copy.
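To sketch what that might look like against the raw data/mask pair
(illustrative only -- `grouped_mean` and its signature are made up for this
example, not part of any library):

```python
import numpy as np

def grouped_mean(data, mask, labels, num_groups):
    """Mean of unmasked values per group, given parallel data/mask arrays."""
    valid = ~mask
    sums = np.zeros(num_groups)
    counts = np.zeros(num_groups)
    # np.add.at accumulates correctly even for repeated group labels.
    np.add.at(sums, labels[valid], data[valid])
    np.add.at(counts, labels[valid], 1)
    return sums / counts  # nan for groups with no unmasked values

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])
labels = np.array([0, 0, 1, 1])
grouped_mean(data, mask, labels, 2)  # array([1. , 3.5])
```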

The usual reason for hiding implementation details is when we want to
reserve the right to change them. But if we're sure about the data model
(which I think we are for MaskedArray) then I think there's a lot of value
in exposing it directly to users, even if it's lower level than is
appropriate to use in most cases.


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 8:10 AM Todd  wrote:

> On Mon, Jun 24, 2019 at 11:00 AM Stephan Hoyer  wrote:
>
>> On Sun, Jun 23, 2019 at 10:05 PM Stewart Clelland <
>> stewartclell...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Based on discussion with Marten on github
>>> <https://github.com/numpy/numpy/issues/13797>, I have a couple of
>>> suggestions on syntax improvements on array transpose operations.
>>>
>>> First, introducing a shorthand for the Hermitian Transpose operator. I
>>> thought "A.HT" might be a viable candidate.
>>>
>>
>> I agree that short-hand for the Hermitian transpose would make sense,
>> though I would try to stick with "A.H". It's one of the last reasons to
>> prefer the venerable np.matrix. NumPy arrays already has loads of
>> methods/properties, and this is a case (like @ for matrix multiplication)
>> where the operator significantly improves readability: consider "(x.H @
>> M @ x) / (x.H @ x)" vs "(x.conj().T @ M @ x) / (x.conj().T @ x)" [1].
>> Nearly everyone who does linear algebra with complex numbers would find
>> this useful.
>>
>> If I recall correctly, the last time this came up, it was suggested that
>> we might implement this with a NumPy view as a "complex conjugate" dtype
>> rather than a memory copy. This would allow the operation to be essentially
>> free. I find this very appealing, both due to symmetry with ".T" and
>> because of the principle that properties should be cheap to compute.
>>
>> So my tentative vote would be (1) yes, let's do the short-hand attribute,
>> but (2) let's wait until we have a complex conjugate dtype that do this
>> efficiently. My hope is that this should be relatively doable in a year or
>> two after the current dtype refactor/usability effort comes to fruition.
>>
>> Best,
>> Stephan
>>
>> [1]  I copied the first non-trivial example off the Wikipedia page for a
>> Hermitian matrix:  https://en.wikipedia.org/wiki/Hermitian_matrix
>>
>>
> I would call it .CT or something like that, based on the term "Conjugate
> transpose".  Wikipedia redirects "Hermitian transpose" to "Conjugate
> transpose", and google has 49,800 results for "Hermitian transpose" vs
> 201,000 for "Conjugate transpose" (both with quotes).  So "Conjugate
> transpose" seems to be the more widely-known name.  Further, I think what a
> "Conjugate transpose" does is immediately obvious to someone who isn't
> already familiar with the term so long as they know what a "conjugate" and
> "transpose" are, while no one would be able to tell what a "Hermitian
> transpose" is unless they are already familiar with the name.  So I have no
> problem calling it a "Hermitian transpose" somewhere in the docs, but I
> think the naming and documentation should focus on the "Conjugate
> transpose" term.
>

Sure, we should absolutely document the name as the "Conjugate transpose".
But the standard mathematical notation is definitely "A^H" rather than
"A^{CT}".

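For concreteness, the current spelling versus the proposed shorthand (note
that the `.H` attribute is the proposal under discussion, not something
NumPy provides today):

```python
import numpy as np

M = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])  # a Hermitian matrix
x = np.array([1.0 + 2j, 3.0 - 1j])

# Today's spelling of the Rayleigh quotient:
r = (x.conj().T @ M @ x) / (x.conj().T @ x)
# With the proposed shorthand this would read (x.H @ M @ x) / (x.H @ x).
```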



Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Stephan Hoyer
On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Your proposal would be something like np.sum(array,
>> where=np.ones_like(array))? This seems rather verbose for a common
>> operation. Perhaps np.sum(array, where=True) would work, making use of
>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>
>> I think we'd need to consider separately the operation on the mask and on
> the data. In my proposal, the data would always do `np.sum(array,
> where=~mask)`, while how the mask would propagate might depend on the mask
> itself, i.e., we'd have different mask types for `skipna=True` (default)
> and `False` ("contagious") reductions, which differed in doing
> `logical_and.reduce` or `logical_or.reduce` on the mask.
>

OK, I think I finally understand what you're getting at. So suppose this is
how we implement it internally. Would we really insist on a user
creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
We could add sugar for this, but certainly array.greedy_masked().sum() is
significantly less clear than array.sum(skipna=False).

I'm also a little concerned about a proliferation of MaskedArray/Mask
types. New types are significantly harder to understand than new functions
(or new arguments on existing functions). I don't know if we have enough
distinct use cases for this many types.

>> Are there use-cases for propagating masks separately from data? If not, it
>> might make sense to only define mask operations along with data, which
>> could be much simpler.
>>
>
> I had only thought about separating out the concern of mask propagation
> from the "MaskedArray" class to the mask proper, but it might indeed make
> things easier if the mask also did any required preparation for passing
> things on to the data (such as adjusting the "where" argument in a
> reduction). I also like that this way the mask can determine even before
> the data what functionality is available (i.e., it could be the place from
> which to return `NotImplemented` for a ufunc.at call with a masked index
> argument).
>

You're going to have to come up with something more compelling than
"separation of concerns" to convince me that this extra Mask abstraction is
worthwhile. On its own, I think a separate Mask class would only obfuscate
MaskedArray functions.

For example, compare these two implementations of add:

def add1(x, y):
    return MaskedArray(x.data + y.data, x.mask | y.mask)

def add2(x, y):
    return MaskedArray(x.data + y.data, x.mask + y.mask)

The second version requires that you *also* know how Mask classes work, and
how they implement +. So now you need to look in at least twice as many
places to understand add() for MaskedArray objects.


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Stephan Hoyer
On Sun, Jun 23, 2019 at 4:07 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>> - If reductions/aggregations default to skipping missing elements, how
>> would it be possible to express "NA propagating" versions, which are also useful,
>> if slightly less common?
>>
>
> I have been playing with using a new `Mask(np.ndarray)` class for the
> mask, which does the actual mask propagation (i.e., all single-operand
> ufuncs just copy the mask, binary operations do `logical_or` and reductions
> do `logical_and.reduce`). This way the `Masked` class itself can generally
> apply a given operation on the data and the masks separately and then
> combine the two results (reductions are the exception in that `where` has
> to be set). Your particular example here could be solved with a different
> `Mask` class, for which reductions do `logical_or.reduce`.
>

I think it would be much better to use duck-typing for the mask as well, if
possible, rather than a NumPy array subclass. This would facilitate using
alternative mask implementations, e.g., distributed masks, sparse masks,
bit-array masks, etc.

Are there use-cases for propagating masks separately from data? If not, it
might make sense to only define mask operations along with data, which
could be much simpler.


> We may want to add a standard "skipna" argument on NumPy aggregations,
>> solely for the benefit of duck arrays (and dtypes with missing values). But
>> that could also be a source of confusion, especially if skipna=True refers
>> only to "true NA" values, not including NaN, which is used as an alias for NA
>> in pandas and elsewhere.
>>
>
> It does seem `where` should suffice, no? If one wants to be super-fancy,
> we could allow it to be a callable, which, if a ufunc, gets used inside the
> loop (`where=np.isfinite` would be particularly useful).
>

Let me try to make the API issue more concrete. Suppose we have a
MaskedArray with values [1, 2, NA]. How do I get:
1. The sum ignoring masked values, i.e., 3.
2. The sum that is tainted by masked values, i.e., NA.

Here's how this works with existing array libraries:
- With base NumPy using NaN as a sentinel value for NA, you can get (1)
with np.sum and (2) with np.nansum.
- With pandas and xarray, the default behavior is (1) and to get (2) you
need to write array.sum(skipna=False).
- With NumPy's current MaskedArray, it appears that you can only get (1).
Maybe there isn't as strong a need for (2) as I thought?

Your proposal would be something like np.sum(array,
where=np.ones_like(array))? This seems rather verbose for a common
operation. Perhaps np.sum(array, where=True) would work, making use of
broadcasting? (I haven't actually checked whether this is well-defined yet.)
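As a quick check of the mechanics (with the `where=` support for reductions
that landed in NumPy 1.17, scalar broadcasting does appear to make this
well-defined):

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan])   # nan standing in for a masked value
mask = np.isnan(data)

np.sum(data, where=~mask)  # 3.0 -- masked element skipped
np.sum(data, where=True)   # nan -- scalar True broadcasts, nothing skipped
```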


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Stephan Hoyer
On Sat, Jun 22, 2019 at 6:50 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Allan,
>
> I'm not sure I would go too much by what the old MaskedArray class did. It
> indeed made an effort not to overwrite masked values with a new result,
> even to the extent of copying back masked input data elements to the output
> data array after an operation. But the fact that this is non-sensical if
> the dtype changes (or the units in an operation on quantities) suggests
> that this mental model simply does not work.
>
> I think a sensible alternative mental model for the MaskedArray class is
> that all it does is forward any operations to the data it holds and
> separately propagate a mask, ORing elements together for binary operations,
> etc., and explicitly skipping masked elements in reductions (ideally using
> `where` to be as agnostic as possible about the underlying data, for which,
> e.g., setting masked values to `0` for `np.add.reduce` may or may not be
> the right thing to do - what if they are string?).
>

+1, this sounds like the right model to me.

That said, I would still not guarantee values under the mask as part of
NumPy's API. The result of computations under the mask should be considered
an undefined implementation detail, sort of like integer overflow or dict
iteration order pre-Python 3.7. The values may even be entirely arbitrary,
e.g., in cases where the result is preallocated with empty().

I'm less confident about the right way to handle missing elements in
reductions. For example:
- Should median() also skip missing elements, even though there is no
identity element?
- If reductions/aggregations default to skipping missing elements, how
would it be possible to express "NA propagating" versions, which are also useful,
if slightly less common?

We may want to add a standard "skipna" argument on NumPy aggregations,
solely for the benefit of duck arrays (and dtypes with missing values). But
that could also be a source of confusion, especially if skipna=True refers
only to "true NA" values, not including NaN, which is used as an alias for NA
in pandas and elsewhere.

> With this mental picture, the underlying data always have well-defined
> meaning: they have been operated on as if the mask did not exist. There
> then is also less reason to try to avoid getting it back to the user.
>
> As a concrete example (maybe Ben has others): in astropy we have a
> sigma-clipping average routine, which uses a `MaskedArray` to iteratively
> mask items that are too far off from the mean; here, the mask varies each
> iteration (an initially masked element can come back into play), but the
> data do not.
>
> All the best,
>
> Marten
>
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane 
> wrote:
>
>> On 6/21/19 2:37 PM, Benjamin Root wrote:
>> > Just to note, data that is masked isn't always garbage. There are plenty
>> > of use-cases where one may want to temporarily apply a mask for a set of
>> > computation, or possibly want to apply a series of different masks to
>> > the data. I haven't read through this discussion deeply enough, but is
>> > this new class going to destroy underlying masked data? and will it be
>> > possible to swap out masks?
>> >
>> > Cheers!
>> > Ben Root
>>
>> Indeed my implementation currently feels free to clobber the data at
>> masked positions and makes no guarantees not to.
>>
>> I'd like to try to support reasonable use-cases like yours though. A few
>> thoughts:
>>
>> First, the old np.ma.MaskedArray explicitly does not promise to preserve
>> masked values, with a big warning in the docs. I can't recall the
>> examples, but I remember coming across cases where clobbering happens.
>> So arguably your behavior was never supported, and perhaps this means
>> that no-clobber behavior is difficult to reasonably support.
>>
>> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>> lots of copies. Therefore, in most cases you will not lose any
>> performance in my new MaskedArray relative to the old one by making an
>> explicit copy yourself. I.e, is it problematic to have to do
>>
>>  >>> result = MaskedArray(data.copy(), trial_mask).sum()
>>
>> instead of
>>
>>  >>> marr.mask = trial_mask
>>  >>> result = marr.sum()
>>
>> since they have similar performance?
>>
>> Third, in the old np.ma.MaskedArray masked positions are very often
>> "effectively" clobbered, in the sense that they are not computed. For
>> example, if you do "c = a+b", and then change the mask of c, the values
>> at masked position of the result of (a+b) do not correspond to the sum
>> of the masked values in a and b. Thus, by "unmasking" c you are exposing
>> nonsense values, which to me seems likely to cause heisenbugs.
>>
>>
>> In summary, by not making no-clobber guarantees and by strictly
>> preventing exposure of nonsense values, I suspect that: 1. my new code
>> is simpler and faster by avoiding lots of copies, and forces copies to
>> be explicit in user code. 2. 

Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Stephan Hoyer
On Thu, Jun 20, 2019 at 7:44 PM Allan Haldane 
wrote:

> On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> > Hi Allan,
> >
> > This is very impressive! I could get the tests that I wrote for my class
> > pass with yours using Quantity with what I would consider very minimal
> > changes. I only could not find a good way to unmask data (I like the
> > idea of setting the mask on some elements via `ma[item] = X`); is this
> > on purpose?
>
> Yes, I want to make it difficult for the user to access the garbage
> values under the mask, which are often clobbered values. The only way to
> "remove" a masked value is by replacing it with a new non-masked value.
>

I think we should make it possible to access (and even mutate) data under
the mask directly, while noting the lack of any guarantees about what those
values are.

MaskedArray has a minimal and transparent data model, consisting of data
and mask arrays. There are plenty of use cases where it is convenient to
access the underlying arrays directly, e.g., for efficient implementation
of low-level MaskedArray algorithms.

NumPy itself does a similar thing on ndarray by exposing data/strides.
Advanced users who learn the details of the data model find them useful,
and everyone else ignores them.
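For example (plain ndarray introspection, for illustration):

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)
a.strides  # (24, 8): bytes to step along each axis of the buffer
a.data     # memoryview over the underlying memory
```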


>
> > Anyway, it would seem easily at the point where I should comment on your
> > repository rather than in the mailing list!
>
> To make further progress on this encapsulation idea I need a more
> complete ducktype to pass into MaskedArray to test, so that's what I'll
> work on next, when I have time. I'll either try to finish my
> ArrayCollection type, or try making a simple NDunit ducktype
> piggybacking on astropy's Unit.
>

dask.array would be another good example to try. I think it already should
support __array_function__ (and if not it should be very easy to add).


> Best,
> Allan
>
>
> >
> > All the best,
> >
> > Marten
> >
> >
> > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane wrote:
> >
> > On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> > >
> > >
> > > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane wrote:
> > > 
> > >
> > > > This may be too much to ask from the initializer, but, if so, it still
> > > > seems most useful if it is made as easy as possible to do, say, `class
> > > > MaskedQuantity(Masked, Quantity): `.
> > >
> > > Currently MaskedArray does not accept ducktypes as underlying arrays,
> > > but I think it shouldn't be too hard to modify it to do so. Good idea!
> > >
> > >
> > > Looking back at my trial, I see that I also never got to duck arrays -
> > > only ndarray subclasses - though I tried to make the code as agnostic as
> > > possible.
> > >
> > > (Trial at
> > > https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
> > >
> > > I already partly navigated this mixin-issue in the
> > > "MaskedArrayCollection" class, which essentially does
> > > ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> > > boilerplate. That's the backwards encapsulation order from what you want
> > > though.
> > >
> > >
> > > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > > (and cannot represent itself...); I'm not at all sure that my method of
> > > just creating a mixed class is anything but a recipe for disaster, though!
> >
> > Based on your suggestion I worked on this a little today, and now my
> > MaskedArray more easily encapsulates both ducktypes and ndarray
> > subclasses (pushed to repo). Here's an example I got working with masked
> > units using unyt:
> >
> > [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> >
> > [2]: from unyt import m, km
> >
> > [3]: import numpy as np
> >
> > [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> >
> > [5]: uarr
> >
> > MaskedArray([1., X , 3.])
> > [6]: uarr + 1*m
> >
> > MaskedArray([1.001, X, 3.001])
> > [7]: uarr.filled()
> >
> > unyt_array([1., 0., 3.], 'km')
> > [8]: np.concatenate([uarr, 2*uarr]).filled()
> > unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> >
> > The catch is the ducktype/subclass has to rigorously follow numpy's
> > indexing rules, including distinguishing 0d arrays from scalars. For now
> > I only used unyt in the example above since it happens to be less strict
> > about dimensionless operations than astropy.units, which trips up my
> > repr code (see below for example with astropy.units). Note in the last
> > line 

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Stephan Hoyer
On Thu, Jun 13, 2019 at 9:35 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf, others,
>
>
>>>  Anyway, I guess this is still a good example to consider for how we
>>> should go about getting to a new implementation, ideally with just a
>>> single-way to override?
>>>
>>> Indeed, how do we actually envisage deprecating the use of
>>> `__array_function__` for a given part of the numpy API? Are we allowed to
>>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>>
>>
>> I think __array_function__ is still the best way to do this (that's the
>> only actual override, so most robust and performant likely), so I don't see
>> any reason for a deprecation.
>>
>> Yes, I fear I have to agree for the nan-functions, at least for now...
>
> But how about `np.sum` itself? Right now, it is overridden by
> __array_function__ but classes without __array_function__ support can also
> override it through the method lookup and through __array_ufunc__.
>
> Would/should there be a point where we just have `sum = np.add.reduce` and
> drop other overrides? If so, how do we get there?
>
> One option might be start reversing the order in `_wrapreduction` - try
> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
> method.
>

Yes, I think we would need to do this sort of thing. It's a bit of trouble,
but probably doable with some decorator magic. It would indeed be nice for
sum() to eventually just be np.add.reduce, though to be honest I'm not
entirely sure it's worth the trouble of a long deprecation cycle -- people
have been relying on the fall-back calling of methods for a long time.
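For the simple cases the two are already interchangeable (a quick
illustration):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
np.add.reduce(a, axis=1)  # array([ 3, 12])
np.sum(a, axis=1)         # same values; np.sum additionally goes through
                          # the method/override machinery discussed above
```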



> All the best,
>
> Marten


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-12 Thread Stephan Hoyer
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf,
>
> You're right, the problem is with the added keyword argument (which would
> appear also if we did not still have to support the old .sum method
> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
> maybe worse given that it seems likely the reduce method has seen much less
> testing in  __array_ufunc__ implementations).
>
> Still, I do think the question stands: we implement a `nansum` for our
> ndarray class a certain way, and provide ways to override it (three now, in
> fact). Is it really reasonable to expect that we wait 4 versions for other
> packages to keep up with this, and thus get stuck with given internal
> implementations?
>
> Aside: note that the present version of the nanfunctions relies on turning
> the arguments into arrays and copying 0s into them - that suggests that
> currently they do not work for duck arrays like Dask.
>

Agreed. We could safely rewrite things to use np.asarray(), without any
need to worry about backwards compatibility. From an API perspective,
nothing would change -- we already cast inputs into base numpy arrays
inside the _replace_nan() routine.
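A sketch of what such a rewrite could look like, using `where=` in the
reduction instead of copying 0s over the NaNs (illustrative only;
`nansum_sketch` is a made-up name, not the actual implementation, and
`where=` in reductions requires NumPy >= 1.17):

```python
import numpy as np

def nansum_sketch(a, axis=None):
    # Sketch, not numpy's real nansum: add has identity 0, so elements
    # excluded by `where` contribute nothing to the reduction.
    mask = np.isnan(a)
    return np.add.reduce(a, axis=axis, where=~mask)

nansum_sketch(np.array([1.0, np.nan, 2.0]))  # 3.0
```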


>
> All the best,
>
> Marten
>
> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers 
> wrote:
>
>>
>>
>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>> stef...@berkeley.edu> wrote:
>>
>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>> > In a way, I brought it up mostly as a concrete example of an internal
>>> > implementation which we cannot change to an objectively cleaner one
>>> because
>>> > other packages rely on an out-of-date numpy API.
>>>
>>
>> I think this is not the right way to describe the problem (see below).
>>
>>
>>> This, and the comments Nathaniel made on the array function thread, are
>>> important to take note of.  Would it be worth updating NEP 18 with a
>>> list of pitfalls?  Or should this be a new informational NEP that
>>> discusses—on a higher level—the benefits, risks, and design
>>> considerations of providing protocols?
>>>
>>
>> That would be a nice thing to do (the higher level one), but in this case
>> I think the issue has little to do with NEP 18. The summary of the issue in
>> this thread is a little brief, so let me try to clarify.
>>
>> 1. np.sum gained a new `where=` keyword in 1.17.0
>> 2. using np.sum(x) will detect an `x.sum` method if it's present and try
>> to use that
>> 3. the `_wrapreduction` utility that forwards the function to the method
>> will compare signatures of np.sum and x.sum, and throw an error if there's
>> a mismatch for any keywords that have a value other than the default
>> np._NoValue
>>
>> Code to check this:
>> >>> x1 = np.arange(5)
>> >>> x2 = np.asmatrix(x1)
>> >>> np.sum(x1)  # works
>> >>> np.sum(x2)  # works
>> >>> np.sum(x1, where=x1>3)  # works
>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> Note that this is not specific to np.matrix. Using pandas.Series you also
>> get a TypeError:
>> >>> y = pd.Series(x1)
>> >>> np.sum(y)  # works
>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> The issue is that when we have this kind of forwarding logic,
>> irrespective of how it's implemented, new keywords cannot be used until the
>> array-like objects with the methods that get forwarded to gain the same
>> keyword.
>>
>> tl;dr this is simply a cost we have to be aware of when either proposing
>> to add new keywords, or when proposing any kind of dispatching logic (in
>> this case `_wrapreduction`).
>>
>> Regarding internal use of  `np.sum(..., where=)`: this should not be done
>> until at least 4-5 versions from now, and preferably way longer than that.
>> Because doing so will break already released versions of Pandas, Dask, and
>> other libraries with array-like objects.
>>
>> Cheers,
>> Ralf
>>
>>
>>


Re: [Numpy-discussion] Moving forward with value based casting

2019-06-05 Thread Stephan Hoyer
On Wed, Jun 5, 2019 at 1:43 PM Sebastian Berg 
wrote:

> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
>
> ---
>
> Currently when you write code such as:
>
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
>
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
>
> Note that it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output dtype
> be).
>
>
> There 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32) + np.arange(10, dtype=np.int8)`
>  * `np.array([3], dtype=np.int32) + np.arange(10, dtype=np.int8)`
>
> give a different result.
>
> This is a bit different for python scalars, which do not have a type
> attached already.
>
>
> 2. Promotion and type resolution in Ufuncs:
>
> What is currently bothering me is that the decision of what the output
> dtypes should be depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
>
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
>
> Because `uint8(127)` can also be an `int8` but `uint8(128)` cannot, it is
> not as simple as finding the "minimal" dtype once and working with that.
> Of course Eric and I discussed this a bit before, and you could create
> an internal "uint7" dtype which has the only purpose of flagging that a
> cast to int8 is safe.
>

Does NumPy actually have an logic that does these sort of checks currently?
If so, it would be interesting to see what it is.

My experiments suggest that we currently have this logic of finding the
"minimal" dtype that can hold the scalar value:

>>> np.array([127], dtype=np.int8) + 127 # silent overflow!
array([-2], dtype=int8)

>>> np.array([127], dtype=np.int8) + 128 # correct result
array([255], dtype=int16)


I suppose it is possible I am barking up the wrong tree here, and this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
>
>
> Possible options to move forward
> 
>
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
> strange external ufuncs (I can only think of numba as a plausible
> candidate for creating such).
>
> My current plan is to see where the second thing leaves me.
>
> We also should see if we cannot move the whole thing forward, in which
> case the main decision would have to be forward to where. My opinion is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
>
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
>
> 1. Keep the current logic in place as much as possible
> 2. Only support value based 

Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-02 Thread Stephan Hoyer
On Sun, Jun 2, 2019 at 1:08 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
>
> On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser 
> wrote:
>
>> Some of your categories here sound like they might be suitable for ABCs
>> that provide mixin methods, which is something I think Hameer suggested in
>> the past. Perhaps it's worth re-exploring that avenue.
>>
>> Eric
>>
>>
> Indeed, and of course for __array_ufunc__ we moved there a bit already,
> with `NDArrayOperatorsMixin` [1].
> One could certainly similarly have NDShapingMixin that, e.g., relied on
> `shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc.
> And indeed use those mixins in `ndarray` itself.
>
> For this also having a summary of base functions/methods would be very
> helpful.
> -- Marten
>


I would definitely support writing more mixins and helper functions (either
in NumPy, or externally) to make it easier to re-implement NumPy's public
API. Certainly there is plenty of room to make it easier to leverage
__array_ufunc__ and __array_function__.
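As a reminder of the shape such helpers take, here is a minimal duck array
built on the existing mixin (a sketch; `Wrapped` is a made-up class for
illustration, and it ignores `out=` and other edge cases for brevity):

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin

class Wrapped(NDArrayOperatorsMixin):
    # Illustrative duck array: operators come for free from the mixin.
    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any Wrapped inputs, defer to the ufunc, rewrap the result.
        args = [x.value if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*args, **kwargs))

    def __repr__(self):
        return f"Wrapped({self.value!r})"

Wrapped([1, 2, 3]) + 1  # Wrapped(array([2, 3, 4])) -- `+` routed via np.add
```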

For some recent examples of what these helpers functions could look like,
see JAX's implementation of NumPy, which is written in terms of a much
smaller array library called LAX:
https://github.com/google/jax/blob/9dfe27880517d5583048e7a3384b504681968fb4/jax/numpy/lax_numpy.py

Hypothetically, JAX could be written on top of a "restricted NumPy"
instead, which in turn could have an implementation written in LAX. This
would facilitate reusing JAX's higher level functions for automatic
differentiation and vectorization on top of different array backends.

I would also be happy to see guidance for NumPy API re-implementers, both
for those starting from scratch (e.g., in a new language) or who plan to
copy NumPy's Python API (e.g., with __array_function__).

I would focus on:
1. Describing the tradeoffs of challenging design decisions that NumPy may
have gotten wrong, e.g., scalars and indexing.
2. Describing common "gotchas" where it's easy to deviate from NumPy's
semantics unintentionally, e.g., with scalar arithmetic dtypes or indexing
edge cases.
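For instance, two such gotchas in miniature (my own illustration, nothing
exotic, just easy to deviate from):

```python
import numpy as np

a = np.arange(10, dtype=np.int8)
(a + 3).dtype  # dtype('int8'): the Python int adapts to the array's dtype
type(a[0])     # numpy.int8 -- indexing yields a NumPy scalar, not a
               # Python int, which reimplementations often get wrong
```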

I would *not* try to identify a "core" list of methods/functionality to
implement. Everyone uses their own slice of NumPy's API, so the rational
approach for anyone trying to reimplement exactly (i.e., with
__array_function__) is to start with a minimal subset and add functionality
on demand to meet user's needs. Also, many of the choices involved in
making an array library don't really have objectively right or wrong
answers, and authors are going to make intentional deviations from NumPy's
semantics when it makes sense for them.

Cheers,
Stephan





Re: [Numpy-discussion] __skip_array_function__ discussion summary

2019-05-25 Thread Stephan Hoyer
Sebastian, Stefan and Marten -- thanks for the excellent summaries of the
discussion.

In line with this consensus, I have drafted a revision of the NEP without
__skip_array_function__: https://github.com/numpy/numpy/pull/13624


On Thu, May 23, 2019 at 5:28 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Sebastian, Stéfan,
>
> Thanks for the very good summaries!
>
> An additional item worth mentioning is that by using
> `__skip_array_function__` everywhere inside, one minimizes the performance
> penalty of checking for `__array_function__`. It would obviously be worth
> trying to do that, but ideally in a way that is much less intrusive.
>
> Furthermore, it became clear that there were different pictures of the
> final goal, with quite a bit of discussion about the relative benefits of
> trying to limit exposure of the internal API and of, conversely, trying to
> (incrementally) move to implementations that are maximally re-usable (using
> duck-typing), which are themselves based around a smaller core (more in
> line with Nathaniel's NEP-22).
>
> In the latter respect, Stéfan's example is instructive. The real
> implementation of `ones_like` is:
> ```
> def ones_like(a, dtype=None, order='K', subok=True, shape=None):
>     res = empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
>     multiarray.copyto(res, 1, casting='unsafe')
>     return res
> ```
>
> The first step here seems obvious: an "empty_like" function would seem
> to belong in the core.
> The second step less so: Stéfan's `res.fill(1)` seems more logical, as
> surely a class's method is the optimal way to do something. Though I do
> feel `.fill` itself breaks "There should be one-- and preferably only one
> --obvious way to do it." So, I'd want to replace it with `res[...] = 1`, so
> that one relies on the more obvious `__setitem__`. (Note that all are
> equally fast even now.)
>
> Of course, in this idealized future, there would be little reason to even
> allow `ones_like` to be overridden with __array_function__...
>
> All the best,
>
> Marten


Re: [Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-23 Thread Stephan Hoyer
On Thu, May 23, 2019 at 2:43 AM Ralf Gommers  wrote:

>
>
> On Thu, May 23, 2019 at 3:02 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>>
>> If we want to keep an "off" switch we might want to add some sort of API
>>> for exposing whether NumPy is using __array_function__ or not. Maybe
>>> numpy.__experimental_array_function_enabled__ = True, so you can just test
>>> `hasattr(numpy, '__experimental_array_function_enabled__')`? This is
>>> assuming that we are OK with adding an underscore attribute to NumPy's
>>> namespace semi-indefinitely.
>>>
>>
> I don't think we want to add or document anything publicly. That only adds
> to the configuration problem, and indeed makes it harder to rely on the
> issue. All I was suggested was keeping some (private) safety switch in the
> code base for a while in case of real issues as a workaround.
>

I was concerned that libraries like dask might have different behavior
internally depending upon whether or not __array_function__ is enabled, but
looking more carefully dask only does this detection for tests. So maybe
this is not needed.

Still, I'm concerned about the potential broader implications of making it
possibly to turn this off. In general, I don't think NumPy should have
configurable global state -- it opens up the possibility of a whole class
of issues. Stefan van der Walt raised this point when this "off switch" was
suggested a few months ago:
https://mail.python.org/pipermail/numpy-discussion/2019-March/079207.html

That said, I'd be OK with keeping around an environment variable as an
emergency opt-out for now, especially to support benchmarking the impact of
__array_function__ checks.
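For benchmarking purposes the opt-out looks like this (it must happen before
NumPy is first imported; this reflects the 1.16/1.17-era behavior discussed
here):

```python
import os
os.environ['NUMPY_EXPERIMENTAL_ARRAY_FUNCTION'] = '0'  # set before importing numpy
import numpy as np  # __array_function__ dispatch checks are now skipped
```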

But I would definitely be opposed to keeping around this switch around long
term, for more than a major version or two. If there will be an outcry when
we remove checks for NUMPY_EXPERIMENTAL_ARRAY_FUNCTION, then we should
reconsider the entire __array_function__ approach.

>> Might this be overthinking it? I might use this myself on supercomputer
>> runs where I know that I'm using arrays only. Though one should not
>> extrapolate from oneself!
>>
>> That said, it is not difficult as is. For instance, we could explain in
>> the docs that one can tell from:
>> ```
>> enabled = (hasattr(np.core, 'overrides') and
>>            np.core.overrides.ENABLE_ARRAY_FUNCTION)
>> ```
>> One could even allow for eventual removal by explaining it should be,
>> ```
>> enabled = hasattr(np.core, 'overrides') and getattr(np.core.overrides,
>> 'ENABLE_ARRAY_FUNCTION', True)
>> ```
>> (If I understand correctly, one cannot tell from the presence of
>> `ndarray.__array_function__`, correct?)
>>
>
> I think a hasattr check for __array_function__ is right.
>

We define ndarray.__array_function__ (even on NumPy 1.16) regardless of
whether __array_function__ is enabled or not.

In principle we could have checked the environment variable from C before
defining the method, but it's too late for that now.
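Which is why the simple check is misleading (quick illustration):

```python
import numpy as np

# True on NumPy 1.16+ regardless of whether dispatch is actually enabled:
hasattr(np.ndarray, '__array_function__')
```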


Re: [Numpy-discussion] Converting np.sinc into a ufunc

2019-05-22 Thread Stephan Hoyer
On Wed, May 22, 2019 at 2:00 PM Ralf Gommers  wrote:

>
>
> On Wed, May 22, 2019 at 7:34 PM Nathan Goldbaum 
> wrote:
>
>> It might be worth using BigQuery to search the github repository public
>> dataset for usages of np.sinc with keyword arguments.
>>
>
> We spent some effort at Quansight to try different approaches to this.
> BigQuery turns out to be suboptimal, parsing code with ast.parse is more
> robust. Chris Ostrouchov just released some code for this (blog post with
> details to follow) and the results of running that code:
> https://github.com/Quansight-Labs/python-api-inspect/blob/master/data/numpy-summary.csv
>
> np.sinc has 35 usages. To put that in perspective, np.array has ~31,000,
> np.dot ~2200, np.floor ~220, trace/inner/spacing/copyto are all similar to
> sinc.
>

Searching Google's internal code base (including open source dependencies),
I found many uses of np.sinc, but no uses of the keyword argument "x".

I think it's pretty safe to go ahead here.
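The pattern being searched for, for reference (the keyword form is what a
ufunc conversion would break, since a ufunc's argument would not be
addressable by this name):

```python
import numpy as np

np.sinc(0.5)    # 0.6366197723675814 -- positional use, unaffected
np.sinc(x=0.5)  # keyword use of the current Python implementation;
                # this is the call style a ufunc conversion would break
```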


Re: [Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-22 Thread Stephan Hoyer
On Wed, May 22, 2019 at 2:36 PM Ralf Gommers  wrote:

> I would still like to turn on __array_function__ in NumPy 1.17. At least,
>>> let's try that for the release candidate and see how it goes.
>>>
>>
> I agree. I'd actually suggest flipping the switch asap and see if it
> causes any issues for projects that test against numpy master in their CI,
> and the people that like to live on the bleeding edge by installing master
> into their environment.
>

The switch has actually already been flipped on master for several months
now, except for a period in the 1.16 release cycle before we added the off
switch. Doing so did turn up a few bugs, e.g.,
https://github.com/numpy/numpy/issues/12263

We will actually need to re-add in the code that does the environment
variable to allow for turning it off, but this isn't a big deal. My main
concern is that this adds some complexity for third-party projects in
detecting whether __array_function__ is enabled or not. They can't just use
the NumPy version and will need to check the environment variable as well,
or actually try using it on an example object.

If we want to keep an "off" switch we might want to add some sort of API
for exposing whether NumPy is using __array_function__ or not. Maybe
numpy.__experimental_array_function_enabled__ = True, so you can just test
`hasattr(numpy, '__experimental_array_function_enabled__')`? This is
assuming that we are OK with adding an underscore attribute to NumPy's
namespace semi-indefinitely.


>
> Cheers,
> Ralf
>
>
> The "all in" nature of __array_function__ without __skip_array_function__
>>> will both limit its use to cases where it is strongly motivated, and also
>>> limits the API implications for NumPy. There is still plenty of room for
>>> expanding the protocol, but it's really hard to see what is necessary (and
>>> prudent!) without actual use.
>>>
>>> [1] e.g., see
>>> https://github.com/google/jax/blob/62473351643cecb6c248a50601af163646ba7be6/jax/numpy/lax_numpy.py#L2440-L2459
>>> [2] https://github.com/numpy/numpy/pull/13305
>>>
>>>
>>>
>>>
>>> On Tue, May 21, 2019 at 11:44 PM Juan Nunez-Iglesias 
>>> wrote:
>>>
 I just want to express my general support for Marten's concerns. As an
 "interested observer", I've been meaning to give `__array_function__` a try
 but haven't had the chance yet. So from my anecdotal experience I expect
 that more people need to play with this before setting the API in stone.

 At scikit-image we place a very strong emphasis on code simplicity and
 readability, so I also share Marten's concerns about code getting too
 complex. My impression reading the NEP was "whoa, this is hard, I'm glad
 smarter people than me are working on this, I'm sure it'll get simpler in
 time". But I haven't seen the simplicity materialise...

 On Wed, 22 May 2019, at 11:31 AM, Marten van Kerkwijk wrote:

 Hi All,

 For 1.17, there has been a big effort, especially by Stephan, to make
 __array_function__ sufficiently usable that it can be exposed. I think this
 is great, and still like the idea very much, but its impact on the numpy
 code base has gotten so big in the most recent PR (gh-13585) that I wonder
 if we shouldn't reconsider the approach, and at least for 1.17 stick with
 the status quo. Since that seems to be a bigger question than can be
 usefully addressed in the PR, I thought I would raise it here.

 Specifically, now not only does every numpy function have its
 dispatcher function, but also internally all numpy function calls are being
 done via the new `__skip_array_function__` attribute, to avoid further
 overrides. I think both changes make the code significantly less readable,
 thus, e.g., making it even harder than it is already to attract new
 contributors.

 I think with this it is probably time to step back and check whether
 the implementation is in fact the right one. For instance, among the
 alternatives we originally considered was one that had the overridable
 versions of functions in the regular `numpy` namespace, and the ones that
 would not themselves check in a different one. Alternatively, for some of
 the benefits provided by `__skip_array_function__`, there was a different
 suggestion to have a special return value, of `NotImplementedButCoercible`.
 Might these be better after all?

 More generally, I think we're suffering from the fact that several of
 us seem to have rather different final goals in mind. In particular, I'd
 like to move to a state where as much of the code as possible makes use of
 the simplest possible implementation, with only a few true base functions,
 so that all but those simplest functions will generally work on any type of
 array. Others, however, worry much more about making implementations (even
 more) part of the API.

 All the best,

 Marten
 

Re: [Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-22 Thread Stephan Hoyer
Thanks for raising these concerns.

The full implications of my recent __skip_array_function__ proposal are
only now becoming evident to me, looking at its use in GH-13585.
Guaranteeing that it does not expand NumPy's API surface seems hard to
achieve without pervasive use of __skip_array_function__ internally.

Taking a step back, the sort of minor hacks [1] that motivated
__skip_array_function__ for me are annoying, but really not too bad -- they
are a small amount of additional code duplication in a proposal that
already requires a large amount of code duplication.

So let's roll back the recent NEP change adding __skip_array_function__ to
the public interface [2]. Inside the few NumPy functions where
__array_function__ causes a measurable performance impact due to repeated
calls (most notably np.block, for which some benchmarks are 25% slower), we
can make use of the private __wrapped__ attribute.

I would still like to turn on __array_function__ in NumPy 1.17. At least,
let's try that for the release candidate and see how it goes. The "all in"
nature of __array_function__ without __skip_array_function__ will both
limit its use to cases where it is strongly motivated, and also limits the
API implications for NumPy. There is still plenty of room for expanding the
protocol, but it's really hard to see what is necessary (and prudent!)
without actual use.

[1] e.g., see
https://github.com/google/jax/blob/62473351643cecb6c248a50601af163646ba7be6/jax/numpy/lax_numpy.py#L2440-L2459
[2] https://github.com/numpy/numpy/pull/13305




On Tue, May 21, 2019 at 11:44 PM Juan Nunez-Iglesias 
wrote:

> I just want to express my general support for Marten's concerns. As an
> "interested observer", I've been meaning to give `__array_function__` a try
> but haven't had the chance yet. So from my anecdotal experience I expect
> that more people need to play with this before setting the API in stone.
>
> At scikit-image we place a very strong emphasis on code simplicity and
> readability, so I also share Marten's concerns about code getting too
> complex. My impression reading the NEP was "whoa, this is hard, I'm glad
> smarter people than me are working on this, I'm sure it'll get simpler in
> time". But I haven't seen the simplicity materialise...
>
> On Wed, 22 May 2019, at 11:31 AM, Marten van Kerkwijk wrote:
>
> Hi All,
>
> For 1.17, there has been a big effort, especially by Stephan, to make
> __array_function__ sufficiently usable that it can be exposed. I think this
> is great, and still like the idea very much, but its impact on the numpy
> code base has gotten so big in the most recent PR (gh-13585) that I wonder
> if we shouldn't reconsider the approach, and at least for 1.17 stick with
> the status quo. Since that seems to be a bigger question than can be
> usefully addressed in the PR, I thought I would raise it here.
>
> Specifically, now not only does every numpy function have its dispatcher
> function, but also internally all numpy function calls are being done via
> the new `__skip_array_function__` attribute, to avoid further overrides. I
> think both changes make the code significantly less readable, thus, e.g.,
> making it even harder than it is already to attract new contributors.
>
> I think with this it is probably time to step back and check whether the
> implementation is in fact the right one. For instance, among the
> alternatives we originally considered was one that had the overridable
> versions of functions in the regular `numpy` namespace, and the ones that
> would not themselves check in a different one. Alternatively, for some of
> the benefits provided by `__skip_array_function__`, there was a different
> suggestion to have a special return value, of `NotImplementedButCoercible`.
> Might these be better after all?
>
> More generally, I think we're suffering from the fact that several of us
> seem to have rather different final goals in mind. In particular, I'd like
> to move to a state where as much of the code as possible makes use of the
> simplest possible implementation, with only a few true base functions, so
> that all but those simplest functions will generally work on any type of
> array. Others, however, worry much more about making implementations (even
> more) part of the API.
>
> All the best,
>
> Marten


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-05-10 Thread Stephan Hoyer
On Sat, May 4, 2019 at 12:29 PM Ralf Gommers  wrote:

> We seem to have run out of steam a bit here.
>

We discussed this today in person at the NumPy sprint.

The consensus was to go for a name like __skip_array_function__. Ufuncs
don't have very good use-cases for a function that skips dispatch:
1. The overhead of the ufunc dispatch machinery is much smaller, especially
in the case where all arguments are NumPy arrays, because there is no need
for a wrapper function in Python.
2. Inside __array_ufunc__ it's possible to cast arguments into NumPy arrays
explicitly and then call the ufunc again. There's no need to explicitly
skip overrides.

We also don't really care about supporting the use-case where a function
gets changed into a ufunc. We already warn users not to call
__skip_array_function__ directly (without using getattr) outside
__array_function__.

Given all this, it seems best to stick with a name that mirrors
__array_function__ as closely as possible. I picked "skip" instead of
"skipping" just because it's slightly shorter, but otherwise don't have a
strong preference.

I've edited the NEP [1] and implementation [2] pull requests to use this
new name, and clarify the use-cases. If there no serious objections, I'd
love to merge these soon, in time for the NumPy 1.17 release candidate.

[1] https://github.com/numpy/numpy/pull/13305
[2] https://github.com/numpy/numpy/pull/13389