Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Juan Nunez-Iglesias

> I agree. I think we should recommend sane, descriptive names that do the 
> right thing. So ideally we'd have people spell their dtype specifiers as
>   dtype=bool  # or np.bool
>   dtype=np.float64
>   dtype=np.int64
>   dtype=np.complex128
> The names with underscores at the end make little sense from a UX 
> perspective. And the C equivalents (single/double/etc) made sense 15 years 
> ago, but with the user base of today - the majority of whom will not know C 
> fluently or at all - also don't make too much sense.
> 
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits 
> is likely to be a pitfall much more often than it is what the user actually 
> needs, so shouldn't be recommended and probably deserves a warning in the 
> docs.

I kinda disagree with this. I want to have a way to say, give me an array of 
the same type as the default NumPy type (for either ints or floats). This will 
prevent casting back and forth as different arrays are combined. In other 
words, as long as NumPy itself flips back and forth (depending on locale), I 
think users will in many cases want to flip back and forth with it?
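
A minimal sketch of the point above, assuming only that `dtype=int` and `dtype=float` resolve to whatever NumPy picks by default on the current platform. The variable names are illustrative, and the printed widths depend on the platform and NumPy version:

```python
import numpy as np

# Whatever NumPy picks when no dtype is given (platform/version dependent).
default_int = np.array([1, 2, 3]).dtype
default_float = np.array([1.0, 2.0]).dtype

# dtype=int and dtype=float follow those defaults ...
a = np.zeros(3, dtype=int)
b = np.zeros(3, dtype=float)

# ... so combining them with default-typed arrays needs no cast.
int_sum = a + np.array([1, 2, 3])
float_sum = b + np.array([1.0, 2.0, 3.0])

print(default_int, int_sum.dtype)
print(default_float, float_sum.dtype)
```

Spelling the dtype as `np.int64` instead would pin the width, at the cost of a possible upcast when mixed with default-typed arrays on platforms where the default is narrower.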

Juan.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Installing numpy

2020-12-11 Thread Lianyuan Zheng
Hi Stanley,

Thank you!

Lianyuan

On Fri, Dec 11, 2020 at 2:49 PM Stanley Seibert 
wrote:

> The development version of NumPy from Github requires Python 3.7 or later.
>
> On Fri, Dec 11, 2020 at 1:35 PM Lianyuan Zheng 
> wrote:
>
>> Hello,
>>
>> On my linux server, I downloaded the NUMPY package from GitHub (git clone
>> https://github.com/numpy/numpy.git) and then accessed the directory
>> "numpy".  When I typed command "python setup.py install", it shows the
>> following message:
>>
>>   File "setup.py", line 51
>>     f"NumPy {VERSION} may not yet support Python "
>>                                                  ^
>> SyntaxError: invalid syntax
>>
>> Obviously the installation failed.  Which python version is needed for
>> this numpy package?  The python version installed is version 2.7.5.
>>
>> Thanks,
>> Lianyuan


Re: [Numpy-discussion] Installing numpy

2020-12-11 Thread Stanley Seibert
The development version of NumPy from Github requires Python 3.7 or later.
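
For anyone hitting the same error, here is a rough sketch of a version guard. The minimum of 3.7 is taken from the reply above; `MIN_PYTHON`, `python_is_supported` and `require_python` are made-up names for illustration and are not what NumPy's `setup.py` actually uses:

```python
import sys

# Minimum taken from the reply above; adjust for other releases.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def require_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise RuntimeError with a readable message on an old interpreter."""
    if not python_is_supported(version_info, minimum):
        # %-formatting on purpose: an f-string is itself a SyntaxError on
        # Python 2, so it would fail before this check could run.
        raise RuntimeError(
            "Python %d.%d or later is required, found %d.%d"
            % (tuple(minimum) + tuple(version_info[:2]))
        )


if __name__ == "__main__":
    require_python()
    print("Python version OK")
```

With Python 2.7.5, `python_is_supported((2, 7, 5))` returns False, which matches the failure reported in this thread.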

On Fri, Dec 11, 2020 at 1:35 PM Lianyuan Zheng wrote:

> Hello,
>
> On my linux server, I downloaded the NUMPY package from GitHub (git clone
> https://github.com/numpy/numpy.git) and then accessed the directory
> "numpy".  When I typed command "python setup.py install", it shows the
> following message:
>
>   File "setup.py", line 51
>     f"NumPy {VERSION} may not yet support Python "
>                                                  ^
> SyntaxError: invalid syntax
>
> Obviously the installation failed.  Which python version is needed for
> this numpy package?  The python version installed is version 2.7.5.
>
> Thanks,
> Lianyuan


[Numpy-discussion] Installing numpy

2020-12-11 Thread Lianyuan Zheng
Hello,

On my linux server, I downloaded the NUMPY package from GitHub (git clone
https://github.com/numpy/numpy.git) and then accessed the directory
"numpy".  When I typed command "python setup.py install", it shows the
following message:

  File "setup.py", line 51
    f"NumPy {VERSION} may not yet support Python "
                                                 ^
SyntaxError: invalid syntax

Obviously the installation failed.  Which python version is needed for this
numpy package?  The python version installed is version 2.7.5.

Thanks,
Lianyuan


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Robert Kern
On Fri, Dec 11, 2020 at 1:12 PM Aaron Meurer wrote:

> On Fri, Dec 11, 2020 at 1:47 AM Eric Wieser 
> wrote:
> >
> > >  you might want to discuss this with us at the array API standard
> > > https://github.com/data-apis/array-api (which is currently in RFC
> > > stage). The spec uses bool as the name for the boolean dtype.
> >
> > I don't fully understand this argument - `np.bool` is already not the
> > boolean dtype. Either:
>
> The spec does deviate from what NumPy currently does in some places.
> If we wanted to just copy NumPy exactly, there wouldn't be a need for
> a specification.


I wouldn't take that as a premise. Specifying a subset of the vast existing
NumPy API would be a quite valuable specification in its own right. I find
the motivation for deviation laid out in the Purpose and Scope section to be
reasonably convincing that deviation might be needed *somewhere*. The
question then is, is *this* deviation supporting that stated motivation, or
is it taking the opportunity of a redesign to rationalize the names more to
our current tastes? Given the mode of adopting the standard (a separate
subpackage), that's a reasonable choice to make, but let's be clear about
the motivation. I submit that keeping the name `bool_` does not make it any
harder for other array APIs to adopt the standard. It's just that few
people would design a new API with that name if they were designing a
greenfield API.

-- 
Robert Kern


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Aaron Meurer
On Fri, Dec 11, 2020 at 1:47 AM Eric Wieser wrote:
>
> >  you might want to discuss this with us at the array API standard
> > https://github.com/data-apis/array-api (which is currently in RFC
> > stage). The spec uses bool as the name for the boolean dtype.
>
> I don't fully understand this argument - `np.bool` is already not the boolean 
> dtype. Either:

The spec does deviate from what NumPy currently does in some places.
If we wanted to just copy NumPy exactly, there wouldn't be a need for
a specification.

>
> * The spec is suggesting that `pkg.bool` be some arbitrary object that can be 
> passed into a dtype argument and will produce a boolean array.
>   If this is the case, the spec could also just require that 
> `dtype=builtins.bool` have this behavior.
> * The spec is suggesting that `pkg.bool` is some rich dtype object.
>   Ignoring the question of whether this should be `np.bool_` or 
> `np.dtype(np.bool_)`, it's currently neither, and changing it will break 
> users relying on `np.bool(True) is True`.
>   That's not to say this isn't a sensible thing for the specification to 
> have, it's just something that numpy can't conform to without breaking code.

This is what it currently says
(https://data-apis.github.io/array-api/latest/API_specification/data_types.html)

Data types (“dtypes”) are objects that can be used as dtype specifiers
in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A
conforming implementation may add methods or attributes to data type
objects; however, these methods and attributes are not included in
this specification.

So basically, np.bool just needs to be something that can be used as a
dtype. The dtype objects names don't have any requirements on them. A
library could have float64 == 'f8', for example. It isn't written
there presently but really the only thing that needs to work for the
dtype objects is == comparison (or at least, it will be impossible for
the test suite to test dtype behavior if a.dtype == float64 doesn't
work).

So np.bool == builtins.bool is actually fine. My concern here was that
the discussion was about deprecating np.bool, meaning it would be
removed from the namespace, which goes against what is currently in
the spec.
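
To make the `==` point concrete, here is a small sketch using only names that exist in NumPy today (`np.bool_`, the builtin `bool`, and the `"f8"` string code). It deliberately avoids `np.bool`, whose status is what this thread is about:

```python
import builtins

import numpy as np

# The builtin bool works as a dtype specifier and yields a boolean array.
a = np.zeros((2, 3), dtype=builtins.bool)

# dtype objects compare equal to anything that converts to the same dtype.
same_as_bool_ = a.dtype == np.bool_
same_as_builtin = a.dtype == builtins.bool

# The same holds for a string spelling such as "f8" for float64.
x = np.zeros((2, 3), dtype=np.float64)
matches_string = x.dtype == "f8"

print(same_as_bool_, same_as_builtin, matches_string)
```

So a test suite that only relies on `a.dtype == some_name` does not care whether `some_name` is a rich dtype object, a scalar type, or a plain alias.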

Aaron Meurer

>
> While it would be great if `np.bool_` could be spelt `np.bool`, I really 
> don't think we can make that change without a long deprecation first (if at 
> all).
>
> Eric
>
> On Thu, 10 Dec 2020 at 20:00, Sebastian Berg  
> wrote:
>>
>> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
>> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
>> > sebast...@sipsolutions.net>
>> > wrote:
>> >
>> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
>> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
>> > > > wrote:
>> > > >
>> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
>> > > > >  wrote:
>> > > > > >
>> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
>> > > > > > > Regarding np.bool specifically, if you want to deprecate
>> > > > > > > this,
>> > > > > > > you
>> > > > > > > might want to discuss this with us at the array API
>> > > > > > > standard
>> > > > > > > https://github.com/data-apis/array-api (which is currently
>> > > > > > > in
>> > > > > > > RFC
>> > > > > > > stage). The spec uses bool as the name for the boolean
>> > > > > > > dtype.
>> > > > > > >
>> > > > > > > Would it make sense for NumPy to change np.bool to just be
>> > > > > > > the
>> > > > > > > boolean
>> > > > > > > dtype object? Unlike int and float, there is no ambiguity
>> > > > > > > with
>> > > > > > > bool,
>> > > > > > > and NumPy clearly doesn't have any issues with shadowing
>> > > > > > > builtin
>> > > > > > > names
>> > > > > > > in its namespace.
>> > > > > >
>> > > > > > We could keep the Python alias around (which for `dtype=` is
>> > > > > > the
>> > > > > > same
>> > > > > > as `np.bool_`).
>> > > > > >
>> > > > > > I am not sure I like the idea of immediately shadowing the
>> > > > > > builtin.
>> > > > > > That is a switch we can avoid flipping (without warning);
>> > > > > > `np.bool_`
>> > > > > > and `bool` are fairly different beasts? [1]
>> > > > >
>> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
>> > > > > that
>> > > > > are incompatible with existing ones. It's not something I would
>> > > > > have
>> > > > > done personally, but it's been this way for a long time.
>> > > > >
>> > > >
>> > > > It may be defensible to keep np.bool as an alias for Python's
>> > > > bool
>> > > > even when we remove the other aliases.
>> > >
>> >
>> > I'd agree with that.
>> >
>> >
>> > > That is true, `int` is probably the most confusing, since it is not
>> > > at
>> > > all compatible to a Python integer, but rather the "default"
>> > > integer
>> > > (which happens to be the same as C `long` currently).
>> > >
>> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure
>> > > whether
>> > > you would prefer that or are mainly pointing out the 

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Sebastian Berg
On Fri, 2020-12-11 at 11:35 +0100, Ralf Gommers wrote:
> On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg <  
> sebast...@sipsolutions.net>
> wrote:



> 
> Just deprecating `np.int` may make sense. That will already raise
> awareness, and leaving `np.float` as-is may prevent a lot of churn.
> And we could then still deprecate `np.float` later. I also don't feel
> strongly about `float` either way though.
> 
> I'm not sure why you'd specifically touch `long`, it's not really
> relevant and it's not a builtin.
> 

`np.long is np.int is int` as it was a builtin on Python 2. But it
looks like a C-long.
In `dtype=` usage it actually ends up being a C-long (but it might even
be nice to consider modifying the default `int` on windows at some
point. At that point the "long" alias would be very confusing).

OTOH, right now the only way to spell C-long is with `np.int_` which
doesn't help.
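
A quick way to see what the various spellings resolve to on a given machine. This is only a sketch: the `"l"` type code is NumPy's character code for C `long`, while whether `np.int_` has the same width depends on the platform and the NumPy version, so nothing is asserted about that here:

```python
import ctypes

import numpy as np

c_long_bytes = ctypes.sizeof(ctypes.c_long)

# 'l' is the character code for C long.
char_l_bytes = np.dtype("l").itemsize

# What `dtype=np.int_` and `dtype=int` resolve to on this installation.
int_underscore_bytes = np.dtype(np.int_).itemsize
python_int_bytes = np.dtype(int).itemsize

print("C long:          ", c_long_bytes)
print("np.dtype('l'):   ", char_l_bytes)
print("np.dtype(np.int_):", int_underscore_bytes)
print("np.dtype(int):   ", python_int_bytes)
```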

Cheers,

Sebastian 




> Cheers,
> Ralf
> 
> > To be honest, I don't mind either way, so any stronger opinion will
> > tip the scale for me personally (my default currently is to update the
> > release notes to recommend the more descriptive names).
> > 
> > There are probably more doc updates that would be nice, I will
> > suggest updating a separate issue for that.
> > 
> > 
> > > > Right now, my main take-away from the discussion is that it would
> > > > be good to clarify the release notes a bit more.
> > > > 
> > > > Using `float` for a dtype seems fine to me, but I prefer
> > > > mentioning
> > > > `np.float64` over `np.float_`.
> > > > For integers, I wonder if we should also suggest `np.int64`,
> > > > even –
> > > > or
> > > > because – if the default integer on many systems is currently
> > > > `np.int_`?
> > > > 
> > > 
> > > I agree. I think we should recommend sane, descriptive names that
> > > do
> > > the
> > > right thing. So ideally we'd have people spell their dtype
> > > specifiers
> > > as
> > >   dtype=bool  # or np.bool
> > >   dtype=np.float64
> > >   dtype=np.int64
> > >   dtype=np.complex128
> > > The names with underscores at the end make little sense from a UX
> > > perspective. And the C equivalents (single/double/etc) made sense
> > > 15
> > > years
> > > ago, but with the user base of today - the majority of whom will
> > > not
> > > know C
> > > fluently or at all - also don't make too much sense.
> > > 
> > > The `dtype=int` or `dtype=np.int_` behaviour flopping between 32
> > > and
> > > 64
> > > bits is likely to be a pitfall much more often than it is what
> > > the
> > > user
> > > actually needs, so shouldn't be recommended and probably deserves
> > > a
> > > warning
> > > in the docs.
> > 
> > Right, there is one slight trickery because `np.intp` is often a
> > great
> > integer dtype to use, because it is the integer that NumPy uses for
> > all
> > things related to indexing and array sizes.
> > (I would be happy to dig out my PR making `np.intp` the default
> > NumPy
> > integer.)
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > > 
> > > Cheers,
> > > Ralf
> > > 
> > > 
> > > > 
> > > > > 
> > > > > np.int_ and np.float_ have fixed precision, which makes them
> > > > > somewhat
> > > > > different from the builtin types. NumPy has a whole bunch of
> > > > > different
> > > > > precisions for integer and floats, so this distinction
> > > > > matters.
> > > > > 
> > > > > In contrast, there is only one boolean dtype in NumPy, which
> > > > > matches
> > > > > Python's bool. So we wouldn't have to worry, for example,
> > > > > about
> > > > > whether a
> > > > > user has requested a specific precision explicitly. This
> > > > > comes up
> > > > > in
> > > > > issues
> > > > > like type-promotion where libraries like JAX and PyTorch have
> > > > > special
> > > > > case
> > > > > logic for most Python types vs NumPy dtypes (but booleans are
> > > > > the
> > > > > same for
> > > > > both):
> > > > > https://jax.readthedocs.io/en/latest/type_promotion.html
> > > > 
> > > > 


Re: [Numpy-discussion] Fixing/implementing value based Casting

2020-12-11 Thread Sebastian Berg
On Fri, 2020-12-11 at 12:09 +0100, Ralf Gommers wrote:
> On Wed, Dec 9, 2020 at 5:22 PM Sebastian Berg <   
> sebast...@sipsolutions.net>
> wrote:
> 
> > Hi all,
> > 
> > Sorry that this will again be a bit complicated :(. In brief:
> > 
> > * I would like to pass around scalars in some (partially new) C-API
> >   to implement value-based promotion.
> > * There are some subtle commutativity issues with promotion.
> >   Commutativity may change in that case (with respect to value-based
> >   promotion, probably for the better normally). [0]
> > 
> > 
> > In the past days, I have been looking into implementing value-based
> > promotion in the way that I had done it for the prototype before.
> > The idea was that NEP 42 allows for the creation of DTypes
> > dynamically, which does allow very powerful value-based
> > promotion/casting.
> > 
> > But I decided there are too many quirks with creating type
> > instances
> > dynamically (potentially very often) just to pass around one
> > additional
> > piece of information.
> > That approach was far more powerful, but it is power and complexity
> > that we do not require, given that:
> > 
> > * Value based promotion is only used for a mix of scalars and
> > arrays
> >   (where "scalar" is annoyingly defined as 0-D at the moment)
> > * I assume it is only relevant for `np.result_type` and promotion
> >   in ufuncs (which often uses `np.result_type`).
> >   `np.can_cast` has such behaviour, but I think it is easier [1].
> >   We could implement more powerful "value based" logic, but I doubt
> >   it is worthwhile.
> > * This is already stretching the Python C-API beyond its limits.
> > 
> > 
> > So I will suggest this instead which *must* modify some (poorly
> > defined) current behaviour:
> > 
> > 1. We always evaluate concrete DTypes first in promotion, this
> > means
> >    that in rare cases the non-commutativity of promotion may change
> >    the result dtype:
> > 
> >    np.result_type(-1, 2**16, np.float32)
> > 
> >    The same can also happen when you reorder the normal dtypes:
> > 
> >    np.result_type(np.int8, np.uint16, np.float32)
> >    np.result_type(np.float32, np.int8, np.uint16)
> > 
> >    in both cases the `np.float32` is moved to the front
> > 
> > 2. If we reorder the above operation, we can define that we never
> >    promote two "scalar values". Instead we convert both to a
> >    concrete one first.  This makes it effectively like:
> > 
> >    np.result_type(np.array(-1).dtype, np.array(2**16).dtype)
> > 
> >    This means that we never have to deal with promoting two values.
> > 
> > 3. We need additional private API (we were always going to need
> > some
> >    additional API); That API could become public:
> > 
> >    * Convert a single value into a concrete dtype, you could say
> >  the same as `self.common_dtype(None)`, but a dedicated
> > function
> >  seems simpler. A dtype like this will never use
> > `common_dtype()`.
> >    * `common_dtype_with_scalar(self, other, scalar)` (note that
> >  only one of the DTypes can have a scalar).
> >  As a fallback, this function can be implemented by converting
> >  to the concrete DType and retrying with the normal
> > `common_dtype`.
> > 
> >    (At least the second slot must be made public if we are to allow
> >    value-based promotion for user DTypes. I expect we will, but it is
> >    not particularly important to me right now.)
> > 
> > 4. Our public API (including new C-API) has to expose and take the
> >    scalar values. That means promotion in ufuncs will get DTypes
> > and
> >    `scalar_values`, although those should normally be `NULL` (or
> > None).
> > 
> >    In future python API, this is probably acceptable:
> > 
> >     np.result_type(*[t if v is None else v
> >                      for t, v in zip(dtypes, scalar_values)])
> > 
> >    In C, we need to expose a function below `result_type` which
> >    accepts both the scalar values and DTypes explicitly.
> > 
> > 5. For the future: As said many times, I would like to deprecate
> >    using value based promotion for anything except Python core
> > types.
> >    That just seems wrong and confusing.
> > 
> 
> I agree with this. 


It is tempting to wonder what would happen if we dropped it entirely,
but I fear my current assumption is that it should keep working largely
unchanged with careful deprecations hopefully added soon...


> Value-based promotion was never a great idea, so let's
> try to keep it as minimal as possible. I'm not even sure what kind of
> value-based promotion for non Python builtin types is happening now
> (?).


It is (roughly?) identical for all zero-dimensional objects:

    import numpy as np

    arr1 = np.array(1, dtype=np.int64)
    arr2 = np.array([1, 2], dtype=np.int32)

    (arr1 + arr2).dtype == np.int32
    (1 + arr2).dtype == np.int32

In the first addition `arr1` behaves like the Python `1` even though it
has a dtype attached.
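
A runnable variant of the comparison above, as a sketch. The result for the 0-D case is printed rather than asserted, because it depends on the NumPy version: the value-based rules described here were the behaviour when this mail was written, and later releases changed them. The variable names are illustrative only:

```python
import numpy as np

arr0d = np.array(1, dtype=np.int64)       # 0-D array with an explicit dtype
arr1d = np.array([1, 2], dtype=np.int32)

# 0-D array + 1-D array: version dependent (the case discussed above).
from_zero_d = (arr0d + arr1d).dtype

# Python int + 1-D array: the Python int does not force a wider dtype.
from_python_int = (1 + arr1d).dtype

# 1-D int64 array + 1-D int32 array: ordinary dtype promotion.
from_one_d = (np.array([1], dtype=np.int64) + arr1d).dtype

print("0-D int64 + int32 array:", from_zero_d)
print("Python 1  + int32 array:", from_python_int)
print("1-D int64 + int32 array:", from_one_d)
```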

The reason for this is probably that our entry-points greedily convert

Re: [Numpy-discussion] Fixing/implementing value based Casting

2020-12-11 Thread Ralf Gommers
On Wed, Dec 9, 2020 at 5:22 PM Sebastian Berg 
wrote:

> Hi all,
>
> Sorry that this will again be a bit complicated :(. In brief:
>
> * I would like to pass around scalars in some (partially new) C-API
>   to implement value-based promotion.
> * There are some subtle commutativity issues with promotion.
>   Commutativity may change in that case (with respect to value-based
>   promotion, probably for the better normally). [0]
>
>
> In the past days, I have been looking into implementing value-based
> promotion in the way that I had done it for the prototype before.
> The idea was that NEP 42 allows for the creation of DTypes dynamically,
> which does allow very powerful value-based promotion/casting.
>
> But I decided there are too many quirks with creating type instances
> dynamically (potentially very often) just to pass around one additional
> piece of information.
> That approach was far more powerful, but it is power and complexity
> that we do not require, given that:
>
> * Value based promotion is only used for a mix of scalars and arrays
>   (where "scalar" is annoyingly defined as 0-D at the moment)
> * I assume it is only relevant for `np.result_type` and promotion
>   in ufuncs (which often uses `np.result_type`).
>   `np.can_cast` has such behaviour, but I think it is easier [1].
>   We could implement more powerful "value based" logic, but I doubt
>   it is worthwhile.
> * This is already stretching the Python C-API beyond its limits.
>
>
> So I will suggest this instead which *must* modify some (poorly
> defined) current behaviour:
>
> 1. We always evaluate concrete DTypes first in promotion, this means
>    that in rare cases the non-commutativity of promotion may change
>    the result dtype:
>
>    np.result_type(-1, 2**16, np.float32)
>
>    The same can also happen when you reorder the normal dtypes:
>
>    np.result_type(np.int8, np.uint16, np.float32)
>    np.result_type(np.float32, np.int8, np.uint16)
>
>    in both cases the `np.float32` is moved to the front
>
> 2. If we reorder the above operation, we can define that we never
>    promote two "scalar values". Instead we convert both to a
>    concrete one first.  This makes it effectively like:
>
>    np.result_type(np.array(-1).dtype, np.array(2**16).dtype)
>
>    This means that we never have to deal with promoting two values.
>
> 3. We need additional private API (we were always going to need some
>    additional API); that API could become public:
>
>    * Convert a single value into a concrete dtype, you could say
>      the same as `self.common_dtype(None)`, but a dedicated function
>      seems simpler. A dtype like this will never use `common_dtype()`.
>    * `common_dtype_with_scalar(self, other, scalar)` (note that
>      only one of the DTypes can have a scalar).
>      As a fallback, this function can be implemented by converting
>      to the concrete DType and retrying with the normal `common_dtype`.
>
>    (At least the second slot must be made public if we are to allow
>    value-based promotion for user DTypes. I expect we will, but it is
>    not particularly important to me right now.)
>
> 4. Our public API (including new C-API) has to expose and take the
>    scalar values. That means promotion in ufuncs will get DTypes and
>    `scalar_values`, although those should normally be `NULL` (or None).
>
>    In future python API, this is probably acceptable:
>
>     np.result_type(*[t if v is None else v
>                      for t, v in zip(dtypes, scalar_values)])
>
>    In C, we need to expose a function below `result_type` which
>    accepts both the scalar values and DTypes explicitly.
>
> 5. For the future: As said many times, I would like to deprecate
>    using value-based promotion for anything except Python core types.
>    That just seems wrong and confusing.
>

I agree with this. Value-based promotion was never a great idea, so let's
try to keep it as minimal as possible. I'm not even sure what kind of
value-based promotion for non Python builtin types is happening now (?).
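
For readers following along, the ordering sensitivity in point 1 of the quoted proposal can be reproduced with plain pairwise promotion. This is only a sketch: `fold_promote` is a made-up helper that folds `np.promote_types` strictly left to right, which is not necessarily how `np.result_type` combines its arguments internally:

```python
from functools import reduce

import numpy as np


def fold_promote(*dtypes):
    """Promote dtypes strictly left to right, two at a time."""
    return reduce(np.promote_types, dtypes)


# int8 + uint16 needs int32, and int32 + float32 then needs float64.
left = fold_promote(np.int8, np.uint16, np.float32)

# Starting from float32, each small integer type fits, so it stays float32.
right = fold_promote(np.float32, np.int8, np.uint16)

print(left, right)
```

The two orderings give different answers, which is why the proposal fixes an evaluation order instead of relying on the order of the arguments.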

>    My only problem is that while I can warn (possibly sometimes too
>    often) when behaviour will change, I do not have a good idea about
>    silencing that warning.
>

Do you see a real issue with this somewhere, or is it all just corner
cases? In that case no warning seems okay.


>
> Note that this affects NEP 42 (a little bit). NEP 42 currently makes a
> nod towards the dynamic type creation, but falls short of actually
> defining it.
>
> So these rules have to be incorporated, but IMO they do not affect the
> general design choices in the NEP.
>
>
> There is probably even more complexity to be found here, but for now
> the above seems to be at least good enough to make headway...
>
>
> Any thoughts or clarity remaining that I can try to confuse? :)
>

My main question is why you're considering both deprecating and expanding
public API (in points 3 and 4). If you have a choice, keep everything
private I'd say.

My other question 

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Ralf Gommers
On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg 
wrote:

> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
> > sebast...@sipsolutions.net>
> > wrote:
> >
> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> > > > wrote:
> > > >
> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > > Regarding np.bool specifically, if you want to deprecate
> > > > > > > this,
> > > > > > > you
> > > > > > > might want to discuss this with us at the array API
> > > > > > > standard
> > > > > > > https://github.com/data-apis/array-api (which is currently
> > > > > > > in
> > > > > > > RFC
> > > > > > > stage). The spec uses bool as the name for the boolean
> > > > > > > dtype.
> > > > > > >
> > > > > > > Would it make sense for NumPy to change np.bool to just be
> > > > > > > the
> > > > > > > boolean
> > > > > > > dtype object? Unlike int and float, there is no ambiguity
> > > > > > > with
> > > > > > > bool,
> > > > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > > > builtin
> > > > > > > names
> > > > > > > in its namespace.
> > > > > >
> > > > > > We could keep the Python alias around (which for `dtype=` is
> > > > > > the
> > > > > > same
> > > > > > as `np.bool_`).
> > > > > >
> > > > > > I am not sure I like the idea of immediately shadowing the
> > > > > > builtin.
> > > > > > That is a switch we can avoid flipping (without warning);
> > > > > > `np.bool_`
> > > > > > and `bool` are fairly different beasts? [1]
> > > > >
> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > > > that
> > > > > are incompatible with existing ones. It's not something I would
> > > > > have
> > > > > done personally, but it's been this way for a long time.
> > > > >
> > > >
> > > > It may be defensible to keep np.bool as an alias for Python's
> > > > bool
> > > > even when we remove the other aliases.
> > >
> >
> > I'd agree with that.
> >
> >
> > > That is true, `int` is probably the most confusing, since it is not
> > > at
> > > all compatible to a Python integer, but rather the "default"
> > > integer
> > > (which happens to be the same as C `long` currently).
> > >
> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure
> > > whether
> > > you would prefer that or are mainly pointing out the possibility?
> > >
> >
> > Not sure what you mean with focus, focus on describing in the release
> > notes? Deprecating `np.int` seems like the most beneficial part of
> > this
> > whole exercise.
> >
>
> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
> and a "carefully chosen" set.
>

Just deprecating `np.int` may make sense. That will already raise
awareness, and leaving `np.float` as-is may prevent a lot of churn. And we
could then still deprecate `np.float` later. I also don't feel strongly
about `float` either way though.

I'm not sure why you'd specifically touch `long`, it's not really relevant
and it's not a builtin.

Cheers,
Ralf

> To be honest, I don't mind either way, so any stronger opinion will tip
> the scale for me personally (my default currently is to update the
> release notes to recommend the more descriptive names).
>
> There are probably more doc updates that would be nice, I will suggest
> updating a separate issue for that.
>
>
> > > Right now, my main take-away from the discussion is that it would be
> > > good to clarify the release notes a bit more.
> > >
> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > > `np.float64` over `np.float_`.
> > > For integers, I wonder if we should also suggest `np.int64`, even –
> > > or
> > > because – if the default integer on many systems is currently
> > > `np.int_`?
> > >
> >
> > I agree. I think we should recommend sane, descriptive names that do
> > the
> > right thing. So ideally we'd have people spell their dtype specifiers
> > as
> >   dtype=bool  # or np.bool
> >   dtype=np.float64
> >   dtype=np.int64
> >   dtype=np.complex128
> > The names with underscores at the end make little sense from a UX
> > perspective. And the C equivalents (single/double/etc) made sense 15
> > years
> > ago, but with the user base of today - the majority of whom will not
> > know C
> > fluently or at all - also don't make too much sense.
> >
> > The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and
> > 64
> > bits is likely to be a pitfall much more often than it is what the
> > user
> > actually needs, so shouldn't be recommended and probably deserves a
> > warning
> > in the docs.
>
> Right, there is one slight trickery because `np.intp` is often a great
> integer dtype to use, because it is the integer that NumPy uses for all
> things related to indexing and array sizes.
> (I would be happy to dig out my PR making `np.intp` the default NumPy
> 

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Ralf Gommers
On Fri, Dec 11, 2020 at 9:47 AM Eric Wieser 
wrote:

> >  you might want to discuss this with us at the array API standard
> > https://github.com/data-apis/array-api (which is currently in RFC
> > stage). The spec uses bool as the name for the boolean dtype.
>
> I don't fully understand this argument - `np.bool` is already not the
> boolean dtype. Either:
>
> * The spec is suggesting that `pkg.bool` be some arbitrary object that can
> be passed into a dtype argument and will produce a boolean array.
>   If this is the case, the spec could also just require that
> `dtype=builtins.bool` have this behavior.
>

Yes, this.

> * The spec is suggesting that `pkg.bool` is some rich dtype object.
>   Ignoring the question of whether this should be `np.bool_` or
> `np.dtype(np.bool_)`, it's currently neither, and changing it will break
> users relying on `np.bool(True) is True`.
>   That's not to say this isn't a sensible thing for the specification to
> have, it's just something that numpy can't conform to without breaking code.
>

It can have richer behaviour, there are no constraints there - but it's not
necessary.


> While it would be great if `np.bool_` could be spelt `np.bool`, I really
> don't think we can make that change without a long deprecation first (if at
> all).
>

Given that the standard API would live in a new namespace (for backwards
compat we can't possibly introduce it in the main namespace), `bool` there
can be the NumPy boolean dtype (if desired).

The key point is that `bool_` is a terrible name, and keeping an `np.bool`
that you can use as a dtype specifier is desirable.
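
For illustration, a minimal sketch of the dtype-specifier use described
above. It assumes only that the builtin `bool` is accepted as a `dtype=`
argument and resolves to the same dtype as `np.bool_`; the variable names
are made up for the example.

```python
import numpy as np

# The builtin `bool` is accepted as a dtype specifier and resolves to
# NumPy's boolean dtype, exactly like the underscore spelling `np.bool_`.
a = np.zeros(3, dtype=bool)
b = np.zeros(3, dtype=np.bool_)

same_dtype = a.dtype == b.dtype                    # both are the boolean dtype
resolves_to_bool_ = np.dtype(bool) == np.dtype(np.bool_)
kind = a.dtype.kind                                # 'b' marks a boolean dtype
itemsize = a.dtype.itemsize                        # one byte per element

print(same_dtype, resolves_to_bool_, kind, itemsize)
```

So as a dtype specifier the two spellings are interchangeable; the
difference only shows up when the names are used as scalar types.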

Cheers,
Ralf


> Eric
>
> On Thu, 10 Dec 2020 at 20:00, Sebastian Berg 
> wrote:
>
>> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
>> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
>> > sebast...@sipsolutions.net>
>> > wrote:
>> >
>> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
>> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
>> > > > wrote:
>> > > >
>> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
>> > > > >  wrote:
>> > > > > >
>> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
>> > > > > > > Regarding np.bool specifically, if you want to deprecate
>> > > > > > > this,
>> > > > > > > you
>> > > > > > > might want to discuss this with us at the array API
>> > > > > > > standard
>> > > > > > > https://github.com/data-apis/array-api (which is currently
>> > > > > > > in
>> > > > > > > RFC
>> > > > > > > stage). The spec uses bool as the name for the boolean
>> > > > > > > dtype.
>> > > > > > >
>> > > > > > > Would it make sense for NumPy to change np.bool to just be
>> > > > > > > the
>> > > > > > > boolean
>> > > > > > > dtype object? Unlike int and float, there is no ambiguity
>> > > > > > > with
>> > > > > > > bool,
>> > > > > > > and NumPy clearly doesn't have any issues with shadowing
>> > > > > > > builtin
>> > > > > > > names
>> > > > > > > in its namespace.
>> > > > > >
>> > > > > > We could keep the Python alias around (which for `dtype=` is
>> > > > > > the
>> > > > > > same
>> > > > > > as `np.bool_`).
>> > > > > >
>> > > > > > I am not sure I like the idea of immediately shadowing the
>> > > > > > builtin.
>> > > > > > That is a switch we can avoid flipping (without warning);
>> > > > > > `np.bool_`
>> > > > > > and `bool` are fairly different beasts? [1]
>> > > > >
>> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
>> > > > > that
>> > > > > are incompatible with existing ones. It's not something I would
>> > > > > have
>> > > > > done personally, but it's been this way for a long time.
>> > > > >
>> > > >
>> > > > It may be defensible to keep np.bool as an alias for Python's
>> > > > bool
>> > > > even when we remove the other aliases.
>> > >
>> >
>> > I'd agree with that.
>> >
>> >
>> > > That is true, `int` is probably the most confusing, since it is not
>> > > at
>> > > all compatible with a Python integer, but rather the "default"
>> > > integer
>> > > (which happens to be the same as C `long` currently).
>> > >
>> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure
>> > > whether
>> > > you would prefer that or are mainly pointing out the possibility?
>> > >
>> >
>> > Not sure what you mean with focus, focus on describing in the release
>> > notes? Deprecating `np.int` seems like the most beneficial part of
>> > this
>> > whole exercise.
>> >
>>
>> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
>> and a "carefully chosen" set.
>> To be honest, I don't mind either way, so any stronger opinion will tip
>> the scale for me personally (my default currently is to update the
>> release notes to recommend the more descriptive names).
>>
>> There are probably more doc updates that would be nice, I will suggest
>> updating a separate issue for that.
>>
>>
>> > > Right now, my main take-away from the discussion is that it would be
>> > > good to clarify the release notes a bit more.
>> > >
>> > > Using 

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Eric Wieser
>  you might want to discuss this with us at the array API standard
> https://github.com/data-apis/array-api (which is currently in RFC
> stage). The spec uses bool as the name for the boolean dtype.

I don't fully understand this argument - `np.bool` is already not the
boolean dtype. Either:

* The spec is suggesting that `pkg.bool` be some arbitrary object that can
be passed into a dtype argument and will produce a boolean array.
  If this is the case, the spec could also just require that
`dtype=builtins.bool` have this behavior.
* The spec is suggesting that `pkg.bool` is some rich dtype object.
  Ignoring the question of whether this should be `np.bool_` or
`np.dtype(np.bool_)`, it's currently neither, and changing it will break
users relying on `np.bool(True) is True`.
  That's not to say this isn't a sensible thing for the specification to
have, it's just something that numpy can't conform to without breaking code.

While it would be great if `np.bool_` could be spelt `np.bool`, I really
don't think we can make that change without a long deprecation first (if at
all).
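
A minimal sketch of the distinction at stake, assuming (as stated above)
that `np.bool` is currently just the builtin `bool`. It compares the
builtin with the NumPy scalar type `np.bool_`; the variable names are
illustrative only.

```python
import numpy as np

py_true = bool(1)          # the builtin always returns the singleton True
np_true = np.bool_(True)   # NumPy's boolean scalar type

builtin_is_singleton = py_true is True          # identity holds for the builtin
numpy_is_singleton = np_true is True            # a distinct object, so False
equal_in_value = bool(np_true == True)          # still compares equal
is_python_bool = isinstance(np_true, bool)      # np.bool_ is not a bool subclass

print(builtin_is_singleton, numpy_is_singleton, equal_in_value, is_python_bool)
```

Code that relies on the identity check keeps working only while the name
refers to the builtin, which is why the change would need a deprecation.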

Eric

On Thu, 10 Dec 2020 at 20:00, Sebastian Berg 
wrote:

> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
> > sebast...@sipsolutions.net>
> > wrote:
> >
> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> > > > wrote:
> > > >
> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > > Regarding np.bool specifically, if you want to deprecate
> > > > > > > this,
> > > > > > > you
> > > > > > > might want to discuss this with us at the array API
> > > > > > > standard
> > > > > > > https://github.com/data-apis/array-api (which is currently
> > > > > > > in
> > > > > > > RFC
> > > > > > > stage). The spec uses bool as the name for the boolean
> > > > > > > dtype.
> > > > > > >
> > > > > > > Would it make sense for NumPy to change np.bool to just be
> > > > > > > the
> > > > > > > boolean
> > > > > > > dtype object? Unlike int and float, there is no ambiguity
> > > > > > > with
> > > > > > > bool,
> > > > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > > > builtin
> > > > > > > names
> > > > > > > in its namespace.
> > > > > >
> > > > > > We could keep the Python alias around (which for `dtype=` is
> > > > > > the
> > > > > > same
> > > > > > as `np.bool_`).
> > > > > >
> > > > > > I am not sure I like the idea of immediately shadowing the
> > > > > > builtin.
> > > > > > That is a switch we can avoid flipping (without warning);
> > > > > > `np.bool_`
> > > > > > and `bool` are fairly different beasts? [1]
> > > > >
> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > > > that
> > > > > are incompatible with existing ones. It's not something I would
> > > > > have
> > > > > done personally, but it's been this way for a long time.
> > > > >
> > > >
> > > > It may be defensible to keep np.bool as an alias for Python's
> > > > bool
> > > > even when we remove the other aliases.
> > >
> >
> > I'd agree with that.
> >
> >
> > > That is true, `int` is probably the most confusing, since it is not
> > > at
> > > all compatible with a Python integer, but rather the "default"
> > > integer
> > > (which happens to be the same as C `long` currently).
> > >
> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure
> > > whether
> > > you would prefer that or are mainly pointing out the possibility?
> > >
> >
> > Not sure what you mean with focus, focus on describing in the release
> > notes? Deprecating `np.int` seems like the most beneficial part of
> > this
> > whole exercise.
> >
>
> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
> and a "carefully chosen" set.
> To be honest, I don't mind either way, so any stronger opinion will tip
> the scale for me personally (my default currently is to update the
> release notes to recommend the more descriptive names).
>
> There are probably more doc updates that would be nice, I will suggest
> updating a separate issue for that.
>
>
> > > Right now, my main take-away from the discussion is that it would be
> > > good to clarify the release notes a bit more.
> > >
> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > > `np.float64` over `np.float_`.
> > > For integers, I wonder if we should also suggest `np.int64`, even if
> > > (or because) the default integer on many systems is currently
> > > `np.int_`?
> > >
> >
> > I agree. I think we should recommend sane, descriptive names that do
> > the
> > right thing. So ideally we'd have people spell their dtype specifiers
> > as
> >   dtype=bool  # or np.bool
> >   dtype=np.float64
> >   dtype=np.int64
> >   dtype=np.complex128
> > The names with underscores at the end make little sense from a UX
> > perspective. And the