Re: [Numpy-discussion] backwards compatibility and deprecation policy NEP

2018-07-24 Thread Ralf Gommers
On Tue, Jul 24, 2018 at 8:07 PM, Nathaniel Smith  wrote:

> On Sun, Jul 22, 2018 at 12:28 PM, Ralf Gommers wrote:
> > On Sat, Jul 21, 2018 at 7:15 PM, Nathaniel Smith  wrote:
> >> Speaking of examples: I hate to say this because in general I think
> >> using examples is a great idea. But... I think you should delete most
> >> of these examples. The problem is scope creep: the goal for this NEP
> >> (IMO) should be to lay out the principles we use to think about these
> >> issues in general, but right now it comes across as trying to lay down
> >> a final resolution on lots of specific issues (including several where
> >> there are ongoing conversations). It ends up like trying to squish
> >> multiple NEPs into one, which makes it hard to discuss, and also
> >> distracts from the core purpose.
> >
> >
> > I'm not sure this is the best thing to do. I can remove a couple, but
> > aiming to be "totally uncontroversial" is almost impossible given the
> > topic of the NEP.
>
> Of course the NEP itself will have some things to discuss – but I
> think the discussion will be more productive if we can stay focused on
> the core part of the NEP, which is the general principles we use to
> evaluate each specific situation as it comes up. Look at how much of
> the discussion so far has gotten derailed onto topics like
> subclassing, submodules, etc.
>

The subclassing discussion was actually illuminating and useful. Maybe it
does deserve its own write-up somewhere though. Happy to remove that too.
Would then like to put it somewhere else - in the docs, another NEP, ...?

The submodules one I'd really like to keep.


> > The diag view example is important I think, it's the second most
> > discussed backwards compatibility issue next to histogram. I'm happy to
> > remove the statement on what should happen with it going forward though.
>
> It's the most discussed issue because it was the test case where we
> developed all these policies in the first place :-).


Pretty sure that's not true, we had policies long before that plus it was
not advertised as a test case for backwards compat (it's just an
improvement that someone wanted to implement). But well, I don't care
enough about this particular one to argue about it - I'll remove it.

> I'm not sure it's
> particularly interesting aside from that, and that specific history
> ("let's come up with a transition plan for this feature that no-one
> actually cares about, b/c no-one cares about it so it's a good thing
> to use as a test case") is unlikely to be repeated.
>
> > Then, I think it's not unreasonable to draw a couple of hard lines. For
> > example, removing complete submodules like linalg or random has ended up
> > on some draft brainstorm roadmap list because someone (no idea who) put it
> > there after a single meeting. Clearly the cost-benefit of that is such
> > that there's no point even discussing that more, so I'd rather draw that
> > line here than every time someone opens an issue. Very recent example:
> > https://github.com/numpy/numpy/issues/11457 (remove auto-import of
> > numpy.testing).
>
> I can see an argument for splitting random and linalg into their own
> modules, which numpy depends on and imports so that existing code
> doesn't break.


Me too, that could happen. But that's unrelated to backwards compatibility.

> E.g. this might let people install an old version of
> random if they needed to reproduce some old results, or help us merge
> numpy and scipy's linalg modules into a single package. I agree though
> that making 'np.linalg' start raising AttributeError is a total
> non-starter.
>

It is, hence why I say above that I'd like to keep that example.


> >> Regarding the major version number thing: ugh do we really want to
> >> talk about this more. I'd probably leave it out of the NEP entirely.
> >> If it stays in, I think it needs a clearer description of what counts
> >> as a "major" change.
> >
> >
> > I think it has value to keep it, and that it's not really possible to
> > come up with a very clear description of "major". In particular, I'd like
> > every deprecation message to say "this deprecated feature will be removed
> > by release X.Y.0". At the moment we don't do that, so if users see a
> > message they don't know if a removal will happen next year, in the far
> > future (2.0), or never. The major version thing is quite useful to signal
> > our intent. Doesn't mean we need to exhaustively discuss when to do a 2.0
> > though, I agree that that's not a very useful discussion right now.
>
> The problem is that "2.0" means a lot of different things to different
> people, not just "some future date to be determined", so using it that
> way will confuse people. Also, it's hard to predict when a deprecation
> will actually happen... it's very common that we adjust the schedule
> as we go (e.g. when we try to remove it and then discover it breaks
> everyone so we have to put it back for a while).
>
> I feel like it would 

Re: [Numpy-discussion] backwards compatibility and deprecation policy NEP

2018-07-24 Thread Nathaniel Smith
On Sun, Jul 22, 2018 at 12:28 PM, Ralf Gommers  wrote:
> On Sat, Jul 21, 2018 at 7:15 PM, Nathaniel Smith  wrote:
>> Speaking of examples: I hate to say this because in general I think
>> using examples is a great idea. But... I think you should delete most
>> of these examples. The problem is scope creep: the goal for this NEP
>> (IMO) should be to lay out the principles we use to think about these
>> issues in general, but right now it comes across as trying to lay down
>> a final resolution on lots of specific issues (including several where
>> there are ongoing conversations). It ends up like trying to squish
>> multiple NEPs into one, which makes it hard to discuss, and also
>> distracts from the core purpose.
>
>
> I'm not sure this is the best thing to do. I can remove a couple, but aiming
> to be "totally uncontroversial" is almost impossible given the topic of the
> NEP.

Of course the NEP itself will have some things to discuss – but I
think the discussion will be more productive if we can stay focused on
the core part of the NEP, which is the general principles we use to
evaluate each specific situation as it comes up. Look at how much of
the discussion so far has gotten derailed onto topics like
subclassing, submodules, etc.

> The diag view example is important I think, it's the second most
> discussed backwards compatibility issue next to histogram. I'm happy to
> remove the statement on what should happen with it going forward though.

It's the most discussed issue because it was the test case where we
developed all these policies in the first place :-). I'm not sure it's
particularly interesting aside from that, and that specific history
("let's come up with a transition plan for this feature that no-one
actually cares about, b/c no-one cares about it so it's a good thing
to use as a test case") is unlikely to be repeated.

> Then, I think it's not unreasonable to draw a couple of hard lines. For
> example, removing complete submodules like linalg or random has ended up on
> some draft brainstorm roadmap list because someone (no idea who) put it
> there after a single meeting. Clearly the cost-benefit of that is such that
> there's no point even discussing that more, so I'd rather draw that line
> here than every time someone opens an issue. Very recent example:
> https://github.com/numpy/numpy/issues/11457 (remove auto-import of
> numpy.testing).

I can see an argument for splitting random and linalg into their own
modules, which numpy depends on and imports so that existing code
doesn't break. E.g. this might let people install an old version of
random if they needed to reproduce some old results, or help us merge
numpy and scipy's linalg modules into a single package. I agree though
that making 'np.linalg' start raising AttributeError is a total
non-starter.
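
To make the "split out but still imported" idea concrete, here is a rough sketch of how a parent package could keep an old import path working after the code moves elsewhere. The names `standalone_linalg` and `parentpkg` are made up for illustration, and the modules are built in memory so the snippet runs on its own; this is not how NumPy is actually laid out.

```python
import sys
import types

# Hypothetical standalone distribution that would hold the split-out code.
standalone = types.ModuleType("standalone_linalg")
standalone.norm = lambda xs: sum(x * x for x in xs) ** 0.5
sys.modules["standalone_linalg"] = standalone

# Hypothetical parent package that depends on the standalone module and
# re-exports it under the old dotted path.
parent = types.ModuleType("parentpkg")
parent.__path__ = []  # mark it as a package so dotted imports resolve
parent.linalg = standalone
sys.modules["parentpkg"] = parent
sys.modules["parentpkg.linalg"] = standalone

# Existing user code keeps working unchanged.
import parentpkg.linalg
from parentpkg.linalg import norm

print(norm([3, 4]))
```

The point of the sketch is only that `parentpkg.linalg` never starts raising AttributeError; where the implementation lives becomes a packaging detail.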

>> Regarding the major version number thing: ugh do we really want to
>> talk about this more. I'd probably leave it out of the NEP entirely.
>> If it stays in, I think it needs a clearer description of what counts
>> as a "major" change.
>
>
> I think it has value to keep it, and that it's not really possible to come
> up with a very clear description of "major". In particular, I'd like every
> deprecation message to say "this deprecated feature will be removed by
> release X.Y.0". At the moment we don't do that, so if users see a message
> they don't know if a removal will happen next year, in the far future (2.0),
> or never. The major version thing is quite useful to signal our intent.
> Doesn't mean we need to exhaustively discuss when to do a 2.0 though, I
> agree that that's not a very useful discussion right now.

The problem is that "2.0" means a lot of different things to different
people, not just "some future date to be determined", so using it that
way will confuse people. Also, it's hard to predict when a deprecation
will actually happen... it's very common that we adjust the schedule
as we go (e.g. when we try to remove it and then discover it breaks
everyone so we have to put it back for a while).

I feel like it would be better to do this based on time -- like say
"this will be removed " or something, and then it
might take longer but not shorter?
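
A time-based message along those lines could look roughly like the sketch below. The helper `deprecation_message`, the function `old_function`, the date and the "no earlier than" wording are all placeholders I picked for illustration, not an agreed format.

```python
import datetime
import warnings


def deprecation_message(name, not_before):
    """Build a message that states the earliest possible removal date."""
    return (
        f"{name} is deprecated and will be removed no earlier than "
        f"{not_before.isoformat()}."
    )


def old_function():
    # Hypothetical deprecated function, used only for illustration.
    warnings.warn(
        deprecation_message("old_function", datetime.date(2019, 7, 1)),
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this line
    )
    return 42
```

With this shape the promise is one-sided: the removal can slip later if it turns out to break too much, but users never see it happen before the stated date.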

Re: version numbers, I actually think numpy should consider switching
to calver [1]. We'd be giving up on being able to do a "2.0", but
that's kind of a good thing -- if a change is too big to handle
through our normal deprecation cycle, then it's probably too big to
handle period. And "numpy 2018.3" gives you more information than our
current scheme -- for example you could see at a glance that numpy
2012.1 is super out-of-date, and we could tell people that numpy
2019.1 will drop python 2 support.
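
As a small illustration of the "see at a glance" point, a version string in a YEAR.N scheme can be compared with almost no machinery. This is only a sketch assuming a two-part "YYYY.N" format; `parse_calver` and `years_behind` are hypothetical helpers, not anything NumPy provides.

```python
def parse_calver(version):
    """Split a 'YYYY.N' string into an (year, release) tuple of ints."""
    year, release = version.split(".")
    return int(year), int(release)


def years_behind(installed, current):
    """Number of calendar years between two calver release strings."""
    return parse_calver(current)[0] - parse_calver(installed)[0]


print(years_behind("2012.1", "2018.3"))
print(parse_calver("2018.3") < parse_calver("2019.1"))
```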

...But that's a whole other discussion, and we shouldn't get derailed
onto it here in this NEP's thread :-).

[1] https://calver.org/

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
NumPy-Discussion mailing list

Re: [Numpy-discussion] backwards compatibility and deprecation policy NEP

2018-07-24 Thread Ralf Gommers
On Mon, Jul 23, 2018 at 11:46 AM, Stephan Hoyer  wrote:

> On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers wrote:
>
>> Then, I think it's not unreasonable to draw a couple of hard lines. For
>> example, removing complete submodules like linalg or random has ended up on
>> some draft brainstorm roadmap list because someone (no idea who) put it
>> there after a single meeting. Clearly the cost-benefit of that is such that
>> there's no point even discussing that more, so I'd rather draw that line
>> here than every time someone opens an issue.
>>
>
> I'm happy to give the broader context here. This came up in the NumPy
> sprint in Berkeley back in May of this year.
>
> The existence of all of these submodules in NumPy is mostly a historical
> artifact, due to the previously poor state of Python packaging.
>

That's true.

> Our thinking was that perhaps this could be revisited in this age of conda
> and manylinux wheels.
>
> This isn't to say that it would actually be a good idea to remove any of
> these submodules today. Separate modules bring both benefits and downsides.
>
> Benefits:
> - It can be easier to maintain projects separately rather than inside
> NumPy, e.g., bug fixes do not need to be tied to NumPy releases.
> - Separate modules could reduce the maintenance burden for NumPy itself,
> because energy gets focused on core features.
>

That's certainly not a given though. Those things still need to be
maintained, and splitting up packages increases overhead for e.g. doing
releases. It's quite unclear if splitting would increase the developer pool.

> - For projects for which a rewrite would be warranted (e.g., numpy.ma and
> scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.
>

Agreed. That can happen and is already happening though (e.g.
https://github.com/pydata/sparse). It doesn't have much to do with removing
existing submodules.

> - Packaging. As mentioned above, this is no longer as beneficial as it once
> was.
>

True, no longer as beneficial - that's not really a benefit though,
packaging just works fine either way.


> Downsides:
> - It's harder to find separate packages than NumPy modules.
> - If the maintainers and maintenance processes are very similar, then
> separate projects can add unnecessary overhead.
> - Changing from bundled to separate packages imposes a significant cost
> upon their users (e.g., due to changed import paths).
>
> Coming back to the NEP:
>
>> The impact on downstream libraries and users would be very large, and
>> maintenance of these modules would still have to happen.  Therefore this
>> is simply not a good idea; removing these submodules should not happen
>> even for a new major version of NumPy.
>>
>
> I'm afraid I disagree pretty strongly here. There should absolutely be a
> high bar for removing submodules, but we should not rule out the
> possibility entirely.
>

My thinking here is: given that we're not even willing to remove
MaskedArray (NEP 17), for which the benefits of removing are a lot higher
and the user base smaller, we are certainly not going to be removing random
or linalg or distutils in the foreseeable future. So we may as well say
that. Otherwise we have the discussions regularly (we actually just did
have one for numpy.testing in gh-11457), which is just a waste of energy.


> It is certainly true that modules need to be maintained for them to
> remain usable, but I particularly object to the idea that this should be
> forced upon NumPy maintainers.
>

Nothing is "forced on you" as a NumPy maintainer - we are all individuals
who do things voluntarily (okay, almost all - we have some funding now) and
can choose to not spend any time on certain parts of NumPy. MaskedArray
languished for quite a while before Marten and Eric spent a lot of time in
improving it and closing lots of issues related to it. That can happen.

> Open source projects need to be maintained by their users, and if their
> users cannot devote energy to maintain them then the open source project
> deserves to die. This is just as true for NumPy submodules as for external
> packages.
>
> NumPy itself only has an obligation to maintain submodules if they are
> actively needed by the NumPy project and valued by active NumPy
> contributors.
>

This is a very developer-centric view. We have lots of users and also lots of
no-longer-active contributors. The needs, interests and previous work put
into NumPy of those groups of people matter.

> Otherwise, they should be maintained by users who care about them --
> whether that means inside or outside NumPy. It serves nobody well to insist
> on NumPy developers maintaining projects that they don't use or care about.
>

> I would like to suggest the following criteria for considering removing a
> NumPy submodule:
> 1. It cannot be relied upon by other portions of NumPy.
> 2. Either
> (a) the submodule imposes a significant maintenance burden upon the rest
> of NumPy that is not balanced by the level of dedicated 

Re: [Numpy-discussion] Roadmap proposal, v3

2018-07-24 Thread Hameer Abbasi
Hey Stefan/Ralf/Stephan,

This looks nice, generally what the community agrees on. Great work, and
thanks for putting this together.

Best regards,
Hameer Abbasi
Sent from Astro  for Mac

On 24. Jul 2018 at 21:04, Stefan van der Walt  wrote:


Hi everyone,

Please take a look at the latest roadmap proposal:

https://github.com/numpy/numpy/pull/11611

This is a living document, so can easily be modified in the future, but
we'd like to get in place a document that corresponds fairly closely
with current community priorities.

Best regards,
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] backwards compatibility and deprecation policy NEP

2018-07-24 Thread Hameer Abbasi
On 23. Jul 2018 at 19:46, Stephan Hoyer  wrote:


On Sat, Jul 21, 2018 at 6:40 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> But I think the subclassing section is somewhat misleading in suggesting
> `ndarray` is not well designed to be subclassed. At least, for neither my
> work on Quantity nor that on MaskedArray, I've found that the design of
> `ndarray` itself was a problem. Instead, it was the functions that were, as
> most were not written with subclassing or duck typing in mind, but rather
> with the assumption that all input should be an array, and that somehow it
> is useful to pass anything users pass in through `asarray`. With then
> layers on top to avoid this in specific circumstances... But perhaps this
> is what you meant?
>

I can't speak for Ralf, but yes, this is part of what I had in mind. I
don't think you can separate "core" objects/methods from functions that act
on them. Either the entire system is designed to handle subclassing through
some well-defined interface or it is not.

If you don't design a system for subclassing but allow it anyways (



and it's impossible to prohibit programmatically in Python


This isn’t really true. Metaprogramming to the rescue I guess.
https://stackoverflow.com/questions/16564198/pythons-equivalent-of-nets-sealed-class#16564232
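
For reference, one way the metaprogramming route can work is a metaclass that rejects any class listing a sealed class among its bases. This is a minimal sketch of that general technique, not a copy of the linked answer; `Sealed` and `Final` are names chosen here for illustration.

```python
class Sealed(type):
    """Metaclass that refuses to create subclasses of its instances."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        for base in bases:
            # Any base that was itself created by Sealed is off limits.
            if isinstance(base, Sealed):
                raise TypeError(
                    f"{base.__name__} is sealed and cannot be subclassed"
                )
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class Final(metaclass=Sealed):
    """Example class that can be instantiated but not subclassed."""


try:
    class Sub(Final):
        pass
except TypeError as exc:
    print(exc)
```

This only blocks ordinary `class` statements and `type(...)` calls that go through the metaclass; whether doing so would be desirable for something like ndarray is a separate question.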

Best regards,
Hameer Abbasi
Sent from Astro  for Mac

), then you can easily end up with very fragile systems that are difficult
to modify or extend. As Ralf noted in the NEP, "Some of them change the
behavior of ndarray methods, making it difficult to write code that accepts
array duck-types." These changes end up having implications for apparently
unrelated functions (e.g., np.median needing to call np.mean internally to
handle units properly). I don't think anyone really wants that sort of
behavior or lock-in in NumPy itself, but of course that is the price we pay
for not having well-defined interfaces :). Hopefully NEP-18 will change
that, and eventually we will be able to remove hacks from NumPy that we
added only because there weren't any better alternatives available.

For the NEP itself, I would not mention "A future change in NumPy to not
support subclassing," because it's not as if subclassing is suddenly not
going to work as of a certain NumPy release.  Certain types of subclasses
(e.g., those that only add extra methods and/or metadata and do not modify
any existing functionality) have never been a problem and will be fine to
support indefinitely.

Rather, we might state that "At some point in the future, the NumPy
development team may no longer be interested in maintaining workarounds for
specific subclasses, because other interfaces for extending NumPy are
believed to be more maintainable/preferred."

Overall, it seems to me that these days in the python eco-system
> subclassing is simply expected to work.
>

I don't think this is true. You can use subclassing on builtin types like
dict, but just because you can do it doesn't mean it's a good idea. If you
change built-in methods to work in different ways other things will break
in unexpected ways (or simply not change, also in unexpected ways).
Probably the only really safe way to subclass a dictionary is to define the
__missing__() method and not change any other aspects of the public
interface directly.
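
A small sketch of that "safe" pattern, for concreteness: the subclass defines only `__missing__` and leaves every other dict method alone. `CountingDict` is a made-up name for illustration.

```python
class CountingDict(dict):
    """dict subclass whose only customization is a default for missing keys."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent; nothing is stored.
        return 0


counts = CountingDict()
for word in ["a", "b", "a"]:
    counts[word] = counts[word] + 1

print(counts)
print(counts.get("z"))  # dict.get does not consult __missing__
```

Even here one of the surprises shows up: `counts["z"]` gives 0 while `counts.get("z")` gives None, because only `__getitem__` goes through `__missing__`.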



Re: [Numpy-discussion] Roadmap proposal, v3

2018-07-24 Thread Gael Varoquaux
Looks great! Thank you for doing this!

Gaël

On Tue, Jul 24, 2018 at 12:04:49PM -0700, Stefan van der Walt wrote:
> Hi everyone,

> Please take a look at the latest roadmap proposal:

> https://github.com/numpy/numpy/pull/11611

> This is a living document, so can easily be modified in the future, but
> we'd like to get in place a document that corresponds fairly closely
> with current community priorities.

> Best regards,
> Stéfan

-- 
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone:  ++ 33-1-69-08-79-68
http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux


[Numpy-discussion] Roadmap proposal, v3

2018-07-24 Thread Stefan van der Walt
Hi everyone,

Please take a look at the latest roadmap proposal:

https://github.com/numpy/numpy/pull/11611

This is a living document, so can easily be modified in the future, but
we'd like to get in place a document that corresponds fairly closely
with current community priorities.

Best regards,
Stéfan


Re: [Numpy-discussion] backwards compatibility and deprecation policy NEP

2018-07-24 Thread Ralf Gommers
On Mon, Jul 23, 2018 at 1:43 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
>> Rather, we might state that "At some point in the future, the NumPy
>> development team may no longer be interested in maintaining workarounds for
>> specific subclasses, because other interfaces for extending NumPy are
>> believed to be more maintainable/preferred."
>>
> That sentence I think covers it very well. Subclasses can and should be
> expected to evolve along with numpy, and if that means some numpy-version
> dependent parts, so be it (we have those now...).  It is just that one
> should not remove functionality without providing the better alternative!
>

Thanks for the input both, that makes sense. I'll try and rewrite the
section along these lines.

Cheers,
Ralf