Re: [Numpy-discussion] Proposal to add clause to license prohibiting use by oil and gas extraction companies

2020-07-01 Thread Ryan May
Hi,

I can respect where this comes from, especially as someone who works in
atmospheric science. I'm glad people are trying to do what they can.

With that said, I am -1000 on this. In my opinion, a software license is a
wholly inappropriate venue for trying to do this. At the top of the home
page for the Free Software Foundation: "Free software developers guarantee
everyone equal rights to their programs". What you're proposing is
essentially "everyone equal rights so long as they aren't working on things
I disagree with". The nobility of the cause in my opinion doesn't justify
compromising the values behind free software.

As someone with some minuscule commits in the numpy codebase, I would not
want them distributed under the modified license. As a developer of other
downstream projects, I would switch to the BSD fork of the project that
would inevitably materialize.

Ryan

On Wed, Jul 1, 2020 at 12:35 PM John Preston  wrote:

> Hello all,
>
> The following proposal was originally issue #16722 on GitHub but at
> the request of Matti Picus I am moving the discussion to this list.
>
>
> "NumPy is the fundamental package needed for scientific computing with
> Python."
>
> I am asking the NumPy project to leverage its position as a core
> dependency among statistical, numerical, and ML projects, in the
> pursuit of climate justice. It is easy to identify open-source
> software used by the oil and gas industry which relies on NumPy [1]
> [2] , and it is highly likely that NumPy is used in closed-source and
> in-house software at oil and gas extraction companies such as Aramco,
> ExxonMobil, BP, Shell, and others. I believe it is possible to use
> software licensing to discourage the use of NumPy and dependent
> packages by companies such as these, and that doing so would frustrate
> the ability of these companies to identify and extract new oil and gas
> reserves.
>
> I propose NumPy's current BSD 3-Clause license be extended to include
> the following conditions, in line with the Climate Strike License [3]
> :
>
> * The Software may not be used in applications and services that
> are used for or
>aid in the exploration, extraction, refinement, processing, or
> transportation
>of fossil fuels.
>
> * The Software may not be used by companies that rely on fossil
> fuel extraction
>as their primary means of revenue. This includes but is not
> limited to the
>companies listed at https://climatestrike.software/blocklist
>
> I accept that there are issues around adopting such a proposal, including
> that:
>
> * addition of such clauses violates the Open Source Initiative's
>   canonical Open Source Definition, which explicitly excludes licenses
>   that limit re-use "in a specific field of endeavor", and therefore if
>   these clauses were adopted NumPy would no longer "be open-source" by
>   this definition;
>
> * there may be collateral damage among the wider user base and project
>   sponsorship, due to the vague nature of the first clause, and this may
>   affect the longevity of the project and its standing within the
>   Python, numerical, statistical, and ML communities.
>
> My intention with the opening of this issue is to promote constructive
> discussion of the use of software licensing -- and other measures --
> for working towards climate justice -- and other forms of justice --
> in the context of NumPy and other popular open-source libraries. Some
> people will say that NumPy is "just a tool" and that it sits
> independent of how it is used, but due to its utility and its
> influence as a major open-source library, I think it is essential that
> we consider the position of the Climate Strike License authors, that
> "as tech workers, we should take responsibility in how our software is
> used".
>
> Many thanks to all of the contributors who have put so much time and
> energy into NumPy. ✨ ❤️ 
>
> [1] https://github.com/gazprom-neft/petroflow
> [2] https://github.com/climate-strike/analysis
> [3] https://github.com/climate-strike/license
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 
Ryan May


Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

2020-03-17 Thread Ryan May
On Tue, Mar 17, 2020 at 4:35 PM Chris Meyer  wrote:

> > On Mar 17, 2020, at 1:02 PM, Sebastian Berg 
> wrote:
> >
> > in the spirit of trying to keep this moving, can I assume that the main
> > reason for little discussion is that the actual changes proposed are
> > not very far reaching as of now?  Or is the reason that this is a
> > fairly complex topic that you need more time to think about it?
> > If it is the latter, is there some way I can help with it?  I tried to
> > minimize how much is part of this initial NEP.
>
> One reason for not responding is that it seems a lot of discussion of this
> has already taken place and this NEP is presented more as a conclusion
> summary rather than a discussion point.
>
> I implement scientific imaging software and overall this NEP looks useful.
>
> My only caveat is that I don’t think tracking physical units should be a
> primary use case. Units are fundamentally different than data types, even
> though there are libraries out there that treat them more like data types.
>

I strongly disagree. Right now, you need to implement a custom container to
handle units, which makes it exceedingly difficult to then properly
interact with other array_like objects, like dask, pandas, and xarray;
handling units is completely orthogonal to handling slicing operations,
data access, etc. so having to implement a container is overkill. Unit
information describes information about the type of each of the elements
within an array, including describing how operations between individual
elements work. This sounds exactly like a dtype to me.


> For instance, it makes sense to have the same physical unit but with
> different storage types. For instance, data with nanometer physical units
> can be stored as a float32 or as an int16 and be equally useful.
>

Yes, you would have the unit tracking as a mixin that would allow different
storage types, absolutely.
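As a rough illustration of the idea (the class and names here are entirely hypothetical, not the NEP 41 API), the unit is metadata about the elements and can ride along independently of the storage dtype:

```python
import numpy as np

class UnitTagged:
    """Hypothetical sketch only: a unit label carried alongside any
    storage dtype, independent of the storage width."""

    def __init__(self, values, unit, dtype):
        self.values = np.asarray(values, dtype=dtype)
        self.unit = unit

# The same physical unit (nanometers) with different storage types
nm_f32 = UnitTagged([1.5, 2.5], "nm", np.float32)
nm_i16 = UnitTagged([1, 2], "nm", np.int16)
```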


> In addition, a unit is something that is mutated by the operation. For
> instance, reducing a 2D image with physical units by a factor of two in
> each dimension produces a different unit scaling (1km/pixel goes to
> 2km/pixel); whereas cropping the center half does not (1km/pixel stays as
> 1km/pixel).
>

I'm not sure what your point is. Dtypes can change for some operations
(np.sqrt(np.arange(5)) produces a float array) while staying the same for
others (e.g. addition).
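For example, with plain NumPy (no units involved at all):

```python
import numpy as np

a = np.arange(5)              # integer dtype
print(np.sqrt(a).dtype)       # sqrt promotes the result to float64
print((a + a).dtype)          # addition keeps the integer dtype
```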


> Finally, units may be different for each axis in multidimensional data.
> For instance, we want a float32 array with two dimensions with the units on
> one dimension being time and the other dimension being spatial. (3 seconds
> x 50 nm).
>

The units for an array describe the elements *within* the array, they would
have nothing to do with the dimensions. So for an array of image data, e.g.
brightness temperatures, you would have physical units (e.g. Kelvin). You
would have separate arrays of coordinates describing the spatial extent of
the data along the relevant dimensions--each of these arrays of coordinates
would have their own physical quantity information.

Ryan

-- 
Ryan May


Re: [Numpy-discussion] Allowing Dependabot access to the numpy repo

2019-08-29 Thread Ryan May
Hi,

The answer to why Dependabot needs write permission seems to be that it
needs it in order to work with private repos:

https://github.com/dependabot/feedback/issues/22

There doesn't seem to be any way around it... :(

Ryan

On Thu, Aug 29, 2019 at 12:04 AM Matti Picus  wrote:

> In PR 14378 https://github.com/numpy/numpy/pull/14378 I moved all our
> python test dependencies to a test_requirements.txt file (for building
> numpy the only requirement is cython). This is worthwhile since it unifies the
> different "pip install" commands across the different CI systems we use.
> Additionally, there are services that monitor the file and will issue a PR
> if any of those packages have a new release, so we can test out new
> versions of dependencies in a controlled fashion. Someone suggested
> Dependabot (thanks Ryan), which turns out to be run by a company bought by
> github itself.
>
>
> When signing up for the service, it asks for permissions:
> https://pasteboard.co/IuTeWNz.png. The service is in use by other
> projects like cpython. Does it seem OK to sign up for this service?
>
>
> Matti
>


-- 
Ryan May


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-01 Thread Ryan May
>
> When experts say that something is a bad idea, and when the people who
> a CoC is supposed to protect says it makes them feel unsafe, I feel
> like we should listen to that.
>
> I also thought that the points made in the Jupyter discussion thread
> made a lot of sense: of course it's possible for people to start
> harassing each other over any excuse, and a CoC can, should, and does
> make clear that that's not OK. But if you specifically *call out*
> political affiliation as a protected class, at a time when lots of the
> people who the CoC is trying to protect are facing governmental
> harassment justified as "mere political disagreement", then it really
> sends the wrong message.
>
> Besides, uh... isn't the whole definition of politics that it's topics
> where there is active debate? Not really sure why it's even in that
> list to start with.
>

So I hear all the arguments about people feeling unsafe due to some truly
despicable, discriminatory behavior, and I want absolutely no parts of
protecting that. However, I also recognize that we in the U.S. are in a
particularly divisive atmosphere, and people of varied political
persuasions want absolutely nothing to do with those who share differing
views. So, as a concrete example, if someone were to show up at a NumPy
developer summit with a MAGA ("Make America Great Again") hat, or talks
about their support for the president in non-numpy channels, WITHOUT
expressing anything discriminatory or support for such views, if "political
beliefs" is not in the CoC, is this person welcome? I'm not worried about
my own views, but I have friends of widely varying views, and I truly
wonder if they would be welcome. With differing "political beliefs" listed
as something welcomed, I feel ok for them; if this language is removed, I'm
much less certain.

IMO, "political beliefs" encompasses far more than a handful of
very specific, hateful views. People can disagree about a wide array of
"political beliefs" and it is important that we as a community welcome a
wide array of such views. If the CoC needs to protect against the wide
array of discriminatory views and behavior that make up U.S. politics right
now, how about specifically calling those behaviors out as not-welcome,
rather than completely ignoring the fact that 99% of "political beliefs"
are perfectly welcome within the community?

The CoC is about spelling out the community norms--how about just spelling
out that we welcome everyone, but, in the words of Wil Wheaton, "Don't be
a dick"?

Ryan

-- 
Ryan May


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Ryan May
On Fri, Mar 9, 2018 at 12:21 AM, Hameer Abbasi <einstein.edi...@gmail.com>
wrote:

> Not that I’m against different “levels” of ndarray granularity, but I just
> don’t want it to introduce complexity for the end-user. For example, it
> would be unreasonable to expect the end-user to check for all parts of the
> interface that they need support for separately.
>

I wouldn't necessarily want all of the granularity exposed in something
like "asarraylike"--that should be kept really simple. But I think there's
value in numpy providing multiple ABCs for portions of the interface (and
one big one that combines them all). That way, people who want the
finer-grained checking (say for a more limited array-like) can use a
common, shared, existing ABC, rather than having everyone re-invent it.
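A minimal sketch of what one such narrow ABC could look like (the name and attribute list are illustrative only, not a proposed NumPy API):

```python
from abc import ABC

import numpy as np

class HasShape(ABC):
    """Hypothetical narrow duck-array ABC covering attribute access only."""

    @classmethod
    def __subclasshook__(cls, C):
        # Structural check: any class exposing these attributes "is" HasShape
        if cls is HasShape:
            return all(hasattr(C, name) for name in ("shape", "ndim", "size"))
        return NotImplemented

# ndarray passes without subclassing or explicit registration
print(isinstance(np.arange(3), HasShape))  # True
```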

Ryan

-- 
Ryan May


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Ryan May
On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker <chris.bar...@noaa.gov> wrote:

> On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer <sho...@gmail.com> wrote:
>
>>
>>> What's needed, though, is not just a single ABC. Some thought and design
>>> needs to go into segmenting the ndarray API to declare certain behaviors,
>>> just like was done for collections:
>>>
>>> https://docs.python.org/3/library/collections.abc.html
>>>
>>> You don't just have a single ABC declaring a collection, but rather "I
>>> am a mapping" or "I am a mutable sequence". It's more of a pain for
>>> developers to properly specify things, but this is not a bad thing to
>>> actually give code some thought.
>>>
>>
>> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
>> possible. Although, there are quite a few options, so I don't know how
>> doable this is.
>>
>
> Exactly -- there are an exponential amount of options...
>
>
>> Well, to get the ball rolling a bit, the key thing that matplotlib needs
>> to know is if `shape`, `reshape`, 'size', broadcasting, and logical
>> indexing is respected. So, I see three possible abc's here: one for
>> attribute access (things like `shape` and `size`) and another for shape
>> manipulations (broadcasting and reshape, and assignment to .shape).
>
>
> I think we're going to get into a string of ABCs:
>
> ArrayLikeForMPL_ABC
>
> etc, etc.
>

Only if you try to provide perfectly-sized options for every occasion--but
that's not how we do things in (sane) software development. You provide a
few options that optimize the common use cases, and you don't try to cover
everything--let client code figure out the right combination from the
primitives you provide. One can always just inherit/register *all* the ABCs
if need be. The status quo is that we have 1 interface that covers
everything from multiple dims and shape to math and broadcasting to the
entire __array__ interface. Even breaking that up into the 3 "obvious"
chunks would be a massive improvement.

I just don't want to see this effort bog down into "this is so hard".
Getting it perfect is hard; getting it useful is much easier.

It's important to note that we can always break up/combine existing ABCs
into other ones later.


> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the ducktype.
>
> which is K, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there is even a way with an ABC to check for what types of indexing
> are supported, but we'd still have the problem with whether the semantics are
> the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???
>

Even documenting expected behavior as part of these ABCs would go a long
way towards helping standardize behavior.

Another idea would be to put together a conformance test suite as part of
this effort, in lieu of some kind of run-time checking of behavior (which
would be terrible). That would help developers of other "ducks" check that
they're doing the right things. I'd imagine the existing NumPy test suite
would largely cover this.
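Such a conformance suite could start as simple as a function full of assertions, run against a candidate duck array (a sketch with hypothetical checks; a real suite would go much further):

```python
import numpy as np

def check_duck_basics(make_array):
    """Hypothetical conformance checks; call with a factory that builds
    the candidate duck array from a nested list."""
    a = make_array([[1, 2], [3, 4]])
    assert a.shape == (2, 2)
    assert a.ndim == 2
    assert a.size == 4
    assert a.reshape(4).shape == (4,)
    assert a[0, 1] == 2  # basic indexing semantics

check_duck_basics(np.array)  # ndarray itself should pass
```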

Ryan

-- 
Ryan May


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-06 Thread Ryan May
On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker <chris.bar...@noaa.gov> wrote:

> Klunky, and maybe we could come up with a standard way to do it and
> include that in numpy, but I'm not sure that ABCs are the way to do it.
>

ABCs are *absolutely* the way to go about it. It's the only way baked into
the Python language itself that allows you to register a class for purposes
of `isinstance` without needing to subclass--i.e. duck-typing.

What's needed, though, is not just a single ABC. Some thought and design
needs to go into segmenting the ndarray API to declare certain behaviors,
just like was done for collections:

https://docs.python.org/3/library/collections.abc.html

You don't just have a single ABC declaring a collection, but rather "I am a
mapping" or "I am a mutable sequence". It's more of a pain for developers
to properly specify things, but this is not a bad thing to actually give
code some thought.
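The registration mechanism itself is standard-library Python; a sketch with made-up names:

```python
from abc import ABC

class ArrayLike(ABC):
    """Hypothetical ABC; the name is made up for illustration."""

class MyDuck:  # note: no inheritance from ArrayLike anywhere
    shape = (2, 2)

# Duck-typing via registration instead of subclassing
ArrayLike.register(MyDuck)

print(isinstance(MyDuck(), ArrayLike))  # True
```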

Ryan

-- 
Ryan May


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Ryan May
On Thu, Nov 2, 2017 at 6:46 AM, <josef.p...@gmail.com> wrote:

>
>
> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12...@gmail.com>
> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
>

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly, one
of those two is at the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's
implementation details, not any kind of fundamental incompatibility.

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
units library can handle conversion provided they have the same
dimensionality (e.g. length, time)
- Multiplication/Division: combine and cancel units (m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.

Average keeps the same units -- it's just a sum and division by a unit-less
constant.
Integration (in 1-D) involves *two* variables, your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The resulting units are then e.g. data.units *
dx.units. This works just like it does in Physics 101, where you integrate
velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m).
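Numerically, a sketch with plain NumPy, tracking the units by hand in the comments:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 101)   # time, units: s
v = np.full_like(t, 3.0)          # constant velocity, units: m/s

# Trapezoidal integration is a sum of v * dt terms, so the units work out
# to (m/s) * s -> m, i.e. a displacement in meters.
displacement = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))
print(displacement)   # ~30.0
```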

What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?
>

Well, covariance is subtracting the mean from each variable and multiplying
the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.
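The dimensionless result is easy to check numerically (plain NumPy, with the units tracked in the comments):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # say, units: m
y = np.array([2.0, 4.1, 5.9, 8.0])   # say, units: s

cov_xy = np.cov(x, y)[0, 1]          # units: m * s
corr_xy = np.corrcoef(x, y)[0, 1]    # dimensionless, always within [-1, 1]
print(corr_xy)
```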


> How do you concatenate and operate arrays with different units?
>

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.
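A sketch of that behavior with a hand-rolled conversion table (the factors for inches and miles are exact definitions; a real units library would handle this generally):

```python
import numpy as np

# Exact definitions: 1 in = 0.0254 m, 1 mi = 1609.344 m
TO_METERS = {"m": 1.0, "in": 0.0254, "mi": 1609.344}

def concat_lengths(arrays_with_units, out_unit="m"):
    """Convert each (array, unit) pair to out_unit, then concatenate.
    Unknown units raise KeyError -- the 'error out' case."""
    scale = TO_METERS[out_unit]
    return np.concatenate(
        [np.asarray(a) * (TO_METERS[u] / scale) for a, u in arrays_with_units]
    )

result = concat_lengths([([1.0, 2.0], "m"), ([1.0], "in")])
```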


> interpolation or prediction would work with using the existing units.
>

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.

Ryan

-- 
Ryan May


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-07 Thread Ryan May
On Fri, Jul 7, 2017 at 4:27 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi All,
>
> I doubt I'm really the last one thinking ndarray subclassing is a good
> idea, but as that was stated, I feel I should at least pipe in. It
> seems to me there is both a perceived problem -- with the two
> subclasses that numpy provides -- `matrix` and `MaskedArray` -- both
> being problematic in ways that seem to me to have very little to do
> with subclassing being a bad idea, and a real one following from the
> fact that numpy was written at a time when python's inheritance system
> was not as well developed as it is now.
>
> Though based on my experience with Quantity, I'd also argue that the
> more annoying problems are not so much with `ndarray` itself, but
> rather with the helper functions.  Ufuncs were not so bad -- they
> really just needed a better override mechanism, which __array_ufunc__
> now provides -- but for quite a few of the other functions subclassing
> was clearly an afterthought. Indeed, `MaskedArray` provides a nice
> example of this, with its many special `np.ma.` routines,
> providing huge duplication and thus lots of duplicated bugs (which
> Eric has been patiently fixing...). Indeed, `MaskedArray` is also a
> much better example than ndarray of a class that is really hard to
> subclass (even though, conceptually, it should be a far easier one).
>
> All that said, duck-type arrays make a lot of sense, and e.g. the
> slicing and shaping methods are easily emulated, especially if one's
> underlying data are stored in `ndarray`. For astropy's version of a
> relevant mixin, see
> http://docs.astropy.org/en/stable/api/astropy.utils.misc.
> ShapedLikeNDArray.html


My biggest problem with subclassing as it exists now is that they don't
survive the first encounter with np.asarray (or np.array). So much code
written to work with numpy uses that as a bandaid (e.g. for handling lists)
that in my experience it's 50/50 whether passing a subclass to a function
will actually behave as expected--even if there's no good reason it
shouldn't.
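A concrete example with numpy's own MaskedArray subclass:

```python
import numpy as np

m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

print(type(np.asarray(m)))     # plain ndarray: the mask is silently dropped
print(type(np.asanyarray(m)))  # MaskedArray: subclasses pass through
```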

Ryan

-- 
Ryan May