Re: [Numpy-discussion] py2/py3 pickling

2015-08-25 Thread Pauli Virtanen
25.08.2015, 01:15, Chris Laumann wrote:
 Would it be possible then (in relatively short order) to create
 a py2 -> py3 numpy pickle converter?

You probably need to modify the pickle stream directly, replacing
*STRING opcodes with *BYTES opcodes when it comes to objects that are
needed for constructing Numpy arrays.

https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82

Or, use a custom pickler class that emits the new opcodes when it comes
to data that is part of Numpy arrays, as the Python 2 pickler doesn't
know how to write bytes opcodes.

It's probably doable, although likely annoying to implement. The pickles
created won't be loadable on Py2, only Py3.
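A converter along these lines could start by walking the opcode stream with
the standard-library pickletools module to locate the *STRING opcodes that
would need rewriting as *BYTES. A minimal sketch (the stream below is a
hand-built protocol-1 pickle of the string 'abc', standing in for Python 2
output):

```python
import pickletools

# SHORT_BINSTRING 'abc' followed by STOP -- the kind of opcode a
# py2 -> py3 converter would rewrite into SHORT_BINBYTES.
stream = b'U\x03abc.'
for opcode, arg, pos in pickletools.genops(stream):
    print(pos, opcode.name, repr(arg))
# 0 SHORT_BINSTRING 'abc'
# 5 STOP None
```

Actually rewriting the stream would additionally require re-encoding the
lengths and opcode bytes, but genops gives the positions needed to do it.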

You'd need to find a volunteer who wants to work on this or just do it
yourself, though.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy helper function for __getitem__?

2015-08-25 Thread Fabien
On 08/24/2015 10:23 AM, Sebastian Berg wrote:
 Fabien, just to make sure you are aware. If you are overriding
 `__getitem__`, you should also implement `__setitem__`. NumPy does some
 magic if you do not. That will seem to make `__setitem__` work fine, but
 breaks down if you have advanced indexing involved (or if you return
 copies, though it spits warnings in that case).
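For illustration, the advice above might look like this in a minimal duck
array (class and attribute names here are made up, not from NetCDF4 or xray):

```python
import numpy as np

class DuckVariable:
    """Duck-array sketch: if you define __getitem__, also define
    __setitem__ explicitly, so advanced indexing assigns into the
    real buffer instead of a temporary copy."""
    def __init__(self, data):
        self._data = np.asarray(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

v = DuckVariable([[1, 2], [3, 4]])
v[[0, 1], [1, 0]] = 9          # advanced (fancy) indexing
print(v[0, 1], v[1, 0])        # 9 9
```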

Hi Sebastian,

thanks for the info. I am writing a duck NetCDF4 Variable object, and 
therefore I am not trying to override Numpy arrays.

I think that Stephan's function for xray is very useful. A possible 
improvement (probably at a certain performance cost) would be to be able 
to provide a shape instead of a number of dimensions. The output would 
then be slices with valid start and ends.

Current behavior:
In[9]: expanded_indexer(slice(None), 2)
Out[9]: (slice(None, None, None), slice(None, None, None))

With shape:
In[9]: expanded_indexer(slice(None), (3, 4))
Out[9]: (slice(0, 3, 1), slice(0, 4, 1))
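A rough sketch of such a shape-aware variant (the name is hypothetical, and
this is not the actual xray helper; it leans on Python's slice.indices to
produce concrete, clamped start/stop/step values):

```python
def expanded_indexer_with_shape(key, shape):
    """Expand `key` to one entry per dimension of `shape`, turning
    bare slices into slices with explicit bounds."""
    if not isinstance(key, tuple):
        key = (key,)
    # pad with full slices up to the number of dimensions
    key = key + (slice(None),) * (len(shape) - len(key))
    return tuple(
        slice(*k.indices(n)) if isinstance(k, slice) else k
        for k, n in zip(key, shape)
    )

print(expanded_indexer_with_shape(slice(None), (3, 4)))
# (slice(0, 3, 1), slice(0, 4, 1))
```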

But if nobody needed something like this before me, I think that I might 
have a design problem in my code (still quite new to python).

Cheers and thanks,

Fabien



Re: [Numpy-discussion] py2/py3 pickling

2015-08-25 Thread Antoine Pitrou
On Tue, 25 Aug 2015 19:12:30 +0300
Pauli Virtanen p...@iki.fi wrote:

 25.08.2015, 01:15, Chris Laumann wrote:
  Would it be possible then (in relatively short order) to create
  a py2 -> py3 numpy pickle converter?
 
 You probably need to modify the pickle stream directly, replacing
 *STRING opcodes with *BYTES opcodes when it comes to objects that are
 needed for constructing Numpy arrays.
 
 https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82
 
 Or, use a custom pickler class that emits the new opcodes when it comes
 to data that is part of Numpy arrays, as the Python 2 pickler doesn't
 know how to write bytes opcodes.
 
 It's probably doable, although likely annoying to implement. The pickles
 created won't be loadable on Py2, only Py3.

One could take a look at how the built-in bytearray type achieves
pickle compatibility between 2.x and 3.x. The solution is to serialize
the binary data as a latin-1 decoded unicode string, and to return the
right reconstructor from __reduce__.

This is less space-efficient than pure bytes pickling, since the
unicode string is serialized as utf-8 (so bytes > 0x80 are
multibyte-encoded). There's also some CPU overhead, due to the
successive decoding and encoding steps.

You can take a look at the bytearray_reduce() function in
Objects/bytearrayobject.c, both for 2.x and 3.x.

(also note how the 3.x version does it only for protocols < 3, to
achieve better efficiency on newer protocol versions)
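The same trick works for any user-defined type; a minimal sketch of the
bytearray approach described above (class and helper names are illustrative,
not from any real library):

```python
import pickle

def _rebuild(text):
    # reconstructor: re-encode the latin-1 text back into bytes on load
    return PortableBytes(text.encode('latin-1'))

class PortableBytes:
    """Binary payload whose pickle avoids *BYTES opcodes by serializing
    itself as a latin-1 decoded unicode string (latin-1 maps bytes
    0-255 onto code points 0-255 losslessly)."""
    def __init__(self, data):
        self.data = bytes(data)

    def __reduce__(self):
        return (_rebuild, (self.data.decode('latin-1'),))

payload = PortableBytes(b'\x00\x80\xff')
roundtrip = pickle.loads(pickle.dumps(payload))
print(roundtrip.data)  # b'\x00\x80\xff'
```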


Another possibility would be a custom Unpickler class for 3.x, dealing
specifically with 2.x-produced Numpy array pickles. That way the
pickles themselves could be cross-version.
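Note that Python 3's stock Unpickler already exposes a hook in this
direction: its `encoding` argument controls what *STRING opcodes are turned
into, so 2.x pickles of binary data can be loaded as bytes without a custom
class. A small demonstration on a hand-built py2-style stream:

```python
import pickle

# SHORT_BINSTRING 'abc' + STOP, as a Python 2 pickler would emit for
# a 3-byte str.
py2_style = b'U\x03abc.'
print(pickle.loads(py2_style))                     # 'abc' (default: decode as ASCII)
print(pickle.loads(py2_style, encoding='bytes'))   # b'abc'
```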

Regards

Antoine.




[Numpy-discussion] 1.10.0rc1

2015-08-25 Thread Charles R Harris
Hi All,

The silence after the 1.10 beta has been eerie. Consequently, I'm thinking
of making a first release candidate this weekend. If you haven't yet tested
the beta, please do so. It would be good to discover as many problems as we
can before the first release.

Chuck


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Nathan Goldbaum
On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith n...@pobox.com wrote:

 Hi all,

 These are the notes from the NumPy dev meeting held July 7, 2015, at
 the SciPy conference in Austin, presented here so the list can keep up
 with what happens, and so you can give feedback. Please do give
 feedback, none of this is final!

 (Also, if anyone who was there notices anything I left out or
 mischaracterized, please speak up -- these are a lot of notes I'm
 trying to gather together, so I could easily have missed something!)

 Thanks to Jill Cowan and the rest of the SciPy organizers for donating
 space and organizing logistics for us, and to the Berkeley Institute
 for Data Science for funding travel for Jaime, Nathaniel, and
 Sebastian.


 Attendees
 =

   Present in the room for all or part: Daniel Allan, Chris Barker,
   Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fernández del
   Río, Chuck Harris, Nathaniel Smith, Stéfan van der Walt. (Note: I'm
   pretty sure this list is incomplete)

   Joining remotely for all or part: Stephan Hoyer, Julian Taylor.


 Formalizing our governance/decision making
 ==

   This was a major focus of discussion. At a high level, the consensus
   was to steal IPython's governance document (IPEP 29) and modify it
   to remove its use of a BDFL as a backstop to normal community
   consensus-based decision, and replace it with a new backstop based
   on Apache-project-style consensus voting amongst the core team.

   I'll send out a proper draft of this shortly for further discussion.


 Development roadmap
 ===

   General consensus:

   Let's assume NumPy is going to remain important indefinitely, and
   try to make it better, instead of waiting for something better to
   come along. (This is unlikely to be wasted effort even if something
   better does come along, and it's hardly a sure thing that that will
   happen anyway.)

   Let's focus on evolving numpy as far as we can without major
   break-the-world changes (no numpy 2.0, at least in the foreseeable
   future).

   And, as a target for that evolution, let's change our focus from
   numpy as "the library that gives you the np.ndarray object (plus
   some attached infrastructure)" to numpy as "the standard framework
   for working with arrays and array-like objects in Python".

   This means, creating defined interfaces between array-like objects /
   ufunc objects / dtype objects, so that it becomes possible for third
   parties to add their own and mix-and-match. Right now ufuncs are
   pretty good at this, but if you want a new array class or dtype then
   in most cases you pretty much have to modify numpy itself.

   Vision: instead of everyone who wants a new container type having to
   reimplement all of numpy, Alice can implement an array class using
   (sparse / distributed / compressed / tiled / gpu / out-of-core /
   delayed / ...) storage, pass it to code that was written using
   direct calls to np.* functions, and it just works. (Instead of
   np.sin being "the way you calculate the sine of an ndarray", it's
   "the way you calculate the sine of any array-like container
   object".)

   Vision: Darryl can implement a new dtype for (categorical data /
   astronomical dates / integers-with-missing-values / ...) without
   having to touch the numpy core.

   Vision: Chandni can then come along and combine them by doing

   a = alice_array([...], dtype=darryl_dtype)

   and it just works.

   Vision: no-one is tempted to subclass ndarray, because anything you
   can do with an ndarray subclass you can also easily do by defining
   your own new class that implements the array protocol.


 Supporting third-party array types
 ~~

   Sub-goals:
   - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's
 API right there.
   - Go through the rest of the stuff in numpy, and figure out some
 story for how to let it handle third-party array classes:
 - ufunc ALL the things: Some things can be converted directly into
   (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some
   things could be converted into (g)ufuncs if we extended the
   (g)ufunc interface a bit (e.g. np.sort, np.matmul).
 - Some things probably need their own __numpy_ufunc__-like
   extensions (__numpy_concatenate__?)
   - Provide tools to make it easier to implement the more complicated
 parts of an array object (e.g. the bazillion different methods,
 many of which are ufuncs in disguise, or indexing)
   - Longer-run interesting research project: __numpy_ufunc__ requires
 that one or the other object have explicit knowledge of how to
 handle the other, so to handle binary ufuncs with N array types
 you need something like N**2 __numpy_ufunc__ code paths. As an
 alternative, if there were some interface that an object could
 export that provided the operations 

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Travis Oliphant
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io
 wrote:

 Thanks for the write-up Nathaniel.   There is a lot of great detail and
 interesting ideas here.

 snip



 I think that summarizes my main concerns.  I will write-up more forward
 thinking ideas for what else is possible in the coming weeks.   In the mean
 time, thanks for keeping the discussion going.  It is extremely exciting to
 see the help people have continued to provide to maintain and improve
 NumPy.It will be exciting to see what the next few years bring as
 well.


 I think the only thing that looks even a little bit like a numpy 2.0 at
 this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a
 major project. Dynd is 2.5+ years old, 3500+ commits in, and still in
 progress.  If there is a decision to pursue Dynd I could support that, but
 I think we would want to think deeply about how to make the transition as
 painless as possible. It would be good at this point to get some feedback
 from people currently using dynd. IIRC, part of the reason for starting
dynd was the perception that it was not possible to evolve numpy without
 running into compatibility road blocks. Travis, could you perhaps summarize
 the thinking that went into the decision to make dynd a separate project?


I think it would be best if Mark Wiebe speaks up here.   I can explain why
Continuum supported DyND with some fraction of Mark's time for a few years
and give my perspective, but ultimately DyND is Mark's story to tell (and a
few talented people have now joined him in the effort).  Mark Wiebe was a
productive NumPy developer.   He was one of a few people that jumped in on
the code-base and made substantial and significant changes and came to
understand just how hard it can be to develop in the NumPy code-base.
He also is a C++ developer who really likes the beauty and power of that
language (which definitely biases his NumPy work, but he did put a lot of
effort into making NumPy better).  Before Peter and I started Continuum,
Mark had begun the DyND project as an example of a general-purpose dynamic
array library that could be used by any dynamic language to make arrays.

In the early days of Continuum, we spent time from at least Mark W, Bryan
Van de Ven, Jay Borque, and Francesc Alted looking at how to extend NumPy
to add 1) categorical data-types, 2) variable-length strings, and 3) better
date-time types. Bryan, a good developer who has gone on to be a
primary developer of Bokeh, spent quite a bit of time and had a prototype
of categoricals *nearly* working.   He did not like working on the NumPy
code-base at all.  He struggled with it and found it very difficult to
extend.He worked closely with Mark Wiebe who helped him the best he
could.   What took him 4 weeks in NumPy took him 3 days in DyND to build.
I think that experience, convinced him and Mark W both that working with
NumPy code-base would take too long to make significant progress.

Also, during 2012 I was trying to help with release-management (though I
ended up just hiring Ondřej Čertík to actually do the work and he did a
great job of getting a release of NumPy out the door --- thanks to much
help from many of you).At that point, I realized very clearly, that
what I could best do at this point was to try and get more resources for
open source and for the NumPy stack rather than work on the code directly.
   We also did work with several clients that helped me realize just how
many disruptive changes had happened from 1.4 to 1.7 for extensive users of
NumPy (much more than would be justified from a "we don't break the ABI"
mantra that was the stated goal).

We also realized that the kind of experimentation we wanted to do in the
first 2 years of Continuum would just not be possible on the NumPy
code-base and the need for getting community buy-in on every decision would
slow us down too much --- as we had to iterate rapidly on so many things
and find our center as a startup.   It also would not be fair to the NumPy
community. Our decision to do *all* of our exploration outside the
NumPy code base was basically 1) the kinds of changes we wanted ultimately
were potentially dramatic and disruptive, 2) it would be too difficult and
time-consuming to decide all things in public discussions with the NumPy
community --- especially when some things were experimental 3) tying
ourselves to releases of NumPy would be difficult at that time, and 4) the
design of the NumPy code-base makes it difficult to contribute to --- both
Mark W and Bryan V felt they could make progress *much* faster in a new
code-base.

Continuum did not have enough start-up funding to devote significant time
on DyND in the early days.So Mark rallied what resources he could and
we supported him the best we could and he made progress.  My only real
requirement with sponsoring his work when we did 

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Travis Oliphant
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io
 wrote:

 Thanks for the write-up Nathaniel.   There is a lot of great detail and
 interesting ideas here.

 snip



 There are at least 3 areas of compatibility (ABI, API, and semantic).
  ABI-compatibility is a non-feature in today's world.   There are so many
 distributions of the NumPy stack (and conda makes it trivial for anyone to
 build their own or for you to build one yourself).   Making less-optimal
 software-engineering choices because of fear of breaking the ABI is not
 something I'm supportive of at all.   We should not break ABI every
 release, but a release every 3 years that breaks ABI is not a problem.

 API compatibility should be much more sacrosanct, but it is also
 something that can also be managed.   Any NumPy 2.0 should definitely
 support the full NumPy API (though there could be deprecated swaths).I
 think the community has done well in using deprecation and limiting the
 public API to make this more manageable and I would love to see a NumPy 2.0
 that solidifies a future-oriented API along with a back-ward compatible API
 that is also available.

 Semantic compatibility is the hardest.   We have already broken this on
 multiple occasions throughout the 1.x NumPy releases.  Every time you
 change the code, this can change.This is what I fear causing deep
 instability over the course of many years. These are things like the
 casting rule details,  the effect of indexing changes, any change to the
 calculations approaches. It is and has been the most at risk during any
 code-changes.My view is that a NumPy 2.0 (with a new low-level
 architecture) minimizes these changes to a single release rather than
 unavoidably spreading them out over many, many releases.

 I think that summarizes my main concerns.  I will write-up more forward
 thinking ideas for what else is possible in the coming weeks.   In the mean
 time, thanks for keeping the discussion going.  It is extremely exciting to
 see the help people have continued to provide to maintain and improve
 NumPy.It will be exciting to see what the next few years bring as
 well.


 I think the only thing that looks even a little bit like a numpy 2.0 at
 this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a
 major project. Dynd is 2.5+ years old, 3500+ commits in, and still in
 progress.  If there is a decision to pursue Dynd I could support that, but
 I think we would want to think deeply about how to make the transition as
 painless as possible. It would be good at this point to get some feedback
 from people currently using dynd. IIRC, part of the reason for starting
 dynd was the perception that it was not possible to evolve numpy without
 running into compatibility road blocks. Travis, could you perhaps summarize
 the thinking that went into the decision to make dynd a separate project?


 Thanks Chuck.   I'll do this in a separate email, but I just wanted to
point out that when I say NumPy 2.0, I'm actually only specifically talking
about a release of NumPy that breaks ABI compatibility --- not some
potential re-write.   I'm not ruling that out, but I'm not necessarily
implying such a thing by saying NumPy 2.0.


 snip

 Chuck






-- 

*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io


[Numpy-discussion] Python extensions for Python 3.5 - useful info...

2015-08-25 Thread Fernando Perez
Just an FYI for the upcoming Python release, a very detailed post from
Steve Dower, the Microsoft developer who is now in charge of the Windows
releases for Python, on how the build process will change in 3.5 regarding
extensions:

http://stevedower.id.au/blog/building-for-python-3-5/

Cheers,

f
-- 
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Travis Oliphant
Thanks for the write-up Nathaniel.   There is a lot of great detail and
interesting ideas here.

I am very eager to understand how to help NumPy and the wider community
move forward however I can (my passions on this have not changed since
1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though.   It's
hard to get all the ideas on the table, and it was unfortunate we couldn't
get everybody who is a core NumPy dev together in person to have this
discussion as there are still a lot of questions unanswered and a lot of
thought that has gone into other approaches that was not brought up or
represented in the meeting (how does Numba fit into this, what about
data-shape, dynd, memory-views and the Python type system, etc.).   If NumPy
becomes just an interface-specification, then why don't we just do that
*outside* NumPy itself in a way that doesn't jeopardize the stability of
NumPy today.These are some of the real questions I have.   I will try
to write up my thoughts in more depth soon, but  I won't be able to respond
in-depth right now.   I just wanted to comment because Nathaniel said I
"disagree", which is only partly true.

The three most important things for me are 1) let's make sure we have
representation from as wide of the community as possible (this is really
hard), 2) let's look around at the broader community and the prior art that
is happening in this space right now and 3) let's not pretend we are going
to be able to make all this happen without breaking ABI compatibility.
Let's just break ABI compatibility with NumPy 2.0 *and* have as much
fidelity with the API and semantics of current NumPy as possible (though
there will be some changes necessary long-term).

I don't think we should intentionally break ABI if we can avoid it, but I
also don't think we should spend in-ordinate amounts of time trying to
pretend that we won't break ABI (for at least some people), and most
importantly we should not pretend *not* to break the ABI when we actually
do.We did this once before with the roll-out of date-time, and it was
really un-necessary. When I released NumPy 1.0, there were several
things that I knew should be fixed very soon (NumPy was never designed to
not break ABI).Those problems are still there.Now, that we have
quite a bit better understanding of what NumPy *should* be (there have been
tremendous strides in understanding and community size over the past 10
years), let's actually make the infrastructure we think will last for the
next 20 years (instead of trying to shoe-horn new ideas into a 20-year old
code-base that wasn't designed for it).

NumPy is a hard code-base.  It has been since Numeric days in 1995. I
could be wrong, but my guess is that we will be passed by as a community if
we don't seize the opportunity to build something better than we can build
if we are forced to use a 20 year old code-base.

It is more important to not break people's code and to be clear when a
re-compile is necessary for dependencies.   Those to me are the most
important constraints. There are a lot of great ideas that we all have
about what we want NumPy to be able to do. Some of this are pretty
transformational (and the more exciting they are, the harder I think they
are going to be to implement without breaking at least the ABI). There
is probably some CAP-like theorem around
Stability-Features-Speed-of-Development (pick 2) when it comes to Open
Source Software development and making feature-progress with NumPy *is
going* to create instability, which concerns me.

I would like to see a little bit of pain one time with a NumPy 2.0, rather
than the constant pain of a "constant churn over many years" approach
that Nathaniel seems to advocate.   To me NumPy 2.0 is an ABI-breaking
release that is as API-compatible as possible and whose semantics are not
dramatically different.

There are at least 3 areas of compatibility (ABI, API, and semantic).
 ABI-compatibility is a non-feature in today's world.   There are so many
distributions of the NumPy stack (and conda makes it trivial for anyone to
build their own or for you to build one yourself).   Making less-optimal
software-engineering choices because of fear of breaking the ABI is not
something I'm supportive of at all.   We should not break ABI every
release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also something
that can also be managed.   Any NumPy 2.0 should definitely support the
full NumPy API (though there could be deprecated swaths).I think the
community has done well in using deprecation and limiting the public API to
make this more manageable and I would love to see a NumPy 2.0 that
solidifies a future-oriented API along with a back-ward compatible API that
is also available.

Semantic compatibility is the hardest.   We have already broken this on
multiple occasions 

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Antoine Pitrou
On Tue, 25 Aug 2015 03:03:41 -0700
Nathaniel Smith n...@pobox.com wrote:
 
 Supporting third-party dtypes
 ~
 
[...]
 
   Some features that would become straightforward to implement
   (e.g. even in third-party libraries) if this were fixed:
   - missing value support
   - physical unit tracking (meters / seconds -> array of velocity;
 meters + seconds -> error)
   - better and more diverse datetime representations (e.g. datetimes
 with attached timezones, or using funky geophysical or
 astronomical calendars)
   - categorical data
   - variable length strings
   - strings-with-encodings (e.g. latin1)
   - forward mode automatic differentiation (write a function that
 computes f(x) where x is an array of float64; pass that function
 an array with a special dtype and get out both f(x) and f'(x))
   - probably others I'm forgetting right now
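The forward-mode automatic differentiation bullet above can be sketched with
a plain dual-number class; a toy stand-in for what a special dtype would do
element-wise (names here are illustrative):

```python
class Dual:
    """Carry f(x) and f'(x) together; arithmetic propagates both."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (fg)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def f(x):
    return x * x + x

y = f(Dual(3.0, 1.0))   # seed derivative dx/dx = 1
print(y.val, y.der)     # 12.0 7.0  (f(3) = 12, f'(3) = 2*3 + 1 = 7)
```

With dtype support, `f` could stay written against plain float64 arrays and
still yield both values when handed an array of such duals.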

It should also be the opportunity to streamline datetime64 and
timedelta64 dtypes. Currently the unit information is IIRC hidden in
some weird metadata thing called the PyArray_DatetimeMetaData.
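That unit metadata is at least reachable from Python via np.datetime_data,
which unpacks the (unit, count) pair stored in the dtype's metadata:

```python
import numpy as np

# The per-dtype unit information lives in the dtype metadata;
# np.datetime_data exposes it as a (unit string, count) tuple.
dt = np.dtype('datetime64[ms]')
print(np.datetime_data(dt))  # ('ms', 1)
```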

Also, thanks for the notes. It has been an interesting read.

Regards

Antoine.




[Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Nathaniel Smith
Hi all,

These are the notes from the NumPy dev meeting held July 7, 2015, at
the SciPy conference in Austin, presented here so the list can keep up
with what happens, and so you can give feedback. Please do give
feedback, none of this is final!

(Also, if anyone who was there notices anything I left out or
mischaracterized, please speak up -- these are a lot of notes I'm
trying to gather together, so I could easily have missed something!)

Thanks to Jill Cowan and the rest of the SciPy organizers for donating
space and organizing logistics for us, and to the Berkeley Institute
for Data Science for funding travel for Jaime, Nathaniel, and
Sebastian.


Attendees
=

  Present in the room for all or part: Daniel Allan, Chris Barker,
  Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fernández del
  Río, Chuck Harris, Nathaniel Smith, Stéfan van der Walt. (Note: I'm
  pretty sure this list is incomplete)

  Joining remotely for all or part: Stephan Hoyer, Julian Taylor.


Formalizing our governance/decision making
==

  This was a major focus of discussion. At a high level, the consensus
  was to steal IPython's governance document (IPEP 29) and modify it
  to remove its use of a BDFL as a backstop to normal community
  consensus-based decision, and replace it with a new backstop based
  on Apache-project-style consensus voting amongst the core team.

  I'll send out a proper draft of this shortly for further discussion.


Development roadmap
===

  General consensus:

  Let's assume NumPy is going to remain important indefinitely, and
  try to make it better, instead of waiting for something better to
  come along. (This is unlikely to be wasted effort even if something
  better does come along, and it's hardly a sure thing that that will
  happen anyway.)

  Let's focus on evolving numpy as far as we can without major
  break-the-world changes (no numpy 2.0, at least in the foreseeable
  future).

  And, as a target for that evolution, let's change our focus from
  "NumPy is the library that gives you the np.ndarray object (plus
  some attached infrastructure)" to "NumPy provides the standard
  framework for working with arrays and array-like objects in
  Python".

  This means creating defined interfaces between array-like objects /
  ufunc objects / dtype objects, so that it becomes possible for third
  parties to add their own and mix-and-match. Right now ufuncs are
  pretty good at this, but if you want a new array class or dtype then
  in most cases you pretty much have to modify numpy itself.

  Vision: instead of everyone who wants a new container type having to
  reimplement all of numpy, Alice can implement an array class using
  (sparse / distributed / compressed / tiled / gpu / out-of-core /
  delayed / ...) storage, pass it to code that was written using
  direct calls to np.* functions, and it just works. (Instead of
  np.sin being "the way you calculate the sine of an ndarray", it's
  "the way you calculate the sine of any array-like container
  object".)

  Vision: Darryl can implement a new dtype for (categorical data /
  astronomical dates / integers-with-missing-values / ...) without
  having to touch the numpy core.

  Vision: Chandni can then come along and combine them by doing

  a = alice_array([...], dtype=darryl_dtype)

  and it just works.

  Vision: no-one is tempted to subclass ndarray, because anything you
  can do with an ndarray subclass you can also easily do by defining
  your own new class that implements the array protocol.
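As an illustrative sketch of that vision (the UnitArray class below is
hypothetical, and the hook shown is the __array_ufunc__ protocol that
eventually shipped in NumPy 1.13 as the successor of the
__numpy_ufunc__ proposal discussed here):

```python
import numpy as np

class UnitArray:
    """Toy array-like container carrying a physical unit (hypothetical)."""

    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap UnitArray inputs, run the underlying ufunc on the raw
        # ndarrays, and re-wrap the result. (Real unit algebra would
        # inspect `ufunc` here; we just propagate the unit unchanged.)
        raw = [x.data if isinstance(x, UnitArray) else x for x in inputs]
        return UnitArray(getattr(ufunc, method)(*raw, **kwargs), self.unit)

v = UnitArray([1.0, 2.0, 3.0], unit="m/s")
w = np.sin(v)            # np.sin works on the container, not just ndarray
d = np.multiply(v, 2.0)  # binary ufuncs dispatch the same way
```

Any code written against plain np.* calls then works on the container
without knowing about it, which is exactly the mix-and-match goal.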


Supporting third-party array types
~~

  Sub-goals:
  - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's
API right there.
  - Go through the rest of the stuff in numpy, and figure out some
story for how to let it handle third-party array classes:
    - "ufunc ALL the things": Some things can be converted directly into
  (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some
  things could be converted into (g)ufuncs if we extended the
  (g)ufunc interface a bit (e.g. np.sort, np.matmul).
- Some things probably need their own __numpy_ufunc__-like
  extensions (__numpy_concatenate__?)
  - Provide tools to make it easier to implement the more complicated
parts of an array object (e.g. the bazillion different methods,
many of which are ufuncs in disguise, or indexing)
  - Longer-run interesting research project: __numpy_ufunc__ requires
that one or the other object have explicit knowledge of how to
handle the other, so to handle binary ufuncs with N array types
you need something like N**2 __numpy_ufunc__ code paths. As an
alternative, if there were some interface that an object could
export that provided the operations nditer needs to efficiently
iterate over (chunks of) it, then you would only need N
implementations of this interface to handle all N**2 operations.
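  A small illustration of the "ufuncs in disguise" point: several
  everyday ndarray methods are already thin wrappers over reductions of
  existing ufuncs, which is what makes the "ufunc ALL the things"
  direction plausible:

```python
import numpy as np

# sum, prod, and cumsum are reductions/accumulations of the add and
# multiply ufuncs, so a container that handles those ufuncs gets the
# corresponding methods essentially for free.
x = np.arange(1.0, 6.0)                    # [1. 2. 3. 4. 5.]
assert x.sum() == np.add.reduce(x)         # 15.0
assert x.prod() == np.multiply.reduce(x)   # 120.0
assert np.all(x.cumsum() == np.add.accumulate(x))
```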

  This 

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Charles R Harris
On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io
wrote:

 Thanks for the write-up Nathaniel.   There is a lot of great detail and
 interesting ideas here.

 I am very eager to understand how to help NumPy and the wider community
 move forward however I can (my passions on this have not changed since
 1999, though what I myself spend time on has changed).

 There are a lot of ways to think about approaching this, though.   It's
 hard to get all the ideas on the table, and it was unfortunate we couldn't
 get everybody who is a core NumPy dev together in person to have this
 discussion as there are still a lot of questions unanswered and a lot of
 thought that has gone into other approaches that was not brought up or
 represented in the meeting (how does Numba fit into this, what about
 data-shape, dynd, memory-views and Python type system, etc.).   If NumPy
 becomes just an interface-specification, then why don't we just do that
 *outside* NumPy itself in a way that doesn't jeopardize the stability of
 NumPy today.These are some of the real questions I have.   I will try
 to write up my thoughts in more depth soon, but  I won't be able to respond
 in-depth right now.   I just wanted to comment because Nathaniel said I
 "disagree", which is only partly true.

 The three most important things for me are 1) let's make sure we have
 representation from as wide of the community as possible (this is really
 hard), 2) let's look around at the broader community and the prior art that
 is happening in this space right now and 3) let's not pretend we are going
 to be able to make all this happen without breaking ABI compatibility.
 Let's just break ABI compatibility with NumPy 2.0 *and* have as much
 fidelity with the API and semantics of current NumPy as possible (though
 there will be some changes necessary long-term).

 I don't think we should intentionally break ABI if we can avoid it, but I
 also don't think we should spend inordinate amounts of time trying to
 pretend that we won't break ABI (for at least some people), and most
 importantly we should not pretend *not* to break the ABI when we actually
 do. We did this once before with the roll-out of date-time, and it was
 really unnecessary. When I released NumPy 1.0, there were several
 things that I knew should be fixed very soon (NumPy was never designed to
 not break ABI).Those problems are still there.Now, that we have
 quite a bit better understanding of what NumPy *should* be (there have been
 tremendous strides in understanding and community size over the past 10
 years), let's actually make the infrastructure we think will last for the
 next 20 years (instead of trying to shoe-horn new ideas into a 20-year old
 code-base that wasn't designed for it).

 NumPy is a hard code-base.  It has been since Numeric days in 1995. I
 could be wrong, but my guess is that we will be passed by as a community if
 we don't seize the opportunity to build something better than we can build
 if we are forced to use a 20 year old code-base.

 It is more important to not break people's code and to be clear when a
 re-compile is necessary for dependencies.   Those to me are the most
 important constraints. There are a lot of great ideas that we all have
 about what we want NumPy to be able to do. Some of these are pretty
 transformational (and the more exciting they are, the harder I think they
 are going to be to implement without breaking at least the ABI). There
 is probably some CAP-like theorem around
 Stability-Features-Speed-of-Development (pick 2) when it comes to Open
 Source Software development and making feature-progress with NumPy *is
 going* to create in-stability which concerns me.

 I would like to see a "little-bit-of-pain one time" with a NumPy 2.0, rather
 than a "constant pain because of constant churn over many years" approach
 that Nathaniel seems to advocate.   To me NumPy 2.0 is an ABI-breaking
 release that is as API-compatible as possible and whose semantics are not
 dramatically different.

 There are at least 3 areas of compatibility (ABI, API, and semantic).
  ABI-compatibility is a non-feature in today's world.   There are so many
 distributions of the NumPy stack (and conda makes it trivial for anyone to
 build their own or for you to build one yourself).   Making less-optimal
 software-engineering choices because of fear of breaking the ABI is not
 something I'm supportive of at all.   We should not break ABI every
 release, but a release every 3 years that breaks ABI is not a problem.

 API compatibility should be much more sacrosanct, but it is also something
 that can also be managed.   Any NumPy 2.0 should definitely support the
 full NumPy API (though there could be deprecated swaths).I think the
 community has done well in using deprecation and limiting the public API to
 make this more manageable and I would love to see a NumPy 2.0 that
 solidifies a future-oriented API along with a 

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread Feng Yu
Hi Nathaniel,

Thanks for the notes.

In some sense, the new dtype class(es) will provide a way of
formalizing this `weird` metadata, and probably exposing it to
Python.

May I also ask that you consider adding a way to declare the sorting
order (priority and direction) of the fields of a structured array in
the new dtype as well?
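For context, a minimal sketch of the status quo: field priority can
only be given per call through np.sort's order argument, and per-field
direction cannot be expressed at all, which is what a dtype-level
declaration would add:

```python
import numpy as np

# Today the sort priority is a per-call argument rather than a dtype
# property, and there is no way to mark an individual field descending.
people = np.array([('bob', 25), ('alice', 25), ('carol', 20)],
                  dtype=[('name', 'U10'), ('age', 'i4')])
by_age_then_name = np.sort(people, order=['age', 'name'])
# -> carol (20) first, then the age-25 tie broken by name: alice, bob
```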

Regards,

Yu

On Tue, Aug 25, 2015 at 12:21 PM, Antoine Pitrou solip...@pitrou.net wrote:
 On Tue, 25 Aug 2015 03:03:41 -0700
 Nathaniel Smith n...@pobox.com wrote:

 Supporting third-party dtypes
 ~

 [...]

   Some features that would become straightforward to implement
   (e.g. even in third-party libraries) if this were fixed:
   - missing value support
   - physical unit tracking (meters / seconds - array of velocity;
 meters + seconds - error)
   - better and more diverse datetime representations (e.g. datetimes
 with attached timezones, or using funky geophysical or
 astronomical calendars)
   - categorical data
   - variable length strings
   - strings-with-encodings (e.g. latin1)
   - forward mode automatic differentiation (write a function that
 computes f(x) where x is an array of float64; pass that function
 an array with a special dtype and get out both f(x) and f'(x))
   - probably others I'm forgetting right now

 It should also be the opportunity to streamline datetime64 and
 timedelta64 dtypes. Currently the unit information is IIRC hidden in
 some weird metadata thing called the PyArray_DatetimeMetaData.

 Also, thanks for the notes. It has been an interesting read.

 Regards

 Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-25 Thread David Cournapeau
Thanks for the good summary Nathaniel.

Regarding the dtype machinery, I agree casting is the hardest part. Unless
the code has changed dramatically, this was the main reason why you could
not make most of the dtypes separate from the numpy codebase (I tried to
move the datetime dtype out of multiarray into a separate C extension some
years ago). Being able to separate the dtypes from the multiarray module
would be an obvious way to drive the internal API change.
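The casting machinery referred to above is essentially the pairwise
rules behind queries like the following, which is why each dtype added
outside the core multiplies the number of cases to answer (a quick
illustration):

```python
import numpy as np

# Every pair of dtypes needs answers to questions like these, which is
# what makes casting the hard part of decoupling dtypes from the core.
assert np.can_cast(np.int32, np.float64)        # safe widening
assert not np.can_cast(np.float64, np.int32)    # would lose information
assert np.result_type(np.int8, np.uint8) == np.int16
```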

Regarding the use of cython in numpy, was there any discussion about the
compilation/size cost of using cython, and about talking to the cython
team to improve this? Or was that considered acceptable with current
cython for numpy? I am convinced that cleanly separating the low-level
parts from the Python C API plumbing would be the single most important
thing one could do to make the codebase more amenable to such changes.

David




On Tue, Aug 25, 2015 at 9:58 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io
 wrote:

 Thanks for the write-up Nathaniel.   There is a lot of great detail and
 interesting ideas here.

  I am very eager to understand how to help NumPy and the wider
 community move forward however I can (my passions on this have not changed
 since 1999, though what I myself spend time on has changed).

 There are a lot of ways to think about approaching this, though.   It's
 hard to get all the ideas on the table, and it was unfortunate we couldn't
  get everybody who is a core NumPy dev together in person to have this
 discussion as there are still a lot of questions unanswered and a lot of
 thought that has gone into other approaches that was not brought up or
 represented in the meeting (how does Numba fit into this, what about
 data-shape, dynd, memory-views and Python type system, etc.).   If NumPy
 becomes just an interface-specification, then why don't we just do that
 *outside* NumPy itself in a way that doesn't jeopardize the stability of
 NumPy today.These are some of the real questions I have.   I will try
 to write up my thoughts in more depth soon, but  I won't be able to respond
  in-depth right now.   I just wanted to comment because Nathaniel said I
  "disagree", which is only partly true.

 The three most important things for me are 1) let's make sure we have
 representation from as wide of the community as possible (this is really
 hard), 2) let's look around at the broader community and the prior art that
 is happening in this space right now and 3) let's not pretend we are going
 to be able to make all this happen without breaking ABI compatibility.
 Let's just break ABI compatibility with NumPy 2.0 *and* have as much
 fidelity with the API and semantics of current NumPy as possible (though
 there will be some changes necessary long-term).

 I don't think we should intentionally break ABI if we can avoid it, but I
  also don't think we should spend inordinate amounts of time trying to
 pretend that we won't break ABI (for at least some people), and most
 importantly we should not pretend *not* to break the ABI when we actually
  do. We did this once before with the roll-out of date-time, and it was
  really unnecessary. When I released NumPy 1.0, there were several
 things that I knew should be fixed very soon (NumPy was never designed to
 not break ABI).Those problems are still there.Now, that we have
 quite a bit better understanding of what NumPy *should* be (there have been
 tremendous strides in understanding and community size over the past 10
 years), let's actually make the infrastructure we think will last for the
 next 20 years (instead of trying to shoe-horn new ideas into a 20-year old
 code-base that wasn't designed for it).

 NumPy is a hard code-base.  It has been since Numeric days in 1995. I
 could be wrong, but my guess is that we will be passed by as a community if
 we don't seize the opportunity to build something better than we can build
 if we are forced to use a 20 year old code-base.

 It is more important to not break people's code and to be clear when a
 re-compile is necessary for dependencies.   Those to me are the most
 important constraints. There are a lot of great ideas that we all have
  about what we want NumPy to be able to do. Some of these are pretty
 transformational (and the more exciting they are, the harder I think they
 are going to be to implement without breaking at least the ABI). There
 is probably some CAP-like theorem around
 Stability-Features-Speed-of-Development (pick 2) when it comes to Open
 Source Software development and making feature-progress with NumPy *is
 going* to create in-stability which concerns me.

  I would like to see a "little-bit-of-pain one time" with a NumPy 2.0,
  rather than a "constant pain because of constant churn over many years"
 approach that Nathaniel seems to advocate.   To me NumPy 2.0 is an
 ABI-breaking release that is as API-compatible as possible and whose