Re: [Numpy-discussion] py2/py3 pickling
25.08.2015, 01:15, Chris Laumann wrote:

> Would it be possible then (in relatively short order) to create a py2 -> py3 numpy pickle converter?

You probably need to modify the pickle stream directly, replacing *STRING opcodes with *BYTES opcodes when it comes to objects that are needed for constructing Numpy arrays: https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82

Or, use a custom pickler class that emits the new opcodes when it comes to data that is part of Numpy arrays, as the Python 2 pickler doesn't know how to write bytes opcodes. It's probably doable, although likely annoying to implement. The pickles created won't be loadable on Py2, only Py3.

You'd need to find a volunteer who wants to work on this or just do it yourself, though.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
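For completeness: on the Python 3 side there is already a partial workaround that needs no stream rewriting, since Py3's Unpickler accepts an `encoding` argument controlling how 2.x *STRING opcodes are decoded (`encoding='bytes'` leaves them as raw bytes). A minimal sketch; the hand-crafted bytes below are what Python 2's `pickle.dumps('abc', 2)` would produce:

```python
import pickle

# Hand-crafted Python 2 pickle of the 8-bit str 'abc'
# (protocol 2: SHORT_BINSTRING opcode 'U', then BINPUT and STOP).
py2_pickle = b'\x80\x02U\x03abcq\x00.'

# By default Python 3 decodes *STRING opcodes as text (ASCII);
# encoding='bytes' recovers the raw bytes instead, which is what
# a NumPy array buffer would need.
as_text = pickle.loads(py2_pickle)                     # 'abc'
as_bytes = pickle.loads(py2_pickle, encoding='bytes')  # b'abc'
```

This only helps for loading on Py3; it does not produce converted pickle files, which is what a true converter would do.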
Re: [Numpy-discussion] Numpy helper function for __getitem__?
On 08/24/2015 10:23 AM, Sebastian Berg wrote:

> Fabien, just to make sure you are aware: if you are overriding `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case).

Hi Sebastian,

thanks for the info. I am writing a duck NetCDF4 Variable object, and therefore I am not trying to override Numpy arrays.

I think that Stephan's function for xray is very useful. A possible improvement (probably at a certain performance cost) would be to be able to provide a shape instead of a number of dimensions. The output would then be slices with valid starts and ends.

Current behavior:

In [9]: expanded_indexer(slice(None), 2)
Out[9]: (slice(None, None, None), slice(None, None, None))

With shape:

In [10]: expanded_indexer(slice(None), (3, 4))
Out[10]: (slice(0, 3, 1), slice(0, 4, 1))

But if nobody needed something like this before me, I think that I might have a design problem in my code (still quite new to python).

Cheers and thanks,

Fabien
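A rough sketch of the proposed shape-aware variant (hypothetical code, not xray's actual implementation), using `slice.indices()` to normalize each slice against the corresponding axis length:

```python
def expanded_indexer(key, shape):
    """Hypothetical shape-aware variant of xray's expanded_indexer:
    pad `key` with full slices up to len(shape), then normalize each
    slice so it has concrete start/stop/step for that axis length."""
    if not isinstance(key, tuple):
        key = (key,)
    # pad with full slices so every dimension is indexed
    key = key + (slice(None),) * (len(shape) - len(key))
    return tuple(
        slice(*k.indices(n)) if isinstance(k, slice) else k
        for k, n in zip(key, shape)
    )

print(expanded_indexer(slice(None), (3, 4)))
# (slice(0, 3, 1), slice(0, 4, 1))
```

Integer (and other non-slice) indexers pass through unchanged, e.g. `expanded_indexer(1, (3, 4))` gives `(1, slice(0, 4, 1))`.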
Re: [Numpy-discussion] py2/py3 pickling
On Tue, 25 Aug 2015 19:12:30 +0300 Pauli Virtanen p...@iki.fi wrote:

> 25.08.2015, 01:15, Chris Laumann wrote:
>> Would it be possible then (in relatively short order) to create a py2 -> py3 numpy pickle converter?
>
> You probably need to modify the pickle stream directly, replacing *STRING opcodes with *BYTES opcodes when it comes to objects that are needed for constructing Numpy arrays: https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82
>
> Or, use a custom pickler class that emits the new opcodes when it comes to data that is part of Numpy arrays, as the Python 2 pickler doesn't know how to write bytes opcodes. It's probably doable, although likely annoying to implement. The pickles created won't be loadable on Py2, only Py3.

One could take a look at how the built-in bytearray type achieves pickle compatibility between 2.x and 3.x. The solution is to serialize the binary data as a latin-1 decoded unicode string, and to return the right reconstructor from __reduce__.

This solution is less space-efficient than pure bytes pickling, since the unicode string is serialized as utf-8 (so bytes >= 0x80 are multibyte-encoded). There's also some CPU overhead, due to the successive decoding and encoding steps.

You can take a look at the bytearray_reduce() function in Objects/bytearrayobject.c, both for 2.x and 3.x. (Also note how the 3.x version does it only for protocols < 3, to achieve better efficiency on newer protocol versions.)

Another possibility would be a custom Unpickler class for 3.x, dealing specifically with 2.x-produced Numpy array pickles. That way the pickles themselves could be cross-version.

Regards,

Antoine.
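A minimal sketch of the bytearray trick applied to a user class (the `Blob`/`_rebuild` names are hypothetical; the real implementation is in bytearray_reduce() as noted above):

```python
import pickle

def _rebuild(text):
    # Reconstructor referenced from __reduce__; must live at module
    # top level so both picklers can find it by name.
    return Blob(text.encode('latin1'))

class Blob:
    """Sketch of the bytearray trick: serialize a binary payload as a
    latin-1 decoded text string, which py2 and py3 picklers both
    understand, instead of as raw bytes."""
    def __init__(self, data):
        self.data = bytes(data)

    def __reduce__(self):
        # latin-1 maps bytes 0x00-0xFF one-to-one onto code points
        # U+0000-U+00FF, so the round trip is lossless (though bytes
        # >= 0x80 become two bytes in the utf-8 pickle stream).
        return (_rebuild, (self.data.decode('latin1'),))

blob = Blob(b'\x00\xff\x80data')
copy = pickle.loads(pickle.dumps(blob, protocol=2))
```

As Antoine notes, this trades space and CPU for cross-version portability of the pickle itself.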
[Numpy-discussion] 1.10.0rc1
Hi All,

The silence after the 1.10 beta has been eerie. Consequently, I'm thinking of making a first release candidate this weekend. If you haven't yet tested the beta, please do so. It would be good to discover as many problems as we can before the first release.

Chuck
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith n...@pobox.com wrote:

Hi all,

These are the notes from the NumPy dev meeting held July 7, 2015, at the SciPy conference in Austin, presented here so the list can keep up with what happens, and so you can give feedback. Please do give feedback, none of this is final!

(Also, if anyone who was there notices anything I left out or mischaracterized, please speak up -- these are a lot of notes I'm trying to gather together, so I could easily have missed something!)

Thanks to Jill Cowan and the rest of the SciPy organizers for donating space and organizing logistics for us, and to the Berkeley Institute for Data Science for funding travel for Jaime, Nathaniel, and Sebastian.

Attendees
=========

Present in the room for all or part: Daniel Allan, Chris Barker, Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fernández del Río, Chuck Harris, Nathaniel Smith, Stéfan van der Walt. (Note: I'm pretty sure this list is incomplete.)

Joining remotely for all or part: Stephan Hoyer, Julian Taylor.

Formalizing our governance/decision making
==========================================

This was a major focus of discussion. At a high level, the consensus was to steal IPython's governance document (IPEP 29) and modify it to remove its use of a BDFL as a backstop to normal community consensus-based decisions, and replace it with a new backstop based on Apache-project-style consensus voting amongst the core team. I'll send out a proper draft of this shortly for further discussion.

Development roadmap
===================

General consensus:

Let's assume NumPy is going to remain important indefinitely, and try to make it better, instead of waiting for something better to come along. (This is unlikely to be wasted effort even if something better does come along, and it's hardly a sure thing that that will happen anyway.) Let's focus on evolving numpy as far as we can without major break-the-world changes (no "numpy 2.0", at least in the foreseeable future).

And, as a target for that evolution, let's change our focus from "NumPy is the library that gives you the np.ndarray object (plus some attached infrastructure)" to "NumPy provides the standard framework for working with arrays and array-like objects in Python".

This means creating defined interfaces between array-like objects / ufunc objects / dtype objects, so that it becomes possible for third parties to add their own and mix-and-match. Right now ufuncs are pretty good at this, but if you want a new array class or dtype then in most cases you pretty much have to modify numpy itself.

Vision: instead of everyone who wants a new container type having to reimplement all of numpy, Alice can implement an array class using (sparse / distributed / compressed / tiled / gpu / out-of-core / delayed / ...) storage, pass it to code that was written using direct calls to np.* functions, and it just works. (Instead of np.sin being "the way you calculate the sine of an ndarray", it's "the way you calculate the sine of any array-like container object".)

Vision: Darryl can implement a new dtype for (categorical data / astronomical dates / integers-with-missing-values / ...) without having to touch the numpy core.

Vision: Chandni can then come along and combine them by doing a = alice_array([...], dtype=darryl_dtype) and it just works.

Vision: no-one is tempted to subclass ndarray, because anything you can do with an ndarray subclass you can also easily do by defining your own new class that implements the "array protocol".

Supporting third-party array types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sub-goals:

- Get __numpy_ufunc__ done, which will cover a good chunk of numpy's API right there.
- Go through the rest of the stuff in numpy, and figure out some story for how to let it handle third-party array classes:
  - ufunc ALL the things: Some things can be converted directly into (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some things could be converted into (g)ufuncs if we extended the (g)ufunc interface a bit (e.g. np.sort, np.matmul).
  - Some things probably need their own __numpy_ufunc__-like extensions (__numpy_concatenate__?)
- Provide tools to make it easier to implement the more complicated parts of an array object (e.g. the bazillion different methods, many of which are ufuncs in disguise, or indexing).
- Longer-run interesting research project: __numpy_ufunc__ requires that one or the other object have explicit knowledge of how to handle the other, so to handle binary ufuncs with N array types you need something like N**2 __numpy_ufunc__ code paths. As an alternative, if there were some interface that an object could export that provided the operations
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com wrote:

> On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote:
>> Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.
>> snip
>> I think that summarizes my main concerns. I will write up more forward-thinking ideas for what else is possible in the coming weeks. In the mean time, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well.
>
> I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0, is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that it was not possible to evolve numpy without running into compatibility roadblocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project?

I think it would be best if Mark Wiebe speaks up here. I can explain why Continuum supported DyND with some fraction of Mark's time for a few years and give my perspective, but ultimately DyND is Mark's story to tell (and a few talented people have now joined him in the effort).

Mark Wiebe was a productive NumPy developer. He was one of a few people that jumped in on the code-base and made substantial and significant changes, and came to understand just how hard it can be to develop in the NumPy code-base.
He also is a C++ developer who really likes the beauty and power of that language (which definitely biases his NumPy work, but he did put a lot of effort into making NumPy better). Before Peter and I started Continuum, Mark had begun the DyND project as an example of a general-purpose dynamic array library that could be used by any dynamic language to make arrays.

In the early days of Continuum, we spent time from at least Mark W., Bryan Van de Ven, Jay Bourque, and Francesc Alted looking at how to extend NumPy to add 1) categorical data-types, 2) variable-length strings, and 3) better date-time types. Bryan, a good developer who has gone on to be a primary developer of Bokeh, spent quite a bit of time and had a prototype of categoricals *nearly* working. He did not like working on the NumPy code-base at all. He struggled with it and found it very difficult to extend. He worked closely with Mark Wiebe, who helped him the best he could. What took him 4 weeks in NumPy took him 3 days in DyND to build. I think that experience convinced him and Mark W. both that working with the NumPy code-base would take too long to make significant progress.

Also, during 2012 I was trying to help with release-management (though I ended up just hiring Ondrej Certik to actually do the work, and he did a great job of getting a release of NumPy out the door --- thanks to much help from many of you). At that point, I realized very clearly that what I could best do was to try and get more resources for open source and for the NumPy stack rather than work on the code directly. We also did work with several clients that helped me realize just how many disruptive changes had happened from 1.4 to 1.7 for extensive users of NumPy (much more than would be justified from a "we don't break the ABI" mantra that was the stated goal).
We also realized that the kind of experimentation we wanted to do in the first 2 years of Continuum would just not be possible on the NumPy code-base, and the need for getting community buy-in on every decision would slow us down too much --- as we had to iterate rapidly on so many things and find our center as a startup. It also would not be fair to the NumPy community.

Our decision to do *all* of our exploration outside the NumPy code base was basically: 1) the kinds of changes we wanted ultimately were potentially dramatic and disruptive, 2) it would be too difficult and time-consuming to decide all things in public discussions with the NumPy community --- especially when some things were experimental, 3) tying ourselves to releases of NumPy would be difficult at that time, and 4) the design of the NumPy code-base makes it difficult to contribute to --- both Mark W. and Bryan V. felt they could make progress *much* faster in a new code-base.

Continuum did not have enough start-up funding to devote significant time on DyND in the early days. So Mark rallied what resources he could, and we supported him the best we could, and he made progress. My only real requirement with sponsoring his work when we did
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com wrote:

> On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote:
>> Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.
>> snip
>> There are at least 3 areas of compatibility (ABI, API, and semantic).
>>
>> ABI-compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own, or for you to build one yourself). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem.
>>
>> API compatibility should be much more sacrosanct, but it is also something that can be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable, and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a backward-compatible API that is also available.
>>
>> Semantic compatibility is the hardest. We have already broken this on multiple occasions throughout the 1.x NumPy releases. Every time you change the code, this can change. This is what I fear causing deep instability over the course of many years. These are things like the casting-rule details, the effect of indexing changes, any change to the calculation approaches. It is and has been the most at risk during any code changes. My view is that a NumPy 2.0 (with a new low-level architecture) minimizes these changes to a single release rather than unavoidably spreading them out over many, many releases.
>>
>> I think that summarizes my main concerns. I will write up more forward-thinking ideas for what else is possible in the coming weeks.
>> In the mean time, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well.
>
> I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0, is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that it was not possible to evolve numpy without running into compatibility roadblocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project?

Thanks Chuck. I'll do this in a separate email, but I just wanted to point out that when I say NumPy 2.0, I'm actually only specifically talking about a release of NumPy that breaks ABI compatibility --- not some potential re-write. I'm not ruling that out, but I'm not necessarily implying such a thing by saying NumPy 2.0.

snip

--
*Travis Oliphant*
*Co-founder and CEO*

@teoliphant
512-222-5440
http://www.continuum.io
[Numpy-discussion] Python extensions for Python 3.5 - useful info...
Just an FYI for the upcoming Python release: a very detailed post from Steve Dower, the Microsoft developer who is now in charge of the Windows releases for Python, on how the build process will change in 3.5 regarding extensions: http://stevedower.id.au/blog/building-for-python-3-5/

Cheers,

f

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.

I am very eager to understand how to help NumPy and the wider community move forward however I can (my passions on this have not changed since 1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though. It's hard to get all the ideas on the table, and it was unfortunate we couldn't get all the core NumPy devs together in person to have this discussion, as there are still a lot of questions unanswered and a lot of thought that has gone into other approaches that was not brought up or represented in the meeting (how does Numba fit into this, what about data-shape, dynd, memory-views and the Python type system, etc.). If NumPy becomes just an interface-specification, then why don't we just do that *outside* NumPy itself in a way that doesn't jeopardize the stability of NumPy today? These are some of the real questions I have. I will try to write up my thoughts in more depth soon, but I won't be able to respond in-depth right now. I just wanted to comment because Nathaniel said I disagree, which is only partly true.

The three most important things for me are 1) let's make sure we have representation from as wide of the community as possible (this is really hard), 2) let's look around at the broader community and the prior art that is happening in this space right now, and 3) let's not pretend we are going to be able to make all this happen without breaking ABI compatibility. Let's just break ABI compatibility with NumPy 2.0 *and* have as much fidelity with the API and semantics of current NumPy as possible (though there will be some changes necessary long-term).
I don't think we should intentionally break ABI if we can avoid it, but I also don't think we should spend inordinate amounts of time trying to pretend that we won't break ABI (for at least some people), and most importantly we should not pretend *not* to break the ABI when we actually do. We did this once before with the roll-out of date-time, and it was really unnecessary.

When I released NumPy 1.0, there were several things that I knew should be fixed very soon (NumPy was never designed to not break ABI). Those problems are still there. Now that we have quite a bit better understanding of what NumPy *should* be (there have been tremendous strides in understanding and community size over the past 10 years), let's actually make the infrastructure we think will last for the next 20 years (instead of trying to shoe-horn new ideas into a 20-year-old code-base that wasn't designed for it).

NumPy is a hard code-base. It has been since Numeric days in 1995. I could be wrong, but my guess is that we will be passed by as a community if we don't seize the opportunity to build something better than we can build if we are forced to use a 20-year-old code-base.

It is more important not to break people's code and to be clear when a re-compile is necessary for dependencies. Those to me are the most important constraints.

There are a lot of great ideas that we all have about what we want NumPy to be able to do. Some of these are pretty transformational (and the more exciting they are, the harder I think they are going to be to implement without breaking at least the ABI). There is probably some CAP-like theorem around Stability-Features-Speed-of-Development (pick 2) when it comes to open-source software development, and making feature progress with NumPy *is going* to create instability, which concerns me.
I would like to see a little-bit-of-pain-one-time approach with a NumPy 2.0, rather than the constant-pain-because-of-constant-churn-over-many-years approach that Nathaniel seems to advocate. To me, NumPy 2.0 is an ABI-breaking release that is as API-compatible as possible and whose semantics are not dramatically different.

There are at least 3 areas of compatibility (ABI, API, and semantic).

ABI-compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own, or for you to build one yourself). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also something that can be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable, and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a backward-compatible API that is also available.

Semantic compatibility is the hardest. We have already broken this on multiple occasions
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, 25 Aug 2015 03:03:41 -0700 Nathaniel Smith n...@pobox.com wrote:

> Supporting third-party dtypes
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [...]
> Some features that would become straightforward to implement (e.g. even in third-party libraries) if this were fixed:
> - missing value support
> - physical unit tracking (meters / seconds -> array of velocity; meters + seconds -> error)
> - better and more diverse datetime representations (e.g. datetimes with attached timezones, or using funky geophysical or astronomical calendars)
> - categorical data
> - variable length strings
> - strings-with-encodings (e.g. latin1)
> - forward mode automatic differentiation (write a function that computes f(x) where x is an array of float64; pass that function an array with a special dtype and get out both f(x) and f'(x))
> - probably others I'm forgetting right now

It should also be the opportunity to streamline the datetime64 and timedelta64 dtypes. Currently the unit information is IIRC hidden in some weird metadata thing called the PyArray_DatetimeMetaData.

Also, thanks for the notes. It has been an interesting read.

Regards,

Antoine.
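For reference, the unit Antoine mentions lives in the dtype (that datetime metadata struct), not in the array's stored integers; a quick illustration:

```python
import numpy as np

# The time unit is part of the dtype, not the stored integer value:
d = np.datetime64('2015-08-25T12:00', 'm')   # minute resolution
t = np.timedelta64(90, 'm')

print(d + t)              # arithmetic respects the unit: 2015-08-25T13:30
print(np.dtype('M8[m]'))  # the unit shows up in the dtype: datetime64[m]
```

Two arrays with the same stored integers but different units (e.g. `M8[s]` vs `M8[ms]`) represent different instants, which is why the metadata placement matters for any dtype-system redesign.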
[Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
Hi all, These are the notes from the NumPy dev meeting held July 7, 2015, at the SciPy conference in Austin, presented here so the list can keep up with what happens, and so you can give feedback. Please do give feedback, none of this is final! (Also, if anyone who was there notices anything I left out or mischaracterized, please speak up -- these are a lot of notes I'm trying to gather together, so I could easily have missed something!) Thanks to Jill Cowan and the rest of the SciPy organizers for donating space and organizing logistics for us, and to the Berkeley Institute for Data Science for funding travel for Jaime, Nathaniel, and Sebastian. Attendees = Present in the room for all or part: Daniel Allan, Chris Barker, Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fernández del Río, Chuck Harris, Nathaniel Smith, Stéfan van der Walt. (Note: I'm pretty sure this list is incomplete) Joining remotely for all or part: Stephan Hoyer, Julian Taylor. Formalizing our governance/decision making == This was a major focus of discussion. At a high level, the consensus was to steal IPython's governance document (IPEP 29) and modify it to remove its use of a BDFL as a backstop to normal community consensus-based decision, and replace it with a new backstop based on Apache-project-style consensus voting amongst the core team. I'll send out a proper draft of this shortly for further discussion. Development roadmap === General consensus: Let's assume NumPy is going to remain important indefinitely, and try to make it better, instead of waiting for something better to come along. (This is unlikely to be wasted effort even if something better does come along, and it's hardly a sure thing that that will happen anyway.) Let's focus on evolving numpy as far as we can without major break-the-world changes (no numpy 2.0, at least in the foreseeable future). 
And, as a target for that evolution, let's change our focus from numpy as NumPy is the library that gives you the np.ndarray object (plus some attached infrastructure), to NumPy provides the standard framework for working with arrays and array-like objects in Python This means, creating defined interfaces between array-like objects / ufunc objects / dtype objects, so that it becomes possible for third parties to add their own and mix-and-match. Right now ufuncs are pretty good at this, but if you want a new array class or dtype then in most cases you pretty much have to modify numpy itself. Vision: instead of everyone who wants a new container type having to reimplement all of numpy, Alice can implement an array class using (sparse / distributed / compressed / tiled / gpu / out-of-core / delayed / ...) storage, pass it to code that was written using direct calls to np.* functions, and it just works. (Instead of np.sin being the way you calculate the sine of an ndarray, it's the way you calculate the sine of any array-like container object.) Vision: Darryl can implement a new dtype for (categorical data / astronomical dates / integers-with-missing-values / ...) without having to touch the numpy core. Vision: Chandni can then come along and combine them by doing a = alice_array([...], dtype=darryl_dtype) and it just works. Vision: no-one is tempted to subclass ndarray, because anything you can do with an ndarray subclass you can also easily do by defining your own new class that implements the array protocol. Supporting third-party array types ~~ Sub-goals: - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's API right there. 
- Go through the rest of the stuff in numpy, and figure out some story for how to let it handle third-party array classes:

  - ufunc ALL the things: Some things can be converted directly into (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some things could be converted into (g)ufuncs if we extended the (g)ufunc interface a bit (e.g. np.sort, np.matmul).

  - Some things probably need their own __numpy_ufunc__-like extensions (__numpy_concatenate__?)

- Provide tools to make it easier to implement the more complicated parts of an array object (e.g. the bazillion different methods, many of which are ufuncs in disguise, or indexing)

- Longer-run interesting research project: __numpy_ufunc__ requires that one or the other object have explicit knowledge of how to handle the other, so to handle binary ufuncs with N array types you need something like N**2 __numpy_ufunc__ code paths. As an alternative, if there were some interface that an object could export that provided the operations nditer needs to efficiently iterate over (chunks of) it, then you would only need N implementations of this interface to handle all N**2 operations. This
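[Editorial sketch.] The "N implementations instead of N**2 dispatch pairs" idea in the last point above can be illustrated in plain Python. The names below (`iter_chunks`, `DenseArray`, `RepeatedValue`) are invented for illustration and are not a proposed NumPy API: the point is that one generic loop written against a shared chunk-iteration protocol handles any pairing of containers.

```python
class DenseArray:
    """Ordinary in-memory container exporting the chunk protocol."""
    def __init__(self, data):
        self.data = list(data)

    def iter_chunks(self, size):
        for i in range(0, len(self.data), size):
            yield self.data[i:i + size]


class RepeatedValue:
    """A 'lazy' container: one value repeated n times, never materialized."""
    def __init__(self, value, n):
        self.value, self.n = value, n

    def iter_chunks(self, size):
        for i in range(0, self.n, size):
            yield [self.value] * min(size, self.n - i)


def generic_add(a, b, size=2):
    """One implementation serves every (a, b) pairing of containers."""
    out = []
    for ca, cb in zip(a.iter_chunks(size), b.iter_chunks(size)):
        out.extend(x + y for x, y in zip(ca, cb))
    return out


result = generic_add(DenseArray([1, 2, 3, 4]), RepeatedValue(10, 4))
print(result)  # [11, 12, 13, 14]
```

With N container types each implementing `iter_chunks`, the single `generic_add` loop covers all N**2 operand combinations, which is the asymmetry the note is pointing at.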
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote:

Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here. I am very eager to understand how to help NumPy and the wider community move forward however I can (my passions on this have not changed since 1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though. It's hard to get all the ideas on the table, and it was unfortunate we couldn't get everybody who is a core NumPy dev together in person to have this discussion, as there are still a lot of questions unanswered and a lot of thought that has gone into other approaches that was not brought up or represented in the meeting (how does Numba fit into this, what about data-shape, dynd, memory-views and the Python type system, etc.). If NumPy becomes just an interface specification, then why don't we just do that *outside* NumPy itself in a way that doesn't jeopardize the stability of NumPy today? These are some of the real questions I have. I will try to write up my thoughts in more depth soon, but I won't be able to respond in-depth right now. I just wanted to comment because Nathaniel said "I disagree", which is only partly true.

The three most important things for me are: 1) let's make sure we have representation from as wide a community as possible (this is really hard), 2) let's look around at the broader community and the prior art that is happening in this space right now, and 3) let's not pretend we are going to be able to make all this happen without breaking ABI compatibility. Let's just break ABI compatibility with NumPy 2.0 *and* have as much fidelity with the API and semantics of current NumPy as possible (though there will be some changes necessary long-term).
I don't think we should intentionally break ABI if we can avoid it, but I also don't think we should spend inordinate amounts of time trying to pretend that we won't break ABI (for at least some people), and most importantly we should not pretend *not* to break the ABI when we actually do. We did this once before with the roll-out of date-time, and it was really unnecessary.

When I released NumPy 1.0, there were several things that I knew should be fixed very soon (NumPy was never designed to not break ABI). Those problems are still there. Now that we have a quite a bit better understanding of what NumPy *should* be (there have been tremendous strides in understanding and community size over the past 10 years), let's actually make the infrastructure we think will last for the next 20 years (instead of trying to shoe-horn new ideas into a 20-year-old code-base that wasn't designed for it).

NumPy is a hard code-base. It has been since Numeric days in 1995. I could be wrong, but my guess is that we will be passed by as a community if we don't seize the opportunity to build something better than we can build if we are forced to use a 20-year-old code-base.

It is more important to not break people's code and to be clear when a re-compile is necessary for dependencies. Those to me are the most important constraints. There are a lot of great ideas that we all have about what we want NumPy to be able to do. Some of these are pretty transformational (and the more exciting they are, the harder I think they are going to be to implement without breaking at least the ABI). There is probably some CAP-like theorem around Stability-Features-Speed-of-Development (pick 2) when it comes to open source software development, and making feature-progress with NumPy *is going* to create instability, which concerns me.
I would like to see a little-bit-of-pain-one-time approach with a NumPy 2.0, rather than the constant-pain-because-of-constant-churn-over-many-years approach that Nathaniel seems to advocate. To me, NumPy 2.0 is an ABI-breaking release that is as API-compatible as possible and whose semantics are not dramatically different.

There are at least 3 areas of compatibility (ABI, API, and semantic). ABI compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own, or for you to build one yourself). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also something that can be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable, and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
Hi Nathaniel,

Thanks for the notes. In some sense, the new dtype class(es) will provide a way of formalizing these 'weird' metadata, and probably exposing them to Python.

May I ask that you please consider adding a way to declare the sorting order (priority and direction) of fields in a structured array in the new dtype as well?

Regards,
Yu

On Tue, Aug 25, 2015 at 12:21 PM, Antoine Pitrou solip...@pitrou.net wrote:

On Tue, 25 Aug 2015 03:03:41 -0700 Nathaniel Smith n...@pobox.com wrote:

Supporting third-party dtypes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[...]
Some features that would become straightforward to implement (e.g. even in third-party libraries) if this were fixed:

- missing value support
- physical unit tracking (meters / seconds -> array of velocity; meters + seconds -> error)
- better and more diverse datetime representations (e.g. datetimes with attached timezones, or using funky geophysical or astronomical calendars)
- categorical data
- variable-length strings
- strings-with-encodings (e.g. latin1)
- forward-mode automatic differentiation (write a function that computes f(x) where x is an array of float64; pass that function an array with a special dtype and get out both f(x) and f'(x))
- probably others I'm forgetting right now

It should also be the opportunity to streamline the datetime64 and timedelta64 dtypes. Currently the unit information is, IIRC, hidden in some weird metadata thing called PyArray_DatetimeMetaData.

Also, thanks for the notes. It has been an interesting read.

Regards
Antoine.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
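[Editorial sketch.] The forward-mode autodiff item in the list above is the classic dual-number trick. Here it is in plain Python with scalars standing in for the array-with-a-special-dtype that the proposal would enable; the `Dual` class is purely illustrative, not any real NumPy dtype.

```python
class Dual:
    """Dual number: a value paired with a derivative (the 'epsilon' part).
    Arithmetic propagates both, so evaluating f on a Dual computes f and f'."""

    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    def _coerce(self, other):
        # Plain numbers are constants: derivative 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._coerce(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._coerce(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.eps * other.val + self.val * other.eps)

    __rmul__ = __mul__


def f(x):
    # An ordinary function written for "float-like" inputs.
    return x * x + 3.0 * x  # f'(x) = 2x + 3


out = f(Dual(2.0, 1.0))  # seed eps=1, i.e. dx/dx = 1
print(out.val, out.eps)  # f(2) = 10.0, f'(2) = 7.0
```

A dual-number *dtype* would let the same trick run elementwise over whole float64-shaped arrays without touching the function being differentiated, which is exactly the "pass that function an array with a special dtype" scenario described above.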
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
Thanks for the good summary, Nathaniel.

Regarding the dtype machinery, I agree casting is the hardest part. Unless the code has changed dramatically, this was the main reason why you could not make most of the dtypes separate from the numpy codebase (I tried to move the datetime dtype out of multiarray into a separate C extension some years ago). Being able to separate the dtypes from the multiarray module would be an obvious way to drive the internal API change.

Regarding the use of Cython in numpy, was there any discussion about the compilation/size cost of using Cython, and about talking to the Cython team to improve this? Or was that considered acceptable with current Cython for numpy? I am convinced that cleanly separating the low-level parts from the Python C API plumbing would be the single most important thing one could do to make the codebase more amenable to change.

David

On Tue, Aug 25, 2015 at 9:58 PM, Charles R Harris charlesr.har...@gmail.com wrote:
On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote: