Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.6.2. This is a maintenance release. Due to the delay of NumPy 1.7.0, this release contains far more fixes than a regular NumPy bugfix release. It also includes a number of documentation and build improvements. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ Please test this release and report any issues on the numpy-discussion mailing list.

Mh, I can't quite understand this:

    $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1
    2718 files changed, 390859 deletions(-)

Does it mean that the only thing the RC has done is remove a lot of stuff? That's weird, because the build process went just fine and the unit tests are passing... /me confused?

-- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Missing data wrap-up and request for comments
Hey all,

Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable.

The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document through my understanding of the history of NumPy, as well as of all the users I've met and interacted with, which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that).

I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread.

We need concrete proposals, and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it.

In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should start with these: int32, float64, complex64, str, bool. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part.

My three proposals:

* do nothing and leave things as is
* add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly)
* move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as its core.

For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own).

Best regards,
-Travis
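Since the message treats bit-pattern dtypes and the extra mask as separate questions, a small sketch using only existing NumPy features may help show the difference between the two representations. It assumes, purely for illustration, that an int32 "NA" bit pattern would be the most negative int32 value (the convention R uses for its integer NA); `NA_INT32`, `mean_bitpattern` and `mean_masked` are hypothetical names, not NumPy API.

```python
import numpy as np

# Hypothetical NA bit pattern for int32: the most negative representable value.
NA_INT32 = np.iinfo(np.int32).min


def mean_bitpattern(values):
    """Mean of an int32 array whose missing entries hold the NA_INT32 sentinel."""
    valid = values != NA_INT32  # missingness is encoded inside the data itself
    return values[valid].mean()


def mean_masked(values, missing):
    """Mean of an array whose missing entries are flagged by a separate bool mask."""
    return values[~missing].mean()  # data untouched; the mask travels alongside


# The same logical dataset, [10, NA, 30], in both representations.
bitpattern = np.array([10, NA_INT32, 30], dtype=np.int32)
data = np.array([10, 20, 30], dtype=np.int32)  # 20 is still stored, just hidden
missing = np.array([False, True, False])

print(mean_bitpattern(bitpattern))  # 20.0
print(mean_masked(data, missing))   # 20.0
```

One practical difference shows up here: the masked version can later "unmask" the hidden 20, while the bit-pattern version overwrote it.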
Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi matrixh...@gmail.com wrote: Mh, I can't exactly understand this:

    $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1
    2718 files changed, 390859 deletions(-)

does it mean that the only thing the RC has done is to remove a lot of stuff? that's weird because the build process went all just fine and unit tests are passing ... /me confused?

No, only a few files were changed. Since there are about 1000 files in numpy, I suspect you are also counting everything in the build and documentation build directories. If you built inplace, you are also going to pick up *.pyc files and such.

Chuck
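One way to test Chuck's explanation is to redo the comparison while skipping build products. The sketch below uses only the Python standard library. The exclusion lists (`build`, `__pycache__`, `*.pyc`, `*.pyo`, `*.so`) are assumptions about what counts as a build artifact, and the function names are made up for this example.

```python
import filecmp
import fnmatch
import os
import tempfile

# Assumed build products that should not count as source changes.
EXCLUDE_DIRS = {"build", "__pycache__"}
EXCLUDE_PATTERNS = ("*.pyc", "*.pyo", "*.so")


def source_files(root):
    """Return file paths under root (relative to root), minus build products."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in EXCLUDE_PATTERNS):
                continue
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(old_root, new_root):
    """Classify source files as added, removed, or changed between two trees."""
    old, new = source_files(old_root), source_files(new_root)
    changed = sorted(
        rel for rel in old & new
        # shallow=False compares file contents, not just os.stat() signatures
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    )
    return {"added": sorted(new - old), "removed": sorted(old - new),
            "changed": changed}


def _write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


def demo():
    """Two tiny trees: one real edit plus build debris only in the old tree."""
    with tempfile.TemporaryDirectory() as tmp:
        old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
        _write(os.path.join(old, "numpy", "core.py"), "x = 1\n")
        _write(os.path.join(new, "numpy", "core.py"), "x = 2\n")
        _write(os.path.join(old, "numpy", "core.pyc"), "bytecode")
        _write(os.path.join(old, "build", "lib", "core.py"), "x = 1\n")
        return compare_trees(old, new)
```

Pointed at two freshly extracted trees, e.g. `compare_trees("numpy-1.6.1", "numpy-1.6.2rc1")`, this should report only genuine source edits. Any stray bytecode that was shipped in one tarball would no longer inflate the count.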
Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On Wed, May 9, 2012 at 6:49 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, only a few files were changed. Since there are about 1000 files in numpy I suspect you are also counting everything in the build and documentation build directories. If you built inplace, you are also going to pick up *.pyc files and such.

Sorry, I didn't say that: they are the tarballs, just extracted. I'd have to recheck by downloading them again from SF.

-- Sandro Tosi
Re: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11
This news did not arrive at scikit-learn-gene...@lists.sourceforge.net. Is the above list deprecated? BTW, thanks for supporting and working on this project ;)

On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On behalf of Andy Mueller, our release manager, I am happy to announce the 0.11 release of scikit-learn. This release includes some major new features such as randomized sparse models, gradient boosted regression trees, label propagation and many more. The release also has major improvements in the documentation and in stability. Details can be found on the [1] what's new page. We also have a new page with [2] video tutorials on machine learning with scikit-learn and different aspects of the package. Sources and Windows binaries are available on SourceForge, through PyPI (http://pypi.python.org/pypi/scikit-learn/0.11), or can be installed directly using pip:

    pip install -U scikit-learn

Thanks again to all the contributors who made this release possible. Cheers, Gaël

1. http://scikit-learn.org/stable/whats_new.html
2. http://scikit-learn.org/stable/presentations.html
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 10:46 AM, Travis Oliphant tra...@continuum.io wrote: Ideally, numpy.ma would be changed to use ndmasked objects as their core.

numpy.ma is unmaintained and I don't see that changing anytime soon. As you know, I would prefer 1), but 2) is a good compromise, and the infrastructure for such a flag could be useful for other things, although, like yourself, I'm not sure how it would be implemented. I don't understand your proposal for 3), but from the description I don't see that it buys anything.

Chuck
Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: Please test this release and report any issues on the numpy-discussion mailing list.

I think it would be nicer not to ship .pyc files in the source tarball:

    $ find numpy-1.6.2rc1/ -name '*.pyc'
    numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc
    numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc
    numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc
    numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc

Cheers,
-- Sandro Tosi
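A release script could run a check like the following before uploading. This is a hypothetical standard-library sketch, not part of NumPy's actual release tooling. It lists compiled-bytecode members inside a source archive without extracting it.

```python
import io
import tarfile

BYTECODE_SUFFIXES = (".pyc", ".pyo")


def bytecode_members(fileobj):
    """Return names of compiled-bytecode members in a (possibly compressed) tar archive."""
    # mode "r:*" lets tarfile auto-detect gzip/bz2/xz compression.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return sorted(
            m.name for m in tar.getmembers()
            if m.isfile() and m.name.endswith(BYTECODE_SUFFIXES)
        )


def _add_file(tar, name, payload):
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def demo():
    """Build a small in-memory .tar.gz containing one stray .pyc and scan it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        _add_file(tar, "numpy-1.6.2rc1/doc/sphinxext/numpydoc.py", b"# source\n")
        _add_file(tar, "numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc", b"\x00bytecode")
    buf.seek(0)
    return bytecode_members(buf)
```

For a real archive, open it in binary mode and pass the file object, e.g. `with open("numpy-1.6.2rc1.tar.gz", "rb") as fh: print(bytecode_members(fh))`. The filename is illustrative. A non-empty result would be a reason to stop the upload.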
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant tra...@continuum.io wrote: For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own).

I'm most in favour of the second proposal. It won't take very much effort, and it marks off this code as experimental more clearly than documentation notes alone.

Thanks,
-Mark
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On May 9, 2012, at 2:07 PM, Mark Wiebe wrote: I'm most in favour of the second proposal. It won't take very much effort, and more clearly marks off this code as experimental than just documentation notes.

Mark, will you give more details about this proposal? How would the flag work, and what would it modify?

The proposal to create an ndmasked object that is separate from ndarray objects also won't take much effort. It also marks off the object, so that those who want to use it can, and those who don't are not pushed into using it anyway.

-Travis
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 2:15 PM, Travis Oliphant tra...@continuum.io wrote: Mark will you give more details about this proposal? How would the flag work, what would it modify?

The idea is inspired in part by the Chrome release cycle, which has a presentation here: https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6chpli=1

Some quotes: "Features should be engineered so that they can be disabled easily (1 patch)" and "Would large feature development still be possible? Yes, engineers would have to work behind flags, however they can work for as many releases as they need to and can remove the flag when they are done."

The current numpy codebase isn't designed for this kind of workflow, but I think we can productively emulate the idea for a big feature like NA support. One way to implement this flag would be to have a numpy.experimental namespace which is not imported by default. To enable the NA-mask feature, you could do:

    import numpy.experimental.maskna

This would trigger an ExperimentalWarning to signal that an experimental feature has been enabled, and would add any NA-specific symbols to the numpy namespace (NA, NAType, etc.). Without this import, any operation which would create an NA or NA-masked array raises an ExperimentalError instead of succeeding. After this import, things would behave as they do now.

Cheers,
Mark
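Here is a minimal pure-Python sketch of how such an opt-in gate could behave. Everything in it is hypothetical. `ExperimentalWarning` and `ExperimentalError` are the names used in the proposal above, not existing NumPy classes. The import side effect of `import numpy.experimental.maskna` is modelled as an explicit `enable()` call so that the example fits in a single file.

```python
import warnings


class ExperimentalWarning(UserWarning):
    """Emitted once an experimental feature is switched on (hypothetical)."""


class ExperimentalError(RuntimeError):
    """Raised when code uses an experimental feature that was not enabled (hypothetical)."""


_enabled = set()  # names of experimental features the user has opted into


def enable(feature):
    """Opt in to a feature; stands in for importing numpy.experimental.<feature>."""
    if feature not in _enabled:
        _enabled.add(feature)
        warnings.warn(f"experimental feature {feature!r} enabled",
                      ExperimentalWarning, stacklevel=2)


def require(feature):
    """Guard placed at the top of every code path that creates NA or NA-masked arrays."""
    if feature not in _enabled:
        raise ExperimentalError(
            f"{feature!r} is experimental; enable it before use")


def make_na_masked(values):
    """Toy constructor for an NA-masked array, gated behind the 'maskna' flag."""
    require("maskna")
    return list(values), [False] * len(values)  # (data, mask) placeholder
```

Because the guard sits at each entry point that can create an NA-masked array, existing code paths behave as before until a user opts in. Removing the flag later amounts to deleting the `require` calls.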
Re: [Numpy-discussion] Missing data wrap-up and request for comments
The numpy.ma is unmaintained and I don't see that changing anytime soon. As you know, I would prefer 1), but 2) is a good compromise and the infrastructure for such a flag could be useful for other things, although like yourself I'm not sure how it would be implemented. I don't understand your proposal for 3), but from the description I don't see that it buys anything.

It is a bit strong to call numpy.ma unmaintained. I don't consider it that way. Are there a lot of tickets for it that are unaddressed? Is it broken? I know it gets a lot of use in the wild, and so I don't think NumPy users would be happy to hear it is considered unmaintained by NumPy developers.

I'm looking forward to more details of Mark's proposal for #2.

The proposal for #3 is quite simple, and I think it is also a good compromise between removing the masked array entirely from the core NumPy object and leaving things as is in master. It keeps the functionality (but in a separate object), much like numpy.ma is a separate object. Basically, it buys not forcing *all* NumPy users (on the C-API level) to now deal with a masked array. I know this push is a feature that is part of Mark's intention (as it pushes downstream libraries to think about missing data at a fundamental level). But I think this is too big of a change to put in a 1.X release. The internal array model used by NumPy is used quite extensively in downstream libraries as a *concept*. Many people have enhanced this model with a separate mask array for various reasons, and Mark's current use of a mask does not satisfy all those use cases. I don't see how we can justify changing the NumPy 1.X memory model under these circumstances. In my mind, this is a NumPy 2.0 kind of change, where downstream users will be looking for possible array-model changes.

-Travis
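To show what "a separate object" buys, here is a rough Python-level analogue of the third proposal. `NDMasked` is a hypothetical class, not Mark's C implementation. It keeps data and a boolean mask side by side and is deliberately not an `ndarray` subclass. It uses NumPy's `__array_ufunc__` protocol (NumPy 1.13+) together with `NDArrayOperatorsMixin`, so that ufuncs and arithmetic operators still accept it. Code that checks `isinstance(x, np.ndarray)` (the Python counterpart of a C-level type check such as `PyArray_Check`) rejects it. By contrast, `numpy.ma.MaskedArray` is an `ndarray` subclass and passes such a check.

```python
import functools

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class NDMasked(NDArrayOperatorsMixin):
    """Hypothetical masked container: data plus a bool mask (True = missing)."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        if mask is None:
            mask = np.zeros(self.data.shape, dtype=bool)
        # Broadcast (and copy) so the mask always matches the data's shape.
        self.mask = np.broadcast_to(np.asarray(mask, dtype=bool),
                                    self.data.shape).copy()

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch only handles plain single-output elementwise calls.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        datas, masks = [], []
        for x in inputs:
            if isinstance(x, NDMasked):
                datas.append(x.data)
                masks.append(x.mask)
            else:
                datas.append(np.asarray(x))
                masks.append(np.zeros(np.shape(x), dtype=bool))
        # An output element is missing if any input element feeding it is missing.
        mask = functools.reduce(np.logical_or, masks)
        return NDMasked(ufunc(*datas), mask)

    def filled(self, fill_value):
        """Return a plain ndarray with missing entries replaced by fill_value."""
        return np.where(self.mask, fill_value, self.data)

    def __repr__(self):
        return f"NDMasked(data={self.data!r}, mask={self.mask!r})"


a = NDMasked([1.0, 4.0, 9.0], mask=[False, True, False])
print((a + 10).filled(0))                # masked slot filled with 0
print(isinstance(a, np.ndarray))         # False: not an ndarray
print(isinstance(np.ma.masked_array([1, 2], mask=[0, 1]), np.ndarray))  # True
```

The design choice this illustrates is the one argued above. Existing code written against `ndarray` never sees the mask by accident, while masked-aware code can still use the ufunc machinery.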
Re: [Numpy-discussion] Missing data wrap-up and request for comments
One way to do this flag would be to have a numpy.experimental namespace which is not imported by default. To enable the NA-mask feature, you could do:

    import numpy.experimental.maskna

This would trigger an ExperimentalWarning to message that an experimental feature has been enabled, and would add any NA-specific symbols to the numpy namespace (NA, NAType, etc). Without this import, any operation which would create an NA or NA-masked array raises an ExperimentalError instead of succeeding. After this import, things would behave as they do now.

How would this flag work at the C-API level?

-Travis
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On 05/09/2012 06:46 PM, Travis Oliphant wrote: * move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own).

Bravo! NA-overview.rst was an excellent read. Thanks, Nathaniel and Mark!

The third proposal is certainly the best one from Cython's perspective, and I imagine for those writing C extensions against the C API too. Having PyType_Check fail for ndmasked is a very good way of making code that is not written to take masks into account fail. If it is in ndarray, we would also have some pressure to add support in Cython; with ndmasked we avoid that too. The likely outcome is that we won't ever support it either way, but then we need some big warning in the docs, and it's better to avoid that. (I guess I'd be +0 on Mark Florisson implementing it if it ends up in core ndarray; I'd almost certainly not do it myself.)

That covers Cython. My view as a NumPy user follows. I'm a heavy user of masks, which are used to make data NA in the statistical sense.
The setting is that we have to mask out the radiation coming from the Milky Way in full-sky images of the Cosmic Microwave Background. There's data, but we know we can't trust it, so we make it NA. But we also do play around with different masks. Today we keep the mask in a separate array, and to zero-mask we do

    masked_data = data * mask

or

    masked_data = data.copy()
    masked_data[mask == 0] = np.nan  # soon np.NA

depending on the circumstances. Honestly, API-wise, this is as good as it gets for us. Nice and transparent, no new semantics to learn in the special case of masks. Now, this has performance issues: lots of memory use, extra transfers over the memory bus. BUT, NumPy has that problem all over the place, even for x + y + z! Solving it in the special case of masks, by making a new API, seems a bit myopic to me. IMO, that's much better solved at the fundamental level. As an *illustration*:

    with np.lazy:
        masked_data1 = data * mask1
        masked_data2 = data * (mask1 | mask2)
        masked_data3 = (x + y + z) * (mask1 & mask3)

This would
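Dag's two zero-masking idioms can be written out as a small runnable sketch. The arrays are made-up stand-ins for a sky map and its mask (1 = keep, 0 = distrust); only standard NumPy calls are used, NaN replacement assumes a floating-point array, and the np.NA mentioned in the comment above is not assumed to exist.

```python
import numpy as np

# Hypothetical data: a tiny "sky map" and a mask where 0 marks untrusted pixels.
data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([1, 0, 1, 1])

# Idiom 1: zero-mask by multiplication (allocates a new array).
masked_data = data * mask

# Idiom 2: copy, then overwrite untrusted pixels with NaN (needs a float dtype).
masked_nan = data.copy()
masked_nan[mask == 0] = np.nan

# NaN-aware reductions then skip the masked pixels.
mean_of_trusted = np.nanmean(masked_nan)

# The temporary from idiom 1 can be avoided by writing into a preallocated buffer.
buf = np.empty_like(data)
np.multiply(data, mask, out=buf)
```

Here masked_data is [1., 0., 3., 4.] and mean_of_trusted is 8/3. The out= form only removes the allocation for a single operation; it does not fuse a chain such as (x + y + z) * mask, which is the broader, "fundamental level" problem Dag is pointing at.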
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 1:35 PM, Travis Oliphant tra...@continuum.io wrote: My three proposals: * do nothing and leave things as is * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) * move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. numpy.ma is unmaintained and I don't see that changing anytime soon. As you know, I would prefer 1), but 2) is a good compromise and the infrastructure for such a flag could be useful for other things, although like yourself I'm not sure how it would be implemented. I don't understand your proposal for 3), but from the description I don't see that it buys anything. It is a bit strong to call numpy.ma unmaintained. I don't consider it that way. Are there a lot of tickets for it that are unaddressed? Is it broken? I know it gets a lot of use in the wild, and so I don't think NumPy users would be happy to hear it is considered unmaintained by NumPy developers. I'm looking forward to more details of Mark's proposal for #2. The proposal for #3 is quite simple and I think it is also a good compromise between removing the masked array entirely from the core NumPy object and leaving things as is in master. It keeps the functionality (but in a separate object) much like numpy.ma is a separate object. Basically it buys not forcing *all* NumPy users (on the C-API level) to now deal with a masked array. To me, it looks like we will get stuck with a more complicated implementation without changing the API, something that 2) achieves more easily while providing a feature likely to be useful as we head towards 2.0.
I know this push is a feature that is part of Mark's intention (as it pushes downstream libraries to think about missing data at a fundamental level). But I think this is too big of a change to put in a 1.X release. The internal array model used by NumPy is used quite extensively in downstream libraries as a *concept*. Many people have enhanced this model with a separate mask array for various reasons, and Mark's current use of mask does not satisfy all those use cases. I don't see how we can justify changing the NumPy 1.X memory model under these circumstances. You keep referring to these ghostly people and their unspecified uses, no doubt to protect the guilty. You don't have to name names, but a little detail on what they have done and how they use things would be *very* helpful. This is the sort of change that in my mind is a NumPy 2.0 kind of change where downstream users will be looking for possible array-model changes. We tried the flag-day approach to 2.0 already and it failed. I think it better to have a long-term release and a series of releases thereafter moving step by step with incremental changes towards a 2.0. Mark's 2) would support that approach. [snip] Chuck
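What proposal #3 buys at the C-API level (masked data lives in a separate object, so code written for plain ndarrays rejects it instead of silently ignoring the mask) can be mimicked in Python. In this sketch NDMasked is a hypothetical stand-in, not a NumPy class, isinstance plays the role of a C-level type check such as PyType_Check, and True in the mask marks a valid element (a convention chosen for this sketch only).

```python
import numpy as np


class NDMasked:
    """Hypothetical stand-in for the proposed ndmasked type (not a real NumPy class).

    It deliberately does NOT subclass np.ndarray, so code that type-checks for
    ndarray rejects it instead of silently ignoring the mask.
    """

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)  # True = value is present
        if self.data.shape != self.mask.shape:
            raise ValueError("data and mask must have the same shape")


def legacy_sum(arr):
    """Mimics extension code written only for plain ndarrays."""
    if not isinstance(arr, np.ndarray):
        raise TypeError("expected numpy.ndarray, got %s" % type(arr).__name__)
    return arr.sum()


def masked_sum(arr):
    """Mask-aware code has to opt in explicitly."""
    return arr.data[arr.mask].sum()
```

With m = NDMasked([1, 2, 3], [True, False, True]), legacy_sum(m) raises TypeError rather than returning 6, while masked_sum(m) gives 4. The failure is loud, which is the property Dag values from the Cython side.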
Re: [Numpy-discussion] Documentation roles in the numpy/scipy documentation editor
We considered lowering the review standard near the end of my direct involvement in the doc project but decided not to. You didn't mention any benefit to the proposed changes, so while I'm not active in the doc project anymore, let me relate our decision. It's often the case that docstrings get written fast, and it's usually the case that they're written by a single person, who has a single perspective. We wanted to make docs that were professional, that could be placed next to the manuals for IDL, Matlab, etc. without embarrassment. So, we set up a system similar to academic publishing. Every docstring would be seen by two sets of critical eyes, and for major X.0 releases we'd pay a proofreader to spend a few days to polish off the English and get the style totally consistent. At the same time, we needed to get something decent in every docstring fast, so we made that the priority. About the time we achieved that, money ran out. So, lots of docstrings are in 'needs review' or even 'being edited' status. But that doesn't mean money will never come again. Indeed, there are now several companies basing their services around this software. If someone does want to make the docs professional, say for numpy 2.0 or 3.0 or whatever, or as part of a larger system for sale, then they have a system in place that can do it. The purpose of the review statuses is to identify how close a docstring is to publishable. However, there is no consequence to the statuses: a docstring gets included in the release no matter its status. But you do know which docstrings need what kind of work. So, what's the benefit of changing what the statuses mean, or eliminating them? I think it may only be that the writers feel better. The users don't even see the statuses, as they're not listed in the release. Tim felt that docs should be continually edited, not finished. I agree, especially if the underlying routine or surrounding docs get changed. But the system is designed to encourage this!
Here's how: Say most/all routines get genuine 'Proofed' status. That's great, but it's not the end of the line by any means. If someone comes along and edits a proofed docstring, that docstring then automatically goes back to 'needs review', to ensure that a mistake was not inserted. Now you know what to look at when checking things over before a release (since there can't be unit tests for docs). From the history, you also know it was once proofed, so reviewing and proofing it is very easy just by looking at the diffs. So, the system encourages and accounts for continual edits while allowing a professional product to be produced for a particular release. The way to move forward is to declare that the goal is to get all docs to some status, say 'needs review' (that was our initial goal, and the only one we achieved, more or less). Then, go after the docs that don't have that, like the new polynomial docs. If someone wants to publish a manual, the goal becomes 'Proofed', and there's more work to do. It DOES make sense to give the reviewer role to more people. Just make sure they take care in their reviews, so the statuses continue to have meaning. Otherwise what's the point? --jh-- On Mon, 7 May 2012 22:14:56, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Mon, May 7, 2012 at 7:37 PM, Tim Cera t...@cerazone.net wrote: I think we should change the roles established for the Numpy/Scipy documentation editors because they do not work as intended. For reference they are described here: http://docs.scipy.org/numpy/Front%20Page/ Basically there aren't that many active people to support being split into the roles as described, which has led to a backlog of 'Needs review' docstrings and only one 'Proofed' docstring. I think that many of these docstrings are good enough, just that not enough people have put themselves out front as so knowledgeable about a certain topic to label docstrings as 'Reviewed' or 'Proofed'. You're right.
I think at some point the goal shifted from getting everything to 'Proofed' to getting everything to 'Needs review'. Here are the current statistics for numpy docstrings:

    Status                    Current %   Count
    Needs editing                    17     279
    Being written / Changed           4      62
    Needs review                     76    1235
    Needs review (revised)            2      35
    Needs work (reviewed)             0       3
    Reviewed (needs proof)            0       0
    Proofed                           0       1
    Unimportant                       ?    1793

The 'Needs editing' category actually contains mostly docstrings that are quite good, but were recently created and never edited in the doc wiki. The % keeps on growing. Bumping all polynomial docstrings up to 'Needs review' would be a good start here to make the % reflect the actual status. I have thought about some solutions in no particular order: * Get rid of the 'Reviewer' and 'Proofer' roles. * Assign all 'Editors' the 'Reviewer' and 'Proofer' privileges. * People start out as 'Editors', and then become 'Reviewers' and 'Proofers' based on some editing metric. For full disclosure, I would be generous with a 'Reviewed' label
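The review rule Joe describes in this message (an edit to a 'Proofed' docstring sends it back to review, while the history still shows it was once proofed) can be modelled as a small state machine. This is a toy sketch: the class names, the subset of statuses, and the exact transitions are assumptions for illustration, not the doc wiki's actual code.

```python
from enum import Enum


class Status(Enum):
    # A subset of the wiki statuses named in this thread.
    NEEDS_EDITING = "Needs editing"
    NEEDS_REVIEW = "Needs review"
    NEEDS_WORK = "Needs work (reviewed)"
    REVIEWED = "Reviewed (needs proof)"
    PROOFED = "Proofed"


class Docstring:
    """Toy model of the review workflow; transition rules are assumptions."""

    def __init__(self, text):
        self.text = text
        self.status = Status.NEEDS_EDITING
        self.history = [self.status]  # every status the docstring has held

    def _set(self, status):
        self.status = status
        self.history.append(status)

    def edit(self, new_text, ready_for_review=True):
        self.text = new_text
        # Any edit, including one to a Proofed docstring, sends it back to review.
        self._set(Status.NEEDS_REVIEW if ready_for_review else Status.NEEDS_EDITING)

    def review(self, approved):
        if self.status is not Status.NEEDS_REVIEW:
            raise ValueError("only docstrings marked 'Needs review' can be reviewed")
        self._set(Status.REVIEWED if approved else Status.NEEDS_WORK)

    def proof(self):
        if self.status is not Status.REVIEWED:
            raise ValueError("only reviewed docstrings can be proofed")
        self._set(Status.PROOFED)

    def was_once_proofed(self):
        return Status.PROOFED in self.history
```

After edit, review(approved=True) and proof(), a further edit leaves the docstring in NEEDS_REVIEW while was_once_proofed() stays True, so a reviewer knows only the diff since the proofed version needs checking.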
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On re-reading, I want to make a couple of things clear: 1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in such a way that we don't tie our hands in the future. I do not believe we can figure out what to do for masked arrays in one short week. What happens beyond NumPy 1.7 should still be discussed and explored. My urgency is entirely about moving forward from where we are in master right now in a direction that we can all accept. The tight timeline is so that we do *something* and move forward. 2) I missed another possible proposal for NumPy 1.7 which is in the write-up that Mark and Nathaniel made: remove the masked array additions entirely, possibly moving them to another module like numpy-dtypes. Again, these are only for NumPy 1.7. What happens in any future NumPy and beyond will depend on who comes to the table for both discussion and code-development. Best regards, -Travis On May 9, 2012, at 11:46 AM, Travis Oliphant wrote: Hey all, Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that).
I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: My three proposals: * do nothing and leave things as is * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) * move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own).
Best regards, -Travis
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 5:46 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). If we're talking about deciding what to do for the 1.7 release branch, then I agree. Otherwise, I definitely don't. We really just don't *know* what our users need with regards to mask-based storage versions of missing data, so committing to something within a short time period will just guarantee we have to re-do it all again later. [Edit: I see that you've clarified this in a follow-up email -- great!] We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. 
Again, I'm assuming that what you mean here is that we can't and shouldn't delay 1.7 indefinitely for this discussion to play out, so you're proposing that we give ourselves a deadline of 1 week to decide how to at least get the release unblocked. Let me know if I'm misreading, though... In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: My three proposals: * do nothing and leave things as is In the context of 1.7, this seems like a non-starter at this point, at least if we're going to move in the direction of making decisions by consensus. It might well be that we'll decide that the current NEP-like API is what we want (or that some compatible super-set is). But (as described in more detail in the NA-overview document), I think there are still serious questions to work out about how and whether a masked-storage/NA-semantics API is something we want as part of the ndarray object at all. And Ralf with his release-manager hat says that he doesn't want to release the current API unless we can guarantee that some version of it will continue to be supported. To me that suggests that this is off the table for 1.7. * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) I've been assuming something like a global variable, and some guards added to all the top-level functions that take maskna= arguments, so that it's impossible to construct an ndarray that has its maskna flag set to True unless the flag has been toggled. As I said in NA-overview, I'd be fine with this in principle, but only if we're certain we're okay with the ABI consequences. 
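The global-flag option Nathaniel describes (a global variable plus a guard in every top-level function that takes maskna=) can be sketched in plain Python. Every name below (_maskna_enabled, maskna_support, check_maskna, make_array) is hypothetical and does not come from NumPy; the sketch says nothing about the ABI consequences he raises, and module-level state like this is not thread-local.

```python
import contextlib

# Hypothetical module-level switch; masked-NA support is off by default.
_maskna_enabled = False


@contextlib.contextmanager
def maskna_support():
    """Temporarily enable the hypothetical masked-NA support."""
    global _maskna_enabled
    previous = _maskna_enabled
    _maskna_enabled = True
    try:
        yield
    finally:
        _maskna_enabled = previous  # restore even if the body raises


def check_maskna(maskna):
    """Guard that every top-level function taking maskna= would call first."""
    if maskna and not _maskna_enabled:
        raise RuntimeError("masked NA support is disabled; enable it explicitly")


def make_array(values, maskna=False):
    """Hypothetical constructor: refuses maskna=True unless the flag is on."""
    check_maskna(maskna)
    # A real implementation would attach a mask here; this sketch just records the flag.
    return {"values": list(values), "maskna": maskna}
```

The design point is that the guard sits at construction time: if no function can create a mask-carrying array while the flag is off, downstream code never encounters one unless the user opted in.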
And we should be clear on the goal -- if we just want to let people play with the API, then there are other options, such as my little experiment: https://github.com/njsmith/numpyNEP (This is certainly less robust, but it works, and is probably a much easier base for modifications to test alternative APIs.) If the goal is just to keep the code in master, then that's fine too, though it has both costs and benefits. (An example of a cost is that its presence may complicate adding bitpattern NA support.) * move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. If we're talking about 1.7, then what kind of status do you propose these new objects would have in 1.7? Regular feature, totally experimental, something else? My only objection to this proposal is that committing to this approach
Re: [Numpy-discussion] Missing data wrap-up and request for comments
Hi, On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 05/09/2012 06:46 PM, Travis Oliphant wrote: Hey all, Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. 
I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: My three proposals: * do nothing and leave things as is * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) * move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). Bravo! NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! Yes, it is very well written, my compliments to the chefs. The third proposal is certainly the best one from Cython's perspective; and I imagine for those writing C extensions against the C API too. Having PyType_Check fail for ndmasked is a very good way of having code fail that is not written to take masks into account. Mark, Nathaniel - can you comment on how your chosen approaches would interact with extension code? I'm guessing the bitpattern dtypes would be expected to cause extension code to choke if the type is not supported?
Mark - in https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython, do I understand correctly that you think that Cython and other extension writers should use the numpy API to access the data rather than accessing it directly via the data pointer and strides? Best, Matthew
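Matthew's question is about code that reads elements straight from the data pointer using the strides. A minimal Python sketch of that access pattern follows; it assumes a C-contiguous array (so that arr.tobytes() is byte-for-byte the memory buffer), and the helper name element_via_strides is made up for illustration.

```python
import numpy as np


def element_via_strides(arr, index):
    """Read arr[index] the way C/Cython code does: data pointer plus byte strides.

    Restricted to C-contiguous arrays so that arr.tobytes() is byte-for-byte the
    array's memory buffer.
    """
    if not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("sketch only handles C-contiguous arrays")
    # Byte offset of the element from the start of the buffer.
    offset = sum(i * s for i, s in zip(index, arr.strides))
    raw = arr.tobytes()
    return np.frombuffer(raw, dtype=arr.dtype, count=1, offset=offset)[0]


a = np.arange(12, dtype=np.float64).reshape(3, 4)  # strides are (32, 8)
value = element_via_strides(a, (1, 2))             # offset 1*32 + 2*8 = 48 bytes
```

Nothing in this path consults a mask: if missingness were recorded in a separate mask attached to the same array object, strided code like this would read the underlying values as if they were valid, which is why the question of going through the API matters.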
Re: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11
On Wed, May 09, 2012 at 06:55:12PM +0200, klo uo wrote: This news did not arrive at scikit-learn-gene...@lists.sourceforge.net. Is the above list deprecated? Andy Mueller did the announcement on the scikit-learn mailing list. BTW thanks for supporting and working on this project ;) Thank you very much, it is my pleasure. But it's really a team that you need to thank: the number of active contributors is huge. Cheers, Gael On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On behalf of Andy Mueller, our release manager, I am happy to announce the 0.11 release of scikit-learn. This release includes some major new features such as randomized sparse models, gradient boosted regression trees, label propagation and many more. The release also has major improvements in the documentation and in stability. Details can be found on the what's new page [1]. We also have a new page with video tutorials [2] on machine learning with scikit-learn and different aspects of the package. Sources and Windows binaries are available on SourceForge, through PyPI (http://pypi.python.org/pypi/scikit-learn/0.11) or can be installed directly using pip: pip install -U scikit-learn Thanks again to all the contributors who made this release possible. Cheers, Gaël 1. http://scikit-learn.org/stable/whats_new.html 2. http://scikit-learn.org/stable/presentations.html -- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant tra...@continuum.io wrote: On re-reading, I want to make a couple of things clear: 1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in such a way that we don't tie our hands in the future. I do not believe we can figure out what to do for masked arrays in one short week. What happens beyond NumPy 1.7 should still be discussed and explored. My urgency is entirely about moving forward from where we are in master right now in a direction that we can all accept. The tight timeline is so that we do *something* and move forward. 2) I missed another possible proposal for NumPy 1.7 which is in the write-up that Mark and Nathaniel made: remove the masked array additions entirely, possibly moving them to another module like numpy-dtypes. Again, these are only for NumPy 1.7. What happens in any future NumPy and beyond will depend on who comes to the table for both discussion and code-development. I'm glad that this sentence made it into the write-up: "A project like numpy requires developers to write code for advancement to occur, and obstacles that impede the writing of code discourage existing developers from contributing more, and potentially scare away developers who are thinking about joining in." I agree, which is why I'm a little surprised after reading the write-up that there's no deference to the alterNEP (admittedly kludgy) implementation. One of the arguments made for the NEP preliminary NA-mask implementation is that it has been extensively tested against scipy and other third-party packages, and has been in master in a stable state for a significant amount of time. It is my understanding that the manner in which this implementation found its way into master was a source of concern and contention. To me (and I don't know the extent to which this is technically feasible) that's precisely the reason that BOTH approaches should be allowed to make their way into numpy with experimental status.
Otherwise, it seems that there is a sort of scaring away of developers - seeing (from the sidelines) how much of a struggle it's been for the alterNEP to find a nurturing environment as an experimental alternative inside numpy. In my reading, the process and consensus threads that have generated so many responses stem precisely from trying to have an atmosphere where everyone is encouraged to join in. The alternatives proposed so far (though I do understand it's only for 1.7) do not suggest an appreciation for the gravity of the fallout from the neglect of the alterNEP and the issues which sprang forth from that. Importantly, I find a problem with how personal this document (and discussion) is - I'd much prefer if we talk about technical things by a descriptive name, not the person who thought of it. You'll note how I've been referring to NEP and alterNEP above. One advantage of this is that down the line, if either Mark or Nathaniel change their minds about their current preferred way forward, it doesn't take the wind out of it with something like "Even Paul changed his mind and now withdraws his support of Paul's proposal." We should only focus on the technical merits of a given approach, not how many commits have been made by the person proposing them or what else they've done in their life: a good idea has value regardless of who expresses it. In my fantasy world, with both approaches clearly existing in an experimental sandbox inside numpy, folks who feel primary attachments to either NEP or alterNEP would be willing to cross party lines and pitch in toward making progress in both camps. That's the way we'll find better solutions, by working together, instead of working in opposition. best, -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 6:13 PM, Paul Ivanov pivanov...@gmail.com wrote: On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant tra...@continuum.io wrote: On re-reading, I want to make a couple of things clear: 1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in such a way that we don't tie our hands in the future. I do not believe we can figure out what to do for masked arrays in one short week. What happens beyond NumPy 1.7 should still be discussed and explored. My urgency is entirely about moving forward from where we are in master right now in a direction that we can all accept. The tight timeline is so that we do *something* and move forward. 2) I missed another possible proposal for NumPy 1.7 which is in the write-up that Mark and Nathaniel made: remove the masked array additions entirely, possibly moving them to another module like numpy-dtypes. Again, these are only for NumPy 1.7. What happens in any future NumPy and beyond will depend on who comes to the table for both discussion and code-development. I'm glad that this sentence made it into the write-up: "A project like numpy requires developers to write code for advancement to occur, and obstacles that impede the writing of code discourage existing developers from contributing more, and potentially scare away developers who are thinking about joining in." I agree, which is why I'm a little surprised after reading the write-up that there's no deference to the alterNEP (admittedly kludgy) implementation. One of the arguments made for the NEP preliminary NA-mask implementation is that it has been extensively tested against scipy and other third-party packages, and has been in master in a stable state for a significant amount of time. It is my understanding that the manner in which this implementation found its way into master was a source of concern and contention.
To me (and I don't know the extent to which this is technically feasible) that's precisely the reason that BOTH approaches should be allowed to make their way into numpy with experimental status. Otherwise, it seems that there is a sort of scaring away of developers - seeing (from the sidelines) how much of a struggle it's been for the alterNEP to find a nurturing environment as an experimental alternative inside numpy. In my reading, the process and consensus threads that have generated so many responses stem precisely from trying to have an atmosphere where everyone is encouraged to join in. The alternatives proposed so far (though I do understand it's only for 1.7) do not suggest an appreciation for the gravity of the fallout from the neglect of the alterNEP and the issues which sprang forth from that. Importantly, I find a problem with how personal this document (and discussion) is - I'd much prefer if we talk about technical things by a descriptive name, not the person who thought of it. You'll note how I've been referring to NEP and alterNEP above. One advantage of this is that down the line, if either Mark or Nathaniel change their minds about their current preferred way forward, it doesn't take the wind out of it with something like "Even Paul changed his mind and now withdraws his support of Paul's proposal." We should only focus on the technical merits of a given approach, not how many commits have been made by the person proposing them or what else they've done in their life: a good idea has value regardless of who expresses it. In my fantasy world, with both approaches clearly existing in an experimental sandbox inside numpy, folks who feel primary attachments to either NEP or alterNEP would be willing to cross party lines and pitch in toward making progress in both camps. That's the way we'll find better solutions, by working together, instead of working in opposition. We are certainly open to code submissions and alternate implementations.
The experimental tag would help there. But someone, as you mention, needs to write the code. Chuck
Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On Wed, May 9, 2012 at 12:40 PM, Sandro Tosi matrixh...@gmail.com wrote: On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: Please test this release and report any issues on the numpy-discussion mailing list. I think it's probably nice not to ship pyc files in the source tarball:

    $ find numpy-1.6.2rc1/ -name '*.pyc'
    numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc
    numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc
    numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc
    numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc

Good point ;) Chuck
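A check along these lines could be run on the source tarball before upload. This is a minimal sketch that relies only on Python's standard-library `tarfile` module; the function name `find_bytecode` is hypothetical and is not part of NumPy's release tooling.

```python
import tarfile


def find_bytecode(sdist_path):
    """Return the sorted names of compiled bytecode members (.pyc/.pyo) in a tarball.

    Hypothetical helper: tarfile.open's default mode "r" transparently
    handles uncompressed, gzip and bz2 archives.
    """
    with tarfile.open(sdist_path) as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith((".pyc", ".pyo"))
        )
```

Run against a release tarball, an empty list would mean no bytecode slipped in. Against the rc1 tarball it would be expected to report the `doc/sphinxext` files found above. Unlike `find` on an unpacked tree, this inspects the archive that is actually shipped.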
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On 05/10/2012 01:01 AM, Matthew Brett wrote: Hi, On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 05/09/2012 06:46 PM, Travis Oliphant wrote: Hey all, Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history of NumPy as well as all of the users I've met and interacted with, which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion.
I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part:

My three proposals:

* do nothing and leave things as is
* add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly)
* move Mark's masked ndarray objects into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as its core.

For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). Bravo! NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! Yes, it is very well written, my compliments to the chefs. The third proposal is certainly the best one from Cython's perspective, and I imagine for those writing C extensions against the C API too. Having PyType_Check fail for ndmasked is a very good way of having code fail that is not written to take masks into account. I want to make something more clear: There are two Cython cases; in the case of cdef np.ndarray[double] there is no problem as PEP 3118 access will raise an exception for masked arrays. But, there's the case where you do cdef np.ndarray, and then proceed to use PyArray_DATA.
Myself, I do this more than PEP 3118 access, usually because I pass the data pointer to some C or C++ code. It'd be great to have such code be forward-compatible in the sense that it raises an exception when it meets a masked array. Having PyType_Check fail seems like the only way? Am I wrong? Mark, Nathaniel - can you comment on how your chosen approaches would interact with extension code? I'm guessing the bitpattern dtypes would be expected to cause extension code to choke if the type is not supported? The proposal, as I understand it, is to use that with new dtypes (?). So things will often be fine for that reason:

    if arr.dtype == np.float32:
        c_function_32bit(np.PyArray_DATA(arr), ...)
    else:
        raise ValueError("need 32-bit float array")

Mark - in https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython - do I understand correctly that you think that Cython and other extension writers should use the numpy API to access the data rather than accessing it directly via the data pointer and strides? That's not really fleshed out (for
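The forward-compatibility concern can be sketched at the Python level with today's NumPy, where `numpy.ma.MaskedArray` is an ndarray *subclass*. Because of that, an `isinstance` check accepts it and its raw buffer (masked-out values included) would reach the C code. An exact-type check, the Python analogue of `PyArray_CheckExact` in the C API, rejects it. This is roughly the effect a separate `ndmasked` type would have on a plain type check. The helper name `require_plain_float32` is hypothetical, and `arr.ctypes.data` stands in for `PyArray_DATA`.

```python
import numpy as np


def require_plain_float32(arr):
    """Return the integer address of arr's buffer, rejecting inputs C code can't handle.

    Hypothetical guard for code that hands a raw data pointer to C/C++.
    """
    # Exact-type check: subclasses such as numpy.ma.MaskedArray are refused,
    # because their buffer still holds the values hidden by the mask.
    if type(arr) is not np.ndarray:
        raise TypeError("need a plain ndarray, got %s" % type(arr).__name__)
    # dtype check, as in the quoted snippet: unsupported dtypes (for example a
    # future bit-pattern NA dtype) fail here instead of being misread.
    if arr.dtype != np.float32:
        raise ValueError("need 32-bit float array")
    # A raw pointer is only meaningful to simple C loops if the data is contiguous.
    if not arr.flags.c_contiguous:
        raise ValueError("need C-contiguous data")
    return arr.ctypes.data
```

Note that `isinstance(np.ma.masked_array(...), np.ndarray)` is `True`. That is why the sketch compares types exactly instead of using `isinstance`.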
Re: [Numpy-discussion] Masking through generator arrays
On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: Sorry everyone for being so dense and contaminating that other thread. Here's a new thread where I can respond to Nathaniel's response. On 05/10/2012 01:08 AM, Nathaniel Smith wrote: Hi Dag, On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: I'm a heavy user of masks, which are used to make data NA in the statistical sense. The setting is that we have to mask out the radiation coming from the Milky Way in full-sky images of the Cosmic Microwave Background. There's data, but we know we can't trust it, so we make it NA. But we also do play around with different masks. Oh, this is great -- that means you're one of the users that I wasn't sure existed or not :-). Now I know! Today we keep the mask in a separate array, and to zero-mask we do

    masked_data = data * mask

or

    masked_data = data.copy()
    masked_data[mask == 0] = np.nan  # soon np.NA

depending on the circumstances. Honestly, API-wise, this is as good as it gets for us. Nice and transparent, no new semantics to learn in the special case of masks. Now, this has performance issues: Lots of memory use, extra transfers over the memory bus. Right -- this is a case where (in the NA-overview terminology) masked storage+NA semantics would be useful. BUT, NumPy has that problem all over the place, even for x + y + z! Solving it in the special case of masks, by making a new API, seems a bit myopic to me. IMO, that's much better solved at the fundamental level. As an *illustration*:

    with np.lazy:
        masked_data1 = data * mask1
        masked_data2 = data * (mask1 | mask2)
        masked_data3 = (x + y + z) * (mask1 & mask3)

This would create three generator arrays that would zero-mask the arrays (and perform the three-term addition...) upon request. You could slice the generator arrays as you wish, and by that slice the data and the mask in one operation. Obviously this could handle NA-masking too.
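The two eager idioms Dag describes run as-is today. The lazy part has no NumPy counterpart: `np.lazy` is his illustration, not an existing API. The sketch below uses a hypothetical `LazyMaskedProduct` class and assumes that data and mask share a shape. It illustrates "slice the data and the mask in one operation", since nothing is multiplied until an index is requested.

```python
import numpy as np


class LazyMaskedProduct:
    """Hypothetical 'generator array' holding data and mask, multiplying on demand."""

    def __init__(self, data, mask):
        if data.shape != mask.shape:
            raise ValueError("data and mask must have the same shape")
        self.data = data
        self.mask = mask
        self.shape = data.shape

    def __getitem__(self, index):
        # The same index slices data and mask; only the selected region is multiplied.
        return self.data[index] * self.mask[index]

    def materialize(self):
        # Full eager product, equivalent to data * mask.
        return self[...]


data = np.arange(6.0).reshape(2, 3)
mask = np.array([[1, 0, 1], [0, 1, 1]])

zero_masked = data * mask           # eager zero-masking: allocates a full temporary
nan_masked = data.copy()
nan_masked[mask == 0] = np.nan      # eager NaN-masking (requires a float dtype)

lazy = LazyMaskedProduct(data, mask)
row = lazy[0, :]                    # only three elements are multiplied
```

A real generator array would also need to chain operations, for example the three-term `x + y + z`. This sketch handles a single product only.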
You can probably do this today with Theano and numexpr, and I think Travis mentioned that generator arrays are on his radar for core NumPy. Implementing this today would require some black magic hacks, because on entry/exit to the context manager you'd have to reach up into the calling scope and replace all the ndarrays with LazyArrays and then vice-versa. This is actually totally possible: https://gist.github.com/2347382 but I'm not sure I'd call it *wise*. (You could probably avoid the truly horrible set_globals_dict part of that gist, though.) Might be fun to prototype, though... 1) My main point was just that I believe masked arrays are something that feels immature to me, and that it is the kind of thing that should be constructed from simpler primitives. And that NumPy should focus on simple primitives. You could make it I can't disagree, as I suggested the same as a possibility myself ;) There is a lot of infrastructure now in numpy, but given the use cases I'm tending towards the view that masked arrays should be left to others, at least for the time being. The question is how to generalize the infrastructure and what hooks to provide. I think just spending a month or two pulling stuff out is counterproductive, but evolving the code is definitely needed. If you could familiarize yourself with what is in there, something that seems largely neglected by the critics, and make suggestions, that would be helpful. I'd also like to hear from Mark. It has been about nine months since he did the work, and I'd be surprised if he didn't have ideas for doing some things differently. OTOH, I can understand his reluctance to get involved in a topic where I thought he was poorly treated last time around.
np.gen.generating_multiply(data, mask) 2) About the with construct in particular, I intended __enter__ and __exit__ to only toggle a thread-local flag, and when that flag is in effect, __mul__ would do a generating_multiply and return an ndarraygenerator rather than an ndarray. But of course, the amount of work is massive. snip Chuck
Re: [Numpy-discussion] Missing data wrap-up and request for comments
On Wed, May 9, 2012 at 11:05 PM, Benjamin Root ben.r...@ou.edu wrote: On Wednesday, May 9, 2012, Nathaniel Smith wrote: My only objection to this proposal is that committing to this approach seems premature. The existing masked array objects act quite differently from numpy.ma, so why do you believe that they're a good foundation for numpy.ma, and why will users want to switch to their semantics over numpy.ma's semantics? These aren't rhetorical questions; it seems like they must have concrete answers, but I don't know what they are. Based on the design decisions made in the original NEP, a re-made numpy.ma would have to lose _some_ features, particularly the ability to share masks. Save for that and some very obscure behaviors that are undocumented, it is possible to remake numpy.ma as a compatibility layer. That being said, I think that there are some fundamental questions that have raised concerns. If I recall, there were unresolved questions about behaviors surrounding assignments to elements of a view. I see the project as broken down like this:

1.) internal architecture (largely ABI issues)
2.) external architecture (hooks throughout numpy to utilize the new features where possible, such as the where= argument)
3.) getter/setter semantics
4.) mathematical semantics

At this moment, I think we have pieces of 2 and they are fairly non-controversial. It is 1 that I see as being the immediate hold-up here. 3 and 4 are non-trivial, but because they are mostly about interfaces, I think we can be willing to accept some very basic, fundamental, barebones components here in order to lay the groundwork for a more complete API later. To talk of Travis's proposal, doing nothing is no-go. Not moving forward would dishearten the community. Making an ndmasked type is very intriguing. I see it as a step towards eventually deprecating ndarray? Also, how would it behave with np.asarray() and np.asanyarray()? My other concern is a possible violation of DRY.
How difficult would it be to maintain two ndarrays in parallel? As for the flag approach, this still doesn't solve the problem of legacy code (or did I misunderstand?) My understanding of the flag is to allow the code to stay in and get reworked and experimented with while keeping it from contaminating conventional use. The whole point of putting the code in was to experiment and adjust. The rather bizarre idea that it needs to be perfect from the get-go is disheartening, and is seldom how new things get developed. Sure, there is a plan up front, but there needs to be feedback and change. And in fact, I haven't seen much feedback about the actual code; I don't even know that the people complaining have tried using it to see where it hurts. I'd like that sort of feedback. Chuck
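The `where=` hook that Benjamin lists under external architecture is available on ufuncs in current NumPy releases. The sketch below shows how it behaves. Elements where the mask is `False` are not computed, and the existing contents of `out` are left in place, so pre-filling `out` determines what the skipped slots hold. The array names are illustrative. The `where=` argument to reductions (`np.add.reduce`) assumes a recent NumPy (1.17 or later).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
valid = np.array([True, False, True, False])

# Slots where `valid` is False keep whatever `out` already contained,
# so pre-filling with NaN marks them as "not computed".
out = np.full_like(a, np.nan)
np.add(a, b, out=out, where=valid)

# Reductions accept where= too, giving a mask-aware sum over valid entries only.
total = np.add.reduce(a, where=valid)
```

This hook works without any mask attached to the array itself. The mask stays a separate boolean array, much like the workflow Dag describes for his sky maps.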