Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-09 Thread Sandro Tosi
On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 Hi,

 I'm pleased to announce the availability of the first release candidate of
 NumPy 1.6.2.  This is a maintenance release. Due to the delay of the NumPy
 1.7.0, this release contains far more fixes than a regular NumPy bugfix
 release.  It also includes a number of documentation and build improvements.

 Sources and binary installers can be found at
 https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/

 Please test this release and report any issues on the numpy-discussion
 mailing list.

Mh, I can't exactly understand this:

$ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1
 2718 files changed, 390859 deletions(-)

Does it mean that the only thing the RC has done is remove a lot of
stuff? That's weird, because the build process went just fine and the
unit tests are passing ... /me confused

-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Travis Oliphant
Hey all, 

Nathaniel and Mark have worked very hard on a joint document to try and explain 
the current status of the missing-data debate.   I think they've done an 
amazing job at providing some context, articulating their views and suggesting 
ways forward in a mutually respectful manner.   This is an exemplary 
collaboration and is at the core of why open source is valuable. 

The document is available here: 
   https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

After reading that document, it appears to me that there are some fundamentally 
different views on how things should move forward. I'm also reading the 
document in light of my understanding of the history of NumPy, as well as of 
all the users I've met and interacted with, which means I have my own 
perspective that is not necessarily incorporated into that document but informs 
my recommendations. I'm not sure we can reach full consensus on this. We 
are also well past time for moving forward with a resolution on this (perhaps 
we can all agree on that).

I would like one more discussion thread where the technical discussion can take 
place. I will make a plea that we keep this discussion as free from logical 
fallacies (http://en.wikipedia.org/wiki/Logical_fallacy) as we can. I can't 
guarantee that I personally will succeed at that, but I can tell you that I 
will try. That's all I'm asking of anyone else. I recognize that there are 
a lot of other issues at play here besides *just* the technical questions, but 
we are not going to resolve every community issue in this technical thread. 

We need concrete proposals and so I will start with three. Please feel free 
to comment on these proposals or add your own during the discussion. I will 
stop paying attention to this thread next Wednesday (May 16th) (or earlier if 
the thread dies) and hope that by that time we can agree on a way forward. If 
we don't have agreement, then I will move forward with what I think is the 
right approach. I will either write the code myself or convince someone else 
to write it. 

In all cases, we have agreement that bit-pattern dtypes should be added to 
NumPy. We should work on these (int32, float64, complex64, str, bool) to 
start. So, the three proposals are independent of this way forward. The 
proposals are all about the extra mask part:  

My three proposals: 

* do nothing and leave things as is 

* add a global flag that turns off masked array support by default but 
otherwise leaves things unchanged (I'm still unclear how this would work 
exactly)

* move Mark's masked ndarray objects into a new fundamental type 
(ndmasked), leaving the actual ndarray type unchanged. The array_interface 
keeps the masked array notions and the ufuncs keep the ability to handle arrays 
like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as 
its core. 

For the record, I'm currently in favor of the third proposal.   Feel free to 
comment on these proposals (or provide your own). 
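
To make the distinction between the two kinds of missing-data storage concrete,
here is a minimal pure-Python sketch. It is only an illustration, not NumPy
code: NaN stands in for a reserved "missing" bit pattern, a plain list of
booleans stands in for a mask, and the function names are invented for this
example.

```python
import math

# Assumption for illustration: NaN plays the role of the reserved "NA" pattern.
NA = float("nan")


def sum_bitpattern(values):
    """Sum floats, skipping entries that hold the reserved NA pattern.

    The original value of a missing entry is gone: the slot stores only NA.
    """
    return sum(v for v in values if not math.isnan(v))


def sum_masked(values, mask):
    """Sum entries whose mask flag is False.

    The payload under a masked entry is kept, so unmasking can restore it.
    """
    return sum(v for v, hidden in zip(values, mask) if not hidden)


def unmask(values, mask, index):
    """Return a new mask with one entry revealed; values are left untouched."""
    new_mask = list(mask)
    new_mask[index] = False
    return new_mask
```

With a mask, `sum_masked([1.0, 2.0, 3.0], [False, True, False])` skips the
second entry but the 2.0 is still stored; with a bit pattern, the list would be
`[1.0, NA, 3.0]` and the 2.0 cannot be recovered.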

Best regards,

-Travis



Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi matrixh...@gmail.com wrote:

 On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  Hi,
 
  I'm pleased to announce the availability of the first release candidate
 of
  NumPy 1.6.2.  This is a maintenance release. Due to the delay of the
 NumPy
  1.7.0, this release contains far more fixes than a regular NumPy bugfix
  release.  It also includes a number of documentation and build
 improvements.
 
  Sources and binary installers can be found at
  https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/
 
  Please test this release and report any issues on the numpy-discussion
  mailing list.

 Mh, I can't exactly understand this:

 $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1
  2718 files changed, 390859 deletions(-)

 does it mean that the only thing the RC has done is to remove a lot of
 stuff? that's weird because the build process went all just fine and
 unit tests are passing ... /me confused?


No, only a few files were changed. Since there are about 1000 files in
numpy, I suspect you are also counting everything in the build and
documentation build directories. If you built in place, you are also going
to pick up *.pyc files and such.
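
One way to check whether a file count is inflated by build output is to compare
the two trees while skipping such artifacts. The following is a rough sketch
using only the Python standard library; the ignored directory names and the
function names are assumptions chosen for this example, not part of any NumPy
tooling.

```python
import filecmp
import os

# Assumed names of directories and suffixes that hold build output.
IGNORED_DIRS = {"build", "_build", "__pycache__"}
IGNORED_SUFFIXES = (".pyc", ".pyo")


def source_files(root):
    """Return the set of relative file paths under root, skipping build output."""
    if not os.path.isdir(root):
        # A mistyped path would otherwise look like an empty tree.
        raise FileNotFoundError(root)
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            if not name.endswith(IGNORED_SUFFIXES):
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(old_root, new_root):
    """Report which source files were added, removed, or changed in content."""
    old, new = source_files(old_root), source_files(new_root)
    changed = sorted(
        path for path in old & new
        if not filecmp.cmp(os.path.join(old_root, path),
                           os.path.join(new_root, path), shallow=False)
    )
    return {"added": sorted(new - old),
            "removed": sorted(old - new),
            "changed": changed}
```

Calling `compare_trees` on the two extracted source directories returns three
sorted lists, so a result with thousands of "removed" entries and nothing else
points at the paths being compared rather than at the release itself.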

Chuck


Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-09 Thread Sandro Tosi
On Wed, May 9, 2012 at 6:49 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi matrixh...@gmail.com wrote:

 On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  Hi,
 
  I'm pleased to announce the availability of the first release candidate
  of
  NumPy 1.6.2.  This is a maintenance release. Due to the delay of the
  NumPy
  1.7.0, this release contains far more fixes than a regular NumPy bugfix
  release.  It also includes a number of documentation and build
  improvements.
 
  Sources and binary installers can be found at
  https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/
 
  Please test this release and report any issues on the numpy-discussion
  mailing list.

 Mh, I can't exactly understand this:

 $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1
  2718 files changed, 390859 deletions(-)

 does it mean that the only thing the RC has done is to remove a lot of
 stuff? that's weird because the build process went all just fine and
 unit tests are passing ... /me confused?


 No, only a few files were changed. Since there are about 1000 files in numpy
 I suspect you are also counting everything in the build and documentation
 build directories. If you built inplace, you are also going to pick up *.pyc
 files and such.

Sorry, I didn't mention that: these are the freshly extracted tarballs. I'll
have to recheck after downloading from SF again.


-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11

2012-05-09 Thread klo uo
This announcement did not arrive at scikit-learn-gene...@lists.sourceforge.net.
Is that list deprecated?

BTW thanks for supporting and working on this project ;)


On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
   On behalf of Andy Mueller, our release manager, I am happy to announce
   the 0.11 release of scikit-learn.

   This release includes some major new features such as randomized
   sparse models, gradient boosted regression trees, label propagation
   and many more. The release also has major improvements in the
   documentation and in stability.

   Details can be found on the [1]what's new page.

   We also have a new page with [2]video tutorials on machine learning
   with scikit-learn and different aspects of the package.

   Sources and windows binaries are available on sourceforge,
   through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or
   can be installed directly using pip:

   pip install -U scikit-learn

   Thanks again to all the contributors who made this release possible.

   Cheers,

    Gaël

   1. http://scikit-learn.org/stable/whats_new.html
   2. http://scikit-learn.org/stable/presentations.html



Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 10:46 AM, Travis Oliphant tra...@continuum.io wrote:

 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate.   I think they've
 done an amazing job at providing some context, articulating their views and
 suggesting ways forward in a mutually respectful manner.   This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward.   I'm also
 reading the document incorporating my understanding of the history, of
 NumPy as well as all of the users I've met and interacted with which means
 I have my own perspective that is not necessarily incorporated into that
 document but informs my recommendations. I'm not sure we can reach full
 consensus on this. We are also well past time for moving forward with a
 resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion can
 take place. I will make a plea that we keep this discussion as free from
 logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can.
   I can't guarantee that I personally will succeed at that, but I can tell
 you that I will try.   That's all I'm asking of anyone else. I recognize
 that there are a lot of other issues at play here besides *just* the
 technical questions, but we are not going to resolve every community issue
 in this technical thread.

 We need concrete proposals and so I will start with three.   Please feel
 free to comment on these proposals or add your own during the discussion.
  I will stop paying attention to this thread next Wednesday (May 16th) (or
 earlier if the thread dies) and hope that by that time we can agree on a
 way forward.  If we don't have agreement, then I will move forward with
 what I think is the right approach.   I will either write the code myself
 or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added to
 NumPy.  We should work on these (int32, float64, complex64, str, bool)
 to start. So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
 keeps the masked array notions and the ufuncs keep the ability to handle
 arrays like ndmasked. Ideally, numpy.ma would be changed to use
 ndmasked objects as their core.


numpy.ma is unmaintained and I don't see that changing anytime soon. As
you know, I would prefer 1), but 2) is a good compromise, and the
infrastructure for such a flag could be useful for other things, although like
you I'm not sure how it would be implemented. I don't understand your
proposal for 3), but from the description I don't see that it buys anything.


 For the record, I'm currently in favor of the third proposal.   Feel free
 to comment on these proposals (or provide your own).


Chuck


Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-09 Thread Sandro Tosi
On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 Please test this release and report any issues on the numpy-discussion
 mailing list.

It would probably be nicer not to ship .pyc files in the source tarball:

$ find numpy-1.6.2rc1/ -name '*.pyc'
numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc
numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc
numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc
numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc
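
The same check can be run against the archive itself before extracting it. This
is a small sketch using the standard-library tarfile module; the function name
`compiled_members` is made up for this illustration.

```python
import tarfile


def compiled_members(tar):
    """Return the sorted names of byte-compiled files in an open TarFile."""
    return sorted(
        member.name
        for member in tar.getmembers()
        if member.isfile() and member.name.endswith((".pyc", ".pyo"))
    )
```

It takes an archive opened with `tarfile.open(...)`; an empty list means no
byte-compiled files are being shipped.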

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Mark Wiebe
On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant tra...@continuum.io wrote:

 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate.   I think they've
 done an amazing job at providing some context, articulating their views and
 suggesting ways forward in a mutually respectful manner.   This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward.   I'm also
 reading the document incorporating my understanding of the history, of
 NumPy as well as all of the users I've met and interacted with which means
 I have my own perspective that is not necessarily incorporated into that
 document but informs my recommendations. I'm not sure we can reach full
 consensus on this. We are also well past time for moving forward with a
 resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion can
 take place. I will make a plea that we keep this discussion as free from
 logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can.
   I can't guarantee that I personally will succeed at that, but I can tell
 you that I will try.   That's all I'm asking of anyone else. I recognize
 that there are a lot of other issues at play here besides *just* the
 technical questions, but we are not going to resolve every community issue
 in this technical thread.

 We need concrete proposals and so I will start with three.   Please feel
 free to comment on these proposals or add your own during the discussion.
  I will stop paying attention to this thread next Wednesday (May 16th) (or
 earlier if the thread dies) and hope that by that time we can agree on a
 way forward.  If we don't have agreement, then I will move forward with
 what I think is the right approach.   I will either write the code myself
 or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added to
 NumPy.  We should work on these (int32, float64, complex64, str, bool)
 to start. So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
 keeps the masked array notions and the ufuncs keep the ability to handle
 arrays like ndmasked. Ideally, numpy.ma would be changed to use
 ndmasked objects as their core.

 For the record, I'm currently in favor of the third proposal.   Feel free
 to comment on these proposals (or provide your own).


I'm most in favour of the second proposal. It won't take very much effort,
and more clearly marks off this code as experimental than just
documentation notes.

Thanks,
-Mark



 Best regards,

 -Travis




Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Travis Oliphant

On May 9, 2012, at 2:07 PM, Mark Wiebe wrote:

 On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant tra...@continuum.io wrote:
 Hey all, 
 
 Nathaniel and Mark have worked very hard on a joint document to try and 
 explain the current status of the missing-data debate.   I think they've done 
 an amazing job at providing some context, articulating their views and 
 suggesting ways forward in a mutually respectful manner.   This is an 
 exemplary collaboration and is at the core of why open source is valuable. 
 
 The document is available here: 
https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
 
 After reading that document, it appears to me that there are some 
 fundamentally different views on how things should move forward.   I'm also 
 reading the document incorporating my understanding of the history, of NumPy 
 as well as all of the users I've met and interacted with which means I have 
 my own perspective that is not necessarily incorporated into that document 
 but informs my recommendations. I'm not sure we can reach full consensus 
 on this. We are also well past time for moving forward with a resolution 
 on this (perhaps we can all agree on that). 
 
 I would like one more discussion thread where the technical discussion can 
 take place. I will make a plea that we keep this discussion as free from 
 logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can.   I 
 can't guarantee that I personally will succeed at that, but I can tell you 
 that I will try.   That's all I'm asking of anyone else. I recognize that 
 there are a lot of other issues at play here besides *just* the technical 
 questions, but we are not going to resolve every community issue in this 
 technical thread. 
 
 We need concrete proposals and so I will start with three.   Please feel free 
 to comment on these proposals or add your own during the discussion. I 
 will stop paying attention to this thread next Wednesday (May 16th) (or 
 earlier if the thread dies) and hope that by that time we can agree on a way 
 forward.  If we don't have agreement, then I will move forward with what I 
 think is the right approach.   I will either write the code myself or 
 convince someone else to write it. 
 
 In all cases, we have agreement that bit-pattern dtypes should be added to 
 NumPy.  We should work on these (int32, float64, complex64, str, bool) to 
 start. So, the three proposals are independent of this way forward.   The 
 proposals are all about the extra mask part:  
 
 My three proposals: 
 
   * do nothing and leave things as is 
 
   * add a global flag that turns off masked array support by default but 
 otherwise leaves things unchanged (I'm still unclear how this would work 
 exactly)
 
   * move Mark's masked ndarray objects into a new fundamental type 
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface 
 keeps the masked array notions and the ufuncs keep the ability to handle 
 arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked 
 objects as their core. 
 
 For the record, I'm currently in favor of the third proposal.   Feel free to 
 comment on these proposals (or provide your own).
 
 I'm most in favour of the second proposal. It won't take very much effort, 
 and more clearly marks off this code as experimental than just documentation 
 notes.
 

Mark, will you give more details about this proposal? How would the flag 
work, and what would it modify? 

The proposal to create an ndmasked object that is separate from ndarray objects 
also won't take much effort, and it also marks off the object so that those who 
want to use it can, and those who don't are not pushed into using it anyway. 

-Travis


 Thanks,
 -Mark
  
 
 Best regards,
 
 -Travis
 
 


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Mark Wiebe
On Wed, May 9, 2012 at 2:15 PM, Travis Oliphant tra...@continuum.io wrote:


 On May 9, 2012, at 2:07 PM, Mark Wiebe wrote:

 On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant tra...@continuum.io wrote:

 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate.   I think they've
 done an amazing job at providing some context, articulating their views and
 suggesting ways forward in a mutually respectful manner.   This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward.   I'm also
 reading the document incorporating my understanding of the history, of
 NumPy as well as all of the users I've met and interacted with which means
 I have my own perspective that is not necessarily incorporated into that
 document but informs my recommendations. I'm not sure we can reach full
 consensus on this. We are also well past time for moving forward with a
 resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion
 can take place. I will make a plea that we keep this discussion as free
 from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
 we can.   I can't guarantee that I personally will succeed at that, but I
 can tell you that I will try.   That's all I'm asking of anyone else. I
 recognize that there are a lot of other issues at play here besides *just*
 the technical questions, but we are not going to resolve every community
 issue in this technical thread.

 We need concrete proposals and so I will start with three.   Please feel
 free to comment on these proposals or add your own during the discussion.
  I will stop paying attention to this thread next Wednesday (May 16th) (or
 earlier if the thread dies) and hope that by that time we can agree on a
 way forward.  If we don't have agreement, then I will move forward with
 what I think is the right approach.   I will either write the code myself
 or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added
 to NumPy.  We should work on these (int32, float64, complex64, str,
 bool) to start. So, the three proposals are independent of this way
 forward.   The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
 keeps the masked array notions and the ufuncs keep the ability to handle
 arrays like ndmasked. Ideally, numpy.ma would be changed to use
 ndmasked objects as their core.

 For the record, I'm currently in favor of the third proposal.   Feel free
 to comment on these proposals (or provide your own).


 I'm most in favour of the second proposal. It won't take very much effort,
 and more clearly marks off this code as experimental than just
 documentation notes.


 Mark will you give more details about this proposal? How would the flag
 work, what would it modify?


The idea is inspired in part by the Chrome release cycle, which has a
presentation here:

https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6chpli=1

Some quotes:

"Features should be engineered so that they can be disabled easily (1 patch)"

and

"Would large feature development still be possible?

Yes, engineers would have to work behind flags, however they can work for
as many releases as they need to and can remove the flag when they are
done."


The current numpy codebase isn't designed for this kind of workflow, but I
think we can productively emulate the idea for a big feature like NA
support.

One way to do this flag would be to have a numpy.experimental namespace
which is not imported by default. To enable the NA-mask feature, you could
do:

 import numpy.experimental.maskna

This would trigger an ExperimentalWarning to signal that an experimental
feature has been enabled, and would add any NA-specific symbols to the
numpy namespace (NA, NAType, etc.). Without this import, any operation that
would create an NA or NA-masked array would raise an ExperimentalError instead
of succeeding. After this import, things would behave as they do now.
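
As a rough sketch of how such an opt-in gate could behave, here is a
self-contained toy in plain Python. Nothing in it is existing NumPy API: the
`enable` and `require` helpers, the feature name "maskna", and the
`make_na_masked` function are all hypothetical, and only the
ExperimentalWarning and ExperimentalError names come from the description
above.

```python
import warnings


class ExperimentalWarning(UserWarning):
    """Emitted once when an experimental feature is switched on."""


class ExperimentalError(RuntimeError):
    """Raised when an experimental feature is used without opting in."""


# Hypothetical registry of features the user has opted into.
_enabled = set()


def enable(feature):
    """Opt in to a feature; stands in for importing an experimental module."""
    warnings.warn(
        f"experimental feature {feature!r} enabled; its API may change",
        ExperimentalWarning,
        stacklevel=2,
    )
    _enabled.add(feature)


def require(feature):
    """Guard to place at the top of any code path that belongs to a feature."""
    if feature not in _enabled:
        raise ExperimentalError(f"{feature!r} is experimental and not enabled")


def make_na_masked(values, mask):
    """Toy stand-in for an operation that would create an NA-masked array."""
    require("maskna")
    return list(zip(values, mask))
```

Before `enable("maskna")` is called, `make_na_masked` raises ExperimentalError;
afterwards it works as usual. A real implementation would also need an answer
for code that reaches the feature without going through Python.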

Cheers,
Mark

The proposal to create a ndmasked object that is separate from ndarray
 objects also won't take much effort and also marks off the object so those
 who want to use it can and those who don't are not pushed into using it
 anyway.

 -Travis


 Thanks,
 -Mark



 Best regards,

 -Travis


 

Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Travis Oliphant
 My three proposals: 
 
   * do nothing and leave things as is 
 
   * add a global flag that turns off masked array support by default but 
 otherwise leaves things unchanged (I'm still unclear how this would work 
 exactly)
 
   * move Mark's masked ndarray objects into a new fundamental type 
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface 
 keeps the masked array notions and the ufuncs keep the ability to handle 
 arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked 
 objects as their core. 
 
 
 The numpy.ma is unmaintained and I don't see that changing anytime soon. As 
 you know, I would prefer 1), but 2) is a good compromise and the infra 
 structure for such a flag could be useful for other things, although like 
 yourself I'm not sure how it would be implemented. I don't understand your 
 proposal for 3), but from the description I don't see that it buys anything.

It is a bit strong to call numpy.ma unmaintained. I don't consider it that 
way. Are there a lot of tickets for it that are unaddressed? Is it broken? 
I know it gets a lot of use in the wild, so I don't think NumPy users 
would be happy to hear it is considered unmaintained by NumPy developers. 

I'm looking forward to more details of Mark's proposal for #2. 

The proposal for #3 is quite simple and I think it is also a good compromise 
between removing the masked array entirely from the core NumPy object and 
leaving things as is in master. It keeps the functionality (but in a separate 
object), much like numpy.ma is a separate object. Basically, it buys not 
forcing *all* NumPy users (on the C-API level) to now deal with a masked array. 
I know this push is a feature that is part of Mark's intention (as it pushes 
downstream libraries to think about missing data at a fundamental level). 
But I think this is too big of a change to put in a 1.X release. The 
internal array model used by NumPy is used quite extensively in downstream 
libraries as a *concept*. Many people have enhanced this model with a separate 
mask array for various reasons, and Mark's current use of mask does not satisfy 
all those use cases. I don't see how we can justify changing the NumPy 1.X 
memory model under these circumstances. 

This is the sort of change that in my mind is a NumPy 2.0 kind of change where 
downstream users will be looking for possible array-model changes.  

-Travis





  
 For the record, I'm currently in favor of the third proposal.   Feel free to 
 comment on these proposals (or provide your own). 
 
 
 Chuck 


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Travis Oliphant
 Mark will you give more details about this proposal? How would the flag 
 work, what would it modify?
 
 The idea is inspired in part by the Chrome release cycle, which has a 
 presentation here:
 
 https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6chpli=1
 
 Some quotes:
 Features should be engineered so that they can be disabled easily (1 patch)
 and
 Would large feature development still be possible?
 
 Yes, engineers would have to work behind flags, however they can work for as 
 many releases as they need to and can remove the flag when they are done.
 
 The current numpy codebase isn't designed for this kind of workflow, but I 
 think we can productively emulate the idea for a big feature like NA support.
 
 One way to do this flag would be to have a numpy.experimental namespace 
 which is not imported by default. To enable the NA-mask feature, you could do:
 
  import numpy.experimental.maskna
 
 This would trigger an ExperimentalWarning to message that an experimental 
 feature has been enabled, and would add any NA-specific symbols to the numpy 
 namespace (NA, NAType, etc). Without this import, any operation which would 
 create an NA or NA-masked array raises an ExperimentalError instead of 
 succeeding. After this import, things would behave as they do now.

How would this flag work at the C-API level? 

-Travis




Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Dag Sverre Seljebotn
On 05/09/2012 06:46 PM, Travis Oliphant wrote:
 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate. I think they've
 done an amazing job at providing some context, articulating their views
 and suggesting ways forward in a mutually respectful manner. This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
 https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward. I'm
 also reading the document incorporating my understanding of the history,
 of NumPy as well as all of the users I've met and interacted with which
 means I have my own perspective that is not necessarily incorporated
 into that document but informs my recommendations. I'm not sure we can
 reach full consensus on this. We are also well past time for moving
 forward with a resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion
 can take place. I will make a plea that we keep this discussion as free
 from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
 we can. I can't guarantee that I personally will succeed at that, but I
 can tell you that I will try. That's all I'm asking of anyone else. I
 recognize that there are a lot of other issues at play here besides
 *just* the technical questions, but we are not going to resolve every
 community issue in this technical thread.

 We need concrete proposals and so I will start with three. Please feel
 free to comment on these proposals or add your own during the
 discussion. I will stop paying attention to this thread next Wednesday
 (May 16th) (or earlier if the thread dies) and hope that by that time we
 can agree on a way forward. If we don't have agreement, then I will move
 forward with what I think is the right approach. I will either write the
 code myself or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added
 to NumPy. We should work on these (int32, float64, complex64, str, bool)
 to start. So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged. The
 array_interface keeps the masked array notions and the ufuncs keep the
 ability to handle arrays like ndmasked. Ideally, numpy.ma would be
 changed to use ndmasked objects as their core.

 For the record, I'm currently in favor of the third proposal. Feel free
 to comment on these proposals (or provide your own).


Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark!

The third proposal is certainly the best one from Cython's perspective; 
and I imagine for those writing C extensions against the C API too. 
Having PyType_Check fail for ndmasked is a very good way of having code 
fail that is not written to take masks into account.
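
A rough Python-level analogue of that argument, assuming a hypothetical `NDMasked` class that is deliberately not an `ndarray` subclass. The `isinstance` test plays the role of the C-level type check; `legacy_total` is an invented name for mask-unaware code.

```python
import numpy as np


class NDMasked:
    """Hypothetical stand-in for the proposed ndmasked type.

    Deliberately NOT a subclass of np.ndarray.
    """

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)


def legacy_total(arr):
    """Mask-unaware code, guarded by a type check as an extension might be."""
    if not isinstance(arr, np.ndarray):
        raise TypeError("expected a plain ndarray")
    return arr.sum()
```

With this arrangement `legacy_total(np.array([1, 2, 3]))` works as before, while passing an `NDMasked` instance fails loudly instead of silently ignoring the mask.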

If it is in ndarray we would also have some pressure to add support in 
Cython, with ndmasked we avoid that too. Likely outcome is we won't ever 
support it either way, but then we need some big warning in the docs, 
and it's better to avoid that. (I guess I'd be +0 on Mark Florisson 
implementing it if it ends up in core ndarray; I'd almost certainly not 
do it myself.)

That covers Cython. My view as a NumPy user follows.

I'm a heavy user of masks, which are used to make data NA in the 
statistical sense. The setting is that we have to mask out the radiation 
coming from the Milky Way in full-sky images of the Cosmic Microwave 
Background. There's data, but we know we can't trust it, so we make it 
NA. But we also do play around with different masks.

Today we keep the mask in a separate array, and to zero-mask we do

masked_data = data * mask

or

masked_data = data.copy()
masked_data[mask == 0] = np.nan # soon np.NA

depending on the circumstances.
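
For concreteness, a small self-contained version of the two idioms above. The sample values and the 1 = keep / 0 = masked convention are assumptions made for the example.

```python
import numpy as np

data = np.array([1.5, 2.0, 3.5, 4.0])
mask = np.array([1, 0, 1, 0])  # assumed convention: 1 = trusted, 0 = masked out

# Zero-masking: masked pixels contribute nothing to sums.
zero_masked = data * mask

# NaN-masking: work on a copy so the original data stays intact,
# then mark the masked pixels as "not available".
nan_masked = data.copy()
nan_masked[mask == 0] = np.nan
```

Both variants allocate a full-size temporary, which is the memory cost discussed next.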

Honestly, API-wise, this is as good as it gets for us. Nice and 
transparent, no new semantics to learn in the special case of masks.

Now, this has performance issues: Lots of memory use, extra transfers 
over the memory bus.

BUT, NumPy has that problem all over the place, even for x + y + z! 
Solving it in the special case of masks, by making a new API, seems a 
bit myopic to me.

IMO, that's much better solved at the fundamental level. As an 
*illustration*:

with np.lazy:
 masked_data1 = data * mask1
 masked_data2 = data * (mask1 | mask2)
 masked_data3 = (x + y + z) * (mask1 & mask3)

This would 

Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 1:35 PM, Travis Oliphant tra...@continuum.io wrote:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
 keeps the masked array notions and the ufuncs keep the ability to handle
 arrays like ndmasked. Ideally, numpy.ma would be changed to use
 ndmasked objects as their core.


 The numpy.ma is unmaintained and I don't see that changing anytime soon.
 As you know, I would prefer 1), but 2) is a good compromise and the infra
 structure for such a flag could be useful for other things, although like
 yourself I'm not sure how it would be implemented. I don't understand your
 proposal for 3), but from the description I don't see that it buys anything.


 That is a bit strong to call numpy.ma unmaintained. I don't consider
 it that way. Are there a lot of tickets for it that are unaddressed?
 Is it broken?   I know it gets a lot of use in the wild and so I don't
 think NumPy users would be happy to hear it is considered unmaintained by
 NumPy developers.

 I'm looking forward to more details of Mark's proposal for #2.

 The proposal for #3 is quite simple and I think it is also a good
 compromise between removing the masked array entirely from the core NumPy
 object and leaving things as is in master.  It keeps the functionality (but
 in a separate object) much like numpy.ma is a separate object.
   Basically it buys not forcing *all* NumPy users (on the C-API level) to
 now deal with a masked array.


To me, it looks like we will get stuck with a more complicated
implementation without changing the API, something that 2) achieves more
easily while providing a feature likely to be useful as we head towards 2.0.


 I know this push is a feature that is part of Mark's intention (as it
 pushes downstream libraries to think about missing data at a fundamental
 level). But, I think this is too big of a change to put in a 1.X
 release.   The internal array-model used by NumPy is used quite extensively
 in downstream libraries as a *concept*.  Many people have enhanced this
 model with a separate mask array for various reasons, and Mark's current
 use of mask does not satisfy all those use-cases.   I don't see how we can
 justify changing the NumPy 1.X memory model under these circumstances.


You keep referring to these ghostly people and their unspecified uses, no
doubt to protect the guilty. You don't have to name names, but a little
detail on what they have done and how they use things would be *very*
helpful.


 This is the sort of change that in my mind is a NumPy 2.0 kind of change
 where downstream users will be looking for possible array-model changes.


We tried the flag day approach to 2.0 already and it failed. I think it
better to have a long term release and a series of releases thereafter
moving step by step with incremental changes towards a 2.0. Mark's 2) would
support that approach.

snip

Chuck


Re: [Numpy-discussion] Documentation roles in the numpy/scipy documentation editor

2012-05-09 Thread Joe Harrington
We considered lowering the review standard near the end of my direct
involvement in the doc project but decided not to.  You didn't mention
any benefit to the proposed changes, so while I'm not active in the doc
project anymore, let me relate our decision.

It's often the case that docstrings get written fast, and it's usually
the case that they're written by a single person, who has a single
perspective.  We wanted to make docs that were professional, that could
be placed next to the manuals for IDL, Matlab, etc. without
embarrassment.  So, we set up a system similar to academic publishing.
Every docstring would be seen by two sets of critical eyes, and for
major X.0 releases we'd pay a proofreader to spend a few days to polish
off the English and get the style totally consistent.

At the same time, we needed to get something decent in every docstring
fast, so we made that the priority.  About the time we achieved that,
money ran out.  So, lots of docstrings are in "needs review" or even
"being edited" status.  But that doesn't mean money will never come
again.  Indeed, there are now several companies basing their services
around this software.  If someone does want to make the docs
professional, say for numpy 2.0 or 3.0 or whatever, or as part of a
larger system for sale, then they have a system in place that can do it.

The purpose of the review statuses is to identify how close a docstring
is to publishable.  However, there is no consequence to the statuses: a
docstring gets included in the release no matter its status.  But, you
do know which docstrings need what kind of work.  So, what's the benefit
of changing what the statuses mean, or eliminating them?  I think it may
only be that the writers feel better.  The users don't even see the
statuses as they're not listed in the release.

Tim felt that docs should be continually edited, not finished.  I
agree, especially if the underlying routine or surrounding docs get
changed.  But the system is designed to encourage this!  Here's how:

Say most/all routines get genuine "proofed" status.  That's great, but
it's not the end of the line by any means.  If someone comes along and
edits a proofed docstring, that docstring then automatically "needs
review" once again, to ensure that a mistake was not inserted.  Now you
know what to look at when checking things over before a release (since
there can't be unit tests for docs).  From the history, you also know it
was once proofed, so reviewing and proofing it is very easy just by
looking at the diffs.

So, the system encourages and accounts for continual edits while
allowing a professional product to be produced for a particular release.

The way to move forward is to declare that the goal is to get all docs
to some status, say "needs review" (that was our initial goal, and the
only one we achieved, more or less).  Then, go after the docs that don't
have that, like the new polynomial docs.  If someone wants to publish a
manual, the goal becomes "proofed", and there's more work to do.

It DOES make sense to give the reviewer role to more people.  Just make
sure they take care in their reviews, so the statuses continue to have
meaning.  Otherwise what's the point?

--jh--

On Mon, 7 May 2012 22:14:56, Ralf Gommers ralf.gomm...@googlemail.com wrote:
On Mon, May 7, 2012 at 7:37 PM, Tim Cera t...@cerazone.net wrote:

 I think we should change the roles established for the Numpy/Scipy
 documentation editors because they do not work as intended.

 For reference they are described here:
 http://docs.scipy.org/numpy/Front%20Page/

 Basically there aren't that many active people to support being split into
 the roles as described which has led to a backlog of 'Needs review'
 docstrings and only one  'Proofed' docstring.  I think that many of these
 docstrings are good enough, just that not enough people have put themselves
 out front as so knowledgeable about a certain topic to label docstrings as
 'Reviewed' or 'Proofed'.

 You're right. I think at some point the goal shifted from getting
everything to proofed to getting everything to needs review.


 Here are the current statistics for numpy docstrings:

 Current                    %   Count
 Needs editing             17     279
 Being written / Changed    4      62
 Needs review              76    1235
 Needs review (revised)     2      35
 Needs work (reviewed)      0       3
 Reviewed (needs proof)     0       0
 Proofed                    0       1
 Unimportant                ?    1793

 The needs editing category actually contains mostly docstrings that are
quite good, but were recently created and never edited in the doc wiki. The
% keeps on growing. Bumping all polynomial docstrings up to needs review
would be a good start here to make the % reflect the actual status.


 I have thought about some solutions in no particular order:

 * Get rid of the 'Reviewer' and 'Proofer' roles.
 * Assign all 'Editors', the 'Reviewer', and 'Proofer' privileges.
 * People start out as 'Editors', and then become 'Reviewers', and
 'Proofers' based on some editing metric.

 For full disclosure, I would be generous with a 'Reviewed' label 

Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Travis Oliphant
On re-reading, I want to make a couple of things clear:   

1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in 
such a way that we don't tie our hands in the future. I do not believe we 
can figure out what to do for masked arrays in one short week.   What happens 
beyond NumPy 1.7 should be still discussed and explored. My urgency is 
entirely about moving forward from where we are in master right now in a 
direction that we can all accept.  The tight timeline is so that we do 
*something* and move forward.

2) I missed another possible proposal for NumPy 1.7 which is in the 
write-up that Mark and Nathaniel made:  remove the masked array additions 
entirely possibly moving them to another module like numpy-dtypes.

Again, these are only for NumPy 1.7.   What happens in any future NumPy and 
beyond will depend on who comes to the table for both discussion and 
code-development. 

Best regards,

-Travis



On May 9, 2012, at 11:46 AM, Travis Oliphant wrote:

 Hey all, 
 
 Nathaniel and Mark have worked very hard on a joint document to try and 
 explain the current status of the missing-data debate.   I think they've done 
 an amazing job at providing some context, articulating their views and 
 suggesting ways forward in a mutually respectful manner.   This is an 
 exemplary collaboration and is at the core of why open source is valuable. 
 
 The document is available here: 
https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
 
 After reading that document, it appears to me that there are some 
 fundamentally different views on how things should move forward.   I'm also 
 reading the document incorporating my understanding of the history, of NumPy 
 as well as all of the users I've met and interacted with which means I have 
 my own perspective that is not necessarily incorporated into that document 
 but informs my recommendations. I'm not sure we can reach full consensus 
 on this. We are also well past time for moving forward with a resolution 
 on this (perhaps we can all agree on that). 
 
 I would like one more discussion thread where the technical discussion can 
 take place. I will make a plea that we keep this discussion as free from 
 logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can.   I 
 can't guarantee that I personally will succeed at that, but I can tell you 
 that I will try.   That's all I'm asking of anyone else. I recognize that 
 there are a lot of other issues at play here besides *just* the technical 
 questions, but we are not going to resolve every community issue in this 
 technical thread. 
 
 We need concrete proposals and so I will start with three.   Please feel free 
 to comment on these proposals or add your own during the discussion. I 
 will stop paying attention to this thread next Wednesday (May 16th) (or 
 earlier if the thread dies) and hope that by that time we can agree on a way 
 forward.  If we don't have agreement, then I will move forward with what I 
 think is the right approach.   I will either write the code myself or 
 convince someone else to write it. 
 
 In all cases, we have agreement that bit-pattern dtypes should be added to 
 NumPy.  We should work on these (int32, float64, complex64, str, bool) to 
 start. So, the three proposals are independent of this way forward.   The 
 proposals are all about the extra mask part:  
 
 My three proposals: 
 
   * do nothing and leave things as is 
 
   * add a global flag that turns off masked array support by default but 
 otherwise leaves things unchanged (I'm still unclear how this would work 
 exactly)
 
   * move Mark's masked ndarray objects into a new fundamental type 
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface 
 keeps the masked array notions and the ufuncs keep the ability to handle 
 arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked 
 objects as their core. 
 
 For the record, I'm currently in favor of the third proposal.   Feel free to 
 comment on these proposals (or provide your own). 
 
 Best regards,
 
 -Travis
 



Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Nathaniel Smith
On Wed, May 9, 2012 at 5:46 PM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate.   I think they've
 done an amazing job at providing some context, articulating their views and
 suggesting ways forward in a mutually respectful manner.   This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
    https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward.   I'm also
 reading the document incorporating my understanding of the history, of NumPy
 as well as all of the users I've met and interacted with which means I have
 my own perspective that is not necessarily incorporated into that document
 but informs my recommendations.    I'm not sure we can reach full consensus
 on this.     We are also well past time for moving forward with a resolution
 on this (perhaps we can all agree on that).

If we're talking about deciding what to do for the 1.7 release branch,
then I agree. Otherwise, I definitely don't. We really just don't
*know* what our users need with regards to mask-based storage versions
of missing data, so committing to something within a short time period
will just guarantee we have to re-do it all again later.

[Edit: I see that you've clarified this in a follow-up email -- great!]

 We need concrete proposals and so I will start with three.   Please feel
 free to comment on these proposals or add your own during the discussion.
  I will stop paying attention to this thread next Wednesday (May 16th) (or
 earlier if the thread dies) and hope that by that time we can agree on a way
 forward.  If we don't have agreement, then I will move forward with what I
 think is the right approach.   I will either write the code myself or
 convince someone else to write it.

Again, I'm assuming that what you mean here is that we can't and
shouldn't delay 1.7 indefinitely for this discussion to play out, so
you're proposing that we give ourselves a deadline of 1 week to decide
how to at least get the release unblocked. Let me know if I'm
misreading, though...

 In all cases, we have agreement that bit-pattern dtypes should be added to
 NumPy.      We should work on these (int32, float64, complex64, str, bool)
 to start.    So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

In the context of 1.7, this seems like a non-starter at this point, at
least if we're going to move in the direction of making decisions by
consensus. It might well be that we'll decide that the current
NEP-like API is what we want (or that some compatible super-set is).
But (as described in more detail in the NA-overview document), I think
there are still serious questions to work out about how and whether a
masked-storage/NA-semantics API is something we want as part of the
ndarray object at all. And Ralf with his release-manager hat says that
he doesn't want to release the current API unless we can guarantee
that some version of it will continue to be supported. To me that
suggests that this is off the table for 1.7.

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

I've been assuming something like a global variable, and some guards
added to all the top-level functions that take maskna= arguments, so
that it's impossible to construct an ndarray that has its maskna
flag set to True unless the flag has been toggled.
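
A minimal sketch of that guard idea, assuming a hypothetical decorator applied to every top-level function that accepts `maskna=`. The names `guard_maskna`, `set_maskna_enabled`, and `make_array` are invented for the example and are not NumPy API.

```python
import functools

_flags = {"maskna": False}  # hypothetical global flag, off by default


def set_maskna_enabled(value):
    """Toggle the hypothetical global flag."""
    _flags["maskna"] = bool(value)


def guard_maskna(func):
    """Reject maskna=True unless the global flag has been toggled."""
    @functools.wraps(func)
    def wrapper(*args, maskna=False, **kwargs):
        if maskna and not _flags["maskna"]:
            raise RuntimeError("maskna support is disabled")
        return func(*args, maskna=maskna, **kwargs)
    return wrapper


@guard_maskna
def make_array(values, maskna=False):
    """Hypothetical top-level constructor; returns a plain dict here."""
    return {"values": list(values), "maskna": maskna}
```

A Python-level guard like this only covers the Python entry points; it says nothing about the ABI or C-API side of the question.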

As I said in NA-overview, I'd be fine with this in principle, but only
if we're certain we're okay with the ABI consequences. And we should
be clear on the goal -- if we just want to let people play with the
API, then there are other options, such as my little experiment:
  https://github.com/njsmith/numpyNEP
(This is certainly less robust, but it works, and is probably a much
easier base for modifications to test alternative APIs.) If the goal
is just to keep the code in master, then that's fine too, though it
has both costs and benefits. (An example of a cost is that its
presence may complicate adding bitpattern NA support.)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
 keeps the masked array notions and the ufuncs keep the ability to handle
 arrays like ndmasked.    Ideally, numpy.ma would be changed to use ndmasked
 objects as their core.

If we're talking about 1.7, then what kind of status do you propose
these new objects would have in 1.7? Regular feature, totally
experimental, something else?

My only objection to this proposal is that committing to this approach

Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Matthew Brett
Hi,

On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 05/09/2012 06:46 PM, Travis Oliphant wrote:
 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate. I think they've
 done an amazing job at providing some context, articulating their views
 and suggesting ways forward in a mutually respectful manner. This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
 https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward. I'm
 also reading the document incorporating my understanding of the history,
 of NumPy as well as all of the users I've met and interacted with which
 means I have my own perspective that is not necessarily incorporated
 into that document but informs my recommendations. I'm not sure we can
 reach full consensus on this. We are also well past time for moving
 forward with a resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion
 can take place. I will make a plea that we keep this discussion as free
 from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
 we can. I can't guarantee that I personally will succeed at that, but I
 can tell you that I will try. That's all I'm asking of anyone else. I
 recognize that there are a lot of other issues at play here besides
 *just* the technical questions, but we are not going to resolve every
 community issue in this technical thread.

 We need concrete proposals and so I will start with three. Please feel
 free to comment on these proposals or add your own during the
 discussion. I will stop paying attention to this thread next Wednesday
 (May 16th) (or earlier if the thread dies) and hope that by that time we
 can agree on a way forward. If we don't have agreement, then I will move
 forward with what I think is the right approach. I will either write the
 code myself or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added
 to NumPy. We should work on these (int32, float64, complex64, str, bool)
 to start. So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged. The
 array_interface keeps the masked array notions and the ufuncs keep the
 ability to handle arrays like ndmasked. Ideally, numpy.ma would be
 changed to use ndmasked objects as their core.

 For the record, I'm currently in favor of the third proposal. Feel free
 to comment on these proposals (or provide your own).


 Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark!

Yes, it is very well written, my compliments to the chefs.

 The third proposal is certainly the best one from Cython's perspective;
 and I imagine for those writing C extensions against the C API too.
 Having PyType_Check fail for ndmasked is a very good way of having code
 fail that is not written to take masks into account.

Mark, Nathaniel - can you comment how your chosen approaches would
interact with extension code?

I'm guessing the bitpattern dtypes would be expected to cause
extension code to choke if the type is not supported?

Mark - in :

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython

- do I understand correctly that you think that Cython and other
extension writers should use the numpy API to access the data rather
than accessing it directly via the data pointer and strides?

Best,

Matthew


Re: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11

2012-05-09 Thread Gael Varoquaux
On Wed, May 09, 2012 at 06:55:12PM +0200, klo uo wrote:
 This news did not arrive at scikit-learn-gene...@lists.sourceforge.net
 Is above list deprecated?

Andy Mueller did the announcement on the scikit-learn mailing list.

 BTW thanks for supporting and working on this project ;)

Thank you very much, it is my pleasure. But it's really a team that you
need to thank: the number of active contributors is huge.

Cheers,

Gael


 On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
    On behalf of Andy Mueller, our release manager, I am happy to announce
    the 0.11 release of scikit-learn.

    This release includes some major new features such as randomized
    sparse models, gradient boosted regression trees, label propagation
    and many more. The release also has major improvements in the
    documentation and in stability.

    Details can be found on the [1]what's new page.

    We also have a new page with [2]video tutorials on machine learning
    with scikit-learn and different aspects of the package.

    Sources and windows binaries are available on sourceforge,
    through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or
    can be installed directly using pip:

    pip install -U scikit-learn

    Thanks again to all the contributors who made this release possible.

    Cheers,

     Gaël

    1. http://scikit-learn.org/stable/whats_new.html
    2. http://scikit-learn.org/stable/presentations.html


-- 
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone:  ++ 33-1-69-08-79-68
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Paul Ivanov
On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant tra...@continuum.io wrote:

 On re-reading, I want to make a couple of things clear:

 1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in
 such a way that we don't tie our hands in the future. I do not believe
 we can figure out what to do for masked arrays in one short week.   What
 happens beyond NumPy 1.7 should be still discussed and explored. My
 urgency is entirely about moving forward from where we are in master right
 now in a direction that we can all accept.  The tight timeline is so
 that we do *something* and move forward.

 2) I missed another possible proposal for NumPy 1.7 which is in the
 write-up that Mark and Nathaniel made:  remove the masked array additions
 entirely possibly moving them to another module like numpy-dtypes.

 Again, these are only for NumPy 1.7.   What happens in any future NumPy
 and beyond will depend on who comes to the table for both discussion and
 code-development.


I'm glad that this sentence made it into the write-up: "A project like
numpy requires developers to write code for advancement to occur, and
obstacles that impede the writing of code discourage existing developers
from contributing more, and potentially scare away developers who are
thinking about joining in." I agree, which is why I'm a little surprised
after reading the write-up that there's no deference to the alterNEP
(admittedly kludgy) implementation. One of the arguments made for the NEP
preliminary NA-mask implementation is that it has been extensively tested
against scipy and other third-party packages, and has been in master in a
stable state for a significant amount of time. It is my understanding that
the manner in which this implementation found its way into master was a
source of concern and contention. To me (and I don't know the degree to
which this is technically feasible) that's precisely the reason that BOTH
approaches should be allowed to make their way into numpy with experimental
status. Otherwise, it seems that there is a sort of scaring away of
developers - seeing (from the sidelines) how much of a struggle it's been
for the alterNEP to find a nurturing environment as an experimental
alternative inside numpy. In my reading, the process and consensus threads
that have generated so many responses stem precisely from trying to have an
atmosphere where everyone is encouraged to join in. The alternatives
proposed so far (though I do understand it's only for 1.7) do not suggest
an appreciation for the gravity of the fallout from the neglect of the
alterNEP and the issues which sprang forth from that.

Importantly, I find a problem with how personal this document (and
discussion) is - I'd much prefer if we talk about technical things by a
descriptive name, not the person who thought of it. You'll note how I've
been referring to NEP and alterNEP above. One advantage of this is that
down the line, if either Mark or Nathaniel change their minds about their
current preferred way forward, it doesn't take the wind out of it with
something like "Even Paul changed his mind and now withdraws his support of
Paul's proposal." We should only focus on the technical merits of a given
approach, not how many commits have been made by the person proposing them
or what else they've done in their life: a good idea has value regardless
of who expresses it. In my fantasy world, with both approaches clearly
existing in an experimental sandbox inside numpy, folks who feel primary
attachments to either NEP or alterNEP would be willing to cross party lines
and pitch in toward making progress in both camps. That's the way we'll
find better solutions, by working together, instead of working in
opposition.

best,
-- 
Paul Ivanov
314 address only used for lists,  off-list direct email at:
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 6:13 PM, Paul Ivanov pivanov...@gmail.com wrote:



 On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant tra...@continuum.io wrote:

 On re-reading, I want to make a couple of things clear:

 1) This wrap-up discussion is *only* for what to do for NumPy 1.7 in
 such a way that we don't tie our hands in the future. I do not believe
 we can figure out what to do for masked arrays in one short week. What
 happens beyond NumPy 1.7 should be still discussed and explored. My
 urgency is entirely about moving forward from where we are in master right
 now in a direction that we can all accept.  The tight timeline is so
 that we do *something* and move forward.

 2) I missed another possible proposal for NumPy 1.7 which is in the
 write-up that Mark and Nathaniel made:  remove the masked array additions
 entirely possibly moving them to another module like numpy-dtypes.

 Again, these are only for NumPy 1.7.   What happens in any future NumPy
 and beyond will depend on who comes to the table for both discussion and
 code-development.


 I'm glad that this sentence made it into the write-up: A project like
 numpy requires developers to write code for advancement to occur, and
 obstacles that impede the writing of code discourage existing developers
 from contributing more, and potentially scare away developers who are
 thinking about joining in. I agree, which is why I'm a little surprised
 after reading the write-up that there's no deference to the alterNEP
 (admittedly kludgy) implementation? One of the arguments made for the NEP
 preliminary NA-mask implementation is that it has been extensively tested
 against scipy and other third-party packages, and has been in master in a
 stable state for a significant amount of time. It is my understanding that
 the manner in which this implementation found its way into master was a
 source of concern and contention. To me (and I don't know the level to
 which this is technically feasible) that's precisely the reason that BOTH
 approaches should be allowed to make their way into numpy with experimental
 status. Otherwise, it seems that there is a sort of scaring away of
 developers - seeing (from the sidelines) how much of a struggle it's been
 for the alterNEP to find a nurturing environment as an experimental
 alternative inside numpy. In my reading, the process and consensus threads
 that have generated so many responses stem precisely from trying to have an
 atmosphere where everyone is encouraged to join in. The alternatives
 proposed so far (though I do understand it's only for 1.7) do not suggest
 an appreciation for the gravity of the fallout from the neglect of the
 alterNEP and the issues which sprang forth from that.

 Importantly, I find a problem with how personal this document (and
 discussion) is - I'd much prefer if we talk about technical things by a
 descriptive name, not the person who thought of it. You'll note how I've
 been referring to NEP and alterNEP above. One advantage of this is that
 down the line, if either Mark or Nathaniel change their minds about their
 current preferred way forward, it doesn't take the wind out of it with
 something like "Even Paul changed his mind and now withdraws his support of
 Paul's proposal." We should only focus on the technical merits of a given
 approach, not how many commits have been made by the person proposing them
 or what else they've done in their life: a good idea has value regardless
 of who expresses it. In my fantasy world, with both approaches clearly
 existing in an experimental sandbox inside numpy, folks who feel primary
 attachments to either NEP or alterNEP would be willing to cross party lines
 and pitch in toward making progress in both camps. That's the way we'll
 find better solutions, by working together, instead of working in
 opposition.


We are certainly open to code submissions and alternate implementations.
The experimental tag would help there. But someone, as you mention, needs
to write the code.

Chuck


Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 12:40 PM, Sandro Tosi matrixh...@gmail.com wrote:

 On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  Please test this release and report any issues on the numpy-discussion
  mailing list.

 I think it's probably nice not to ship pyc in the source tarball:

 $ find numpy-1.6.2rc1/ -name '*.pyc'
 numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc
 numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc
 numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc
 numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc


Good point ;)

Chuck


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Dag Sverre Seljebotn
On 05/10/2012 01:01 AM, Matthew Brett wrote:
 Hi,

 On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no  wrote:
 On 05/09/2012 06:46 PM, Travis Oliphant wrote:
 Hey all,

 Nathaniel and Mark have worked very hard on a joint document to try and
 explain the current status of the missing-data debate. I think they've
 done an amazing job at providing some context, articulating their views
 and suggesting ways forward in a mutually respectful manner. This is an
 exemplary collaboration and is at the core of why open source is valuable.

 The document is available here:
 https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

 After reading that document, it appears to me that there are some
 fundamentally different views on how things should move forward. I'm
 also reading the document incorporating my understanding of the history
 of NumPy, as well as all of the users I've met and interacted with, which
 means I have my own perspective that is not necessarily incorporated
 into that document but informs my recommendations. I'm not sure we can
 reach full consensus on this. We are also well past time for moving
 forward with a resolution on this (perhaps we can all agree on that).

 I would like one more discussion thread where the technical discussion
 can take place. I will make a plea that we keep this discussion as free
 from logical fallacies (http://en.wikipedia.org/wiki/Logical_fallacy) as
 we can. I can't guarantee that I personally will succeed at that, but I
 can tell you that I will try. That's all I'm asking of anyone else. I
 recognize that there are a lot of other issues at play here besides
 *just* the technical questions, but we are not going to resolve every
 community issue in this technical thread.

 We need concrete proposals and so I will start with three. Please feel
 free to comment on these proposals or add your own during the
 discussion. I will stop paying attention to this thread next Wednesday
 (May 16th) (or earlier if the thread dies) and hope that by that time we
 can agree on a way forward. If we don't have agreement, then I will move
 forward with what I think is the right approach. I will either write the
 code myself or convince someone else to write it.

 In all cases, we have agreement that bit-pattern dtypes should be added
 to NumPy. We should work on these (int32, float64, complex64, str, bool)
 to start. So, the three proposals are independent of this way forward.
 The proposals are all about the extra mask part:

 My three proposals:

 * do nothing and leave things as is

 * add a global flag that turns off masked array support by default but
 otherwise leaves things unchanged (I'm still unclear how this would work
 exactly)

 * move Mark's masked ndarray objects into a new fundamental type
 (ndmasked), leaving the actual ndarray type unchanged. The
 array_interface keeps the masked array notions and the ufuncs keep the
 ability to handle arrays like ndmasked. Ideally, numpy.ma would be
 changed to use ndmasked objects as their core.

 For the record, I'm currently in favor of the third proposal. Feel free
 to comment on these proposals (or provide your own).
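
For readers unfamiliar with the "bit-pattern" idea mentioned in the quoted message, here is a minimal Python sketch of one way it could work for float64: reserve one particular NaN bit pattern to mean NA and test for it by comparing raw bits. The constant NA_BITS and the helpers make_with_na and is_na are hypothetical names invented for this illustration; they are not part of NumPy, and the payload value is arbitrary.

```python
import numpy as np

# Hypothetical NA marker: a NaN whose payload bits are chosen arbitrarily
# for this illustration. It is not a NumPy constant.
NA_BITS = np.uint64(0x7FF00000000007A2)


def make_with_na(values, na_positions):
    """Return a float64 array with the NA bit pattern written at na_positions."""
    arr = np.array(values, dtype=np.float64)
    bits = arr.view(np.uint64)          # same memory, reinterpreted as integers
    bits[list(na_positions)] = NA_BITS  # write the marker without any float math
    return arr


def is_na(arr):
    """True where an element carries the NA bit pattern (an ordinary NaN is not NA)."""
    return arr.view(np.uint64) == NA_BITS


data = make_with_na([1.0, 0.0, np.nan], na_positions=[1])
na_flags = is_na(data)        # only the marked element
nan_flags = np.isnan(data)    # the marked element and the ordinary NaN
```

The point of the sketch is only that NA lives inside the data buffer itself, so no separate mask array is needed; the cost is that one bit pattern per dtype is no longer available as an ordinary value.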


 Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark!

 Yes, it is very well written, my compliments to the chefs.

 The third proposal is certainly the best one from Cython's perspective;
 and I imagine for those writing C extensions against the C API too.
 Having PyType_Check fail for ndmasked is a very good way of having code
 fail that is not written to take masks into account.

I want to make something more clear: There are two Cython cases; in the
case of "cdef np.ndarray[double]" there is no problem as PEP 3118 access
will raise an exception for masked arrays.

But, there's the case where you do "cdef np.ndarray", and then proceed
to use PyArray_DATA. Myself I do this more than PEP 3118 access; usually
because I pass the data pointer to some C or C++ code.

It'd be great to have such code be forward-compatible in the sense that 
it raises an exception when it meets a masked array. Having PyType_Check 
fail seems like the only way? Am I wrong?
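
A rough Python analogue of the concern above, as a sketch only: the proposed ndmasked type does not exist, so numpy.ma.MaskedArray stands in for "an array that carries a mask". Because MaskedArray subclasses ndarray, an isinstance check lets it through, while an exact type check rejects it. The helper name data_pointer_exact is hypothetical.

```python
import numpy as np


def data_pointer_exact(arr):
    """Return the raw data address, but only for exact ndarray instances."""
    if type(arr) is not np.ndarray:
        raise TypeError("expected a plain ndarray, got %s" % type(arr).__name__)
    return arr.ctypes.data


plain = np.arange(4.0)
masked = np.ma.masked_array(plain, mask=[0, 1, 0, 0])  # stand-in for a masked type

addr = data_pointer_exact(plain)

# A subclass-tolerant check would accept the masked array ...
passes_isinstance = isinstance(masked, np.ndarray)

# ... while the exact type check refuses it before the pointer is used.
try:
    data_pointer_exact(masked)
    rejected = False
except TypeError:
    rejected = True
```

With a separate ndmasked type that is not an ndarray subclass, even the subclass-tolerant check would fail, which is the forward-compatible behaviour asked for above.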


 Mark, Nathaniel - can you comment how your chosen approaches would
 interact with extension code?

 I'm guessing the bitpattern dtypes would be expected to cause
 extension code to choke if the type is not supported?

The proposal, as I understand it, is to use that with new dtypes (?). So 
things will often be fine for that reason:

if arr.dtype == np.float32:
    c_function_32bit(np.PyArray_DATA(arr), ...)
else:
    raise ValueError("need 32-bit float array")



 Mark - in :

 https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython

 - do I understand correctly that you think that Cython and other
 extension writers should use the numpy API to access the data rather
 than accessing it directly via the data pointer and strides?

That's not really fleshed out (for 

Re: [Numpy-discussion] Masking through generator arrays

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn 
d.s.seljeb...@astro.uio.no wrote:

 Sorry everyone for being so dense and contaminating that other thread.
 Here's a new thread where I can respond to Nathaniel's response.

 On 05/10/2012 01:08 AM, Nathaniel Smith wrote:
   Hi Dag,
  
   On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn
   d.s.seljeb...@astro.uio.no  wrote:
   I'm a heavy user of masks, which are used to make data NA in the
   statistical sense. The setting is that we have to mask out the radiation
   coming from the Milky Way in full-sky images of the Cosmic Microwave
   Background. There's data, but we know we can't trust it, so we make it
   NA. But we also do play around with different masks.
  
   Oh, this is great -- that means you're one of the users that I wasn't
   sure existed or not :-). Now I know!
  
   Today we keep the mask in a separate array, and to zero-mask we do
  
   masked_data = data * mask
  
   or
  
   masked_data = data.copy()
   masked_data[mask == 0] = np.nan # soon np.NA
  
   depending on the circumstances.
  
   Honestly, API-wise, this is as good as it gets for us. Nice and
   transparent, no new semantics to learn in the special case of masks.
  
   Now, this has performance issues: Lots of memory use, extra transfers
   over the memory bus.
  
   Right -- this is a case where (in the NA-overview terminology) masked
   storage+NA semantics would be useful.
  
   BUT, NumPy has that problem all over the place, even for x + y + z!
   Solving it in the special case of masks, by making a new API, seems a
   bit myopic to me.
  
   IMO, that's much better solved at the fundamental level. As an
   *illustration*:
  
   with np.lazy:
       masked_data1 = data * mask1
       masked_data2 = data * (mask1 | mask2)
       masked_data3 = (x + y + z) * (mask1 & mask3)
  
   This would create three generator arrays that would zero-mask the
   arrays (and perform the three-term addition...) upon request. You could
   slice the generator arrays as you wish, and by that slice the data and
   the mask in one operation. Obviously this could handle NA-masking too.
  
   You can probably do this today with Theano and numexpr, and I think
   Travis mentioned that generator arrays are on his radar for core NumPy.
  
   Implementing this today would require some black magic hacks, because
   on entry/exit to the context manager you'd have to reach up into the
   calling scope and replace all the ndarrays with LazyArrays and then
   vice-versa. This is actually totally possible:
  https://gist.github.com/2347382
   but I'm not sure I'd call it *wise*. (You could probably avoid the
   truly horrible set_globals_dict part of that gist, though.) Might be
   fun to prototype, though...
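
The two explicit masking idioms quoted earlier in this message (multiply by a 0/1 mask, or write NaN into a copy) can be run as-is with today's NumPy. A small sketch with made-up sample values, which also shows where the extra memory traffic comes from: each idiom allocates a second full-size array.

```python
import numpy as np

data = np.array([1.5, 2.5, 3.5, 4.5])   # made-up sample values
mask = np.array([1, 0, 1, 0])           # 1 = keep, 0 = masked out

# Zero-masking: masked entries become 0.0 in a newly allocated array.
zero_masked = data * mask

# NaN-masking: copy first, then overwrite masked entries with NaN.
nan_masked = data.copy()
nan_masked[mask == 0] = np.nan

# A NaN-aware reduction skips the masked entries.
valid_mean = np.nanmean(nan_masked)
```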

 1) My main point was just that I believe masked arrays is something that
 to me feels immature, and that it is the kind of thing that should be
 constructed from simpler primitives. And that NumPy should focus on
 simple primitives. You could make it


I can't disagree, as I suggested the same as a possibility myself ;) There
is a lot of infrastructure now in numpy, but given the use cases I'm
tending towards the view that masked arrays should be left to others, at
least for the time being. The question is how to generalize the
infrastructure and what hooks to provide. I think just spending a month or
two pulling stuff out is counterproductive, but evolving the code is
definitely needed. If you could familiarize yourself with what is in there,
something that seems largely neglected by the critics, and make
suggestions, that would be helpful.

I'd also like to hear from Mark. It has been about 9 mos since he did the
work, and I'd be surprised if he didn't have ideas for doing some things
differently. OTOH, I can understand his reluctance to get involved in a
topic where I thought he was poorly treated last time around.



 np.gen.generating_multiply(data, mask)

 2) About the with construct in particular, I intended __enter__ and
 __exit__ to only toggle a thread-local flag, and when that flag is in
 effect, __mul__ would do a generating_multiply and return an
 ndarraygenerator rather than an ndarray.

 But of course, the amount of work is massive.


<snip>

Chuck


Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-09 Thread Charles R Harris
On Wed, May 9, 2012 at 11:05 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Wednesday, May 9, 2012, Nathaniel Smith wrote:



 My only objection to this proposal is that committing to this approach
 seems premature. The existing masked array objects act quite
 differently from numpy.ma, so why do you believe that they're a good
 foundation for numpy.ma, and why will users want to switch to their
 semantics over numpy.ma's semantics? These aren't rhetorical
 questions, it seems like they must have concrete answers, but I don't
 know what they are.


 Based on the design decisions made in the original NEP, a re-made
 numpy.ma would have to lose _some_ features, particularly the ability to share
 masks. Save for that and some very obscure behaviors that are undocumented,
 it is possible to remake numpy.ma as a compatibility layer.

 That being said, I think that there are some fundamental questions that
 still concern me. If I recall, there were unresolved questions about behaviors
 surrounding assignments to elements of a view.

 I see the project as broken down like this:
 1.) internal architecture (largely ABI issues)
 2.) external architecture (hooks throughout numpy to utilize the new
 features where possible such as where= argument)
 3.) getter/setter semantics
 4.) mathematical semantics

 At this moment, I think we have pieces of 2 and they are fairly
 non-controversial. It is 1 that I see as being the immediate hold-up here.
 3 & 4 are non-trivial, but because they are mostly about interfaces, I
 think we can be willing to accept some very basic, fundamental, barebones
 components here in order to lay the groundwork for a more complete API
 later.
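
As a concrete instance of the where= hook named in item 2 of the quoted list, ufuncs accept a boolean where= argument together with out=. A short sketch with made-up values; positions where the condition is False are left untouched in the output array, which is why it is pre-filled here.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
valid = np.array([True, False, True, False])  # compute only where True

out = np.zeros_like(a)              # pre-fill, since skipped slots are not written
np.add(a, b, out=out, where=valid)
```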

 To talk of Travis's proposal, doing nothing is a no-go. Not moving forward
 would dishearten the community. Making a ndmasked type is very intriguing.
 I see it as a step towards eventually deprecating ndarray? Also, how would
 it behave with np.asarray() and np.asanyarray()? My other concern is a
 possible violation of DRY. How difficult would it be to maintain two
 ndarrays in parallel?
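
How a new ndmasked type would interact with np.asarray() and np.asanyarray() is an open question in this thread. For reference, this sketch shows only what the two functions do today with the existing numpy.ma.MaskedArray subclass; it says nothing about the proposed type.

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

as_base = np.asarray(masked)     # base-class ndarray; the mask is dropped
as_any = np.asanyarray(masked)   # ndarray subclasses pass through

base_is_plain = type(as_base) is np.ndarray
any_keeps_mask = isinstance(as_any, np.ma.MaskedArray)
```

Note that np.asarray() silently exposes the value stored under the mask, which is exactly the kind of behaviour legacy code would hit.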

 As for the flag approach, this still doesn't solve the problem of legacy
 code (or did I misunderstand?)


My understanding of the flag is to allow the code to stay in and get
reworked and experimented with while keeping it from contaminating
conventional use.

The whole point of putting the code in was to experiment and adjust. The
rather bizarre idea that it needs to be perfect from the get-go is
disheartening, and is seldom how new things get developed. Sure, there is a
plan up front, but there needs to be feedback and change. And in fact, I
haven't seen much feedback about the actual code; I don't even know that
the people complaining have tried using it to see where it hurts. I'd like
that sort of feedback.

Chuck