Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Travis E. Oliphant
Gabriel Gellner wrote:
 Some colleagues noticed that var uses the biased formula by default in numpy;
 searching for the reason only brought up:

 http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias

 which I totally agree with, but there was no response? Any reason for this?
I will try to respond to this, as it was me who made the change.  I think 
there have been responses, but I've preferred to stay quiet rather than 
feed a flame war.  Ultimately, it is a matter of preference, and I don't 
think everybody would give equal weight to all the arguments surrounding 
the decision.

I will attempt to articulate my reasons: dividing by n is the maximum 
likelihood estimator of the variance, and I prefer that justification to 
the un-biased justification for a default (especially given that bias is 
just one part of the error in an estimator).  Having every package that 
computes the variance return the un-biased estimate gives the concept more 
cultural weight than it deserves, I think.  Any surprise created by the 
different default should be mitigated by the fact that it's an opportunity 
to learn something about what you are doing.  Here is a paper I wrote on 
the subject that you might find useful:

https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
(Hopefully, they will resolve a link problem at the above site soon, but 
you can read the abstract).
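
To illustrate the "bias is just one part of the error" point numerically,
here is a small simulation sketch (mine, not from the paper; the sample
size and true variance are arbitrary) comparing the bias and mean-squared
error of the two divisors on normal data:

import numpy as np

rng = np.random.RandomState(0)
n, trials, true_var = 10, 100000, 4.0

# trials independent samples of size n from N(0, true_var)
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mle = samples.var(axis=1, ddof=0)       # divide by n     (maximum likelihood)
unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1 (un-biased)

for name, est in [("ddof=0", mle), ("ddof=1", unbiased)]:
    print("%s: bias=%+.4f  MSE=%.4f"
          % (name, est.mean() - true_var, ((est - true_var) ** 2).mean()))

For normal samples the n divisor trades a small downward bias for a lower
mean-squared error than the n - 1 divisor, so un-biased does not mean more
accurate.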

I'm not trying to persuade anybody with this email (although if you can 
download the paper at the above link, then I am trying to persuade with 
that).  In this email I'm just trying to give context to the poster as I 
think the question is legitimate.

With that said, there is the ddof parameter so that you can change what 
the divisor is.  I think that is a useful compromise.
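
Concretely, the divisor is n - ddof, so both conventions (and others) are
one keyword away.  A minimal sketch with made-up data:

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

print(x.var())            # ddof=0: divide by n      -> 4.0
print(x.var(ddof=1))      # ddof=1: divide by n - 1  -> 4.571...
print(((x - x.mean()) ** 2).sum() / (n - 1))  # ddof=1 result, by hand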

I'm unhappy with the internal inconsistency of cov, as I think it was an 
oversight. I'd be happy to see cov changed as well to use the ddof 
argument instead of the bias keyword, but that is an API change and 
requires some transition discussion and work.
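
The inconsistency is easy to see side by side: cov currently takes a bias
keyword rather than ddof, and its default divisor is n - 1 while var's is
n.  A quick check (assuming the current signatures):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

print(np.var(x))          # divides by n     -> 1.25
print(np.cov(x))          # divides by n - 1 -> 1.666...
print(np.cov(x, bias=1))  # bias=1 divides by n, matching var's default
print(np.var(x, ddof=1))  # ddof=1 divides by n - 1, matching cov's default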

The only other argument I've heard against the current situation is unit 
testing against MATLAB or R code.  My suggestion there is just to use 
ddof=1 when comparing against MATLAB and R.
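
In a test suite that amounts to a one-keyword change, e.g.:

import numpy as np

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# MATLAB's var(data) and R's var(data) divide by n - 1 by default,
# so the matching numpy call is:
print(np.var(data, ddof=1))   # 4.571..., agrees with the MATLAB/R defaults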

Best regards,

-Travis



Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread David Cournapeau
On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant
[EMAIL PROTECTED] wrote:
 [...]
 Here is a paper I wrote on the subject that you might find useful:

 https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
 (Hopefully, they will resolve a link problem at the above site soon, but
 you can read the abstract).

Yes, I hope so too; I would be happy to read the article.

On the limits of unbiasedness, the following document mentions an
example (in a different context than variance estimation):

http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf

AFAIK, even statisticians who consider themselves mostly frequentist
(if that makes any sense) no longer advocate unbiasedness as such an
important concept (Larry Wasserman mentions this in his "All of
Statistics").

cheers,

David


Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Paul Barrett
I'm behind Travis on this one.

 -- Paul

On Wed, Oct 15, 2008 at 11:19 AM, David Cournapeau [EMAIL PROTECTED] wrote:
 [...]



Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Scott Ransom
Me too.

S

On Wednesday 15 October 2008 11:31:44 am Paul Barrett wrote:
 I'm behind Travis on this one.

  -- Paul

 [...]


-- 
Scott M. Ransom             Address:  NRAO
Phone:  (434) 296-0320                520 Edgemont Rd.
email:  [EMAIL PROTECTED]             Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Gabriel Gellner
On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote:
 [...]
 Here is a paper I wrote on the subject that you might find useful:
 https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
 (Hopefully, they will resolve a link problem at the above site soon, but 
 you can read the abstract).
 
Thanks for the reply; I look forward to reading the paper when it is
available. The major issue in my mind is not the technical one but the
surprise factor: I can't think of a single other package that uses this as
the default, and since var is also a method of ndarray (which is a built-in
type and can't be monkey-patched), there is no way of taking a different
view (that is, supplying my own function) without the confusion I am
already seeing in my own lab . . . (less technical people need to
understand that they shouldn't use a method of the same name).

I worry about having numpy take this unpopular stance (as far as packages
go) simply to fight the good fight as a built-in method/behaviour of every
ndarray, rather than as a built-in function, which would present no such
problem, since a function allows dissent over a clearly muddy issue.
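
(The function-level workaround I mean is just a thin wrapper, say a
hypothetical helper like

import numpy as np

def sample_var(a, axis=None):
    # hypothetical wrapper: variance with the conventional n - 1 divisor
    return np.var(a, axis=axis, ddof=1)

but that does nothing about the identically named ndarray method.)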

Sorry for the noise, and I am happy to see there is a reason, but I can't
help finding this a wart for purely pedagogical reasons.

Gabriel


Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Bruce Southey

Hi,
While I disagree, I really do not care, because this is documented. But 
perhaps a clear warning is needed at the start, so that it is clear what 
the default ddof means, instead of it being buried in the Notes section.


Also, I am surprised that you did not directly reference the Stein 
estimator (your minimum mean-squared estimator) and its known properties 
in your paper:

http://en.wikipedia.org/wiki/James-Stein_estimator

So I did not find this any different from what is already known about the 
Stein estimator.


Bruce

PS While I may have gotten access via my University, I did get it from the 
"Access this item" link:
https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf
Travis E. Oliphant wrote:

[...]


Re: [Numpy-discussion] var bias reason?

2008-10-15 Thread Charles R Harris
On Wed, Oct 15, 2008 at 9:19 AM, David Cournapeau [EMAIL PROTECTED] wrote:

 [...]
 AFAIK, even statisticians who consider themselves as mostly
 frequentist (if that makes any sense) do not advocate unbiasedness as


Frequently frequentist?

Chuck