Re: [Numpy-discussion] var bias reason?
Gabriel Gellner wrote:
> Some colleagues noticed that var uses the biased formula by default in numpy. Searching for the reason only brought up http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias, which I totally agree with, but there was no response? Any reason for this?

I will try to respond to this, as it was me who made the change. I think there have been responses, but I've preferred to stay quiet rather than feed a flame war. Ultimately, it is a matter of preference, and I don't think everybody would give equal weight to all the arguments surrounding the decision.

I will attempt to articulate my reasons: dividing by n is the maximum likelihood estimator of the variance, and I prefer that justification over the un-biased justification for a default (especially given that bias is just one part of the error in an estimator). Having every package that computes the variance return the un-biased estimate gives the concept more cultural weight than it deserves, I think. Any surprise created by the different default should be mitigated by the fact that it's an opportunity to learn something about what you are doing.

Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (hopefully they will resolve a link problem at the above site soon, but you can read the abstract).

I'm not trying to persuade anybody with this email (although if you can download the paper at the above link, then I am trying to persuade with that). In this email I'm just trying to give context to the poster, as I think the question is legitimate.

With that said, there is the ddof parameter so that you can change what the divisor is. I think that is a useful compromise. I'm unhappy with the internal inconsistency of cov, which I think was an oversight. I'd be happy to see cov changed as well to use the ddof argument instead of the bias keyword, but that is an API change and requires some transition discussion and work.

The only other argument I've heard against the current situation is unit testing against MATLAB or R code. My suggestion there is simply to use ddof=1 when comparing against MATLAB and R results.

Best regards,

-Travis

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
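The ddof compromise described above is easy to check directly: numpy's var divides by n by default, and ddof=1 gives the n-1 divisor that MATLAB's and R's var use. A minimal sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size
dev2 = (x - x.mean()) ** 2  # squared deviations from the mean

# Default: divide by n (the maximum likelihood estimate).
assert np.isclose(np.var(x), dev2.sum() / n)

# ddof=1: divide by n - 1, matching MATLAB's and R's var().
assert np.isclose(np.var(x, ddof=1), dev2.sum() / (n - 1))
```

So code being ported from MATLAB or R only needs `ddof=1` added to each var call to reproduce the other system's results.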
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote:
> [quoted text snipped]
> Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (hopefully they will resolve a link problem at the above site soon, but you can read the abstract).

Yes, I hope so too; I would be happy to read the article.

On the limits of unbiasedness, the following document mentions an example (in a different context than variance estimation): http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf

AFAIK, even statisticians who consider themselves mostly frequentist (if that makes any sense) no longer advocate unbiasedness as such an important concept (Larry Wasserman mentions this in his All of Statistics).

cheers,

David
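The thread's point that bias is only one component of an estimator's error can be made concrete with a quick Monte Carlo sketch: for normally distributed data, the n-divisor estimator (ddof=0), though biased, actually has a lower mean squared error than the unbiased n-1 version. The sample size and trial count below are arbitrary choices for illustration:

```python
import numpy as np

# Draw many samples of size 5 from a normal with known variance 1.0 and
# compare the mean squared error of the two variance estimators.
rng = np.random.default_rng(0)
true_var = 1.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(200_000, 5))

mse_ml = np.mean((samples.var(axis=1, ddof=0) - true_var) ** 2)
mse_unbiased = np.mean((samples.var(axis=1, ddof=1) - true_var) ** 2)

# For normal data the n-divisor estimator wins on MSE despite its bias.
assert mse_ml < mse_unbiased
```

(The theoretical MSEs for normal data are (2n-1)/n^2 * sigma^4 versus 2/(n-1) * sigma^4, so the gap is large at small n.)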
Re: [Numpy-discussion] var bias reason?
I'm behind Travis on this one.

 -- Paul

On Wed, Oct 15, 2008 at 11:19 AM, David Cournapeau [EMAIL PROTECTED] wrote:
> [quoted text snipped]
Re: [Numpy-discussion] var bias reason?
Me too.

S

On Wednesday 15 October 2008 11:31:44 am Paul Barrett wrote:
> I'm behind Travis on this one. -- Paul
> [earlier quoted text snipped]

--
Scott M. Ransom    Address: NRAO, 520 Edgemont Rd., Charlottesville, VA 22903 USA
Phone: (434) 296-0320    email: [EMAIL PROTECTED]
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote:
> [quoted text snipped]
> Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (hopefully they will resolve a link problem at the above site soon, but you can read the abstract).

Thanks for the reply; I look forward to reading the paper when it is available.

The major issue in my mind is not the technical one but the surprise factor. I can't think of a single other package that uses this as the default, and since var is also a method of ndarray (which is a built-in type and can't be monkey-patched), there is no way of taking a different view (that is, supplying my own function) without the confusion I am now feeling in my own lab: less technical people need to understand that they shouldn't use a method of the same name. I worry about numpy taking this unpopular stance (as far as packages go) simply to fight the good fight, as a built-in method/behaviour of every ndarray, rather than as a built-in function, which would present no such problem, since it allows dissent over a clearly muddy issue.

Sorry for the noise, and I am happy to see there is a reason, but I can't help but find this a wart for purely pedagogical reasons.

Gabriel
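For what it's worth, "supplying my own function" without touching the ndarray method can be done with a thin wrapper that a lab imports in place of calling the method; the name `svar` below is purely illustrative:

```python
import numpy as np

def svar(a, axis=None, ddof=1):
    """Sample variance with the unbiased n-1 divisor by default."""
    return np.var(a, axis=axis, ddof=ddof)

x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.isclose(svar(x), np.var(x, ddof=1))   # n-1 divisor by default
assert np.isclose(svar(x, ddof=0), np.var(x))   # opt back into the n divisor
```

This sidesteps the default question entirely for local code, though it does not address Gabriel's point about people calling the `.var()` method directly.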
Re: [Numpy-discussion] var bias reason?
Hi,

While I disagree, I really do not care, because this is documented. But perhaps a clear warning is needed at the start, so it is clear what the default ddof means, instead of it being buried in the Notes section.

Also, I am surprised that you did not directly reference the Stein estimator (your minimum mean-squared-error estimator) and its known effects in your paper: http://en.wikipedia.org/wiki/James-Stein_estimator
I did not find this any different from what is already known about the Stein estimator.

Bruce

PS While I may have gotten access via my University, I did get the paper from the "Access this item" link: https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf

Travis E. Oliphant wrote:
> [quoted text snipped]
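The cov/var inconsistency raised in the thread is easy to demonstrate: np.var defaults to the n divisor, while np.cov defaults to n-1, and cov's `bias` keyword (rather than a ddof argument) switches it to n. A sketch against the API as discussed here:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])

# var defaults to dividing by n ...
assert np.isclose(np.var(x), ((x - x.mean()) ** 2).sum() / x.size)

# ... while cov defaults to dividing by n - 1 (bias=True switches to n).
assert np.isclose(np.cov(x), np.var(x, ddof=1))
assert np.isclose(np.cov(x, bias=True), np.var(x))
```

So the two functions make opposite default choices, which is the oversight Travis says he would like to see fixed by giving cov a ddof argument.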
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 9:19 AM, David Cournapeau [EMAIL PROTECTED] wrote:
> [quoted text snipped]
> AFAIK, even statisticians who consider themselves as mostly frequentist (if that makes any sense) do not advocate unbiasedness as ...

Frequently frequentist?

Chuck