Certainly the docs should be updated. What about a `newstatsover` function that returns a better thought-out set of statistics? The docs could contain a note about why `newstatsover` is preferred.

Joel

On Tue, Nov 15, 2011 at 6:26 PM, David Mertens <[email protected]> wrote:

> I second what Jarle said about changing docs instead of changing code.
>
> David
>
> On Nov 15, 2011 5:30 PM, "Jarle Brinchmann" <[email protected]> wrote:
>
>> On 15 Nov 2011, at 23:59, Derek Lamb wrote:
>>
>> > I would like to change some of the definitions of the quantities
>> > returned by statsover. I find that either their names or their
>> > calculations are not consistent with normal statistical practices.
>> > However I also know that the statistical terminology used by
>> > different communities can be different, so I wanted to make sure I
>> > wasn't stepping on too many toes first. In particular:
>> >
>> > 1) the absolute deviation is given in the docs as:
>> > ADEV = sqrt(sum( abs(x-mean(x)) )/N)
>> > with a note that "This is also called the standard deviation"
>>
>> You are totally right about this one. This has a) never been called
>> the standard deviation nor b) has the absolute deviation ever been
>> defined in this way. Even the units would be wrong with this usage.
>> There is some variation in the definition of the absolute deviation
>> and about language, although it is never what you show there. The
>> most common in my experience is:
>>
>> ADEV = Sum( |x-<x>|)/N,
>>
>> which is what you are suggesting, where <x> is the mean. Sometimes it
>> is the median instead (my personal preference). In this case it is
>> known as the average absolute deviation or the mean absolute
>> deviation - in the latter case you often find it with the acronym
>> MAD. There is also an even more robust estimator called the median
>> absolute deviation which is:
>>
>> MedAD = median ( |x-<x>|)
>>
>> but I see this much less often. It could be good to have in PDL
>> perhaps, but as the name normally would be MAD it could be confusing.
>>
>> I'd suggest leaving ADEV to be the average absolute deviation above
>> with <x> to be the mean(x), which I think is exactly what you suggest.
>> I do think this has to be changed as the current implementation is
>> plain wrong.
>>
>> > 3) We have two root-mean-square calculations, a regular parent
>> > distribution divide-by-N, and a sample population divide-by-(N-1).
>> > I'm not sure why we have both of these--will a piddle ever be able
>> > to contain a parent distribution? Probably not--my definition has
>> > it taking the average as the number of points goes to infinity. If
>> > it were up to me I would remove the RMS calculation so that
>> > statsover would only return 6 quantities (including the PRMS)
>> > instead of 7--the difference in the two calculations is negligible
>> > for large datasets, and for small datasets one should not be using
>> > the RMS calculation anyway, correct? But I worry about backwards
>> > compatibility, particularly with these sorts of constructs:
>> >
>> > $rms = @{statsover($pdl)}[-1] (that doesn't work, I can never
>> > remember that syntax, but you probably get the point--the poor user
>> > is going to get the ADEV instead)
>>
>> Bah, I didn't realise we had two. The sample variance is probably the
>> most sensible to keep - but note that if you know (somehow) the mean,
>> then even the sample variance is divided by N. Anyway, I think it is
>> dodgy to make significant changes here in stats - changing the docs
>> would be my preferred solution here.
>>
>> Cheers,
>> Jarle.
>>
>> _______________________________________________
>> Perldl mailing list
>> [email protected]
>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
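
To make the definitions discussed in this thread concrete, here is a minimal sketch in plain Perl. It uses core modules only and deliberately avoids PDL, so it does not show how `statsover` is implemented. The helper names `mean`, `median` and `compare_stats` are hypothetical and are not PDL functions. The formulas are the ones quoted above: the documented ADEV with its stray square root, the average absolute deviation about the mean, and the RMS deviation with divide-by-N and divide-by-(N-1). The median absolute deviation is taken about the median here, which is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of a list of numbers.
sub mean { return sum(@_) / scalar(@_) }

# Median: middle value of the sorted list (average of the two middle
# values when the count is even).
sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Hypothetical helper (not part of PDL): returns a hash ref holding the
# deviation measures discussed in the thread for a plain list of numbers.
sub compare_stats {
    my @x      = @_;
    my $n      = scalar @x;
    my $mean   = mean(@x);
    my $med    = median(@x);
    my @absdev = map { abs($_ - $mean) } @x;
    my $sumsq  = sum(map { my $d = $_ - $mean; $d * $d } @x);
    return {
        mean            => $mean,
        # Formula as given in the docs: sqrt(sum(|x - mean|) / N).
        adev_documented => sqrt(sum(@absdev) / $n),
        # Average absolute deviation: sum(|x - mean|) / N.
        adev            => sum(@absdev) / $n,
        # Median absolute deviation, about the median (assumption).
        medad           => median(map { abs($_ - $med) } @x),
        # RMS deviation from the mean, divide-by-N.
        rms_n           => sqrt($sumsq / $n),
        # RMS deviation from the mean, divide-by-(N-1).
        rms_n1          => sqrt($sumsq / ($n - 1)),
    };
}

my $s = compare_stats(2, 4, 4, 4, 5, 5, 7, 9);
printf "%-16s %.4f\n", $_, $s->{$_} for sort keys %$s;
```

For this sample the average absolute deviation is 1.5, while the documented formula gives sqrt(1.5), about 1.2247, which is neither the absolute deviation nor the standard deviation. The two RMS values are 2 (divide-by-N) and about 2.1381 (divide-by-(N-1)), a gap that shrinks as N grows.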
