Agreed, ADEV has to be fixed (in code). It has the wrong unit dimensions, for one thing.

Karl

On 16/11/2011, at 10:19 AM, Chris Marshall wrote:

> Hi Derek-
>
> The fix you refer to was for an inconsistent calculation
> between the algorithm used with badvals and that used
> without badvals. I have the same problems with stats and
> statsover in that the values seem to be fairly redundant
> or unneeded for what I wanted for a "quick look" at some
> data. However, I'm a bit leery of changing something
> that has been around so long.
>
> --Chris
>
> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
>> I would like to change some of the definitions of the quantities returned by
>> statsover. I find that either their names or their calculations are not
>> consistent with normal statistical practices. However I also know that the
>> statistical terminology used by different communities can be different, so I
>> wanted to make sure I wasn't stepping on too many toes first. In
>> particular:
>>
>> 1) the absolute deviation is given in the docs as:
>>
>>      ADEV = sqrt(sum( abs(x-mean(x)) )/N)
>>
>> with a note that "This is also called the standard deviation"
>> I can find nothing that supports the sqrt in this formula or the following
>> note. The average absolute deviation is given by my edition of Bevington &
>> Robinson (pg 10) (not a statistics bible, I understand, but what was on my
>> shelf) and also
>> by http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
>> as
>>
>>      AADEV = sum( abs(x-mean(x)) )/N.
>>
>> The Bevington & Robinson text says "the presence of the absolute value sign
>> makes its use inconvenient for statistical analysis...a parameter that is
>> easier to use analytically and that can be justified fairly well on
>> theoretical grounds to be a more appropriate measure of the dispersion of
>> the observations is the <i>standard deviation</i> \sigma." So I would like
>> to take out the sqrt of that function and remove the note about it also
>> being called the standard deviation. As a side note, this was "fixed" back
>> in February (see SF bug #3185864 and this git commit) but I think the fix
>> should have gone the other way (changed the docs and the other code, and
>> left the fixed code as it was).
>>
>> 2) the function example gives the $prms second in the returned list and $rms
>> last, but the detailed description below reverses this. I will change the
>> docs, to avoid confusion.
>>
>> 3) We have two root-mean-square calculations, a regular parent distribution
>> divide-by-N, and a sample population divide-by-(N-1). I'm not sure why we
>> have both of these--will a piddle ever be able to contain a parent
>> distribution? Probably not--my definition has it taking the average as the
>> number of points goes to infinity. If it were up to me I would remove the
>> RMS calculation so that statsover would only return 6 quantities (including
>> the PRMS) instead of 7--the difference in the two calculations is negligible
>> for large datasets, and for small datasets one should not be using the RMS
>> calculation anyway, correct? But I worry about backwards compatibility,
>> particularly with these sorts of constructs:
>>
>>      $rms = @{statsover($pdl)}[-1]
>>
>> (that doesn't work, I can never remember that
>> syntax, but you probably get the point--the poor user is going to get the
>> ADEV instead)
>>
>> 4) If we keep the RMS calculation, then I would like to append "or the
>> standard deviation" to the note following its definition in the docs.
>>
>> Comments welcome.
>>
>> cheers,
>> Derek
>> _______________________________________________
>> Perldl mailing list
>> [email protected]
>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
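
To make the unit-dimension point concrete, here is a minimal sketch in plain Python (not PDL, and not the statsover implementation). The function names and the sample data are hypothetical, chosen only to mirror the formulas quoted in the thread. It rescales the same data from metres to centimetres: a quantity with the units of the data should scale by 100, but the sqrt form of ADEV scales by only 10.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def adev_with_sqrt(xs):
    """ADEV as quoted from the docs: sqrt(sum(|x - mean|) / N)."""
    m = mean(xs)
    return math.sqrt(sum(abs(x - m) for x in xs) / len(xs))

def aadev(xs):
    """Average absolute deviation: sum(|x - mean|) / N."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def rms_div_n(xs):
    """RMS deviation from the mean, divide-by-N form."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def rms_div_n_minus_1(xs):
    """RMS deviation from the mean, divide-by-(N-1) form."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical measurements, first in metres, then the same data in centimetres.
data_m = [1.0, 2.0, 4.0, 7.0]
data_cm = [100.0 * x for x in data_m]

# The average absolute deviation scales with the data (factor 100).
print(aadev(data_cm) / aadev(data_m))

# The sqrt form scales as the square root of the unit (factor ~10).
print(adev_with_sqrt(data_cm) / adev_with_sqrt(data_m))

# The two RMS forms differ by sqrt(N / (N - 1)), which tends to 1 as N grows.
print(rms_div_n_minus_1(data_m) / rms_div_n(data_m))
```

The last ratio is sqrt(4/3) for these four points, which illustrates why the two RMS variants only differ noticeably for small datasets.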
