I would like to change some of the definitions of the quantities returned by
statsover. I find that either their names or their calculations are not
consistent with normal statistical practices. However, I also know that the
statistical terminology used by different communities can be different, so I
wanted to make sure I wasn't stepping on too many toes first. In particular:
1) the absolute deviation is given in the docs as:
ADEV = sqrt(sum( abs(x-mean(x)) )/N)
with a note that "This is also called the standard deviation"
I can find nothing that supports the sqrt in this formula or the following
note. The average absolute deviation is given by my edition of Bevington &
Robinson (pg 10) (not a statistics bible, I understand, but what was on my
shelf) and also by
http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation as
AADEV = sum( abs(x-mean(x)) )/N.
The Bevington & Robinson text says "the presence of the absolute value sign
makes its use inconvenient for statistical analysis...a parameter that is
easier to use analytically and that can be justified fairly well on theoretical
grounds to be a more appropriate measure of the dispersion of the observations
is the *standard deviation* \sigma." So I would like to take out the sqrt
of that function and remove the note about it also being called the standard
deviation. As a side note, this was "fixed" back in February (see SF bug
#3185864 and this git commit), but I think the fix should have gone the other
way: the docs and the other code should have been changed, and the code that
was "fixed" should have been left as it was.
2) the function example gives the $prms second in the returned list and $rms
last, but the detailed description below reverses this. I will change the
docs to avoid confusion.
3) We have two root-mean-square calculations, a regular parent distribution
divide-by-N, and a sample population divide-by-(N-1). I'm not sure why we have
both of these--will a piddle ever be able to contain a parent distribution?
Probably not--as I understand it, the parent distribution is what you get in
the limit as the number of points goes to infinity. If it were up to me, I
would remove the RMS calculation so
that statsover would only return 6 quantities (including the PRMS) instead of
7--the difference in the two calculations is negligible for large datasets, and
for small datasets one should not be using the RMS calculation anyway, correct?
But I worry about backwards compatibility, particularly with these sorts of
constructs:
$rms = (statsover($pdl))[-1];
(a list slice on the returned values--with the RMS removed, the poor user is
going to get the ADEV instead)
4) If we keep the RMS calculation, then I would like to append "or the standard
deviation" to the note following its definition in the docs.
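To make points 1 and 3 concrete, here is a small sketch in plain Python (not PDL, and not the statsover code itself) that evaluates the formulas quoted above on a toy dataset. The function names and the sample data are my own, and the ordering of the middle entries in the name tuple is an assumption; only "PRMS second, RMS last, ADEV just before it" comes from the discussion above.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def aadev(xs):
    """Average absolute deviation: sum(|x - mean|) / N, with no square root."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def adev_as_documented(xs):
    """The formula quoted from the docs in point 1, including the extra sqrt."""
    return math.sqrt(aadev(xs))


def rms_dev(xs, ddof=0):
    """RMS deviation about the mean: ddof=0 divides by N, ddof=1 by N-1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, chosen only for easy arithmetic

# Point 1: three different numbers, so the documented ADEV is neither the
# average absolute deviation nor the standard deviation.
print("AADEV (no sqrt):   ", aadev(data))
print("ADEV as documented:", adev_as_documented(data))
print("std dev (1/N):     ", rms_dev(data))

# Point 3: the two RMS flavours differ by a factor of sqrt(N / (N - 1)).
print("RMS  (divide by N):  ", rms_dev(data, ddof=0))
print("PRMS (divide by N-1):", rms_dev(data, ddof=1))

# Backwards compatibility: a negative index silently picks a different
# quantity once the last entry is dropped. The order of the middle names
# is assumed for illustration.
seven = ("mean", "prms", "median", "min", "max", "adev", "rms")
six = tuple(name for name in seven if name != "rms")
print("last of 7:", seven[-1], "| last of 6:", six[-1])
```

For this five-point dataset the divide-by-N and divide-by-(N-1) results differ by a factor of sqrt(5/4), roughly 12%, and that factor shrinks toward 1 as N grows.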
Comments welcome.
cheers,
Derek
_______________________________________________
Perldl mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl