Hi Derek-

The fix you refer to addressed an inconsistency between
the calculation used with badvals and the one used
without badvals.  I have the same problems with stats and
statsover: several of the returned values seem redundant
or unneeded for what I wanted, a "quick look" at some
data.  However, I'm a bit leery of changing something
that has been around so long.

--Chris

On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
> I would like to change some of the definitions of the quantities returned by
> statsover.  I find that either their names or their calculations are not
> consistent with normal statistical practices.  However I also know that the
> statistical terminology used by different communities can be different, so I
> wanted to make sure I wasn't stepping on too many toes first.  In
> particular:
> 1) the absolute deviation is given in the docs as:
> ADEV = sqrt(sum( abs(x-mean(x)) )/N)
> with a note that "This is also called the standard deviation"
> I can find nothing that supports the sqrt in this formula or the following
> note.  The average absolute deviation is given by my edition of Bevington &
> Robinson (pg 10) (not a statistics bible, I understand, but what was on my
> shelf) and also
> by http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
> as
> AADEV = sum( abs(x-mean(x)) )/N.
> The Bevington & Robinson text says "the presence of the absolute value sign
> makes its use inconvenient for statistical analysis...a parameter that is
> easier to use analytically and that can be justified fairly well on
> theoretical grounds to be a more appropriate measure of the dispersion of
> the observations is the standard deviation \sigma."  So I would like
> to take out the sqrt of that function and remove the note about it also
> being called the standard deviation.  As a side note, this was "fixed" back
> in February (see SF bug #3185864 and this git commit) but I think the fix
> should have gone the other way (changed the docs and the other code, and
> left the fixed code as it was).
> 2) the function example gives the $prms second in the returned list and $rms
> last, but the detailed description below reverses this.  I will change the
> docs, to avoid confusion.
> 3) We have two root-mean-square calculations, a regular parent distribution
> divide-by-N, and a sample population divide-by-(N-1).  I'm not sure why we
> have both of these--will a piddle ever be able to contain a parent
> distribution?  Probably not--my definition has it taking the average as the
> number of points goes to infinity.  If it were up to me I would remove the
> RMS calculation so that statsover would only return 6 quantities (including
> the PRMS) instead of 7--the difference in the two calculations is negligible
> for large datasets, and for small datasets one should not be using the RMS
> calculation anyway, correct?  But I worry about backwards compatibility,
> particularly with these sorts of constructs:
> $rms = (statsover($pdl))[-1];  (a list slice that takes the last returned
> value; if the RMS were dropped from the end of the list, the poor user is
> going to get the ADEV instead)
> 4) If we keep the RMS calculation, then I would like to append "or the
> standard deviation" to the note following its definition in the docs.
> Comments welcome.
> cheers,
> Derek
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
>
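
For reference, here is a minimal sketch in plain Perl (core modules only, no PDL) of the formulas discussed in points 1 and 3 of the quoted message. It is not the statsover implementation. The sub names (aadev, adev_as_documented, rms_div_n, rms_div_n_minus_1) and the sample data are hypothetical and chosen only for illustration; which of the two RMS variants statsover reports under which name should be checked against the PDL docs.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of a list of numbers.
sub mean {
    my $n = @_;
    return sum(@_) / $n;
}

# Average absolute deviation: sum(|x - mean(x)|) / N, with no square root.
sub aadev {
    my $n = @_;
    my $m = mean(@_);
    return sum(map { abs($_ - $m) } @_) / $n;
}

# The formula as quoted from the docs: the square root of the value above.
sub adev_as_documented {
    return sqrt(aadev(@_));
}

# Root-mean-square deviation from the mean, dividing by N.
sub rms_div_n {
    my $n = @_;
    my $m = mean(@_);
    return sqrt(sum(map { ($_ - $m)**2 } @_) / $n);
}

# The same quantity dividing by N-1 (needs at least two points).
sub rms_div_n_minus_1 {
    my $n = @_;
    my $m = mean(@_);
    return sqrt(sum(map { ($_ - $m)**2 } @_) / ($n - 1));
}

# Hypothetical sample data with mean 5.
my @x = (2, 4, 4, 4, 5, 5, 7, 9);

printf "AADEV            = %.4f\n", aadev(@x);      # 1.5000
printf "sqrt(AADEV)      = %.4f\n", adev_as_documented(@x);
printf "RMS, divide by N = %.4f\n", rms_div_n(@x);  # 2.0000
printf "RMS, divide N-1  = %.4f\n", rms_div_n_minus_1(@x);

# A list slice picks one element out of a returned list; index -1 is
# whatever happens to come last, so it depends on the order and number
# of values the function returns.
my $last = (map { $_ * 2 } @x)[-1];
print "last element     = $last\n";                 # 18
```

The two RMS variants differ only by a factor of sqrt(N/(N-1)), which is why the difference shrinks as the dataset grows. The slice at the end shows why dropping a value from the end of a returned list would silently change what a `[-1]` index picks up.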
