Re: [Perldl] statsover definitions

Maggie X Thu, 17 Nov 2011 13:25:59 -0800

Yes, ADEV is wrong to have the sqrt. It's good to have both RMS and PRMS in
the function, but do change the docs. RMS and PRMS serve different purposes:
RMS with /N is appropriate when you absolutely have to discard outliers 2
standard deviations away from the mean of the sample, while PRMS with /(N-1)
is the appropriate one for generalizing from sample to population.
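
For concreteness, here is a minimal sketch of the three definitions in plain
Perl, with no PDL involved. The helper name `dispersion` and the sample data
are made up for illustration and are not part of statsover; the sketch only
spells out ADEV without the sqrt, RMS with /N, and PRMS with /(N-1).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not a PDL function): returns the mean and the three
# dispersion measures discussed in this thread. Needs at least 2 values,
# otherwise the N-1 divisor is zero.
sub dispersion {
    my @x    = @_;
    my $n    = @x;
    my $mean = sum(@x) / $n;
    my $ss   = sum(map { ($_ - $mean)**2 } @x);   # sum of squared deviations
    my $sa   = sum(map { abs($_ - $mean) } @x);   # sum of absolute deviations
    return (
        mean => $mean,
        adev => $sa / $n,               # average absolute deviation, no sqrt
        rms  => sqrt($ss / $n),         # divide by N
        prms => sqrt($ss / ($n - 1)),   # divide by N-1
    );
}

my %d = dispersion(2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%g adev=%g rms=%g prms=%.4f\n", @d{qw(mean adev rms prms)};
# prints: mean=5 adev=1.5 rms=2 prms=2.1381
```

With only 8 points the /N and /(N-1) results already differ by about 7%;
the gap shrinks as N grows.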



Best,
Maggie


On Wed, Nov 16, 2011 at 6:41 PM, Derek Lamb <[email protected]> wrote:

> Thanks everybody for the feedback.
>
> Since the ADEV calculation is just plain wrong, I will fix the code for
> that, and the docs that go with it.  I will leave the RMS calculation as it
> is, but will change the docs around and perhaps add a note that it does not
> make much sense to use RMS, and to use PRMS instead.  People probably
> use that one anyway out of convenience; I found snippets of my own code
> where I had things like "($mean, $rms) = stats($pdl);", which is the
> correct calculation to use.  I don't want to add a new function like Joel
> suggested.  I've seen APIs that have lists of functions like 'func, func0,
> func1'--what a mess.  But I think adding a note to look at PDL::Stats for
> more statistical calculations would also be a good idea.
>
> cheers,
> Derek
>
> On Nov 16, 2011, at 7:06 AM, Karl Glazebrook wrote:
>
> > Agreed, ADEV has to be fixed (in code). It has the wrong unit dimensions,
> > for one thing.
> >
> > Karl
> >
> > On 16/11/2011, at 10:19 AM, Chris Marshall wrote:
> >
> >> Hi Derek-
> >>
> >> The fix you refer to was for an inconsistent calculation
> >> between the algorithm used with badvals and that used
> >> without badvals.  I have the same problems with stats and
> >> statsover in that the values seem to be fairly redundant
> >> or unneeded for what I wanted for a "quick look" at some
> >> data.  However, I'm a bit leery of changing something
> >> that has been around so long.
> >>
> >> --Chris
> >>
> >> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
> >>> I would like to change some of the definitions of the quantities
> >>> returned by statsover.  I find that either their names or their
> >>> calculations are not consistent with normal statistical practices.
> >>> However I also know that the statistical terminology used by
> >>> different communities can be different, so I wanted to make sure I
> >>> wasn't stepping on too many toes first.  In particular:
> >>> 1) the absolute deviation is given in the docs as:
> >>> ADEV = sqrt(sum( abs(x-mean(x)) )/N)
> >>> with a note that "This is also called the standard deviation"
> >>> I can find nothing that supports the sqrt in this formula or the
> >>> following note.  The average absolute deviation is given by my
> >>> edition of Bevington & Robinson (pg 10) (not a statistics bible, I
> >>> understand, but what was on my shelf) and also by
> >>> http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
> >>> as
> >>> AADEV = sum( abs(x-mean(x)) )/N.
> >>> The Bevington & Robinson text says "the presence of the absolute
> >>> value sign makes its use inconvenient for statistical analysis...a
> >>> parameter that is easier to use analytically and that can be
> >>> justified fairly well on theoretical grounds to be a more
> >>> appropriate measure of the dispersion of the observations is the
> >>> *standard deviation* \sigma."  So I would like to take out the sqrt
> >>> of that function and remove the note about it also being called the
> >>> standard deviation.  As a side note, this was "fixed" back in
> >>> February (see SF bug #3185864 and this git commit) but I think the
> >>> fix should have gone the other way (changed the docs and the other
> >>> code, and left the fixed code as it was).
> >>> 2) the function example gives the $prms second in the returned list
> >>> and $rms last, but the detailed description below reverses this.  I
> >>> will change the docs, to avoid confusion.
> >>> 3) We have two root-mean-square calculations, a regular parent
> >>> distribution divide-by-N, and a sample population divide-by-(N-1).
> >>> I'm not sure why we have both of these--will a piddle ever be able
> >>> to contain a parent distribution?  Probably not--my definition has
> >>> it taking the average as the number of points goes to infinity.  If
> >>> it were up to me I would remove the RMS calculation so that
> >>> statsover would only return 6 quantities (including the PRMS)
> >>> instead of 7--the difference in the two calculations is negligible
> >>> for large datasets, and for small datasets one should not be using
> >>> the RMS calculation anyway, correct?  But I worry about backwards
> >>> compatibility, particularly with these sorts of constructs:
> >>> $rms = @{statsover($pdl)}[-1]  (that doesn't work, I can never
> >>> remember that syntax, but you probably get the point--the poor user
> >>> is going to get the ADEV instead)
> >>> 4) If we keep the RMS calculation, then I would like to append "or the
> >>> standard deviation" to the note following its definition in the docs.
> >>> Comments welcome.
> >>> cheers,
> >>> Derek
> >>> _______________________________________________
> >>> Perldl mailing list
> >>> [email protected]
> >>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl