Yes, the sqrt in ADEV is wrong. It's good to have both RMS and PRMS in the
function, but the docs should change, because the two serve different
purposes. RMS, with /N, describes the sample itself and is appropriate when
you absolutely have to discard outliers more than 2 standard deviations
away from the mean of the sample. PRMS, with /(N-1), is the appropriate one
for generalizing from the sample to the population.
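
For reference, the quantities under discussion can be written out as below.
This is only a sketch of the definitions as this thread describes them: the
symbols x_i, \bar{x} and N, and the four-point sample x = (1, 3, 5, 7), are
assumptions introduced here for illustration and are not taken from the PDL
docs.

```latex
\begin{align*}
\bar{x}        &= \frac{1}{N}\sum_{i=1}^{N} x_i \\
\mathrm{AADEV} &= \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \bar{x} \rvert \\
\mathrm{RMS}   &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2} \\
\mathrm{PRMS}  &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}
\end{align*}

% Worked example (illustrative data only): x = (1, 3, 5, 7), N = 4
\begin{align*}
\bar{x}        &= 16/4 = 4 \\
\mathrm{AADEV} &= (3 + 1 + 1 + 3)/4 = 2 \\
\mathrm{RMS}   &= \sqrt{(9 + 1 + 1 + 9)/4} = \sqrt{5} \approx 2.236 \\
\mathrm{PRMS}  &= \sqrt{(9 + 1 + 1 + 9)/3} = \sqrt{20/3} \approx 2.582
\end{align*}
```

With the sqrt applied, the documented ADEV formula would give sqrt(2), about
1.414, for this sample, which no longer has the units of x.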


Best,
Maggie


On Wed, Nov 16, 2011 at 6:41 PM, Derek Lamb <[email protected]> wrote:

> Thanks everybody for the feedback.
>
> Since the ADEV calculation is just plain wrong, I will fix the code for
> that, and the docs that go with it.  I will leave the RMS calculation as it
> is, but will change around the docs and perhaps add a note that it does
> not make much sense to use RMS, and that PRMS should be used instead.
> People probably use that one anyway out of convenience: I found snippets
> of my own code where I had things like "($mean, $rms) = stats($pdl);",
> which picks up the second return value (the PRMS) and so is the correct
> calculation to use.  I don't want to add a new function like Joel
> suggested.  I've seen APIs that have lists of functions like 'func, func0,
> func1'--what a mess.  But I think adding a note to look at PDL::Stats for
> more statistical calculations would also be a good idea.
>
> cheers,
> Derek
>
> On Nov 16, 2011, at 7:06 AM, Karl Glazebrook wrote:
>
> > Agreed ADEV has to be fixed (in code). It has the wrong unit dimensions
> > for one thing
> >
> > Karl
> >
> > On 16/11/2011, at 10:19 AM, Chris Marshall wrote:
> >
> >> Hi Derek-
> >>
> >> The fix you refer to was for an inconsistent calculation
> >> between the algorithm used with badvals and that used
> >> without badvals.  I have the same problems with stats and
> >> statsover in that the values seem to be fairly redundant
> >> or unneeded for what I wanted for a "quick look" at some
> >> data.  However, I'm a bit leery of changing something
> >> that has been around so long.
> >>
> >> --Chris
> >>
> >> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
> >>> I would like to change some of the definitions of the quantities
> >>> returned by statsover.  I find that either their names or their
> >>> calculations are not consistent with normal statistical practices.
> >>> However I also know that the statistical terminology used by
> >>> different communities can be different, so I wanted to make sure I
> >>> wasn't stepping on too many toes first.  In particular:
> >>> 1) the absolute deviation is given in the docs as:
> >>> ADEV = sqrt(sum( abs(x-mean(x)) )/N)
> >>> with a note that "This is also called the standard deviation"
> >>> I can find nothing that supports the sqrt in this formula or the
> >>> following note.  The average absolute deviation is given by my
> >>> edition of Bevington & Robinson (pg 10) (not a statistics bible, I
> >>> understand, but what was on my shelf) and also by
> >>> http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
> >>> as
> >>> AADEV = sum( abs(x-mean(x)) )/N.
> >>> The Bevington & Robinson text says "the presence of the absolute
> >>> value sign makes its use inconvenient for statistical analysis...a
> >>> parameter that is easier to use analytically and that can be
> >>> justified fairly well on theoretical grounds to be a more
> >>> appropriate measure of the dispersion of the observations is the
> >>> *standard deviation* \sigma."  So I would like to take out the sqrt
> >>> of that function and remove the note about it also being called
> >>> the standard deviation.  As a side note, this was "fixed" back in
> >>> February (see SF bug #3185864 and this git commit) but I think the
> >>> fix should have gone the other way (changed the docs and the other
> >>> code, and left the fixed code as it was).
> >>> 2) the function example gives the $prms second in the returned
> >>> list and $rms last, but the detailed description below reverses
> >>> this.  I will change the docs, to avoid confusion.
> >>> 3) We have two root-mean-square calculations, a regular parent
> >>> distribution divide-by-N, and a sample population divide-by-(N-1).
> >>> I'm not sure why we have both of these--will a piddle ever be able
> >>> to contain a parent distribution?  Probably not--my definition has
> >>> it taking the average as the number of points goes to infinity.
> >>> If it were up to me I would remove the RMS calculation so that
> >>> statsover would only return 6 quantities (including the PRMS)
> >>> instead of 7--the difference in the two calculations is negligible
> >>> for large datasets, and for small datasets one should not be using
> >>> the RMS calculation anyway, correct?  But I worry about backwards
> >>> compatibility, particularly with these sorts of constructs:
> >>> $rms = @{statsover($pdl)}[-1]  (that doesn't work, I can never
> >>> remember that syntax, but you probably get the point--the poor
> >>> user is going to get the ADEV instead)
> >>> 4) If we keep the RMS calculation, then I would like to append "or the
> >>> standard deviation" to the note following its definition in the docs.
> >>> Comments welcome.
> >>> cheers,
> >>> Derek
> >>> _______________________________________________
> >>> Perldl mailing list
> >>> [email protected]
> >>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
> >>>
> >>>
> >>
> >
>
>
>
