I second what Jarle said about changing docs instead of changing code.
David
On Nov 15, 2011 5:30 PM, "Jarle Brinchmann" <[email protected]> wrote:
>
> On 15 Nov 2011, at 23:59, Derek Lamb wrote:
>
> > I would like to change some of the definitions of the quantities
> returned by statsover. I find that either their names or their
> calculations are not consistent with normal statistical practices. However
> I also know that the statistical terminology used by different communities
> can be different, so I wanted to make sure I wasn't stepping on too many
> toes first. In particular:
> >
> > 1) the absolute deviation is given in the docs as:
> > ADEV = sqrt(sum( abs(x-mean(x)) )/N)
> > with a note that "This is also called the standard deviation"
>
> You are totally right about this one. This a) has never been called the
> standard deviation, and b) the absolute deviation has never been defined in
> this way. Even the units would be wrong with this usage. There is some
> variation in the definition of the absolute deviation and in the language
> used, although it is never what you show there. The most common in my
> experience is:
>
> ADEV = Sum( |x-<x>|)/N,
>
> which is what you are suggesting, where <x> is the mean. Sometimes it is
> the median instead (my personal preference). In this case it is known as
> the average absolute deviation or the mean absolute deviation - in the
> latter case you often find it with the acronym MAD. There is also an even
> more robust estimator called the median absolute deviation which is:
>
> MedAD = median ( |x-<x>|)
>
> but I see this much less often. It could be good to have in PDL perhaps,
> but as the name normally would be MAD it could be confusing.
>
> I'd suggest leaving ADEV as the average absolute deviation above, with
> <x> taken as mean(x), which I think is exactly what you suggest. I do think
> this has to be changed, as the current implementation is plain wrong.
>
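To make the three definitions above concrete, here is a minimal sketch in plain Python (not PDL code). The function names and the sample data are made up for illustration; `center` stands in for <x> and can be either the mean or the median. It also shows the units problem: rescaling the data by 10 rescales the average absolute deviation by 10, but the documented sqrt form only by sqrt(10).

```python
import math
import statistics

def documented_adev(xs):
    # Formula as quoted from the docs: sqrt(sum(|x - mean(x)|) / N).
    m = statistics.mean(xs)
    return math.sqrt(sum(abs(x - m) for x in xs) / len(xs))

def average_abs_deviation(xs, center):
    # Sum(|x - <x>|) / N, where center plays the role of <x>.
    return sum(abs(x - center) for x in xs) / len(xs)

def median_abs_deviation(xs, center):
    # median(|x - <x>|), where center plays the role of <x>.
    return statistics.median(abs(x - center) for x in xs)

# Hypothetical sample with one outlier.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
scaled = [10.0 * x for x in data]

mean_c = statistics.mean(data)
median_c = statistics.median(data)

aad_mean = average_abs_deviation(data, mean_c)
aad_median = average_abs_deviation(data, median_c)
medad = median_abs_deviation(data, median_c)

# Scale factors under a change of units (data multiplied by 10).
aad_scale = average_abs_deviation(scaled, statistics.mean(scaled)) / aad_mean
doc_scale = documented_adev(scaled) / documented_adev(data)

print(aad_mean, aad_median, medad, aad_scale, doc_scale)
```

The outlier pulls the mean-centred average upward, while the median-based estimators are less affected, which is the robustness argument made above.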
> > 3) We have two root-mean-square calculations, a regular parent
> distribution divide-by-N, and a sample population divide-by-(N-1). I'm not
> sure why we have both of these--will a piddle ever be able to contain a
> parent distribution? Probably not--my definition has it taking the average
> as the number of points goes to infinity. If it were up to me I would
> remove the RMS calculation so that statsover would only return 6 quantities
> (including the PRMS) instead of 7--the difference in the two calculations
> is negligible for large datasets, and for small datasets one should not be
> using the RMS calculation anyway, correct? But I worry about backwards
> compatibility, particularly with these sorts of constructs:
> >
> > $rms = (statsover($pdl))[-1];
> > (a list slice taking the last returned value; you probably get the
> point--the poor user is going to get the ADEV instead)
>
> Bah, I didn't realise we had two. The sample variance is probably the most
> sensible to keep - but note that if you know (somehow) the mean, then even
> the sample variance is divided by N. Anyway, I think it is dodgy to make
> significant changes here in stats - changing the docs would be my preferred
> solution here.
>
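As a rough illustration of the divide-by-N versus divide-by-(N-1) point, here is a small plain-Python sketch (again not PDL code; `rms_about_mean` and its `ddof` argument are hypothetical names). With the mean estimated from the data, the two results differ only by a factor sqrt(N/(N-1)), which approaches 1 as N grows.

```python
import math
import statistics

def rms_about_mean(xs, ddof):
    # Root-mean-square deviation about the estimated mean.
    # ddof=0 divides by N; ddof=1 divides by N-1.
    m = statistics.mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return math.sqrt(ss / (len(xs) - ddof))

# Hypothetical small sample, N = 8.
small = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

div_n = rms_about_mean(small, ddof=0)
div_n_minus_1 = rms_about_mean(small, ddof=1)

# The ratio depends only on N.
ratio = div_n_minus_1 / div_n
for n in (8, 100, 10000):
    print(n, math.sqrt(n / (n - 1)))
```

For N = 8 the two differ by about 7%, for N = 10000 by about 0.005%, so which one is returned matters mainly for small samples.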
> Cheers,
> Jarle.
>
>
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>