Yes, ADEV is wrong to have the sqrt. It's good to have both RMS and PRMS in the function, but the docs should change. RMS and PRMS serve different purposes. RMS, with /N, is appropriate when you absolutely have to discard outliers more than 2 standard deviations away from the mean of the sample. PRMS, with /(N-1), is the appropriate one for generalizing from sample to population.
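To put numbers on the difference, here is a minimal sketch in plain Python (not PDL; the function name `dispersion` and the sample data are made up for illustration). It assumes the definitions discussed in this thread: average absolute deviation with no sqrt, RMS dividing by N, and PRMS dividing by N-1.

```python
import math


def dispersion(values):
    """Return (mean, adev, rms, prms) for a non-empty sequence of numbers.

    Hypothetical helper for illustration only; this is not PDL's statsover.
    """
    n = len(values)
    mean = sum(values) / n

    # Average absolute deviation: mean of |x - mean|, with no square root,
    # so it keeps the same units as the data.
    adev = sum(abs(x - mean) for x in values) / n

    # Sum of squared deviations, shared by both root-mean-square variants.
    ss = sum((x - mean) ** 2 for x in values)

    # RMS: divide by N.
    rms = math.sqrt(ss / n)

    # PRMS: divide by N-1; undefined for a single point.
    prms = math.sqrt(ss / (n - 1)) if n > 1 else float("nan")

    return mean, adev, rms, prms


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, adev, rms, prms = dispersion(sample)
print(mean, adev, rms, prms)
```

For this 8-point sample the mean is 5, the average absolute deviation is 1.5, the /N value is 2, and the /(N-1) value is about 2.14. The gap between the last two shrinks as N grows.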
Best,
Maggie

On Wed, Nov 16, 2011 at 6:41 PM, Derek Lamb <[email protected]> wrote:
> Thanks everybody for the feedback.
>
> Since the ADEV calculation is just plain wrong, I will fix the code for
> that, and the docs that go with it. I will leave the RMS calculation as it
> is, but will change around the docs and perhaps add a note about it not
> making too much sense to use RMS, but to use PRMS instead. Probably people
> use that one anyway out of convenience, I found snippets of my own code
> where I had things like "($mean, $rms) = stats($pdl);", which is the
> correct calculation to use. I don't want to add a new function like Joel
> suggested. I've seen APIs that have list of functions like 'func, func0,
> func1'--what a mess. But I think adding a note to look for PDL::Stats for
> more statistical calculations would also be a good idea.
>
> cheers,
> Derek
>
> On Nov 16, 2011, at 7:06 AM, Karl Glazebrook wrote:
>
> > Agreed ADEV has to be fixed (in code). It has the wrong unit dimensions
> > for one thing
> >
> > Karl
> >
> > On 16/11/2011, at 10:19 AM, Chris Marshall wrote:
> >
> >> Hi Derek-
> >>
> >> The fix you refer to was for an inconsistent calculation
> >> between the algorithm used with badvals and that used
> >> without badvals. I have the same problems with stats and
> >> statsover in that the values seem to be fairly redundant
> >> or unneeded for what I wanted for a "quick look" at some
> >> data. However, I'm a bit leery of changing something
> >> that has been around so long.
> >>
> >> --Chris
> >>
> >> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
> >>> I would like to change some of the definitions of the quantities
> >>> returned by statsover. I find that either their names or their
> >>> calculations are not consistent with normal statistical practices.
> >>> However I also know that the statistical terminology used by
> >>> different communities can be different, so I wanted to make sure I
> >>> wasn't stepping on too many toes first. In particular:
> >>>
> >>> 1) the absolute deviation is given in the docs as:
> >>>      ADEV = sqrt(sum( abs(x-mean(x)) )/N)
> >>> with a note that "This is also called the standard deviation"
> >>> I can find nothing that supports the sqrt in this formula or the
> >>> following note. The average absolute deviation is given by my
> >>> edition of Bevington & Robinson (pg 10) (not a statistics bible, I
> >>> understand, but what was on my shelf) and also by
> >>> http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
> >>> as
> >>>      AADEV = sum( abs(x-mean(x)) )/N.
> >>> The Bevington & Robinson text says "the presence of the absolute
> >>> value sign makes its use inconvenient for statistical analysis...a
> >>> parameter that is easier to use analytically and that can be
> >>> justified fairly well on theoretical grounds to be a more
> >>> appropriate measure of the dispersion of the observations is the
> >>> standard deviation \sigma." So I would like to take out the sqrt of
> >>> that function and remove the note about it also being called the
> >>> standard deviation. As a side note, this was "fixed" back in
> >>> February (see SF bug #3185864 and this git commit) but I think the
> >>> fix should have gone the other way (changed the docs and the other
> >>> code, and left the fixed code as it was).
> >>>
> >>> 2) the function example gives the $prms second in the returned list
> >>> and $rms last, but the detailed description below reverses this. I
> >>> will change the docs, to avoid confusion.
> >>>
> >>> 3) We have two root-mean-square calculations, a regular parent
> >>> distribution divide-by-N, and a sample population divide-by-(N-1).
> >>> I'm not sure why we have both of these--will a piddle ever be able
> >>> to contain a parent distribution? Probably not--my definition has
> >>> it taking the average as the number of points goes to infinity. If
> >>> it were up to me I would remove the RMS calculation so that
> >>> statsover would only return 6 quantities (including the PRMS)
> >>> instead of 7--the difference in the two calculations is negligible
> >>> for large datasets, and for small datasets one should not be using
> >>> the RMS calculation anyway, correct? But I worry about backwards
> >>> compatibility, particularly with these sorts of constructs:
> >>>      $rms = @{statsover($pdl)}[-1]
> >>> (that doesn't work, I can never remember that syntax, but you
> >>> probably get the point--the poor user is going to get the ADEV
> >>> instead)
> >>>
> >>> 4) If we keep the RMS calculation, then I would like to append "or
> >>> the standard deviation" to the note following its definition in the
> >>> docs.
> >>>
> >>> Comments welcome.
> >>>
> >>> cheers,
> >>> Derek
> >>> _______________________________________________
> >>> Perldl mailing list
> >>> [email protected]
> >>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
