Thanks, everybody, for the feedback. Since the ADEV calculation is just plain wrong, I will fix the code for it and the docs that go with it. I will leave the RMS calculation as it is, but I will rework the docs and perhaps add a note that it usually makes more sense to use PRMS than RMS. People probably use PRMS anyway out of convenience: I found snippets of my own code with things like "($mean, $rms) = stats($pdl);", which, despite the variable name, picks up the PRMS (the second value returned) and so is the correct calculation to use. I don't want to add a new function as Joel suggested. I've seen APIs that have lists of functions like 'func, func0, func1', and it is a mess. I do think adding a note pointing to PDL::Stats for more statistical calculations would be a good idea.
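To make the distinction concrete, here is a minimal plain-Perl sketch of the formulas discussed in this thread. It deliberately avoids PDL so it runs anywhere; the sub names (aadev, adev_as_documented, rms_n, prms_n1) and the sample data are made up for illustration and are not PDL functions. It is only meant to show why the sqrt in the documented ADEV gives the wrong units, and how the divide-by-N and divide-by-(N-1) versions differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Illustrative helpers only; none of these names exist in PDL.
sub mean { my @x = @_; return sum(@x) / @x; }

# Average absolute deviation: no square root, same units as the data.
sub aadev {
    my @x = @_;
    my $m = mean(@x);
    return sum(map { abs($_ - $m) } @x) / @x;
}

# The formula as given in the docs: sqrt of the above, so it has
# units of sqrt(data) rather than units of the data.
sub adev_as_documented { return sqrt(aadev(@_)); }

# Divide-by-N root-mean-square deviation from the mean ("RMS" here).
sub rms_n {
    my @x = @_;
    my $m = mean(@x);
    return sqrt(sum(map { ($_ - $m)**2 } @x) / @x);
}

# Divide-by-(N-1) version ("PRMS" here); needs at least two points.
sub prms_n1 {
    my @x = @_;
    my $m = mean(@x);
    return sqrt(sum(map { ($_ - $m)**2 } @x) / (@x - 1));
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%.4f aadev=%.4f sqrt-version=%.4f rms=%.4f prms=%.4f\n",
    mean(@data), aadev(@data), adev_as_documented(@data),
    rms_n(@data), prms_n1(@data);
```

For this eight-point sample the mean is 5, the average absolute deviation is 1.5, the divide-by-N value is 2, and the divide-by-(N-1) value is about 2.14, so the two RMS flavours already differ by several percent at small N.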
cheers,
Derek

On Nov 16, 2011, at 7:06 AM, Karl Glazebrook wrote:

> Agreed ADEV has to be fixed (in code). It has the wrong unit dimensions for
> one thing
>
> Karl
>
> On 16/11/2011, at 10:19 AM, Chris Marshall wrote:
>
>> Hi Derek-
>>
>> The fix you refer to was for an inconsistent calculation
>> between the algorithm used with badvals and that used
>> without badvals. I have the same problems with stats and
>> statsover in that the values seem to be fairly redundant
>> or unneeded for what I wanted for a "quick look" at some
>> data. However, I'm a bit leery of changing something
>> that has been around so long.
>>
>> --Chris
>>
>> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
>>> I would like to change some of the definitions of the quantities returned by
>>> statsover. I find that either their names or their calculations are not
>>> consistent with normal statistical practices. However I also know that the
>>> statistical terminology used by different communities can be different, so I
>>> wanted to make sure I wasn't stepping on too many toes first. In
>>> particular:
>>> 1) the absolute deviation is given in the docs as:
>>> ADEV = sqrt(sum( abs(x-mean(x)) )/N)
>>> with a note that "This is also called the standard deviation"
>>> I can find nothing that supports the sqrt in this formula or the following
>>> note. The average absolute deviation is given by my edition of Bevington &
>>> Robinson (pg 10) (not a statistics bible, I understand, but what was on my
>>> shelf) and also by
>>> http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
>>> as
>>> AADEV = sum( abs(x-mean(x)) )/N.
>>> The Bevington & Robinson text says "the presence of the absolute value sign
>>> makes its use inconvenient for statistical analysis...a parameter that is
>>> easier to use analytically and that can be justified fairly well on
>>> theoretical grounds to be a more appropriate measure of the dispersion of
>>> the observations is the <i>standard deviation</i> \sigma." So I would like
>>> to take out the sqrt of that function and remove the note about it also
>>> being called the standard deviation. As a side note, this was "fixed" back
>>> in February (see SF bug #3185864 and this git commit) but I think the fix
>>> should have gone the other way (changed the docs and the other code, and
>>> left the fixed code as it was).
>>> 2) the function example gives the $prms second in the returned list and $rms
>>> last, but the detailed description below reverses this. I will change the
>>> docs, to avoid confusion.
>>> 3) We have two root-mean-square calculations, a regular parent distribution
>>> divide-by-N, and a sample population divide-by-(N-1). I'm not sure why we
>>> have both of these--will a piddle ever be able to contain a parent
>>> distribution? Probably not--my definition has it taking the average as the
>>> number of points goes to infinity. If it were up to me I would remove the
>>> RMS calculation so that statsover would only return 6 quantities (including
>>> the PRMS) instead of 7--the difference in the two calculations is negligible
>>> for large datasets, and for small datasets one should not be using the RMS
>>> calculation anyway, correct? But I worry about backwards compatibility,
>>> particularly with these sorts of constructs:
>>> $rms = @{statsover($pdl)}[-1] (that doesn't work, I can never remember that
>>> syntax, but you probably get the point--the poor user is going to get the
>>> ADEV instead)
>>> 4) If we keep the RMS calculation, then I would like to append "or the
>>> standard deviation" to the note following its definition in the docs.
>>> Comments welcome.
>>> cheers,
>>> Derek
>>> _______________________________________________
>>> Perldl mailing list
>>> [email protected]
>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
