On Wed, Dec 9, 2009 at 5:53 AM, Hal Murray <[email protected]> wrote:
>
> Suppose I want to average a bunch of samples. Sometimes it helps to discard
> the outliers. I think that helps when there are two noise mechanisms, say
> the typical Gaussian plus sometimes some other noise added on. If the other
> noise is rare but large, those occasional samples can have a big influence
> on the average. So discarding those outliers gives better results, for some
> value of "better".
>
> I know how to do it in one dimension. How do I do it in two dimensions?
I think the relevant property of the median is that it minimizes the expected value of the norm of the deviations — equivalently, the sum of the absolute deviations. That is, suppose we have n data points in one dimension. Call them X_1, ..., X_n, and pick out one of them, which we denote by M. The deviations from M are the values X_1 - M, ..., X_n - M, and the absolute values of the deviations are |X_1 - M|, ..., |X_n - M|. Take the expected value of these. The median M makes this expected value as low as possible.

In higher dimensions, you can do the same thing: take all your data points X_1, ..., X_n; pick out one called M; compute the deviations X_1 - M, ..., X_n - M; take their norms, i.e., the usual Euclidean distances; and take the expected value of these. Do that for every candidate M and keep the one whose expected value is smallest; that'll be your median. If more than one point ties, take their mean.

Of course, this is a really slow algorithm (n candidates times n distances each, so O(n^2) distance computations), but I'd guess that the output would be optimal.

--
Kyle Hofmann <[email protected]>

_______________________________________________
time-nuts mailing list -- [email protected]
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.
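A minimal Python sketch of the slow search described above — restricting the candidate M to the data points themselves and picking the one with the smallest total Euclidean distance to the rest. The function name `medoid` and the sample coordinates are my own illustration, not from the original post:

```python
import math

def medoid(points):
    """Return the data point minimizing the sum of Euclidean distances
    to all the other points -- the brute-force O(n^2) search sketched
    in the post, with candidates restricted to the data points."""
    def total_distance(m):
        # math.dist computes the Euclidean distance between two points
        # of any (matching) dimension.
        return sum(math.dist(m, x) for x in points)
    return min(points, key=total_distance)

# Example: a tight 2-D cluster plus one large outlier.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
print(medoid(samples))  # a point inside the cluster, never the outlier
```

This illustrates the robustness the original question was after: the outlier at (100, 100) barely moves the result, whereas the ordinary mean of these samples would be dragged out to roughly (20, 20).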
