On Wed, Dec 9, 2009 at 5:53 AM, Hal Murray <[email protected]> wrote:
>
> Suppose I want to average a bunch of samples. Sometimes it helps to discard
> the outliers. I think that helps when there are two noise mechanisms, say
> the typical Gaussian plus sometimes some other noise added on. If the other
> noise is rare but large, those occasional samples can have a big influence
> on the average. So discarding those outliers gives better results, for some
> value of "better".
>
> I know how to do it in one dimension. How do I do it in two dimensions?
I think the relevant property of the median is that it minimizes the expected value of the norm of the deviations — equivalently, the sum of the absolute deviations. That is, suppose we have n data points in one dimension. Call them X_1, ..., X_n, and pick out one of them, which we denote by M. The deviations from M are the values X_1 - M, ..., X_n - M, and the absolute values of the deviations are |X_1 - M|, ..., |X_n - M|. Take the expected value of these. The median M makes this expected value as low as possible.

In higher dimensions, you can do the same thing: take all your data points X_1, ..., X_n; pick out one called M; compute the deviations X_1 - M, ..., X_n - M; take their norms, i.e., the usual Euclidean distances; and take the expected value of these. Do that for every candidate M and keep the one whose expected value is smallest; that'll be your median. If more than one point ties, take their mean.

Of course, this is a really slow algorithm (n candidates times n distances each, so O(n^2) distance computations), but I'd guess that the output would be optimal.

--
Kyle Hofmann <[email protected]>

_______________________________________________
time-nuts mailing list -- [email protected]
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.
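A minimal Python sketch of the slow search described above — restricting the candidate M to the data points themselves and picking the one with the smallest total Euclidean distance to the rest. The function name `medoid` and the sample coordinates are my own illustration, not from the original post:

```python
import math

def medoid(points):
    """Return the data point minimizing the sum of Euclidean distances
    to all the other points -- the brute-force O(n^2) search sketched
    in the post, with candidates restricted to the data points."""
    def total_distance(m):
        # math.dist computes the Euclidean distance between two points
        # of any (matching) dimension.
        return sum(math.dist(m, x) for x in points)
    return min(points, key=total_distance)

# Example: a tight 2-D cluster plus one large outlier.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
print(medoid(samples))  # a point inside the cluster, never the outlier
```

This illustrates the robustness the original question was after: the outlier at (100, 100) barely moves the result, whereas the ordinary mean of these samples would be dragged out to roughly (20, 20).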
