Re: What is an outlier ? cont'd

2002-02-25 Thread Art Kendall



That being said, occasions can arise where there are outliers other than
from measurement or data entry error. Different disciplines have different
approaches.
What discipline are you studying? What is the variable you are concerned
about?  How is it measured?

Some examples of low values:
10 pounds would be a suspicious value for an adult's weight.
Few college students are under 16.
37 degrees F would be unreasonable for a body temperature of a li
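
A range check of this kind can be sketched in Python. The variable names and cutoffs below are hypothetical, chosen only to echo the examples above; real limits depend on the discipline and on how the variable is measured. A flagged value is a candidate for review, not automatically an error.

```python
# Hypothetical plausibility limits (illustrative numbers, not from the thread).
PLAUSIBLE_RANGES = {
    "adult_weight_lb": (70.0, 700.0),
    "student_age_yr": (16.0, 90.0),
}

def flag_suspect(variable, values):
    """Return (index, value) pairs that fall outside the plausible range."""
    low, high = PLAUSIBLE_RANGES[variable]
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

# Made-up sample: the second weight is suspiciously LOW, not high.
weights = [152.0, 10.0, 188.5, 131.0]
suspect_weights = flag_suspect("adult_weight_lb", weights)
print(suspect_weights)
```

The check is two-sided on purpose: a value can be suspect because it is too low just as easily as because it is too high.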





=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: What is an outlier ?

2002-02-25 Thread Dennis Roberts

of course, if one has control over the data, checking the coding and making 
sure it is correct is a good thing to do

if you do not have control over that, then there may be very little you can 
do with it and in fact, you may be totally UNaware of an outlier problem

i see it as a potentially MUCH larger problem when ONLY certain summary 
statistics are shown without any basic tallies/graphs displayed so, IF 
there are some really strange outlier values, they usually will go undetected ...

correlations are ONE good case in point ... have a look at the following 
scatterplot ... height in inches and weight in pounds ... from the pulse 
data set in minitab


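[Character scatterplot, garbled in the archive: Weight on the vertical
axis (tick marks at 150 and 300) against Height on the horizontal axis
(tick marks from 32.0 to 72.0). Most points cluster near the 150 mark on
the Weight axis; a single point is plotted well apart from the rest.]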

now, the actual r between the X and Y is -.075 ... and of course, this 
seems strange but, IF you had only seen this in a matrix of r values ... 
you might say that perhaps there was serious range restriction that more or 
less wiped out the r in this case ...  but even the desc. stats might not 
adequately tell you of this problem

IF you had the scatterplot, you probably would figure out REAL quick that 
there is a PROBLEM with one of the data points ...

in fact, without that one weird data point, the r is about .8 ... which 
makes a lot better sense when correlating heights and weights of college 
students
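
The effect described above can be reproduced with a small Python sketch. The numbers are invented for illustration and are NOT the Minitab pulse data; the single stray point (33, 310) is an assumption standing in for a miscoded case.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up heights (inches) and weights (pounds) with a clear positive trend.
heights = [62, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
weights = [110, 125, 120, 135, 140, 150, 145, 160, 165, 170, 180, 185]

# One invented stray point, e.g. a data-entry error.
heights_all = heights + [33]
weights_all = weights + [310]

r_clean = pearson_r(heights, weights)
r_all = pearson_r(heights_all, weights_all)
print(f"r without the stray point: {r_clean:.3f}")
print(f"r with the stray point:    {r_all:.3f}")
```

In this made-up sample, one stray point out of thirteen is enough to turn a strong positive r into a negative one, which a correlation matrix alone would not reveal.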


At 09:06 PM 2/25/02 +, Art Kendall wrote:


An outlier is any value for a variable that is suspect given the
measurement system, common sense,  other values for the variable in
the data set, or  the values a case has on other variables.
=

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
Emailto: [EMAIL PROTECTED]
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401






Re: What is an outlier ?

2002-02-25 Thread Glen Barnett

Voltolini wrote:
>
> Hi,
>
> My doubt is... an outlier can be a LOW data value in the sample (and not
> just the highest) ?
>
> Several text books don't make this clear !!!

What makes an outlier an outlier is your model. If your model accounts
for all the observations, you can't really call any of them an outlier.
If your model adequately accounts for all but one or two unusual
observations, you might regard them as coming from some process other
than that which generated the data your model accounts for, and call them
outliers.

Such inadequately accounted-for observations may be low observations, or
high observations, or they may actually turn out to be somewhere in the
middle of the range of your data. I have seen this with time series, for
example, where in some applications an autoregressive model was a very
good description of a long series, apart from a few outliers in the first
quarter or so of the time period (which did in the end turn out to have
come from a different process, because the protocol wasn't always being
properly followed early on). Two of those outliers, in the sense that
the model didn't adequately account for them, turned out to be neither
particularly high nor particularly low observations, but they were
substantially higher or lower than expected from the model.

Another case where you might have outliers in the middle of your data
is in a regression context, where a generally increasing relationship
shows a tight, Gaussian-looking random scatter about the relationship,
but with a couple of relatively low y-values at some of the higher
x-values. The observations themselves may actually be very close to the
mean of the y's, but the model of the relationship makes them unusual.
A different model - for example, one where the observations come from a
distribution which has the same expectation as a function of x, but
which has a heavier tail to the left around that - might account for all
the data and not find any outliers.
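
The regression case can be sketched in Python with invented data. Everything here is an assumption for illustration: the data, the straight-line model fitted by ordinary least squares, and the cutoff of 2 residual standard deviations. A different model or cutoff could reasonably flag different points, or none.

```python
import math

# Made-up data: y rises roughly as 2*x with small scatter, except for two
# relatively low y-values at higher x-values.
noise = [0.4, -0.3, 0.1, -0.5, 0.3]
xs = list(range(1, 21))
ys = [2 * x + noise[(x - 1) % 5] for x in xs]
ys[15] = 21.0  # at x = 16 the trend would suggest about 32
ys[17] = 22.0  # at x = 18 the trend would suggest about 36

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
# Residual standard deviation with n - 2 degrees of freedom.
resid_sd = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))

# Arbitrary illustrative cutoff: more than 2 residual SDs from the line.
flagged = [x for x, r in zip(xs, residuals) if abs(r) > 2 * resid_sd]
mean_y = sum(ys) / len(ys)
print("x-values flagged by the model:", flagged)
print("mean of y:", round(mean_y, 3))
```

The two flagged y-values (21.0 and 22.0) sit close to the overall mean of y, so a check on y alone would pass them; only the fitted relationship makes them unusual.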

Glen

