Re: [R] need help

Jim Lemon Sat, 13 Aug 2005 04:21:08 -0700

Weiwei Shi wrote:

Hi there,
I think I need to rephrase my question, since last time I did not get
any reply. I think the question is not that hard; probably I did
not make it clear:


I want to separate cases like
35, 90, 330, 330, 335

from the rest, which look like
3, 3, 3, 3.2, 3.3
4, 4.4, 4.5, 4.6, 4.7
....

Basically, there is one (or more) big 'gap' in the cases I seek.

Hi Weiwei,

I think your method of defining a central value for the large proportion of values and then setting a criterion for outliers is valid (or at least as valid as many other ways of defining outliers). However, here is a different method: sort the vector of values and then look for a "gap", that is, a difference between neighbouring sorted values that is at least a specified multiple (gap.prop) of the mean of all the differences between the smaller values below it. It returns the first value after the "gap" (easily changed to return all the values after it). To account for vectors that have negative values, the minimum value is subtracted when calculating "newx" and then added back to the result.

For your data, a gap.prop of 20 works, but the default value of 10 doesn't. It also won't work where large values are typical and small ones are the outliers (although it will still indicate where the "gap" is).

Jim

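find.first.gap<-function(x,gap.prop=10) {
 lenx<-length(x)
 # shift so that the smallest value is 0 (handles negative values)
 newx<-sort(x)-min(x)
 not.found<-TRUE
 gap.pos<-2
 # start the running mean with the first difference
 mean.diff<-newx[2]-newx[1]
 while(not.found && gap.pos <= lenx) {
  this.diff<-newx[gap.pos]-newx[gap.pos-1]
  # a gap is a difference at least gap.prop times the mean so far;
  # skip the test while the mean is 0 (runs of identical values)
  if(mean.diff != 0 && this.diff/mean.diff >= gap.prop) not.found<-FALSE
  else {
   # fold this difference into the mean of the (gap.pos-1)
   # differences seen so far, then move to the next position
   mean.diff<-(this.diff+mean.diff*(gap.pos-2))/(gap.pos-1)
   gap.pos<-gap.pos+1
  }
 }
 # NA if no gap was found (gap.pos has run past the end)
 return(newx[gap.pos]+min(x))
}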
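The same search can also be sketched without an explicit loop. The function below is a hypothetical variant (first.gap.vec is an assumed name, not part of the original reply) that uses the same rule as above: a difference between neighbouring sorted values counts as a gap when it is at least gap.prop times the mean of all earlier differences, and positions where those earlier differences are all 0 are skipped. It returns NA when no gap is found, so it can be applied to many cases at once.

```r
# Hypothetical vectorised variant of find.first.gap() (name assumed).
# A difference between neighbouring sorted values is a "gap" when it is
# at least gap.prop times the mean of all the differences before it.
first.gap.vec <- function(x, gap.prop = 10) {
  sx <- sort(x)
  d <- diff(sx)                       # d[i] = sx[i + 1] - sx[i]
  if (length(d) < 2) return(NA)
  k <- seq_len(length(d) - 1)
  prev.mean <- cumsum(d)[k] / k       # mean of d[1..k]
  ratio <- d[k + 1] / prev.mean       # compare d[k + 1] with that mean
  # ignore positions where the earlier differences are all 0
  hit <- which(prev.mean > 0 & ratio >= gap.prop)
  if (length(hit) == 0) return(NA)
  sx[hit[1] + 2]                      # first value after the gap
}

# Illustrative cases: the first two come from the question,
# the third is made up to contain one large jump.
cases <- list(c(3, 3, 3, 3.2, 3.3),
              c(4, 4.4, 4.5, 4.6, 4.7),
              c(3, 3, 3, 3.2, 3.3, 35, 90))
sapply(cases, first.gap.vec)
```

With the default gap.prop = 10 this should give NA NA 35: only the third case has a difference (31.7, from 3.3 to 35) that is more than ten times the mean of the differences before it (0.075).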

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
