Hello!  I'm a newcomer to R hoping to replace some convoluted database 
code with an R script.  Unfortunately, I haven't been able to figure out 
how to implement the following logic.

Essentially, we have a database of transactions, each coded with a 
geographic locale and a type.  These are loaded into a data.frame 
named trans with the variables city, type, and price (so trans$city, 
trans$type, and trans$price).

We want to calculate mean prices by city and type, AFTER excluding 
outliers.  That is, we want to calculate the mean price in 3 steps:

1. calculate a mean and standard deviation by city and type over all 
transactions
2. create a subset of the original data frame, excluding transactions whose 
price differs from the mean of their city/type group by more than 2 of 
that group's standard deviations
3. calculate a final mean by city and type based on this subset.

I'm stuck on step 2.  I would like to do something like the following:

fs <- list(factor(trans$city), factor(trans$type))
means <- tapply(trans$price, fs, mean)
stdevs <- tapply(trans$price, fs, sd)

filter <- abs(trans$price - means[trans$city, trans$type]) <
             2*stdevs[trans$city, trans$type]

sub <- subset(trans, filter)

The above code doesn't work.  What's the correct way to do this?
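
A note on the likely cause, plus a minimal sketch of one possible approach 
(not necessarily the only or best one).  Indexing a matrix as 
means[trans$city, trans$type] selects a whole block of rows and columns 
rather than one cell per transaction, so the result does not line up with 
trans$price.  Base R's ave() returns a vector the same length as its input, 
with each element replaced by its group's summary, which avoids the lookup 
entirely.  The sample data and the names grp_mean, grp_sd, keep, trimmed, 
and final_means are made up for illustration.

```r
# Hypothetical sample data standing in for the real `trans` data frame.
trans <- data.frame(
  city  = rep(c("A", "B"), each = 10),
  type  = rep(c("x", "y"), each = 10),
  price = c(10, 11, 9, 10, 11, 9, 10, 11, 9, 100,
            20, 21, 19, 20, 21, 19, 20, 21, 19, 20)
)

# Step 1: group mean and SD, repeated so they align row by row with trans.
grp_mean <- ave(trans$price, trans$city, trans$type, FUN = mean)
grp_sd   <- ave(trans$price, trans$city, trans$type, FUN = sd)

# Step 2: keep rows within 2 SDs of their own group's mean.
# Groups with a single transaction have an NA SD; they are dropped here,
# which is an assumption about the desired behaviour.
keep    <- abs(trans$price - grp_mean) < 2 * grp_sd
trimmed <- trans[!is.na(keep) & keep, ]

# Step 3: final means by city and type on the trimmed data.
final_means <- tapply(trimmed$price, list(trimmed$city, trimmed$type), mean)
```

Note that with a 2-SD rule a group needs at least six transactions before 
any single value can be excluded, so very small groups pass through 
unchanged.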

Thanks,
Josh

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
