Hello! I'm a newcomer to R hoping to replace some convoluted database code with an R script. Unfortunately, I haven't been able to figure out how to implement the following logic.
Essentially, we have a database of transactions, each coded with a geographic locale and a type. These are loaded into a data.frame, trans, with the variables city, type, and price (e.g., trans$city). We want to calculate mean prices by city and type AFTER excluding outliers. That is, we want to calculate the mean price in three steps:

1. Calculate a mean and standard deviation by city and type over all transactions.
2. Create a subset of the original data frame, excluding transactions that differ from the relevant mean by more than 2 standard deviations.
3. Calculate a final mean by city and type based on this subset.

I'm stuck on step 2. I would like to do something like the following:

fs <- list(factor(trans$city), factor(trans$type))
means <- tapply(trans$price, fs, mean)
stdevs <- tapply(trans$price, fs, sd)
filter <- abs(trans$price - means[trans$city, trans$type]) < 2*stdevs[trans$city, trans$type]
sub <- subset(trans, filter)

The code above doesn't work. What's the correct way to do this?

Thanks,
Josh

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
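A minimal sketch of all three steps, assuming trans has character (or factor) columns city and type and a numeric price column, as described in the question. It swaps the matrix lookup for base R's ave(), which computes a statistic within each city/type group and returns it aligned row by row with trans, so the step 2 comparison is elementwise. The toy data is hypothetical, made up only so the sketch runs on its own.

```r
# Hypothetical toy data standing in for the real transactions table:
# four city/type groups of 10 rows each, with one inflated "Boston house" price.
# (With the sample sd, a group of 5 or fewer rows can never have a point
# more than 2 sd from its mean, so tiny groups would never be filtered.)
trans <- data.frame(
  city  = rep(c("Boston", "Denver"), each = 20),
  type  = rep(rep(c("condo", "house"), each = 10), times = 2),
  price = c(rep(c(99, 101), 5),                # Boston condo
            c(rep(c(195, 205), 4), 200, 1000), # Boston house (1000 is the outlier)
            rep(c(149, 151), 5),               # Denver condo
            rep(c(295, 305), 5))               # Denver house
)

# Step 1: per-row group mean and sd. ave() applies FUN within each
# city/type combination and returns a vector the same length as trans$price.
grp_mean <- ave(trans$price, trans$city, trans$type, FUN = mean)
grp_sd   <- ave(trans$price, trans$city, trans$type, FUN = sd)

# Step 2: keep rows within 2 sd of their own group's mean.
# A single-row group has sd NA; subset() treats NA as FALSE and drops it.
keep <- abs(trans$price - grp_mean) < 2 * grp_sd
sub  <- subset(trans, keep)

# Step 3: final mean by city and type, computed on the filtered rows only.
final_means <- tapply(sub$price, list(sub$city, sub$type), mean)
print(final_means)
```

One caveat with the strict < carried over from the original attempt: a group whose prices are all identical can have an sd of 0, in which case every row fails the test and the whole group disappears; <= avoids that.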
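The tapply() attempt can also be kept close to its original form. The expression means[trans$city, trans$type] does not pair each transaction's city with its type: indexing a matrix by two vectors returns the whole cross-product submatrix (n rows by n columns for n transactions), not one value per row. A minimal sketch of a fix, assuming the same column names and using hypothetical toy data, indexes instead with a two-column character matrix, which R matches against the dimnames of means and reads as one (city, type) cell per transaction.

```r
# Hypothetical toy data: four city/type groups of 10 rows, one outlier (1000).
trans <- data.frame(
  city  = rep(c("Boston", "Denver"), each = 20),
  type  = rep(rep(c("condo", "house"), each = 10), times = 2),
  price = c(rep(c(99, 101), 5), c(rep(c(195, 205), 4), 200, 1000),
            rep(c(149, 151), 5), rep(c(295, 305), 5))
)

fs     <- list(factor(trans$city), factor(trans$type))
means  <- tapply(trans$price, fs, mean)  # city x type matrix, dimnames = levels
stdevs <- tapply(trans$price, fs, sd)

# One row per transaction: column 1 picks the city (row of means),
# column 2 picks the type (column of means). Character values are matched
# against dimnames, so each transaction gets exactly one cell.
idx <- cbind(as.character(trans$city), as.character(trans$type))

filter <- abs(trans$price - means[idx]) < 2 * stdevs[idx]
sub    <- subset(trans, filter)

# Step 3: final means on the filtered subset.
final_means <- tapply(sub$price, list(sub$city, sub$type), mean)
print(final_means)
```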