I think this discussion has gone off the rails to matters lying out of the purview of this list.
Bert

On Fri, Apr 21, 2023 at 6:16 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>
> Sometimes outliers happen. No matter the sample size there is always the
> possibility that one or more values are correct though highly improbable.
>
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Richard O'Keefe
> Sent: Friday, April 21, 2023 7:31 PM
> To: AbouEl-Makarim Aboueissa <abouelmakarim1...@gmail.com>
> Cc: R mailing list <r-help@r-project.org>
> Subject: Re: [R] detect and replace outliers by the average
>
> [External Email]
>
> This can be seen as three steps:
> (1) identify outliers
> (2) replace them with NA (trivial)
> (3) impute missing values.
> There are packages for imputing missing data.
> See
> https://www.analyticsvidhya.com/blog/2016/03/tutorial-powerful-packages-imputing-missing-values/
>
> Here I just want to address the first step.
> An observation is only an outlier relative to some model.
> Outliers can indicate
> - data that are just wrong (data entry error, failing battery in measurement
>   device, all sorts of stuff). In this case, deletion + imputation makes
>   sense.
> - data that are generated by a mixture of two or more processes,
>   not the single process you thought was there. In this case,
>   deletion + imputation is dangerous. The world is trying to tell
>   you something and you are ignoring it.
> - the model is wrong. Here again, deletion + imputation is
>   dangerous. You need a better model.
>
> "Detecting outliers in R" as a web query turned up
> https://statsandr.com/blog/outliers-detection-in-r/
> on the first page of results. There's plenty of material about finding
> outliers.
>
> But please give very VERY serious consideration to the possibility that some
> or even all of your outliers are actually GOOD data telling you something you
> need to know.
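The three steps listed above (identify outliers, set them to NA, impute) can be sketched in base R as follows. This is only an illustration under stated assumptions, not a recommendation to delete outliers: the data frame `dat` and its values are made up, because the attached data.csv is not available here, and the boxplot rule is just one of many possible outlier definitions. Two things differ from the code in the original question: NAs are filled as well (`x %in% boxplot.stats(x)$out` never matches an NA), and the replacement is done within each level of the grouping column instead of over whole columns.

```r
# Hypothetical example data; the real data.csv is not available.
dat <- data.frame(
  factor = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  x1 = c(10, 11, 12, 11, 100, NA, 20, 21, 19, 20, -50, 22),
  x2 = c(1.0, 1.2, NA, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 50, 5.1, 5.0)
)

# Steps (1)-(3) for a single numeric vector.
replace_outlier_with_mean <- function(x) {
  out <- boxplot.stats(x)$out            # (1) boxplot rule; NAs are ignored
  x[x %in% out] <- NA                    # (2) mark the outliers as missing
  x[is.na(x)] <- mean(x, na.rm = TRUE)   # (3) fill with the mean of the rest
  x
}

# Apply the function within each level of the grouping column,
# and only to the two data columns (not to the grouping column itself).
vars <- c("x1", "x2")
dat[vars] <- lapply(dat[vars], function(col) {
  ave(col, dat$factor, FUN = replace_outlier_with_mean)
})

print(dat)
```

Note that the mean used for filling is computed after the flagged values are removed, so an extreme value does not pull its own replacement toward itself. Whether mean imputation is appropriate at all depends on why the outliers are there, as discussed above.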
>
> On Fri, 21 Apr 2023 at 06:38, AbouEl-Makarim Aboueissa <abouelmakarim1...@gmail.com> wrote:
>
> > Dear All:
> >
> > *Re:* detect and replace outliers by the average
> >
> > The dataset, please see attached, contains a group factoring column
> > "*factor*" and two columns of data "x1" and "x2" with some NA values. I
> > need some help to detect the outliers and replace it and the NAs with
> > the average within each level (0,1,2) for each variable "x1" and "x2".
> >
> > I tried the below code, but it did not accomplish what I want to do.
> >
> > data<-read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)
> >
> > data
> >
> > replace_outlier_with_mean <- function(x) {
> >
> > replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE)) #### , na.rm=TRUE NOT working
> >
> > }
> >
> > data[] <- lapply(data, replace_outlier_with_mean)
> >
> > Thank you all very much for your help in advance.
> >
> > with many thanks
> >
> > abou
> >
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Mathematics and Statistics*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]