Hi Rui:
How about this dataset? I included a few outliers in each column, as you can see in the printed dataset below. Once again, thank you very much, and sorry if I bothered you all.

abou

> dput(datafortest)
structure(list(factor1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, NA, NA, NA, NA), levels = c("1", "2", "3"), class = "factor"),
    X = c(994455.077, 4348.031, 9999.789, 3813.139, 12.65, 5642.667,
    876684.386, 5165.731, NA, 3259.241, 8.383, 1997.878, 99990.608,
    2655.977, 9.49, 1826.851, 4386.002, 883295.091, 2120.902,
    NA, 2056.123, 5.088, NA, 92539.873, NA, NA, NA, NA),
    Y = c(76888L, 333L, 618L, 10L, 344L, NA, 3L, 86999L, 265L, 557L,
    77777L, 383L, NA, NA, 87777L, 287L, 352L, 308L, 999526L, 489L,
    2L, 444L, 9L, 333L, NA, NA, NA, NA),
    factor2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L), levels = c("1", "2", "3"), class = "factor"),
    Z = c(54999L, 475L, 15L, 603L, 442L, 79486L, 927L, 971L, 388L, 888L,
    514L, 409L, 546L, 523L, 313L, 296L, 320L, 388L, 79999L, 677L,
    555L, NA, 479L, 257L, 313L, 21L, 320L, 4L),
    U = c(NA, NA, 1.5, 332, 216, 217, 1000, 10, 9999, 444, NA, 5, 327,
    58888, 456, 412, 251, 6, 398, 438, 428, 15, NA, 406, 334, 465,
    180, 88999),
    V = c(12, 240, 9000, 265, NA, 99999, 1, 562, 13, 777, 322, NA,
    99988, 653, 450, 576, NA, 396.5, 91888, 5, 219, NA, 321, 417,
    409, 999999, 523, 10)), row.names = c(NA, -28L), class = "data.frame")
>
> datafortest
   factor1          X      Y factor2     Z       U        V
 1       1 994455.077  76888       1 54999      NA     12.0
 2       1   4348.031    333       1   475      NA    240.0
 3       1   9999.789    618       1    15     1.5   9000.0
 4       1   3813.139     10       1   603   332.0    265.0
 5       1     12.650    344       1   442   216.0       NA
 6       1   5642.667     NA       1 79486   217.0  99999.0
 7       1 876684.386      3       1   927  1000.0      1.0
 8       2   5165.731  86999       1   971    10.0    562.0
 9       2         NA    265       1   388  9999.0     13.0
10       2   3259.241    557       2   888   444.0    777.0
11       2      8.383  77777       2   514      NA    322.0
12       2   1997.878    383       2   409     5.0       NA
13       2  99990.608     NA       2   546   327.0  99988.0
14       2   2655.977     NA       2   523 58888.0    653.0
15       3      9.490  87777       2   313   456.0    450.0
16       3   1826.851    287       2   296   412.0    576.0
17       3   4386.002    352       2   320   251.0       NA
18       3 883295.091    308       2   388     6.0    396.5
19       3   2120.902 999526       3 79999   398.0  91888.0
20       3         NA    489       3   677   438.0      5.0
21       3   2056.123      2       3   555   428.0    219.0
22       3      5.088    444       3    NA    15.0       NA
23       3         NA      9       3   479      NA    321.0
24       3  92539.873    333       3   257   406.0    417.0
25    <NA>         NA     NA       3   313   334.0    409.0
26    <NA>         NA     NA       3    21   465.0 999999.0
27    <NA>         NA     NA       3   320   180.0    523.0
28    <NA>         NA     NA       3     4 88999.0     10.0
>

with many thanks
abou
______________________

*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*


On Sat, Apr 29, 2023 at 8:05 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:

> At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
> > *R: *Grubbs Test to detect all outliers Per group for all columns in a data frame
> >
> > Dear All: good morning
> >
> > I have a dataset (as an example) with two factor columns (factor1 and
> > factor2) and 5 numerical columns (X, Y, Z, U, V). The X and Y columns have
> > the same length as factor1; and Z, U, and V have the same length as factor2.
> > The dataset is copied below. Please note that all dataset columns have NA
> > values.
> >
> > *Need help on this:*
> >
> > Can we use the grubbs.test() function to detect all outliers and replace
> > them with NA in the X and Y columns per group in factor1, and in the Z, U,
> > and V columns per group in factor2? Columns in the data frame have different
> > lengths, but when I read the .csv file, R added NA values for the shorter
> > columns.
> >
> > If you need the .csv data file, please let me know.
> >
> > Thank you very much for your help in advance.
> >
> > install.packages("outliers")
> > library(outliers)
> >
> > datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
> > datafortest
> >
> > datafortest<-data.frame(datafortest)
> >
> > datafortest$factor1<-as.factor(datafortest$factor1)
> > datafortest$factor2<-as.factor(datafortest$factor2)
> >
> > str(datafortest)
> >
> > ##### tried to use grubbs.test() on a single column of the dataframe, but
> > ##### still not working
> > tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)
> >
> > ####################################
> >
> > *grubbs.test() on a single dataset: but this can only detect if the min and
> > the max are outliers.*
> >
> > xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
> > grubbs.test(xx999, type=11)
> >
> > With many thanks
> >
> > Abou
> >
> > factor1         X    Y factor2    Z   U     V
> >       1  4455.077  888       1  999  NA   999
> >       1  4348.031  333       1  475  NA   240
> >       1  9999.789  618       1  507 252   394
> >       1  3813.139  417       1  603 332   265
> >       1   7512.65  344       1  442 216    NA
> >       1  5642.667   NA       1  486 217   275
> >       1  6684.386  341       1  927 698   479
> >       2  5165.731  999       1  971 311   562
> >       2        NA  265       1  388 999   512
> >       2  3259.241  557       2  888 444   777
> >       2  3288.383  234       2  514  NA   322
> >       2  1997.878  383       2  409 311    NA
> >       2  99990.61   NA       2  546 327   728
> >       2  2655.977   NA       2  523 228   653
> >       3   3189.49 7777       2  313 456   450
> >       3  1826.851  287       2  296 412   576
> >       3  4386.002  352       2  320 251    NA
> >       3  3295.091  308       2  388 888 396.5
> >       3  2120.902  526       3 9999 398   888
> >       3        NA  489       3  677 438   307
> >       3  2056.123  291       3  555 428   219
> >       3  1995.088  444       3   NA 319    NA
> >       3        NA  349       3  479  NA   321
> >       3  2539.873  333       3  257 406   417
> >                              3  313 334   409
> >                              3  296 465   546
> >                              3  320 180   523
> >                              3  388 999   313
> >
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Mathematics and Statistics*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> Hello,
>
> With the data file you have attached I cannot reproduce any errors;
> everything went well on the first try.
>
>
> library(outliers)
>
> fl <- "~/data_for_test.csv"
> datafortest <- read.csv(fl)
>
> # these are not needed to run the test
> datafortest$factor1 <- as.factor(datafortest$factor1)
> datafortest$factor2 <- as.factor(datafortest$factor2)
> str(datafortest)
> #> 'data.frame': 28 obs. of 7 variables:
> #>  $ factor1: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 2 2 2 ...
> #>  $ X      : num 4455 4348 10000 3813 7513 ...
> #>  $ Y      : int 888 333 618 417 344 NA 341 999 265 557 ...
> #>  $ factor2: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
> #>  $ Z      : int 999 475 507 603 442 486 927 971 388 888 ...
> #>  $ U      : int NA NA 252 332 216 217 698 311 999 444 ...
> #>  $ V      : num 999 240 394 265 NA 275 479 562 512 777 ...
> head(datafortest)
> #>   factor1        X   Y factor2   Z   U   V
> #> 1       1 4455.077 888       1 999  NA 999
> #> 2       1 4348.031 333       1 475  NA 240
> #> 3       1 9999.789 618       1 507 252 394
> #> 4       1 3813.139 417       1 603 332 265
> #> 5       1 7512.650 344       1 442 216  NA
> #> 6       1 5642.667  NA       1 486 217 275
>
> ##### tried to use grubbs.test() on a single column of the dataframe, but
> ##### still not working
> grubbs.test(datafortest$X, type = 11)
> #>
> #>  Grubbs test for two opposite outliers
> #>
> #> data:  datafortest$X
> #> G = 4.6640014, U = 0.0091756, p-value = 0.02867
> #> alternative hypothesis: 1826.851 and 99990.608 are outliers
>
>
> Hope this helps,
>
> Rui Barradas
>

[[alternative HTML version deleted]]
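
The original question in this thread (flag every outlier per group and set it to NA) is not answered by a single grubbs.test() call, so here is one possible sketch. It makes several assumptions: the helper name grubbs_to_na, the 0.05 cut-off, the minimum group size of 3, and the toy data frame are all made up for illustration and do not come from the thread. It also relies on the one-outlier form of the test (type = 10) examining the value farthest from the group mean, and it re-tests after each removal, which is a common but debatable way to use Grubbs' test for more than one outlier.

```r
library(outliers)  # provides grubbs.test()

# Hypothetical helper (not from the thread): apply the one-outlier Grubbs
# test repeatedly and set the flagged value to NA until the test is no
# longer significant or too few values remain.
grubbs_to_na <- function(x, alpha = 0.05, min_n = 3) {
  repeat {
    ok <- which(!is.na(x))            # positions of the remaining values
    if (length(ok) < min_n) break     # too few values left to test
    p <- grubbs.test(x[ok], type = 10)$p.value
    if (!isTRUE(p < alpha)) break     # not significant (or p is NA/NaN)
    # Assumption: type = 10 tests the value farthest from the mean, so that
    # is the one blanked out (ties are resolved by position).
    worst <- ok[which.max(abs(x[ok] - mean(x[ok])))]
    x[worst] <- NA
  }
  x
}

# Toy data (made up): group "a" has one gross outlier, group "b" has none.
toy <- data.frame(
  g = factor(rep(c("a", "b"), each = 10)),
  v = c(1:9, 1000, 11:19, NA)
)

# ave() runs the helper within each level of g and keeps the row order.
toy$v_clean <- ave(toy$v, toy$g, FUN = grubbs_to_na)
toy
```

For the data in this thread, the same ave() call could presumably be looped over X and Y with factor1 as the grouping, and over Z, U and V with factor2. With only 7 to 10 values per group and several extreme values in each, masking is likely, so the result should be checked by eye.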