At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
*R: *Grubbs Test to detect all outliers Per group for all columns in a data
frame



Dear All: good morning

I have a dataset (as an example) with two factor columns (factor1 and
factor2) and five numeric columns (X, Y, Z, U, V). The X and Y columns have
the same length as factor1, and Z, U, and V have the same length as factor2.
The dataset is copied below. Please note that all data columns contain NA
values.

*Need help on this:*


Can we use the grubbs.test() function to detect all outliers and replace
them with NA, in the X and Y columns per group in factor1, and in the Z, U,
and V columns per group in factor2? The columns in the data frame have
different lengths, but when I read the .csv file, R added NA values to the
shorter columns.

If you need the .csv data file, please let me know.


Thank you very much for your help in advance.




install.packages("outliers")
library(outliers)

datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
datafortest

datafortest<-data.frame(datafortest)

datafortest$factor1<-as.factor(datafortest$factor1)
datafortest$factor2<-as.factor(datafortest$factor2)

str(datafortest)

##### tried to use grubbs.test() on a single column of the dataframe, but
##### still not working
tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)


####################################

*grubbs.test() on a single dataset: but this can only detect if the min and
the max are outliers.*


xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
grubbs.test(xx999, type=11)




With many thanks

Abou



factor1          X     Y  factor2     Z    U      V
      1   4455.077   888        1   999   NA    999
      1   4348.031   333        1   475   NA    240
      1   9999.789   618        1   507  252    394
      1   3813.139   417        1   603  332    265
      1    7512.65   344        1   442  216     NA
      1   5642.667    NA        1   486  217    275
      1   6684.386   341        1   927  698    479
      2   5165.731   999        1   971  311    562
      2         NA   265        1   388  999    512
      2   3259.241   557        2   888  444    777
      2   3288.383   234        2   514   NA    322
      2   1997.878   383        2   409  311     NA
      2   99990.61    NA        2   546  327    728
      2   2655.977    NA        2   523  228    653
      3    3189.49  7777        2   313  456    450
      3   1826.851   287        2   296  412    576
      3   4386.002   352        2   320  251     NA
      3   3295.091   308        2   388  888  396.5
      3   2120.902   526        3  9999  398    888
      3         NA   489        3   677  438    307
      3   2056.123   291        3   555  428    219
      3   1995.088   444        3    NA  319     NA
      3         NA   349        3   479   NA    321
      3   2539.873   333        3   257  406    417
                                3   313  334    409
                                3   296  465    546
                                3   320  180    523
                                3   388  999    313



______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

I cannot reproduce any errors with the data file you attached; everything ran on the first try.


library(outliers)

fl <- "~/data_for_test.csv"
datafortest <- read.csv(fl)

# these are not needed to run the test
datafortest$factor1 <- as.factor(datafortest$factor1)
datafortest$factor2 <- as.factor(datafortest$factor2)
str(datafortest)
#> 'data.frame':    28 obs. of  7 variables:
#>  $ factor1: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 2 2 2 ...
#>  $ X      : num  4455 4348 10000 3813 7513 ...
#>  $ Y      : int  888 333 618 417 344 NA 341 999 265 557 ...
#>  $ factor2: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
#>  $ Z      : int  999 475 507 603 442 486 927 971 388 888 ...
#>  $ U      : int  NA NA 252 332 216 217 698 311 999 444 ...
#>  $ V      : num  999 240 394 265 NA 275 479 562 512 777 ...
head(datafortest)
#>   factor1        X   Y factor2   Z   U   V
#> 1       1 4455.077 888       1 999  NA 999
#> 2       1 4348.031 333       1 475  NA 240
#> 3       1 9999.789 618       1 507 252 394
#> 4       1 3813.139 417       1 603 332 265
#> 5       1 7512.650 344       1 442 216  NA
#> 6       1 5642.667  NA       1 486 217 275

##### tried to use grubbs.test() on a single column of the dataframe, but
##### still not working
grubbs.test(datafortest$X, type = 11)
#>
#>  Grubbs test for two opposite outliers
#>
#> data:  datafortest$X
#> G = 4.6640014, U = 0.0091756, p-value = 0.02867
#> alternative hypothesis: 1826.851 and 99990.608 are outliers
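
One difference from the original call is that na.rm = TRUE has been dropped. As far as I can tell, grubbs.test() takes only x, type, opposite and two.sided, so an extra na.rm argument would be rejected as unused; the function discards missing values itself. For the other part of the question (flag every outlier per group and set it to NA), here is a minimal sketch, not a definitive method. The helper name grubbs_to_na, the alpha = 0.05 cutoff and the toy data frame are assumptions made for illustration. It repeats the one-outlier test (type = 10) and blanks the value farthest from the mean until the test stops being significant, then applies that within each group using ave().

```r
library(outliers)

# Hypothetical helper (not part of the outliers package): repeat the
# one-outlier Grubbs test and set each flagged value to NA.
# alpha = 0.05 is an assumed cutoff, not something from the thread.
grubbs_to_na <- function(x, alpha = 0.05) {
  repeat {
    ok <- !is.na(x)
    # stop when too few values remain or the remaining values are constant
    if (sum(ok) < 3 || sd(x[ok]) == 0) break
    p <- grubbs.test(x[ok], type = 10)$p.value
    if (!isTRUE(p < alpha)) break
    # type = 10 tests the value farthest from the mean; which.max() skips NAs
    x[which.max(abs(x - mean(x[ok])))] <- NA
  }
  x
}

# Toy data (made up): group "a" holds one gross outlier and one NA,
# group "b" holds no outliers.
df <- data.frame(
  g = rep(c("a", "b"), each = 11),
  x = c(10, 11, 9, NA, 12, 11, 10, 9, 11, 10, 1000,
        5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 6)
)

# ave() applies the helper within each level of g and keeps the row order
df$x_clean <- ave(df$x, df$g, FUN = grubbs_to_na)
df
```

With the data from the thread, the same pattern would be applied to X and Y grouped by datafortest$factor1, and to Z, U and V grouped by datafortest$factor2, for example with lapply() over the column names. Keep in mind that repeated Grubbs tests can miss outliers that mask each other and that the repeated testing inflates the overall error rate, so the flagged values are worth checking by eye.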



Hope this helps,

Rui Barradas
