Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns
Hi Dennis,

Thanks for your prompt response.

Best,
Joel

Dennis Murphy djmu...@gmail.com 30-10-2011 21:11:
> Hi: Here are a few ways (untested, so caveat emptor): [...]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Normality tests on groups of rows in a data frame, grouped based on content in other columns
Dear R users,

I have a data frame in the form below, and I would like to run normality tests on the values in the ExpressionLevel column.

    head(df)
      ID Plant Tissue Gene ExpressionLevel
    1  1    p1     t1   g1          366.53
    2  2    p1     t1   g2            0.57
    3  3    p1     t1   g3           11.81
    4  4    p1     t2   g1          498.43
    5  5    p1     t2   g2            2.14
    6  6    p1     t2   g3            7.85

I would like to run the tests on every group defined by the content of the Plant, Tissue and Gene columns. My first problem is how to run a function for all these subgroups. I first thought of making subsets:

    group1 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g1")
    group2 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g2")
    group3 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g3")
    group4 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g1")
    group5 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g2")
    group6 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g3")
    etc...

But that would be very time consuming, and I would like to be able to use the code for other data frames...

I have also tried to store these in a list and loop through it, running the tests, something like this:

    library(nortest)  # provides pearson.test()
    alist <- list(group1, group2, group3, group4, group5, group6)
    for (i in alist) {
      print(shapiro.test(i$ExpressionLevel))
      print(pearson.test(i$ExpressionLevel))
      print(pearson.test(i$ExpressionLevel, adjust = FALSE))
    }

But there must be an easier and more elegant way of doing this...

I found the example below at http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop. I think it might be used for printing the results, but I do not know how to adapt it to my data frame, since there the functions are applied to several columns rather than to certain rows in one column.

    DF <- data.frame(A = rnorm(100), B = rlnorm(100))
    obj2 <- lapply(DF, shapiro.test)
    tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), p.value = x$p.value))
    tab2 <- data.frame(do.call(rbind, tab2))
    printCoefmat(tab2, has.Pvalue = TRUE)

Finally, I have found several different functions for testing normality, but which one(s) should I choose? As far as I can see from the help files, they differ only in the minimum number of samples required.

Thanks in advance!

Kind regards,
Joel
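One way the Stack Overflow example might be adapted to groups of rows is sketched below, using base R only. It splits ExpressionLevel by the combination of Plant, Tissue and Gene and then tabulates W and the p-value as in the quoted snippet. The data frame here is simulated purely for illustration (the Rep column and all values are assumptions, not the poster's data), and only shapiro.test() is shown.

```r
# Simulated stand-in for the poster's data frame (values are made up).
set.seed(1)
df <- expand.grid(Rep = 1:10,
                  Plant = c("p1", "p2"),
                  Tissue = c("t1", "t2"),
                  Gene = c("g1", "g2", "g3"))
df$ExpressionLevel <- rlnorm(nrow(df), meanlog = 3)

# One numeric vector per Plant/Tissue/Gene combination; drop = TRUE
# discards combinations that do not occur in the data.
groups <- split(df$ExpressionLevel,
                list(df$Plant, df$Tissue, df$Gene),
                drop = TRUE)

# Run the test on every group, then keep only W and the p-value.
tests <- lapply(groups, shapiro.test)
tab <- t(sapply(tests, function(x) c(W = unname(x$statistic),
                                     p.value = x$p.value)))
tab <- as.data.frame(tab)

# Row names have the form "Plant.Tissue.Gene", e.g. "p1.t1.g1".
printCoefmat(tab, has.Pvalue = TRUE)
```

Because the grouping is built from column values, the same lines should carry over to other data frames by changing only the column names passed to split().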
Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns
Hi:

Here are a few ways (untested, so caveat emptor):

    # shapiro.test() returns an "htest" list, so each call extracts the p-value

    # plyr package
    library('plyr')
    ddply(df, .(Plant, Tissue, Gene), summarise,
          ntest = shapiro.test(ExpressionLevel)$p.value)

    # data.table package
    library('data.table')
    dt <- data.table(df, key = c('Plant', 'Tissue', 'Gene'))
    dt[, list(ntest = shapiro.test(ExpressionLevel)$p.value), by = key(dt)]

    # aggregate() function
    aggregate(ExpressionLevel ~ Plant + Tissue + Gene, data = df,
              FUN = function(x) shapiro.test(x)$p.value)

    # doBy package
    library('doBy')
    summaryBy(ExpressionLevel ~ Plant + Tissue + Gene, data = df,
              FUN = function(x) shapiro.test(x)$p.value)

There are others, too...

HTH,
Dennis

2011/10/30 Joel Fürstenberg-Hägg jo...@life.ku.dk:
> Dear R users, I have a data frame in the form below, on which I would like to make normality tests on the values in the ExpressionLevel column. [...]
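As a rough sketch of how the aggregate() approach could return both the test statistic and the p-value per group, the example below wraps shapiro.test() in a small helper. The helper name shapiro_summary and the simulated data are assumptions made for illustration; the guard reflects that shapiro.test() only accepts between 3 and 5000 non-missing values.

```r
# Simulated stand-in for the poster's data frame (values are made up).
set.seed(42)
df <- expand.grid(Rep = 1:8,
                  Plant = c("p1", "p2"),
                  Tissue = c("t1", "t2"),
                  Gene = c("g1", "g2", "g3"))
df$ExpressionLevel <- rnorm(nrow(df), mean = 100, sd = 15)

# Hypothetical helper: returns W and the p-value, or NA for groups whose
# size shapiro.test() would reject (fewer than 3 or more than 5000 values).
shapiro_summary <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000) {
    return(c(W = NA_real_, p.value = NA_real_))
  }
  res <- shapiro.test(x)
  c(W = unname(res$statistic), p.value = res$p.value)
}

# aggregate() stores a vector-valued FUN result as a matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns named
# ExpressionLevel.W and ExpressionLevel.p.value.
res <- aggregate(ExpressionLevel ~ Plant + Tissue + Gene,
                 data = df, FUN = shapiro_summary)
res <- do.call(data.frame, res)

print(res)
```

Returning NA for undersized groups keeps one row per group in the result instead of stopping at the first group that is too small; whether that is preferable to an error depends on the analysis.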