Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

2011-10-31 Thread Joel Fürstenberg-Hägg
Hi Dennis,
 
Thanks for your prompt response.
 
Best,
 
Joel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

2011-10-30 Thread Joel Fürstenberg-Hägg
Dear R users,

I have a data frame in the form below, and I would like to run normality
tests on the values in the ExpressionLevel column.

> head(df)
  ID Plant Tissue Gene ExpressionLevel
1  1    p1     t1   g1          366.53
2  2    p1     t1   g2            0.57
3  3    p1     t1   g3           11.81
4  4    p1     t2   g1          498.43
5  5    p1     t2   g2            2.14
6  6    p1     t2   g3            7.85

I would like to run the tests on every group defined by the contents of the
Plant, Tissue and Gene columns.

My first problem is how to run a function on all of these subgroups.
I first thought of making subsets:

group1 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g1")
group2 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g2")
group3 <- subset(df, Plant == "p1" & Tissue == "t1" & Gene == "g3")
group4 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g1")
group5 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g2")
group6 <- subset(df, Plant == "p1" & Tissue == "t2" & Gene == "g3")  # etc...

But that would be very time-consuming, and I would like to be able to reuse
the code for other data frames.
I have also tried storing these in a list and looping through it, running
the tests, something like this:

library(nortest)  # provides pearson.test()

alist <- list(group1, group2, group3, group4, group5, group6)
for (i in alist) {
  print(shapiro.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel, adjust = FALSE))
}

But there must be an easier and more elegant way of doing this. I found the
example below at
http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop.
I think it might be used for printing the results, but I do not know how to
adapt it to my data frame, since there the functions are applied to several
columns instead of to certain rows in one column.

DF <- data.frame(A = rnorm(100), B = rlnorm(100))

obj2 <- lapply(DF, shapiro.test)

tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), p.value = x$p.value))
tab2 <- data.frame(do.call(rbind, tab2))
printCoefmat(tab2, has.Pvalue = TRUE)

Finally, I have found several different functions for testing for normality, 
but which one(s) should I choose? As far as I can see in the help files they 
only differ in the minimum number of samples required.

Thanks in advance!

Kind regards,

Joel








Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

2011-10-30 Thread Dennis Murphy
Hi:

Here are a few ways (untested, so caveat emptor):

# plyr package
library('plyr')
# shapiro.test() returns a list of class "htest", so extract the p-value
ddply(df, .(Plant, Tissue, Gene), summarise,
      p.value = shapiro.test(ExpressionLevel)$p.value)

# data.table package
library('data.table')
dt <- data.table(df, key = c('Plant', 'Tissue', 'Gene'))
dt[, list(p.value = shapiro.test(ExpressionLevel)$p.value), by = key(dt)]

# aggregate() function
aggregate(ExpressionLevel ~ Plant + Tissue + Gene, data = df,
          FUN = function(x) shapiro.test(x)$p.value)

# doBy package:
library('doBy')
summaryBy(ExpressionLevel ~ Plant + Tissue + Gene, data = df,
          FUN = function(x) shapiro.test(x)$p.value)

There are others, too...
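
One of those others, as a rough base-R sketch: split() the response on the
three grouping columns, lapply() a small wrapper over the pieces, and bind
the results into one table, much like the Stack Overflow snippet quoted in
the question. The toy data, the Rep column and the helper name safe_shapiro
are made up for illustration and are not part of the original data; the
wrapper assumes you would rather get NA than an error for groups that
shapiro.test() cannot handle.

```r
# Hypothetical example data: 2 plants x 2 tissues x 3 genes, 8 replicates each.
set.seed(1)
df <- expand.grid(Rep = 1:8,
                  Plant = c("p1", "p2"),
                  Tissue = c("t1", "t2"),
                  Gene = c("g1", "g2", "g3"))
df$ExpressionLevel <- rlnorm(nrow(df), meanlog = 2, sdlog = 1)

# Split the response by every Plant/Tissue/Gene combination;
# drop = TRUE discards combinations that do not occur in the data.
groups <- split(df$ExpressionLevel,
                df[c("Plant", "Tissue", "Gene")],
                drop = TRUE)

# shapiro.test() needs between 3 and 5000 non-missing values,
# so return NA for groups outside that range instead of stopping.
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000) {
    return(c(n = length(x), W = NA, p.value = NA))
  }
  st <- shapiro.test(x)
  c(n = length(x), W = unname(st$statistic), p.value = st$p.value)
}

# One row per group: group size, W statistic and p-value.
tab <- data.frame(do.call(rbind, lapply(groups, safe_shapiro)))
printCoefmat(tab, has.Pvalue = TRUE)
```

The row names of tab are the group labels built by split() (Plant, Tissue
and Gene joined with dots), so the table can be filtered or sorted like any
other data frame. Swapping shapiro.test() for another test inside the
wrapper should work the same way, as long as it returns an "htest" object.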

HTH,
Dennis
