on 06/15/2008 01:18 PM ss wrote:
Dear all,
I have a matrix, called newdata1,
dim(newdata1)
[1] 3417683
It looks like:
  EntrezID     Name   S1   S2   S3 S4 S5 ...
1     4076  CAPRIN1  0.1  0.2  0.3 ...
2   139170   WDR40B  0.4  0.5  0.6 ...
3     5505 PPP1R2P1  0.3  0.3  0.7 ...
4     4076  CAPRIN1  0.7  0.3  0.2 ...
5   139170   WDR40B null  0.8  0.4 ...
6   139170   WDR40B null null 0.75 ...
If there are rows whose EntrezID and Name are exactly the same,
I want to take the average for these rows.
There might be some 'null's in the data set. For example, there are
three rows with the same EntrezID and Name, 139170 and WDR40B.
For the sample called 'S1', there are three values: 0.4, null, null. For
this scenario, I want to keep the final value as 0.4. For the sample 'S2',
the values are 0.5, 0.8, null. For this, I want to ignore 'null' and take
the average of 0.5 and 0.8, so the final value is (0.5+0.8)/2=0.65. For the
sample 'S3', there is no 'null', so just take the average as
(0.6+0.4+0.75)/3=0.5833.
Can you show me how to do this?
I appreciate it!
If your data file is exactly the way you have it above, you first want
to convert the 'null' entries to NA so that they are treated as missing
values by R.
Thus:
DF <- read.table("YourFileName", header = TRUE, na.strings = "null")
DF
  EntrezID     Name  S1  S2   S3
1     4076  CAPRIN1 0.1 0.2 0.30
2   139170   WDR40B 0.4 0.5 0.60
3     5505 PPP1R2P1 0.3 0.3 0.70
4     4076  CAPRIN1 0.7 0.3 0.20
5   139170   WDR40B  NA 0.8 0.40
6   139170   WDR40B  NA  NA 0.75
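To check the effect of 'na.strings' without touching the real file, here is a small sketch that reads an inline copy of the six sample rows. The 'txt' string and the 'raw' object are made up for illustration, and the sketch assumes the file is whitespace-delimited with only the S1 to S3 columns shown.

```r
# Hypothetical inline copy of the sample rows, standing in for the real file.
txt <- "EntrezID Name S1 S2 S3
4076 CAPRIN1 0.1 0.2 0.3
139170 WDR40B 0.4 0.5 0.6
5505 PPP1R2P1 0.3 0.3 0.7
4076 CAPRIN1 0.7 0.3 0.2
139170 WDR40B null 0.8 0.4
139170 WDR40B null null 0.75"

# Without na.strings, the literal "null" keeps S1 and S2 from being numeric.
raw <- read.table(text = txt, header = TRUE)

# With na.strings = "null", those entries become NA and the columns stay numeric.
DF <- read.table(text = txt, header = TRUE, na.strings = "null")

sapply(raw[, c("S1", "S2", "S3")], is.numeric)
sapply(DF[, c("S1", "S2", "S3")], is.numeric)

# Number of missing values per sample column after conversion.
colSums(is.na(DF[, c("S1", "S2", "S3")]))
```

If the sample columns do not come back numeric, aggregate() with mean will not give usable averages, so this is worth checking before going further.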
Then use aggregate():
aggregate(DF[, -c(1:2)], by = list(DF$EntrezID, DF$Name),
mean, na.rm = TRUE)
  Group.1  Group.2  S1   S2    S3
1    4076  CAPRIN1 0.4 0.25 0.250
2    5505 PPP1R2P1 0.3 0.30 0.700
3  139170   WDR40B 0.4 0.65 0.583
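As a self-contained variant of the same call, the sketch below builds the six sample rows by hand and passes a named list to 'by', so the grouping columns come back as EntrezID and Name rather than Group.1 and Group.2. The data frame is typed in from the example rows, and 'res' is just an illustrative name.

```r
# Sample rows typed in by hand, with 'null' already represented as NA.
DF <- data.frame(
  EntrezID = c(4076, 139170, 5505, 4076, 139170, 139170),
  Name     = c("CAPRIN1", "WDR40B", "PPP1R2P1", "CAPRIN1", "WDR40B", "WDR40B"),
  S1       = c(0.1, 0.4, 0.3, 0.7, NA, NA),
  S2       = c(0.2, 0.5, 0.3, 0.3, 0.8, NA),
  S3       = c(0.3, 0.6, 0.7, 0.2, 0.4, 0.75)
)

# Average every sample column within each (EntrezID, Name) pair.
# Naming the list elements sets the names of the grouping columns in the result.
res <- aggregate(DF[, -(1:2)],
                 by = list(EntrezID = DF$EntrezID, Name = DF$Name),
                 FUN = mean, na.rm = TRUE)
res
```

The 'na.rm = TRUE' argument is not used by aggregate() itself; it is passed on to mean() for each group and column.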
See ?read.table, ?aggregate and ?mean for more information. Take note of
the 'na.rm' argument in ?mean.
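For illustration, here is what 'na.rm' does on the WDR40B values from the question, outside of aggregate(). The vector names 's1' and 's2' are made up for this sketch.

```r
s1 <- c(0.4, NA, NA)   # S1 values for WDR40B
s2 <- c(0.5, 0.8, NA)  # S2 values for WDR40B

mean(s1)                # NA: a missing value propagates by default
mean(s1, na.rm = TRUE)  # 0.4: the single observed value
mean(s2, na.rm = TRUE)  # 0.65: (0.5 + 0.8) / 2

# If every value in a group is missing, nothing is left to average:
mean(c(NA_real_, NA_real_), na.rm = TRUE)  # NaN
```

So a group in which a sample has only 'null' entries will show NaN in the aggregated result rather than NA.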
HTH,
Marc Schwartz