I'm aware that S N Krishna asked the same
question. However, I have failed to implement the
posted solution for running rank order
correlations on multiple subsets of data using the by() function.
Here is my problem:
Take a set of data from two subjects, who
provided numerical infant mortality (IM) estimates for five countries:
sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)  # grouping variable = 5 rows x 2 subjects
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)  # response variable = 5 estimates x 2 subjects
im <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)  # actual IM values x 2 subjects
data <- cbind(sub, est, im)
data
Using the by() function:
by(data, sub, function(x) cor(est, im, method = "spearman"))
does result in two correlation coefficients. But
instead of by subject, the est x im correlation
for the entire set is reported, and then assigned
to both subjects. This can be checked using:
cor(est, im, method = "spearman")
Nevertheless, the true coefficients and p-values should be:
sub[1] cor.coef = 0.1 p > .1
sub[2] cor.coef = 0.9 p < .05
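For reference, the per-subject coefficients can be reproduced by subsetting by hand. This is only a minimal sketch that restates the data above; the name rho is introduced here purely for illustration:

```r
sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)
im  <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)

# Spearman correlation within each subject, selected by logical indexing
rho <- c(cor(est[sub == 1], im[sub == 1], method = "spearman"),
         cor(est[sub == 2], im[sub == 2], method = "spearman"))
rho

# cor.test() reports a p-value alongside each coefficient
cor.test(est[sub == 1], im[sub == 1], method = "spearman")
cor.test(est[sub == 2], im[sub == 2], method = "spearman")
```

Subsetting by hand like this does not scale beyond a few subjects, which is why I would like a grouping function to do it.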
I find it peculiar that running a simple regression by groups does work:
by(data, sub, function(x) lm(est ~ im, data = x))
indicating that perhaps I'm using the wrong
grouping function for correlations. I'm using a
fairly standard Pentium 4 running Windows XP.
On occasion I am required to calculate up to a
quarter of a million individual correlations, so
any help would be very much appreciated.
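One possible explanation, offered as a guess rather than a confirmed diagnosis: the lm() call receives each subset through data = x, whereas my cor() call names est and im directly and so may be picking up the full vectors from the workspace instead of the columns of x. A sketch of the variant that would test this guess, assuming the data are held in a data frame (dat and res are names used only for this illustration):

```r
sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)
im  <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)

# A data frame keeps the columns addressable by name inside each subset
dat <- data.frame(sub, est, im)

# Refer to the columns of x (the per-subject subset), not the global vectors
res <- by(dat, dat$sub, function(x) cor(x$est, x$im, method = "spearman"))
res
```

If the guess is right, res should hold one coefficient per subject rather than the whole-set value repeated.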
Best wishes,
Peter James Lee
_________________________
Peter James Lee
Assistant Professor
Psikoloji Bölümü
Bilkent University
Bilkent
Ankara
Turkey
06800
e-mail: [EMAIL PROTECTED]
office: (90) 312 290 1807
home: (90) 312 290 3447
website: http://www.bilkent.edu.tr/~peterjl/index.html
_________________________