Dear list,

I am getting some strange results from daply in the plyr package. The example below calculates the average age per municipality for the employed and the unemployed. If I do this using tapply (see the code below), I get the following result:

        no      yes
A       NA 36.94931
B 51.22505 34.24887
C 48.05759 51.00198

If I do this using daply, I get:

municipality       no      yes
           A 36.94931 48.05759
           B 51.22505 51.00198
           C 34.24887       NA

daply generates the same numbers, but they are not in the correct cells. For example, everybody in municipality A is employed, so the NA should be in the "no" (unemployed) cell for municipality A, which is where tapply puts it. daply instead puts the NA in the "yes" cell for municipality C.

Am I using daply incorrectly or is there indeed something wrong with the output of daply?

Regards,

Jan


I am using version 1.1 of the plyr package.


# Load plyr, which provides daply, ddply and .()
library(plyr)

# Generate some test data
data.test <- data.frame(
  municipality=rep(LETTERS[1:3], each=10),
  employed=sample(c("yes", "no"), 30, replace=TRUE),
  age=runif(30,20,70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"

# Compare the output of tapply:
tapply(data.test$age, list(data.test$municipality, data.test$employed), mean)
# to that of daply:
daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
# The results of ddply are the same as those of tapply
ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )
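
As a cross-check that does not depend on plyr, the same means can be computed in long format with aggregate from base R, so that each mean sits next to its own labels. This is only a sketch: the seed, the names check.data, long.means and wide.means, and the explicit factor levels are assumptions added here for repeatability and are not part of the example above, so the numbers will differ from the tables shown.

```r
# Base-R cross-check; plyr is not needed here.
set.seed(1)  # arbitrary seed (assumption), only to make the run repeatable

check.data <- data.frame(
  municipality = rep(LETTERS[1:3], each = 10),
  # Explicit levels (assumption) so that tapply always has a "no" column
  employed = factor(sample(c("yes", "no"), 30, replace = TRUE),
                    levels = c("no", "yes")),
  age = runif(30, 20, 70))

# Make sure everybody is employed in municipality A
check.data$employed[check.data$municipality == "A"] <- "yes"

# Long format: one row per (municipality, employed) combination that occurs
long.means <- aggregate(age ~ municipality + employed,
                        data = check.data, FUN = mean)
print(long.means)

# Wide format for comparison: combinations that do not occur show up as NA
wide.means <- tapply(check.data$age,
                     list(check.data$municipality, check.data$employed),
                     mean)
print(wide.means)
```

In the long table there is no row for municipality A with employed equal to "no", which matches the NA that tapply puts in that cell.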

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
