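You can use something like:

# example data: four households with 2, 2, 3 and 3 members
dat <- data.frame(hhid = rep(c(10010020, 10010126, 10010142, 10010150), c(2, 2, 3, 3)),
                  h.age = sample(18:50, 10, TRUE))
###########
# repeat each household mean once per household member
# (this assumes the rows are grouped by hhid, in sorted order of hhid)
dat$mean.age <- rep(tapply(dat$h.age, dat$hhid, mean),
                    tapply(dat$h.age, dat$hhid, length))
dat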
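The rep()/tapply() construction lines up with the data only when the rows are already grouped by hhid in sorted order. If that cannot be guaranteed, a minimal sketch of an alternative is ave() from base R, which returns the group statistic in the original row order. The object name `shuffled` and the row permutation below are only illustrative assumptions; the data values are the ones from the original question.

```r
# Data from the original question.
dat <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# Shuffle the rows on purpose (illustrative permutation) to show that
# the result does not depend on the rows being grouped by household.
shuffled <- dat[c(3, 1, 10, 5, 2, 8, 4, 6, 9, 7), ]

# ave() returns a vector as long as its first argument, where each
# element is replaced by the statistic of its own group.
# FUN defaults to mean; it is spelled out here to pass na.rm = TRUE.
shuffled$mean.age <- ave(shuffled$h.age, shuffled$hhid,
                         FUN = function(x) mean(x, na.rm = TRUE))
shuffled
```

Each row then carries the mean age of its own household, wherever it sits in the data frame.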
I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message -----
From: "Stephan Lindner" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, June 20, 2006 10:42 AM
Subject: [R] Create variables with common values for each group

> Dear all,
>
> sorry, this is for sure really basic, but I searched a lot on the
> internet, and just couldn't find a solution.
>
> The problem is to create new variables from a data frame which
> contains both individual and group variables, such as mean age for
> a household. My data frame:
>
> > df
>
>        hhid h.age
> 1  10010020    23
> 2  10010020    23
> 3  10010126    42
> 4  10010126    60
> 5  10010142    20
> 6  10010142    49
> 7  10010142    52
> 8  10010150    18
> 9  10010150    51
> 10 10010150    28
>
> where hhid is the same number for each household, h.age the age for
> each household member.
>
> I tried tapply, by(), and aggregate. The best I could get was:
>
> by(df, df$hhid, function(subset)
>    rep(mean(subset$h.age,na.rm=T),nrow(subset)))
>
> df$hhid: 10010020
> [1] 23 23
> ------------------------------------------------------------
> df$hhid: 10010126
> [1] 51 51
> ------------------------------------------------------------
> df$hhid: 10010142
> [1] 40.33333 40.33333 40.33333
> ------------------------------------------------------------
> df$hhid: 10010150
> [1] 32.33333 32.33333 32.33333
>
> Now in principle I would only have to stack up the mean values, and
> this is where I'm stuck. The function aggregate works nicely, and I
> could then loop, but I was wondering whether there is a better way
> to do that.
>
> My end result should look like this (assigning mean.age to the data
> frame):
>
>        hhid h.age mean.age
> 1  10010020    23    23.00
> 2  10010020    23    23.00
> 3  10010126    42    51.00
> 4  10010126    60    51.00
> 5  10010142    20    40.33
> 6  10010142    49    40.33
> 7  10010142    52    40.33
> 8  10010150    18    32.33
> 9  10010150    51    32.33
> 10 10010150    28    32.33
>
> Cheers, and thanks a lot,
>
> Stephan Lindner
>
> --
> -----------------------
> Stephan Lindner, Dipl.Vw.
> 1512 Gilbert Ct., V-17
> Ann Arbor, Michigan 48105
> U.S.A.
> Tel.: 001-734-272-2437
> E-Mail: [EMAIL PROTECTED]
>
> "The prevailing ideas of a time were always only the ideas of the
> ruling class" -- Karl Marx
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
