Hi: This is *really* ugly, but given the number of variables you have in mind, it seems rather necessary, at least to me.
# Given an original data frame dataf, find the max ID number: Mvars <- names(dataf)[grep('^M', names(dataf))] # Pick out the number of variables that start with M: n <- max(as.numeric(sapply(strsplit(Mvars, ''), function(x) unlist(x)[2]))) # Generate all the new variables, similar to those in the transform() statement to get df2 # In your example, you showed the new variables for one pair - this extends to all pairs, # repeated up to the largest parent number newvars <- c(paste('m', rep(1:max(c(Parent1, Parent2), na.rm = TRUE), each = n), rep(c('a', 'b')), rep(c('p1', 'p2'), each = 2), sep = '')) # Generate a two column matrix of the M*a and M*b names v <- matrix(names(dataf)[-(1:4)], ncol = 2, byrow = TRUE) # cbind v with itself, transpose the result and collapse it to a vector vnames <- as.vector(t(cbind(v, v))) # vnames and newvars should have the same length # Generate a character string of the generate the set of calls, # extending the call you used to get the new variables in df2 nvarx <- paste(newvars, '=', vnames, '[', rep(c('Parent1', 'Parent2'), each = 2), ']', sep = '') nvarx.str <- paste(nvarx, collapse = ', ') # Generate the text string that comprises the call to transform() txtexp <- paste('transform(dataf, ', nvarx.str, ')', sep = '') # Parse the text string in txtemp and evaluate it to get the extended data frame df3 df3 <- eval(parse(text = txtexp)) # Next, we work on producing the averages. To do this, we first generate a matrix of names, # then write a function for each row of the matrix that # * produces text strings that select the proper names for each of the calls in hP1, hP2, t1 and t2 # * evaluates the parsed strings # * produces the average # Matrix of names for each parent namesmat <- matrix(c(paste('M', 1:n, 'a', sep = ''), paste('M', 1:n, 'b', sep = ''), paste('m', 1:n, 'ap1', sep = ''), paste('m', 1:n, 'bp1', sep = ''), paste('m', 1:n, 'ap2', sep = ''), paste('m', 1:n, 'bp2', sep = '')), nrow = n) # Function to apply to each row of the above matrix findavg <- function(x) { # x is an input string of variable names # we construct the calls as strings and then evaluate them # final output is the average x <- as.vector(x) hP1 <- eval(parse(text = 'as.numeric(df3[x[3]] != df3[x[4]])')) hP2 <- eval(parse(text ='as.numeric(df3[x[5]] != df3[x[6]])')) t1 <- eval(parse(text ='as.numeric(df3[x[1]] != df3[x[3]])')) t2 <- eval(parse(text ='as.numeric(df3[x[2]] != df3[x[5]])')) C <- (hP1*(t1-0.25)+ hP2 *(t2-0.25)) yv <- df3$y mean(C * yv, na.rm = TRUE) } # Apply it to the matrix of names: apply(namesmat, 1, findavg) # For the example data given, [1] -1.166667 6.500000 -1.166667 2.916667 Please double check this on your example below to make sure it's doing the right thing - I didn't check whether or not the averages were right. After all that eval(parse(text = *))ing, I need a shower...I feel dirty :) If there's a better way, I'd love to see it. HTH, Dennis On Mon, Feb 28, 2011 at 5:00 AM, Umesh Rosyara <rosyar...@gmail.com> wrote: > Dear R-community members. > > I am really appreciate R-help group. Dennis has been extrremely helpful to > solve some of my questions. I am following Dennis recommendation in the > following email, yet I am stuck at another point (hope this will took me to > end of this project. > > Ind <- c(1:5) > Parent1 <- c(NA,NA,1,1,3) > Parent2 <- c(NA,NA,2,2,4) > y <- c(6,5,8,10,7) > M1a <- c(1,2,1,1,1) > M1b <- c(1,2,2,2,1) > M2a <- c(3,3,1,1,3) > M2b <- c(1,1,3,3,3) > M3a <- c(4,4,4,4,4) > M3b <- c(4,4,1,1,4) > M4a <- c(1,4,4,1,4) > M4b <- c(4,4,4,4,4) > > dataf <- data.frame (Ind, Parent1, Parent2, y, M1a, M1b,M2a,M2b, > M3a,M3b,M4a, M4b) # I have more than >1000 variables pair > > # pair1 (M1a,M1b) pair2 (M2a, M2b), pair3 (M3a, M3b)... > > df2 <- transform(dataf,m1ap1 = dataf$M1a[dataf$Parent1], > m1bp1 = dataf$M1b[dataf$Parent1], > m1ap2 = dataf$M1a[dataf$Parent2], > m1bp2 = dataf$M1b[dataf$Parent2]) > # downstream calculations > hP1 <- ifelse(df2$m1ap1==df2$m1bp1,0,1) > hP2 <- ifelse(df2$m1bp2==df2$m1bp2,0,1) > t1 <- ifelse(df2$M1a==df2$m1ap1,1,0) > t2 <- ifelse(df2$M1b==df2$m1ap2,1,0) > C <- (hP1*(t1-0.25)+ hP2 *(t2-0.25)) > yv <- df2$y > Cy <- C*yv > avgCy <- mean(Cy, na.rm=T) > avgCy # I want to store this value to new dataframe with first model i.e. > > > How can I loop the process to output the second pair( here M2a, M2b), third > pair (here M3a, M3b) to all pairs (I have more than 1000) > > Mode1 avgCy > 1 1.75 # from pair M1a and M1b > 2 # from pair M2a and M2b > 3 # from pair M3a and M3b > 4 # from pair M4a and M4b > > to the end of the file > > Thank you in advance > > Umesh R > > ------------------------------ > *From:* Dennis Murphy [mailto:djmu...@gmail.com] > *Sent:* Friday, February 18, 2011 12:28 AM > *To:* Umesh Rosyara > *Cc:* r-help@r-project.org > *Subject:* Re: [R] recoding a data in different way: please help > > Hi: > > This is as far as I could get: > > df <- read.table(textConnection(" > Individual Parent1 Parent2 mark1 mark2 > 1 0 0 12 11 > 2 0 0 11 22 > 3 0 0 13 22 > 4 0 0 13 11 > 5 1 2 11 12 > 6 1 2 12 12 > 7 3 4 11 12 > 8 3 4 13 12 > 9 1 4 11 12 > 10 1 4 11 12"), header = TRUE) > df2 <- transform(df, Parent1 = replace(Parent1, Parent1 == 0, NA), > Parent2 = replace(Parent2, Parent2 == 0, NA)) > df2 <- transform(df2, imark1p1 = df2$mark1[df2$Parent1], # Parent 1's > mark1 > imark1p2 = df2$mark1[df2$Parent2], > # Parent 2's mark1 > imark2p1 = df2$mark2[df2$Parent1], > # Parent 1's mark2 > imark2p2 = df2$mark2[df2$Parent2]) > # Parent 2's mark2 > > I created df2 so as not to overwrite the original in case of a mistake. At > this point, you have several sets of vectors that you can compare; e.g., > mark1 with imark1p1 and imark1p2. Like Josh, I couldn't make heads or tails > out of what these logical tests were meant to output, but perhaps this gives > you a broader template with which to work. At this point, you can probably > remove the rows corresponding to the parents. I believe ifelse() is your > friend here - it can perform logical tests in a vectorized fashion. As long > as the tests are consistent from one individual to the next, it's likely to > be an efficient route. > > HTH, > Dennis > > On Thu, Feb 17, 2011 at 6:21 PM, Umesh Rosyara <rosyar...@gmail.com>wrote: > >> Dear R users >> >> The following question looks simple but I have spend alot of time to solve >> it. I would highly appeciate your help. >> >> I have following dataset from family dataset : >> >> Here we have individuals and their two parents and their marker scores >> (marker1, marker2,....and so on). 0 means that their parent information >> not >> available. >> >> >> Individual Parent1 Parent2 mark1 mark2 >> 1 0 0 12 11 >> 2 0 0 11 22 >> 3 0 0 13 22 >> 4 0 0 13 11 >> 5 1 2 11 12 >> 6 1 2 12 12 >> 7 3 4 11 12 >> 8 3 4 13 12 >> 9 1 4 11 12 >> 10 1 4 11 12 >> >> I want to recode mark1 and other mark2.....and so on column by looking >> indvidual parent (Parent1 and Parent2). >> >> For example >> >> Take case of Individual 5, who's Parent 1 is 1 (has mark1 score 12) and >> Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. >> Suppose >> I have following condition to recode Individual 5's mark1 score: >> >> For mark1 variable, If Parent1 score "11" and Parent2 score "22" and >> recode >> indvidual 5's score, "12"=1, else 0 >> If Parent1 score "12" and Parent2 score >> "22" and recode individual 5's score, "22"=1, "12"= 0.5, else 0 >> .........................more >> conditions >> >> Similarly the pointer should move from individual 5 to n individuals at >> the >> end of the file. >> >> Thank you in advance >> >> Umesh R >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ------------------------------ > > No virus found in this message. > Checked by AVG - www.avg.com > Version: 10.0.1204 / Virus Database: 1435/3449 - Release Date: 02/17/11 > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.