On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> Fahim Md wrote:
>>
>> Is there any function/way to merge/unite the following data
>>
>>  GENEID      col1  col2  col3  col4
>>  G234064     1     0     0     0
>>  G234064     1     0     0     0
>>  G234064     1     0     0     0
>>  G234064     0     1     0     0
>>  G234065     0     1     0     0
>>  G234065     0     1     0     0
>>  G234065     0     1     0     0
>>  G234065     0     0     1     0
>>  G234065     0     0     1     0
>>  G234065     0     0     0     1
>>
>> into
>>
>>  GENEID      col1  col2  col3  col4
>>  G234064     1     1     0     0
>>  // 1 appears in col1 and col2 above, rest are zero
>>  G234065     0     1     1     1
>>  // 1 appears in col2, 3 and 4 above.
>>
>> Thanks
>
> Warning on terminology: there is a "merge" function in R that lines up rows
> from different tables to make a new set of longer rows (more columns). The
> usual term for combining column values from multiple rows is "aggregation".
>
> In addition to the example offered by Jim Holtzman, here are some other
> options in no particular order:
>
> x <- read.table(textConnection(" GENEID col1 col2 col3 col4
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 0 1 0
> G234065 0 0 1 0
> G234065 0 0 0 1
> "), header=TRUE, as.is=TRUE, row.names=NULL)
> closeAllConnections()
>
> # syntactic repackaging of Jim's basic approach
> library(plyr)
> ddply( x, .(GENEID), function(df)
> {with(df, as.integer(c(col1=any(col1),col2=any(col2),col3=any(col3),col4=any(col4))))}
> )

You can do this a little more succinctly with colwise:

  any_1 <- function(x) as.integer(any(x))
  ddply(x, "GENEID", numcolwise(any_1))

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
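
For readers without plyr installed, the same per-GENEID aggregation can probably be sketched in base R with aggregate(). This is only an illustration under stated assumptions: the columns hold nothing but 0/1 values, the data is re-read here with read.table(text = ...) so the example is self-contained, and any_1 is redefined to compare against zero explicitly (a variation introduced for this example, not the version from the thread). The name res is also just illustrative.

```r
# Same data as in the quoted message, read from an inline string.
x <- read.table(text = "GENEID col1 col2 col3 col4
G234064 1 0 0 0
G234064 1 0 0 0
G234064 1 0 0 0
G234064 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 0 1 0
G234065 0 0 1 0
G234065 0 0 0 1
", header = TRUE, as.is = TRUE)

# Return 1 if any value in the vector is non-zero, otherwise 0.
# (Assumption: comparing with 0 explicitly, rather than passing
# numbers straight to any(), which coerces them with a warning.)
any_1 <- function(v) as.integer(any(v != 0))

# Apply any_1 to every non-key column within each GENEID group.
res <- aggregate(. ~ GENEID, data = x, FUN = any_1)
print(res)
```

Because the inputs are assumed to be 0/1 only, passing FUN = max to aggregate() should give the same answer; the any_1 form just states the intent more directly.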