On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> Fahim Md wrote:
>>
>> Is there a function (or some other way) to merge/unite the following data?
>>
>>  GENEID      col1          col2             col3                col4
>>  G234064         1             0                  0                   0
>>  G234064         1             0                  0                   0
>>  G234064         1             0                  0                   0
>>  G234064         0             1                  0                   0
>>  G234065         0             1                  0                   0
>>  G234065         0             1                  0                   0
>>  G234065         0             1                  0                   0
>>  G234065         0             0                  1                   0
>>  G234065         0             0                  1                   0
>>  G234065         0             0                  0                   1
>>
>>
>> into
>> GENEID      col1          col2             col3                col4
>>  G234064         1             1                  0                   0
>> // 1 appears in col1 and col2 above, rest are zero
>>  G234065         0             1                  1                   1
>> // 1 appears in col2, col3 and col4 above.
>>
>>
>> Thanks
>
> Warning on terminology: there is a "merge" function in R that lines up rows
> from different tables to make a new set of longer rows (more columns). The
> usual term for combining column values from multiple rows is "aggregation".
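To make the terminology concrete, base R's aggregate() performs exactly this kind of aggregation without an add-on package. This is only a sketch under assumptions not in the thread: it reads the data with read.table(text = ...), available in newer versions of R, instead of textConnection(), and the helper name any_nonzero is hypothetical.

```r
# Sketch: collapse rows that share a GENEID with base R's aggregate().
x <- read.table(header = TRUE, text = "
GENEID col1 col2 col3 col4
G234064 1 0 0 0
G234064 1 0 0 0
G234064 1 0 0 0
G234064 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 0 1 0
G234065 0 0 1 0
G234065 0 0 0 1
")

# Hypothetical helper: 1 if any value in the group is non-zero, else 0.
any_nonzero <- function(v) as.integer(any(v != 0))

# ". ~ GENEID" means: apply FUN to every other column, grouped by GENEID.
res <- aggregate(. ~ GENEID, data = x, FUN = any_nonzero)
print(res)
```

The result has one row per GENEID, with the same 0/1 pattern as the desired output in the question.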
>
> In addition to the example offered by Jim Holtman, here are some other
> options in no particular order:
>
> x <- read.table(textConnection(" GENEID col1 col2 col3 col4
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 0 1 0
> G234065 0 0 1 0
> G234065 0 0 0 1
> "), header=TRUE, as.is=TRUE, row.names=NULL)
> closeAllConnections()
>
> # syntactic repackaging of Jim's basic approach
> library(plyr)
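> ddply(x, .(GENEID), function(df)
>   # with() needs the data frame df as its first argument;
>   # multiplying by 1L turns TRUE/FALSE into 1/0 and keeps the column names
>   with(df, c(col1 = any(col1), col2 = any(col2),
>              col3 = any(col3), col4 = any(col4)) * 1L)
> )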
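For comparison, here is a base R sketch using a different technique, rowsum(), which sums each column within each group; any positive sum is then turned into a 1. It assumes the columns hold only non-negative values such as 0/1 (with negative values, a zero sum would not imply "all zero"). The variable names sums and flags are illustrative.

```r
# Same data as above, read with read.table(text = ...) for self-containment.
x <- read.table(header = TRUE, text = "
GENEID col1 col2 col3 col4
G234064 1 0 0 0
G234064 1 0 0 0
G234064 1 0 0 0
G234064 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 0 1 0
G234065 0 0 1 0
G234065 0 0 0 1
")

# Sum each value column within each GENEID (rows come back sorted by group).
sums <- rowsum(as.matrix(x[, -1]), group = x$GENEID)

# A positive sum means at least one 1 in that group (assumes no negatives).
flags <- (sums > 0) * 1L

# Turn the row names back into a GENEID column.
res <- data.frame(GENEID = rownames(flags), flags, row.names = NULL)
print(res)
```

rowsum() is vectorised over all columns at once, which can help with many genes, at the cost of the non-negativity assumption.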

You can do this a little more succinctly with numcolwise(), the variant of colwise() that applies a function only to numeric columns:

any_1 <- function(x) as.integer(any(x))
ddply(x, "GENEID", numcolwise(any_1))
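Roughly what numcolwise(any_1) does inside each group can be sketched in base R with split() and sapply(). This illustrates the idea only; it is not plyr's implementation, and the names num_cols and pieces are hypothetical.

```r
x <- read.table(header = TRUE, text = "
GENEID col1 col2 col3 col4
G234064 1 0 0 0
G234064 1 0 0 0
G234064 1 0 0 0
G234064 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 0 1 0
G234065 0 0 1 0
G234065 0 0 0 1
")

any_1 <- function(x) as.integer(any(x))

# Mimic numcolwise(): keep only the numeric columns (GENEID is dropped).
num_cols <- vapply(x, is.numeric, logical(1))

# One data frame per GENEID; apply any_1 to each of its numeric columns.
pieces <- lapply(split(x[num_cols], x$GENEID), function(d) sapply(d, any_1))

# Stack the per-gene results; row names are the GENEIDs.
res <- do.call(rbind, pieces)
print(res)
```

Unlike ddply(), this returns a matrix with the gene IDs as row names rather than as a GENEID column.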

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
