On 05/11/2010 02:05 PM, Greg Orm wrote:
Dear r-help list members,

I am quite new to R, and hope to seek advice from you about a problem I have
been cracking my head over. Apologies if this seems like a simple problem.

I have essentially two tables. The first (Table A) is a standard patient
clinicopathological data table, where rows correspond to patient IDs and
columns correspond to clinical features. Records in this table are stored as
1 or 0 (denoting presence). An example is provided below.

The second (Table B) is a table that represents a 'key' to Table A. This
Table B has a category field, as well as a feature field which links to the
Table B. Unfortunately, this is a one-to-many relationship, and the numbers
in the feature field represent the respective columns in Table A, delimited
by semicolons. So in the example below, I need to collapse the data in Table
B into a table with nrow equivalent to the number of categories and ncol =
number of patients. The collapsing of each categoriy, will be based on a
Boolean OR, or the equivalent ANY in R (so long as 1 of the features is
true, the resulting outcome will be true)

data.table.a<-
matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10)))
data.table.b<- data.frame
(ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8"))

In the example tables above, we hope to collapse the features by category -
so the final desired output will be a total of 10 patients as rows, and a
total of 5 categories as columns. (after collapsing the features by a
Boolean OR). (i.e. if any of the features in the category are present, it
will be a TRUE).

I apologize for the apparently awkward table, but this is what I had to
start with. I tried expanding data.table.b$FEATURE using strsplit, which
resulted in a list, and then I got stuck there for a long time.

Hi Greg,
Messy, but I think it works.

feature2category<-function(dta,dtb) {
 categories<-unique(dtb$CATEGORY)
 category.table<-matrix(0,nrow=dim(dta)[1],
  ncol=length(categories))
 colnames(category.table)<-
  paste("Category",1:length(categories))
 for(patrow in 1:dim(dta)[1]) {
  for(catrow in 1:dim(dtb)[1]) {
   acols<-as.numeric(unlist(strsplit(
    as.character(dtb[catrow,"FEATURE"]),";")))
   cat("patrow",patrow,"catrow",catrow,acols,"\n")
   category.table[patrow,dtb[catrow,"CATEGORY"]]<-
    as.logical(sum(dta[patrow,acols]))
  }
 }
 return(as.data.frame(category.table))
}

Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to