Hi all, I have relatively big data frames (> 10000 rows by 80 columns) that need to be passed to "merge". This works marvelously well in general, but some fields of the data frames contain multiple ";"-separated values encoded as a single character string in no defined order, so fields holding the same values do not match each other.
Example:

> frame1[1,1]
[1] "some;thing"
> frame2[2,1]
[1] "thing;some"

In order to enable merging/duplicate identification of columns containing these strings, I wrote the following function, which passes through the rows one by one, identifies ";"-containing cells, then splits and re-sorts them.

ResortCombinedFields <- function(dframe){
  if(!is.data.frame(dframe)){
    stop("\"ResortCombinedFields\" input needs to be a data frame.")
  }
  # seq_len() avoids iterating over c(1, 0) when the data frame has no rows
  for(row in seq_len(nrow(dframe))){
    # columns of this row whose value contains ";"
    for(mef in grep(";",dframe[row,])){
      dframe[row,mef] <- paste(sort(unlist(strsplit(dframe[row,mef],";"))),collapse=";")
    }
  }
  return(dframe)
}

This works fine, but is horribly inefficient. How might this be tackled more elegantly?

Thanks for any input,
Joh

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
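
One possible direction, offered as a rough sketch rather than a benchmarked solution: work column by column instead of cell by cell, so that grep() and strsplit() each run once per column on a whole vector. The function name SortCombinedFields, the "sep" argument and the toy data frames below are made up for illustration. The sketch assumes the affected columns are character vectors (factor columns would need as.character() first) and skips everything else.

```r
# Hypothetical helper (name not from the original post): sort the
# ";"-separated tokens of every affected cell, one column at a time.
SortCombinedFields <- function(dframe, sep = ";") {
  stopifnot(is.data.frame(dframe))
  for (j in seq_along(dframe)) {
    values <- dframe[[j]]
    # Assumption: only character columns hold combined fields.
    if (!is.character(values)) next
    # Indices of cells that contain the separator (NA cells never match).
    hit <- grep(sep, values, fixed = TRUE)
    if (length(hit) == 0L) next
    # Split all matching cells at once, then sort and re-join each one.
    parts <- strsplit(values[hit], sep, fixed = TRUE)
    values[hit] <- vapply(parts,
                          function(p) paste(sort(p), collapse = sep),
                          character(1))
    dframe[[j]] <- values
  }
  dframe
}

# Toy data mirroring the example in the post (made-up values).
frame1 <- data.frame(key = c("some;thing", "plain"), x = 1:2,
                     stringsAsFactors = FALSE)
frame2 <- data.frame(key = c("other", "thing;some"), y = 3:4,
                     stringsAsFactors = FALSE)

merged <- merge(SortCombinedFields(frame1), SortCombinedFields(frame2),
                by = "key")
print(merged)
```

The per-cell work (sort and paste) is still done in R code, but only for cells that actually contain the separator, and the repeated row-wise indexing of the data frame, which is usually the expensive part, is avoided.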