On May 25, 4:46 am, Stefan <ste...@inizio.se> wrote:
> analyst41 <at> hotmail.com <analyst41 <at> hotmail.com> writes:
>
> > I have a data set that has some comma separated strings in each row.
> > I'd like to create a vector consisting of all distinct strings that
> > occur. The number of strings in each row may vary.
> >
> > Thanks for any help.
>
> #
> #
> # Some data:
> d <- data.frame(id = 1:5,
>                 text = c('one,two',
>                          'two,three,three,four',
>                          'one,three,three,five',
>                          'five,five,five,five',
>                          'one,two,three'),
>                 stringsAsFactors = FALSE
>                 )
> #
> #
> # A function. I'm not a black belt at this, so there
> # is probably a more efficient way of writing this.
> fcn <- function(x){
>     a <- strsplit(x, ',')   # Split the string by comma
>     unique(a[[1]])          # Uniquify the vector
> }
>
> #
> #
> # Use the function with sapply.
> sapply(d[,2], fcn)
>
Thanks, but this solves a slightly different problem: it outputs the
unique values in each row. I want the unique values across the whole
data frame. In this case the output should be a single vector:
c("one","two","three","four","five").

> ______________________________________________
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
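
One possible way to get the distinct strings across all rows, sketched against the example data from the quoted message: split every row on commas, flatten the per-row results into one vector, and only then take the unique values. The name all_strings is made up for illustration, and fixed = TRUE assumes the separator is always a literal comma with no surrounding spaces.

```r
# Example data, copied from the quoted message
d <- data.frame(id = 1:5,
                text = c('one,two',
                         'two,three,three,four',
                         'one,three,three,five',
                         'five,five,five,five',
                         'one,two,three'),
                stringsAsFactors = FALSE)

# strsplit() is vectorised: it returns a list with one character
# vector per row. unlist() flattens that list into a single vector,
# and unique() keeps the first occurrence of each string, in order.
all_strings <- unique(unlist(strsplit(d$text, ',', fixed = TRUE)))

all_strings
# [1] "one"   "two"   "three" "four"  "five"
```

Because strsplit() already handles the whole column at once, no sapply() call is needed here. If the strings may contain spaces after the commas, trimming them with trimws() before unique() would probably be required.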