The following works: a double backslash removes the "or" (alternation) meaning of | in a regex, so the pipe is matched as a literal character. (Bill Dunlap showed that you don't need sapply for it to work.)
xs <- "this is | string"
xsv <- paste(xs, 1:10)
strsplit(xsv, "\\|")

On Oct 23, 3:50 pm, Jonathan Greenberg <greenb...@ucdavis.edu> wrote:
> William et al:
>
> Thanks! I think I have a somewhat more complicated issue due to the
> type of string I'm using -- the split is " | " (space pipe space) -- how
> do I code that based on your sub code below? Using " | *" doesn't seem
> to be working. Thanks!
>
> --j
>
> William Dunlap wrote:
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org
> >> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> >> Sent: Thursday, October 22, 2009 7:35 PM
> >> To: r-help
> >> Subject: [R] splitting a vector of strings...
> >>
> >> Quick question -- if I have a vector of strings that I'd like to
> >> split into two new vectors based on a substring that is inside of
> >> each string, what is the most efficient way to do this? The
> >> substring that I want to split on is multiple characters, if that
> >> matters, and it is contained in every element of the character
> >> vector.
> >
> > strsplit and sub can both be used for this. If you know
> > the string will be split into 2 parts then 2 calls to sub
> > with slightly different patterns will do it. strsplit requires
> > less fiddling with the pattern and is handier when the number
> > of parts is variable or large. strsplit's output often needs to
> > be rearranged for convenient use.
> >
> > E.g., I made 100,000 strings with a 'qaz' in their middles with
> >    x<-paste("X",sample(1e5),sep="")
> >    y<-sub("X","Y",x)
> >    xy<-paste(x,y,sep="qaz")
> > and split them by the 'qaz' in two ways:
> >    system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
> >    #   user  system elapsed
> >    #   0.22    0.00    0.21
> >    system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))})
> >    #   user  system elapsed
> >    #   2.42    0.00    2.20
> >    identical(ret1,ret2)
> >    #[1] TRUE
> >    identical(ret1$x,x) && identical(ret1$y,y)
> >    #[1] TRUE
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >> --j
> >>
> >> --
> >> Jonathan A. Greenberg, PhD
> >> Postdoctoral Scholar
> >> Center for Spatial Technologies and Remote Sensing (CSTARS)
> >> University of California, Davis
> >> One Shields Avenue
> >> The Barn, Room 250N
> >> Davis, CA 95616
> >> Phone: 415-763-5476
> >> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
> >>
> >> ______________________________________________
> >> r-h...@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
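
For the " | " (space pipe space) separator asked about in the quoted message, here is a minimal sketch rather than a definitive answer. Unescaped, | is alternation, so the pattern " | *" reads as "a space, or zero or more spaces" and never matches the pipe itself. The sample strings and the names parts_fixed, parts_regex, left and right are made up for illustration; strsplit's fixed argument and vapply are standard base R.

```r
# Hypothetical sample data: ten strings containing the " | " separator.
xsv <- paste("left part | right part", 1:10)

# fixed = TRUE treats the split string literally, so | needs no escaping.
parts_fixed <- strsplit(xsv, " | ", fixed = TRUE)

# Equivalent regular-expression form with the pipe escaped.
parts_regex <- strsplit(xsv, " \\| ")

# Rearrange the list into two character vectors, one per side of the split.
# This assumes every string contains the separator exactly once.
left  <- vapply(parts_fixed, `[`, character(1), 1)
right <- vapply(parts_fixed, `[`, character(1), 2)
```

vapply is used here instead of unlist(lapply(...)) only because it checks that each piece is a single string; either should give the same two vectors for input like this.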
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.