That was still slower and doesn't quite give what was requested:

> cbind(F1,utils::strcapture("([^_]*)_(.*)", F1$text, proto=data.frame(Before_=character(), After_=character())))
  ID1 ID2  text Before_ After_
1  A1  B1  NONE    <NA>   <NA>
2  A1  B1 cf_12      cf     12
3  A1  B1  NONE    <NA>   <NA>
4  A2  B2 X2_25      X2     25
5  A2  B3 fd_15      fd     15

> system.time({
+   cbind(F2,utils::strcapture("([^_]*)_(.*)", F2$text, proto=data.frame(Before_=character(), After_=character())))
+ }
+ )
   user  system elapsed
 32.712   0.736  33.587

Cheers,
Bert

On Tue, Sep 22, 2020 at 5:45 PM Bill Dunlap <williamwdun...@gmail.com> wrote:

> Another way to make columns out of the stuff before and after the
> underscore, with NAs if there is no underscore, is
>
>   utils::strcapture("([^_]*)_(.*)", F1$text,
>       proto=data.frame(Before_=character(), After_=character()))
>
> -Bill
>
> On Tue, Sep 22, 2020 at 4:25 PM Bert Gunter <bgunter.4...@gmail.com>
> wrote:
>
>> To be clear, I think Rui's solution is perfectly fine and probably better
>> than what I offer below. But just for fun, I wanted to do it without the
>> lapply(). Here is one way. I think my comments suffice to explain.
>>
>> > ## which are the non "_" indices?
>> > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
>> > ## paste "_." to these
>> > F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
>> > ## Now strsplit() and unlist() them to get a vector
>> > z <- unlist(strsplit(F1$text, "_"))
>> > ## now cbind() to the data frame
>> > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>> > F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>> > ## You can change the names of the 2 columns yourself
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarra...@sapo.pt>
>> wrote:
>>
>> > Hello,
>> >
>> > A base R solution with strsplit, like in your code.
>> >
>> > F1$Y1 <- +grepl("_", F1$text)
>> >
>> > tmp <- strsplit(as.character(F1$text), "_")
>> > tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>> > tmp <- do.call(rbind, tmp)
>> > colnames(tmp) <- c("X1", "X2")
>> > F1 <- cbind(F1[-3], tmp)    # remove the original column
>> > rm(tmp)
>> >
>> > F1
>> > #  ID1 ID2 Y1   X1 X2
>> > #1  A1  B1  0 NONE  .
>> > #2  A1  B1  1   cf 12
>> > #3  A1  B1  0 NONE  .
>> > #4  A2  B2  1   X2 25
>> > #5  A2  B3  1   fd 15
>> >
>> >
>> > Note that cbind dispatches on F1, an object of class "data.frame".
>> > Therefore it's the method cbind.data.frame that is called and the result
>> > is also a df, though tmp is a "matrix".
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> >
>> > At 20:07 on 22/09/20, Rui Barradas wrote:
>> > > Hello,
>> > >
>> > > Something like this?
>> > >
>> > >
>> > > F1$Y1 <- +grepl("_", F1$text)
>> > > F1 <- F1[c(1, 2, 4, 3)]
>> > > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_",
>> > >                       fill = "right")
>> > > F1
>> > >
>> > >
>> > > Hope this helps,
>> > >
>> > > Rui Barradas
>> > >
>> > > At 19:55 on 22/09/20, Val wrote:
>> > >> Hi All,
>> > >>
>> > >> I am trying to create new columns based on another column string
>> > >> content. First I want to identify rows that contain a particular
>> > >> string. If it contains, I want to split the string and create two
>> > >> variables.
>> > >>
>> > >> Here is my sample of data.
>> > >> F1<-read.table(text="ID1 ID2 text
>> > >> A1 B1 NONE
>> > >> A1 B1 cf_12
>> > >> A1 B1 NONE
>> > >> A2 B2 X2_25
>> > >> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F)
>> > >>
>> > >> If the variable "text" contains this "_" I want to create an indicator
>> > >> variable as shown below
>> > >>
>> > >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>> > >>
>> > >>
>> > >> Then I want to split that string into two, before "_" and after "_",
>> > >> and create two variables as shown below
>> > >> x1= strsplit(as.character(F1$text),'_',2)
>> > >>
>> > >> My problem is how to combine this with the original data frame. The
>> > >> desired output is shown below,
>> > >>
>> > >>
>> > >> ID1 ID2 Y1 X1   X2
>> > >> A1  B1  0  NONE .
>> > >> A1  B1  1  cf   12
>> > >> A1  B1  0  NONE .
>> > >> A2  B2  1  X2   25
>> > >> A2  B3  1  fd   15
>> > >>
>> > >> Any help?
>> > >> Thank you.
>> > >>
>> > >> ______________________________________________
>> > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide
>> > >> http://www.R-project.org/posting-guide.html
>> > >> and provide commented, minimal, self-contained, reproducible code.
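
For reference, the requested columns (Y1, X1, X2, with "." where there is no underscore) can also be built without lapply() or a regex capture, using vectorized sub() calls. This is only a minimal sketch and was not posted in the thread; the names has_us and out are illustrative, and it assumes the split should happen at the first underscore only. It has not been timed against the solutions above.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# TRUE where text contains a literal underscore
has_us <- grepl("_", F1$text, fixed = TRUE)

# Part before the first underscore (the whole string if there is none)
X1 <- sub("_.*$", "", F1$text)

# Part after the first underscore, or "." if there is none
X2 <- ifelse(has_us, sub("^[^_]*_", "", F1$text), ".")

# Assemble the result, dropping the original text column
out <- data.frame(F1[c("ID1", "ID2")],
                  Y1 = as.integer(has_us),
                  X1 = X1,
                  X2 = X2,
                  stringsAsFactors = FALSE)
out
```

X2 stays a character column because of the "." placeholder; replace "." with NA and call as.numeric() if numbers are needed downstream.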