What is the delimiter in the input data? Is it tab, space, etc.? Will it be the same for the output data that you will use as R input?
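
To make the delimiter question concrete, here is a minimal sketch in R. It is an illustration rather than code from the thread: the tab delimiter and the temporary output file are assumptions, and the sample rows are the ones Val posted below.

```r
# Sample rows from the thread, written tab-delimited purely as an assumption.
input <- paste(
  "ID1\tID2\ttext",
  "A1\tB1\tNONE",
  "A1\tB1\tcf_12",
  "A1\tB1\tNONE",
  "A2\tB2\tX2_25",
  "A2\tB3\tfd_15",
  sep = "\n"
)

# sep = "\t" states the delimiter explicitly. The default, sep = "",
# treats any run of spaces or tabs as a single separator.
F1 <- read.table(text = input, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)

# Write the data back out with the same delimiter, so the file can be
# read again with the same read.table() arguments.
out_file <- tempfile(fileext = ".txt")  # hypothetical output location
write.table(F1, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

F2 <- read.table(out_file, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)
```

If the real input is space-delimited instead, only the `sep` argument changes.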
LMH

Val wrote:
> Thank you all for the help!
>
> LMH, Yes I would like to see the alternative. I am using this for a
> large data set and if the alternative is more efficient than this
> then I would be happy.
>
> On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>>
>> To be clear, I think Rui's solution is perfectly fine and probably better
>> than what I offer below. But just for fun, I wanted to do it without the
>> lapply(). Here is one way. I think my comments suffice to explain.
>>
>>> ## which are the non "_" indices?
>>> wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
>>> ## paste "_." to these
>>> F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")
>>> ## Now strsplit() and unlist() them to get a vector
>>> z <- unlist(strsplit(F1$text, "_"))
>>> ## now cbind() to the data frame
>>> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>>> F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>>> ## You can change the names of the 2 columns yourself
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> A base R solution with strsplit, like in your code.
>>>
>>> F1$Y1 <- +grepl("_", F1$text)
>>>
>>> tmp <- strsplit(as.character(F1$text), "_")
>>> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>>> tmp <- do.call(rbind, tmp)
>>> colnames(tmp) <- c("X1", "X2")
>>> F1 <- cbind(F1[-3], tmp) # remove the original column
>>> rm(tmp)
>>>
>>> F1
>>> #  ID1 ID2 Y1   X1 X2
>>> #1  A1  B1  0 NONE  .
>>> #2  A1  B1  1   cf 12
>>> #3  A1  B1  0 NONE  .
>>> #4  A2  B2  1   X2 25
>>> #5  A2  B3  1   fd 15
>>>
>>>
>>> Note that cbind dispatches on F1, an object of class "data.frame".
>>> Therefore it's the method cbind.data.frame that is called and the result
>>> is also a df, though tmp is a "matrix".
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> At 20:07 on 22/09/20, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> Something like this?
>>>>
>>>>
>>>> F1$Y1 <- +grepl("_", F1$text)
>>>> F1 <- F1[c(1, 2, 4, 3)]
>>>> F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")
>>>> F1
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> At 19:55 on 22/09/20, Val wrote:
>>>>> Hi All,
>>>>>
>>>>> I am trying to create new columns based on another column's string
>>>>> content. First I want to identify rows that contain a particular
>>>>> string. If a row contains it, I want to split the string and create two
>>>>> variables.
>>>>>
>>>>> Here is my sample of data.
>>>>> F1<-read.table(text="ID1 ID2 text
>>>>> A1 B1 NONE
>>>>> A1 B1 cf_12
>>>>> A1 B1 NONE
>>>>> A2 B2 X2_25
>>>>> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=FALSE)
>>>>> If the variable "text" contains "_", I want to create an indicator
>>>>> variable as shown below
>>>>>
>>>>> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>>>>>
>>>>>
>>>>> Then I want to split that string into two, before "_" and after "_",
>>>>> and create two variables as shown below
>>>>> x1= strsplit(as.character(F1$text),'_',2)
>>>>>
>>>>> My problem is how to combine this with the original data frame. The
>>>>> desired output is shown below,
>>>>>
>>>>>
>>>>> ID1 ID2 Y1 X1   X2
>>>>> A1  B1  0  NONE .
>>>>> A1  B1  1  cf   12
>>>>> A1  B1  0  NONE .
>>>>> A2  B2  1  X2   25
>>>>> A2  B3  1  fd   15
>>>>>
>>>>> Any help?
>>>>> Thank you.
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.