Thank you! That was an easy and fast solution! May I post a follow-up question? (I am not sure whether this should rather be posted as a new question, but I am posting it here and can re-post it if this is the wrong place to ask.) I am ever so grateful for your help! /Anna
######################### FOLLOW-UP QUESTION ####################################

df1 <- data.frame(Identifier = c("M123.B23.VJHJ", "M123.B24.VJHJ", "M123.B23.VLKE",
                                 "M123.B23.HKJH", "M123.B24.LKJH"),
                  Sequence = c("ATATATATATA", "ATATATATATA", "ATATAGCATATA",
                               "ATATATAGGGTA", "ATCGCGCGAATA"))

# As a follow-up question:
# How can I split the identifier in df1 above into several columns based on the
# separating dots? The real data include thousands of rows.
# This is what I want it to look like in the end:

df1_solution <- data.frame(Identifier1 = c("M123", "M123", "M123", "M123", "M123"),
                           Identifier2 = c("B23", "B24", "B23", "B23", "B24"),
                           Identifier3 = c("VJHJ", "VJHJ", "VLKE", "HKJH", "LKJH"),
                           Sequence = c("ATATATATATA", "ATATATATATA", "ATATAGCATATA",
                                        "ATATATAGGGTA", "ATCGCGCGAATA"))

# I am very grateful for your help! I am no whiz at R, and everything I know
# is self-taught, so some basics can turn out to be real obstacles for me.
# /Anna

><((((º>`•. . • `•. .• `•. . ><((((º>`•. . • `•. .• `•. .><((((º>`•. . • `•. .• >`•. .><((((º>

Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin.
For paper mail:
Katzbachstr. 21
D-10965, Berlin
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b

><((((º>`•. . • `•. .• `•. . ><((((º>`•. . • `•. .• `•. .><((((º>`•. . • `•. .• >`•. .><((((º>

________________________________________
From: Ista Zahn [istaz...@gmail.com]
Sent: 13 October 2014 15:42
To: Anna Zakrisson Braeunlich
Cc: r-help@r-project.org
Subject: Re: [R] seqinr ?: Splitting a factor name into several columns. Dealing with metabarcoding data.
Hi Anna,

On Sun, Oct 12, 2014 at 3:24 AM, Anna Zakrisson Braeunlich
<anna.zakris...@su.se> wrote:
> Hi,
>
> I have a question about how to split a factor name into different columns. I have
> metabarcoding data and need to merge the FASTA file with the taxonomy and
> count-table files (data frames). To be able to do this merge, I need to isolate
> the common identifier, which unfortunately is baked in with a lot of other
> labels in the factor name, e.g.:
> sequence identifier:
> M01271_77_000000000.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2
>
> I want to split this name at every "." to get several columns:
> column1: M01271_77_000000000
> column2: A8J0P_1_1101_10150_1525
> column3: 1
> column4: 322519
> column5: sample_1
> column6: sample_2
>
> I must add that I have no influence on how these names are given. This is how
> they are supplied by the Illumina MiSeq. I just need to be able to deal with it.
>
> Here is some extremely simplified dummy data to further show the issue at
> hand:
>
> df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
>                   Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))
> df2 <- data.frame(cbind(B = 13:22, K = rnorm(10)),
>                   Q.identifierA.B4668726 = factor(rep(LETTERS[1:2], each = 5)))
>
> # I have metabarcoding data with one FASTA file, one count table and one
> # taxonomy file. The dummy data above just show the issue at hand. I want to
> # be able to merge my three original data frames (here, the dummy data are
> # only two data frames). The problem is that the only identifier that is
> # common to the data frames is "hidden" in the factor name, e.g.
> # Z.identifierA.B1298712 and Q.identifierA.B4668726. I hence need to be able
> # to split this name up into different columns to get "identifierA" alone as
> # one column name. Then I can merge the data frames.
> # How can I do this in R?
> # I know that it can be done in Excel, but I would like to produce a complete
> # R script to get a fast pipeline and avoid copy-and-paste errors.
> # This is what I want it to look like:
>
> df1.goal <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
>                        Z = factor(rep(LETTERS[1:2], each = 5)),
>                        identifierA = factor(rep(LETTERS[1:2], each = 5)),
>                        B1298712 = factor(rep(LETTERS[1:2], each = 5)))

Use strsplit to separate the components, something like

# Split the third column name at the literal dots ("\\." escapes the regex
# wildcard); [[1]] extracts c("Z", "identifierA", "B1298712").
separateNames <- strsplit(names(df1)[3], split = "\\.")[[1]]

# Add one new column per name component, each a copy of the original factor.
for (name in separateNames) {
  df1[[name]] <- df1[[3]]
}

# Drop the original combined column.
df1[[3]] <- NULL

Best,
Ista

>
> # Many thanks and with kind regards
> Anna Zakrisson
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
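For the follow-up question about df1$Identifier, Ista's loop works on a single column name, whereas here the dot-separated identifiers sit in the rows. A minimal base-R sketch, assuming every identifier has the same number of dot-separated parts, splits the whole column with one vectorised strsplit() call and stacks the pieces with do.call(rbind, ...). The names parts, parts_mat, df1_split, illumina_id and illumina_parts are illustrative choices, not from the thread.

```r
# Follow-up data from the question; stringsAsFactors = FALSE keeps the
# columns as character vectors (older R versions default to factors).
df1 <- data.frame(
  Identifier = c("M123.B23.VJHJ", "M123.B24.VJHJ", "M123.B23.VLKE",
                 "M123.B23.HKJH", "M123.B24.LKJH"),
  Sequence = c("ATATATATATA", "ATATATATATA", "ATATAGCATATA",
               "ATATATAGGGTA", "ATCGCGCGAATA"),
  stringsAsFactors = FALSE
)

# strsplit() rejects factors, so convert first; fixed = TRUE treats "."
# literally instead of as a regular-expression wildcard.
parts <- strsplit(as.character(df1$Identifier), split = ".", fixed = TRUE)

# Assumption of this sketch: every identifier has the same number of parts.
n_parts <- unique(lengths(parts))
stopifnot(length(n_parts) == 1)

# Stack the pieces into a character matrix with one row per identifier.
parts_mat <- do.call(rbind, parts)
colnames(parts_mat) <- paste0("Identifier", seq_len(n_parts))

# Put the new identifier columns in front of the remaining Sequence column.
df1_split <- data.frame(parts_mat, Sequence = df1$Sequence,
                        stringsAsFactors = FALSE)
print(df1_split)

# The same call splits the long Illumina name into its six parts.
illumina_id <- "M01271_77_000000000.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2"
illumina_parts <- strsplit(illumina_id, split = ".", fixed = TRUE)[[1]]
print(illumina_parts)
```

Because strsplit() works on the whole column at once, this should handle thousands of rows without an explicit loop. The length check matters because rbind() recycles shorter vectors instead of failing when identifiers have different numbers of parts. The tidyr package's separate() function is a package-based alternative for the same task.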