Hi Jarrett, ----- Le 28 Juin 21, à 5:08, Jarrett Phillips phillipsjarre...@gmail.com a écrit : > Hello All, > > I have a COI FASTA alignment file with 11926 sequences spanning 668 species. > > According to my code, a total of 361 species are represented by 6 or more > sequences. > > I need to extract these 361 species (along with all their associated > sequences) from the alignment but am having issues. > > Here is my code: > > library(ape) > library(pegas) > library(spider) > library(stringr) > > seqs <- read.dna(file = file.choose(), format = "fasta") # import data > and convert to matrix > seqs.mat <- as.matrix(seqs) > > spp <- substr(dimnames(seqs.mat)[[1]], 1, 50) # extract sequence labels > > res <- str_remove(spp, "^[^|]+\\|") # remove BOLD Process ID
I suggest you store the output of table() in a distinct object so you keep 'res': tab.res <- tab(res) Besides, table() can be a bit long so it's better to avoid duplicating its call. Then you can do: sel <- tab.res >= 6 species.to.keep <- names(tab.res)[sel] # which() is not necessary here length(species.to.keep) # should be 361 i <- which(res %in% species.to.keep) write.dna(seqs[i, ], ..... This way, you can change your selection criterion easily. Suppose you want to select the species with at least 5 sequences but no more than 10, then you'd only need to change the first line: sel <- tab.res >= 5 & tab.res <= 10 HTH. Cheers, Emmanuel > res <- table(res)[which(table(res) >= 6)] # species with 6 or more > records > names(res) # 361 species names > > > The problem is that `names(res)' contains unique species names.rather than > repeated species names (according to the BOLD process ID). > > I also need the sequences. There are between 6-421 sequences across the 361 > species (11143 sequences total). > > Does anyone have any ideas on next steps here? > > Once this is done, I would then write the sequences to a file via > `write.dna()` > > If clarification is needed, please let me know. > > > Thanks. > > Cheers, > > Jarrett > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/