Hello All, I have a COI FASTA alignment file with 11926 sequences spanning 668 species.
According to my code, a total of 361 species are represented by 6 or more sequences. I need to extract these 361 species (along with all their associated sequences) from the alignment but am having issues.

Here is my code:

    library(ape)
    library(pegas)
    library(spider)
    library(stringr)

    seqs <- read.dna(file = file.choose(), format = "fasta") # import data and convert to matrix
    seqs.mat <- as.matrix(seqs)

    spp <- substr(dimnames(seqs.mat)[[1]], 1, 50) # extract sequence labels
    res <- str_remove(spp, "^[^|]+\\|") # remove BOLD Process ID
    res <- table(res)[which(table(res) >= 6)] # species with 6 or more records
    names(res) # 361 species names

The problem is that `names(res)` contains each species name only once, rather than one name per sequence (i.e., repeated for every BOLD Process ID). I also need the sequences themselves. Each of the 361 species has between 6 and 421 sequences (11143 sequences total).

Does anyone have any ideas on next steps here? Once this is done, I would then write the sequences to a file via `write.dna()`.

If clarification is needed, please let me know.

Thanks.

Cheers,

Jarrett

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
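
One possible next step, given as a minimal sketch rather than a tested solution for this dataset: keep the per-sequence species vector and the per-species counts in separate objects, then build a logical index over sequences with `%in%`. The toy matrix, the labels in it, the object names `counts`, `keep.spp`, `keep`, and the output file name are assumptions made up for illustration. The labels are assumed to follow a "ProcessID|Species name" pattern with nothing after the species name.

```r
# Toy stand-in for the real alignment: a character matrix whose row names
# follow the assumed "ProcessID|Species name" pattern.
labels <- c(paste0("AAA", 1:7, "|Genus alpha"),
            paste0("BBB", 1:2, "|Genus beta"),
            paste0("CCC", 1:6, "|Genus gamma"))
toy.mat <- matrix("a", nrow = length(labels), ncol = 4,
                  dimnames = list(labels, NULL))

min.n <- 6  # threshold from the question

# One species name per sequence (same regex as in the question, via base R sub()).
spp <- sub("^[^|]+\\|", "", rownames(toy.mat))

# Counts per species, stored separately so that spp is not overwritten.
counts <- table(spp)
keep.spp <- names(counts)[counts >= min.n]

# Logical index over sequences (rows), not over species.
keep <- spp %in% keep.spp
toy.sub <- toy.mat[keep, , drop = FALSE]

# With the real data, the same index should apply to the DNAbin matrix:
# seqs.sub <- seqs.mat[keep, ]
# write.dna(seqs.sub, file = "coi_min6.fasta", format = "fasta")  # hypothetical file name
```

In the toy example the two "Genus beta" rows are dropped and the remaining rows keep their full original labels, so the Process IDs are still attached to each sequence. On the real alignment, `sum(keep)` would be expected to equal the 11143 sequences mentioned above if the labels match the assumed pattern.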