Re: [R-sig-phylo] Removing columns containing "N" in DNA alignment

Andreas Kolter Fri, 27 Oct 2017 08:03:30 -0700

Hello V.

Since you mention columns, I assume you are working with an alignment, right? In an alignment all sequences have the same length, so you can convert it with as.matrix.


Like this?

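library(ape)
# maximum allowed proportion of n's per column
thresh <- 0.005  # 0.5%
seq <- as.matrix(seq)  # DNAbin alignment as a matrix (rows = sequences)

# proportion of n's in each alignment column
n_prop <- colMeans(as.character(seq) == "n")

# keep only the columns whose proportion of n's does not exceed the threshold
seq[, n_prop <= thresh]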
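
To check the idea on something small, here is a minimal sketch on a toy character matrix. It uses base R only; keep_columns is a hypothetical helper name of mine (it is not part of ape or ips), and the toy alignment and the 25% threshold are made up for illustration.

```r
# Hypothetical helper (not part of ape or ips): returns a logical vector
# flagging the alignment columns whose proportion of `char` is at or
# below `thresh`.
keep_columns <- function(aln, thresh = 0.005, char = "n") {
  stopifnot(is.matrix(aln), is.character(aln))
  # compare case-insensitively so that "N" and "n" are both counted
  prop <- colMeans(tolower(aln) == tolower(char))
  prop <= thresh
}

# Toy alignment: 4 sequences (rows) by 5 positions (columns)
aln <- rbind(
  s1 = c("a", "n", "g", "t", "n"),
  s2 = c("a", "n", "g", "t", "c"),
  s3 = c("a", "c", "g", "n", "c"),
  s4 = c("a", "c", "g", "t", "c")
)

# Allow at most 25% n's per column
keep <- keep_columns(aln, thresh = 0.25)
filtered <- aln[, keep, drop = FALSE]
```

For a DNAbin alignment x, I would expect the same index to work as keep <- keep_columns(as.character(as.matrix(x))) followed by x[, keep], assuming ape is loaded so that the DNAbin methods are used.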

Greetings,
Andreas


On 2017-10-27 16:25, Vojtěch Zeisek wrote:

Hello,
I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells.
They delete columns containing missing values, but I also need to delete
columns containing the base "N" (all columns with a proportion of Ns over a
certain threshold).
Actually, ips::deleteEmptyCells has the option nset=c("-", "n", "?"), so it is
supposed to remove columns/rows containing only the given characters, but if I
use it and export the data (ape::write.dna or ape::write.nexus.data), some
samples consist only of N characters...
The DNAbin object being processed was originally imported from VCF using vcfR
(read.vcfR(file="my.vcf")) and converted with vcfR2DNAbin(x=myvcf,
consensus=TRUE, extract.haps=FALSE, unphased_as_NA=FALSE).
I checked the source code of the above functions, but they seem to only count
NAs and then drop the respective columns. And as sequences in DNAbin are stored
in binary format, I'm struggling a bit here... :(
Any idea how to remove columns with a given proportion of "N" in the sequences?
Sincerely,
V.

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

