Hello V.
Since you mention columns, I assume you are working with an alignment, right? In an alignment all sequences have the same length, so you can convert it with as.matrix.

Like this?

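library(ape)
# maximum proportion of n's allowed per column
thresh <- 0.005  # 0.5%
seq <- as.matrix(seq)             # DNAbin matrix: rows = samples, columns = sites
chars <- as.character(seq)        # decode the binary format to lowercase letters
n_prop <- colMeans(chars == "n")  # proportion of n's in each column
seq <- seq[, n_prop <= thresh]    # keep only columns at or below the threshold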
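
To illustrate filtering columns by their proportion of Ns, here is a minimal, self-contained sketch on a toy alignment. The helper name drop_n_columns and the toy data are made up for illustration and are not part of ape or ips. The sketch assumes that Ns appear as lowercase "n" once the DNAbin object is converted with as.character.

```r
library(ape)

# Hypothetical helper (not part of ape or ips): drop alignment columns
# whose proportion of "n" exceeds thresh.
drop_n_columns <- function(aln, thresh = 0.005) {
  aln <- as.matrix(aln)             # requires sequences of equal length
  chars <- as.character(aln)        # lowercase character matrix
  n_prop <- colMeans(chars == "n")  # proportion of Ns per column
  aln[, n_prop <= thresh]           # "[.DNAbin" keeps the matrix form
}

# Toy alignment: 4 sequences, 5 sites.
# Site 2 is all N; site 4 has a single N (proportion 0.25).
toy <- as.DNAbin(rbind(
  s1 = c("a", "n", "c", "g", "t"),
  s2 = c("a", "n", "c", "n", "t"),
  s3 = c("a", "n", "c", "g", "t"),
  s4 = c("a", "n", "c", "g", "t")
))

filtered <- drop_n_columns(toy, thresh = 0.5)
dim(filtered)
```

With thresh = 0.5 only the all-N site is removed. A stricter threshold such as 0.2 would also remove site 4.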

Greetings,
Andreas


On 2017-10-27 16:25, Vojtěch Zeisek wrote:
Hello,
I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells. They delete columns containing missing values, but I also need to delete columns containing the base "N" (all columns where the proportion of Ns exceeds a certain threshold).
Actually, ips::deleteEmptyCells has the option nset=c("-", "n", "?"), so it is supposed to remove columns/rows containing only the given characters, but if I use it and export the data (ape::write.dna or ape::write.nexus.data), some samples consist only of N characters...
The DNAbin object being processed was originally imported from VCF using vcfR (read.vcfR(file="my.vcf")) and converted with vcfR2DNAbin(x=myvcf, consensus=TRUE, extract.haps=FALSE, unphased_as_NA=FALSE).
I checked the source code of the above functions, but they seem to only count NAs and then drop the respective columns. And as sequences in DNAbin are stored in binary format, I'm struggling a bit here... :(
Any idea how to remove columns with a given proportion of "N" in the sequences?
Sincerely,
V.

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
