Hello V.
Since you mention columns, I assume you are working with an alignment,
right? In an alignment all sequences have the same length, so you can
convert it with as.matrix.
Something like this?
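library(ape)
# maximum proportion of n's allowed per column
thresh <- 0.005 # 0.5%
seq <- as.matrix(seq)
# as.character() turns the raw DNAbin codes into lower-case letters
chars <- as.character(seq)
# proportion of sequences (rows) that have "n" in each column
n_frac <- colMeans(chars == "n")
# keep only the columns at or below the threshold
seq[, n_frac <= thresh]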
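To make the idea easy to try without any data at hand, here is a minimal sketch on a toy character matrix in base R. The helper name drop_n_columns and the toy alignment are made up for illustration; for a real DNAbin alignment x, I am assuming you would first convert it with as.character(as.matrix(x)).

```r
# Hypothetical helper (not part of ape or ips): drop alignment columns
# whose proportion of "n" exceeds `thresh`.
# `aln` is a character matrix with sequences in rows and sites in columns,
# e.g. as.character(as.matrix(x)) for a DNAbin alignment x (assumption).
drop_n_columns <- function(aln, thresh = 0.005) {
  n_frac <- colMeans(aln == "n")          # proportion of "n" per column
  aln[, n_frac <= thresh, drop = FALSE]   # keep columns within the limit
}

# Toy alignment: 4 sequences, 5 sites (invented data)
aln <- rbind(
  s1 = c("a", "n", "g", "t", "n"),
  s2 = c("a", "n", "g", "t", "c"),
  s3 = c("a", "c", "g", "n", "c"),
  s4 = c("a", "c", "g", "t", "c")
)

# Column 2 has 50% "n" and is dropped; the others have at most 25%
filtered <- drop_n_columns(aln, thresh = 0.25)
```

The comparison uses the proportion per column (colMeans), so the threshold is relative to the number of sequences, not the number of sites.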
Greetings,
Andreas
On 2017-10-27 16:25, Vojtěch Zeisek wrote:
Hello,
I checked ape::del.colgapsonly, ips::deleteGaps and
ips::deleteEmptyCells. They delete columns containing missing values,
but I also need to delete columns containing the base "N" (all columns
where the proportion of Ns is over a certain threshold).
Actually, ips::deleteEmptyCells has the option nset=c("-", "n", "?"),
so it is supposed to remove columns/rows containing only the given
characters, but if I use it and export the data (ape::write.dna or
ape::write.nexus.data), some samples consist only of N characters...
The DNAbin object being processed was originally imported from VCF
using vcfR (read.vcfR(file="my.vcf")) and converted with
vcfR2DNAbin(x=myvcf, consensus=TRUE, extract.haps=FALSE,
unphased_as_NA=FALSE).
I checked the source code of the above functions, but they seem to only
count NAs and then drop the respective columns. And as sequences in
DNAbin are stored in binary format, I'm struggling a bit here... :(
Any idea how to remove columns with a given proportion of "N" in the
sequences?
Sincerely,
V.
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/