[R-sig-phylo] Removing columns containing "N" in DNA alignment

Vojtěch Zeisek Fri, 27 Oct 2017 07:25:35 -0700

Hello,
I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells. 
They delete columns containing missing values, but I also need to delete 
columns containing the base "N" (all columns where the proportion of Ns exceeds 
a certain threshold).
Actually, ips::deleteEmptyCells has the option nset=c("-", "n", "?"), so it is 
supposed to remove columns/rows containing only the given characters, but when I 
use it and export the data (ape::write.dna or ape::write.nexus.data), some samples 
still consist only of N characters...
The DNAbin object being processed was originally imported from VCF using vcfR 
(read.vcfR(file="my.vcf") and converted: vcfR2DNAbin(x=myvcf, consensus=TRUE, 
extract.haps=FALSE, unphased_as_NA=FALSE)).
I checked the source code of the above functions, but they seem to only count NAs 
and then drop the respective columns. And as sequences in DNAbin are stored in 
binary format, I'm struggling a bit here... :(
Any idea how to remove columns with a given proportion of "N" in the sequences?
Sincerely,
V.
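
For illustration, here is a minimal sketch of one way the filtering asked about
above could be done. It is not taken from ape or ips: the helper name
drop_n_columns and its threshold argument are hypothetical. It assumes the
alignment is a DNAbin matrix (not a list) and relies on as.character()
returning lower-case bases for DNAbin objects, so the binary encoding never has
to be handled directly.

```r
library(ape)

# Hypothetical helper: drop alignment columns whose proportion of "N"
# is greater than `threshold` (a value between 0 and 1).
drop_n_columns <- function(aln, threshold = 0.5) {
  stopifnot(inherits(aln, "DNAbin"), is.matrix(aln))
  chars <- as.character(aln)         # character matrix, bases in lower case
  n_prop <- colMeans(chars == "n")   # per-column proportion of "n"
  aln[, n_prop <= threshold]         # keep columns at or below the threshold
}

# Toy alignment: column 2 is all N, column 4 is two-thirds N.
toy <- as.DNAbin(rbind(
  s1 = c("a", "n", "g", "n"),
  s2 = c("a", "n", "t", "c"),
  s3 = c("c", "n", "g", "n")
))
filtered <- drop_n_columns(toy, threshold = 0.5)
```

With threshold = 0.5 only the first and third columns of the toy alignment are
kept. Adding "-" or "?" to the comparison (for example chars %in% c("n", "-",
"?")) would treat gaps and unknowns the same way, if that is wanted.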


-- 
Vojtěch Zeisek
https://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University, Prague, Czech Republic
https://www.natur.cuni.cz/biology/botany/

Institute of Botany, Czech Academy of Sciences
Průhonice, Czech Republic
http://www.ibot.cas.cz/en/


_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/
