Hm, I tried a dirty hack: I exported the DNAbin object with ape::write.dna, replaced every occurrence of "n" in the sequences with "-", and imported the file back into R with ape::read.dna. Then I reran the functions I mentioned before, and they did nothing. When I exported the object to disk again, the FASTA file contained no "-" at all, only "n". Am I doing something wrong, or is there a bug in ape, which seems to confuse "n" and "-"? Sincerely, V.
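One way to narrow this down is to check what the object actually holds at each step. The sketch below uses toy data, not the VCF-derived alignment, and the sequence names s1/s2 are made up. It prints the internal byte codes that ape's bit-level DNAbin coding assigns to "n", "-" and "?", then tabulates the symbols after a write.dna()/read.dna() round trip through a temporary FASTA file.

```r
library(ape)

## ape stores each base as one byte; "n", "-" and "?" get distinct codes.
codes <- as.integer(unclass(as.DNAbin(c("n", "-", "?"))))
print(codes)

## Toy 2 x 2 alignment (hypothetical names s1, s2) containing one "-" and one "n".
x <- as.DNAbin(matrix(c("a", "-", "n", "c"), nrow = 2,
                      dimnames = list(c("s1", "s2"), NULL)))

## Round trip through a temporary FASTA file and count each symbol.
f <- tempfile(fileext = ".fas")
write.dna(x, f, format = "fasta")
y <- read.dna(f, format = "fasta")
print(table(as.character(y)))
```

Running table(as.character(...)) on the real object before export and after re-import would show at which step the "-" characters turn into "n".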
On Friday, 27 October 2017 16:25:02 CEST, you wrote:
> Hello,
> I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells.
> They delete columns containing missing values, but I also need to delete
> columns containing the base "N" (all columns where the proportion of Ns
> exceeds a certain threshold).
> Actually, ips::deleteEmptyCells has the option nset=c("-", "n", "?"), so it
> is supposed to remove columns/rows containing only the given characters, but
> if I use it and export the data (ape::write.dna or ape::write.nexus.data),
> some samples consist only of N characters...
> The DNAbin object being processed was originally imported from VCF using
> vcfR (read.vcfR(file="my.vcf")) and converted with vcfR2DNAbin(x=myvcf,
> consensus=TRUE, extract.haps=FALSE, unphased_as_NA=FALSE).
> I checked the source code of the above functions, but they seem to only
> count NAs and then drop the respective columns. And as sequences in DNAbin
> are stored in a binary format, I'm a bit stuck here... :(
> Any idea how to remove columns with a given proportion of "N" in the
> sequences?
> Sincerely,
> V.

-- 
Vojtěch Zeisek
https://trapa.cz/en/
Department of Botany, Faculty of Science
Charles University, Prague, Czech Republic
https://www.natur.cuni.cz/biology/botany/
Institute of Botany, Czech Academy of Sciences
Průhonice, Czech Republic
http://www.ibot.cas.cz/en/
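A possible way around the binary coding is to not touch it at all. as.character() on a DNAbin matrix returns a lowercase character matrix, so the per-column proportion of "n" (optionally also "-" and "?") can be computed with colMeans() and used to subset the original DNAbin object. This is a minimal sketch. The helper name del_cols_by_char and its rule (drop a column when the proportion is strictly greater than threshold) are hypothetical and not part of ape or ips.

```r
library(ape)

## Hypothetical helper (not in ape or ips): drop alignment columns in which the
## proportion of the characters in `chars` is greater than `threshold`.
del_cols_by_char <- function(x, threshold = 0.5, chars = "n") {
  stopifnot(inherits(x, "DNAbin"), is.matrix(x))
  m <- as.character(x)                         # lowercase character matrix
  hit <- matrix(m %in% chars, nrow = nrow(m))  # TRUE where a cell is in chars
  prop <- colMeans(hit)                        # proportion of matches per column
  x[, prop <= threshold, drop = FALSE]         # keep columns at or below threshold
}

## Toy alignment: 4 samples x 3 sites (matrix() fills column by column).
x <- as.DNAbin(matrix(c("a", "a", "n", "-",
                        "c", "n", "n", "n",
                        "g", "g", "g", "t"),
                      nrow = 4, dimnames = list(paste0("s", 1:4), NULL)))

y <- del_cols_by_char(x, threshold = 0.5)                            # drops site 2
z <- del_cols_by_char(x, threshold = 0.25, chars = c("n", "-", "?")) # drops sites 1 and 2
```

On the vcfR2DNAbin() output, something like del_cols_by_char(x, threshold = 0.2) would drop sites where more than 20 % of samples are N. Filtering samples instead of sites works the same way with rowMeans() and x[keep, ].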
_______________________________________________
R-sig-phylo mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo