Hello, I have a FASTQ file from which I would like to extract only the FASTA sequences (and not the associated PHRED quality scores).
For instance, using the file mentioned in the R documentation for read.fastq(): library(ape) a <- " https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/ " file <- "SRR062641.filt.fastq.gz" URL <- paste0(a, file) download.file(URL, file) fastq <- read.fastq(file) # import FASTQ file One can easily extract the quality scores via qual <- attr(fastq, "QUAL") but there doesn't appear to be an attribute associated with the DNA sequences themselves (without the scores appended); only a "class" attribute, which evaluates to "DNAbin", is available. I've examined the underlying code for read.fastq() and there is a variable called DNA that is created, but it's not what I need: function (file, offset = -33) { Z <- scan(file, "", sep = "\n", quiet = TRUE) tmp <- Z[c(TRUE, TRUE, FALSE, FALSE)] sel <- c(TRUE, FALSE) tmp[sel] <- gsub("^@", ">", tmp[sel]) fl <- tempfile() cat(tmp, file = fl, sep = "\n") DNA <- read.FASTA(fl) tmp <- Z[c(FALSE, FALSE, FALSE, TRUE)] QUAL <- lapply(tmp, function(x) as.integer(charToRaw(x))) if (offset) QUAL <- lapply(QUAL, "+", offset) names(QUAL) <- names(DNA) attr(DNA, "QUAL") <- QUAL DNA } I then tried the following: x <- as.character.DNAbin(fastq) # extract DNA sequences y <- as.alignment(x) seqs <- y$seq but 'seqs' can't be written to a file using write.FASTA(). Any assistance is greatly appreciated. Thanks! Cheers, Jarrett [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/[email protected]/
