Hello,

I have a FASTQ file from which I would like to extract only the FASTA
sequences (and not the associated PHRED quality scores).

For instance, using the file mentioned in the R documentation for
read.fastq():

library(ape)

a <- "
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/
"

file <- "SRR062641.filt.fastq.gz"
URL <- paste0(a, file)
download.file(URL, file)

fastq <- read.fastq(file) # import FASTQ file

One can easily extract the quality scores via

qual <- attr(fastq, "QUAL")

but there doesn't appear to be an attribute associated with the DNA
sequences themselves (without the scores appended); only a "class"
attribute, which evaluates to "DNAbin", is available.

I've examined the underlying code for read.fastq() and there is a variable
called DNA that is created, but it's not what I need:

function (file, offset = -33)
{
    Z <- scan(file, "", sep = "\n", quiet = TRUE)
    tmp <- Z[c(TRUE, TRUE, FALSE, FALSE)]
    sel <- c(TRUE, FALSE)
    tmp[sel] <- gsub("^@", ">", tmp[sel])
    fl <- tempfile()
    cat(tmp, file = fl, sep = "\n")
    DNA <- read.FASTA(fl)
    tmp <- Z[c(FALSE, FALSE, FALSE, TRUE)]
    QUAL <- lapply(tmp, function(x) as.integer(charToRaw(x)))
    if (offset)
        QUAL <- lapply(QUAL, "+", offset)
    names(QUAL) <- names(DNA)
    attr(DNA, "QUAL") <- QUAL
    DNA
}

I then tried the following:

x <- as.character.DNAbin(fastq) # extract DNA sequences
y <- as.alignment(x)

seqs <- y$seq

but 'seqs' can't be written to a file using write.FASTA().


Any assistance is greatly appreciated.

Thanks!

Cheers,

Jarrett

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/

Reply via email to