Hi Michael,

I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2
(devel). Both should become available via biocLite() on Friday around
11am (Seattle time). Thanks for your patience.

Cheers,
H.


On 04/30/2015 10:50 AM, Hervé Pagès wrote:
Hi Michael,

Thanks for the reminder. I must confess this slipped out of my radar.
I'll look into it today and will let you know.

H.

On 04/30/2015 08:42 AM, Michael Stadler wrote:
Hi Herve,

I stumbled again over the 'memory not mapped' issue in trimLRpatterns
using updated versions of R and BioC-devel. I guess it does not hit
people very often, but I would highly appreciate if it could be fixed.

Many thanks,
Michael

PS: I can reproduce the issue using the code below under:
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] ShortRead_1.27.1        GenomicAlignments_1.5.1 Rsamtools_1.21.3
  [4] GenomicRanges_1.21.5    GenomeInfoDb_1.5.1      BiocParallel_1.3.4
  [7] Biostrings_2.37.0       XVector_0.9.0           IRanges_2.3.6
[10] S4Vectors_0.7.0         BiocGenerics_0.15.0     RColorBrewer_1.1-2

loaded via a namespace (and not attached):
  [1] lattice_0.20-31      bitops_1.0-6         grid_3.2.0
  [4] futile.options_1.0.0 zlibbioc_1.15.0      hwriter_1.3.2
  [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
[10] Biobase_2.29.0


On 25.04.2014 13:11, Hervé Pagès wrote:
Hi Michael,

Thanks for the report. I'll look into this.

H.

On 04/22/2014 08:29 AM, Michael Stadler wrote:
Dear Herve,

We are hitting a 'memory not mapped' problem when using trimLRpatterns
as detailed below. I did not manage to reproduce it with few sequences,
so I have to refer to a publicly available sequence file with many
reads. Even then, it occasionally runs through without problems.

Also, our use-case may not be typical and be part of the problem -
maybe
the solution will be to change our use of trimLRpatterns.

Here is some code to illustrate/reproduce the problem:

library(Biostrings)
library(ShortRead)

Rpat <- "TGGAATTCTCGGGTGCCAAGG"
maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))

fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
                        "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))

# the second read is not trimmed because it runs through the adaptor
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4 AAAA
#[2]    28 AAAATGGAATTCTCGGGTGCCAAGGTTT

# as a workaround, we pad the adaptor with Ns and
# increase the mismatch tolerance
numNs <- 90
maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")

# now, also the second read gets trimmed
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4 AAAA
#[2]     4 AAAA

# to trigger the segmentation fault, many reads are needed
download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz";,


"ERR000916_1.fastq.gz")
fq2 <- readFastq("ERR000916_1.fastq.gz")

fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
# *** caught segfault ***
#address 0x7f5109be4fed, cause 'memory not mapped'
#
#Traceback:
# 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
# 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
at.type, max.mismatch, min.mismatch, with.indels, fixed,     ans.type,
auto.reduce.pattern, PACKAGE = "Biostrings")
# 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
auto.reduce.pattern)
# 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
   Rfixed)
# 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
max.Lmismatch,     max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
Rfixed,     ranges)
# 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
     max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
# 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
     max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
#10: eval(expr, envir, enclos)
#11: eval(call, sys.frame(sys.parent()))
#12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges =
TRUE)
#13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm,     with.Rindels = FALSE)
#14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm,     with.Rindels = FALSE)

The problem did not occur in R 3.0.3 with BioC 2.13.
Do you have an idea what's wrong?

Thanks for your help,
Michael



sessionInfo()R version 3.1.0 (2014-04-10)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] parallel  stats     graphics  grDevices utils     datasets
methods
#[8] base
#
#other attached packages:
# [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0

# [4] Rsamtools_1.16.0        GenomicRanges_1.16.0
GenomeInfoDb_1.0.1

# [7] BiocParallel_0.6.0      Biostrings_2.32.0       XVector_0.4.0

#[10] IRanges_1.22.2          BiocGenerics_0.10.0
RColorBrewer_1.0-5

#
#loaded via a namespace (and not attached):
# [1] BBmisc_1.5          BatchJobs_1.2       Biobase_2.24.0
# [4] DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.1
# [7] bitops_1.0-6        brew_1.0-6          codetools_0.2-8
#[10] compiler_3.1.0      digest_0.6.4        fail_1.2
#[13] foreach_1.4.2       grid_3.1.0          hwriter_1.3
#[16] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26
#[19] plyr_1.8.1          sendmailR_1.1-2     stats4_3.1.0
#[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to