On 5/30/24 09:29, Barry Rowlingson wrote:
I get an R error and no segfault:

> parse(textConnection(text), srcfile = srcfile)
Error in parse(textConnection(text), srcfile = srcfile) :
  test.r:1:1: unexpected $end
1: ×
    ^

This is R 4.3.0, so maybe the bug has been introduced since then...

Thanks, I am looking into it, have found the cause, and am now testing a patch. The bug has been in the code for a long time. Whether it causes a crash is non-deterministic: it is an out-of-bounds access, so the outcome depends on memory layout and content.

Tomas


Version and system info:

> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          4
minor          3.0
year           2023
month          04
day            21
svn rev        84292
language       R
version.string R version 4.3.0 (2023-04-21)
nickname       Already Tomorrow

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0

On Tue, May 28, 2024 at 7:42 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:

    On 5/28/24 19:35, Hadley Wickham wrote:
    > Hi all,
    >
    > When I run the following code, R segfaults:
    >
    > text <- "×"
    > srcfile <- srcfilecopy("test.r", text)
    > parse(textConnection(text), srcfile = srcfile)
    >
    > It doesn't segfault if text is ASCII, or it's not wrapped in
    > textConnection, or srcfile isn't set.

    Thanks, this is because the R parser doesn't support non-ASCII UTF-8
    characters outside string literals and comments, plus there is a
    missing bounds check. The "correct" result should be an R error,
    which I get in a debug build.

    The tokenizer ends up with a negative token. Then, when the parse
    data are being finalized and a table of token names is created,
    there is an out-of-bounds access to the yytname array. The check
    should probably go directly into the tokenizer.

    Tomas

    >
    > Hadley
    >

    ______________________________________________
    R-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-devel