On 5/28/24 19:35, Hadley Wickham wrote:
Hi all,

When I run the following code, R segfaults:

    text <- "×"
    srcfile <- srcfilecopy("test.r", text)
    parse(textConnection(text), srcfile = srcfile)

It doesn't segfault if text is ASCII, or it's not wrapped in textConnection, or srcfile isn't set.
Thanks, this is because the R parser doesn't support non-ASCII UTF-8 outside string literals and comments, combined with a missing bounds check. The "correct" result should be an R error, which is what I get in a debug build.
The tokenizer ends up with a negative token, and then, when the parse data are being finalized and a table of token names is created, there is an out-of-bounds access to the yytname array. The check should probably go directly into the tokenizer.
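
To illustrate the kind of check meant here, below is a minimal sketch in C. It is not R's actual source: the table token_names and the helper token_name_checked are hypothetical stand-ins for a bison-style name table such as yytname, and the token values are made up. It only shows that a negative or too-large token code has to be rejected before it is used as an array index.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bison-style token-name table.
   This is NOT R's real yytname; the entries are illustrative only. */
static const char *const token_names[] = {
    "END_OF_INPUT", "error", "STR_CONST", "SYMBOL"
};

#define N_TOKEN_NAMES (sizeof token_names / sizeof token_names[0])

/* Return the name for a token code, or NULL when the code is outside the
   table. The caller can then raise a parse error instead of reading
   memory before or after the array. */
static const char *token_name_checked(int token)
{
    if (token < 0 || (size_t) token >= N_TOKEN_NAMES)
        return NULL;
    return token_names[token];
}
```

With this sketch, token_name_checked(2) returns the "STR_CONST" entry, while token_name_checked(-1) returns NULL rather than indexing before the start of the table. Doing the equivalent check in the tokenizer, as suggested above, would reject the bad token before it ever reaches the parse data.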
Tomas
Hadley