Re: [Rd] Crash/bug when calling match on latin1 strings
Rui Barradas wrote on Mon, 11 Oct 2021 07:41:51 +0100:

> Hello,
>
> R 4.1.1 on Ubuntu 20.04.
>
> I can reproduce this error but not ~90% of the time, only the 1st time I
> run the script.
> If I run other (terminal) commands before rerunning the R script it
> sometimes segfaults again but once again very far from 90% of the time.
>
> rui@rui:~/tmp$ R -q -f rhelp.R
> > sessionInfo()
> R version 4.1.1 (2021-08-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.3 LTS
>
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
>
> locale:
> [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8
> [5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8
> [7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.1
> >
> > # A bunch of words in UTF8; replace *'s
> > words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
> > words2 <- iconv(words, "utf-8", "latin1")
> > gctorture(TRUE)
> > y <- match(words2, words2)
>
> *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
>
> Traceback:
> 1: match(words2, words2)
> An irrecoverable exception occurred. R is aborting now ...
> Falta de segmentação (núcleo despejado)
>
> This last line is Portuguese for
>
> Segmentation fault (core dumped)
>
> Hope this helps,

Yes, it does, thank you!

I can confirm the problem: it occurs only in R 4.1.0 and newer, including the current "R-patched" and "R-devel" versions.

I've now turned this into a formal R bug report on R's bugzilla, and (slightly) extended your (Travers') example into a self-contained (no internet access needed) R script.

Bugzilla PR#18211: "match() memory corruption"
https://bugs.r-project.org/show_bug.cgi?id=18211

with attachment 2929 --> https://bugs.r-project.org/attachment.cgi?id=2929=edit

==> Please follow up on bugzilla if possible.

Thanks again to you both!
Martin Maechler

> Rui Barradas
>
> At 06:05 on 11/10/21, Travers Ching wrote:
>> Here's a brief example:
>>
>> # A bunch of words in UTF8; replace *'s
>> words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
>> words2 <- iconv(words, "utf-8", "latin1")
>> gctorture(TRUE)
>> y <- match(words2, words2)
>>
>>
>> I searched bugzilla but didn't see anything. Apologies if this is already
>> reported.
>>
>> The bug appears in both R-devel and the release, but doesn't seem to affect
>> R 4.0.5.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Crash/bug when calling match on latin1 strings
Hello,

R 4.1.1 on Ubuntu 20.04.

I can reproduce this error but not ~90% of the time, only the 1st time I run the script.
If I run other (terminal) commands before rerunning the R script it sometimes segfaults again but once again very far from 90% of the time.

rui@rui:~/tmp$ R -q -f rhelp.R
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8
[5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8
[7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.1
>
> # A bunch of words in UTF8; replace *'s
> words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
> words2 <- iconv(words, "utf-8", "latin1")
> gctorture(TRUE)
> y <- match(words2, words2)

*** caught segfault ***
address 0x10, cause 'memory not mapped'
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
1: match(words2, words2)
An irrecoverable exception occurred. R is aborting now ...
Falta de segmentação (núcleo despejado)

This last line is Portuguese for

Segmentation fault (core dumped)

Hope this helps,

Rui Barradas

At 06:05 on 11/10/21, Travers Ching wrote:
> Here's a brief example:
>
> # A bunch of words in UTF8; replace *'s
> words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
> words2 <- iconv(words, "utf-8", "latin1")
> gctorture(TRUE)
> y <- match(words2, words2)
>
> I searched bugzilla but didn't see anything. Apologies if this is already
> reported.
>
> The bug appears in both R-devel and the release, but doesn't seem to affect
> R 4.0.5.

[Rd] Crash/bug when calling match on latin1 strings
Here's a brief example:

# A bunch of words in UTF8; replace *'s
words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
words2 <- iconv(words, "utf-8", "latin1")
gctorture(TRUE)
y <- match(words2, words2)

I searched bugzilla but didn't see anything. Apologies if this is already reported.

The bug appears in both R-devel and the release, but doesn't seem to affect R 4.0.5.
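
The example in this thread needs a download, so here is a rough sketch of a variant that runs without internet access. It is not the script attached to the bug report: the synthetic word list and the helper names make_words() and match_under_torture() are assumptions made for illustration, and it is not known whether these particular words trigger the crash. The call that may segfault is left switched off.

```r
# Sketch only: a network-free variant of the example in this thread.
# The word list is synthetic (an assumption); the original read a pastebin file.
make_words <- function(n = 1000L) {
  # Every word carries a non-ASCII character, so iconv() marks it as latin1.
  paste0("caf\u00e9_", seq_len(n))
}

words  <- make_words()
words2 <- iconv(words, "UTF-8", "latin1")

# Confirm the conversion produced latin1-marked strings.
stopifnot(all(Encoding(words2) == "latin1"))

# Hypothetical helper: run match() under gctorture and always turn it off again.
match_under_torture <- function(x) {
  on.exit(gctorture(FALSE))
  gctorture(TRUE)
  match(x, x)
}

# Not run by default: on affected R versions this may abort the session,
# and gctorture makes it slow everywhere.
if (FALSE) {
  y <- match_under_torture(words2)
}
```

To try it, change if (FALSE) to if (TRUE) in a throwaway session, e.g. via R -q -f as in the transcript above. Wrapping gctorture() in a function with on.exit() keeps the torture mode from leaking into the rest of the session if match() returns normally.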