On Sun, 21 Dec 2025, Ivan Krylov via R-devel wrote:

> Hello R-devel,
>
> Some inputs cause sort(method="radix") to try to read vectors at index
> -1, which is caught for character vectors on some builds that use clang
> -fsanitize=address since r89198:
>
> podman run --rm -it \
>   registry.gitlab.com/rdatatable/dockerfiles/r-devel-clang-san \
>   R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
> # Error in order(NA_character_, "c", method = "radix", na.last = NA) :
> #   attempt access index -1/1 in STRING_ELT

With a build configured with --enable-strict-barrier, most calls from base
R code use the non-inlined versions of accessors such as STRING_ELT, so on
my setup:

luke@MacBook-Air-102 build% ../barrier/bin/R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
Error in order(NA_character_, "c", method = "radix", na.last = NA) :
  attempt access index -1/1 in STRING_ELT
Execution halted
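
As a rough illustration of why a build using the non-inlined accessor reports the bad read while an inlined one may not, here is a toy model in C. The type `toy_strvec` and both accessor functions are hypothetical, not R's API. The checked version validates its index, which the error message above suggests the non-inlined STRING_ELT does. The unchecked version reads memory directly, so an index of -1 is only noticed if a tool such as AddressSanitizer happens to catch it.

```c
#include <stdio.h>

/* Hypothetical toy string vector; not R's SEXP. */
typedef struct {
    long length;
    const char **elts;
} toy_strvec;

/* Unchecked accessor: fast, but an out-of-range index (such as -1)
   reads outside the array. That is undefined behaviour and may go
   unnoticed unless a tool like AddressSanitizer is watching. */
static inline const char *toy_elt_unchecked(const toy_strvec *x, long i)
{
    return x->elts[i];
}

/* Checked accessor: validates the index before reading. R's checked
   accessor signals an R error; this sketch prints and returns -1. */
static int toy_elt_checked(const toy_strvec *x, long i, const char **out)
{
    if (i < 0 || i >= x->length) {
        fprintf(stderr, "attempt access index %ld/%ld in toy_elt\n",
                i, x->length);
        return -1;
    }
    *out = x->elts[i];
    return 0;
}
```

If an order entry `o[i]` is 0, then `o[i] - 1` is -1. `toy_elt_checked(&x, -1, &s)` reports that, whereas `toy_elt_unchecked(&x, -1)` would silently read out of bounds.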

> Since savetl_end() did not run, some CHARSXPs retain their altered
> TRUELENGTHs. The R session is then likely to crash when it tries to
> read a negative-numbered hash bucket (usually during install() while
> lazy-loading bytecode for another function call, e.g., when wrapping
> the order() call in try()).
>
> This seems to be a matter of catching elements already sorted as NA on
> a previous pass:
>
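
To make this failure mode concrete, here is a toy model in C of borrowing a field as scratch space and restoring it afterwards. All names (`toy_charsxp`, `toy_savetl`, `toy_savetl_end`, `toy_run`) are hypothetical and only loosely echo the savetl()/savetl_end() idea mentioned above. The model assumes, as in R, that raising an error jumps out with longjmp() instead of returning. It shows that a jump out of the middle of the pass skips the final restore, so the borrowed fields keep their scratch values.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a CHARSXP: only the borrowed field. */
typedef struct {
    long truelength;
} toy_charsxp;

#define TOY_MAX_SAVED 64

/* Hypothetical save table: remember each object's original value
   before the field is reused as scratch space. */
static toy_charsxp *saved_obj[TOY_MAX_SAVED];
static long saved_val[TOY_MAX_SAVED];
static size_t n_saved = 0;

static jmp_buf toy_error_env;

/* Like R's error(), this never returns: it jumps to the handler. */
static void toy_error(void)
{
    longjmp(toy_error_env, 1);
}

static void toy_savetl(toy_charsxp *s)
{
    if (n_saved >= TOY_MAX_SAVED)
        toy_error();
    saved_obj[n_saved] = s;
    saved_val[n_saved] = s->truelength;
    n_saved++;
}

/* Put every borrowed field back and empty the table. */
static void toy_savetl_end(void)
{
    for (size_t k = 0; k < n_saved; k++)
        saved_obj[k]->truelength = saved_val[k];
    n_saved = 0;
}

/* Borrow truelength as scratch space for each object, then restore.
   An error is raised at index fail_at (pass fail_at >= n for none);
   that jump skips the toy_savetl_end() call at the bottom. */
static void toy_radix_pass(toy_charsxp **v, size_t n, size_t fail_at)
{
    for (size_t k = 0; k < n; k++) {
        toy_savetl(v[k]);
        v[k]->truelength = -(long)(k + 1);  /* scratch value */
        if (k == fail_at)
            toy_error();
    }
    toy_savetl_end();
}

/* Returns 0 on success and 1 if an error was raised. When
   restore_on_error is nonzero, the handler restores the saved fields. */
static int toy_run(toy_charsxp **v, size_t n, size_t fail_at,
                   int restore_on_error)
{
    if (setjmp(toy_error_env) != 0) {
        if (restore_on_error)
            toy_savetl_end();
        return 1;
    }
    toy_radix_pass(v, n, fail_at);
    return 0;
}
```

After `toy_run(v, 3, 1, 0)`, the first two objects still hold scratch values. `toy_run(v, 3, 1, 1)` restores them in the error handler. Restoring on the error path is only one possible design, and the sketch does not claim this is how radixsort.c handles it.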
> Index: src/main/radixsort.c
> ===================================================================
> --- src/main/radixsort.c (revision 89211)
> +++ src/main/radixsort.c (working copy)
> @@ -1766,7 +1766,9 @@
> // this edge case had to be taken care of
> // here.. (see the bottom of this file for
> // more explanation)
> - switch (TYPEOF(x)) {
> + if (o[i] == 0) { // already sorted as NA
> + isSorted = false;
> + } else switch (TYPEOF(x)) {
> case INTSXP:
> if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
> isSorted = false;
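
The guard in the patch can be read on its own through a small C sketch. It assumes, as the patch comment suggests, that the order vector `o` holds 1-based indices and that `o[i] == 0` marks an element already sorted as NA on a previous pass. `toy_check_sorted` and `TOY_NA_INTEGER` are hypothetical names, and only the integer case is modelled.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* NA marker for this sketch; R's NA_INTEGER is likewise INT_MIN. */
#define TOY_NA_INTEGER INT_MIN

/* Hypothetical helper mirroring the patched check. o holds 1-based
   indices into x, and o[i] == 0 means the element was already sorted
   as NA on a previous pass. Returns the updated "is sorted" flag. */
static bool toy_check_sorted(const int *x, const int *o, size_t i,
                             bool isSorted)
{
    if (o[i] == 0)          /* already sorted as NA: never read x[-1] */
        return false;
    if (x[o[i] - 1] == TOY_NA_INTEGER)
        return false;
    return isSorted;
}
```

Without the `o[i] == 0` test, a call with `o[i] == 0` would read `x[-1]`, the same kind of out-of-range access reported above.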

> I don't entirely understand what causes src/main/radixsort.c to call
> the non-inlined version of STRING_ELT in some cases.

`inline` is only a hint to the compiler; some compilers ignore the
hint more often than others.

This code was originally contributed by data.table. I believe Michael
Lawrence handled the integration at the time. There were a number of
issues like this early on that were resolved on the R side and, I
believe, contributed back to data.table. If you have the energy, it
might be good to compare the two now and see whether anything should
be ported from one to the other.

Best,
luke
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: [email protected]
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel