Thanks; I've committed Ivan'x fix to R-devel in r89248.
Best,
luke
On Sat, 27 Dec 2025, Benjamin Schwendinger wrote:
Hello! data.table diverged over 7 years ago from this code when Matt
rewrote forder.
This code was originally contributed by data.table. I believe Michael
Lawrence handled the integration at the time. There were a number of
issues like this early on that were resolved on the R side and I
believe contributed back to data.table. If you have the energy it
might be good to compare the two now and see if there are things that
should be ported from one to the other.
The original code had the following note which is also mentioned
multiple times but not explicitly part of radixsort.c
Note on challenges implementing `nalast=NA` so that things are under the same
perspective when looked at later. To go to the point, there are four challenges
in implementing `nalast=NA` with the current form of forder.c (which is
excellent!):
1) Handling the i|d|csorted routines for nalast=NA:
The current implementation of nalast=NA in *sorted routine is that if all
values are NAs, then it returns -2, a different value, so that all values of
o/osub can be replacedw with 0. If any values are NA, then we assume that this
vector is unsorted and return 0 to be taken care of by the corresponding sort
routine. Now, this may very well be questioned/improved. For ex: we could
detect if the vector is sorted and also if the first value is NA, and if so,
get the group size of these NAs and replace that many indices with 0. This is
not yet done, but could very well be implemented later. If there are no NAs in
the vector, then things are the same.
2) Handling cases where n==1 and n==2:
The current sorting routines don't take care of n<=2, and rightly so, as they can be dealt with
directly. But it's not the case for nalast=NA because, when n==2 and all are NAs, point 1 will handle it. But
when n==1 and it is NA and the column is not the first column to order upon, with the current implementation,
"thisgrpn" is checked for "== 1" and if so, just pushes that group size and continues.. But
we've to replace it with 0 in our case. So, nalast==0 is checked for inside that if statement and if so, if the
value is NA is checked for all types and if so, replaced with 0. For n==2 and "any" is NA, we've to
introduce a special case for this in each sort routine to take care of and not result in an error (which tells
this should be already taken care of in *sorted).
3) o[0] and newo[0] = -1:
Since nalast already populates NAs with 0 values in o, we can't use the
old value of 0 to check if it's already populated or not. Therefore this has
been changed to -1.
4) default cases are untouched as much as possible:
In order to not compromise in speed for default case (`setkey`), iinsert and
dinsert are not touched at all, which means, we've to make sure that they are not
run when n<200 and nalast==0, as they don't replace o[x=NA] with 0. And,
'icount', 'iradix', 'dradix' are modified to take care of nalast=NA, to my
knowledge.
However, case 2) does not cater for Ivans identified corner case
example of order(NA_character_, "c", method = "radix", na.last = NA),
which is not tied to n==1 or n==2 but rather to thisgrpn==1 and
x=c(NA_character_, another_string).
Hence, the missing case is indeed to cater for o[i] = 0 with Ivans
1-liner or more explicitly via
--- src/main/radixsort.c
@@ -1766,7 +1766,12 @@
// this edge case had to be taken care of
// here.. (see the bottom of this file for
// more explanation)
- switch (TYPEOF(x)) {
+ if (o[i] == 0) {
+ isSorted = false;
+ i++;
+ push(1);
+ continue;
+ } switch (TYPEOF(x)) {
case INTSXP:
if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
isSorted = false;
Best, Ben
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: [email protected]
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel