On 4/11/25 16:23, Suharto Anggono Suharto Anggono via R-devel wrote:
  Alternative revision:

Added to my changed 'StringFromLogical':
#define CACHE 16
if (!(*warn & CACHE)) {TrueCh = FalseCh = NULL; *warn |= CACHE;}

No change to 'coerceToString' and 'coerceToSymbol'.
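
One way to read the CACHE bit above, as a self-contained sketch rather than
actual coerce.c code: the caller's 'warn' starts at 0 for each conversion, so
the first call sees the bit unset, clears the cache and sets the bit; later
calls in the same loop skip the reset. The names string_from_logical_flagged,
convert_all, resets and TOY_NA_LOGICAL are hypothetical, plain C string
literals stand in for the CHARSXP objects that mkChar() would return, and the
sketch assumes 16 is not used by any existing warning bit.

```c
#include <stddef.h>

#define CACHE 16                           /* spare bit in the caller's 'warn' flags */
#define TOY_NA_LOGICAL (-2147483647 - 1)   /* stand-in for NA_LOGICAL (INT_MIN) */

static const char *TrueCh, *FalseCh;       /* stand-ins for the cached CHARSXPs */
static int resets = 0;                     /* counts cache resets, illustration only */

const char *string_from_logical_flagged(int x, int *warn)
{
    /* First call for this 'warn' variable: drop whatever was cached before. */
    if (!(*warn & CACHE)) {
        TrueCh = FalseCh = NULL;
        *warn |= CACHE;
        resets++;
    }
    if (x == TOY_NA_LOGICAL) return "NA";  /* stand-in for NA_STRING */
    if (x) return TrueCh ? TrueCh : (TrueCh = "TRUE");
    return FalseCh ? FalseCh : (FalseCh = "FALSE");
}

/* Models the LGLSXP loop of a caller such as coerceToString:
   'warn' is a fresh local, so the cache is reset exactly once per call. */
int convert_all(const int *v, size_t n)
{
    int warn = 0;
    for (size_t i = 0; i < n; i++)
        (void) string_from_logical_flagged(v[i], &warn);
    return warn;
}
```

After convert_all() returns, only the CACHE bit is set in 'warn', so a caller
that tests the individual warning bits would not report anything for it.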

--------------
On Friday, 11 April 2025 at 08:02:58 pm GMT+7, Suharto Anggono Suharto Anggono 
<suharto_angg...@yahoo.com> wrote:


Oh, with the abuse of 'warn' in my previous message, a warning would be issued
if the input 'v' of 'coerceToString' is a logical vector of length 1.

Revision:

Added to my changed 'StringFromLogical':
if (*warn) {TrueCh = FalseCh = NULL; *warn = 0;}

'coerceToString': insert
if (i == 0) warn = 1;
for LGLSXP case or initialize 'warn' to 16

'coerceToSymbol': insert
warn = 1;
for LGLSXP case or initialize 'warn' to 16


Another way is to follow the caching approach in 'StringFromInteger'.

--------------
On Friday, 11 April 2025 at 05:05:30 pm GMT+7, Suharto Anggono Suharto Anggono 
<suharto_angg...@yahoo.com> wrote:


On second thought, I wonder if the caching in my changed 'StringFromLogical' in
my previous message is safe. While 'ans' in the C function 'coerceToString' is
protected, its elements are also protected. If the object corresponding to 'ans'
is then no longer protected, is it possible for the cached object 'TrueCh' or
'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, I am
thinking of clearing the cache at each first fill. For example, by abusing the
'warn' argument, the following is added to my changed 'StringFromLogical'.

If this is the caching you had in mind:

    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >    static SEXP TrueCh, FalseCh;
    >    if (x == NA_LOGICAL) return NA_STRING;
    >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
    >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));

that is really a protection error. StringFromLogical() should make sure that
TrueCh and FalseCh stay protected for as long as they are recorded in the
static variables. R_PreserveObject() would be a natural function for this.
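
A self-contained model of the fix suggested above, not R's actual code:
ToyObj, toy_mkchar(), toy_preserve() and toy_gc() are hypothetical stand-ins
for a CHARSXP, mkChar(), R_PreserveObject() and a garbage collection, so that
the sketch compiles without R headers. It shows the one point that matters:
the cached object is registered as preserved at the moment it is stored in the
static variable, so a later collection cannot reclaim it.

```c
#include <stddef.h>

#define TOY_NA_LOGICAL (-2147483647 - 1)   /* stand-in for NA_LOGICAL (INT_MIN) */
#define POOL_SIZE 8

typedef struct {
    const char *text;
    int preserved;   /* set by toy_preserve(); models R_PreserveObject() */
    int alive;       /* cleared by toy_gc() when nothing keeps the object */
} ToyObj;

static ToyObj pool[POOL_SIZE];
static int pool_used = 0;
static ToyObj toy_na = { "NA", 1, 1 };     /* stand-in for NA_STRING */

/* Models mkChar(): returns a fresh, unprotected object (NULL if pool is full). */
ToyObj *toy_mkchar(const char *s)
{
    if (pool_used >= POOL_SIZE) return NULL;
    ToyObj *o = &pool[pool_used++];
    o->text = s; o->preserved = 0; o->alive = 1;
    return o;
}

void toy_preserve(ToyObj *o) { if (o) o->preserved = 1; }

/* Models a collection at a time when no R object refers to the strings. */
void toy_gc(void)
{
    for (int i = 0; i < pool_used; i++)
        if (!pool[i].preserved) pool[i].alive = 0;
}

ToyObj *string_from_logical(int x)
{
    static ToyObj *TrueCh = NULL, *FalseCh = NULL;
    if (x == TOY_NA_LOGICAL) return &toy_na;
    ToyObj **slot = x ? &TrueCh : &FalseCh;
    if (*slot == NULL) {
        *slot = toy_mkchar(x ? "TRUE" : "FALSE");
        toy_preserve(*slot);   /* keep it for as long as the static records it */
    }
    return *slot;
}
```

In coerce.c itself the same shape would presumably be mkChar() followed by
R_PreserveObject() on first use; that is an assumption about how the fix would
be written, not a quote from the source tree.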

Best
Tomas

if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

warn = i == 0;

is inserted before

SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

---------------------
On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler 
<maech...@stat.math.ethz.ch> wrote:


Suharto Anggono Suharto Anggono via R-devel
     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
     > Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
     >
     > do_asatomic
     > ascommon
     > coerceVector
     > coerceToString
     > StringFromLogical (for each element)
     >
     > The definition of 'StringFromLogical' in coerce.c :
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    int w;
     >    formatLogical(&x, 1, &w);
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else return mkChar(EncodeLogical(x, w));
     > }
     >
     > The definition of 'EncodeLogical' in printutils.c :
     >
     > const char *EncodeLogical(int x, int w)
     > {
     >    static char buff[NB];
     >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)),
     >                                 CHAR(R_print.na_string));
     >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
     >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
     >    buff[NB-1] = '\0';
     >    return buff;
     > }
     >
     > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    2.69    0.02    2.73
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.15    0.04    0.20
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.08    0.05    0.13
     > > L <- rep(NA, 10^7)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    0.11    0.00    0.11
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.16    0.06    0.22
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.09    0.03    0.12
     >
     > `as.character` of a logical vector that is all NA is fast enough.
     > It appears that the call to 'formatLogical' inside the C function
     > 'StringFromLogical' does not introduce much slowdown.


     > I found that using a string literal inside the C function
     > 'StringFromLogical', by replacing
     > EncodeLogical(x, w)
     > with
     > x ? "TRUE" : "FALSE"
     > (so that the call to 'formatLogical' is no longer needed), makes it faster.

indeed! ... and we also notice that the 'w' argument is not
needed anymore either, and that makes sense: At this point, when you
know you have an R logical value, there are only three
possibilities and no reason ever to warn about the conversion.

     > Alternatively,
or in addition !


     > a "fast path" could be introduced in 'EncodeLogical', potentially also
     > benefiting format() in R.
     > For example, without replacing existing code, the following fragment
     > could be inserted.
     >
     >    if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
     >    else if(x) {if(w == 4) return "TRUE";}
     >    else {if(w == 5) return "FALSE";}
     >
     > However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster
     > than as.character(L) .
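
A self-contained sketch of the fast-path fragment above, assuming stand-ins
for R's internals: TOY_NA_LOGICAL, na_string and na_width replace NA_LOGICAL,
CHAR(R_print.na_string) and R_print.na_width, and NB is an arbitrary buffer
size here. When the requested width equals the natural width of the result, no
padding is needed and a string constant can be returned without going through
snprintf().

```c
#include <stdio.h>

#define NB 1000                            /* arbitrary buffer size for this sketch */
#define TOY_NA_LOGICAL (-2147483647 - 1)   /* stand-in for NA_LOGICAL (INT_MIN) */

static const char *na_string = "NA";       /* stand-in for CHAR(R_print.na_string) */
static const int na_width = 2;             /* stand-in for R_print.na_width */

const char *encode_logical(int x, int w)
{
    static char buff[NB];

    /* Fast path: the width matches the unpadded text, so return a constant. */
    if (x == TOY_NA_LOGICAL) { if (w == na_width) return na_string; }
    else if (x) { if (w == 4) return "TRUE"; }
    else { if (w == 5) return "FALSE"; }

    /* Slow path: right-justify the text in a field of width w. */
    const char *s = (x == TOY_NA_LOGICAL) ? na_string : (x ? "TRUE" : "FALSE");
    if (w > NB - 1) w = NB - 1;
    snprintf(buff, NB, "%*s", w, s);
    return buff;
}
```

The fast path returns pointers to constants, while the slow path still returns
the shared static buffer, so callers must treat the result as read-only in
both cases.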
     >
     > Precomputing or caching possible results of the C function
     > 'StringFromLogical' allows as.character(L) to be as fast as
     > c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be
     > changed to
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    static SEXP TrueCh, FalseCh;
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
     >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));

     > }

Indeed, and something along this line (storing the other two constant strings)
was also my thought when seeing the
   mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.

I'm looking into applying both speedups;
thank you very much, Suharto!

Martin


--
Martin Maechler
ETH Zurich  and  R Core team

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
