Re: [Rd] Check for protection (was: table() and as.character() performance for logical values)

Paul McQuesten Fri, 11 Apr 2025 08:58:26 -0700

For a long-term horizon, would it help R developers to use a naming
convention?
Perhaps varName_PROT, or the inverse, varName_UNPROT?
Eventually, teach some linter about that?


On Fri, Apr 11, 2025 at 10:40 AM Duncan Murdoch <murdoch.dun...@gmail.com>
wrote:

> On a tangent from the main topic of this thread:  sometimes (especially
> to non-experts) it's not obvious whether a variable is protected or not.
>
> I don't think there's any easy way to determine that, but perhaps there
> should be.  Would it be possible to add a run-time test you could call
> in C code (e.g. is_protected(x)) that would do the same search the
> garbage collector does in order to determine if a particular pointer is
> protected?
>
> This would be an expensive operation, similar in cost to actually doing
> a garbage collection.  You wouldn't want to do it routinely, but it
> would be really helpful in debugging.
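A toy model may make the cost argument concrete. This is a sketch under stated assumptions, not R's API. `ToyObj`, `toy_protect()`, `toy_unprotect()`, `toy_add_ref()` and `toy_is_protected()` are hypothetical names. The only roots in the model are a fixed-size protect stack, whereas R's collector also has other roots, such as objects registered with R_PreserveObject. `toy_is_protected()` reports whether an object is reachable from a root by doing a depth-first mark, so its cost grows with the reachable heap, much like a collection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KIDS 4     /* max outgoing references per toy object */
#define TOY_STACK_SIZE 64  /* capacity of the toy protect stack */

/* Toy heap object, e.g. a STRSXP pointing at its CHARSXP elements.
   Hypothetical: this is not R's SEXPREC layout. */
typedef struct ToyObj {
    struct ToyObj *kids[TOY_MAX_KIDS];
    int nkids;
    unsigned mark; /* epoch of the last search that visited this object */
} ToyObj;

/* Toy protect stack: the only roots in this model. */
static ToyObj *toy_stack[TOY_STACK_SIZE];
static int toy_top = 0;
static unsigned toy_epoch = 0;

static void toy_protect(ToyObj *x) {
    assert(toy_top < TOY_STACK_SIZE);
    toy_stack[toy_top++] = x;
}

static void toy_unprotect(int n) {
    assert(n >= 0 && n <= toy_top);
    toy_top -= n;
}

/* Record that `parent` references `child`. */
static void toy_add_ref(ToyObj *parent, ToyObj *child) {
    assert(parent->nkids < TOY_MAX_KIDS);
    parent->kids[parent->nkids++] = child;
}

/* Depth-first search from `from`; true if `target` is reachable.
   Each object is visited at most once per search (via `mark`). */
static bool toy_reaches(ToyObj *from, const ToyObj *target) {
    if (from == NULL || from->mark == toy_epoch) return false;
    from->mark = toy_epoch;
    if (from == target) return true;
    for (int i = 0; i < from->nkids; i++)
        if (toy_reaches(from->kids[i], target)) return true;
    return false;
}

/* Hypothetical debugging check: is `x` reachable from any root?
   Cost is proportional to the reachable heap, like a GC mark phase. */
static bool toy_is_protected(const ToyObj *x) {
    toy_epoch++; /* start a fresh search; old marks become stale */
    for (int i = 0; i < toy_top; i++)
        if (toy_reaches(toy_stack[i], x)) return true;
    return false;
}
```

In this model, an object referenced only by a protected vector counts as protected, just as a CHARSXP element of a protected STRSXP would. It stops counting as soon as the vector is unprotected, which is the situation Suharto asks about in the message quoted next.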
>
> Duncan Murdoch
>
> On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel wrote:
> >   On second thought, I wonder whether the caching in my modified
> > 'StringFromLogical' from my previous message is safe. While 'ans' in the C
> > function 'coerceToString' is protected, its elements are protected too,
> > because they are reachable from it. Once the object corresponding to 'ans'
> > is no longer protected, could the cached object 'TrueCh' or 'FalseCh' in
> > 'StringFromLogical' be garbage collected? If so, one option is to clear
> > the cache whenever a new result vector starts being filled. For example,
> > by abusing the 'warn' argument, the following can be added to my modified
> > 'StringFromLogical':
> >
> >   if (*warn) TrueCh = FalseCh = NULL;
> >
> > Correspondingly, in 'coerceToString',
> >
> >   warn = i == 0;
> >
> > is inserted before
> >
> >   SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));
> >
> > for the LGLSXP case.
> >
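A standalone sketch of the proposed reset can help check the logic. It makes several assumptions. `toy_mkChar()` stands in for `mkChar()` and only copies into a fixed pool while counting calls. C strings stand in for CHARSXPs, and all `toy_` names are hypothetical. The sketch shows the intended effect. Cached strings are created afresh at the first element of each coercion and reused only within that call. In R, the protected result vector being filled would keep them reachable during that call.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TOY_NA_LOGICAL INT_MIN   /* R's NA_LOGICAL is INT_MIN */
#define TOY_POOL_SIZE 64

static const char TOY_NA_STRING[] = "NA";

/* Toy stand-in for mkChar(): copies `s` into a fixed pool, counting calls. */
static char toy_pool[TOY_POOL_SIZE][8];
static int toy_nalloc = 0;

static const char *toy_mkChar(const char *s) {
    assert(toy_nalloc < TOY_POOL_SIZE && strlen(s) < sizeof toy_pool[0]);
    return strcpy(toy_pool[toy_nalloc++], s);
}

/* Caching conversion; a nonzero *warn clears the cache first,
   mirroring the proposed `if (*warn) TrueCh = FalseCh = NULL;`. */
static const char *toy_string_from_logical(int x, int *warn) {
    static const char *true_ch, *false_ch;
    if (*warn) true_ch = false_ch = NULL;
    if (x == TOY_NA_LOGICAL) return TOY_NA_STRING;
    if (x) return true_ch ? true_ch : (true_ch = toy_mkChar("TRUE"));
    return false_ch ? false_ch : (false_ch = toy_mkChar("FALSE"));
}

/* Loop shaped like the LGLSXP case of coerceToString(). */
static void toy_coerce_to_string(const int *v, int n, const char **ans) {
    for (int i = 0; i < n; i++) {
        int warn = i == 0;   /* reset the cache at the first element only */
        ans[i] = toy_string_from_logical(v[i], &warn);
    }
}
```

Coercing the same vector twice allocates four strings in total, at most two per call rather than one per element.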
> > ---------------------
> > On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
> >
> >
> >>>>>> Suharto Anggono Suharto Anggono via R-devel
> >>>>>>      on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
> >
> >      > Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
> >
> >      > do_asatomic
> >      > ascommon
> >      > coerceVector
> >      > coerceToString
> >      > StringFromLogical (for each element)
> >
> >      > The definition of 'StringFromLogical' in coerce.c :
> >      >
> >      > attribute_hidden SEXP StringFromLogical(int x, int *warn)
> >      > {
> >      >    int w;
> >      >    formatLogical(&x, 1, &w);
> >      >    if (x == NA_LOGICAL) return NA_STRING;
> >      >    else return mkChar(EncodeLogical(x, w));
> >      > }
> >      >
> >      > The definition of 'EncodeLogical' in printutils.c :
> >      >
> >      > const char *EncodeLogical(int x, int w)
> >      > {
> >      >    static char buff[NB];
> >      >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
> >      >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
> >      >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
> >      >    buff[NB-1] = '\0';
> >      >    return buff;
> >      > }
> >      >
> >      > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
> >      > > system.time(as.character(L))
> >      >    user  system elapsed
> >      >    2.69    0.02    2.73
> >      > > system.time(c("FALSE", "TRUE")[L+1])
> >      >    user  system elapsed
> >      >    0.15    0.04    0.20
> >      > > system.time(c("FALSE", "TRUE")[L+1L])
> >      >    user  system elapsed
> >      >    0.08    0.05    0.13
> >      > > L <- rep(NA, 10^7)
> >      > > system.time(as.character(L))
> >      >    user  system elapsed
> >      >    0.11    0.00    0.11
> >      > > system.time(c("FALSE", "TRUE")[L+1])
> >      >    user  system elapsed
> >      >    0.16    0.06    0.22
> >      > > system.time(c("FALSE", "TRUE")[L+1L])
> >      >    user  system elapsed
> >      >    0.09    0.03    0.12
> >      >
> >      > `as.character` of a logical vector that is all NA is fast enough.
> >      > It appears that the call to 'formatLogical' inside the C function
> >      > 'StringFromLogical' does not introduce much slowdown (for NA elements,
> >      > 'formatLogical' is still called, but 'EncodeLogical' and 'mkChar' are skipped).
> >
> >
> >      > I found that using a string literal inside the C function 'StringFromLogical', by replacing
> >      > EncodeLogical(x, w)
> >      > with
> >      > x ? "TRUE" : "FALSE"
> >      > (after which the call to 'formatLogical' is no longer needed), makes it faster.
> >
> > indeed! ... and we also notice that the 'w' argument is then not needed
> > anymore either, and that makes sense: at this point, when you know you
> > have an R logical value, there are only three possibilities and never any
> > reason to warn about the conversion.
> >
> >      > Alternatively,
> > or in addition !
> >
> >
> >      > a "fast path" could be introduced in 'EncodeLogical', which would potentially also benefit format() in R.
> >      > For example, without replacing existing code, the following fragment could be inserted:
> >      >
> >      >    if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
> >      >    else if(x) {if(w == 4) return "TRUE";}
> >      >    else {if(w == 5) return "FALSE";}
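The following self-contained toy copy of 'EncodeLogical' has that fragment inserted, which makes it easy to check that padded output is unchanged. The NA string, its width, the value of `NB` and all `toy_` names are assumptions standing in for R's `R_print` settings and printutils.c definitions. `INT_MIN` is used for NA because that is R's value of NA_LOGICAL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define NB 1000                  /* buffer size; assumed value */
#define TOY_NA_LOGICAL INT_MIN   /* R's NA_LOGICAL is INT_MIN */

static const char *toy_na_string = "NA";  /* stands in for R_print.na_string */

static int toy_min(int a, int b) { return a < b ? a : b; }

const char *toy_encode_logical(int x, int w) {
    static char buff[NB];
    int na_width = (int) strlen(toy_na_string); /* stands in for R_print.na_width */
    assert(w >= 0); /* a negative width would left-justify in snprintf */

    /* Fast path: the requested width equals the value's natural width,
       so no padding is needed and a constant string can be returned. */
    if (x == TOY_NA_LOGICAL) { if (w == na_width) return toy_na_string; }
    else if (x) { if (w == 4) return "TRUE"; }
    else { if (w == 5) return "FALSE"; }

    /* Original path: right-justify into a static buffer. */
    if (x == TOY_NA_LOGICAL)
        snprintf(buff, NB, "%*s", toy_min(w, NB - 1), toy_na_string);
    else if (x)
        snprintf(buff, NB, "%*s", toy_min(w, NB - 1), "TRUE");
    else
        snprintf(buff, NB, "%*s", toy_min(w, NB - 1), "FALSE");
    buff[NB - 1] = '\0';
    return buff;
}
```

The fast path returns a constant only when the requested width equals the value's natural width. Wider fields still go through snprintf(), so padded output such as " TRUE" for width 5 is unchanged.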
> >      >
> >      > However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L).
> >      >
> >      > Precomputing or caching the possible results of the C function 'StringFromLogical' allows
> >      > as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
> >      >
> >      > attribute_hidden SEXP StringFromLogical(int x, int *warn)
> >      > {
> >      >    static SEXP TrueCh, FalseCh;
> >      >    if (x == NA_LOGICAL) return NA_STRING;
> >      >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
> >      >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
> >      > }
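For comparison, c("FALSE", "TRUE")[L+1L] in R is a lookup in a precomputed two-element table. The C analogue below is a sketch that assumes a string constant can stand in for a cached CHARSXP, and its `toy_` names are hypothetical. NA is handled before indexing. This mirrors R, where NA + 1L is NA and indexing by NA gives NA_character_.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NA_LOGICAL INT_MIN  /* R's NA_LOGICAL is INT_MIN */

static const char TOY_NA_STRING[] = "NA";

/* Precomputed results: index 0 is FALSE and index 1 is TRUE,
   the 0-based counterpart of c("FALSE", "TRUE")[L + 1L]. */
static const char *const toy_logical_strings[2] = { "FALSE", "TRUE" };

/* Fill ans[i] with the string for v[i]; nothing is allocated per element. */
static void toy_logical_to_strings(const int *v, size_t n, const char **ans) {
    assert(n == 0 || (v != NULL && ans != NULL));
    for (size_t i = 0; i < n; i++)
        ans[i] = (v[i] == TOY_NA_LOGICAL) ? TOY_NA_STRING
                                          : toy_logical_strings[v[i] != 0];
}
```

Every TRUE element shares one pointer and every FALSE element shares another, which is the property the static 'TrueCh'/'FalseCh' cache aims for.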
> >
> > Indeed, and something along these lines (storing the other two constant
> > strings) was also my thought when seeing the
> >    mkChar(x ? "TRUE" : "FALSE")
> > you implicitly proposed above.
> >
> > I'm looking into applying both speedups;
> > thank you very much, Suharto!
> >
> > Martin
> >
> >
> > --
> > Martin Maechler
> > ETH Zurich  and  R Core team
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel