[Rd] Non-ASCII citation keys prevent compiling with LC_ALL=C

Ivan Krylov via R-devel Sat, 16 Aug 2025 13:00:21 -0700

Hello R-devel,

I've been watching the development of automatic Rd bibliography
generation with great interest and I'm looking forward to using
\bibcitet{...} and \bibshow{*} in my packages. Currently, non-ASCII
characters used in the citation keys prevent R from successfully
compiling when the current locale encoding is unable to represent them:


% touch src/library/stats/man/factanal.Rd && LC_ALL=C make
...
installing parsed Rd
make[3]: Entering directory '.../src/library'
  base
Error: factanal.Rd:99: (converted from warning) Could not find
bibentries for the following keys: %s
  'R:J<U+00F6>reskog:1963'
Execution halted
make[3]: *** [Makefile:76: stats.Rdts] Error 1

But as long as the locale encoding can represent the key, it's fine:

% touch src/library/stats/man/factanal.Rd && \
 LC_ALL=en_GB.iso885915 luit make
(works well without a UTF-8 locale)
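
As a rough way to check whether a given charset can represent a key before starting a build, something along these lines could be used. This is only a sketch: can_represent is a hypothetical helper, not part of R or its build system, and it assumes an iconv(1) that fails on unconvertible characters (as glibc's does) rather than substituting a placeholder.

```shell
# Hypothetical helper (not part of R): succeeds if the UTF-8 string $1
# can be converted to the charset named in $2, e.g. the value printed
# by `locale charmap` for the locale of interest.
can_represent() {
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" >/dev/null 2>&1
}

# The citation key from factanal.Rd; \303\266 is o-umlaut (U+00F6)
# written as UTF-8 bytes so the script itself stays ASCII.
key=$(printf 'R:J\303\266reskog:1963')

for charset in ASCII ISO-8859-15 UTF-8; do
    if can_represent "$key" "$charset"; then
        echo "$charset: representable"
    else
        echo "$charset: not representable"
    fi
done
```

The charset names are passed explicitly here so that the check does not depend on which locales happen to be installed on the build machine.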

I think this can be made to work by telling tools:::process_Rd() ->
tools:::processRdChunk() to parse character strings in R code as UTF-8:

Index: src/library/tools/R/RdConv2.R
===================================================================
--- src/library/tools/R/RdConv2.R       (revision 88617)
+++ src/library/tools/R/RdConv2.R       (working copy)
@@ -229,8 +229,8 @@
        code <- structure(code[tags != "COMMENT"],
                          srcref = codesrcref) # retain for error locations
        chunkexps <- tryCatch(
-           parse(text = sub("\n$", "", as.character(code)),
-                 keep.source = options$keep.source),
+           parse(text = sub("\n$", "", enc2utf8(as.character(code))),
+                 keep.source = options$keep.source, encoding = "UTF-8"),
            error = function (e) stopRd(code, Rdfile, conditionMessage(e))
        )
 
The enc2utf8() call may be redundant, since tools::parse_Rd() is
documented to convert text to UTF-8 while parsing. The downsides are,
of course, that parse(encoding=...) does not work in MBCS locales and
that there is the ever-present danger of breaking user code that
depends on the current behaviour (this was tested using
'make check-devel', not on CRAN packages).

Should R compile under LC_ALL=C? Maybe it's time for people whose
builds are failing to switch their continuous integration containers
from C to C.UTF-8?

-- 
Best regards,
Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
