Hello R-devel,

I've been watching the development of automatic Rd bibliography
generation with great interest and I'm looking forward to using
\bibcitet{...} and \bibshow{*} in my packages.

Currently, non-ASCII characters used in the citation keys prevent R
from successfully compiling when the current locale encoding is unable
to represent them:

% touch src/library/stats/man/factanal.Rd && LC_ALL=C make
...
installing parsed Rd
make[3]: Entering directory '.../src/library'
  base
Error: factanal.Rd:99: (converted from warning) Could not find bibentries for the following keys: %s
  'R:J<U+00F6>reskog:1963'
Execution halted
make[3]: *** [Makefile:76: stats.Rdts] Error 1

But as long as the locale encoding can represent the key, it's fine:

% touch src/library/stats/man/factanal.Rd && \
  LC_ALL=en_GB.iso885915 luit make
(works well without a UTF-8 locale)

I think this can be made to work by telling tools:::process_Rd() ->
tools:::processRdChunk() to parse character strings in R code as UTF-8:

Index: src/library/tools/R/RdConv2.R
===================================================================
--- src/library/tools/R/RdConv2.R  (revision 88617)
+++ src/library/tools/R/RdConv2.R  (working copy)
@@ -229,8 +229,8 @@
     code <- structure(code[tags != "COMMENT"],
                       srcref = codesrcref) # retain for error locations
     chunkexps <- tryCatch(
-        parse(text = sub("\n$", "", as.character(code)),
-              keep.source = options$keep.source),
+        parse(text = sub("\n$", "", enc2utf8(as.character(code))),
+              keep.source = options$keep.source, encoding = "UTF-8"),
         error = function (e) stopRd(code, Rdfile, conditionMessage(e))
     )

That enc2utf8() may be extraneous, since tools::parse_Rd() is
documented to convert text to UTF-8 while parsing. The downsides are,
of course, parse(encoding=...) not working with MBCS locales and the
ever-present danger of breaking some user code that depends on the
current behaviour (this was tested using 'make check-devel', not on
CRAN packages).

Should R compile under LC_ALL=C? Maybe it's time for people whose
builds are failing to switch the continuous integration containers
from C to C.UTF-8?

-- 
Best regards,
Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel