>>>>> Ivan Krylov via R-devel writes:

> Hello R-devel,

> I've been watching the development of automatic Rd bibliography
> generation with great interest and I'm looking forward to using
> \bibcitet{...} and \bibshow{*} in my packages.
Thanks! :-)

> Currently, non-ASCII characters used in the citation keys prevent R
> from successfully compiling when the current locale encoding is unable
> to represent them:

> % touch src/library/stats/man/factanal.Rd && LC_ALL=C make
> ...
> installing parsed Rd
> make[3]: Entering directory '.../src/library'
>   base
> Error: factanal.Rd:99: (converted from warning) Could not find
> bibentries for the following keys: %s
>   'R:J<U+00F6>reskog:1963'
> Execution halted
> make[3]: *** [Makefile:76: stats.Rdts] Error 1

> But as long as the locale encoding can represent the key, it's fine:

> % touch src/library/stats/man/factanal.Rd && \
>   LC_ALL=en_GB.iso885915 luit make
> (works well without a UTF-8 locale)

Oh dear.  I thought we had coverage for this from building daily
snapshots with LC_ALL=C, but apparently not.

There were 10 non-ASCII keys so far: for now, I have changed them all
to ASCII.

But clearly, when a package declares its Rd files to be in UTF-8, one
would expect that \Sexpr macros can also take UTF-8.  That's not so
simple, though, given that it involves calling the R parser.

Your suggested change looks good to me: non-UTF-8 MBCS locales have a
problem with parse(encoding = "UTF-8"), but I don't think we have real
coverage for these.  (Afaic, in principle, it might be nice to make
these "work" via writing to a tempfile, parsing from there with
re-encoding, and at the end running enc2utf8() on all strings obtained,
but that's not so simple ...)

Anyway, we need to discuss this a bit more within R Core.  For now,
things "work" again with LC_ALL=C.  (My regular checks use C.UTF-8, but
I am not sure how universally available this is?)

Best

> I think this can be made to work by telling tools:::process_Rd() ->
> tools:::processRdChunk() to parse character strings in R code as UTF-8:

> Index: src/library/tools/R/RdConv2.R
> ===================================================================
> --- src/library/tools/R/RdConv2.R	(revision 88617)
> +++ src/library/tools/R/RdConv2.R	(working copy)
> @@ -229,8 +229,8 @@
>      code <- structure(code[tags != "COMMENT"],
>                        srcref = codesrcref) # retain for error locations
>      chunkexps <- tryCatch(
> -        parse(text = sub("\n$", "", as.character(code)),
> -              keep.source = options$keep.source),
> +        parse(text = sub("\n$", "", enc2utf8(as.character(code))),
> +              keep.source = options$keep.source, encoding = "UTF-8"),
>          error = function (e) stopRd(code, Rdfile, conditionMessage(e))
>      )

> That enc2utf8() may be extraneous, since tools::parse_Rd() is
> documented to convert text to UTF-8 while parsing.  The downsides are,
> of course, parse(encoding=...) not working with MBCS locales and the
> ever-present danger of breaking some user code that depends on the
> current behaviour (this was tested using 'make check-devel', not on
> CRAN packages).

> Should R compile under LC_ALL=C?  Maybe it's time for people whose
> builds are failing to switch the continuous integration containers
> from C to C.UTF-8?

> --
> Best regards,
> Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
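
[Editorial note: for illustration, below is a minimal sketch of the
tempfile-based workaround floated in the reply above (write the UTF-8
bytes to a temporary file, parse from there with a declared encoding,
then re-mark the resulting string constants as UTF-8).  The function
name parseUTF8ViaTempfile is hypothetical and this is not how R
implements anything; it only walks strings nested in calls and
expressions.]

    ## Hypothetical helper, not part of R.  'code' is a character
    ## vector of R code whose strings can be converted to UTF-8.
    parseUTF8ViaTempfile <- function(code, keep.source = TRUE)
    {
        tf <- tempfile(fileext = ".R")
        on.exit(unlink(tf))
        con <- file(tf, open = "w")
        ## write the bytes out as-is so the file contains UTF-8
        ## regardless of the current locale
        writeLines(enc2utf8(code), con, useBytes = TRUE)
        close(con)
        ## parse from the file, declaring its encoding
        exprs <- parse(tf, keep.source = keep.source, encoding = "UTF-8")
        ## walk the parsed expressions and re-mark character constants
        markUTF8 <- function(x) {
            if (is.character(x)) return(enc2utf8(x))
            if (is.call(x) || is.expression(x)) {
                for (i in seq_along(x)) {
                    if (is.character(x[[i]]))
                        x[[i]] <- enc2utf8(x[[i]])
                    else if (is.call(x[[i]]) || is.expression(x[[i]]))
                        x[[i]] <- markUTF8(x[[i]])
                }
            }
            x
        }
        markUTF8(exprs)
    }

Opening the connection without a re-encoding layer and writing with
useBytes = TRUE keeps the round trip byte-exact even in a C or
non-UTF-8 MBCS locale; the final markUTF8() pass corresponds to the
"run enc2utf8() on all strings obtained" step mentioned in the reply.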