On Sat, 29 Nov 2025, Mark Bravington wrote:
Wouldn't the obvious thing be to not use an r string here?
"Obvious" in terms of keeping R CMD check happy, certainly, but it'd be
antithetical to clear code--- the string I included in the post would become
incomprehensible to the maintainer (me).
IME raw strings in R are under-appreciated and little-known. They have lots of
uses besides regexes, whatever the intention(s) may or may not have been! EG I
use raw strings for formatted multi-line comments, and documentation, and
templated bits of text. Nicer code results.
Raw strings are very useful in a number of ways. That is why they were
added to R.
Anyway, I'd be perfectly happy with Duncan Murdoch's suggestion of making UTF-8 legit in R
code & NAMESPACE generally. I suggested the minor incremental change of "only raw
strings" (i) because that's the only thing that affects me ATM, and (ii) just in case
there were unwelcome implications of UTF-8 for (a) strings in general, or (b) legal
variable names etc.
Keeping the ASCII-only restriction for code is important as it makes
the code easier to understand by a wider audience.
Allowing non-ASCII characters in literal strings, raw or regular, does
seem reasonable to me in principle, but others may see issues I am not
aware of.
But checking for non-ASCII characters in code while allowing non-ASCII
characters in string literals needs much more sophisticated check code
than we currently have. If you or anyone else wants to see this happen,
you can explore creating a patch and submitting it to Bugzilla for
consideration.
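As a rough illustration of what a token-aware check might involve, the sketch below parses a file with source references and uses utils::getParseData() to flag non-ASCII bytes token by token, so that string literals and comments can be reported separately from other code. This is not how R CMD check is implemented; non_ascii_tokens() is a hypothetical helper and the sample source is made up.

```r
# Hypothetical helper: report tokens that contain non-ASCII bytes and say
# whether each one is a raw string, a regular string, a comment, or code.
non_ascii_tokens <- function(path) {
  exprs <- parse(file = path, keep.source = TRUE, encoding = "UTF-8")
  pd <- utils::getParseData(exprs)
  pd <- pd[pd$terminal, , drop = FALSE]

  # Byte-wise test, so the result does not depend on the session locale.
  has_non_ascii <- vapply(
    pd$text,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )

  hits <- pd[has_non_ascii, c("line1", "col1", "token"), drop = FALSE]
  hit_text <- pd$text[has_non_ascii]

  is_string <- hits$token == "STR_CONST"
  # In the parse data, a raw string's text starts with r or R and a quote.
  is_raw <- is_string & grepl("^[rR][\"']", hit_text, useBytes = TRUE)

  hits$kind <- ifelse(is_raw, "raw string",
               ifelse(is_string, "regular string",
               ifelse(hits$token == "COMMENT", "comment", "code")))
  rownames(hits) <- NULL
  hits
}

# Sample source, built with \u escapes so that this example stays ASCII.
src <- c(
  "a <- r\"(\u00c4 \\\"A)\"",   # raw string holding a non-ASCII character
  "b <- \"\u00e4\"",            # regular string holding a non-ASCII character
  "d <- 1  # na\u00efve"        # comment holding a non-ASCII character
)

# Write the sample as UTF-8 bytes to a temporary file and check it.
path <- tempfile(fileext = ".R")
writeLines(enc2utf8(src), path, useBytes = TRUE)
result <- non_ascii_tokens(path)
print(result)
```

A check built this way could then decide per kind what to allow, for example accepting raw strings and comments while still flagging everything else. Very long string literals are abbreviated in the parse data, so a real check would need to handle that case as well.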
Best,
luke
cheers
Mark
On Sun, Nov 30, 2025, at 03:44, Jeff Newmiller via R-package-devel wrote:
Wouldn't the obvious thing be to not use an r string here? Using r strings does
not imply the use of non-ASCII characters (AFAIK they are intended for regex
patterns), and using regular strings does not imply you cannot use Unicode
(with \uxxxx).
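To make the \uxxxx point concrete, here is a small sketch of two entries from the kind of mapping shown in the original post, written as a regular string with escapes. The variable name entry is made up for illustration.

```r
# Regular string: \u escapes give the non-ASCII characters, while each LaTeX
# backslash and double quote has to be escaped by hand.
entry <- "\u00c4 \\\"A \u00e4 \\\"a"

# cat() shows the text the string actually holds.
cat(entry, "\n")

# The source stays pure ASCII, yet the string holds non-ASCII characters.
nchar(entry)      # number of characters in the string
Encoding(entry)   # how R marks the string's encoding
```

The source file passes an ASCII-only check this way, at the cost of a literal that is harder to read than the characters it stands for.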
At some point I would think that Unicode in package source code would
become acceptable... but supporting Unicode in data objects does not imply
that Unicode in source code has to be supported, so IMO your arguments
don't really carry any weight in the discussion.
On November 29, 2025 2:55:52 AM PST, Mark Bravington
<[email protected]> wrote:
Hi--- My package 'lyxport' has R code with several raw strings (see ?Quotes)
which contain UTF-8 characters (FWIW: in order to deal with wacky legacy LaTeX
characters). For example, one of the strings is:
converto <- r"--{
Ä \"A ä \"a Á \'A á \'a Ȧ \.A ȧ \.a Ā \=A
ā \=a Â \^A â \^a À \`A à \`a Ą \k{A} ą \k{a}
<snipped>
Ŋ {\NG} Ø {\O} ø {\o} œ {\oe} Œ {\OE} ß {\ss} þ {\th}
Þ {\TH}
}--"
R CMD check is not happy, and gives a Warning:
"Portable packages must use only ASCII characters in their R code and NAMESPACE
directives, except perhaps in comments. Use \uxxxx escapes for other characters."
and indeed that is as stated in "Writing R Extensions", section 1.1.5 ("Package
subdirectories") and section 1.6.3 ("Encoding issues").
But I wonder if this is still sensible now that
(i) R has raw strings (since R 4.0.0);
(ii) the DESCRIPTION file explicitly says "Encoding: UTF-8"; and
(iii) R >= 4.2 pretty much now enforces UTF-8 on Windows (and UTF-8 could even be a
"requirement" of this package, if that helped).
With "normal" strings then maybe the \uxxxx thing is reasonable; but shouldn't
the contents of raw strings be exempt? You can't put \uxxxx into a raw string, for
obvious reasons...
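The reason is that raw strings do no escape processing at all, which a short sketch can confirm. The variable names here are only for illustration.

```r
# In a raw string, backslash sequences are kept literally, so \u00c4
# stays as six ASCII characters instead of becoming one character.
raw_version     <- r"(\u00c4)"
regular_version <- "\u00c4"

nchar(raw_version)      # 6: backslash, u, 0, 0, c, 4
nchar(regular_version)  # 1: the single character U+00C4

# The raw string equals a regular string with an escaped backslash.
identical(raw_version, "\\u00c4")  # TRUE
```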
cheers
Mark
PS Of course, there are ways around the Warning (eg storing the strings as
files elsewhere in the package, and reading those files at run time) but
they are tedious, harder to maintain, and reduce clarity (imagine using \uxxxx
in the above!). Since I don't particularly care whether the package goes on
CRAN or not (it's living quite happily in R-universe), I've no plans to change
my code, but I would prefer to avoid Warnings that then have to be explained to
would-be users. And I am probably not the only person affected.
PPS The package has been working fine on Windows, Macs, and Linux.
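For reference, the file-based workaround mentioned in the PS might look roughly like the sketch below. The file name converto.txt and the helper read_mapping() are hypothetical; in a real package the file would typically live under inst/ and be located with system.file(), whereas a temporary file stands in for it here so that the example is self-contained.

```r
# Hypothetical helper: read a UTF-8 text file into one newline-joined string.
read_mapping <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  paste(lines, collapse = "\n")
}

# Stand-in for a file shipped with the package. Inside a package it might be
# located with something like
#   system.file("extdata", "converto.txt", package = "lyxport")
# (an assumed layout, not taken from the actual package).
path <- file.path(tempdir(), "converto.txt")
writeLines(enc2utf8("\u00c4 \\\"A \u00e4 \\\"a"), path, useBytes = TRUE)

converto <- read_mapping(path)
```

This keeps the R source ASCII-only, but the mapping now lives away from the code that uses it, which is the loss of clarity the PS describes.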
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: [email protected]
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/