On Sat, 29 Nov 2025, Mark Bravington wrote:

Wouldn't the obvious thing be to not use an r string here?

"Obvious" in terms of keeping R CMD check happy, certainly, but it'd be 
antithetical to clear code: the string I included in the post would become 
incomprehensible to the maintainer (me).

IME raw strings in R are under-appreciated and little-known. They have lots of 
uses besides regexes, whatever the original intention may have been! E.g. I 
use raw strings for formatted multi-line comments, documentation, and 
templated bits of text. Nicer code results.

Raw strings are very useful in a number of ways. That is why they were
added to R.

Anyway, I'd be perfectly happy with Duncan Murdoch's suggestion of making 
UTF-8 legit in R code & NAMESPACE generally. I suggested the minor incremental 
change of "only raw strings" (i) because that's the only thing that affects me 
ATM, and (ii) just in case there were unwelcome implications of UTF-8 for 
strings in general, or for legal variable names etc.

Keeping the ASCII-only restriction for code is important as it makes
the code easier for a wider audience to understand.

Allowing non-ASCII characters in literal strings, raw or regular, does
seem reasonable to me in principle, but others may see issues I am not
aware of.

But checking for non-ASCII characters in code while allowing them in
string literals needs much more sophisticated check code than we
currently have. If you or anyone else wants to see this happen, you
can explore creating a patch and submitting it to R's Bugzilla for
consideration.
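
[Editorial sketch, not part of the original message.] One possible way to tell code apart from string literals is to work from the parser's token table. The function below is a hypothetical illustration, not R's actual check code: `find_non_ascii_tokens` is an invented name, and it assumes a UTF-8 session and R >= 4.0. It parses the given lines, drops `STR_CONST` and `COMMENT` tokens, and reports any remaining token that contains a byte above 0x7F.

```r
# Hypothetical helper (not part of R or its check code): list tokens that
# contain non-ASCII bytes, ignoring string literals and comments.
find_non_ascii_tokens <- function(lines) {
  exprs  <- parse(text = lines, keep.source = TRUE, encoding = "UTF-8")
  tokens <- utils::getParseData(exprs)
  # Keep terminal tokens only, and skip string literals and comments.
  keep   <- tokens$terminal & !(tokens$token %in% c("STR_CONST", "COMMENT"))
  tokens <- tokens[keep, ]
  # A token counts as non-ASCII if any of its bytes is above 0x7F.
  non_ascii <- vapply(tokens$text,
                      function(s) any(as.integer(charToRaw(s)) > 127L),
                      logical(1), USE.NAMES = FALSE)
  tokens[non_ascii, c("line1", "col1", "token", "text")]
}

# The sample source is built with \u escapes so this example stays ASCII.
src <- c('label <- "\u00e4"',   # non-ASCII only inside a string literal
         '`caf\u00e9` <- 1')    # non-ASCII inside a (backquoted) name
find_non_ascii_tokens(src)
```

In a UTF-8 session this should flag only the name on the second line. A real patch would also need to cover NAMESPACE directives and files that are not valid UTF-8, which this sketch ignores.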

Best,

luke

cheers
Mark


On Sun, Nov 30, 2025, at 03:44, Jeff Newmiller via R-package-devel wrote:
Wouldn't the obvious thing be to not use an r string here? Using r strings does 
not imply the use of non-ASCII characters (AFAIK they are intended for regex 
patterns), and using regular strings does not imply you cannot use Unicode 
(with \uxxxx).

At some point I would think that Unicode in package source code would become 
acceptable... but supporting Unicode in data objects does not imply that 
Unicode in source code has to be supported, so your arguments don't IMO 
really add any weight to the discussion.

On November 29, 2025 2:55:52 AM PST, Mark Bravington 
<[email protected]> wrote:
Hi, my package 'lyxport' has R code with several raw strings (see ?Quotes) 
which contain UTF-8 characters (FWIW: in order to deal with wacky legacy LaTeX 
characters). For example, one of the strings is:

 converto <- r"--{
     Ä   \"A ä   \"a Á   \'A á   \'a Ȧ   \.A ȧ   \.a Ā   \=A
     ā   \=a Â   \^A â   \^a À   \`A à   \`a Ą   \k{A} ą   \k{a}
<snipped>
     Ŋ   {\NG} Ø   {\O}  ø   {\o}  œ   {\oe} Œ   {\OE} ß   {\ss} þ   {\th}
     Þ   {\TH}
   }--"

R CMD check is not happy, and gives a Warning:

"Portable packages must use only ASCII characters in their R code and NAMESPACE 
directives, except perhaps in comments. Use \uxxxx escapes for other characters."

and indeed that is as stated in "Writing R Extensions", section 1.1.5 ("Package 
subdirectories") and section 1.6.3 ("Encoding issues").

But I wonder if this is still sensible now that

(i) R has raw strings (since R 4.0.0);
(ii) the DESCRIPTION file explicitly says "Encoding: UTF-8"; and
(iii) R >= 4.2 pretty much now enforces UTF-8 in Windows (and UTF-8 could even be a 
"requirement" of this package, if that helped).

With "normal" strings then maybe the \uxxxx thing is reasonable; but shouldn't 
the contents of raw strings be exempt? You can't put \uxxxx into a raw string, for 
obvious reasons...
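
[Editorial sketch, not part of the original message.] The point about \uxxxx can be seen in a small example, assuming R >= 4.0 (where raw strings are available). The variable names are illustrative only. In a regular string the parser turns the escape into a single character, while in a raw string the same six characters are kept verbatim.

```r
# Regular string literal: the \u escape is interpreted by the parser.
regular <- "\u00e4"
# Raw string literal: the backslash sequence is kept as literal text.
raw <- r"(\u00e4)"

utf8ToInt(regular)          # 228, the code point U+00E4
nchar(raw)                  # 6: backslash, "u", "0", "0", "e", "4"
identical(raw, "\\u00e4")   # TRUE
```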

cheers
Mark


PS Of course, there are ways around the Warning (eg storing the strings as 
files elsewhere in the package, and reading those files at run time) but 
they are tedious, harder to maintain, and reduce clarity (imagine using \uxxxx 
in the above!). Since I don't particularly care whether the package goes on 
CRAN or not (it's living quite happily in R-universe), I've no plans to change 
my code, but I would prefer to avoid Warnings that then have to be explained to 
would-be users. And I am probably not the only person affected.

PPS The package has been working fine on Windows, Macs, and Linux.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   [email protected]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/