Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments

Thomas Zumbrunn Thu, 19 Jan 2012 15:39:48 -0800

On Thursday 19 January 2012, peter dalgaard wrote:
> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> >   plain("Zürich")  ## works
> >   plain("Z\u00BCrich")  ## fails
> >   escaped("Zürich")  ## fails
> >   escaped("Z\u00BCrich")  ## works
> 
> Using the correct UTF-8 code helps quite a bit:
> 
> U+00BC        ¼       c2 bc   VULGAR FRACTION ONE QUARTER
> U+00FC        ü       c3 bc   LATIN SMALL LETTER U WITH DIAERESIS


Thank you for pointing that out. How embarrassing - I systematically used the 
wrong representations. Even worse, I didn't carefully read "Writing R 
Extensions", which speaks of "Unicode as \uxxxx escapes" rather than "UTF-8 as 
\uxxxx escapes". The escape takes the Unicode code point, not the UTF-8 byte 
sequence, so looking up the code point (which for characters like these is 
identical to the UTF-16 code unit) would have done the trick.
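
To make the difference concrete, here is a minimal sketch that prints the 
bytes of a string in a given encoding. The helper name "hex_bytes" is made up 
for illustration, and the sketch assumes an iconv that knows UTF-8 and 
UTF-16BE plus POSIX od and tr:

```shell
# Hypothetical helper: read UTF-8 text on stdin and print its bytes,
# as hex digits, after conversion to the encoding given in $1.
hex_bytes() {
  iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

# The input bytes are written as octal escapes so the script itself stays ASCII.
printf '\303\274' | hex_bytes UTF-8      # bytes of "ü" in UTF-8:    c3bc
printf '\303\274' | hex_bytes UTF-16BE   # bytes of "ü" in UTF-16BE: 00fc
printf '\302\274' | hex_bytes UTF-16BE   # bytes of "¼" in UTF-16BE: 00bc
```

The UTF-16BE output matches the digits needed after \u, whereas the second 
UTF-8 byte (bc) is what led me to the wrong escape.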

I didn't find a recommended method of replacing non-ASCII characters with 
Unicode \uxxxx escape sequences and ended up using the Unix command line tool 
"iconv". However, the iconv version installed on my GNU/Linux machine 
(openSUSE 11.4) seems to be outdated and doesn't support the very useful 
"--unicode-subst" option yet. I installed "libiconv" from 
http://www.gnu.org/software/libiconv/, and now I can easily replace all 
non-ASCII characters in my UTF-8 encoded R files with:

  iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R
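
Note that iconv writes the converted text to standard output, so it has to be 
redirected to a new file. As a quick check that no non-ASCII bytes are left 
afterwards, something like the following sketch might do. The helper name 
"count_non_ascii" is made up for illustration, and the sketch assumes POSIX tr 
and wc:

```shell
# Hypothetical helper: count the bytes on stdin that lie outside the
# ASCII range 0x00-0x7F. A result of 0 means the input is pure ASCII.
count_non_ascii() {
  # Delete every ASCII byte, then count what is left.
  LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf 'Z\303\274rich\n'   | count_non_ascii   # raw UTF-8 "ü": 2 bytes left
printf 'Z\\u00FCrich\n'    | count_non_ascii   # escaped form: 0 bytes left
```

Run on a file, this would be "count_non_ascii < my-converted-file.R", where 
the file name is only a placeholder.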

Thomas Zumbrunn

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
