Hi Andrew,
On 9/30/22 15:05, Andrew Hart via R-help wrote:
Hi everyone,
Recently I upgraded to R 4.2.1, which now uses UTF-8 internally as its
native encoding. Very nice. However, I've discovered that if I use
writeClipboard to move a string containing accented characters to the
Windows clipboard and then paste it into another application (e.g.
Notepad, Eclipse, etc.), the accents come out garbled. Here's an
example:
writeClipboard("categoría")
Pasting the result into this e-mail message yields
CategorÃa
As near as I can tell, the problem seems to have something to do with
the format parameter of writeClipboard. By default, format has a value
of 1, which tells the clipboard to receive Text in the machine's
locale. If I set format=13 in the call, the accents transfer to the
clipboard correctly:
writeClipboard("categoría", format=13)
and the result is
Categoría
Ivan Krylov has kindly turned this into a bug report; please see
https://bugs.r-project.org/show_bug.cgi?id=18412
for more details. In short, yes, using format=13 is recommended, but
please note that this is already documented in ?writeClipboard.
It seems that format=13 may be a better default now that R is using
UTF-8. It would be nice not to have to specify the format every time I
want to copy text to the clipboard with writeClipboard.
Yes, I agree, I've changed the default to format=13.
Is writeClipboard supposed to perform any kind of encoding conversion
or is the format parameter merely informing the clipboard of the kind
of payload it's being handed?
Btw, with pre-4.2.0 versions of R, this wasn't a problem. I am very
much in favour of R using some kind of Unicode encoding natively, but
this wrinkle seems to be something the user shouldn't have to deal
with since the Windows clipboard is capable of holding Unicode text.
Any advice would be gratefully received.
This is a bit complicated and more can be found in the bug report
response. In short, the clipboard is capable of holding either "text"
(then with locale information) or "Unicode text". One can ask Windows
for either form and Windows will do the conversion: it converts from
"text" to "Unicode text" using that locale. If that locale is not
filled in explicitly, it is the current input language (that is, the
"keyboard" the user has selected at the time of copying to the
clipboard, e.g. at the time of the writeClipboard call). If the encoding
of that locale doesn't match R's current native encoding, and you are
using "text", characters may be misrepresented. This could have happened
even before R 4.2.0, but is more likely from R 4.2.0 on, now that R uses
UTF-8. Going via "Unicode text" resolves the issue, as the conversion
to/from UTF-16LE is done by readClipboard/writeClipboard using R's
current native encoding.
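
The mismatch described above can be reproduced without touching the
clipboard at all. The following is only a sketch: it assumes the
receiving side decodes the bytes as Latin-1/Windows-1252, which may not
be the locale in effect on a given machine. It takes the UTF-8 bytes of
the example string and reinterprets them as single-byte characters.

```r
# Assumption: the bytes are decoded as Latin-1/Windows-1252 on the way in.
x <- "categor\u00eda"                  # "categoría", escaped to keep this file ASCII
utf8_bytes <- charToRaw(enc2utf8(x))   # the accented i is the two bytes c3 ad

# Reinterpret the same bytes as one character per byte.
garbled <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

print(garbled)
# Code points of the result: byte c3 became U+00C3 (A with tilde) and
# byte ad became U+00AD (soft hyphen, which most programs do not display).
print(utf8ToInt(garbled))
```

The soft hyphen is usually invisible, which would be consistent with the
accented letter appearing to vanish after the "Ã" in the pasted text.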
Users who don't want to deal with these complexities can use the
higher-level connections interface (?connections, "clipboard").
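
As a rough sketch of the connections route, assuming a Windows session:
copy_text below is a hypothetical helper name, not part of R. It opens
the "clipboard" connection described in ?connections and writes the text
to it; on other platforms it only warns and does nothing.

```r
# copy_text() is a hypothetical convenience wrapper, not an R function.
copy_text <- function(text) {
  if (.Platform$OS.type != "windows") {
    warning("writing to the \"clipboard\" connection is only sketched for Windows here")
    return(invisible(FALSE))
  }
  con <- file("clipboard", open = "w")
  on.exit(close(con))        # the text reaches the clipboard when the connection is closed
  writeLines(text, con)      # writeLines appends a newline after each element
  invisible(TRUE)
}

copy_text("categor\u00eda")
```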
Best
Tomas
Thanks,
Andrew.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.