This is a follow-up on the Windows character encoding issue that Daniel
Possenriede posted about earlier this month, in which strings that are
not Latin-1 were being marked as "latin1"
(https://stat.ethz.ch/pipermail/r-devel/2017-August/074731.html ).

The issue is that on Windows, when the character locale is Windows-1252,
R marks some (possibly all) native non-ASCII strings as "latin1". I
posted a related bug report:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17329 . The bug
report also includes a link to a fix for a related issue: converting
strings from Windows native to UTF-8.
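
To illustrate why the "latin1" mark matters: Windows-1252 and Latin-1
agree on most bytes but differ in the range 0x80 to 0x9F, so a
Windows-1252 string treated as Latin-1 can come out wrong. The following
is a minimal Python sketch of that byte-level difference only. It does
not reproduce R's internal behavior, and the helper name decode_both is
made up for this illustration.

```python
# Illustration only: this shows the difference between the two encodings,
# not how R marks or converts strings internally.

def decode_both(raw):
    """Decode the same bytes as Windows-1252 and as Latin-1 (hypothetical helper)."""
    return raw.decode("cp1252"), raw.decode("latin-1")

# 0xE9 is e-acute in both encodings, so the label makes no difference here.
same_cp1252, same_latin1 = decode_both(b"caf\xe9")

# 0x80 is the euro sign in Windows-1252 but a C1 control character in Latin-1.
euro_cp1252, euro_latin1 = decode_both(b"\x80")

print(same_cp1252 == same_latin1)
print(euro_cp1252 == euro_latin1)
print(ascii(euro_cp1252), ascii(euro_latin1))
```

Strings that only use characters common to both encodings are unaffected,
which may be why the problem is easy to miss in casual testing.
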
There is a workaround for this bug in the current development version
of the 'corpus' package (not on CRAN yet). See
https://github.com/patperry/r-corpus/issues/5 . I have tested it on a
Windows install of R in the Windows-1252 locale, but not on a Windows
install in any other locale. If you have such an install, it would be
great if you could test the fix and report back, either here or on the
GitHub issue.
Patrick
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel