Regarding the Windows character encoding issues Daniel Possenriede posted about earlier this month, where strings that are not Latin-1 were being marked as "latin1" (https://stat.ethz.ch/pipermail/r-devel/2017-August/074731.html ):

The issue is that on Windows, when the character locale is Windows-1252, R marks some (possibly all) native non-ASCII strings as "latin1". I posted a related bug report: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17329 . The bug report also includes a link to a fix for a related issue: converting strings from Windows native to UTF-8.
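
To see why the label matters, here is a minimal byte-level sketch. It is written in Python rather than R, and it is not R's actual conversion code. It assumes the "latin1" label is taken literally as ISO-8859-1; whether R does so at any particular conversion is not something this sketch establishes. The helper name to_utf8 is made up for illustration.

```python
# Windows-1252 and ISO-8859-1 (Latin-1) agree on bytes 0xA0-0xFF but
# differ in 0x80-0x9F: Windows-1252 puts printable characters there,
# ISO-8859-1 puts C1 control characters there.

def to_utf8(data: bytes, declared: str) -> bytes:
    """Hypothetical helper: convert bytes to UTF-8, trusting the declared encoding."""
    return data.decode(declared).encode("utf-8")

euro = b"\x80"      # euro sign in Windows-1252
e_acute = b"\xe9"   # e with acute accent in both encodings

# Correct label: 0x80 becomes U+20AC (euro sign).
utf8_right = to_utf8(euro, "cp1252")

# Wrong label: 0x80 becomes U+0080, a control character.
utf8_wrong = to_utf8(euro, "latin-1")

# For bytes in the shared range the label makes no difference.
same_either_way = to_utf8(e_acute, "cp1252") == to_utf8(e_acute, "latin-1")

print(utf8_right, utf8_wrong, same_either_way)
```

Under that assumption, a mislabeled string converts correctly only when all of its bytes fall in the range the two encodings share, which would be why the problem can go unnoticed for many Western European strings.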

There is a workaround for this bug in the current development version of the 'corpus' package (not on CRAN yet). See https://github.com/patperry/r-corpus/issues/5 . I have tested this on a Windows-1252 install of R, but I have not tested it on a Windows install in another locale. It'd be great if someone with such an install would test the workaround and report back, either here or on the GitHub issue.


Patrick

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
