Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

Duncan Murdoch Wed, 10 Apr 2019 09:47:41 -0700

On 10/04/2019 12:32 p.m., Jeroen Ooms wrote:

On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch <[email protected]> wrote:


On 10/04/2019 10:29 a.m., Yihui Xie wrote:

Since it is "technically easy" to disable the best fit conversion and
the best fit is rarely good, how about providing an option for
code/package authors to disable it? I'm asking because this is one of
the most painful issues in packages that may need to source() code
containing UTF-8 characters that are not representable in the Windows
native encoding. Examples include knitr/rmarkdown and shiny. Basically
users won't be able to knit documents or run Shiny apps correctly when
the code contains characters that cannot be represented in the native
encoding.


Wouldn't things be worse with it disabled than currently?  I'd expect
the line containing the "ř" to end up as NA instead of converting to "r".


I don't think it would be worse, because in this case R would not
implicitly convert strings to (best fit) latin1 on Windows, but
instead keep the (correct) string in its UTF-8 encoding. The NA only
appears if the user explicitly forces a conversion to latin1, which is
not the problem here I think.
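
The distinction above can be sketched in a few lines of R. This is only an illustration, assuming a string already marked as UTF-8; the names `x` and `as_latin1` are made up for the example and do not come from the thread, and the result of the explicit conversion on a particular Windows build may differ from what the comments expect.

```r
# Sketch only: `x` and `as_latin1` are illustrative names, not from the thread.
x <- "\u0159"   # "ř" written as an escape, so this file itself stays ASCII

# Left alone, the string is one code point (U+0159) stored as two UTF-8 bytes.
utf8ToInt(x)               # code point of the single character
nchar(x, type = "bytes")   # byte length of the UTF-8 form

# Explicit conversion to latin1: "ř" has no latin1 code point, so iconv()
# is expected to return NA by default instead of a converted string.
as_latin1 <- iconv(x, from = "UTF-8", to = "latin1")
is.na(as_latin1)

# With `sub`, unconvertible bytes are substituted rather than producing NA.
iconv(x, from = "UTF-8", to = "latin1", sub = "byte")
```

Nothing here exercises the implicit best-fit path that source() or RGui may take on Windows; it only shows that the NA arises from a forced conversion, not from holding the string as UTF-8.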

The original problem that I can reproduce in RGui is that if you enter
  "ř" in RGui, R opportunistically converts this to latin1, because it
can. However if you enter text which can definitely not be represented
in latin1, R encodes the string correctly in UTF-8 form.

I think the pathways for text in RGui and text being sourced are
different. I agree fixing RGui in that way would make sense, but Yihui
was talking about source().


Duncan Murdoch

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
