Or, if this cannot be done easily, please disable the "utf-8" value of the encoding argument of source() in R on Windows:

source(..., encoding = "utf-8")
-> error: "utf-8" does not work correctly on Windows.
-> (or, at least) warning: "utf-8" is handled by "best fit" on Windows and some characters in string literals may be changed automatically.
Because, in its current state, the UTF-8 encoding of R source files on Windows is a fake Unicode, as in reality it can handle only 256 different ANSI characters.

Thanks,
Tomas

On Thu, Apr 11, 2019 at 8:53 AM Tomáš Bořil <bor...@gmail.com> wrote:
>
> For me, this would be a perfect solution.
>
> I.e., do not use the “best fit” and leave it to the user’s competence:
> a) in some functions, utf-8 works
> b) in others -> an error is thrown (e.g., incomplete string, NA, etc.)
> => the user has to change the code with his/her intentional “best fit string
> literal substitute” or use another function that can handle utf-8.
>
> Making R code work right only on some platforms / trying to keep
> backward compatibility in the sense of “the code does not do what you want
> and the behaviour differs depending on every locale, but at least it does
> not throw an error” is generally not a good idea - it is dangerous. Users /
> coders should know that there is something wrong with their strings and
> that some characters are “eaten alive”.
>
> Tomas
>
> On Thu, Apr 11, 2019 at 8:26, Tomas Kalibera <tomas.kalib...@gmail.com>
> wrote:
>>
>> On 4/10/19 6:32 PM, Jeroen Ooms wrote:
>> > On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch <murdoch.dun...@gmail.com>
>> > wrote:
>> >> On 10/04/2019 10:29 a.m., Yihui Xie wrote:
>> >>> Since it is "technically easy" to disable the best fit conversion and
>> >>> the best fit is rarely good, how about providing an option for
>> >>> code/package authors to disable it? I'm asking because this is one of
>> >>> the most painful issues in packages that may need to source() code
>> >>> containing UTF-8 characters that are not representable in the Windows
>> >>> native encoding. Examples include knitr/rmarkdown and shiny. Basically
>> >>> users won't be able to knit documents or run Shiny apps correctly when
>> >>> the code contains characters that cannot be represented in the native
>> >>> encoding.
>> >> Wouldn't things be worse with it disabled than currently? I'd expect
>> >> the line containing the "ř" to end up as NA instead of converting to "r".
>> > I don't think it would be worse, because in this case R would not
>> > implicitly convert strings to (best fit) latin1 on Windows, but
>> > instead keep the (correct) string in its UTF-8 encoding. The NA only
>> > appears if the user explicitly forces a conversion to latin1, which is
>> > not the problem here I think.
>> >
>> > The original problem that I can reproduce in RGui is that if you enter
>> > "ř" in RGui, R opportunistically converts this to latin1, because it
>> > can. However if you enter text which can definitely not be represented
>> > in latin1, R encodes the string correctly in UTF-8 form.
>>
>> Rgui is a "Windows Unicode" application (uses UTF-16LE) but it needs to
>> convert the input to native encoding before passing it to R, which is
>> based on locales. However, that string is passed by R to the parser,
>> which Rgui takes advantage of and converts non-representable characters
>> to their \uxxxx escapes which are understood by the parser. Using this
>> trick, Unicode characters can get to the parser from Rgui (but of course
>> they are then still at risk of conversion later when the program runs).
>> Rgui only escapes characters that cannot be represented; unfortunately,
>> the standard C99 API for that implemented on Windows does the best fit.
>> This could be fixed in Rgui by calling a special Windows API function,
>> but with the mentioned risk that it would break existing uses that
>> capture the existing behavior.
>>
>> This is the only place I know of where removing best fit would lead to
>> correct representation of UTF-8 characters. Other places will give NA,
>> some other escapes, code will fail to parse (e.g. "incomplete string",
>> one can get that easily with source()).
>>
>> Tomas
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
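
To make the three behaviours discussed in this thread concrete (a strict conversion that yields NA, a "best fit" that silently turns "ř" into "r", and Rgui-style escaping to \uxxxx), here is a minimal sketch in Python. It is an emulation only, not R's or Windows' actual code: the function names are hypothetical, cp1252 is assumed as the native encoding, and best fit is approximated by Unicode decomposition, which will not match the real Windows best-fit tables in every case.

```python
import unicodedata

NATIVE = "cp1252"  # assumption: a Western European Windows code page


def representable(ch, encoding=NATIVE):
    """Return True if the single character can be encoded in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def strict_convert(text, encoding=NATIVE):
    """No best fit: return None (standing in for NA) if any character would be lost."""
    return text if all(representable(c, encoding) for c in text) else None


def best_fit_like(text, encoding=NATIVE):
    """Rough emulation of best fit: strip accents where possible, else use '?'."""
    out = []
    for ch in text:
        if representable(ch, encoding):
            out.append(ch)
            continue
        # Decompose (e.g. "ř" -> "r" + combining caron) and drop combining marks.
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        ok = base and all(representable(c, encoding) for c in base)
        out.append(base if ok else "?")
    return "".join(out)


def escape_unrepresentable(text, encoding=NATIVE):
    """Keep representable characters; write the rest as \\uxxxx / \\Uxxxxxxxx escapes."""
    out = []
    for ch in text:
        if representable(ch, encoding):
            out.append(ch)
        else:
            cp = ord(ch)
            out.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
    return "".join(out)


if __name__ == "__main__":
    s = "Bořil"
    print(strict_convert(s))          # lossy input is rejected outright
    print(best_fit_like(s))           # the accent is silently dropped
    print(escape_unrepresentable(s))  # the character survives as an escape
```

For the string "Bořil", the strict variant returns None, the best-fit emulation returns "Boril", and the escaping variant returns Bo\u0159il. Only the last one preserves the original character, which is the point made above about Rgui and the parser.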