On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>
>> If you print:
>>
>> "\xc9\x82\xbf"
>>
>> you get
>>
>> "\u0242\xbf"
>>
>> But if you try and evaluate that string you get:
>>
>>> "\u0242\xbf"
>>
>> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>>
>> (Probably will only happen on mac/linux with default utf-8 encoding)
>
>
> I'm not sure what should happen here, but that's not a legal string in a
> UTF-8 locale, so it's not too surprising that things go wonky.

Here's a bit more context on how I got that sequence of bytes:

x <- "こんにちは"
y <- iconv(x, to = "Shift-JIS")
Encoding(y)
y

I did this to create an example to demonstrate how to handle encoding
problems, and it's a bit frustrating that I have to manually mangle the
string in order to be able to re-use it in another session. Maybe
strings with unknown encoding shouldn't use Unicode escapes?

Hadley

--
http://hadley.nz

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
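
For anyone hitting the same problem, one possible workaround is to print the string with every byte as a \x escape, so the literal never mixes \u and \x forms. This is only a sketch: escape_bytes() is a hypothetical helper, not something in base R, and it sidesteps the printing behaviour rather than changing it. The example bytes are the three from the first message, built with rawToChar() so the result does not depend on the session locale.

```r
# Hypothetical helper (not part of base R): spell out every byte of a
# string as a \x escape so the literal can be pasted into another session.
escape_bytes <- function(s) {
  bytes <- charToRaw(s)  # raw vector, one element per byte
  paste0("\\x", as.character(bytes), collapse = "")
}

# The three bytes from the original example, built without relying on the locale.
s <- rawToChar(as.raw(c(0xc9, 0x82, 0xbf)))

cat('"', escape_bytes(s), '"\n', sep = "")
# prints "\xc9\x82\xbf"
```

The same call should work on the Shift-JIS string y from the iconv() example, since charToRaw() only looks at the bytes and ignores the declared encoding.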