Re: [Rd] String encoding problem

2016-07-07, Duncan Murdoch
On 07/07/2016 12:51 PM, peter dalgaard wrote:
> On 07 Jul 2016, at 18:15 , Hadley Wickham wrote:
>
> Right - I'm aware of that. But to me, it doesn't seem correct to
> print a string that is not a valid R string. Why is an unknown
> encoding printed like UTF-8?
> It isn't

Re: [Rd] String encoding problem

2016-07-07, peter dalgaard
> On 07 Jul 2016, at 18:15 , Hadley Wickham wrote:
>
> Right - I'm aware of that. But to me, it doesn't seem correct to
> print a string that is not a valid R string. Why is an unknown
> encoding printed like UTF-8?
>

It isn't -- no UTF-8 would have the \xbf. I may be
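Peter's point can be checked byte by byte. The sketch below is a minimal illustration in base R, assuming R 3.3.0 or later for validUTF8(). The bytes 0xc9 0x82 form a complete two-byte UTF-8 sequence for U+0242. The trailing 0xbf is therefore a continuation byte with no lead byte of its own, and no valid UTF-8 string can contain it in that position.

```r
# Bytes from the original post
bytes <- as.raw(c(0xc9, 0x82, 0xbf))

# 0xc9 0x82 is a complete two-byte UTF-8 sequence:
# 110 01001 / 10 000010 -> code point 0x242 (578)
first_two <- rawToChar(bytes[1:2])
validUTF8(first_two)        # TRUE
utf8ToInt(first_two)        # 578, i.e. U+0242

# 0xbf has the bit pattern 10xxxxxx, which is only legal as a continuation
# byte; after the complete sequence c9 82 it has no lead byte, so the
# full three-byte string is invalid
validUTF8(rawToChar(bytes)) # FALSE

# By contrast, 0xbf is fine after a lead byte: U+00BF is encoded as c2 bf
validUTF8(rawToChar(as.raw(c(0xc2, 0xbf)))) # TRUE
```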

Re: [Rd] String encoding problem

2016-07-07, Hadley Wickham
>>> I'm not sure what should happen here, but that's not a legal string in a
>>> UTF-8 locale, so it's not too surprising that things go wonky.
>>
>> Here's bit more context on how I got that sequence of bytes:
>>
>> x <- "こんにちは"
>> y <- iconv(x, to = "Shift-JIS")
>> Encoding(y)
>> y
>>
>> I did
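Hadley's reproduction can be written so that it does not depend on the encoding of the script file. The sketch below spells the greeting with \u escapes and assumes the platform's iconv accepts the name "SHIFT_JIS" (the names iconv accepts vary by platform; the post used "Shift-JIS"). It shows that the converted string is marked "unknown", that its bytes are not valid UTF-8, and that the three bytes from the original post appear inside it.

```r
# The greeting from the post, spelled with \u escapes so the result is
# UTF-8 no matter what encoding the script file or locale uses
x <- "\u3053\u3093\u306b\u3061\u306f"

# Convert to Shift-JIS; the name "SHIFT_JIS" is an assumption,
# since the encoding names iconv accepts depend on the platform
y <- iconv(x, from = "UTF-8", to = "SHIFT_JIS")

# iconv() declares an encoding only when converting to latin1 or UTF-8,
# so the result is "unknown"
Encoding(y)

# Each hiragana character becomes two bytes:
# 82 b1 82 f1 82 c9 82 bf 82 cd
charToRaw(y)

# 0x82 cannot start a UTF-8 sequence, so the bytes are not valid UTF-8
validUTF8(y)

# The three bytes from the original post sit at positions 6 to 8
identical(charToRaw(y)[6:8], as.raw(c(0xc9, 0x82, 0xbf)))

# Converting back recovers the original UTF-8 bytes
identical(charToRaw(iconv(y, from = "SHIFT_JIS", to = "UTF-8")), charToRaw(x))
```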

Re: [Rd] String encoding problem

2016-07-07, Simon Urbanek
> On Jul 7, 2016, at 11:40 AM, Hadley Wickham wrote:
>
> On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch
> wrote:
>> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>>
>>> If you print:
>>>
>>> "\xc9\x82\xbf"
>>>
>>> you get
>>>
>>>

Re: [Rd] String encoding problem

2016-07-07, Hadley Wickham
On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch wrote:
> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>
>> If you print:
>>
>> "\xc9\x82\xbf"
>>
>> you get
>>
>> "\u0242\xbf"
>>
>> But if you try and evaluate that string you get:
>>
>>> "\u0242\xbf"
>>
>> Error:

Re: [Rd] String encoding problem

2016-07-07, Duncan Murdoch
On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>
> If you print:
>
> "\xc9\x82\xbf"
>
> you get
>
> "\u0242\xbf"
>
> But if you try and evaluate that string you get:
>
>> "\u0242\xbf"
>
> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>
> (Probably will only happen on mac/linux with default

[Rd] String encoding problem

2016-07-07, Hadley Wickham
If you print:

"\xc9\x82\xbf"

you get

"\u0242\xbf"

But if you try and evaluate that string you get:

> "\u0242\xbf"

Error: mixing Unicode and octal/hex escapes in a string is not allowed

(Probably will only happen on mac/linux with default utf-8 encoding)

Hadley

--
http://hadley.nz
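A minimal base-R sketch of the situation in the original post. The literal that uses only \x escapes is accepted, and the same bytes can also be built from a raw vector without involving the parser. The printed form, which mixes a \u escape with a \x escape, is rejected by parse(); tryCatch() is used here only to capture that error for inspection.

```r
# Hex escapes on their own are accepted by the parser
s <- "\xc9\x82\xbf"

# The same three bytes, built without going through the parser
s2 <- rawToChar(as.raw(c(0xc9, 0x82, 0xbf)))
identical(charToRaw(s), charToRaw(s2))  # TRUE

# The printed form mixes a \u escape with a \x escape, which the
# parser rejects; capture the condition instead of stopping
res <- tryCatch(parse(text = '"\\u0242\\xbf"'), error = function(e) e)
inherits(res, "error")
```

Building the string from raw bytes only shows that the bytes themselves are recoverable; it does not settle how print() should display them, which is the question the rest of the thread takes up.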