On 07/07/2016 12:51 PM, peter dalgaard wrote:
> On 07 Jul 2016, at 18:15 , Hadley Wickham <h.wick...@gmail.com> wrote:
>
> Right - I'm aware of that.  But to me, it doesn't seem correct to
> print a string that is not a valid R string. Why is an unknown
> encoding printed like UTF-8?
>

It isn't -- no valid UTF-8 string would contain the \xbf. I may be flogging a dead horse, but 
it seems to me that there are three alternatives:

- refuse the input (x <- "\xc9\x82\xbf" gives "sorry, not a UTF-8 string" or so)
- refuse to print it (print(x) gives "cannot print non-UTF-8 string")
- what happens now
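
For reference, a minimal sketch of how the first alternative could look at user level, assuming R >= 3.3.0 (where validUTF8() is available). The helper require_utf8() is a hypothetical name used only for illustration, not an existing R function:

```r
x <- "\xc9\x82\xbf"   # the string from this thread

charToRaw(x)          # the three bytes c9 82 bf
Encoding(x)           # "unknown": the literal carries no encoding mark
validUTF8(x)          # FALSE: 0xbf is a stray continuation byte

# Hypothetical helper: refuse any input that is not valid UTF-8
require_utf8 <- function(s) {
  if (any(!validUTF8(s))) stop("sorry, not a UTF-8 string")
  s
}

# Catch the error and keep only its message
msg <- tryCatch(require_utf8(x), error = function(e) conditionMessage(e))
```

The same check could guard print() instead of assignment, which would correspond to the second alternative.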

and a fourth one might be to actually allow mixing of \u0007 and \x07 and \007, 
but I suspect that there are demons down the line which is why it is not 
happening now. (Does it ring a bell with anyone?)
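
To illustrate the current restriction, here is a small sketch that feeds the parser a literal mixing a \u escape with a \x escape. It goes through parse(text = ) so the failure can be caught; the variable names are only illustrative:

```r
# Source text of a string literal that mixes \u and \x escapes
mixed <- '"\\u0007\\x07"'

# parse() returns an expression on success; here the error message is
# captured as a character string instead
res <- tryCatch(parse(text = mixed), error = function(e) conditionMessage(e))
is.character(res)      # TRUE when the parser rejected the literal

# Each kind of escape parses without complaint when used alone
ok_u <- parse(text = '"\\u0007"')
ok_x <- parse(text = '"\\x07"')
```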

A fifth option would be to use only hex escapes when invalid UTF-8 was found. That would echo back the input in this case. No idea if it would cause other problems.
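
A rough user-level sketch of that idea, assuming R >= 3.3.0 for validUTF8(). The function escape_invalid_utf8() is a hypothetical name, not part of R, and this is not how print() works internally. It leaves valid strings alone and otherwise writes every non-ASCII byte as a hex escape:

```r
# Hypothetical helper: if the string is not valid UTF-8, show every
# byte >= 0x80 as a \xNN escape and keep ASCII bytes as they are
escape_invalid_utf8 <- function(s) {
  stopifnot(is.character(s), length(s) == 1L, !is.na(s))
  if (validUTF8(s)) return(s)
  bytes <- charToRaw(s)
  ascii <- as.integer(bytes) < 128L
  out <- paste0("\\x", as.character(bytes))   # e.g. "\\xc9"
  out[ascii] <- rawToChar(bytes[ascii], multiple = TRUE)
  paste(out, collapse = "")
}

y <- escape_invalid_utf8("\xc9\x82\xbf")
cat(y, "\n")   # \xc9\x82\xbf
```

Note that the otherwise valid pair \xc9\x82 is also shown as hex here, which is what echoes back the input. ASCII control characters and literal backslashes are passed through unescaped in this sketch.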

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
