Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

Daniel Bünzli Fri, 08 May 2015 18:30:55 -0700

On Saturday, 9 May 2015 at 02:33, Philippe Verdy wrote:
> 2015-05-08 14:32 GMT+02:00 Daniel Bünzli <[email protected] 
> (mailto:[email protected])>:
> > Well did you test them all ? There's quite a big list here 
> > http://www.json.org. Taking a random one mentioned on that page leads me to 
> > http://golang.org/pkg/encoding/json/ in which they say that they replace 
> > invalid UTF-16 surrogate pairs by U+FFFD. This is really not very 
> > surprising since apparently go's strings as text are UTF-8 encoded so when 
> > you need to produce your results as UTF-8 then you don't have a lot of 
> > solutions... error and/or U+FFFD.
>  
>  
> I've already said that JSON is UTF-8 encoded by default, but this does not 
> mean that JSON invalidates the escape sequence '\uD800' isolated in a string.


You didn't get what I said. Suppose a parser has just parsed a JSON string and 
wants to hand it back to the programmer as a native string of the language, 
and that native strings in this language happen to be UTF-8 encoded. In the 
presence of such lone surrogates you are then stuck and have to do something, 
because a lone surrogate cannot be encoded in a UTF-8 string.
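
To make the constraint concrete, here is a minimal sketch in Python (not a 
language used in this thread; chosen only because its str type tolerates lone 
surrogates, so the failure shows up exactly at the UTF-8 encoding step). The 
function name to_utf8 and the on_lone_surrogate switch are hypothetical, made 
up for illustration; they show the two options mentioned above, error or 
U+FFFD.

```python
import json


def to_utf8(json_text, on_lone_surrogate="error"):
    """Parse a JSON string literal and return its value as UTF-8 bytes.

    on_lone_surrogate is a hypothetical switch for this sketch:
      "error"   -> let the UTF-8 encoder raise UnicodeEncodeError
      "replace" -> substitute U+FFFD for each lone surrogate
    """
    # Python's json module combines valid \uD83D\uDE00-style pairs into one
    # code point, but keeps an isolated \uD800 as the code point U+D800.
    s = json.loads(json_text)
    if on_lone_surrogate == "replace":
        # Any surrogate code point still present at this stage is a lone one.
        s = "".join("\ufffd" if 0xD800 <= ord(c) <= 0xDFFF else c for c in s)
    # Strict UTF-8 encoding refuses surrogate code points.
    return s.encode("utf-8")


if __name__ == "__main__":
    print(to_utf8('"a\\ud800b"', on_lone_surrogate="replace"))
    try:
        to_utf8('"a\\ud800b"')
    except UnicodeEncodeError as exc:
        print("cannot encode:", exc.reason)
```

A parser whose result type is a UTF-8 string has to make this choice inside 
the parser itself, since it has no place to keep U+D800 in the meantime.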

(I understand that in *your* interpretation this should not happen, since I 
should define a special data type to represent these JSON strings so that they 
behave like JavaScript strings; that would indeed be very practical, as none of 
my language's native string tools could be used on it…)
  
Anyways, we are largely OT at this point.  

Best,

Daniel
