I had a look at the latest XMLParser, but the API is different and a lot of my code is now broken. Is XMLWriter class>>on: obsolete? It complains about it, yet the class and method are still there; is this a Monticello trick I have forgotten about? I don't even know how to port to the new API. Is there a porting guide? I guess this is for the better, but it is still frustrating and distracting from the main task...
On 05/08/2011 16:23, Henrik Johansen wrote:
>
> On Aug 5, 2011, at 3:41:54 PM, Hilaire Fernandes wrote:
>
>> On 05/08/2011 13:28, Henrik Johansen wrote:
>>>
>>> On Aug 5, 2011, at 1:14:35 PM, Hilaire Fernandes wrote:
>>>
>>>> It seems that when inputting an accented character, it is not in
>>>> UTF-8 by default.
>>>> Is that the case with Pharo 1.3?
>>>>
>>>> Hilaire
>>>>
>>>>
>>>> --
>>>> Education 0.2 -- http://blog.ofset.org/hilaire
>>>
>>> I'm not sure what you mean.
>>> When in image, all the way from InputEvents to String representation, you
>>> only deal with Unicode codePoints.
>>
>> It seems these are 8-bit chars; when exported through XMLParser, the
>> result is an 8-bit string. I need to investigate further.
>>
>> Hilaire
>
> It is an 8-bit character, since the codePoint fits in one byte (see a).
> Accented characters like é could be either:
> a) One Unicode codepoint: U+00E9 (decimal 233), small e with acute.
> b) Two Unicode codepoints: U+0065 (decimal 101), small e, followed by
>    U+0301 (decimal 769), combining acute accent.
>
> Internally, you'd see strings with character values corresponding to those
> listed as decimal, i.e. the Unicode codePoints.
> b) would be a WideString, as 769 does not fit in a byte.
>
> However, if correctly converted to UTF-8, their representations should be:
> a) 2 bytes: 16r C3 A9
> b) 3 bytes: 16r 65 CC 81
>
> I.e. it seems XMLParser does not encode it properly to UTF-8 when exporting.
> Note: This is perfectly legal if the document contains an encoding attribute
> specifying a one-byte encoding like iso-8859-1 or windows-1252
> (starts with <?xml version="1.0" encoding="windows-1252" ?> or some such).
> Absent such an attribute, or a BOM indicating another Unicode encoding,
> though, it is a bug.
>
> Cheers,
> Henry
>
>
>

--
Education 0.2 -- http://blog.ofset.org/hilaire
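
PS: To double-check the byte values discussed in the quoted message, here is a small sketch in Python (not Pharo, and not the XMLParser API; the helper name `hex_bytes` is made up for illustration). It only shows how the two forms of é encode, assuming the base letter comes first and the combining accent second in form b).

```python
import unicodedata

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'C3 A9'."""
    return " ".join(f"{b:02X}" for b in data)

precomposed = "\u00e9"   # a) one code point: U+00E9 (decimal 233)
decomposed = "e\u0301"   # b) two code points: U+0065 then U+0301 (decimal 769)

# UTF-8 needs 2 bytes for form a) and 3 bytes for form b).
utf8_a = hex_bytes(precomposed.encode("utf-8"))
utf8_b = hex_bytes(decomposed.encode("utf-8"))

# In a one-byte encoding such as ISO-8859-1, form a) is the single byte E9,
# which is what a raw 8-bit export of the code point looks like.
latin1_a = hex_bytes(precomposed.encode("iso-8859-1"))

# NFC normalization composes form b) into form a).
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(utf8_a, "|", utf8_b, "|", latin1_a, "|", same_after_nfc)
```

So a file that contains a lone byte E9 for é is only valid if its XML declaration names a one-byte encoding; a file declared (or defaulted) as UTF-8 should contain C3 A9 instead.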
