pepe pepe wrote:
> We have the following sequence of characters "...ización Map.." that is
> the same than "...ización Map..." that after suffering some
> transformations becomes to "...izaci&#56186;&#56333;ap...."
> AS you can see the two characters 56186 and 56333 seem to represent this
> sequences "ón M". Any idea?.

Yes, your input text obviously gets flagged as being in UTF-8 format, even though it is really Latin-1 (or some other code page that has ó at index 243). Not only that, but the process that mistakes it for UTF-8 also fails to report an error when it runs into malformed byte sequences, AND it outputs the result as two 16-bit numbers instead of one 21-bit number.

If you take the byte sequence (hex) F3 6E 20 4D and treat it as a four-byte UTF-8 sequence, not caring that 6E, 20 and 4D are not valid continuation bytes, it maps to the value (hex) EE80D. Again, not caring that no character is assigned to that code point, turning it into UTF-16 yields the surrogate pair U+DB7A U+DC0D (decimal 56186 and 56333), which is what you got in your output.

Pim Blokland
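
P.S. For anyone who wants to check the arithmetic, here is a short Python sketch. The function names are made up for illustration, and the lenient decoder is only my guess at what the broken process does (keep the payload bits of each byte, skip every validity check), not its actual code. For these four bytes it gives hex EE80D as the intermediate value.

```python
def lenient_utf8_decode4(data):
    """Treat 4 bytes as one UTF-8 sequence WITHOUT validating them.

    Assumption: the faulty process keeps the low 3 bits of the lead byte
    and the low 6 bits of each following byte, whatever their top bits are.
    """
    b0, b1, b2, b3 = data
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


def to_surrogates(code_point):
    """Split a value above 0xFFFF into a UTF-16 surrogate pair."""
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


raw = "ón M".encode("latin-1")    # the Latin-1 bytes F3 6E 20 4D
value = lenient_utf8_decode4(raw)
high, low = to_surrogates(value)

print(raw.hex())     # f36e204d
print(hex(value))    # 0xee80d
print(high, low)     # 56186 56333
```

A strict decoder refuses the same input: in Python, `raw.decode("utf-8")` raises `UnicodeDecodeError`, which is the error the faulty process should have reported.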