Dominikus Scherkl wrote:
> Converting from and to UTF-8 is an all-day topic, very important for all applications dealing with Unicode. So it is a special

Converting text to/from UTF-8 is indeed common and important.
Converting text that claims to be UTF-8 - but isn't - is different: It may be a spoofing attempt, or bytes may have been lost, or the text may not be UTF-8 at all, etc. How to handle non-UTF-8 text in a from-UTF-8 converter seems to be a judgement call, and application-specific.
(How does the converter know _why_ there is an illegal sequence?)
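
To make the judgement call concrete, here is a minimal Python sketch. It assumes nothing beyond Python's built-in UTF-8 codec and its standard error handlers; the helper name decode_with_policy is made up for illustration. The same suspect bytes are rejected, replaced, or silently dropped depending on the policy the application picks, and the converter itself has no way to know which is right.

```python
def decode_with_policy(data, policy="strict"):
    """Decode bytes as UTF-8; `policy` is one of Python's standard error handlers."""
    return data.decode("utf-8", errors=policy)


# 0xC0 0xAF is an overlong (illegal) two-byte encoding of "/".
suspect = b"a\xc0\xafb"

for policy in ("strict", "replace", "ignore"):
    try:
        print(policy, repr(decode_with_policy(suspect, policy)))
    except UnicodeDecodeError as exc:
        # "strict" refuses the input and reports why.
        print(policy, "rejected:", exc.reason)
```

Which of these is appropriate depends on the application: a security filter probably wants "strict", while a viewer for damaged files may prefer "replace".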
> Additionally, I think we should have a standardized way to display old UTF-8 text without losing information (overlong UTF-8 was allowed for years) ...

ISO 10646 and the RFC never allowed generating overlong UTF-8. Unicode at least used to say "should not" for generation (but allowed decoding). Chances are nearly 100% that overlong UTF-8 was a spoofing attempt, or the result of something other than a UTF-8 encoder.
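
For anyone who wants to look at such text instead of just rejecting it, the following is a rough Python sketch of how overlong sequences could be located. The names MIN_FOR_LENGTH, sequence_length and find_overlong are my own for illustration, not from any library, and the sketch only covers sequences of up to four bytes (it ignores the 5- and 6-byte forms of the original UTF-8 definition). A sequence counts as overlong when its decoded value is smaller than the smallest code point that actually needs that many bytes.

```python
# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def sequence_length(lead):
    """Length implied by a lead byte's bit pattern, or 0 if it is not a lead byte."""
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    return 0  # ASCII, continuation byte, or a 5/6-byte lead (not handled here)


def find_overlong(data):
    """Return (offset, code_point) for each overlong sequence found in data."""
    found = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        tail = data[i + 1:i + n]
        # Only consider structurally complete sequences: lead + continuation bytes.
        if n and len(tail) == n - 1 and all(b & 0xC0 == 0x80 for b in tail):
            cp = data[i] & (0xFF >> (n + 1))  # payload bits of the lead byte
            for b in tail:
                cp = (cp << 6) | (b & 0x3F)   # six payload bits per continuation byte
            if cp < MIN_FOR_LENGTH[n]:
                found.append((i, cp))
            i += n
        else:
            i += 1
    return found


print(find_overlong(b"a\xc0\xafb"))
```

What to do with the reported offsets (reject, flag, or display specially) is again up to the application; the sketch only finds them.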
Best regards,
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.

