Björn Höhrmann wrote:

> The simple solution to that is a small state machine that you
> put each byte through...
Thank you very much for your suggestions, Björn.

From your reply, as well as from your Web page titled "Flexible and Economical UTF-8 Decoder" (http://bjoern.hoehrmann.de/utf-8/decoder/dfa/), it's obvious you're exactly the right C programmer to have written just the utility I'm looking for: a utility that reports and repairs corrupted UTF-16 text.

The purpose of the utility would be to fix UTF-16 text that is mostly viable but nonetheless broken because it contains one or more noncharacters or invalid (unpaired) surrogate code units. The rationale is to make UTF-16 text that iconv, Perl and other software choke on usable again.

Unfortunately, I'm not a good enough programmer to write such a utility in C or even in Perl, the language I know best. Is this a project that interests you, by chance?

I'm surprised I'm having difficulty finding an existing utility to repair broken UTF-16 text. I thought this was something many programmers would need, especially Web developers.

Thank you again for your thoughtful reply.

Jim Monty
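
To make the request concrete, here is a minimal sketch in Python of the kind of report-and-repair pass described above. It is only an illustration, not an existing tool: the names repair_utf16 and is_noncharacter are hypothetical, and the sketch assumes the byte order is already known, that there is no BOM to interpret, and that replacing each bad unit with U+FFFD is an acceptable repair.

```python
import struct

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def repair_utf16(data, byteorder="little"):
    """Hypothetical helper: return (repaired_bytes, problems).

    problems is a list of (byte_offset, description) tuples. Assumes the
    caller already knows the byte order; a BOM is not treated specially.
    """
    fmt = "<H" if byteorder == "little" else ">H"
    problems = []
    usable = len(data) - (len(data) % 2)
    if usable != len(data):
        problems.append((usable, "trailing odd byte dropped"))
    units = [struct.unpack_from(fmt, data, i)[0] for i in range(0, usable, 2)]

    out = []
    i = 0
    while i < len(units):
        u = units[i]
        offset = 2 * i
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
                if is_noncharacter(cp):
                    problems.append((offset, "noncharacter U+%04X" % cp))
                    out.append(REPLACEMENT)
                else:
                    out.extend((u, nxt))
                i += 2
                continue
            problems.append((offset, "unpaired high surrogate 0x%04X" % u))
            out.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            problems.append((offset, "unpaired low surrogate 0x%04X" % u))
            out.append(REPLACEMENT)
        elif is_noncharacter(u):
            problems.append((offset, "noncharacter U+%04X" % u))
            out.append(REPLACEMENT)
        else:
            out.append(u)
        i += 1

    return b"".join(struct.pack(fmt, u) for u in out), problems
```

Under those assumptions, the repaired bytes should decode cleanly as UTF-16 in the chosen byte order, and the problems list serves as the report of what was changed and where. A real utility might prefer to drop bad units, or to make the replacement character configurable, rather than always substituting U+FFFD.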