There is charlint (http://www.w3.org/International/charlint/), which
operates on UTF-8. It may be possible to adapt it to UTF-16 or UTF-32.
Regards, Martin.
On 2010/11/04 4:37, Jim Monty wrote:
Is there a utility, preferably open source and written in C, that inspects
UTF-16/UTF-16BE/UTF-16LE text and identifies broken surrogate pairs and illegal
characters? Ideally, the utility can both report illegal code units and "repair"
them by replacing them with U+FFFD.
Jim Monty
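
The check and repair described in the question can be sketched in C
roughly as follows. This is only an illustration, not code from charlint:
the function name utf16_repair_surrogates is hypothetical, and it assumes
the input has already been decoded from UTF-16BE or UTF-16LE bytes into
16-bit code units in host byte order. It handles only unpaired
surrogates; whether other code points (for example noncharacters such as
U+FFFE) count as "illegal" is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of charlint).
 * Scans UTF-16 code units in host byte order and overwrites every
 * unpaired surrogate with U+FFFD. Returns the number of code units
 * that were replaced, so 0 means the buffer had no broken pairs. */
size_t utf16_repair_surrogates(uint16_t *buf, size_t len)
{
    size_t repaired = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        uint16_t u = buf[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: valid only if a low surrogate follows. */
            if (i + 1 < len && buf[i + 1] >= 0xDC00 && buf[i + 1] <= 0xDFFF) {
                i += 2;          /* well-formed pair, skip both units */
                continue;
            }
            buf[i] = 0xFFFD;     /* high surrogate without its partner */
            repaired++;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            buf[i] = 0xFFFD;     /* low surrogate with no preceding high */
            repaired++;
        }
        i++;
    }
    return repaired;
}
```

To use this on a UTF-16BE or UTF-16LE file, each pair of bytes would
first have to be combined into one code unit in the right order (and
written back in the same order afterwards). A report-only mode could
record the index i instead of overwriting buf[i].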
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:due...@it.aoyama.ac.jp