Theodore H. Smith wrote:
I'd like to see a UTF-8 stress test file.

It should consist of lines of UTF-8, each separated by a newline. Each line should be malformed. A separate file should also note some ideas on how to deal with the malformed UTF-8.

Really, I just want some way to verify that I can detect every kind of UTF-8 wrongness. I have some code I adapted from Unicode.org, but I want to make sure my adaptations haven't broken the code.

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
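
For what it's worth, here is a minimal sketch in Python of how a file like that could be run through a decoder, one line at a time. The function name find_malformed_lines and the sample bytes are made up for illustration, and Python's built-in strict UTF-8 decoder only stands in for the adapted Unicode.org code, which isn't shown in this thread. Splitting on the newline byte should be safe here, since 0x0A never occurs inside a multi-byte UTF-8 sequence.

```python
def find_malformed_lines(data: bytes) -> list[tuple[int, int]]:
    """Return (line_number, byte_offset) for every line that is not valid UTF-8.

    line_number is 1-based; byte_offset is where the decoder first gave up
    within that line.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            # Strict decoding rejects overlong forms, encoded surrogates,
            # and code points above U+10FFFF.
            line.decode("utf-8", errors="strict")
        except UnicodeDecodeError as err:
            bad.append((number, err.start))
    return bad


# Hypothetical sample: lines 2, 3 and 5 are deliberately malformed.
sample = (
    b"ok\n"                 # plain ASCII, valid
    b"\xc0\xaf\n"           # overlong encoding of '/'
    b"A\xed\xa0\x80\n"      # encoded surrogate U+D800
    b"\xe2\x82\xac\n"       # U+20AC EURO SIGN, valid
    b"\xf4\x90\x80\x80"     # beyond U+10FFFF
)

for line_number, offset in find_malformed_lines(sample):
    print(f"line {line_number}: malformed at byte offset {offset}")
```

To compare a custom decoder against this, the decode call could be swapped for the adapted code and the two lists of reported line numbers checked for differences.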



