I wanted to ask how I can know if a given text is UTF-8 or ISO-8859-1?

If you need conversions, the simplest way is to do it yourself with a
look-up table. AFAIK no Latin-1 character takes more than 2 bytes in
UTF-8, so a table of 2*256 bytes is enough and won't hurt.
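
As a rough illustration of the look-up table idea, here is a minimal
sketch in Python. The names LATIN1_TO_UTF8 and latin1_to_utf8 are made
up for this example, and it only covers the Latin-1 to UTF-8 direction.
It relies on the fact that Latin-1 byte values are the same as Unicode
code points U+0000 to U+00FF.

```python
# 256-entry table: index = Latin-1 byte value, value = its UTF-8 encoding.
# Bytes below 0x80 are plain ASCII and stay one byte; the rest become a
# two-byte sequence (lead byte 0xC2 or 0xC3, then one continuation byte).
LATIN1_TO_UTF8 = [
    bytes([b]) if b < 0x80 else bytes([0xC0 | (b >> 6), 0x80 | (b & 0x3F)])
    for b in range(256)
]


def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin-1 encoded bytes to UTF-8 with one table look-up per byte."""
    return b"".join(LATIN1_TO_UTF8[b] for b in data)


if __name__ == "__main__":
    # 0xE9 is e-acute in Latin-1; in UTF-8 it is the two bytes C3 A9.
    print(latin1_to_utf8(b"caf\xe9"))
```

In Python itself data.decode("latin-1").encode("utf-8") does the same
job; the table is mainly useful as a model for C or similar languages.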

If you need to handle more complex Unicode features such as
right-to-left (bidirectional) text, I'd recommend a full library such
as ICU (icu.sourceforge.net).

If you want to detect the encoding/codepage, I don't think it can be
done reliably in general, unless you know what text to expect. I might
be wrong.
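
For the narrower UTF-8 versus ISO-8859-1 question, one common heuristic
is to check whether the bytes are valid UTF-8. Here is a small sketch in
Python; guess_encoding is a hypothetical helper name, and the result is
only a guess. Latin-1 text that contains non-ASCII characters is rarely
valid UTF-8 by accident, but it can happen, so this does not settle the
general case.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between ASCII, UTF-8 and ISO-8859-1. Heuristic only."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so of the two candidates only ISO-8859-1 remains.
        return "iso-8859-1"
    if all(b < 0x80 for b in data):
        # Pure ASCII reads the same in both encodings.
        return "ascii"
    # Valid UTF-8 with multi-byte sequences: most likely UTF-8.
    return "utf-8"


if __name__ == "__main__":
    for sample in ("café".encode("utf-8"), b"caf\xe9", b"cafe"):
        print(sample, guess_encoding(sample))
```

If other encodings are possible, this check is not enough and you are
back to knowing what text to expect.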

Peter
