Re: [GENERAL] Automatic locale detection?

Lexington Luthor Mon, 09 Oct 2006 04:54:50 -0700

Matthew Peter wrote:

Is it possible to automatically detect the language encoding of incoming data? For instance if Japanese is used, is there a way to know it is Japanese from a bit in the charset, a dictionary-based evaluation or otherwise?

Have a look at http://www.mozilla.org/projects/intl/chardet.html and http://chardet.feedparser.org/ for some implementations of this idea.

These detectors are often inaccurate, though (and sometimes fail completely); see the warning at the bottom of http://chardet.feedparser.org/docs/supported-encodings.html
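
To illustrate why detection is only a guess, here is a minimal sketch in Python using nothing but the standard library. It is not how the detectors linked above work (those rely on statistical models); it simply tries a list of candidate encodings in order and reports the first one that decodes without error. The function name guess_encoding and the default candidate list are assumptions made up for this example.

```python
def guess_encoding(data, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return the first candidate encoding that decodes `data` cleanly.

    Hypothetical helper for illustration only. A clean decode does not
    prove the guess is right: many byte strings are valid in several
    encodings, so the order of `candidates` decides ambiguous cases.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
        return name
    # Nothing matched; the caller has to fall back to something else.
    return None


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    print(guess_encoding(sample))
```

Because the first successful decode wins, plain ASCII input will always be reported as the first candidate, which is one reason to treat the result as a hint rather than a fact.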


Regards,
LL


