On Mon, 21 Apr 2008 17:50:43 +0100, Michael Foord <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>>     David> Is there some sort of text encoding detection module in the
>>     David> standard library?  And, if not, is there any reason not to add
>>     David> one?
>>
>> No, there's not.  I suspect the fact that you can't correctly determine the
>> encoding of a chunk of text 100% of the time militates against it.
>>
>
>The only approach I know of is a heuristic-based approach, e.g.
>
>http://www.voidspace.org.uk/python/articles/guessing_encoding.shtml
>
>(Which was 'borrowed' from docutils in the first place.)
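
For concreteness, a heuristic of the kind described above might look roughly like the sketch below. It is not the code from the linked article or from docutils; the function name guess_encoding, the candidate list, and the Latin-1 fallback are all assumptions made for illustration. It checks for a byte order mark first, then tries a few strict decodes in order.

```python
import codecs

# Byte order marks, longest first, so that the UTF-32 LE mark
# (ff fe 00 00) is not mistaken for the UTF-16 LE mark (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("ascii", "utf-8")):
    """Return (text, encoding_name) for a bytes object.

    Hypothetical helper: a guess, not a guarantee.
    """
    # 1. A BOM is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                return data.decode(name), name
            except UnicodeDecodeError:
                continue  # BOM-like prefix, but the rest did not decode
    # 2. Try strict decodes, most restrictive encoding first.
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # 3. Latin-1 maps every byte to a code point, so this never fails.
    #    It is a last resort and may well be the wrong answer.
    return data.decode("latin-1"), "latin-1"
```

The last step shows why this cannot be right 100% of the time: any byte string decodes as Latin-1, so text that was really written in some other 8-bit encoding comes back without an error but possibly with the wrong characters.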

This isn't the only approach, although you're right that in general you
have to rely on heuristics.  See the charset detection features of ICU:

  http://www.icu-project.org/userguide/charsetDetection.html

I think OSAF's pyicu exposes these APIs:

  http://pyicu.osafoundation.org/

Jean-Paul