Re: [Python-Dev] Encoding detection in the standard library?

Jean-Paul Calderone Mon, 21 Apr 2008 10:03:44 -0700

On Mon, 21 Apr 2008 17:50:43 +0100, Michael Foord <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>>     David> Is there some sort of text encoding detection module in the
>>     David> standard library?  And, if not, is there any reason not to add
>>     David> one?
>>
>> No, there's not.  I suspect the fact that you can't correctly determine the
>> encoding of a chunk of text 100% of the time militates against it.
>>
>
>The only approach I know of is a heuristic-based approach, e.g.
>
>http://www.voidspace.org.uk/python/articles/guessing_encoding.shtml
>
>(Which was 'borrowed' from docutils in the first place.)


This isn't the only approach, although you're right that in general you
have to rely on heuristics.  See the charset detection features of ICU:

  http://www.icu-project.org/userguide/charsetDetection.html

I think OSAF's pyicu exposes these APIs:

  http://pyicu.osafoundation.org/
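
To make the "heuristics" point concrete, here is a minimal sketch of the
kind of guessing involved. It is not the code from the article linked
above and not ICU's algorithm; the name guess_encoding and its candidates
parameter are made up for illustration, and it assumes Python 3 bytes
input. It checks for a byte order mark, then tries strict decoders in
order, and finally falls back to latin-1, which accepts any byte sequence
and therefore can never be ruled out:

```python
import codecs

# Byte order marks, checked with UTF-32 first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("ascii", "utf-8")):
    """Return a plausible codec name for the bytes in `data`.

    This is only a guess: a successful decode shows the bytes are valid
    in that encoding, not that the author actually used it.
    """
    # 1. A BOM is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Try strict encodings; random 8-bit text rarely decodes as UTF-8.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # 3. latin-1 maps every byte to a character, so it always "works".
    return "latin-1"
```

For example, guess_encoding(b"plain text") gives "ascii", while bytes
containing a lone 0xE9 fall through to "latin-1". That last step is
exactly where a guess can be wrong, since the data could just as well be
cp1252 or any other 8-bit encoding.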

Jean-Paul