[Python-Dev] Encoding detection in the standard library?

Jim Jewett Mon, 21 Apr 2008 20:30:35 -0700

David Wolever wrote:

> IMO, encoding estimation is something that
> many web programs will have to deal with,
> so it might as well be built in; I would prefer
> the option to run `text=input.encode('guess')`
> (or something similar) than relying on an external
> dependency or worse yet using a hand-rolled
> algorithm


The (still draft) HTML5 spec is trying to standardize error correction,
so it includes all sorts of "if this fails, do X" rules. Encoding
detection will be part of that, so there will be an external standard
that we can reference.

http://dev.w3.org/html5/spec/Overview.html#determining

Note that this portion of the spec is probably not stable yet, as
there has been some new analysis of which "wrong" answers provide
better results on real-world web pages.

e.g.,

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014127.html

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014190.html

There was also a recent analysis of how many characters it takes to
sniff successfully X% of the time on today's web, though I'm not
finding it at the moment.
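
To make the "if this fails, do X" idea concrete, here is a rough sketch
of what such a fallback chain might look like in Python. It is not the
algorithm from the HTML5 draft: the function name guess_encoding, the
order of the steps, the 1024-byte sniff window and the windows-1252
fallback are all assumptions chosen for illustration, and it ignores
cases such as UTF-32 byte order marks.

```python
import codecs
import re

# Byte order marks checked first; the codec names are chosen so that
# decoding strips the BOM. (UTF-32 is deliberately left out of this sketch.)
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Very loose match for charset=... inside a <meta> tag.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def _known_codec(name):
    """Return Python's canonical codec name, or None if it is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def guess_encoding(data, declared=None, sniff_bytes=1024,
                   fallback="windows-1252"):
    """Hypothetical helper: guess the encoding of a bytes object.

    Each step only runs if the previous one gave no answer.
    """
    # Step 1: a byte order mark at the start of the data.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # Step 2: an encoding declared out of band (e.g. an HTTP header).
    if declared:
        name = _known_codec(declared)
        if name:
            return name

    # Step 3: a <meta ... charset=...> within the first sniff_bytes bytes.
    match = _META_CHARSET.search(data[:sniff_bytes])
    if match:
        name = _known_codec(match.group(1).decode("ascii"))
        if name:
            return name

    # Step 4: if the bytes are valid UTF-8, assume UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 5: give up and return the caller-supplied fallback as is.
    return fallback
```

A caller would then do something like
text = data.decode(guess_encoding(data), errors="replace"), so that a
wrong guess degrades to replacement characters instead of an exception.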

-jJ