David Wolever wrote:
> IMO, encoding estimation is something that
> many web programs will have to deal with,
> so it might as well be built in; I would prefer
> the option to run `text=input.encode('guess')`
> (or something similar) than relying on an external
> dependency or worse yet using a hand-rolled
> algorithm
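For concreteness, here is a rough sketch of what such a guessing helper might look like if written by hand today. It is only an illustration under stated assumptions: `guess_decode` is a hypothetical name, there is no 'guess' codec in the standard library, and the strategy (BOM check, then strict UTF-8, then a windows-1252 fallback) is a common heuristic rather than the algorithm in the HTML5 draft.

```python
import codecs

# Hypothetical helper, not part of the standard library.
# UTF-32 BOMs are listed before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_decode(data, fallback="cp1252"):
    """Return (text, encoding_name) for a bytes object.

    Assumed strategy: honor a byte order mark if present, otherwise
    try strict UTF-8, otherwise decode with the fallback encoding.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM and decode the remainder.
            return data[len(bom):].decode(name), name
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Bytes undefined in the fallback become U+FFFD.
        return data.decode(fallback, errors="replace"), fallback


if __name__ == "__main__":
    print(guess_decode(b"caf\xc3\xa9"))  # valid UTF-8
    print(guess_decode(b"caf\xe9"))      # not valid UTF-8, falls back
```

Returning the encoding name alongside the text lets the caller see which guess was made, which matters because any such guess can be wrong.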
The (still draft) HTML5 spec is trying to get error correction standardized, so it includes all sorts of "if this fails, do X" rules. Encoding detection will be standardized, so there will be an external standard that we can reference.

http://dev.w3.org/html5/spec/Overview.html#determining

Note that this portion of the spec is probably not stable yet, as there was some new analysis of which "wrong" answers give better results on real-world web pages, e.g.:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014127.html
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014190.html

There was also a recent analysis of how many characters it takes to sniff successfully X% of the time on today's web, though I'm not finding it at the moment.

-jJ
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev