On the contrary, an encoding-guessing module is often needed, and guessing can be done with a pretty high success rate. Other Unicode libraries (e.g. ICU) contain guessing modules. I suppose the API could return two values: the guessed encoding and a confidence indicator. Note that the locale settings might figure in the guess.
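To make the two-value idea concrete, here is a rough sketch of what such an API might look like. It is not an existing stdlib function: the name `guess_encoding`, the `fallback` parameter, and the confidence numbers are all made up for illustration, and a real guesser (such as ICU's) would use statistical analysis rather than these simple checks. The locale enters only as the default fallback when the data is not valid UTF-8.

```python
import codecs
import locale

# BOMs checked longest-first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallback=None):
    """Return (encoding_name, confidence) for a bytes object.

    Hypothetical API; the confidence values are arbitrary placeholders.
    """
    # An explicit BOM is the only case treated as certain.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, 1.0

    # Pure 7-bit data: very likely ASCII, but still only a guess.
    try:
        data.decode("ascii")
        return "ascii", 0.9
    except UnicodeDecodeError:
        pass

    # Non-ASCII bytes that happen to form valid UTF-8 are a strong hint.
    try:
        data.decode("utf-8")
        return "utf-8", 0.8
    except UnicodeDecodeError:
        pass

    # Otherwise let the locale settings figure in the guess.
    if fallback is None:
        fallback = locale.getpreferredencoding(False)
    try:
        data.decode(fallback)
        return fallback, 0.5
    except (UnicodeDecodeError, LookupError):
        # latin-1 decodes any byte string, so it is a last resort.
        return "latin-1", 0.1
```

A caller would unpack both values, e.g. `encoding, confidence = guess_encoding(raw)`, and decide for itself how low a confidence it is willing to accept.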
On Mon, Apr 21, 2008 at 10:28 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Christian Heimes wrote:
>
> > David Wolever wrote:
> >> Is there some sort of text encoding detection module in the standard
> >> library?
> >> And, if not, is there any reason not to add one?
> >
> > You cannot detect the encoding unless it's explicitly defined through a
> > header (e.g. the UTF BOM). It's technically impossible. The best you can
> > do is an educated guess.
>
> Exactly, and in light of that, I'm -1 for such a standard module.
> We've enough issues with modules implementing (apparently) fully
> specified standards. :)
>
> Georg

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com