[CCing python-dev again]
On 2008-04-22 12:38, Greg Wilson wrote:
> > I don't think that should be part of the standard library. People
> > will mistake what it tells them for certain.
> [etc]
> These are all good arguments, but the fact remains that we can't control
> our inputs (e.g., we're archiving mail messages sent to lists managed by
> DrProject), and some of those inputs *don't* tell us how they're encoded.
> Under those circumstances, what would you recommend?

I haven't done much research into this, but in general, I think it's
better to:
* first look at other characteristics of the text message, e.g.
  language, origin, topic, etc.,
* then narrow down the set of encodings that could apply,
* rank them to avoid ambiguities, and
* then see what percentage of the text you can decode using each of
  the encodings, in reverse ranking order (i.e. more specialized
  encodings should be tested first, latin-1 last).
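
The last two steps could be sketched roughly as follows. This is only an
illustration, not a tested detector: the function names, the candidate
list and the threshold parameter are assumptions made up for the example,
and counting U+FFFD replacement characters is just one simple way to
approximate "what percentage of the text you can decode".

```python
def decodable_fraction(data: bytes, encoding: str) -> float:
    """Approximate share of the decoded text that came out cleanly.

    Undecodable input is turned into U+FFFD by errors="replace", so the
    share of characters that are not U+FFFD serves as the score.
    (Assumption: the input does not legitimately contain U+FFFD.)
    """
    if not data:
        return 1.0
    text = data.decode(encoding, errors="replace")
    if not text:
        return 0.0
    return 1.0 - text.count("\ufffd") / len(text)


def guess_encoding(data: bytes, candidates, threshold: float = 1.0):
    """Try candidates in the given order, most specialized first.

    Returns (encoding, score) for the first candidate that reaches the
    threshold, otherwise the best-scoring candidate seen, or None if
    no candidates were given.
    """
    best = None
    for encoding in candidates:
        score = decodable_fraction(data, encoding)
        if score >= threshold:
            return encoding, score
        if best is None or score > best[1]:
            best = (encoding, score)
    return best


# Hypothetical ranking for a Western European message; latin-1 comes
# last because it maps every byte to a character and so never fails.
CANDIDATES = ["ascii", "utf-8", "cp1252", "latin-1"]

if __name__ == "__main__":
    print(guess_encoding(b"Gr\xc3\xbc\xc3\x9fe", CANDIDATES))
    print(guess_encoding(b"Gr\xfc\xdfe", CANDIDATES))
```

A clean decode only says that an encoding is possible, not that it is the
right one, so the ranking you derive from language, origin and so on
still decides between encodings that both decode without errors.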
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Apr 22 2008)