> On Sun, Sep 10, 2006 at 12:02:44PM -0700, Paul Prescod wrote:
> > * Eastern Unix/Linux users using UTF-8 apps like gedit or apps "saving as"
> > UTF-8
> Finally I've got the definitive answer for "is Russia Europe or Asia?"
> It is an Eastern country! At last! ;)
For these purposes, Russia is European, isn't it? Russian text can be subsumed by UTF-8 with relatively minor expansion, right? If so, then I would guess that UTF-8 would replace KOI8-R and iso8859-? for Russian eventually.
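To put a number on "relatively minor expansion": in KOI8-R every Cyrillic letter takes one byte, while in UTF-8 it takes two. ASCII spaces, digits and punctuation stay at one byte in both. A quick Python 3 check on an arbitrary sample phrase:

```python
# Compare the encoded size of a short Russian phrase in KOI8-R and UTF-8.
text = "Привет, мир"  # "Hello, world": 9 Cyrillic letters, a comma, a space

koi8 = text.encode("koi8_r")  # one byte per character
utf8 = text.encode("utf-8")   # two bytes per Cyrillic letter, one per ASCII char

print(len(text), len(koi8), len(utf8))  # prints: 11 11 20
```

So text made mostly of Cyrillic letters approaches twice its KOI8-R size in UTF-8. Spaces, digits, punctuation and embedded ASCII markup pull the ratio down.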
> > Maybe the guessing algorithm should read the WHOLE FILE.
> Zen: "In the face of ambiguity, refuse the temptation to guess."
> Unfortunately this contradicts not only the idea of how much to read
> but the whole idea of guessing the encoding. So maybe we are going in
> the wrong direction. IMHO the right direction is to include a guessing
> script in the Tools directory.
That was the position I started with. Guido wanted a guessing mode. So I designed what seemed to me to be the least dangerous guessing mode possible:
1. Off by default.
2. Turned on by the keyword "guess".
3. Decodes the full text to check for encoding correctness.
Given these safeguards, I think that the feature is not only safe enough but also helpful.
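The three safeguards can be sketched in Python 3 as a function over bytes. This is a minimal illustration under stated assumptions, not the proposed API: the name `decode_text`, its `candidates` parameter, and the BOM-first ordering are all hypothetical choices made for this example.

```python
import codecs

# Hypothetical BOM table checked before any trial decoding.
# UTF-32 BOMs are omitted to keep the sketch short.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def decode_text(data: bytes, encoding: str = "utf-8",
                candidates: tuple = ("utf-8",)) -> str:
    """Hypothetical helper: guess only when encoding == "guess"."""
    if encoding != "guess":
        # Safeguard 1: guessing is off by default; decode as told.
        return data.decode(encoding)
    # Safeguard 2: the caller opted in with the keyword "guess".
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data.decode(name)  # the codec strips the BOM
    for name in candidates:
        try:
            # Safeguard 3: decode the WHOLE input, not a prefix, so a bad
            # byte near the end still rejects this candidate.
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not guess encoding; pass encoding= explicitly")
```

A single-byte codec such as latin-1 decodes any byte string, so putting one in `candidates` makes the guess always succeed; it only makes sense as the last entry.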
Moving it to a script would not meet the central goal that it be easily usable by people who do not know much about encodings or Python.
Paul Prescod
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
