Chris Angelico writes:
 > On Sat, Jan 23, 2021 at 12:37 PM Inada Naoki <songofaca...@gmail.com> wrote:

 > > ## 1. Add `io.open_text()`, builtin `open_text()`, and
 > > `pathlib.Path.open_text()`.
 > >
 > > All functions are same to `io.open()` or `Path.open()`, except:
 > >
 > > * Default encoding is "utf-8".

I wonder if it might not be better to remove the encoding parameter
for this version.  Further comments below.

 > > * "b" is not allowed in the mode option.
 > 
 > I *really* don't like this, because it implies that open() will open
 > in binary mode.

I doubt that will be a common misunderstanding, as long as 'open_text'
is documented as a convenience wrapper for 'open' aimed primarily at
Windows programmers.
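
For concreteness, here is a minimal sketch of such a convenience wrapper.
It is not the proposed implementation: the signature, the choice of
ValueError, and the error message are assumptions, based only on the two
differences quoted above (UTF-8 default, no "b" in the mode).

```python
import io


def open_text(file, mode="r", encoding="utf-8", **kwargs):
    """Hypothetical wrapper: io.open(), but text-only and UTF-8 by default."""
    if "b" in mode:
        # Assumption: the proposal only says "b" is not allowed;
        # raising ValueError here is a guess at the behavior.
        raise ValueError("open_text() does not accept binary mode")
    # Everything else (buffering, errors, newline, ...) is passed through.
    return io.open(file, mode, encoding=encoding, **kwargs)
```

Written this way, 'open' itself is untouched and still handles binary
mode; the wrapper only pins the text-mode default.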

 > > How do you think about this idea? Is this worth enough to add a new
 > > built-in function?
 > 
 > Highly dubious.

I won't go so far as "highly", but yeah, dubious to me.  In my own
environment, while I still see Shift JIS data quite a bit, the rule is
that this or that correspondent sends it to me.  While a lot of the
University infrastructure used to default to Shift JIS, it now
defaults to UTF-8.  So I don't have a consistent rule by "kind of
data", i.e., for which scripts use 'open_text' and which 'open'.  If the
script processes data from "JIS users", it needs to accept a
command-line flag because other users *will* be sending that kind of
data in UTF-8.  Naoki's mileage may vary.
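
A sketch of the kind of script described above, where the encoding is
chosen per run rather than per function.  The flag name "--encoding",
its UTF-8 default, and the helper names are all hypothetical;
"shift_jis" is the name of Python's standard codec for Shift JIS.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Process one text file.")
    parser.add_argument("path")
    # Hypothetical flag; the UTF-8 default is an assumption for this sketch.
    parser.add_argument("--encoding", default="utf-8")
    return parser.parse_args(argv)


def read_text(path, encoding):
    # The encoding is always passed explicitly, so the locale's
    # preferred encoding never enters the picture.
    with open(path, encoding=encoding) as f:
        return f.read()


def main(argv=None):
    args = parse_args(argv)
    print(read_text(args.path, args.encoding), end="")
```

A correspondent's Shift JIS file would then be handled with something
like "script.py data.txt --encoding shift_jis", and everyone else's
UTF-8 data with no flag at all.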

See below for additional comments.

 > I'd rather focus on just moving to UTF-8 as the default, rather
 > than bringing in a new function - especially with such a confusing
 > name.

I expect there are several bodies of users who will experience that as
quite obnoxious for a long time to come.  I *still* see a ton of stuff
that is Shift JIS, a fair amount of email in ISO-2022-JP, and in China
gb18030 isn't just a good idea, it's the law.  (OK, the precise
statement of the law is "must support", not "must use", but my Chinese
students all default to GB.)

The problem is that these users use some software that will create
text in a national language encoding by default and others that use
UTF-8 by default.  So I guess Naoki's hope is that "when I'm
processing Microsoft/Oracle-generated data, I use 'open_text', when
it's local software I use 'open'" becomes an easy and natural response
in such environments.

We don't see very many Asian language users on the python-* lists.  We
see a few more Russian users, I suspect quite a few Hebrew and Indic
users, maybe a few Arabic users.  So we should listen very carefully
to the few we do have, since they come from tiny minorities of python-*
subscribers.

 > What exactly are the blockers on making open(fn) use UTF-8 by
 > default?

Backward incompatibility with just about every script in existence?

 > Can the proposals be written with that as the ultimate goal (even if
 > it's going to take X versions and multiple deprecation phases), rather
 > than aiming for a messy goal where people aren't sure which function
 > to use?

The problem is that on Windows there are a lot of installations that
continue to use non-UTF-8 encodings enough that users set their
preferred encoding that way.  I guess that folks where the majority of
their native-language alphabet is drawn from ASCII are by now almost
all using UTF-8 by default, but this is not so for East Asians (who
almost all still use a mixture of several encodings every day because
email still often defaults to national standard encodings).  I can't
speak to Cyrillic, Hebrew, Arabic, Indic languages, but I wouldn't be
surprised if they're somewhere in the middle.

Naoki can document that "open(..., encoding='...')" is strongly
preferred to 'open_text'.  Maybe a better name is "open_utf8", to
discourage people who want to use non-default encodings, or
programmatically chosen encodings, in that function.
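
To illustrate the "open_utf8" idea, here is a sketch in which the
encoding parameter is removed entirely.  The name is the one suggested
above; the signature, the keyword-only parameters, and the error
handling are assumptions, not anything in Naoki's proposal.

```python
def open_utf8(file, mode="r", *, errors=None, newline=None):
    """Hypothetical: text-mode open() with the encoding fixed to UTF-8."""
    if "b" in mode:
        # Assumption: binary mode is rejected, as in the quoted proposal.
        raise ValueError("open_utf8() does not accept binary mode")
    # There is deliberately no 'encoding' parameter: callers who need
    # another codec must fall back to open(..., encoding='...').
    return open(file, mode, encoding="utf-8", errors=errors, newline=newline)
```

With this signature, open_utf8(path, encoding="cp932") fails with a
TypeError for the unexpected keyword argument, which nudges such callers
toward plain 'open' with an explicit encoding.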

As someone who avoids Windows like the plague, I have no real sense of
how important this is, and I like your argument from first
principles.  So on net, I guess I'm +/- 0 only because Naoki thinks it
important enough to spend quite a bit of skull sweat and effort on
this.

Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/E2X4QYTOW47BVYVRWACOIBQA3H5BVZMQ/