Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Oleg Broytman
On Thu, Jan 12, 2017 at 12:42:46AM -0600, Nick Timkovich wrote:
> Why mention sys.ps1 == '>>> ', is that some inside joke I'm unaware of?
> That is one of the easier things to modify (with a sitecustomize.py or
> whatever).

   With PYTHONSTARTUP.
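A minimal sketch of such a startup file, for reference. The file name and the prompt strings are illustrative assumptions; the interpreter runs whatever file the PYTHONSTARTUP environment variable points to before showing the first interactive prompt.

```python
# Hypothetical ~/.pythonstartup.py; enable it with
#   export PYTHONSTARTUP=~/.pythonstartup.py
import sys


def set_prompts(primary="py> ", continuation="... "):
    """Replace the interactive prompts and return the previous pair."""
    # sys.ps1/sys.ps2 only exist in interactive sessions, hence getattr().
    previous = (getattr(sys, "ps1", ">>> "), getattr(sys, "ps2", "... "))
    sys.ps1 = primary
    sys.ps2 = continuation
    return previous


set_prompts()
```

Because the change lives in a per-user file, it affects only interactive sessions and leaves scripts untouched.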

> On Thu, Jan 12, 2017 at 12:03 AM, INADA Naoki 
> wrote:
> 
> > > Built-in functions
> > > ------------------
> > >
> > > Python is an object-oriented language, but it is not *purely*
> > > object-oriented. Not everything needs to be `a method of some object
> > > <http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html>`_,
> > > and functions have their advantages.  See the
> > > `FAQ  > python-use-methods-for-some-functionality-e-g-list-index-but-functions-for-other-e-g-len-list>`_
> > > for more detail.
> > >
> >
> > I don't like this FAQ entry.  See this issue: https://bugs.python.org/issue27671

Oleg.
-- 
 Oleg Broytman    http://phdru.name/    p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Nick Coghlan
On 11 January 2017 at 17:05, Stephen J. Turnbull wrote:
> Anyway, I need to look more carefully at the actual PEPs and see if
> there's something concrete to worry about.  But remember, we have
> about 18 months to chew over this if necessary

FWIW, I'm hoping to backport whatever improved handling of the C
locale that we agree on for Python 3.7+ to the initial system Python
3.6.0 release in Fedora 26 [1] - hence the section about redistributor
backports in PEP 538.

While the problems with the C locale have been known for a while, this
latest attempt to do something about it started as an idea I had for a
downstream Fedora-specific patch (which became PEP 538), while that
PEP in turn served as motivation for Victor to write PEP 540 as an
alternative approach that didn't depend on having the C.UTF-8 locale
available.

With the F26 Alpha at the end of February and the F26 Beta in late
April, I'm hoping we can agree on a way forward without requiring
months to make a decision :)

> -- I'm only asking for a few more days

Yeah, while I'd prefer not to see the discussions drag out
indefinitely, there's also still plenty of time for folks to consider
the PEPs closely and formulate their perspective.

Cheers,
Nick.

[1] https://fedoraproject.org/wiki/Releases/26/Schedule

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)

2017-01-11 Thread Nick Coghlan
On 12 January 2017 at 08:15, Victor Stinner  wrote:
> Hi,
>
> I also implemented my PEP 540, you can now test it! Use the latest
> patch attached to:
>
>http://bugs.python.org/issue29240
>
>
> I made multiple changes since the first version of my PEP:
>
> * The UTF-8 Strict mode now only uses strict for inputs and outputs:
> it keeps surrogateescape for operating system data. Read the "Use the
> strict error handler for operating system data" alternative for the
> rationale.
>
> * The POSIX locale now enables the UTF-8 mode. See the "Don't modify
> the encoding of the POSIX locale" alternative for the rationale.
>
> * Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc.

Thanks Victor, I really like this version, and the next time I update
PEP 538 I'm going to replace the en_US.UTF-8 fallback in the current
proposal with a dependency on this PEP.

My one comment would be that in the summary tables, "Always works"
isn't the right phrase to describe potentially corrupting text data
instead of throwing an exception :)

Instead, I think it would make sense to retitle that column as
"Exception?" such that:

* the ideal state is "No exception, no mojibake", which is what we'll
now get when assuming (or forcing) UTF-8 is the correct thing to do,
and will also continue to get when the locale is set appropriately
(e.g. when handling GB18030 on Chinese systems)
* the problematic behaviour of earlier Python 3.x versions was "Yes
exception, no mojibake" when it assumed ASCII instead of UTF-8
* the problematic behaviour of Python 2.x in the specific examples
given is "No exception, yes mojibake", and potentially even "Yes
exception, yes mojibake" in cases where the implicit ASCII-based
decoding could be encountered
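The three outcomes in the list above can be reproduced with plain ``bytes.decode`` calls. This is only an illustrative sketch: the sample text and the helper name ``decode_outcome`` are assumptions, and the encodings stand in for what each configuration would assume about the same UTF-8 data.

```python
data = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'


def decode_outcome(raw, encoding):
    """Return ('ok', text) or ('exception', None) for a strict decode."""
    try:
        return ("ok", raw.decode(encoding))
    except UnicodeDecodeError:
        return ("exception", None)


# Correct encoding assumed: no exception, no mojibake.
utf8_result = decode_outcome(data, "utf-8")

# ASCII assumed: exception, but no mojibake.
ascii_result = decode_outcome(data, "ascii")

# Wrong 8-bit encoding assumed: no exception, but the text is corrupted.
latin1_result = decode_outcome(data, "latin-1")

print(utf8_result, ascii_result, latin1_result)
```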

PEP 538 would then be a follow-on PEP that attempts to resolve the
ASCII locale encoding problem not only for CPython itself, but also
for any other C/C++ components sharing the same process, or launched
in subprocesses that inherit the current environment.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Nick Timkovich
Why mention sys.ps1 == '>>> ', is that some inside joke I'm unaware of?
That is one of the easier things to modify (with a sitecustomize.py or
whatever).

On Thu, Jan 12, 2017 at 12:03 AM, INADA Naoki 
wrote:

> > Built-in functions
> > ------------------
> >
> > Python is an object-oriented language, but it is not *purely*
> > object-oriented. Not everything needs to be `a method of some object
> > <http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html>`_,
> > and functions have their advantages.  See the
> > `FAQ  python-use-methods-for-some-functionality-e-g-list-index-but-functions-for-other-e-g-len-list>`_
> > for more detail.
> >
>
> I don't like this FAQ entry.  See this issue: https://bugs.python.org/issue27671

Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread INADA Naoki
> Built-in functions
> ------------------
>
> Python is an object-oriented language, but it is not *purely*
> object-oriented. Not everything needs to be `a method of some object
> <http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html>`_,
> and functions have their advantages.  See the
> `FAQ 
> `_
> for more detail.
>

I don't like this FAQ entry.  See this issue: https://bugs.python.org/issue27671


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Chris Barker
It seems to me that having a C locale can mean two things:

1) It really is meant to be ASCII

2) It's mis-configured (or un-configured), meaning the system encoding is
unknown.

if (2) then utf-8 is a fine default.

if (1), then there are two options:

1) Everything on the system really is ASCII -- in which case, utf-8 would
"just work" -- no problem.

2) There are non-ascii file names, etc. on this supposedly ASCII system. In
which case, do folks expect their Python programs to find these issues and
raise errors? They may well expect that their Python program will not let
them try to save a non ASCII filename, for instance. But I suspect that
they wouldn't want it to raise an obscure encoding error -- but rather
would want the app to do something friendly.

So I see no downside to using utf-8 when the C locale is defined.

-CHB




On Wed, Jan 11, 2017 at 4:23 PM, INADA Naoki  wrote:

> > My PEP 540 is different than Nick's PEP 538, even for the POSIX
> > locale. I propose to always use the surrogateescape error handler,
> > whereas Nick wants to keep the strict error handler for inputs and
> > outputs.
> > https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
> >
> > The surrogateescape error handler is useful to write programs which
> > work as pipes, as cat, grep, sed, ... UNIX program:
> > https://www.python.org/dev/peps/pep-0540/#producer-
> consumer-model-using-pipes
> >
> > You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
> > mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
> > of my use case. The UTF-8 mode always works, but can produce mojibake,
> > whereas UTF-8 Strict doesn't produce mojibake but can fail depending
> > on data and the locale.
> >
> > IMHO most users prefers usability ("just work") over correctness
> > (prevent mojibake).
> >
>
> I'm ±0 to surrogateescape by default.  I feel +1 for stdout and -1 for
> stdin.
>
> In output case, surrogateescape is weaker than strict, but it only allows
> surrogateescaped binary.  If program carefully use surrogateescaped decode,
> surrogateescape on stdout is safe enough.
>
> On the other hand, surrogateescape is very weak for input.  It accepts
> arbitrary bytes.
> It should be used carefully.
>
> But I agree different encoding handler between stdin/stdout is not
> beautiful.
> That's why I'm ±0.
>
>
> FYI, when http://bugs.python.org/issue15216 is merged, we can change
> error handler easily: ``sys.stdout.set_encoding(
> errors='surrogateescape')``
>
> So it's controllable from Python.  Some program which handles filenames may
> prefer surrogateescape, and some program like CGI may prefer strict
> UTF-8 because
> JSON and HTML5 shouldn't contain arbitrary bytes.



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Pavol Lisy
On 1/12/17, Steven D'Aprano  wrote:


> This shouldn't need saying, but Python 3 will not be abandoned.

Except Python 4 would come.


[Python-ideas] How to respond to trolling (Guido van Rossum)

2017-01-11 Thread Simon Lovell

I feel I have to respond to this one.


More than half of what I suggested could have and should be implemented. 
In particular the truthiness of non-boolean data and the lack of a 
reasonable SQL syntax. Several other points have been discussed 
endlessly on the internet but without a satisfactory (IMO) answer being 
given. I don't know what is meant by some insults having been thrown in. 
Calling truthiness of non boolean data "Ugly" is an insult? It is ugly.



Yes, I should have double checked the chained assignment before posting 
and perhaps including some things which weren't changing added negative 
value.



Regarding this comment. 'I use vim, which is very respectable, thank 
you. You'd like me to use "EditPlus 2" or equivalent', I think you 
should familiarise yourself with the "map!" function in vi and vim - put 
it in your .exrc file or .vimrc (vim only). e.g. "map! if if ^M^Mendif".


Regarding the with function, to those not familiar with what I was 
referring to that is a construct in Delphi and some other languages 
which works like this:


ReallyLongFileDescriptor=open("file")
with ReallyLongFileDescriptor:
 x=readline()   // note the lack of "ReallyLongFileDescriptor."
 print x

Delphi is even worse in that you can add more than one prefix in your 
with statement.


Yes, you can put #endif at the end of every "if" statement etc. That 
requires a checker in the vein of the ccheck of yore to be enforced. 
These things aren't desirable.



Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Mikhail V
On 12 January 2017 at 03:37, Steven D'Aprano  wrote:

> I have a proposal for an Informational PEP that lists things which will
> not change in Python. If accepted, it could be linked to from the signup
> page for the mailing list, and be the one obvious place to point
> newcomers to if they propose the same old cliches.
>
> Thoughts?
>

Excellent idea, I was going to ask about such list during my own attempts
here.

And my first thought is about "will not change". Like: never ever change or
like: will not change in 10 years or 20 years.

And on this occasion, I'd look forward to some alternative
informal proposals repository.
For example something like "futuristic corner" for some
unusual topics but potentially useful for future development.
Now there is "informational PEP" section, but not very clear if it is more
informal than normal PEPs or how would one go with
merely marginal topics, including those things, which
will most obviously not change.
And probably your idea is exactly against this attitude, hard to say.

Mikhail

Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Chris Barker
I think this is a fine idea, but I also think we could use a more verbose
FAC:

Frequently Asked Criticisms

Some of the same things, but it would focus on the "why" of many of the
issues.

Many of the things people (newbies, mostly) complain about are simply
taste, or legacy that isn't worth changing. But many are deliberate design
decisions that were thoroughly thought out, and have been well explained in
various places (i.e. zero-based indexing and open-on-the-right slicing). It
would be good to have it all in one place.
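As an example of the kind of "why" such an entry could show, here is a small sketch of the properties usually cited for zero-based indexing with half-open slices. The sample string and the variable names are arbitrary assumptions.

```python
s = "python"
n = 2

# Splitting at any index n gives two pieces with no overlap and no gap.
head, tail = s[:n], s[n:]
assert head + tail == s

# The length of a slice is simply stop - start.
start, stop = 1, 4
middle = s[start:stop]
assert len(middle) == stop - start

# range() follows the same convention, so indices line up with len().
indices = list(range(len(s)))
assert indices[0] == 0 and indices[-1] == len(s) - 1
```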

Maybe this PEP could be extended to include that, but it doesn't feel
PEP-like to me.

-CHB







On Wed, Jan 11, 2017 at 7:17 PM, Oleg Broytman  wrote:

> On Thu, Jan 12, 2017 at 01:37:41PM +1100, Steven D'Aprano <
> st...@pearwood.info> wrote:
> > Explicit self
> > -------------
> >
> > Explicit ``self`` is a feature, not a bug.  See the
> > `FAQ  be-used-explicitly-in-method-definitions-and-calls>`_
> > for more detail.
>
>If one thinks that ``self`` is too long and tedious to write she can
> use ``s`` instead.
>
> Oleg.
> --
>  Oleg Broytman    http://phdru.name/    p...@phdru.name
>Programmers don't die, they just GOSUB without RETURN.



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Oleg Broytman
On Thu, Jan 12, 2017 at 01:37:41PM +1100, Steven D'Aprano wrote:
> Explicit self
> -------------
> 
> Explicit ``self`` is a feature, not a bug.  See the
> `FAQ 
> `_
> for more detail.

   If one thinks that ``self`` is too long and tedious to write she can
use ``s`` instead.
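A short sketch of what that looks like. The first parameter of a method is an ordinary name, so ``s`` works, although ``self`` remains the convention; the ``Counter`` class here is made up for illustration.

```python
class Counter:
    """Toy class whose methods name the instance ``s`` rather than ``self``."""

    def __init__(s, start=0):
        s.value = start

    def bump(s, step=1):
        s.value += step
        return s.value


c = Counter()
c.bump()
c.bump(2)
print(c.value)
```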

Oleg.
-- 
 Oleg Broytman    http://phdru.name/    p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Chris Angelico
On Thu, Jan 12, 2017 at 1:37 PM, Steven D'Aprano  wrote:
> I have a proposal for an Informational PEP that lists things which will
> not change in Python. If accepted, it could be linked to from the signup
> page for the mailing list, and be the one obvious place to point
> newcomers to if they propose the same old cliches.
>
>

+1. Sits in a similar place to PEP 3099; can some sort of
appropriate/memorable number be picked for this?

ChrisA


[Python-ideas] Things that won't change (proposed PEP)

2017-01-11 Thread Steven D'Aprano
I have a proposal for an Informational PEP that lists things which will
not change in Python. If accepted, it could be linked to from the signup
page for the mailing list, and be the one obvious place to point
newcomers to if they propose the same old cliches.

Thoughts?


* * * * * * * * * *


PEP: XXX
Title: Things that won't change in Python
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 11-Jan-2017
Post-History: 12-Jan-2017


Abstract
========

This PEP documents things which will not change in future versions of Python.


Rationale
=========

This PEP hopes to reduce the noise on `Python-Ideas 
`_
and other mailing lists.  If you have a proposal for future Python
development, and it requires changing one of the things listed here, it
is dead in the water and has **no chance of being accepted**, either because
the benefit is too little, the cost of changing the language (including
backwards compatibility) is too high, or simply because it goes against
the design preferred by the BDFL.

Many of these things are already listed in the `FAQs 
`_.
You should be familiar with both Python and the FAQs before proposing
changes to the language.

Just because something is not listed here does not necessarily mean that
it will be changed.  Each proposal will be weighed on its merits, costs
compared to benefits.  Sometimes the decision will come down to a matter
of subjective taste, in which case the BDFL has the final say.


Language Direction
==================

Python 3
--------

This shouldn't need saying, but Python 3 will not be abandoned.


Python 2.8
----------

There will be `no official Python 2.8 
`_ ,
although third parties are welcome to fork the language, backport Python
3 features, and maintain the hybrid themselves.  Just don't call it
"Python 2.8", or any other name which gives the impression that it
is maintained by the official Python core developers.


Type checking
-------------

Duck-typing remains a fundamental part of Python and `type checking 
`_
will not be mandatory.  Even if the Python interpreter someday gains a
built-in type checker, it will remain optional.


Syntax
======

Braces
------

It doesn't matter whether optional or mandatory, whether spelled ``{ }``
like in C, or ``BEGIN END`` like in Pascal, braces to delimit code blocks
are not going to happen.

For another perspective on this, try running ``from __future__ import braces``
at the interactive interpreter.
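The same experiment can be run non-interactively. This is a small sketch, with the helper name ``try_braces`` being an assumption, that compiles the statement and captures the resulting ``SyntaxError`` rather than letting it propagate.

```python
def try_braces():
    """Compile the import and return the SyntaxError message, if any."""
    try:
        compile("from __future__ import braces", "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


print(try_braces())
```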

(There is a *tiny* loophole here: multi-statement lambda, or Ruby-like code
blocks have not been ruled out.  Such a feature may require some sort of
block delimiter -- but it won't be braces, as they clash with the syntax
for dicts and sets.)


Colons after statements that introduce a block
----------------------------------------------

Statements which introduce a code block, such as ``class``, ``def``, or
``if``, require a colon.  Colons have been found to increase readability.
See the `FAQ 
`_
for more detail.


End statements
--------------

Python does not use ``END`` statements following blocks.  Given significant
indentation, they are redundant and add noise to the source code.  If you
really want end markers, use a comment ``# end``.


Explicit self
-------------

Explicit ``self`` is a feature, not a bug.  See the
`FAQ 
`_
for more detail.


Print function
--------------

The ``print`` statement in Python 1 and 2 was a mistake that Guido long
regretted.  Now that it has been corrected in Python 3, it will not be
reverted back to a statement.  See `PEP 3105 
`_
for more details.


Significant indentation
-----------------------

`Significant indentation 
`_
is a core feature of Python.


Other syntax
------------

Python will not use ``$`` as syntax.  Guido doesn't like it.  (But it
is okay to use ``$`` in DSLs like template strings and regular
expressions.)


Built-in Functions And Types
============================

Strings
-------

Strings are `immutable 
`_
and represent Unicode code points, not bytes.


Bools
-----

``bool`` is a subclass of ``int``, with ``True == 1`` and ``False == 0``.
This is mostly for historical reasons, but the benefi

Re: [Python-ideas] How to respond to trolling

2017-01-11 Thread Steven D'Aprano
On Wed, Jan 11, 2017 at 02:23:06PM +0900, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
> 
>  > Giving a newcomer the Silent Treatment because they've questioned some 
>  > undocumented set of features not open to change is not Open, Considerate 
>  > or Respectful (the CoC). Even if their ideas are ignorant or ill-thought 
>  > out, we must give them the benefit of the doubt and assume they are 
>  > making their comments in good faith rather than trolling.
> 
> Honest question: do you think that response has to be done in public?

"Has to be done in public" in the sense of being mandatory? No. There 
are pros and cons to both public and private messaging.

But in the sense of preferred, yes, I do think so.

Private responses could be the idiosyncratic response of a single weirdo 
who doesn't speak for the community. Public responses that don't get 
contradicted demonstrate community agreement, and offer the OP a way 
to engage if they are willing to ask questions, learn from the answers, 
and moderate their tone.



-- 
Steve


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
> My PEP 540 is different than Nick's PEP 538, even for the POSIX
> locale. I propose to always use the surrogateescape error handler,
> whereas Nick wants to keep the strict error handler for inputs and
> outputs.
> https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
>
> The surrogateescape error handler is useful to write programs which
> work as pipes, as cat, grep, sed, ... UNIX program:
> https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
>
> You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
> mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
> of my use case. The UTF-8 mode always works, but can produce mojibake,
> whereas UTF-8 Strict doesn't produce mojibake but can fail depending
> on data and the locale.
>
> IMHO most users prefers usability ("just work") over correctness
> (prevent mojibake).
>

I'm ±0 to surrogateescape by default.  I feel +1 for stdout and -1 for stdin.

In output case, surrogateescape is weaker than strict, but it only allows
surrogateescaped binary.  If program carefully use surrogateescaped decode,
surrogateescape on stdout is safe enough.

On the other hand, surrogateescape is very weak for input.  It accepts
arbitrary bytes.
It should be used carefully.
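To make the asymmetry concrete, here is a small sketch using plain ``bytes.decode`` and ``str.encode`` rather than the real standard streams; the sample bytes are an arbitrary assumption. On input, surrogateescape accepts any byte sequence, while on output it only lets through the lone surrogates that such a decode produced.

```python
raw = b"valid \xe2\x82\xac then junk \xff"

# Strict decoding rejects the stray 0xFF byte.
try:
    raw.decode("utf-8", "strict")
    strict_in_ok = True
except UnicodeDecodeError:
    strict_in_ok = False

# surrogateescape accepts it: the undecodable byte becomes U+DCFF.
text = raw.decode("utf-8", "surrogateescape")
assert text.endswith("\udcff")

# Encoding with the same handler restores the original bytes exactly...
round_trip = text.encode("utf-8", "surrogateescape")

# ...but a strict encoder refuses the lone surrogate.
try:
    text.encode("utf-8", "strict")
    strict_out_ok = True
except UnicodeEncodeError:
    strict_out_ok = False

print(strict_in_ok, strict_out_ok, round_trip == raw)
```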

But I agree different encoding handler between stdin/stdout is not beautiful.
That's why I'm ±0.


FYI, when http://bugs.python.org/issue15216 is merged, we can change
error handler easily: ``sys.stdout.set_encoding(errors='surrogateescape')``

So it's controllable from Python.  Some program which handles filenames may
prefer surrogateescape, and some program like CGI may prefer strict
UTF-8 because
JSON and HTML5 shouldn't contain arbitrary bytes.
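For illustration, a sketch of switching the error handler on an existing text stream. Note the assumption: ``set_encoding`` is the name proposed in the issue at the time of writing, whereas this sketch uses ``io.TextIOWrapper.reconfigure()``, which is available in Python 3.7 and later. It writes to an in-memory stream instead of the real ``sys.stdout``, and the file name is made up.

```python
import io

buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="utf-8", errors="strict")

# A filename decoded with surrogateescape may carry a lone surrogate.
name = b"report-\xff.txt".decode("utf-8", "surrogateescape")

# With the strict handler the write is rejected.
try:
    stream.write(name)
    stream.flush()
    wrote_strict = True
except UnicodeEncodeError:
    wrote_strict = False

# Switch the error handler in place (Python 3.7+), then retry.
stream.reconfigure(errors="surrogateescape")
stream.write(name)
stream.flush()

print(wrote_strict, buffer.getvalue())
```

A program that emits JSON or HTML would simply leave the handler at ``strict`` and let the error surface.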

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Victor Stinner
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg :
> Victor: I think you are taking the UTF-8 idea a bit too far.
> Nick was trying to address the situation where the locale is
> set to "C", or rather not set at all (in which case the lib C
> defaults to the "C" locale). The latter is a fairly standard
> situation when piping data on Unix or when spawning processes
> which don't inherit the current OS environment.

My PEP 540 is different than Nick's PEP 538, even for the POSIX
locale. I propose to always use the surrogateescape error handler,
whereas Nick wants to keep the strict error handler for inputs and
outputs.
https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler

The surrogateescape error handler is useful to write programs which
work as pipes, like the cat, grep, sed, ... UNIX programs:
https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
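A rough sketch of such a pipe-style filter, assuming UTF-8 with surrogateescape on both ends. The function name ``grep`` and the sample data are illustrative only, and in-memory streams stand in for stdin and stdout; the point is that a line containing an undecodable byte passes through unchanged instead of raising.

```python
import io


def grep(pattern, raw_in, raw_out):
    """Copy lines containing `pattern` from one binary stream to another."""
    reader = io.TextIOWrapper(
        raw_in, encoding="utf-8", errors="surrogateescape", newline="")
    writer = io.TextIOWrapper(
        raw_out, encoding="utf-8", errors="surrogateescape", newline="")
    for line in reader:
        if pattern in line:
            writer.write(line)
    writer.flush()
    # Detach so the caller keeps ownership of the underlying streams.
    reader.detach()
    writer.detach()


source = io.BytesIO(b"ok line\nbad \xff byte, ok too\nskipped\n")
sink = io.BytesIO()
grep("ok", source, sink)
print(sink.getvalue())
```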

You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
of my use case. The UTF-8 mode always works, but can produce mojibake,
whereas UTF-8 Strict doesn't produce mojibake but can fail depending
on data and the locale.

IMHO most users prefer usability ("just works") over correctness
(prevent mojibake).

So Nick and I don't have exactly the same scope and use cases.

Victor


[Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)

2017-01-11 Thread Victor Stinner
Hi,

I also implemented my PEP 540, you can now test it! Use the latest
patch attached to:

   http://bugs.python.org/issue29240


I made multiple changes since the first version of my PEP:

* The UTF-8 Strict mode now only uses strict for inputs and outputs:
it keeps surrogateescape for operating system data. Read the "Use the
strict error handler for operating system data" alternative for the
rationale.

* The POSIX locale now enables the UTF-8 mode. See the "Don't modify
the encoding of the POSIX locale" alternative for the rationale.

* Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc.


Version 3 of the PEP has a longer rationale with more examples. IMHO the
"List a directory into stdout" use case is the most representative
case of the "UNIX should just work" idea and of encoding issues:
https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout

It reads data from the operating system (directory content) and writes
it into an output (stdout). It combines two things which are similar
but different in subtle ways.

I included examples with commands and their output for this use case, to
have a more "real world" example instead of a long list of theoretical
things :-)

Read the PEP 540 online (HTML):

   https://www.python.org/dev/peps/pep-0540/

Full text below.

Victor


PEP: 540
Title: Add a new UTF-8 mode
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-January-2016
Python-Version: 3.7


Abstract
========

Add a new UTF-8 mode, disabled by default, to ignore the locale and
force the usage of the UTF-8 encoding.

Basically, the UTF-8 mode behaves like Python 2: it "just works" and doesn't
bother users with encodings, but it can produce mojibake. The UTF-8 mode
can be configured as strict to prevent mojibake.

New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode. The POSIX locale enables
the UTF-8 mode.
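As a rough illustration of the two switches, the sketch below starts a child interpreter and reads back ``sys.flags.utf8_mode``. It assumes Python 3.7 or later, where ``-X utf8``, ``PYTHONUTF8`` and that flag are available; the helper name ``utf8_mode_flag`` is made up for this example.

```python
import os
import subprocess
import sys


def utf8_mode_flag(*interpreter_args, env_overrides=None):
    """Start a child interpreter and report its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    # Start from a clean slate so only the requested switch is in effect.
    env.pop("PYTHONUTF8", None)
    env.update(env_overrides or {})
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


via_option = utf8_mode_flag("-X", "utf8")
via_environment = utf8_mode_flag(env_overrides={"PYTHONUTF8": "1"})
print(via_option, via_environment)
```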


Rationale
=========

"It's not a bug, you must fix your locale" is not an acceptable answer
----------------------------------------------------------------------

Since Python 3.0 was released in 2008, the usual answer to users getting
Unicode errors is to ask developers to fix their code to handle Unicode
properly. Most applications and Python modules were fixed, but users
keep reporting Unicode errors regularly: see the long list of issues in
the `Links`_ section below.

In fact, a second class of bug comes from a locale which is not properly
configured. The usual answer to such bug report is: "it is not a bug,
you must fix your locale".

Technically, the answer is correct, but from a practical point of view,
the answer is not acceptable. In many cases, "fixing the issue" is a
hard task. Moreover, sometimes, the usage of the POSIX locale is
deliberate.

A good example of a concrete issue is build systems which create a
fresh environment for each build using a chroot, a container, a virtual
machine or something else to get reproducible builds. Such a setup
usually uses the POSIX locale.  To get 100% reproducible builds, the
POSIX locale is a good choice: see the `Locales section of
reproducible-builds.org
`_.

UNIX users don't expect Unicode errors, since the common command line
tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors.
These users expect that Python 3 "just works" with any locale and doesn't
bother them with encodings. From their point of view, the bug is not
their locale but is obviously Python 3.

Since Python 2 handles data as bytes, it's rarer in Python 2
compared to Python 3 to get Unicode errors. It also explains why users
also perceive Python 3 as the root cause of their Unicode errors.

Some users expect that Python 3 just works with any locale and so don't
bother with mojibake, whereas some developers are working hard to prevent
mojibake and so expect that Python 3 fails early before creating
mojibake.

Since different groups of users have different expectations, there is no
silver bullet which solves all issues at once. Last but not least,
backward compatibility should be preserved whenever possible.

Locale and operating system data
--------------------------------

.. _operating system data:

Python uses an encoding called the "filesystem encoding" to decide how
to encode and decode data from/to the operating system:

* file content
* command line arguments: ``sys.argv``
* standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr``
* environment variables: ``os.environ``
* filenames: ``os.listdir(str)`` for example
* pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example
* error messages: ``os.strerror(code)`` for example
* user and terminal names: ``os``, ``grp`` and ``pwd`` modules
* host name, UNIX socket path: see the ``socket`` module
* etc.
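A brief sketch of how this encoding surfaces in the standard library, using only ``sys.getfilesystemencoding()``, ``os.fsencode()``, ``os.fsdecode()`` and ``os.listdir()``. The file name is an arbitrary assumption and the printed encoding depends on the platform and locale.

```python
import os
import sys

fs_encoding = sys.getfilesystemencoding()
print(fs_encoding)

# os.fsencode()/os.fsdecode() apply the filesystem encoding and its error
# handler, so a str filename converts to bytes and back unchanged.
name = "data.txt"
encoded = os.fsencode(name)
decoded = os.fsdecode(encoded)

# os.listdir() mirrors the argument type: str in, str out; bytes in, bytes out.
str_names = os.listdir(".")
bytes_names = os.listdir(b".")
```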

At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user
``LC_CTYPE`` lo

Re: [Python-ideas] How to respond to trolling

2017-01-11 Thread Chris Kaynor
On Wed, Jan 11, 2017 at 12:02 PM, Xavier Combelle
 wrote:
> I did not read the thread, but it looks like the insult should be a red flag
> and a good time to stop immediately
> and ban the troll

Personally, when I read the original posting, there is quite a bit of
it that comes across as arrogant and ignorant, but none that comes
across as insulting. As such, I agree with some of the other replies
to this thread: some reply was needed to the original thread.

While the thread belonged on python-list, it was also not fully
off-topic for python-ideas: while the wording was of a review of
Python, and it was not worded as actually suggesting changes, it could
be read as indirectly proposing changes or new features. As such, I
feel that any reply to the thread should at least aim to point the
poster to the correct forum (in this case, python-list), and it is not
unreasonable to answer some of the points as though they are in fact
suggesting changes - likely with links to the rationale for the original
decisions, or at least enough information for the original poster to
fairly easily find such rationales themselves (e.g. a tutorial page or
FAQ).


Re: [Python-ideas] How to respond to trolling

2017-01-11 Thread Xavier Combelle
I did not read the thread, but it looks like the insult should be a red
flag and a good time to stop immediately
and banning the troll


Le 10/01/2017 à 22:58, Guido van Rossum a écrit :
> Whether the intent was to annoy or just to provoke, the effect was
> dozens of messages with people falling over each other trying to
> engage the OP, who clearly was ignorant of most language design issues
> and uninterested in learning, and threw some insults in for good
> measure. The respondents should have known better.
>
> -- 
> --Guido van Rossum (python.org/~guido )


Re: [Python-ideas] api suggestions for the cProfile module

2017-01-11 Thread Giampaolo Rodola'
On Wed, Dec 21, 2016 at 1:50 AM, Thane Brimhall 
wrote:

> I use cProfile a lot, and would like to suggest three backwards-compatible
> improvements to the API.
>
> 1: When using cProfile on a specific piece of code I often use the
> enable() and disable() methods. It occurred to me that this would be an
> obvious place to use a context manager.
>

I think this makes sense.
I did that in https://bugs.python.org/issue9285 but unfortunately I got
stuck and the issue remained stagnant.
Signaling it here just in case somebody has some insights on how to proceed.
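
For readers who want this behaviour without changing cProfile itself, a minimal sketch of the idea follows. The helper name ``profiled`` is hypothetical and is not the patch from the issue above; it only wraps the existing ``enable()`` and ``disable()`` methods.

```python
import cProfile
import contextlib
import io
import pstats


@contextlib.contextmanager
def profiled():
    """Enable a profiler on entry and disable it on exit (hypothetical helper)."""
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield pr
    finally:
        # Always stop profiling, even if the profiled block raises.
        pr.disable()


with profiled() as pr:
    total = sum(i * i for i in range(10_000))

# Render the five most expensive entries into a string.
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
```

The ``try``/``finally`` is the main design point: the profiler is switched off even when the block exits with an exception.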



> 2: Enhance the `print_stats` method on Profile to accept more options
> currently available only through the pstats.Stats class. For example,
> strip_dirs could be a boolean argument, and limit could accept an int. This
> would reduce the number of cases you'd need to use the more complex API.
>

I'm not sure about this. I agree the current API is not the nicest one. I
use a wrapper on top of cProfile which does this:

stats = pstats.Stats(file.name)
if strip_dirs:
    stats.strip_dirs()
if isinstance(sort, (tuple, list)):
    stats.sort_stats(*sort)
else:
    stats.sort_stats(sort)
stats.print_stats(lines)

With your proposal we would have 2 ways of doing the same thing and I'm not
entirely sure that is good.



> 3: I often forget which string keys are available for sorting. It would be
> nice to add an enum for these so a user could have their linter and IDE
> check that value pre-runtime. Since it would subclass `str` and `Enum` it
> would still work with all currently existing code.
>
> The current documentation contains the following code:
>
> import cProfile, pstats, io
> pr = cProfile.Profile()
> pr.enable()
> # ... do something ...
> pr.disable()
> s = io.StringIO()
> sortby = 'cumulative'
> ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
> ps.print_stats()
> print(s.getvalue())
>
> While the code below doesn't exactly match the functionality above (eg.
> not using StringIO), I envision the context manager working like this,
> along with some adjustments on how to get the stats from the profiler:
>
> import cProfile, pstats
> with cProfile.Profile() as pr:
>     # ... do something ...
> pr.print_stats(sort=pstats.Sort.cumulative, limit=10, strip_dirs=True)
>
> As you can see, the code is shorter and somewhat more self-documenting.
> The best thing about these suggestions is that as far as I can tell they
> would be backwards-compatible API additions.
>
> What do you think? Thank you in advance for your time!
>
> /Thane
>
>
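
The third suggestion (an enum that subclasses both ``str`` and ``Enum``) can be sketched as follows. The class name ``SortBy`` and the ``legacy_labels`` dictionary are assumptions made up for this illustration; they are not part of pstats.

```python
from enum import Enum


class SortBy(str, Enum):
    """Hypothetical names for a few pstats sort keys."""
    CALLS = "calls"
    CUMULATIVE = "cumulative"
    TIME = "time"


# Stand-in for existing code that looks sort keys up by plain string.
legacy_labels = {
    "calls": "call count",
    "cumulative": "cumulative time",
    "time": "internal time",
}

# Members compare and hash like their string values, so the lookup works.
label = legacy_labels[SortBy.CUMULATIVE]
```

Because each member is a real ``str``, code that only expects the string keys keeps working, while a linter or IDE can check attribute names such as ``SortBy.CUMULATIVE``.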



-- 
Giampaolo - http://grodola.blogspot.com

Re: [Python-ideas] How to respond to trolling

2017-01-11 Thread Chris Barker - NOAA Federal
> the effect was dozens of messages with people falling over each
> other trying to engage the OP,

Sure -- but all in one thread

> The respondents should have known better.

But we like to kibitz-- that's why (many of us) are on this list.

Anyway, all (most, anyway) of the points brought up:

A) are not going to change
B) have been justified / explained in multiple blog posts, wiki pages,
and what have you.

So perhaps the best response would be:

"These are all fairly core Python design decisions -- do some googling
to find out why."

But it made me think that it would be good to have a single place that
addresses these kinds of things to point people to.

There was the old "python warts" page, but this would be a "python
features page"

Maybe I'll start that if I find the roundtoits.

Or, if it already exists -- someone please point me to it.

-CHB


Re: [Python-ideas] Python reviewed

2017-01-11 Thread Chris Barker - NOAA Federal
> for range(1,1): means executing once to me.

The indexing/slicing approach was designed for indexing and slicing. Then
it made sense to have range() match. But range() is not part of the for
construction. It is a convenience function for providing an iterable of
integers. And you are welcome to write your own range-like iterable if you
want.

But if you want to loop once, you'd use range(1), not range(1,2) anyway.
Clear as day.

And if you use range(n, n+i), it is very clear that you will loop i times.

> s[:n] + s[n:] == s   // doesn't work. I don't think it should work though


Have you ever used a 1-based and closed-end indexing language that
supported slicing? I have (matlab), and these kinds of constructions are
really ugly and prone to error. It's not that you want to be able to divide
a sequence and immediately put it back together, it's that you often want
to do one thing with the first part of a sequence, and another with the
second part, and you don't want them to overlap.

> len(s[:n]) == n   // works
> len(s[:-n]) == n  // rather independent but would still work if
> language is otherwise unchanged.
> len(s[n:i]) == i - n  // doesn't work. Does it need to?


It's not that it HAS to - it's that it's much less likely that you will
make off by one errors if it does.
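
The identities discussed above can be checked directly in Python's 0-based, half-open scheme. This is a small sketch with arbitrary sample values for ``s``, ``n`` and ``i``:

```python
s = "abcdefgh"
n, i = 3, 6

# Splitting at n yields two non-overlapping parts that rejoin exactly.
head, tail = s[:n], s[n:]
assert head + tail == s

# The length of a slice is the difference of its bounds.
assert len(s[:n]) == n
assert len(s[n:i]) == i - n

# range() follows the same convention: range(n, n + i) yields i values.
assert len(range(n, n + i)) == i
assert list(range(1)) == [0]      # loops exactly once
assert list(range(1, 1)) == []    # empty, because start == stop
```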

-CHB

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
On Wed, Jan 11, 2017 at 7:46 PM, Stephan Houben  wrote:
> Hi INADA Naoki,
>
> (Sorry, I am unsure if INADA or Naoki is your first name...)

Never mind, I don't care about name ordering. (INADA is family name).

>
> While I am very much in favour of everything working "out of the box",
> an issue is that we don't have control over external code
> (be it Python extensions or external processes invoked from Python).
>
> And that code will only look at LANG/LC_CTYPE and ignore any cleverness
> we build into Python.
>

I'm sorry, could you give me a more concrete example?

I'm +1 on PEP 540: there should be an option to ignore the locale
setting.  (And I hope it will become the default in a future version.)

What is your concern?


> For example, this may mean that a built-in Python string sort will give you
> a different ordering than invoking the external "sort" command.
> I have been bitten by this kind of issues, leading to spurious "diffs" if
> you try to use sorting to put strings into a canonical order.
>
> So my feeling is that people are ultimately not being helped by
> Python trying to be "nice", since they will be bitten by locale issues
> anyway. IMHO ultimately better to educate them to configure the locale.
> (I realise that people may reasonably disagree with this assessment ;-) )
>
> I would then recommend to set to en_US.UTF-8, which is slower and
> less elegant but at least more widely supported.

But some people can't accept a 30x slowdown just for sorting ASCII text.
At least, an infrastructure engineer at my company loves the C locale.

New Python programmers (e.g. the many data scientists learning Python)
may want to work on a Linux server, and learning about locales is not
their concern.
Web programmers are the same.  They just want to print UTF-8.
Learning about locales may not be worth it for them.
But I think there should be an option, and I want to use it.


>
> By the way, I know a bit how Node.js deals with locales, and it doesn't try
> to compensate for "C" locales either. But what it *does* do is that
> Node never uses the locale settings to determine the encoding of a file:
> you either have to specify it explicitly OR it defaults to UTF-8 (the latter
> on output only).
> So in this respect it is by specification immune against misconfiguration of
> the encoding.
> However, other stuff (e.g. date formatting) will still be influenced by the
> "C" locale
> as usual.
>
>
> Stephan
>

Yes.  Both PEP 538 and PEP 540 are about encoding.
I'm sorry about my misleading term "locale-free".

There should be locale support for time formatting, at least in a UTF-8 locale.

Regards,


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Petr Viktorin

On 01/11/2017 11:46 AM, Stephan Houben wrote:

> Hi INADA Naoki,
>
> (Sorry, I am unsure if INADA or Naoki is your first name...)
>
> While I am very much in favour of everything working "out of the box",
> an issue is that we don't have control over external code
> (be it Python extensions or external processes invoked from Python).
>
> And that code will only look at LANG/LC_CTYPE and ignore any cleverness
> we build into Python.
>
> For example, this may mean that a built-in Python string sort will give you
> a different ordering than invoking the external "sort" command.
> I have been bitten by this kind of issues, leading to spurious "diffs" if
> you try to use sorting to put strings into a canonical order.


AFAIK, this would not be a problem under PEP 538, which effectively 
treats the "C" locale as "C.UTF-8". Strings of Unicode codepoints and 
the corresponding UTF-8-encoded bytes sort the same way.
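
The claim that code point order and UTF-8 byte order agree can be checked with a small sketch. The sample words below are arbitrary and chosen only to mix ASCII, accented, CJK and astral characters.

```python
words = ["zebra", "Zürich", "apple", "éclair", "日本語", "Ω", "a😀", "A"]

# Python compares str objects code point by code point.
by_code_point = sorted(words)

# Sorting by the UTF-8 bytes mimics what a byte-wise sort does.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

assert by_code_point == by_utf8_bytes
```

This holds because UTF-8 was designed so that byte-wise comparison of encoded strings preserves code point order.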


Is that wrong, or do you have a better example of trouble with using 
"C.UTF-8" instead of "C"?



> So my feeling is that people are ultimately not being helped by
> Python trying to be "nice", since they will be bitten by locale issues
> anyway. IMHO ultimately better to educate them to configure the locale.
> (I realise that people may reasonably disagree with this assessment ;-) )
>
> I would then recommend to set to en_US.UTF-8, which is slower and
> less elegant but at least more widely supported.


What about the spurious diffs you'd get when switching from "C" to 
"en_US.UTF-8"?


$ LC_ALL=en_US.UTF-8 sort file.txt
a
a
A
A
$ LC_ALL=C sort file.txt
A
A
a
a
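
The same difference can be reproduced from Python. This is a sketch that assumes an ``en_US.UTF-8`` locale may or may not be installed, so the locale-aware part is wrapped in a ``try`` block and its result is not asserted.

```python
import locale

lines = ["a", "A", "a", "A"]

# Default str ordering is by code point, matching `LC_ALL=C sort`:
# uppercase ASCII letters sort before lowercase ones.
code_point_order = sorted(lines)

try:
    # Assumption: en_US.UTF-8 is available; many minimal systems lack it.
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    locale_order = sorted(lines, key=locale.strxfrm)
except locale.Error:
    locale_order = None  # locale unavailable, nothing to compare
```

Note that ``locale.setlocale`` changes process-wide state, so a real program would normally set it once at startup rather than around a single sort.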



> By the way, I know a bit how Node.js deals with locales, and it doesn't try
> to compensate for "C" locales either. But what it *does* do is that
> Node never uses the locale settings to determine the encoding of a file:
> you either have to specify it explicitly OR it defaults to UTF-8 (the
> latter on output only).
> So in this respect it is by specification immune against
> misconfiguration of the encoding.
> However, other stuff (e.g. date formatting) will still be influenced by
> the "C" locale
> as usual.


I believe the main problem is that the "C" locale really means two very 
different things:


a) Text is encoded as 7-bit ASCII; higher codepoints are an error
b) No encoding was specified

In both cases, treating "C" as "C.UTF-8" is not bad:
a) For 7-bit "text", there's no real difference between these locales
b) UTF-8 is a much better default than ASCII






Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Stephan Houben
Hi INADA Naoki,

(Sorry, I am unsure if INADA or Naoki is your first name...)

While I am very much in favour of everything working "out of the box",
an issue is that we don't have control over external code
(be it Python extensions or external processes invoked from Python).

And that code will only look at LANG/LC_CTYPE and ignore any cleverness
we build into Python.

For example, this may mean that a built-in Python string sort will give you
a different ordering than invoking the external "sort" command.
I have been bitten by this kind of issues, leading to spurious "diffs" if
you try to use sorting to put strings into a canonical order.

So my feeling is that people are ultimately not being helped by
Python trying to be "nice", since they will be bitten by locale issues
anyway. IMHO ultimately better to educate them to configure the locale.
(I realise that people may reasonably disagree with this assessment ;-) )

I would then recommend to set to en_US.UTF-8, which is slower and
less elegant but at least more widely supported.

By the way, I know a bit how Node.js deals with locales, and it doesn't try
to compensate for "C" locales either. But what it *does* do is that
Node never uses the locale settings to determine the encoding of a file:
you either have to specify it explicitly OR it defaults to UTF-8 (the
latter on output only).
So in this respect it is by specification immune against misconfiguration
of the encoding.
However, other stuff (e.g. date formatting) will still be influenced by the
"C" locale
as usual.
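
In Python, the closest equivalent of the Node.js behaviour described above is to pass the encoding explicitly on every ``open()`` call. A minimal sketch, using a temporary file and an arbitrary sample string:

```python
import os
import tempfile

text = "naïve café 日本語\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Explicit encoding: the result no longer depends on LANG/LC_CTYPE.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
    # Read the raw bytes to confirm what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()
```

This only covers file contents. Filenames, command line arguments and the standard streams still follow the interpreter's own encoding settings.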


Stephan

2017-01-11 9:17 GMT+01:00 INADA Naoki :

> Here is one example of locale pitfall.
>
> ---
> # from http://unix.stackexchange.com/questions/169739/why-is-coreutils-sort-slower-than-python
>
> $ cat letters.py
> import string
> import random
>
> def main():
>     for _ in range(1_000_000):
>         c = random.choice(string.ascii_letters)
>         print(c)
>
> main()
>
> $ python3 letters.py > letters.txt
>
> $ LC_ALL=C time sort letters.txt > /dev/null
> 0.35 real 0.32 user 0.02 sys
>
> $ LC_ALL=C.UTF-8 time sort letters.txt > /dev/null
> 0.36 real 0.33 user 0.02 sys
>
> $ LC_ALL=ja_JP.UTF-8 time sort letters.txt > /dev/null
> 11.03 real 10.95 user 0.04 sys
>
> $ LC_ALL=en_US.UTF-8 time sort letters.txt > /dev/null
> 11.05 real 10.97 user 0.04 sys
> ---
>
> This is why some engineers, including me, use the C locale on Linux,
> at least when there is no C.UTF-8 locale.
>
> Of course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL.
> (I wonder if we can use LC_CTYPE=UTF-8...)
>
> But I dislike current situation that "people should learn
> how to configure locale properly, and pitfall of non-C locale, only for
> using UTF-8 on Python".

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
>
>  > (I wonder if we can use LC_CTYPE=UTF-8...)
>
> Syntactically incorrect: that means the language UTF-8.
> "LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required,
> the region and encoding are optional.  Thus ja_JP, ja.UTF-8 are OK,
> but .UTF-8 is not.

I'm sorry.  I know that, but I'm not good at English.

I meant "I wish POSIX allowed the LC_CTYPE=UTF-8 setting."
It's just my wish.

>
> Rant follows:
>
>  > But I dislike current situation that "people should learn how to
>  > configure locale properly, and pitfall of non-C locale, only for
>  > using UTF-8 on Python".
>
> You can use a distro that implements and defaults to the C.utf-8
> locale, and presumably you'll be OK tomorrow, well before 3.7 gets
> released.

Many people use a new Python on a legacy Linux which doesn't have the
C.UTF-8 locale.

I learned how to configure the locale to use UTF-8 with Python.
But I don't want to force people to learn it just for Python.


>
> Really, we're catering to users who won't set their locales properly
> and insist on old distros.  For Debian, C.utf-8 was suggested in
> 2009[1], and that RFE refers to other distros that had already
> implemented it.

CentOS 7 (and maybe RHEL 7) doesn't seem to provide C.UTF-8 by default.
That means C.UTF-8 will not be a "universally available" locale for at
least the next 5 years.

$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
$ locale -a | grep ^C
C

>  I have all the sympathy in the world for them --
> systems *should* Just Work -- but I'm going to lean against kludges
> if they mean punishing people who actually learn about and conform to
> applicable standards (and that includes well-motivated, properly-
> documented, and carefully-implemented platform-specific extensions),
> or use systems designed by developers who do.[2]
>
> Footnotes:
> [1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609306
>
> [2]  I know how bad standards can suck -- I'm a Mailman developer,
> looking at you RFC 561, er, 5322.  While I'm all for nonconformism if
> you take responsibility for any disasters that result, developers who
> conform on behalf of their users are heroes.


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Stephen J. Turnbull
INADA Naoki writes:

 > Of course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL.

You can also use LC_COLLATE=C.
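
When an external tool is launched from Python, that combination can be passed through the child's environment. The helper below is a hypothetical sketch (the name ``collation_safe_env`` and its defaults are assumptions), and it only builds the environment dictionary.

```python
import os


def collation_safe_env(base=None, ctype="en_US.UTF-8"):
    """Return an environment with UTF-8 character handling but C collation."""
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides every other LC_* variable, so it has to be removed.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = ctype
    env["LC_COLLATE"] = "C"
    return env


env = collation_safe_env({"LC_ALL": "en_US.UTF-8", "PATH": "/usr/bin"})
```

The result could then be handed to something like ``subprocess.run(["sort", "letters.txt"], env=collation_safe_env())``, assuming a ``sort`` command and the chosen ``LC_CTYPE`` locale exist on the system.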

 > (I wonder if we can use LC_CTYPE=UTF-8...)

Syntactically incorrect: that means the language UTF-8.
"LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required,
the region and encoding are optional.  Thus ja_JP, ja.UTF-8 are OK,
but .UTF-8 is not.

Rant follows:

 > But I dislike current situation that "people should learn how to
 > configure locale properly, and pitfall of non-C locale, only for
 > using UTF-8 on Python".

You can use a distro that implements and defaults to the C.utf-8
locale, and presumably you'll be OK tomorrow, well before 3.7 gets
released.  (If there are no leftover mines in the field, I don't see
a good reason to wait for 3.8 given the known deficiencies of the C
locale and the precedent of PEPs 528/529.)

Really, we're catering to users who won't set their locales properly
and insist on old distros.  For Debian, C.utf-8 was suggested in
2009[1], and that RFE refers to other distros that had already
implemented it.  I have all the sympathy in the world for them --
systems *should* Just Work -- but I'm going to lean against kludges
if they mean punishing people who actually learn about and conform to
applicable standards (and that includes well-motivated, properly-
documented, and carefully-implemented platform-specific extensions),
or use systems designed by developers who do.[2]

Footnotes: 
[1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609306

[2]  I know how bad standards can suck -- I'm a Mailman developer,
looking at you RFC 561, er, 5322.  While I'm all for nonconformism if
you take responsibility for any disasters that result, developers who
conform on behalf of their users are heroes.


Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
Here is one example of locale pitfall.

---
# from http://unix.stackexchange.com/questions/169739/why-is-coreutils-sort-slower-than-python

$ cat letters.py
import string
import random

def main():
    for _ in range(1_000_000):
        c = random.choice(string.ascii_letters)
        print(c)

main()

$ python3 letters.py > letters.txt

$ LC_ALL=C time sort letters.txt > /dev/null
0.35 real 0.32 user 0.02 sys

$ LC_ALL=C.UTF-8 time sort letters.txt > /dev/null
0.36 real 0.33 user 0.02 sys

$ LC_ALL=ja_JP.UTF-8 time sort letters.txt > /dev/null
11.03 real 10.95 user 0.04 sys

$ LC_ALL=en_US.UTF-8 time sort letters.txt > /dev/null
11.05 real 10.97 user 0.04 sys
---

This is why some engineers, including me, use the C locale on Linux,
at least when there is no C.UTF-8 locale.

Of course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL.
(I wonder if we can use LC_CTYPE=UTF-8...)

But I dislike the current situation that "people should learn
how to configure the locale properly, and the pitfalls of non-C locales,
only for using UTF-8 on Python".