Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Nick Coghlan
On 9 December 2017 at 01:22, Victor Stinner  wrote:
> I updated my PEP: in the 4th version, locale.getpreferredencoding()
> now returns 'UTF-8' in the UTF-8 Mode.

+1, that's a good change, since it brings the "locale coercion failed"
case even closer to the "locale coercion succeeded" behaviour.

To continue with the CentOS 7 example: that actually does use a UTF-8
based locale by default; it's just en_US.UTF-8 rather than C.UTF-8.

Earlier versions of PEP 538 thus included "en_US.UTF-8" on the
candidate target locale list, but that turned out to cause assorted
problems due to the "C -> en_US" part of the coercion.

Cheers,
Nick.

P.S. Thinking back on the history of the changes though, it may be
worth revisiting the idea of "en_US.UTF-8" as a potential coercion
locale: it was dropped as a potential coercion target back when the
PEP still set both LANG & LC_ALL, whereas it now changes only
LC_CTYPE. That means setting it won't mess with LC_COLLATE, or any of
the other locale categories. That said, I'm not sure if there are
behavioural differences between "LC_CTYPE=C.UTF-8" and
"LC_CTYPE=en_US.UTF-8", so I'm inclined to leave that alone for now.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Nathaniel Smith
On Dec 7, 2017 12:49, "Eric V. Smith"  wrote:

> The reason I didn't include it (as @dataclass(slots=True)) is because it
> has to return a new class, and the rest of the dataclass features just
> modify the given class in place. I wanted to maintain that conceptual
> simplicity. But this might be a reason to abandon that. For what it's
> worth, attrs does have an @attr.s(slots=True) that returns a new class with
> __slots__ set.


They actually switched to always returning a new class, regardless of
whether slots is set:

https://github.com/python-attrs/attrs/pull/260

You'd have to ask Hynek to get the full rationale, but I believe it was
both for consistency with slot classes, and for consistency with regular
class definition. For example, type.__new__ actually does different things
depending on whether it sees an __eq__ method, so adding a method after the
fact led to weird bugs with hashing. That class of bug goes away if you
always set up the autogenerated methods and then call type.__new__.
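A minimal sketch of the hashing pitfall described above (the class and function names are hypothetical). `type.__new__` sets `__hash__` to `None` when the class namespace defines `__eq__` without `__hash__`; assigning `__eq__` after the class already exists skips that step, so the class silently keeps the identity-based `object.__hash__`.

```python
class InBody:
    def __init__(self, x):
        self.x = x

    # __eq__ is in the namespace seen by type.__new__, so __hash__ becomes None.
    def __eq__(self, other):
        return isinstance(other, InBody) and self.x == other.x


class Patched:
    def __init__(self, x):
        self.x = x


def _patched_eq(self, other):
    return isinstance(other, Patched) and self.x == other.x


# Added after type.__new__ has run: __hash__ is NOT reset to None.
Patched.__eq__ = _patched_eq


def _built_init(self, x):
    self.x = x


def _built_eq(self, other):
    return type(other) is type(self) and self.x == other.x


# Setting up the methods first and then calling type() behaves like InBody.
Built = type("Built", (), {"__init__": _built_init, "__eq__": _built_eq})

print(InBody.__hash__)                       # None
print(Patched.__hash__ is object.__hash__)   # True
print(len({Patched(1), Patched(1)}))         # 2: equal objects, different hashes
print(Built.__hash__)                        # None
```

The `Patched` case is the "weird bugs with hashing" scenario: two instances compare equal, yet a set or dict treats them as distinct keys.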

 -n


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Guido van Rossum
On Fri, Dec 8, 2017 at 3:44 PM, Eric V. Smith  wrote:

> On 12/8/2017 1:28 PM, Raymond Hettinger wrote:
>
>>
>>
>> On Dec 7, 2017, at 12:47 PM, Eric V. Smith  wrote:
>>>
>>> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
>>> ...
>>>
>>>> I'm looking for guidance or workarounds for two issues that have arisen.
>>>>
>>>> First, the use of default values seems to completely preclude the use
>>>> of __slots__.  For example, this raises a ValueError:
>>>>
>>>>     class A:
>>>>         __slots__ = ['x', 'y']
>>>>         x: int = 10
>>>>         y: int = 20
>>>>
>>>
>>> Hmm, I wasn't aware of that. I'm not sure I understand why that's an
>>> error. Maybe it could be fixed?
>>>
>>
>> The way __slots__ works is that the type() metaclass automatically
>> assigns member-objects to the class variables 'x' and 'y'.  Member objects
>> are descriptors that do the actual lookup.
>>
>> So, I don't think the language limitation can be "fixed".  Essentially,
>> we're wanting to use the class variables 'x' and 'y' to hold both member
>> objects and a default value.
>>
>
> Thanks. I figured this out after doing some research. Here's a thread
> "__slots__ and default values" from 14+ years ago from some guy named
> Hettinger:
> https://mail.python.org/pipermail/python-dev/2003-May/035575.html
>
> As to whether we add slots=True to @dataclass, I'll let Guido decide.
>
> The code already exists as a separate decorator here:
> https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3,
> if you want to play with it.
>
> Usage:
>
> >>> @add_slots
> ... @dataclass
> ... class A:
> ...     x: int = 10
> ...     y: int = 20
> ...
> >>> a = A()
> >>> a
> A(x=10, y=20)
> >>> a.x = 15
> >>> a
> A(x=15, y=20)
> >>> a.z = 30
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'A' object has no attribute 'z'
>
> Folding it in to @dataclass is easy enough. On the other hand, since it
> just uses the dataclasses public API, it's not strictly required to be in
> @dataclass.
>

Let's do it. For most people the new class is an uninteresting
implementation detail; for the rest we can document clearly that it is
special.


>>>> The second issue is that the different annotations give different
>>>> signatures than would be produced for manually written classes.  It is
>>>> unclear what the best practice is for where to put the annotations and
>>>> their associated docstrings.

>>>
>>> I don't have any suggestions here.
>>>
>>
>> I'm hoping the typing experts will chime in here.  The question is
>> straight-forward.  Where should we look for the signature and docstring for
>> constructing instances?  Should they be attached to the class, to
>> __init__(), or to __new__() when it is used?
>>
>> It would be nice to have an official position on that before it gets set
>> in stone through arbitrary choices made by pycharm, pydoc, mypy,
>> typing.NamedTuple, and dataclasses.dataclass.
>>
>
> I'm not sure I see why this would relate specifically to typing, since I
> don't think they'd inspect docstrings. But yes, it would be good to come to
> an agreement.
>

I don't recall in detail what all these tools and classes do with
docstrings. Maybe if someone summarizes the status quo and explains how PEP
557 changes that it will be simple to decide.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] iso8601 parsing

2017-12-08 Thread Chris Barker - NOAA Federal
On Dec 7, 2017, at 7:52 PM, Mike Miller  wrote:

> Guess the argument for limiting what it accepts would be that every funky
> variation will need to be supported until the endtimes, even those of
> little use or utility.


I suppose so, but that's not that hard once it's implemented and tests are in place.

How about this for a “practicality beats purity” approach:

.fromiso() will parse the most commonly used ISO 8601-compliant datetime
strings.

It is guaranteed to properly parse the output of .isoformat().

It is NOT a validator: it may accept non-ISO-compliant strings, and may
give surprising results when passed such strings.


In any case, I sure hope it will accept iso strings both with and without
the “T”.
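For reference, this idea shipped in Python 3.7 as `datetime.fromisoformat()`, documented at the time as parsing the formats emitted by `isoformat()`. A small sketch (the sample timestamp is arbitrary) showing the round-trip guarantee, including the variant with a space instead of the "T":

```python
from datetime import datetime

dt = datetime(2017, 12, 8, 13, 30, 5, 123456)

with_t = dt.isoformat()         # '2017-12-08T13:30:05.123456'
with_space = dt.isoformat(" ")  # '2017-12-08 13:30:05.123456'

# Both separators round-trip back to the same datetime.
assert datetime.fromisoformat(with_t) == dt
assert datetime.fromisoformat(with_space) == dt
```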

But again: Paul, do whatever you think is best.

-CHB

> On the other hand, it might be good to keep the two implementations the
> same for consistency reasons.
>
> Thanks either way,
> -Mike
>
>
> On 2017-12-07 17:57, Chris Barker - NOAA Federal wrote:


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Eric V. Smith

On 12/8/2017 1:28 PM, Raymond Hettinger wrote:
>
> On Dec 7, 2017, at 12:47 PM, Eric V. Smith  wrote:
>>
>> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
>> ...
>>
>>> I'm looking for guidance or workarounds for two issues that have arisen.
>>>
>>> First, the use of default values seems to completely preclude the use of
>>> __slots__.  For example, this raises a ValueError:
>>>
>>>     class A:
>>>         __slots__ = ['x', 'y']
>>>         x: int = 10
>>>         y: int = 20
>>
>> Hmm, I wasn't aware of that. I'm not sure I understand why that's an error.
>> Maybe it could be fixed?
>
> The way __slots__ works is that the type() metaclass automatically assigns
> member-objects to the class variables 'x' and 'y'.  Member objects are
> descriptors that do the actual lookup.
>
> So, I don't think the language limitation can be "fixed".  Essentially,
> we're wanting to use the class variables 'x' and 'y' to hold both member
> objects and a default value.


Thanks. I figured this out after doing some research. Here's a thread 
"__slots__ and default values" from 14+ years ago from some guy named 
Hettinger:

https://mail.python.org/pipermail/python-dev/2003-May/035575.html

As to whether we add slots=True to @dataclass, I'll let Guido decide.

The code already exists as a separate decorator here: 
https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3, 
if you want to play with it.


Usage:

>>> @add_slots
... @dataclass
... class A:
...     x: int = 10
...     y: int = 20
...
>>> a = A()
>>> a
A(x=10, y=20)
>>> a.x = 15
>>> a
A(x=15, y=20)
>>> a.z = 30
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'z'

Folding it in to @dataclass is easy enough. On the other hand, since it 
just uses the dataclasses public API, it's not strictly required to be 
in @dataclass.
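Eric's actual decorator lives at the link above and is not reproduced here. The following is a hypothetical sketch of how such a decorator can work using only the public dataclasses API; the name `add_slots` is reused for readability, but its internals are assumptions. It has to build a new class because `__slots__` only takes effect at class creation. (Python 3.10 later added a built-in `@dataclass(slots=True)`.)

```python
import dataclasses
from dataclasses import dataclass


def add_slots(cls):
    """Return a copy of dataclass `cls` that declares __slots__.

    Hypothetical sketch, not the dataclass_tools.py implementation.
    """
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already specifies __slots__")
    namespace = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    namespace["__slots__"] = field_names
    # Drop class-level defaults: they would clash with the slot descriptors.
    # The __init__ generated by @dataclass already holds the default values.
    for name in field_names:
        namespace.pop(name, None)
    # These descriptors belong to the old class and must not be copied over.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    # __slots__ only takes effect at class creation, hence a brand-new class.
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@add_slots
@dataclass
class A:
    x: int = 10
    y: int = 20


a = A()
a.x = 15
print(a)  # A(x=15, y=20)
```

Defaults keep working because they live in the generated `__init__`, not in class attributes, which is why removing the class attributes is safe here.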



>>> The second issue is that the different annotations give different
>>> signatures than would be produced for manually written classes.  It is
>>> unclear what the best practice is for where to put the annotations and
>>> their associated docstrings.
>>
>> I don't have any suggestions here.
>
> I'm hoping the typing experts will chime in here.  The question is
> straight-forward.  Where should we look for the signature and docstring for
> constructing instances?  Should they be attached to the class, to __init__(),
> or to __new__() when it is used?
>
> It would be nice to have an official position on that before it gets set in
> stone through arbitrary choices made by pycharm, pydoc, mypy,
> typing.NamedTuple, and dataclasses.dataclass.


I'm not sure I see why this would relate specifically to typing, since I 
don't think they'd inspect docstrings. But yes, it would be good to come 
to an agreement.


Eric.


[Python-Dev] Proposed schedule for next 3.4 and 3.5 releases - end of January / early February

2017-12-08 Thread Larry Hastings



Howdy howdy.  I know nobody's excited by the prospect of 3.4 and 3.5 
releases--I mean, fer gosh sakes, neither of those versions even has 
f-strings!   But we're about due.  I prefer to release roughly every six 
months, and the current releases came out in early August.


Here's my proposed schedule:

   Sun Jan 21 2018 - release 3.4.8rc1 and 3.5.5rc1
   Sun Feb 04 2018 - release 3.4.8 final and 3.5.5 final

Unless I'm presented with good reasons to change it, that'll be the 
schedule.  I'll update the PEPs with the final release dates in about a 
week.


Just for fun, I'll remind everybody here that 3.4 and 3.5 are both in 
security-fixes-only mode.  This means two things:


1. These will be source-code-only releases; the Python core dev team
   won't release any more binary installers for 3.4 or 3.5.
2. I'm the only person permitted to accept PRs for 3.4 and 3.5. If you
   have security fixes for either of those versions, please add me as a
   reviewer.


Happy holidays to you and yours,


//arry/


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Raymond Hettinger


> On Dec 7, 2017, at 12:47 PM, Eric V. Smith  wrote:
> 
> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
> ...
> 
>> I'm looking for guidance or workarounds for two issues that have arisen.
>> 
>> First, the use of default values seems to completely preclude the use of 
>> __slots__.  For example, this raises a ValueError:
>> 
>>     class A:
>>         __slots__ = ['x', 'y']
>>         x: int = 10
>>         y: int = 20
> 
> Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. 
> Maybe it could be fixed?

The way __slots__ works is that the type() metaclass automatically assigns 
member-objects to the class variables 'x' and 'y'.  Member objects are 
descriptors that do the actual lookup.

So, I don't think the language limitation can be "fixed".  Essentially, we're 
wanting to use the class variables 'x' and 'y' to hold both member objects and 
a default value.
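A minimal reproduction of the conflict described above (class names are illustrative). The exact error text is CPython's and may vary between versions, so only its key phrase should be relied on.

```python
# A class-level default for a name listed in __slots__ is rejected when the
# class is created: type() would have to store both the default value and the
# slot's member descriptor under the same name.
conflict_message = None
try:
    class A:
        __slots__ = ["x", "y"]
        x: int = 10
        y: int = 20
except ValueError as exc:
    conflict_message = str(exc)

print(conflict_message)  # CPython: 'x' in __slots__ conflicts with class variable


# Without defaults, the bare annotations create no class variables, so type()
# is free to install a member descriptor under each slot name.
class B:
    __slots__ = ["x", "y"]
    x: int
    y: int


print(type(B.__dict__["x"]).__name__)  # member_descriptor
b = B()
b.x = 1  # stored in the slot through the descriptor
```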

> This doesn't help the general case (your class A), but it does at least solve 
> it for dataclasses. Whether it should be actually included, and what the 
> interface would look like, can be (and I'm sure will be!) argued.
> 
> The reason I didn't include it (as @dataclass(slots=True)) is because it has 
> to return a new class, and the rest of the dataclass features just modify
> the given class in place. I wanted to maintain that conceptual simplicity.
> But this might be a reason to abandon that. For what it's worth, attrs does 
> have an @attr.s(slots=True) that returns a new class with __slots__ set.

I recommend that you follow the path taken by attrs and return a new class.   
Otherwise, we're leaving users with a devil's choice.  You can have default 
values or you can have slots, but you can't have both.

The slots are pretty important.  With slots, a three attribute instance is only 
64 bytes.  Without slots, it is 296 bytes.
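A rough way to compare the two layouts (class and helper names are hypothetical). The 64 and 296 byte figures come from the message itself; actual numbers depend on the Python version and platform, and CPython has since changed how instance dicts are stored, so no output is shown.

```python
import sys


class WithDict:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3


class WithSlots:
    __slots__ = ("a", "b", "c")

    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3


def approx_size(obj):
    """Instance size plus its __dict__, if it has one (approximate)."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


print(approx_size(WithDict()), approx_size(WithSlots()))
```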

> 
>> The second issue is that the different annotations give different signatures 
>> than would be produced for manually written classes.  It is unclear what the
>> best practice is for where to put the annotations and their associated 
>> docstrings.
> 
> I don't have any suggestions here.

I'm hoping the typing experts will chime in here.  The question is 
straight-forward.  Where should we look for the signature and docstring for 
constructing instances?  Should they be attached to the class, to __init__(), 
or to __new__() when it is used?

It would be nice to have an official position on that before it gets set in
stone through arbitrary choices made by pycharm, pydoc, mypy, 
typing.NamedTuple, and dataclasses.dataclass.
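To ground the question, this sketch shows where current tooling looks (class names are illustrative): `inspect.signature()` on a dataclass reports the parameters of the generated `__init__`, and when a dataclass has no docstring, `@dataclass` synthesizes one from that signature. It documents existing behavior only, not an official position.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Point:
    """A point with default coordinates."""
    x: int = 10
    y: int = 20


@dataclass
class NoDoc:
    x: int = 10


sig = inspect.signature(Point)       # derived from the generated __init__
print(list(sig.parameters))          # ['x', 'y']
print(sig.parameters["x"].default)   # 10
print(Point.__doc__)                 # the hand-written class docstring is kept
print(NoDoc.__doc__)                 # synthesized, e.g. "NoDoc(x: int = 10)"
```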


Raymond






Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Guido van Rossum
Yes, I think this is a reasonable argument for adding a 'slots' option (off
by default) for @dataclass(). However I don't think we need to rush it in.
I'm not very happy with the general idea of slots any more, and I think
that it's probably being overused, and at the same time I expect that there
are a lot of classes with a slots declaration that still have a dict as
well, because they inherit from a class without slots.

I'm not sure what to do about docstrings -- I'm not a big user of pydoc and
I find help() often too verbose (I usually read the source). Maybe we could
add a 'doc' option to field()? That's similar to what we offer for
property().
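The `property()` precedent takes a `doc` argument. For fields, this sketch approximates the idea with the existing `metadata` mapping accepted by `field()`; the `"doc"` key and the `field_docs()` helper are hypothetical conventions, not a dataclasses feature.

```python
from dataclasses import dataclass, field, fields


class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    def _get_celsius(self):
        return self._celsius

    # property() takes a doc argument that becomes the property's __doc__.
    celsius = property(_get_celsius, doc="Temperature in degrees Celsius.")


@dataclass
class Reading:
    # Hypothetical convention: keep per-field docs under a "doc" metadata key.
    value: float = field(default=0.0, metadata={"doc": "Measured value."})
    unit: str = field(default="C", metadata={"doc": "Unit of the value."})


def field_docs(cls):
    """Hypothetical helper: map each field name to its "doc" metadata."""
    return {f.name: f.metadata.get("doc") for f in fields(cls)}


print(Temperature.celsius.__doc__)  # Temperature in degrees Celsius.
print(field_docs(Reading))  # {'value': 'Measured value.', 'unit': 'Unit of the value.'}
```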

On Thu, Dec 7, 2017 at 12:47 PM, Eric V. Smith  wrote:

> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
> ...
>
>> I'm looking for guidance or workarounds for two issues that have arisen.
>>
>> First, the use of default values seems to completely preclude the use of
>> __slots__.  For example, this raises a ValueError:
>>
>>     class A:
>>         __slots__ = ['x', 'y']
>>         x: int = 10
>>         y: int = 20
>>
>
> Hmm, I wasn't aware of that. I'm not sure I understand why that's an
> error. Maybe it could be fixed?
>
> Otherwise, I have a decorator that takes a dataclass and returns a new
> class with slots set:
>
> >>> from dataclasses import dataclass
> >>> from dataclass_tools import add_slots
> >>> @add_slots
> ... @dataclass
> ... class C:
> ...   x: int = 0
> ...   y: int = 0
> ...
> >>> c = C()
> >>> c
> C(x=0, y=0)
> >>> c.z = 3
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'C' object has no attribute 'z'
>
> This doesn't help the general case (your class A), but it does at least
> solve it for dataclasses. Whether it should be actually included, and what
> the interface would look like, can be (and I'm sure will be!) argued.
>
> The reason I didn't include it (as @dataclass(slots=True)) is because it
> has to return a new class, and the rest of the dataclass features just
> modify the given class in place. I wanted to maintain that conceptual
> simplicity. But this might be a reason to abandon that. For what it's
> worth, attrs does have an @attr.s(slots=True) that returns a new class with
> __slots__ set.
>
>> The second issue is that the different annotations give different
>> signatures than would be produced for manually written classes.  It is
>> unclear what the best practice is for where to put the annotations and
>> their associated docstrings.
>>
>
> I don't have any suggestions here.
>
> Eric.
>



-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] Summary of Python tracker Issues

2017-12-08 Thread Python tracker

ACTIVITY SUMMARY (2017-12-01 - 2017-12-08)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    6315 (+34)
  closed 37691 (+26)
  total  44006 (+60)

Open issues with patches: 2434 


Issues opened (49)
==

#20891: PyGILState_Ensure on non-Python thread causes fatal error
https://bugs.python.org/issue20891  reopened by vstinner

#30213: ZipFile from 'a'ppend-mode file generates invalid zip
https://bugs.python.org/issue30213  reopened by serhiy.storchaka

#32107: Improve MAC address calculation and fix test_uuid.py
https://bugs.python.org/issue32107  reopened by xdegaye

#32196: Rewrite plistlib with functional style
https://bugs.python.org/issue32196  opened by serhiy.storchaka

#32198: \b reports false-positives in Indic strings involving combinin
https://bugs.python.org/issue32198  opened by jamadagni

#32202: [ctypes] all long double tests fail on android-24-x86_64
https://bugs.python.org/issue32202  opened by xdegaye

#32203: [ctypes] test_struct_by_value fails on android-24-arm64
https://bugs.python.org/issue32203  opened by xdegaye

#32206: Run modules with pdb
https://bugs.python.org/issue32206  opened by mariocj89

#32208: Improve semaphore documentation
https://bugs.python.org/issue32208  opened by Garrett Berg

#32209: Crash in set_traverse Within the Garbage Collector's collect_g
https://bugs.python.org/issue32209  opened by connorwfitzgerald

#32210: Add platform.android_ver()  to test.pythoninfo for Android pla
https://bugs.python.org/issue32210  opened by xdegaye

#32211: Document the bug in re.findall() and re.finditer() in 2.7 and 
https://bugs.python.org/issue32211  opened by serhiy.storchaka

#32212: few discrepancy between source and docs in logging
https://bugs.python.org/issue32212  opened by Michal Plichta

#32215: sqlite3 400x-600x slower depending on formatting of an UPDATE 
https://bugs.python.org/issue32215  opened by bforst

#32216: Document PEP 557 Data Classes
https://bugs.python.org/issue32216  opened by eric.smith

#32217: freeze.py fails to work.
https://bugs.python.org/issue32217  opened by Decorater

#32218: add __iter__ to enum.Flag members
https://bugs.python.org/issue32218  opened by Guy Gangemi

#32219: SSLWantWriteError being raised by blocking SSL socket
https://bugs.python.org/issue32219  opened by njs

#32220: multiprocessing: passing file descriptor using reduction break
https://bugs.python.org/issue32220  opened by frickenate

#32221: Converting ipv6 address to string representation using getname
https://bugs.python.org/issue32221  opened by socketpair

#3: pygettext doesn't extract docstrings for functions with type a
https://bugs.python.org/issue3  opened by Tobotimus

#32223: distutils doesn't correctly read UTF-8 content from config fil
https://bugs.python.org/issue32223  opened by delivrance

#32224: socket.create_connection needs to support full IPv6 argument
https://bugs.python.org/issue32224  opened by Matthew Stoltenberg

#32225: Implement PEP 562: module __getattr__ and __dir__
https://bugs.python.org/issue32225  opened by levkivskyi

#32226: Implement PEP 560: Core support for typing module and generic 
https://bugs.python.org/issue32226  opened by levkivskyi

#32227: singledispatch support for type annotations
https://bugs.python.org/issue32227  opened by lukasz.langa

#32228: truncate() changes current stream position
https://bugs.python.org/issue32228  opened by andreymal

#32229: Simplify hiding developer warnings in user facing applications
https://bugs.python.org/issue32229  opened by ncoghlan

#32230: -X dev doesn't set sys.warnoptions
https://bugs.python.org/issue32230  opened by ncoghlan

#32231: -bb option should override -W options
https://bugs.python.org/issue32231  opened by ncoghlan

#32232: building extensions as builtins is broken in 3.7
https://bugs.python.org/issue32232  opened by doko

#32234: Add context management to mailbox.Mailbox
https://bugs.python.org/issue32234  opened by sblondon

#32235: test_xml_etree test_xml_etree_c failures with 2.7 and 3.6 bran
https://bugs.python.org/issue32235  opened by doko

#32236: open() shouldn't silently ignore buffering=1 in binary mode
https://bugs.python.org/issue32236  opened by izbyshev

#32237: test_xml_etree leaked [1, 1, 1] references, sum=3
https://bugs.python.org/issue32237  opened by vstinner

#32238: Handle "POSIX" in the legacy locale detection
https://bugs.python.org/issue32238  opened by ncoghlan

#32240: Add the const qualifier for PyObject* array arguments
https://bugs.python.org/issue32240  opened by serhiy.storchaka

#32241: Add the const qualifier for char and wchar_t pointers to unmod
https://bugs.python.org/issue32241  opened by serhiy.storchaka

#32243: Tests that set aggressive switch interval hang in Cygwin on a 
https://bugs.python.org/issue32243  opened by erik.bray

#32244: Multiprocessing: 

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
2017-12-08 17:29 GMT+01:00 Ethan Furman :
> For those of us trying to follow along, is this change to open() one that
> Inada-san was worried about?  Has something else changed?

I agree that my PEP is evolving quickly, that's why I added a "Version
History" at the end:
https://www.python.org/dev/peps/pep-0540/#version-history

"""
Version History
===

* Version 4: ``locale.getpreferredencoding()`` now returns ``'UTF-8'``
  in the UTF-8 Mode.
* Version 3: The UTF-8 Mode does not change the ``open()`` default error
  handler (``strict``) anymore, and the Strict UTF-8 Mode has been
  removed.
* Version 2: Rewrite the PEP from scratch to make it much shorter and
  easier to understand.
* Version 1: First version posted to python-dev.
"""

Naoki disliked the usage of the surrogateescape error handler for
open(). I "fixed" this in version 3 of the PEP: the open() error handler
is not modified by the PEP.

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Ethan Furman

There were some concerns about open() earlier:

On Wed, 6 Dec 2017 at 06:10 INADA Naoki wrote:
> I think PEP 538 and PEP 540 should behave almost identical except
> changing locale or not.  So I need very strong reason if PEP 540
> changes default error handler of open().

Brett replied:
> I don't have enough locale experience to weigh in as an expert,
> but I already was leaning towards INADA-san's logic of not wanting
> to change open() and this makes me really not want to change it.

On 12/08/2017 07:22 AM, Victor Stinner wrote:

> """
> Effects of the UTF-8 Mode:
>
> [...]
>
> Side effects:
>
> * ``open()`` uses the UTF-8 encoding by default.


For those of us trying to follow along, is this change to open() one that Inada-san was worried about?  Has something 
else changed?


--
~Ethan~


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
2017-12-08 16:22 GMT+01:00 Victor Stinner :
> I updated my PEP: in the 4th version, locale.getpreferredencoding()
> now returns 'UTF-8' in the UTF-8 Mode.

Sorry, I forgot to mention that I already updated the implementation
to the latest version of the PEP:
https://github.com/python/cpython/pull/855

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
I updated my PEP: in the 4th version, locale.getpreferredencoding()
now returns 'UTF-8' in the UTF-8 Mode.

https://www.python.org/dev/peps/pep-0540/

I also clarified the direct effects of the UTF-8 Mode, and listed
the most user-visible changes as "Side effects".

"""
Effects of the UTF-8 Mode:

* ``sys.getfilesystemencoding()`` returns ``'UTF-8'``.
* ``locale.getpreferredencoding()`` returns ``'UTF-8'``; its
  *do_setlocale* argument and the locale encoding are ignored.
* The ``sys.stdin`` and ``sys.stdout`` error handlers are set to
  ``surrogateescape``.

Side effects:

* ``open()`` uses the UTF-8 encoding by default.
* ``os.fsdecode()`` and ``os.fsencode()`` use the UTF-8 encoding.
* Command line arguments, environment variables and filenames use the
  UTF-8 encoding.
"""

Thank you Naoki INADA for your quick feedback; it was very helpful
and I really like how the PEP evolves!

IMHO the PEP 540 version 4 is just perfect and ready for
pronouncement! (... until someone finds another flaw, obviously!)
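To check the effects listed in the PEP excerpt empirically, this sketch starts a child interpreter with `-X utf8` (or `-X utf8=0`) and prints `sys.flags.utf8_mode`, the filesystem encoding and `locale.getpreferredencoding(False)`. It assumes Python 3.7+, where the option and flag exist; the helper name is hypothetical, and the spelling of the encoding name ("UTF-8" vs "utf-8") may differ between versions, so it is compared case-insensitively.

```python
import subprocess
import sys

CHILD_CODE = (
    "import locale, sys;"
    "print(sys.flags.utf8_mode);"
    "print(sys.getfilesystemencoding());"
    "print(locale.getpreferredencoding(False))"
)


def utf8_mode_report(enabled=True):
    """Run a child interpreter with the UTF-8 Mode forced on or off."""
    option = "utf8" if enabled else "utf8=0"
    result = subprocess.run(
        [sys.executable, "-X", option, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    # One value per line: flag, filesystem encoding, preferred encoding.
    return result.stdout.split()


def is_utf8_name(name):
    return name.lower().replace("-", "") == "utf8"


print(utf8_mode_report(True))
```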

Victor


2017-12-08 13:58 GMT+01:00 Victor Stinner :
> 2017-12-08 6:11 GMT+01:00 INADA Naoki :
>> Or should we change locale.getpreferredencoding() to return UTF-8
>> instead of ASCII always, regardless of PEP 538 and 540?
>
> On the POSIX locale, if the locale coercion works (PEP 538),
> locale.getpreferredencoding() returns UTF-8. We are good.
>
> The question is for platforms like Centos 7 where the locale coercion
> (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the
> locale probably uses ASCII (or maybe Latin1).
>
> My current implementation of the PEP 540 is cheating for open(): if
> sys.flags.utf8_mode is non-zero, use the UTF-8 encoding rather than
> calling locale.getpreferredencoding().
>
> I checked the stdlib, and I found many places where
> locale.getpreferredencoding() is used to get the user preferred
> encoding:
>
> * builtin open(): default encoding
> * cgi.FieldStorage: encode the query string
> * encodings._alias_mbcs(): check if the requested encoding is the ANSI code page
> * gettext.GNUTranslations: lgettext() and lngettext() methods
> * xml.etree.ElementTree: ElementTree.write(encoding='unicode')
>
> In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
> use the UTF-8 encoding by default. So locale.getpreferredencoding()
> should return UTF-8 if the UTF-8 mode is enabled.
>
> The private _alias_mbcs() method can be modified to call directly
> _locale._getdefaultlocale()[1] to get the ANSI code page.
>
> Question: do we need to add an option to getpreferredencoding() to
> return the locale encoding even if the UTF-8 mode is enabled. If yes,
> what should be the API? locale.getpreferredencoding(utf8_mode=False)?
>
> Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
2017-12-08 15:01 GMT+01:00 INADA Naoki :
>> In short, locale coercion and UTF-8 mode will be both enabled by the
>> POSIX locale.
>
> Hm, it is a bit surprising because I thought UTF-8 mode was a fallback
> for locale coercion when coercion failed or was disabled.

I rewrote the "differences between PEP 538 and PEP 540" section as a
new section "Relationship with the locale coercion (PEP 538)".

https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538

"""
Relationship with the locale coercion (PEP 538)
===

The POSIX locale enables the locale coercion (PEP 538) and the UTF-8
mode (PEP 540). When the locale coercion is enabled, enabling the UTF-8
mode has no (additional) effect.

Locale coercion only impacts non-Python code like C libraries, whereas
the Python UTF-8 Mode only impacts Python code: the two PEPs are
complementary.

On platforms where locale coercion is not supported, like CentOS 7, the
POSIX locale only enables the UTF-8 Mode. In this case, Python code uses
the UTF-8 encoding and ignores the locale encoding, whereas non-Python
code uses the locale encoding which is usually ASCII for the POSIX
locale.

While the UTF-8 Mode is supported on all platforms and can be enabled
with any locale, the locale coercion is not supported by all platforms
and is restricted to the POSIX locale.

The UTF-8 Mode only affects Python child processes when the
``PYTHONUTF8`` environment variable is set to ``1``, whereas the locale
coercion sets the ``LC_CTYPE`` environment variable, which affects all
child processes.

The benefit of the locale coercion approach is that it helps ensure that
encoding handling in binary extension modules and child processes is
consistent with Python's encoding handling. The upside of the UTF-8 Mode
approach is that it allows an embedding application to change the
interpreter's behaviour without having to change the process global
locale settings.
"""

I hope that it's now better explained.

In short, the two PEPs are really complementary.
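A sketch of the child-process point in the PEP text above (requires Python 3.7+; the helper name is hypothetical): the UTF-8 Mode reaches a child interpreter through the ``PYTHONUTF8`` environment variable, so setting it in the child's environment toggles ``sys.flags.utf8_mode`` there.

```python
import os
import subprocess
import sys


def child_utf8_flag(pythonutf8):
    """Return sys.flags.utf8_mode as seen by a child started with
    PYTHONUTF8 set to the given string value."""
    env = dict(os.environ, PYTHONUTF8=pythonutf8)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return int(out)


print(child_utf8_flag("1"), child_utf8_flag("0"))  # 1 0
```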

> As in PEP 538 [1], all coercion target locales use surrogateescape
> for stdin and stdout.
> So, do you mean "UTF-8 mode enabled as flag level, but it has no
> real effects"?

Right, and it was a deliberate choice by Nick Coghlan when he designed
PEP 538, to make sure that the two PEPs are complementary and
"compatible".

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread INADA Naoki
On Fri, Dec 8, 2017 at 7:22 PM, Victor Stinner  wrote:
>>
>> Both PEP 538 (locale coercion) and PEP 540 (UTF-8 mode) share the
>> same logic to detect the POSIX locale.
>>
>> When the POSIX locale is detected, locale coercion is tried first. And if
>> locale coercion succeeds, UTF-8 mode is not used because the locale is
>> not POSIX anymore.
>
> No, I would like to enable the UTF-8 mode as well in this case.
>
> In short, locale coercion and UTF-8 mode will be both enabled by the
> POSIX locale.
>

Hm, that is a bit surprising, because I thought UTF-8 mode was a
fallback for locale coercion when coercion fails or is disabled.

As PEP 538 [1] states, all coercion target locales use surrogateescape
for stdin and stdout.
So, do you mean "UTF-8 mode is enabled at the flag level, but it has no
real effect"?

[1]: 
https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams

Since coercion target locales and UTF-8 mode do the same thing,
I don't think this is a big issue.
But I would like it to be clarified in the PEP.

Regards,
---
INADA Naoki  


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
2017-12-08 6:11 GMT+01:00 INADA Naoki :
> Or should we change locale.getpreferredencoding() to always return
> UTF-8 instead of ASCII, regardless of PEP 538 and 540?

On the POSIX locale, if the locale coercion works (PEP 538),
locale.getpreferredencoding() returns UTF-8. We are good.

The open question is about platforms like CentOS 7, where locale
coercion (PEP 538) doesn't work, so Python uses UTF-8 (PEP 540) whereas
the locale probably uses ASCII (or maybe Latin-1).

My current implementation of PEP 540 cheats for open(): if
sys.flags.utf8_mode is non-zero, it uses the UTF-8 encoding rather than
calling locale.getpreferredencoding().
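The open() shortcut described above can be sketched as follows. This is
not the actual CPython code, only the decision it makes; the function
name is hypothetical, and getattr() is there so the sketch also runs on
interpreters older than 3.7, which have no sys.flags.utf8_mode.

```python
import locale
import sys


def default_text_encoding():
    """Hypothetical sketch of how open() picks its default encoding."""
    # UTF-8 Mode: ignore the locale entirely.
    if getattr(sys.flags, "utf8_mode", 0):
        return "UTF-8"
    # Otherwise defer to the locale, without calling setlocale().
    return locale.getpreferredencoding(False)
```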

I checked the stdlib, and I found many places where
locale.getpreferredencoding() is used to get the user preferred
encoding:

* builtin open(): default encoding
* cgi.FieldStorage: encode the query string
* encodings._alias_mbcs(): check if the requested encoding is the ANSI code page
* gettext.GNUTranslations: lgettext() and lngettext() methods
* xml.etree.ElementTree: ElementTree.write(encoding='unicode')

In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
use the UTF-8 encoding by default. So locale.getpreferredencoding()
should return UTF-8 if the UTF-8 mode is enabled.
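Released Python 3.7 and later behave this way: in UTF-8 Mode,
locale.getpreferredencoding() reports UTF-8, and so does the default
encoding of open(). A minimal sketch to check it from the outside,
assuming Python 3.7 or later; the helper name is hypothetical. Names are
normalized with codecs.lookup() so the check does not depend on the
exact spelling ("UTF-8" vs "utf-8").

```python
import codecs
import subprocess
import sys

# Child code: print the preferred encoding and the default encoding of open().
CHILD_CODE = (
    "import locale, os\n"
    "with open(os.devnull, 'w') as f:\n"
    "    print(locale.getpreferredencoding(False), f.encoding)\n"
)


def encodings_in_utf8_mode():
    """Hypothetical helper: run a child with -X utf8, return normalized names."""
    out = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    ).stdout
    # Map each reported name to the codec registry's canonical name.
    return [codecs.lookup(name).name for name in out.split()]
```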

The private _alias_mbcs() function can be modified to call
_locale._getdefaultlocale()[1] directly to get the ANSI code page.

Question: do we need to add an option to getpreferredencoding() to
return the locale encoding even if the UTF-8 mode is enabled? If yes,
what should the API be? locale.getpreferredencoding(utf8_mode=False)?
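A minimal sketch of what such an option could look like.
getpreferredencoding_sketch() and its utf8_mode parameter are
hypothetical, mirroring the question above rather than any real API.
locale_encoding() reads the LC_CTYPE encoding through
locale.getencoding() where it exists (added later, in Python 3.11, and
documented as ignoring the UTF-8 Mode), or nl_langinfo(CODESET) on POSIX.

```python
import locale
import sys


def locale_encoding():
    """Encoding of the current LC_CTYPE locale, ignoring UTF-8 Mode if possible."""
    if hasattr(locale, "getencoding"):  # Python 3.11+
        return locale.getencoding()
    if hasattr(locale, "nl_langinfo"):  # POSIX: ask the C library directly
        return locale.nl_langinfo(locale.CODESET)
    # Last resort: on older Windows versions this may still report UTF-8
    # when the UTF-8 Mode is enabled.
    return locale.getpreferredencoding(False)


def getpreferredencoding_sketch(utf8_mode=True):
    """Hypothetical form of the proposed getpreferredencoding(utf8_mode=...)."""
    if utf8_mode and getattr(sys.flags, "utf8_mode", 0):
        return "UTF-8"
    return locale_encoding()
```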

Victor


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-08 Thread Victor Stinner
Hi,

Oh, locale.getpreferredencoding(), that's a good question :-)

2017-12-08 6:02 GMT+01:00 INADA Naoki :
> But I want to clarify the difference/relationship between PEP 538
> and 540 further.
>
> If I understand correctly:
>
> Both PEP 538 (locale coercion) and PEP 540 (UTF-8 mode) share the
> same logic to detect the POSIX locale.
>
> When the POSIX locale is detected, locale coercion is tried first. And
> if locale coercion succeeds, UTF-8 mode is not used because the locale
> is not POSIX anymore.

No, I would like to enable the UTF-8 mode as well in this case.

In short, locale coercion and UTF-8 mode will be both enabled by the
POSIX locale.


> If locale coercion is disabled or fails, UTF-8 mode is used automatically,
> unless it is disabled explicitly.

The UTF-8 Mode (PEP 540) is always enabled if the POSIX locale is
detected. Only PYTHONUTF8=0 or -X utf8=0 disables it in this case.

Disabling locale coercion doesn't disable the UTF-8 Mode.
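A minimal sketch for checking these rules from the outside, assuming a
POSIX system and Python 3.7 or later; the helper name is hypothetical.
Setting LC_ALL=C forces the POSIX locale, and because LC_ALL is set,
PEP 538 coercion is skipped as well.

```python
import os
import subprocess
import sys


def utf8_mode_under_posix_locale(**overrides):
    """Hypothetical helper: sys.flags.utf8_mode of a child run with LC_ALL=C."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from "not configured"
    env["LC_ALL"] = "C"          # force the POSIX locale
    env.update(overrides)        # e.g. PYTHONUTF8="0" or PYTHONCOERCECLOCALE="0"
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    ).stdout
    return int(out)
```

On such a system the helper should return 1 by default, 0 with
PYTHONUTF8="0", and still 1 with PYTHONCOERCECLOCALE="0".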


> UTF-8 mode is similar to C.UTF-8 or other locale coercion target locales.
> But UTF-8 mode differs from the C.UTF-8 locale in these ways, because
> the actual locale is not changed:
>
> * Libraries using the locale (e.g. readline) work as in the POSIX locale.
>   So UTF-8 cannot be used in such libraries.

My assumption is that very few C libraries rely on the locale encoding.
The wchar_t* type is rarely used. You only get issues if Python passes
a UTF-8 encoded string to a C library which tries to decode it from
the locale encoding, and that encoding is not UTF-8. For example, with
the POSIX locale, if the locale encoding is ASCII, you can get a
decoding error if a C library tries to decode a UTF-8 encoded string
coming from Python.

But the encoding problem is not restricted to the current process. In
the "producer | consumer" model, if the producer is a Python 3.7
application using the UTF-8 mode, and so writing UTF-8 encoded text to
stdout, the consumer application may be unable to decode the UTF-8
data. Here we enter the grey area of encodings. Which applications rely
on the locale encoding? Which applications always use UTF-8? Do some
applications try UTF-8 first and fall back on the locale encoding?
(OpenSSL does that for filenames, for example, as does glib if I recall
correctly.)
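A minimal sketch of the two consumer behaviours, in Python for
illustration only (this is not OpenSSL's or glib's code). consume() and
its parameters are hypothetical; locale_encoding stands in for whatever
the consumer's locale reports.

```python
def consume(data, locale_encoding="ascii", try_utf8_first=True):
    """Decode bytes read from a pipe the way a downstream program might.

    try_utf8_first=True mimics the "UTF-8 first, locale encoding second"
    strategy; False mimics a program that trusts only its locale encoding.
    """
    if try_utf8_first:
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not valid UTF-8: fall back to the locale encoding
    # Strict decoding: may raise UnicodeDecodeError.
    return data.decode(locale_encoding)
```

With the defaults, "caf\u00e9".encode("utf-8") decodes fine; with
try_utf8_first=False and an ASCII locale, the same bytes raise
UnicodeDecodeError, which is the producer | consumer failure described
above.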

Until we know exactly how UTF-8 is used "in the wild", I chose to make
the UTF-8 Mode an opt-in option for locales other than POSIX. I expect
a few bug reports later, which will help us adjust our encodings.

> * locale.getpreferredencoding() returns 'ASCII' instead of 'UTF-8'.  So
>   libraries depending on locale.getpreferredencoding() may raise
>   UnicodeErrors.

Right.


> Or does locale.getpreferredencoding() return UTF-8 in UTF-8 mode too?

Here is where PEP 538 plays very nicely with PEP 540. On
platforms where locale coercion is supported (Fedora, macOS,
FreeBSD, maybe other Linux distributions), with the POSIX locale,
locale.getpreferredencoding() will return UTF-8 and functions like
mbstowcs() will use the UTF-8 encoding internally.

Currently, in the implementation of my PEP 540, I chose to modify
open() to use UTF-8 if the UTF-8 mode is used, rather than using
locale.getpreferredencoding().

Victor