date:20130721

Re: Multirelease effort: Moving to Python 3

2013-07-21 Thread Nick Coghlan

On 07/19/2013 06:50 PM, Nick Coghlan wrote:
> On 07/19/2013 01:56 PM, Andrew McNabb wrote:
>> On Thu, Jul 18, 2013 at 11:24:22AM -0400, Bohuslav Kabrda wrote:
>>>
>>> From packaging point of view, this will probably require:
>>> 1) Renaming python package to python2
>>> 2) Renaming python3 package to python
>>> 3) Switching the %{?with_python3} conditionals in specfiles to 
>>> %{?with_python2} (we will probably create a script to automate this, at 
>>> least partially)
>>
>> Renaming the python package to python2 kind of makes sense, but renaming
>> the python3 package to python seems needlessly confusing.  Wouldn't it
>> make sense to just keep python2 and python3 side by side without
>> ambiguity until some long future date when python2 disappears?
> 
> I wrote PEP 394 after Arch forced the issue (by switching the python
> symlink to Python 3),

Oops, credit where it's due: Kerrick wrote the initial version, then I
altered it quite a bit during the subsequent discussions on python-dev :)

Cheers,
Nick.

-- 
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead
Beaker Development Lead (http://beaker-project.org/)
___
python-devel mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/python-devel

Re: Multirelease effort: Moving to Python 3

2013-07-21 Thread Nick Coghlan

On 07/20/2013 06:11 AM, Toshio Kuratomi wrote:
> pythonic is a very vague statement and I wouldn't consider most of
> your list to be examples of those.  Yes, python3 may be a *better*
> language (and I would include most of your list as "features of
> python3 that python2 does not have") but a more pythonic language...
> that's not something that you can readily measure.  For instance, I
> can make the case that python3's unicode handling is less pythonic
> than python2 as it violates three rules of the zen of python:
> 
> Explicit is better than implicit. Errors should never pass
> silently. In the face of ambiguity, refuse the temptation to
> guess.
> 
> (To be fair, python2 violated some of these rules in its unicode
> handling as well, although errors should never pass silently would
> probably take some work to convince most people :-)

The *only* reason Python 3 allows any Unicode errors to pass silently
is because Python 2 tolerated broken system configurations (like
non-UTF-8 filesystem metadata on nominally UTF-8 systems) by treating
them as opaque 8-bit strings that were retrieved from OS interfaces
and then passed back unmodified (see PEP 383 for details). If Python 3
didn't work on those systems, people would blame Python 3, not the
already broken system configuration ("But Python 2 works, why is
Python 3 so broken?"). os.listdir() -> open() is the canonical example
of the kind of "round trip" activity that we felt we needed to support
even for systems with improperly encoded metadata (including file names).
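A minimal sketch of the round trip described above, using the codec machinery directly rather than a real directory listing. The byte string `raw_name` is an invented example of a file name that is not valid UTF-8; nothing here touches the filesystem.

```python
# Illustrative only: a file name as the OS might hand it back on a
# nominally UTF-8 system. The byte 0xE9 is Latin-1, not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding side: the undecodable byte is carried through as the lone
# surrogate U+DCE9 instead of raising an error.
name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding side: the lone surrogate is converted back to the original
# byte, so the name can be passed back to the OS unmodified.
round_tripped = name.encode("utf-8", errors="surrogateescape")

print(repr(name))
print(round_tripped == raw_name)
```

The string in the middle is only meaningful as something to hand straight back to an OS interface; the rest of this message is about what happens when it goes elsewhere.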

You can already force Python 3 into completely strict mode by doing:

import codecs; codecs.register_error("surrogateescape", codecs.strict_errors)

>>> b"\xe9".decode("ascii", errors="surrogateescape")
'\udce9'
>>> import codecs; codecs.register_error("surrogateescape", codecs.strict_errors)
>>> b"\xe9".decode("ascii", errors="surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

I'd eventually like Python 3 to support a string tainting model, but
who knows when I'll find time to actually work on that :(

To clarify what I mean by "string tainting" (since it's more than the
simple tainted-or-not model used in other languages), currently Python
3 unicode strings may exist in a state similar to 8-bit strings in
Python 2: they're tainted by particular encoding assumptions, so
combining them with arbitrary other pieces of "text" or encoding them
to a different output format isn't a valid operation. Unfortunately,
Python 3, like Python 2, currently allows you to combine strings
tainted by such assumptions without complaint, unless/until you try to
encode them again and then you *might* get an error, or you might just
get invalid data. It's significantly less common in Python 3 than it
was in Python 2 (as it requires that you decode data using an encoding
that doesn't actually match the data contents, whereas in Python 2 it
just required combining 8-bit data that used two different encodings),
but the problem definitely isn't solved at this point, just mitigated.
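To make the two failure modes concrete, here is a small sketch with invented sample data. It assumes nothing beyond the standard codecs: one path produces invalid data without complaint, the other fails only at encode time.

```python
data = "é".encode("utf-8")  # the two bytes 0xC3 0xA9

# Wrong assumption, no error: Latin-1 can decode any byte sequence,
# so the mismatch goes unnoticed and the text is silently wrong.
wrong = data.decode("latin-1")
silently_invalid = ("caf" + wrong).encode("utf-8")

# Wrong assumption with surrogateescape: combining works without
# complaint, and the error only appears when encoding strictly.
escaped = b"\xe9".decode("utf-8", errors="surrogateescape")
combined = "caf" + escaped
try:
    combined.encode("utf-8")
    failed_on_encode = False
except UnicodeEncodeError:
    failed_on_encode = True

print(repr(wrong), silently_invalid, failed_on_encode)
```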

Tainting would involve having the surrogateescape codec set an
attribute on a string recording the encoding assumption if it had to
embed any lone surrogates (U+DC80 to U+DCFF), as well as a keyword
only "taint" argument to decode operations (e.g. to force tainting
when using "latin-1" as a universal text codec). Various string
operations would then be modified to use the following rules:

* Both input strings untainted? Output is untainted.
* One input tainted, one untainted? Output is tainted with the same
assumption as the tainted input
* Both inputs tainted with the same assumption? Output is also tainted
with that assumption.
* Inputs tainted with different assumptions? Immediate ValueError
complaining about the taint mismatch

String encoding would be updated to trigger a ValueError when asked to
encode tainted strings to an encoding other than the tainted one.

Strings would likely gain a "remove_taint" method (name TBD), that did
the "encode using tainted encoding, redecode using correct encoding".
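The rules above can be sketched as a small wrapper class. Everything in it is hypothetical: the name `TaintedStr`, its methods, and the choice to model tainting outside `str` are assumptions made for illustration, not an existing or proposed API. The keyword-only "taint" argument for forcing a taint is left out to keep the sketch short.

```python
import codecs


class TaintedStr:
    """Hypothetical wrapper: text plus the encoding assumption that
    taints it (None means untainted)."""

    def __init__(self, text, taint=None):
        self.text = text
        self.taint = taint

    @classmethod
    def decode(cls, data, encoding):
        name = codecs.lookup(encoding).name  # normalise aliases like "UTF8"
        text = data.decode(name, errors="surrogateescape")
        # Taint only if undecodable bytes were escaped as U+DC80..U+DCFF.
        escaped = any("\udc80" <= ch <= "\udcff" for ch in text)
        return cls(text, name if escaped else None)

    def __add__(self, other):
        # Different assumptions: fail immediately rather than at encode time.
        if self.taint and other.taint and self.taint != other.taint:
            raise ValueError(f"taint mismatch: {self.taint} vs {other.taint}")
        # Otherwise the result carries whichever taint is present, if any.
        return TaintedStr(self.text + other.text, self.taint or other.taint)

    def encode(self, encoding):
        name = codecs.lookup(encoding).name
        if self.taint is None:
            return self.text.encode(name)
        if name != self.taint:
            raise ValueError(f"tainted as {self.taint}, cannot encode as {name}")
        return self.text.encode(name, errors="surrogateescape")

    def remove_taint(self, correct_encoding):
        # Encode using the tainted encoding, redecode using the correct one.
        if self.taint is None:
            return self
        raw = self.text.encode(self.taint, errors="surrogateescape")
        return TaintedStr(raw.decode(correct_encoding))


a = TaintedStr.decode(b"caf\xe9", "UTF8")   # undecodable byte: tainted
b = TaintedStr.decode(b" ok", "utf-8")      # clean: untainted
c = TaintedStr.decode(b"\xff", "ascii")     # tainted with another assumption
print((a + b).taint, a.remove_taint("latin-1").text)
```

Here `a + c` and `a.encode("latin-1")` both raise ValueError, which is the "immediate complaint" behaviour in the last rule and in the encoding rule.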

And yes, this could be used for traditional tainting as well - setting
the taint assumption to something like "" (e.g. for user
input) or "" (e.g. for not-yet-hashed user credentials)
would be enough to prevent serialisation under this scheme.

However, I have my hands full with packaging issues at the moment (see
PEP 426), and then I still want to fix the model for embedding CPython
(see PEP 432). So if it's left to me, there's no way this idea could
become reality before Python 3.5. I may at least lob it in the
direction of python-ideas, though, to see if someone else is prepared
to run with it...

Cheers,
Nick.

-- 
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead
Beaker Development Lead (http:/

Re: Multirelease effort: Moving to Python 3

2013-07-21 Thread Nick Coghlan

On 07/20/2013 07:44 AM, Toshio Kuratomi wrote:
> On Fri, Jul 19, 2013 at 05:30:36PM -0400, Matthew Miller wrote:
>> On Fri, Jul 19, 2013 at 01:45:41PM -0700, Toshio Kuratomi wrote:
>>> * python-cheetah -- Development slowed way down after they made
>>> their last release in 2010 and announced that the next release
>>> cheetah-3.0 would include python3 support.  Probably need to
>>> contact upstream about this and may need to prepare the patch
>>> to do the port.
>> 
>> What I'd _really_ like to do is get cheetah factored out of
>> cloud-init. (https://bugzilla.redhat.com/show_bug.cgi?id=974327).
>> It brings in a whole dependency chain of which python2 vs.
>> python3 is the least of the troubles.

Are Matthew's replies stuck in the python-devel moderation queue?
(Just realised the likely reason I'm only seeing half this
conversation...)

Hmm, also just realised I've been hitting Reply-List, so my replies
are only going to python-devel...

> If your needs are very minimal, python3-tempita might be a good
> choice. If you actually do need more features than that,
> python-mako and python-jinja2 are popular.  Note that both of those
> have a few deps (but hopefully not as bad as cheetah).  (Also --
> the python3 version of mako has fewer deps than the python2
> version... I think that is just because those deps haven't been
> ported to python3 yet and the package can operate with reduced
> functionality without them.  The deps for the python3 version might
> expand in the future).

Jinja2 is excellent, with very high quality error reporting - an oft
overlooked feature in a templating tool! (it's actually Armin
Ronacher's fault I started thinking about how to deal with the problem
of surrogate escaped strings escaping from their intended "retrieve
from OS API, pass straight back to OS API" box - he did an excellent
write-up of how this can go wrong after finishing the Werkzeug and
Jinja2 Python 3 updates: see
http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ and
http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/).

Cheers,
Nick.

-- 
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead
Beaker Development Lead (http://beaker-project.org/)

Re: Multirelease effort: Moving to Python 3

2013-07-21 Thread Toshio Kuratomi

On Mon, Jul 22, 2013 at 10:15:31AM +1000, Nick Coghlan wrote:
> On 07/20/2013 06:11 AM, Toshio Kuratomi wrote:
> > pythonic is a very vague statement and I wouldn't consider most of
> > your list to be examples of those.  Yes, python3 may be a *better*
> > language (and I would include most of your list as "features of
> > python3 that python2 does not have") but a more pythonic language...
> > that's not something that you can readily measure.  For instance, I
> > can make the case that python3's unicode handling is less pythonic
> > than python2 as it violates three rules of the zen of python:
> > 
> > Explicit is better than implicit. Errors should never pass
> > silently. In the face of ambiguity, refuse the temptation to
> > guess.
> > 
> > (To be fair, python2 violated some of these rules in its unicode
> > handling as well, although errors should never pass silently would
> > probably take some work to convince most people :-)
> 
> The *only* reason Python 3 allows any Unicode errors to pass silently
> is because Python 2 tolerated broken system configurations (like
> non-UTF-8 filesystem metadata on nominally UTF-8 systems) by treating
> them as opaque 8-bit strings that were retrieved from OS interfaces
> and then passed back unmodified (see PEP 383 for details). If Python 3
> didn't work on those systems, people would blame Python 3, not the
> already broken system configuration ("But Python 2 works, why is
> Python 3 so broken?"). os.listdir() -> open() is the canonical example
> of the kind of "round trip" activity that we felt we needed to support
> even for systems with improperly encoded metadata (including file names).
> 

Actually, surrogateescape is a *great* improvement over the previous python3
behaviour of silently dropping data that it did not understand.

If python3 could just finally fix outputting text with surrogateescaped
bytes then it would finally clean up the last portion of this and I would be
able to stop pointing out the various ways that python3's unicode handling
is just as broken as python2's -- just in different ways. :-)

> 
> Tainting would involve having the surrogateescape codec set an
> attribute on a string recording the encoding assumption if it had to
> embed any lone surrogates (U+DC80 to U+DCFF), as well as a keyword
> only "taint" argument to decode operations (e.g. to force tainting
> when using "latin-1" as a universal text codec). Various string
> operations would then be modified to use the following rules:
> 
> * Both input strings untainted? Output is untainted.
> * One input tainted, one untainted? Output is tainted with the same
> assumption as the tainted input
> * Both inputs tainted with the same assumption? Output is also tainted
> with that assumption.
>
This sounds like it might be nice.  The one thing I'm a little unsure about
is that it sounds like code is going to have to handle this explicitly.
Judging from the way all but a select few people handle Text vs encoded
bytes right now, that seems like it won't achieve very much.  OTOH, I could
see this as being an additional bit of information that's entirely optional
whether people use it.  I think that could be helpful in some cases of
debugging.  (OTOH, often when encoding vs text issues arise it's because the
coder and program have no way to know the correct encoding.  When that
happens, the extra information might not be that useful for the majority
of cases anyway).

> * Inputs tainted with different assumptions? Immediate ValueError
> complaining about the taint mismatch
> 
> String encoding would be updated to trigger a ValueError when asked to
> encode tainted strings to an encoding other than the tainted one.
> 

I'm a little leery of these.  The reason is that after using both python2
and the early versions of python3 I became a firm believer that the problem
with python2's unicode handling wasn't that it threw exceptions, rather the
problem was that the same bit of code was too prone to passing through
certain data without error and throwing an error with other data.
Programmers who tested their code with only ascii data or only data encoded
in their locale's encoding, or only when their locale was a utf-8 encoding
were unable to replicate or understand the errors that their users got when
they ran them in the crazy real-world environments that users inevitably
have.  These rules that throw an Exception suffer from the same reliance on
the specific data and environment and will lead to similar tracebacks that
programmers won't be able to easily replicate.
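A rough Python 3 analogue of that data-and-environment dependence, as a sketch. The helper `render` and the sample names are invented, and the `output_encoding` argument stands in for whatever the user's locale happens to be.

```python
def render(name, output_encoding):
    # Hypothetical helper: build a line of text and encode it for output.
    return ("Hello, " + name).encode(output_encoding)


encodings = ("ascii", "latin-1", "utf-8")

# What the developer tests: ASCII-only data passes in every environment.
for enc in encodings:
    render("Bob", enc)

# What users run: the same code with other data, in other environments.
outcomes = {}
for enc in encodings:
    try:
        render("Zoë 日本", enc)
        outcomes[enc] = "ok"
    except UnicodeEncodeError:
        outcomes[enc] = "error"

print(outcomes)
```

The traceback only appears for some combinations of data and encoding, which is why it is hard to reproduce from a bug report alone.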

-Toshio



Re: Multirelease effort: Moving to Python 3

2013-07-21 Thread Toshio Kuratomi

On Mon, Jul 22, 2013 at 10:28:33AM +1000, Nick Coghlan wrote:
> Are Matthew's replies stuck in the python-devel moderation queue?
> (Just realised the likely reason I'm only seeing half this
> conversation...)
> 
> Hmm, also just realised I've been hitting Reply-List, so my replies
> are only going to python-devel...
> 
Yeah, the bulk of the conversation has been hitting the
devel lists.fedoraproject.org list.  Only a few messages have gone only to
python-devel lists.fp.o  (and quite a few messages are going only to
devel lists.fp.o)

> > If your needs are very minimal, python3-tempita might be a good
> > choice. If you actually do need more features than that,
> > python-mako and python-jinja2 are popular.  Note that both of those
> > have a few deps (but hopefully not as bad as cheetah).  (Also --
> > the python3 version of mako has fewer deps than the python2
> > version... I think that is just because those deps haven't been
> > ported to python3 yet and the package can operate with reduced
> > functionality without them.  The deps for the python3 version might
> > expand in the future).
> 
> Jinja2 is excellent, with very high quality error reporting - an oft
> overlooked feature in a templating tool! (it's actually Armin
> Ronacher's fault I started thinking about how to deal with the problem
> of surrogate escaped strings escaping from their intended "retrieve
> from OS API, pass straight back to OS API" box - he did an excellent
> write-up of how this can go wrong after finishing the Werkzeug and
> Jinja2 Python 3 updates: see
> http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ and
> http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/).

  Armin Ronacher is one of the few people that I find is a reliable
ally in identifying unicode vs byte issues.  Relating a bit to your earlier
email where you were talking about throwing an exception when mixing tainted
strings and my feeling that that would be a design wart -- Armin's
unicodenazi module ( https://pypi.python.org/pypi/unicode-nazi ) is an
example of trying to make errors in python2 text+bytes handling show up when
code is run regardless of the data (it still isn't perfect as someone can
give u'string' as a parameter in testing and b'string' as a parameter in
real-life but does catch a lot of places where people are successfully
mixing u'string' and b'string' only because they are only testing with an
ascii dataset.)
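For comparison, Python 3 itself applies the "fail regardless of the data" approach to mixing text and bytes. The sketch below does not show how the unicode-nazi module works; it only illustrates the principle with an invented helper, `join`.

```python
def join(a, b):
    # Hypothetical helper that concatenates whatever it is given.
    return a + b


# Pure-ASCII data on both sides, yet Python 3 still refuses to mix
# str and bytes, so the bug shows up in the very first test run.
try:
    join("abc", b"abc")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
```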

-Toshio

