On Tue, 15 Jul 2014 23:01:25 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
> 
>> Unicode strings in Python 2 are second class entities.
> 
> I don't see that. They form a type just like, say, complex.

I didn't say they were a second class type. I choose my words carefully, 
although I guess what I was trying to get across may have been a bit 
subtle, sorry about that. But if you read on, where I explain some of the 
consequences, it should be clear: in Python 2.x, the assumption that
strings are 8-bit is deeply embedded in the compiler.


>> It's not just that people will, in general, take the lazy way and write
>> "foo" instead of u"foo" for their strings.
> 
> People live with their choices, and I don't see the consequences of that
> lazy way as very bad.

The consequences are that it is too hard to write correct text handling
code in Python 2, and too many programs assume that text=ASCII as
if it were 1965.

In the same way that a language like Python is supposed to make it hard 
for programmers (good, lazy or careless programmers) to write code with 
(say) buffer overflow bugs, so a language like Python is supposed to make 
it hard for programmers to write code that assumes that text is 8-bit. It 
is disgraceful that in 2014 there are still languages like PHP that don't
know how to handle text, and Python 3 fortunately is not one of them.
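
To make that concrete, here is a minimal Python 3 sketch of text and bytes
refusing to mix silently. The helper name try_concat is made up for this
illustration and is not part of any library.

```python
# Python 3 sketch: str (text) and bytes are distinct types.
text = "π"                    # one Unicode code point
data = text.encode("utf-8")   # the same character as two UTF-8 bytes


def try_concat(a, b):
    """Hypothetical helper: return a + b, or the name of the exception raised."""
    try:
        return a + b
    except TypeError as err:
        return type(err).__name__


print(len(text), len(data))      # lengths differ: characters vs. bytes
print(try_concat(text, "p"))     # str + str works
print(try_concat(text, b"p"))    # str + bytes is refused, not guessed at
```

The point of the sketch is only that the mistake is reported where it
happens, instead of being papered over by an implicit conversion.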


> In fact, I find the lazy use of Unicode strings at least as scary as the
> lazy use of byte strings, especially since Python 3 sneaks Unicode to
> the outer interfaces of the program (files, IPC).

I'm not entirely sure I understand what you mean by "lazy use of Unicode 
strings". And I especially don't understand what you mean by "sneak". The 
fact that strings are Unicode is *the* biggest and most obvious new 
feature of Python 3.
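
As a rough sketch of what the file boundary looks like in Python 3 (the
file name and sample text are invented for the example), the encoding is
stated explicitly when text crosses into bytes and back:

```python
import os
import tempfile

text = "naïve café π"   # sample text, chosen only for illustration

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")   # hypothetical file name

    # Text mode: str goes in, and the encoding is named explicitly.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: the same file seen as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode again: bytes are decoded back into str.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(raw)
print(round_trip == text)
```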


>> But it is that the whole Python virtual machine is based on
>> byte-strings, not Unicode strings, and u"" strings are bolted on top.
> 
> The internal implementation of the VM is free to change as long as the
> external semantics stay the same.

Which is the whole point. *They cannot*, or at least not without a level 
of effort far beyond what is reasonable for an all-volunteer effort. And 
even if they could, why bother when most developers will then ignore that 
and use "" byte strings because it saves one character typing?

The Python devs aren't slaves, they get to choose what features they work 
on and which they don't. They don't owe *anybody* any feature they don't 
want to build, or care to support, and that includes continuing the 2.x 
series. That leaves you with choices:

- You can follow the lead of the core developers and migrate to 3.x in 
your own time, when it works for you.

- You can get enough people on the PSF board, and enough trusted, core 
developers, that the old guard get pushed out and you can take over and 
set the direction of Python development.

- You can hunker down and stick with Python 2 forever, and do without 
free support after 2020.

- You can stick with Python 2 until 2020, or pay for support until 2023, 
then reconsider the decision not to migrate.

- You can fork Python and take over support of MyPython 2.7.

- Or you can port your code to another language.

Perhaps the *stupidest* thing the author of the "Python 3 is killing 
Python" blog post wrote was that it's easier to port Python code to a 
*completely different language*. I cannot fathom the idiocy of somebody 
who bitches and moans that having to re-write or redesign, oh, let's 
conservatively say 5% of your Python 2 code is harder than writing your 
code *completely from scratch* in a completely different language, with 
completely different third party libraries.


And you can make that choice on a project-by-project basis.

As of right now, *new* projects ought to be written in Python 3.3 or 
better, unless you have a compelling reason not to. You don't have to 
port old projects in order to take advantage of Python 3 for new projects.



>> [steve@ando ~]$ python3.3 -c "π = 3.14; print(π+1)"
>> 4.140000000000001
>> [steve@ando ~]$ python2.7 -c "π = 3.14; print(π+1)"
>>   File "<string>", line 1
>>     π = 3.14; print(π+1)
>>     ^
>> SyntaxError: invalid syntax
> 
> My native language uses ä and ö, but I don't see any pressing need to
> embed those characters in identifiers.

And good for you that you don't. I mean it. But there are 7 billion
people in the world, and while they're not all Python programmers, many of
those who are would rather program in their native tongues than in English.
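
For anyone who wants to try it, a small Python 3 sketch (the variable names
are arbitrary examples, nothing more):

```python
# Python 3 only: non-ASCII letters are allowed in identifiers.
π = 3.14
päivä = "day"   # arbitrary example name using ä

print(π + 1)
print(päivä)

# str.isidentifier() reports whether a string is a valid identifier.
print("π".isidentifier(), "ö".isidentifier(), "2π".isidentifier())
```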


>> Python 2 "helpfully" tries to guess what you want when you work with
>> bytes-pretending-to-be-strings, and when it guesses right, it's nice,
>> but when it guesses wrongly, you're left with mysterious encoding and
>> decoding errors from code that doesn't appear to involve either. The
>> whole thing is a mess.
> 
> I can't think of a matching example.


[steve@ando ~]$ python2.7 -c "print u'π' + 'p'"
Ïp

Where did the Ï come from?

[steve@ando ~]$ python3.3 -c "print('π' + 'p')"
πp
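
The most likely explanation, assuming a UTF-8 terminal, is that the two
UTF-8 bytes of π were read as two separate Latin-1 characters. This Python 3
sketch reproduces the effect by hand; it is a guess at the mechanism, not
a trace of what Python 2 does internally:

```python
# Reproduce the mojibake by hand in Python 3.
pi_utf8 = "π".encode("utf-8")          # π is two bytes in UTF-8
misread = pi_utf8.decode("latin-1")    # each byte taken as its own character

print(repr(pi_utf8))
print(repr(misread + "p"))             # Ï, an invisible control character, then p
```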



-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
