On Wed, Jun 4, 2014 at 7:41 AM, Paul Sokolovsky <pmis...@gmail.com> wrote:
> Hello,
> On Wed, 4 Jun 2014 03:08:57 +1000
> Chris Angelico <ros...@gmail.com> wrote:
> []
>> With that encouragement, I just cloned your repo and built it on amd64
>> Debian Wheezy. Works just fine! Except... I've just found one fairly
>> major problem with your support of Python 3.x syntax. Your str type is
>> documented as not supporting Unicode. Is that a current flaw that
>> you're planning to remove, or a design limitation? Either way, I'm a
>> bit dubious about a purported version 1 that doesn't do one of the
>> things that Py3 is especially good at - matched by very few languages
>> in its encouragement of best practice with Unicode support.
> I should start with saying that it's MicroPython what made me look at
> Python3. So for me, it already did lot of boon by getting me from under
> the rock, so now instead of "at my job, we use python 2.x" I may report
> "at my job, we don't wait when our distro will kick us in the ass, and
> add 'from __future__ import print_function' whenever we touch some
> code".

And that's a good thing :) Using Python 2.7 and starting to put in the
future directives breaks nothing, and will save you time later.

> With that in mind, I, as many others, think that forcing Unicode bloat
> upon people by default is the most controversial feature of Python3.
> The reason is that you go very long way dealing with languages of the
> people of the world by just treating strings as consisting of 8-bit
> data. I'd say, that's enough for 90% of applications. Unicode is needed
> only if one needs to deal with multiple languages *at the same time*,
> which is fairly rare (remaining 10% of apps).

Absolutely not. This is the mentality that results in web applications
that break on "funny characters", which is completely the wrong way to
look at it. The truth is, there are not many funny characters in
Unicode at all; I found these, but that's about it:


Your code should accept any valid character with equal correctness.
(Note to jmf: Correctness does not necessarily imply exact nanosecond
performance, just that the right result is reached.) These days,
Unicode *is* needed everywhere. You might think you can get away with
"8-bit data", but is that 8-bit data actually encoded Latin-1 or
UTF-8? There's a vast difference between them, and you'll hit it in
any English text with U+00A9 ©, or U+201C U+201D quotes, or any of a
large number of other common non-ASCII characters. Oh, and the three I
just mentioned happen to be in CP-1252, another common 8-bit encoding,
and a lot of people and programs don't know how to tell CP-1252 from
Latin-1 and label one as the other.
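
As a rough sketch of that difference in Python 3 (the sample string
and the variable names are made up for illustration, not taken from
any real program):

```python
# One short piece of "English" text: U+00A9, then U+201C/U+201D quotes.
text = "\u00a9 \u201cquoted\u201d"

utf8 = text.encode("utf-8")      # 15 bytes
cp1252 = text.encode("cp1252")   # 10 bytes, one per character

# Latin-1 has U+00A9 but not the curly quotes, so this encode fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Decoding as Latin-1 never raises, so a wrong label goes unnoticed:
# UTF-8 bytes read as Latin-1 come out as mojibake...
mojibake = utf8.decode("latin-1")

# ...and CP-1252 bytes read as Latin-1 turn the quotes into
# invisible C1 control characters (U+0093, U+0094).
mislabelled = cp1252.decode("latin-1")

print(latin1_ok)
print(repr(mojibake))
print(repr(mislabelled))
```

The point of the sketch is that all three byte strings are "8-bit
data", yet none of them can be read correctly without knowing which
encoding produced it.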

Unicode is needed on anything that touches the internet, which is a
*lot* more than 10% of applications. Unicode is also needed on
anything that shares files with anyone who speaks more than one
language, or uses any symbol that isn't in ASCII, or pretty much
anything beyond plain English with a restricted set of punctuation.
And even if you can guarantee that you're working only with English
and only with ASCII, you still need to be aware that ASCII text is
different "stuff" from a JPEG file, although it's possible to bury
your head in the sand over that one.
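
A minimal sketch of that text-versus-bytes distinction in Python 3
(the names are hypothetical; the four bytes are just the usual start
of a JPEG/JFIF file, standing in for a real image):

```python
ascii_text = "plain English"          # str: a sequence of characters
jpeg_header = b"\xff\xd8\xff\xe0"     # bytes: a sequence of 8-bit values

# ASCII text round-trips cleanly between the two types.
round_trip = ascii_text.encode("ascii").decode("ascii")

# The JPEG bytes are not text in any ASCII sense.
try:
    jpeg_header.decode("ascii")
    jpeg_is_text = True
except UnicodeDecodeError:
    jpeg_is_text = False

# Python 3 also refuses to silently mix the two kinds of "stuff".
try:
    ascii_text + jpeg_header
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(round_trip == ascii_text, jpeg_is_text, mixed_ok)
```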

> But generally, there's no strict roadmap for MicroPython features.
> While core of the language (parser, compiler, VM) is developed by
> Damien, many other features were already contributed by the community
> (project went open-source at the beginning of the year). So, if someone
> will want to see Unicode support up to the level of providing patches,
> it gladly will be accepted. The only thing we established is that we
> want to be able to scale down, and thus almost all features should be
> configurable.

And that's exactly what's happening right now.


