Re: Unicode [was Re: Cult-like behaviour]

2018-07-17 Thread Tim Chase
On 2018-07-17 08:37, Marko Rauhamaa wrote:
> Tim Chase :
> > Wait, but now you're talking about vendors. Much of the crux of
> > this discussion has been about personal scripts that don't need to
> > marshal Unicode strings in and out of various functions/objects.  
> 
> In both personal and professional settings, you face the same
> issues. But you don't want to build on something that will
> disappear from the Linux distros.

Right.  Distros are moving away from ASCII-only to proper Unicode
(however it is encoded) support.  Certainly wouldn't want to build on
something that's disappearing from distros, so best to build on
Py3 and Unicode strings.  ;-)

-tkc


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Tim Chase :

> On 2018-07-16 23:59, Marko Rauhamaa wrote:
>> Tim Chase :
>> > While the python world has moved its efforts into improving
>> > Python3, Python2 hasn't suddenly stopped working.  
>> 
>> The sword of Damocles is hanging on its head. Unless a consortium is
>> erected to support Python2, no vendor will be able to use it in the
>> medium term.
>
> Wait, but now you're talking about vendors. Much of the crux of this
> discussion has been about personal scripts that don't need to
> marshal Unicode strings in and out of various functions/objects.

In both personal and professional settings, you face the same issues.
But you don't want to build on something that will disappear from the
Linux distros.


Marko


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
On 2018-07-16 23:59, Marko Rauhamaa wrote:
> Tim Chase :
> > While the python world has moved its efforts into improving
> > Python3, Python2 hasn't suddenly stopped working.  
> 
> The sword of Damocles is hanging on its head. Unless a consortium is
> erected to support Python2, no vendor will be able to use it in the
> medium term.

Wait, but now you're talking about vendors. Much of the crux of this
discussion has been about personal scripts that don't need to
marshal Unicode strings in and out of various functions/objects.

If you have a py2 script that works with py2 and breaks with py3, and
you don't want to update to py3 unicode-strings-by-default, then
stick with py2.  They even coexist nicely on the same machine.

It doesn't have a self-destruct clause.  As long as py2 continues to
build, it will continue to run, which is a long lifetime.  Case in point,
I still have the "joy" of maintaining some py2.4 code that's in
production.  Would I rather upgrade it to 3.x?  You bet.  But the
powers in place are willing to forego python updates in order to not
rock the boat.

-tkc




Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 21:16, Rhodri James wrote:
> On 16/07/18 20:58, Terry Reedy wrote:
>> On 7/16/2018 1:27 PM, Jim Lee wrote:
>>> 90% of the world *is* "beneath my notice" when it comes to
>>> programming for myself.   I really don't care if that's not PC enough
>>> for you.
>>>
>>> Had you actually read my words with *intent* rather than *reaction*,
>>> you would notice that I suggested the *option* of turning off
>>> Unicode.  I didn't say get *rid* of Unicode.  I didn't say make it
>>> *harder* to use Unicode.  Once again - reaction rather than reading.
>>>
>>> Obviously, the most vocal representatives of the Python community are
>>> too sensitive about their language to enable rational discussion.
>>
>> My empirical observation is that the more abrasive posters get
>> rewarded with more response, while my attempts to engage in rational
>> discussion, without ad hominems, gets less.
>
> I wouldn't disagree with you.  Fortunately Jim has pulled the "storming
> off in a huff rather than answer a question anyone actually asked"
> defence, so we can go back to debating about important things like how
> to spell assignment expressions.
>
> Oh wait... :-)

Cheeky :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread MRAB

On 2018-07-16 21:59, Marko Rauhamaa wrote:
> Tim Chase :
>> While the python world has moved its efforts into improving Python3,
>> Python2 hasn't suddenly stopped working.
>
> The sword of Damocles is hanging on its head. Unless a consortium is
> erected to support Python2, no vendor will be able to use it in the
> medium term.
>
> Given the recent events, maybe that's exactly what's going to happen. A
> business consortium will take it on themselves to support and enhance
> Python2 ad infinitum. I wouldn't be surprised.
>
> (Although it might make me regret my knee-jerk porting effort.)

In open source, it's up to those with the itch to scratch it.

Someone finally did, and it's called Tauthon.


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 6:32 AM, Tim Chase
 wrote:
> On 2018-07-16 18:31, Steven D'Aprano wrote:
>> You say that all you want is a switch to turn off Unicode (and
>> replace it with what? Kanji strings? Cyrillic? Shift_JIS? no of
>> course not, I'm being absurd -- replace it with ASCII, what else
>> could any right-thinking person want, right?).
>
> But we already have this.  If I want to turn off Unicode strings, I
> type "python2", and if I want to enable Unicode strings, I type
> "python3".
>
> While the python world has moved its efforts into improving Python3,
> Python2 hasn't suddenly stopped working.  It just stopped receiving
> improvements.  If the "old-man shakes-fist at progress" crowd
> doesn't like Unicode strings in Py3, just keep on using Py2.  You
> (generic) won't get arrested.  There are no church^WPython police.

Except that Python 2 still supports Unicode, and Python 3 still
supports bytes. Py3 just makes a stronger distinction between text and
bytes.

>>> b"Hello, %s!" % b"world"
b'Hello, world!'
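
A minimal sketch of that distinction, assuming Python 3.5 or later (the
variable names are only for illustration): text and bytes each work on
their own, but mixing them is refused until you encode or decode
explicitly.

```python
# Python 3: str holds text, bytes holds raw octets.
text = "Hello, %s!" % "world"
data = b"Hello, %s!" % b"world"   # bytes %-formatting needs Python 3.5+

try:
    text + data                   # mixing the two types is refused
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Crossing the boundary is always an explicit encode or decode.
round_trip = data.decode("ascii") == text and text.encode("ascii") == data
print(mixed_ok, round_trip)
```

For pure ASCII data the two round-trip losslessly, which is why the
explicit step costs so little in practice.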

ChrisA


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Tim Chase :
> While the python world has moved its efforts into improving Python3,
> Python2 hasn't suddenly stopped working.

The sword of Damocles is hanging on its head. Unless a consortium is
erected to support Python2, no vendor will be able to use it in the
medium term.

Given the recent events, maybe that's exactly what's going to happen. A
business consortium will take it on themselves to support and enhance
Python2 ad infinitum. I wouldn't be surprised.

(Although it might make me regret my knee-jerk porting effort.)


Marko


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
On 2018-07-16 18:31, Steven D'Aprano wrote:
> You say that all you want is a switch to turn off Unicode (and
> replace it with what? Kanji strings? Cyrillic? Shift_JIS? no of
> course not, I'm being absurd -- replace it with ASCII, what else
> could any right-thinking person want, right?).

But we already have this.  If I want to turn off Unicode strings, I
type "python2", and if I want to enable Unicode strings, I type
"python3".

While the python world has moved its efforts into improving Python3,
Python2 hasn't suddenly stopped working.  It just stopped receiving
improvements.  If the "old-man shakes-fist at progress" crowd
doesn't like Unicode strings in Py3, just keep on using Py2.  You
(generic) won't get arrested.  There are no church^WPython police.

-tkc




Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 6:16 AM, Rhodri James  wrote:
> On 16/07/18 20:58, Terry Reedy wrote:
>>
>> On 7/16/2018 1:27 PM, Jim Lee wrote:
>>
>>> 90% of the world *is* "beneath my notice" when it comes to programming
>>> for myself.   I really don't care if that's not PC enough for you.
>>>
>>> Had you actually read my words with *intent* rather than *reaction*, you
>>> would notice that I suggested the *option* of turning off Unicode.  I didn't
>>> say get *rid* of Unicode.  I didn't say make it *harder* to use Unicode.
>>> Once again - reaction rather than reading.
>>>
>>> Obviously, the most vocal representatives of the Python community are too
>>> sensitive about their language to enable rational discussion.
>>
>>
>> My empirical observation is that the more abrasive posters get rewarded
>> with more response, while my attempts to engage in rational discussion,
>> without ad hominems, gets less.
>
>
> I wouldn't disagree with you.  Fortunately Jim has pulled the "storming off
> in a huff rather than answer a question anyone actually asked" defence, so
> we can go back to debating about important things like how to spell
> assignment expressions.
>
> Oh wait... :-)
>

+1 QOTD.

ChrisA


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 20:58, Terry Reedy wrote:
> On 7/16/2018 1:27 PM, Jim Lee wrote:
>> 90% of the world *is* "beneath my notice" when it comes to programming
>> for myself.   I really don't care if that's not PC enough for you.
>>
>> Had you actually read my words with *intent* rather than *reaction*,
>> you would notice that I suggested the *option* of turning off
>> Unicode.  I didn't say get *rid* of Unicode.  I didn't say make it
>> *harder* to use Unicode.  Once again - reaction rather than reading.
>>
>> Obviously, the most vocal representatives of the Python community are
>> too sensitive about their language to enable rational discussion.
>
> My empirical observation is that the more abrasive posters get rewarded
> with more response, while my attempts to engage in rational discussion,
> without ad hominems, gets less.

I wouldn't disagree with you.  Fortunately Jim has pulled the "storming
off in a huff rather than answer a question anyone actually asked"
defence, so we can go back to debating about important things like how
to spell assignment expressions.

Oh wait... :-)

--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Anders Wegge Keller
På Mon, 16 Jul 2018 11:33:46 -0700
Jim Lee  skrev:

> Go right ahead.  I find it surprising that Stephen isn't banned, 
> considering the fact that he ridicules anyone he doesn't agree with.  
> But I guess he's one of the 'good 'ol boys', and so exempt from the code 
> of conduct.

Well said!

-- 
//Wegge


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Terry Reedy

On 7/16/2018 1:27 PM, Jim Lee wrote:
> 90% of the world *is* "beneath my notice" when it comes to programming
> for myself.   I really don't care if that's not PC enough for you.
>
> Had you actually read my words with *intent* rather than *reaction*, you
> would notice that I suggested the *option* of turning off Unicode.  I
> didn't say get *rid* of Unicode.  I didn't say make it *harder* to use
> Unicode.  Once again - reaction rather than reading.
>
> Obviously, the most vocal representatives of the Python community are
> too sensitive about their language to enable rational discussion.

My empirical observation is that the more abrasive posters get rewarded
with more response, while my attempts to engage in rational discussion,
without ad hominems, gets less.


--
Terry Jan Reedy




Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Terry Reedy

On 7/16/2018 1:13 PM, Jim Lee wrote:
> I just think that a language should allow one to bypass Unicode handling
> easily *when it's not needed*.


Both for patching IDLE and for my currently private work, I usually only 
use Ascii, and no unicode escapes.  When I do, it does not matter 
whether editor and python internally use ascii unicode or ascii bytes. 
So I don't understand 'bypass Unicode handling'.


When I do want to use other characters, whether to test IDLE or just for 
fun, Python 3 is much nicer.  Since I have not bothered to learn any
non-English Windows Input Methods, I just use \u or, for non-BMP
chars, \U000n escapes.  I don't need a 'u' prefix or unicode(s, 
encoding=???) conversion.  Thus, I was able to expand IDLE's font sample 
of the font selection dialog tab from 40 ascii chars to this.



AaBbCcDdEeFfGgHhIiJj
1234567890#:+=(){}[]
¢£¥§©«®¶½ĞÀÁÂÃÄÅÇÐØß


ɐɕɘɞɟɤɫɮɰɷɻʁʃʆʎʞʢʫʭʯ
ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκ
БбДдЖжПпФфЧчЪъЭэѠѤѬӜ


אבגדהוזחטיךכלםמןנסעף
ابجدهوزحطي٠١٢٣٤٥٦٧٨٩


०१२३४५६७८९अआइईउऊएऐओऔ
௦௧௨௩௪௫௬௭௮௯அஇஉஎ


〇一二三四五六七八九
汉字漢字人木火土金水
가냐더려모뵤수유즈치
あいうえおアイウエオ

*You* may not care about the non-Ascii parts, but people who use other 
scripts do.



So I don't understand why you are bothered by having the option of 
easily using other characters if you want to, or if external 
circumstances were to compel you.  I love it.
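
As a small illustration of the \u and \U escapes mentioned above (the
particular characters are my own picks, not taken from the IDLE font
sample), a sketch for Python 3:

```python
# \uXXXX takes four hex digits and covers the BMP;
# \UXXXXXXXX takes eight hex digits and covers every code point.
greek = "\u0391\u03b1"   # GREEK CAPITAL LETTER ALPHA, GREEK SMALL LETTER ALPHA
snake = "\U0001F40D"     # SNAKE, a non-BMP character

# Both are ordinary str objects: no u prefix and no unicode() call needed.
lengths = (len(greek), len(snake))   # counted in code points, not bytes
print(greek, lengths)
```

A source file written this way stays pure ASCII on disk, even though the
strings it builds are not.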


--
Terry Jan Reedy




Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 18:38, Rhodri James wrote:
> Actually having an option of turning off Unicode *does* make it harder
> to use, because you end up coming across programs that have Unicode and
> surprise you when they misbehave.  And yes I saw that 90% of your
> programs aren't intended to get out into the world.  90% is never meant
> to leave the office.  90% of that does anyway.

I meant to say "90% *of my Python code* is never meant to leave the
office."  Never post when in a hurry :-(


--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee




On 07/16/18 11:31, Steven D'Aprano wrote:
> On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote:
>
>> Had you actually read my words with *intent* rather than *reaction*, you
>> would notice that I suggested the *option* of turning off Unicode.
>
> Yes, I know what you wrote, and I read it with intent.
>
> Jim, you seem to be labouring under the misapprehension that anytime
> somebody spots a flaw in your argument, or an unpleasant implication of
> your words, it can only be because they must not have read your words
> carefully. Believe me, that is not the case.
>
> YOU are the one who raised the specter of politically correct groupthink,
> not me. That's dog-whistle politics. But okay, let's move on from that.
>
> You say that all you want is a switch to turn off Unicode (and replace it
> with what? Kanji strings? Cyrillic? Shift_JIS? no of course not, I'm being
> absurd -- replace it with ASCII, what else could any right-thinking
> person want, right?). Let's look at this from a purely technical
> perspective:
>
> Python already has two string data types, bytes and text. You want
> something that is almost functionally identical to bytes, but to call it
> text, presumably because you don't want to have to prefix your strings
> with a b"" (that was also Marko's objection to byte strings).
>
> Let's say we do it. Now we have three string implementations that need to
> be added, documented, tested, maintained, instead of two.
>
> (Are you volunteering to do this work?)
>
> Now we need to double the testing: every library needs to be tested
> twice, once with the "Unicode text" switch on, once with it off, to
> ensure that features behave as expected in the appropriate mode.
>
> Is this switch a build-time option, so that we have interpreters built
> with support for Unicode and interpreters built without it? We've been
> there: it's a horribly bad idea. We used to have Python builds with
> threading support, and others without threading support. We used to have
> Python builds with "wide Unicode" and others with "narrow Unicode".
> Nothing good comes of this design.
>
> Or perhaps the switch is a runtime global option?
>
> Surely you can imagine the opportunities for bugs, both obvious crashing
> bugs and non-obvious silent failure bugs, that will occur when users run
> libraries intended for one mode under the other mode. Not every library
> is going to be fully tested under both modes.
>
> Perhaps it is a compile-time option that only affects the current module,
> like the __future__ imports. That's a bit more promising, it might even
> use the __future__ infrastructure -- but then you have the problem of
> interaction between modules that have this switch enabled and those that
> have it disabled.
>
> More complexity, more cruft, more bugs.
>
> It's not clear that your switch gives us *any* advantage at all, except
> the warm fuzzy feelings that no dirty foreign characters might creep into
> our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and
> paste code from the internet and it ends up bringing in en-dashes and
> curly quotes instead of hyphens and type-writer quotes, that never
> happens to me by accident, and I'm having a lot of trouble seeing how it
> could.
>
> If you want ASCII byte strings, you have them right now -- you just have
> to use the b"" string syntax.
>
> If you want ASCII strings without the b prefix, you have them right now.
> Just use only ASCII characters in your strings.
>
> I'm simply not seeing the advantage of:
>
>  from __future__ import no_unicode
>  print("Hello World!")  # stand in for any string handling on ASCII
>
> over
>
>  print("Hello World!")
>
> which works just as well if you control the data you are working with and
> know that it is pure ASCII.

Had you spoken this way from the start instead of ridiculing and name
calling, perhaps we could have reached an agreement.

However, the point is moot, as I have unsubscribed from the list. The
conversations here (especially yours) are too condescending to waste
more time with.





Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee



On 07/16/18 10:40, Mark Lawrence wrote:
> On 16/07/18 18:27, Jim Lee wrote:
>> Obviously, the most vocal representatives of the Python community are
>> too sensitive about their language to enable rational discussion.
>
> Please moderators ban this person as he's going down the same line as
> bartc and similar, it is completely unacceptable, he's just the latest
> in a long line of trolls.

That was completely predictable (though I expected it from a different
person).

Go right ahead.  I find it surprising that Stephen isn't banned,
considering the fact that he ridicules anyone he doesn't agree with.
But I guess he's one of the 'good 'ol boys', and so exempt from the code
of conduct.

Bye guys.



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 19:31, Steven D'Aprano wrote:
> I'm simply not seeing the advantage of:
>
>  from __future__ import no_unicode
>  print("Hello World!")  # stand in for any string handling on ASCII

Surely this should be "from __past__ import no_unicode"?

gd

--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote:

> Had you actually read my words with *intent* rather than *reaction*, you
> would notice that I suggested the *option* of turning off Unicode.

Yes, I know what you wrote, and I read it with intent.

Jim, you seem to be labouring under the misapprehension that anytime 
somebody spots a flaw in your argument, or an unpleasant implication of 
your words, it can only be because they must not have read your words 
carefully. Believe me, that is not the case.

YOU are the one who raised the specter of politically correct groupthink, 
not me. That's dog-whistle politics. But okay, let's move on from that.

You say that all you want is a switch to turn off Unicode (and replace it 
with what? Kanji strings? Cyrillic? Shift_JIS? no of course not, I'm being
absurd -- replace it with ASCII, what else could any right-thinking 
person want, right?). Let's look at this from a purely technical 
perspective:

Python already has two string data types, bytes and text. You want 
something that is almost functionally identical to bytes, but to call it 
text, presumably because you don't want to have to prefix your strings 
with a b"" (that was also Marko's objection to byte strings).

Let's say we do it. Now we have three string implementations that need to 
be added, documented, tested, maintained, instead of two.

(Are you volunteering to do this work?)

Now we need to double the testing: every library needs to be tested 
twice, once with the "Unicode text" switch on, once with it off, to 
ensure that features behave as expected in the appropriate mode.

Is this switch a build-time option, so that we have interpreters built 
with support for Unicode and interpreters built without it? We've been 
there: it's a horribly bad idea. We used to have Python builds with 
threading support, and others without threading support. We used to have 
Python builds with "wide Unicode" and others with "narrow Unicode". 
Nothing good comes of this design.

Or perhaps the switch is a runtime global option?

Surely you can imagine the opportunities for bugs, both obvious crashing 
bugs and non-obvious silent failure bugs, that will occur when users run 
libraries intended for one mode under the other mode. Not every library 
is going to be fully tested under both modes.

Perhaps it is a compile-time option that only affects the current module, 
like the __future__ imports. That's a bit more promising, it might even 
use the __future__ infrastructure -- but then you have the problem of 
interaction between modules that have this switch enabled and those that 
have it disabled.

More complexity, more cruft, more bugs.

It's not clear that your switch gives us *any* advantage at all, except 
the warm fuzzy feelings that no dirty foreign characters might creep into 
our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and 
paste code from the internet and it ends up bringing in en-dashes and 
curly quotes instead of hyphens and type-writer quotes, that never 
happens to me by accident, and I'm having a lot of trouble seeing how it 
could.

If you want ASCII byte strings, you have them right now -- you just have 
to use the b"" string syntax.

If you want ASCII strings without the b prefix, you have them right now. 
Just use only ASCII characters in your strings.

I'm simply not seeing the advantage of:

from __future__ import no_unicode
print("Hello World!")  # stand in for any string handling on ASCII

over 

print("Hello World!")

which works just as well if you control the data you are working with and 
know that it is pure ASCII.
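
A minimal sketch of that point, assuming Python 3. The helper
is_pure_ascii() is a hypothetical name introduced here for illustration,
not something from the standard library: ordinary str handling works
unchanged on ASCII-only data, and a stray pasted en-dash is easy to
detect if you ever need to check.

```python
def is_pure_ascii(s: str) -> bool:
    """Return True if every character of s is in the ASCII range."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

plain = "Hello World!"
pasted = "Hello \u2013 World!"   # contains an en-dash, as after a copy-and-paste

# No switch required: the same str methods work on both strings.
print(plain.upper(), is_pure_ascii(plain), is_pure_ascii(pasted))
```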



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 18:27, Jim Lee wrote:
> 90% of the world *is* "beneath my notice" when it comes to programming
> for myself.   I really don't care if that's not PC enough for you.
>
> Had you actually read my words with *intent* rather than *reaction*, you
> would notice that I suggested the *option* of turning off Unicode.  I
> didn't say get *rid* of Unicode.  I didn't say make it *harder* to use
> Unicode.  Once again - reaction rather than reading.

Actually having an option of turning off Unicode *does* make it harder
to use, because you end up coming across programs that have Unicode and
surprise you when they misbehave.  And yes I saw that 90% of your
programs aren't intended to get out into the world.  90% is never meant
to leave the office.  90% of that does anyway.

I still don't get why strings being Unicode is such a problem for you.
Could you explain?  I've only ever had problems with strings *not* being
Unicode, and I really don't understand what has you so hot under the collar.



--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 18:13, Jim Lee wrote:
> I just think that a language should allow one to bypass Unicode handling
> easily *when it's not needed*.

I have no idea what this is meant to mean.  I've written loads of code
for my own purposes and I've never had to think about Unicode, so why
should anybody need to bypass it?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 18:27, Jim Lee wrote:
> Obviously, the most vocal representatives of the Python community are
> too sensitive about their language to enable rational discussion.

Please moderators ban this person as he's going down the same line as
bartc and similar, it is completely unacceptable, he's just the latest
in a long line of trolls.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee



On 07/16/18 03:39, Steven D'Aprano wrote:
> Good for you.
>
> But Python is not a programming language written to satisfy the needs of
> people like you, and ONLY people like you.
>
> It is a language written to satisfy the needs of people from Uzbekistan,
> and China, and Japan, and India, and Brazil, and France, and Russia, and
> Australia, and the UK, and mathematicians, and historians, and linguists,
> and, yes, even people who think that if ISO-8859-7 was good enough for
> Jesus, the whole world ought to be using it.
>
>> When I create a one-time use program to visualize some data on a
>> graph, I don't care if anyone else can read the axis labels but me.
>> These are realities.  A good programming language will allow for these
>> realities without putting the burden on the programmer to turn *every*
>> program into a politically correct, globalization compliant model of
>> modern groupthink.
>
> And here we get to the crux of the matter. It isn't really the technical
> issues of Unicode that annoy you. It is the loss of privilege that you,
> as an ASCII user, no longer get to dismiss 90% of the world as beneath
> your notice.
>
> Nice.

90% of the world *is* "beneath my notice" when it comes to programming
for myself.   I really don't care if that's not PC enough for you.

Had you actually read my words with *intent* rather than *reaction*, you
would notice that I suggested the *option* of turning off Unicode.  I
didn't say get *rid* of Unicode.  I didn't say make it *harder* to use
Unicode.  Once again - reaction rather than reading.

Obviously, the most vocal representatives of the Python community are
too sensitive about their language to enable rational discussion.



-Jim



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee



On 07/16/18 03:26, Steven D'Aprano wrote:
> But the thing is, that complexity is *inherent in the domain*. You can
> try to deal with it without Unicode, and as soon as you have users
> expecting to use more than one code page, you're doomed.

No, I'm not doomed, because there *are* no other users.  Never will be.
I thought I made that clear.

Many programming tasks do not go beyond the machine they were written on.

You seem to think I'm trying to rid the world of Unicode - I've stressed
that I'm not.

I just think that a language should allow one to bypass Unicode handling
easily *when it's not needed*.


-Jim



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Tue, 17 Jul 2018 02:22:59 +1000, Chris Angelico wrote:

> On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence 
> wrote:
>> Out of curiosity where does my mum's Welsh come into the equation as I
>> believe that it is not recognised by the EU as a language?
>>
>>
> What characters does it use? Mostly Latin letters? 

Yes, Welsh uses the Latin script. It has an alphabet of 29 letters 
(including 8 digraphs), plus four diacritics used on some vowels:

circumflex   e.g. â

acute accent e.g. é

diaeresis    e.g. ï

grave accent e.g. ẁ

Yes, w is a vowel in Welsh -- and very occasionally in English as well.

http://www.dictionary.com/e/w-vowel/


Accented vowels are not considered separate letters.

https://en.wikipedia.org/wiki/Welsh_orthography

Some older sources will exclude J (making 28 letters). Patagonian Welsh 
also includes the letter "V", although that's non-standard.
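
To illustrate how such accented vowels are represented in Unicode, here
is a small sketch using the standard unicodedata module. The choice of w
with a circumflex is mine, picked only as an example: the letter can be
stored either as one precomposed code point or as a base letter plus a
combining mark, and normalization converts between the two.

```python
import unicodedata

# "w" followed by COMBINING CIRCUMFLEX ACCENT (U+0302): two code points.
decomposed = "w\u0302"

# NFC folds the pair into the single precomposed code point U+0175.
composed = unicodedata.normalize("NFC", decomposed)

# NFD splits it back into base letter plus combining mark.
back_again = unicodedata.normalize("NFD", composed)

print(len(decomposed), len(composed), back_again == decomposed)
```

Comparing or sorting such text is more reliable once both sides have been
normalized to the same form.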


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 17:22, Chris Angelico wrote:
> On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence  wrote:
>> On 16/07/18 15:17, Dan Sommers wrote:
>>> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>>
>>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>>
>>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>>> modern monotonic Greek.
>>>
>>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>>
>> Out of curiosity where does my mum's Welsh come into the equation as I
>> believe that it is not recognised by the EU as a language?
>
> What characters does it use? Mostly Latin letters? If so, it's easy -
> most Western European languages are covered by the basic Latin
> alphabetics (the ASCII ones), plus the combining diacriticals (U+0300
> and following), plus a small handful of language-specific characters
> (eg U+0130/U+0131 for Turkish). There are combined forms of some of
> these, which can be found via NFC normalization, and a few ligatures
> for some languages, but by and large, that's all you need for most
> Latin-derived languages.
>
> ChrisA

Frankly I haven't got the faintest idea or I wouldn't be asking.  The
only thing that I am aware of is if you try pronouncing any Welsh name
that starts with Ll, and there are lots of them, you need a huge amount
of phlegm.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 17:26, Larry Martell wrote:
> On Mon, Jul 16, 2018 at 12:05 PM, Mark Lawrence  wrote:
>> On 16/07/18 15:17, Dan Sommers wrote:
>>> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>>
>>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>>
>>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>>> modern monotonic Greek.
>>>
>>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>>
>> Out of curiosity where does my mum's Welsh come into the equation as I
>> believe that it is not recognised by the EU as a language?
>
> Is she from Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch?

She was from a small mining village near Tredegar.  The name was and is
unpronounceable to an English speaking person hence Tredegar had to suffice.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



I18N and Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 17:28:15 -0700, Jim Lee wrote:

> Unicode is an attempt to solve at least one I18N issue

If you're going to insist on digging your heels in and using definitions 
which nobody else does, this discussion is going to go nowhere fast.

Unicode is (ideally) a universal character set; in practice it is an 
industry standard for the consistent encoding, representation, and 
handling of text expressed in most of the world's writing systems. 

I18N is recognised as the abbreviation for internationalization (its 
companion, L10N, stands for localization).

https://en.wikipedia.org/wiki/Internationalization_and_localization

There is no overlap between the two: Unicode doesn't help with 
internationalization (except in the non-trivial but purely mechanical 
sense that it removes the need for metadata specifying the current code 
page), and internationalization doesn't require Unicode:

(1) Unicode provides no support for internationalization or localization. 
Just because I have the Unicode string "street" in my application, 
doesn't mean it magically transforms to "Straße" when used by German 
users.

(2) Internationalization can occur even between groups of users who share 
a single character set, even ASCII. My application might display "Rubbish 
Bin" in the UK and Australia and "Trash Can" in the USA.
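Points (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the MESSAGES catalogue, its keys, the locale codes and the translate() helper are all hypothetical, standing in for whatever message-catalogue mechanism an application really uses. The translation comes from the lookup, never from the character set.

```python
# Hypothetical message catalogue; keys and locale codes are illustrative
# assumptions, not part of any real library.
MESSAGES = {
    "en_GB": {"bin": "Rubbish Bin", "street": "street"},
    "en_AU": {"bin": "Rubbish Bin", "street": "street"},
    "en_US": {"bin": "Trash Can", "street": "street"},
    "de_DE": {"bin": "Mülleimer", "street": "Straße"},
}


def translate(key, locale, default_locale="en_GB"):
    """Look up a message for a locale, falling back to the default locale."""
    catalogue = MESSAGES.get(locale, MESSAGES[default_locale])
    return catalogue.get(key, MESSAGES[default_locale][key])


# Point (1): the string "street" only becomes "Straße" via an explicit lookup.
german_street = translate("street", "de_DE")

# Point (2): both of these are pure ASCII, yet they are localized differently.
uk_bin = translate("bin", "en_GB")
us_bin = translate("bin", "en_US")
uk_bin.encode("ascii")  # succeeds: no non-ASCII characters involved
us_bin.encode("ascii")
```

In real code the standard library's gettext module plays the role of the catalogue; the dictionary here just keeps the sketch self-contained.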



If you think that Unicode is about internationalization, you are 
labouring under serious misapprehensions about the nature of both Unicode 
and internationalization.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 17:05, Mark Lawrence wrote:
> On 16/07/18 15:17, Dan Sommers wrote:
>> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>>
>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>
>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>
>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>> modern monotonic Greek.
>>
>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>
> Out of curiosity where does my mum's Welsh come into the equation as I 
> believe that it is not recognised by the EU as a language?


Actually the EU does recognise Welsh as a language, just not as an 
official language (one that EU primary and secondary legislation is 
translated into).  It isn't an official language of the UK government 
either, just the Welsh Assembly.


As fonts and Unicode go, there's also a question of what's required to 
correctly notate modern Welsh.  Back in the late 1980s Acorn asked four 
Welsh-language scholars that question.  They got four different answers :-(


--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James

On 16/07/18 17:22, Chris Angelico wrote:
> What characters does it use? Mostly Latin letters?


Basic Latin plus U+0174 (LATIN CAPITAL LETTER W WITH CIRCUMFLEX) through 
to U+0177 (LATIN SMALL LETTER Y WITH CIRCUMFLEX) I think.
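For anyone who wants to check that range, a short sketch using the standard unicodedata module lists the four code points and shows how each one breaks down into a base letter plus a combining mark. The welsh_extras name and the decompose() helper are just illustrative; whether these four are everything modern Welsh needs is the open question above, not something the code settles.

```python
import unicodedata

# The range quoted above: U+0174 through U+0177 inclusive.
welsh_extras = [chr(cp) for cp in range(0x0174, 0x0178)]

for ch in welsh_extras:
    # Print code point, the character itself and its official Unicode name.
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")


def decompose(ch):
    """Return the NFD form: the base letter followed by combining marks."""
    return unicodedata.normalize("NFD", ch)


# Each precomposed letter is a base Latin letter plus U+0302
# (COMBINING CIRCUMFLEX ACCENT) once decomposed.
decomposed = [decompose(ch) for ch in welsh_extras]
```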


--
Rhodri James *-* Kynesim Ltd


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Larry Martell
On Mon, Jul 16, 2018 at 12:05 PM, Mark Lawrence  wrote:
> On 16/07/18 15:17, Dan Sommers wrote:
>>
>> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>>
>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>
>>
>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>
>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>> modern monotonic Greek.
>>
>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>>
>
> Out of curiosity where does my mum's Welsh come into the equation as I
> believe that it is not recognised by the EU as a language?

Is she from Llanfair­pwllgwyngyll­gogery­chwyrn­drobwll­llan­tysilio­gogo­goch?


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence  wrote:
> On 16/07/18 15:17, Dan Sommers wrote:
>>
>> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>>
>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>
>>
>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>
>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>> modern monotonic Greek.
>>
>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>>
>
> Out of curiosity where does my mum's Welsh come into the equation as I
> believe that it is not recognised by the EU as a language?
>

What characters does it use? Mostly Latin letters? If so, it's easy -
most Western European languages are covered by the basic Latin
alphabetics (the ASCII ones), plus the combining diacriticals (U+0300
and following), plus a small handful of language-specific characters
(eg U+0130/U+0131 for Turkish). There are combined forms of some of
these, which can be found via NFC normalization, and a few ligatures
for some languages, but by and large, that's all you need for most
Latin-derived languages.
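A small sketch of what that looks like in Python, using the standard unicodedata module. The variable names and the same_text() helper are illustrative only; the point is that a base letter plus a combining diacritical and the precomposed form are the same text once both are normalized.

```python
import unicodedata

# Base letter plus a combining diacritical (U+0301 COMBINING ACUTE ACCENT).
decomposed = "e\u0301"
# NFC folds the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The Turkish letters mentioned above:
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# The dotted capital decomposes to "I" plus U+0307; the dotless small
# letter has no decomposition at all, so it needs its own code point.
dotted_nfd = unicodedata.normalize("NFD", dotted_capital_i)
dotless_nfd = unicodedata.normalize("NFD", dotless_small_i)
```

Comparing strings without normalizing first is the usual trap: decomposed == "\u00e9" is False even though both render identically.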

ChrisA


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence

On 16/07/18 15:17, Dan Sommers wrote:
> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
>
>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>
> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>
> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
> modern monotonic Greek.
>
> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).

Out of curiosity where does my mum's Welsh come into the equation as I 
believe that it is not recognised by the EU as a language?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Anders Wegge Keller


> The buzzing noise you just heard was the joke whizzing past your head 
> *wink*

 I have twins aged four. They also like to yell "I cheated!", whenever they
are called out.

 In general, you need to get rid of that teenage brat persona you practice.
The "ranting rick" charade was especially toe-curling.

-- 
//Wegge


Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 14:17:35 +, Dan Sommers wrote:

> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:
> 
>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
> 
> It may have been good enough for his disciples, but Jesus spoke Aramaic.

The buzzing noise you just heard was the joke whizzing past your head 
*wink*

It was a riff on the apocryphal American (occasionally other nationality) 
who said that if English was good enough for Jesus Christ, it is good 
enough for everyone:

http://itre.cis.upenn.edu/~myl/languagelog/archives/003084.html

with the twist that in my example, I picked *another* language rather 
than English. I shouldn't have picked Greek, an unfortunate choice that 
may have led you to imagine I was serious. Perhaps ISO-8859-5 (Cyrillic) 
or Shift_JIS would have been funnier :-(

And of course there is the absurdity of any ISO standards existing two 
thousand years ago.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Dan Sommers
On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote:

> ... people who think that if ISO-8859-7 was good enough for Jesus ...

It may have been good enough for his disciples, but Jesus spoke Aramaic.

Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
modern monotonic Greek.

See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).



Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 18:02:51 -0700, Jim Lee wrote:

> On 07/15/18 17:17, MRAB wrote:
>> On 2018-07-16 00:10, Jim Lee wrote:
[...]
>>> Have you never heard of programming BEFORE Unicode existed?
>>>
>>> How ever did we get along?

Mostly by not exchanging data with anyone else using a different language 
or operating system.

As one of those people who *did* need to exchange data, between Windows 
using Latin-1 and Macs using MacRoman, I can absolutely tell you that we 
got on **REALLY, REALLY, REALLY BADLY**, with data loss and corruption 
almost guaranteed.
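A minimal sketch of that failure mode, assuming nothing more than Python's built-in "latin-1" and "mac_roman" codecs. The sample word is arbitrary; any text with a non-ASCII character will do.

```python
text = "café"

# Written on the Windows side as Latin-1 ...
latin1_bytes = text.encode("latin-1")
# ... and misread on the Mac side as MacRoman. No exception is raised,
# the accented letter just silently turns into a different character.
misread = latin1_bytes.decode("mac_roman")

# Going the other way corrupts the text as well.
misread_back = text.encode("mac_roman").decode("latin-1")

# With one encoding agreed on both sides the round trip is lossless.
round_trip = text.encode("utf-8").decode("utf-8")

print(repr(text), repr(misread), repr(misread_back), repr(round_trip))
```

Because both codecs assign a meaning to every byte value, neither side ever sees an error; the damage only shows up when a human reads the result.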


[...]
> Yes, it was.  However, dealing with Unicode is also annoying.  If there
> were only one encoding, such as UTF-8, I wouldn't mind so much.

O_o

As an application developer, you should (almost) never need to use any 
Unicode encoding other than UTF-8.
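In practice that mostly comes down to naming UTF-8 explicitly wherever text crosses a file or network boundary. A sketch, where write_text() and read_text() are hypothetical helper names and the sample string is arbitrary:

```python
import os
import tempfile


def write_text(path, text):
    """Write text as UTF-8, naming the encoding rather than trusting the locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    """Read text back, again stating that the file is UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


sample = "Straße, café, Ελληνικά"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    write_text(path, sample)
    restored = read_text(path)
```

Inside the program everything is just str; the encoding only appears at the two open() calls.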

[...]
> But I don't speak Esperanto,  and my programs don't generally care what
> characters are used for European currencies.  When I create a simple
> program that takes a text file (created by me) and munges it into a
> different format, I don't care if someone from Uzbekistan can read it or
> not.

Good for you.

But Python is not a programming language written to satisfy the needs of 
people like you, and ONLY people like you.

It is a language written to satisfy the needs of people from Uzbekistan, 
and China, and Japan, and India, and Brazil, and France, and Russia, and 
Australia, and the UK, and mathematicians, and historians, and linguists, 
and, yes, even people who think that if ISO-8859-7 was good enough for 
Jesus, the whole world ought to be using it.


> When I create a one-time use program to visualize some data on a
> graph, I don't care if anyone else can read the axis labels but me.
> These are realities.  A good programming language will allow for these
> realities without putting the burden on the programmer to turn *every*
> program into a politically correct, globalization compliant model of
> modern groupthink.

And here we get to the crux of the matter. It isn't really the technical 
issues of Unicode that annoy you. It is the loss of privilege that you, 
as an ASCII user, no longer get to dismiss 90% of the world as beneath 
your notice.

Nice.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 17:39:55 -0700, Jim Lee wrote:

> On 07/15/18 17:18, Steven D'Aprano wrote:
>> On Sun, 15 Jul 2018 16:08:15 -0700, Jim Lee wrote:
>>
>>> Python3 is intrinsically tied to Unicode for string handling.
>>> Therefore, the Python programmer is forced to deal with it (in all but
>>> trivial cases), rather than given a choice.  So I don't understand how
>>> I can illustrate my point with Python code since Python won't let me
>>> deal with strings without also dealing with Unicode.
>> Nonsense.
>>
>> b"Look ma, a Python 2 style ASCII string."
>>
>>
> As I said, all but trivial cases.
> 
> Do you consider separating Unicode strings from byte strings, having to
> decode and encode from one to the other, 

If you use nothing but byte strings, you don't need to separate the non-
existent text strings from the byte strings, nor do you need to decode or 
encode.
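As a rough illustration of working in bytes throughout (the record contents are made up for the example), most of the familiar string methods exist on bytes objects as well:

```python
record = b"alice:1001:/home/alice"

# bytes objects support most of the familiar string methods directly,
# as long as every argument is bytes too.
fields = record.split(b":")
user = fields[0].upper()
padded = user.ljust(8, b".")
joined = b",".join(fields)

# One difference from Python 2's str: indexing yields an int, not a
# length-1 string. Slicing still returns bytes.
first_byte = record[0]
first_slice = record[:1]
```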


> and knowing which
> functions/methods accept one, the other, or both as arguments, 

That's certainly a real complication, if I may stretch the meaning of the 
word "complication" beyond breaking point. Surely you are already having 
to read the documentation of the function to learn what arguments it 
takes, and what types they are (int or float, list or iterator, 'r' or 
'a', etc). If someone can't deal with the question of "unicode or bytes" 
as well, then perhaps they ought to consider a career change to something 
less demanding, like politics.

If, as you insinuate, all your data is 100% ASCII, then you have nothing 
to fear. Just treat 

str(bytes_obj, 'ASCII')
bytes(str_obj, 'ASCII')

as the equivalent of a cast or coercion, and you won't go wrong. (Of 
course, in 2018, the number of applications that can truly say all their 
data is pure ASCII is vanishingly small.)
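A sketch of that cast in both directions, plus what happens when the "100% ASCII" assumption turns out to be wrong. The to_text_strict() helper is a hypothetical name used only for this example.

```python
bytes_obj = b"plain ASCII data"

str_obj = str(bytes_obj, "ASCII")            # bytes -> text
assert bytes(str_obj, "ASCII") == bytes_obj  # text -> bytes, lossless


def to_text_strict(data):
    """Decode as ASCII and report non-ASCII input instead of guessing."""
    try:
        return str(data, "ASCII")
    except UnicodeDecodeError as err:
        raise ValueError(
            f"not pure ASCII at byte offset {err.start}"
        ) from err
```

The failure is loud and early, which is the point: data that is not ASCII gets rejected at the boundary rather than corrupted further in.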

Or use Latin-1, if you want to do the most simple-minded thing that you 
can to make errors go away, without caring about correctness.
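Why Latin-1 makes the errors go away, and why that is not the same as being right, can be sketched like this (the sample word is arbitrary):

```python
utf8_bytes = "Straße".encode("utf-8")

# Latin-1 maps every byte 0x00-0xFF to a code point, so decoding never fails
as_latin1 = utf8_bytes.decode("latin-1")

# and the original bytes survive a round trip unchanged,
assert as_latin1.encode("latin-1") == utf8_bytes

# but the text is wrong whenever the data was not really Latin-1.
assert as_latin1 != "Straße"
```

That makes it a reasonable way to pass opaque bytes through text-only APIs, and a poor way to find out what the text actually says.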

But the thing is, that complexity is *inherent in the domain*. You can 
try to deal with it without Unicode, and as soon as you have users 
expecting to use more than one code page, you're doomed.


> as "not dealing with Unicode"?  I don't.

Frankly, I do.

Dealing with all the vagaries of human text *is* complicated, that's the 
nature of the beast. Dealing with the complexities of Unicode can be as 
complex as dealing with the complexities of floating point arithmetic.

(But neither of those are even in the same ballpark as dealing with the 
complexities of *not* using Unicode: legacy code pages and encodings are 
a nightmare to deal with.)

Nevertheless, just as casual users can go a very, very long way just 
treating floats as the real numbers we learn about in school, and trust 
that IEEE-754 semantics will mean your answers are "close enough", so the 
casual user can go a very long way ignoring the complexities of Unicode, 
so long as they control their own data and know what it is.

If you don't know what your data is, then you're doomed, Unicode or no 
Unicode. (If you don't think that's a problem, if you think that "just 
treat text as octets" works, then people like you are the reason there is 
so much mojibake in the world, screwing it up for the rest of us.)
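Two ways "just treat text as octets" goes wrong, as a sketch with arbitrary sample strings: cutting at an octet offset, and gluing together octets that came from different encodings.

```python
text = "naïve café"
octets = text.encode("utf-8")

# Character count and octet count disagree once the text is not ASCII.
n_chars, n_octets = len(text), len(octets)

# Cutting at an octet offset can split a character in half.
truncated = octets[:3]  # ends with the first half of the two-octet "ï"
try:
    truncated.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# Concatenating octets from two sources with different encodings gives
# a blob that no single codec decodes correctly.
mixed = "é".encode("latin-1") + "é".encode("utf-8")
try:
    mixed.decode("utf-8")
    mixed_ok = True
except UnicodeDecodeError:
    mixed_ok = False
garbled = mixed.decode("latin-1")  # decodes without error, but is mojibake
```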



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson
