Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread Stephen J. Turnbull
Steven D'Aprano writes: > On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote: > > > Guido's mantra is something like "Python's str doesn't contain > > characters or even code points[1], it contains code units." > > But is that true? It's not. That's why I wrote the slight

[Python-Dev] Smuggling bytes in a UTF-16 implementation of str/unicode (was: Multilingual programming article on the Red Hat Developer blog)

2014-09-17 Thread Stephen J. Turnbull
Jeff Allen writes: > This feels like a jython-dev discussion. But anyway ... Well, if the same representation could be used in Jython you could just point to PEP 383 and be done with it. > u'\udc83' in u'abc\U00010083xyz' IMHO being able to type that is a bug. There should be no literal nota
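
The membership test quoted above gives different answers depending on whether str holds code points or UTF-16 code units. The following is a minimal sketch, assuming CPython 3.4 or later; `utf16_units` is a hypothetical helper written for this illustration, not anything from Jython or CPython, and it only simulates what a UTF-16 implementation would see.

```python
# Sketch: a lone surrogate is ambiguous in a UTF-16 str, but not in a
# code-point str such as CPython 3.3+ uses.
text = 'abc\U00010083xyz'
lone = '\udc83'

# CPython 3.3+ compares code points, so the lone surrogate is not found.
found_as_code_point = lone in text


def utf16_units(s):
    """Return the UTF-16 code units of s as ints (hypothetical helper)."""
    # 'surrogatepass' lets lone surrogates through the UTF-16 encoder.
    raw = s.encode('utf-16-be', 'surrogatepass')
    return [int.from_bytes(raw[i:i + 2], 'big') for i in range(0, len(raw), 2)]


# U+10083 is stored as the surrogate pair D800 DC83 in UTF-16, so a search
# over code units matches the trailing half of a real character.
units = utf16_units(text)
found_as_code_unit = utf16_units(lone)[0] in units
```

Under these assumptions the code-point search misses and the code-unit search hits, which is the ambiguity a UTF-16 str has to deal with when PEP 383 bytes are smuggled in as lone trailing surrogates.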

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread Martin v. Löwis
Am 17.09.14 10:56, schrieb Steven D'Aprano: > On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote: > >> Guido's mantra is something like "Python's str doesn't contain >> characters or even code points[1], it contains code units." > > But is that true? It used to be true, and stop

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread Antoine Pitrou
Seriously, can this discussion move somewhere else? This has nothing to do on python-dev. Thank you Antoine. On Wed, 17 Sep 2014 18:56:02 +1000 Steven D'Aprano wrote: > On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote: > > > Guido's mantra is something like "Python's str

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread Steven D'Aprano
On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote: > Guido's mantra is something like "Python's str doesn't contain > characters or even code points[1], it contains code units." But is that true? If it were true, I would expect to be able to make Python text strings containing

[Python-Dev] Smuggling bytes in a UTF-16 implementation of str/unicode (was: Multilingual programming article on the Red Hat Developer blog)

2014-09-17 Thread Jeff Allen
This feels like a jython-dev discussion. But anyway ... On 17/09/2014 00:57, Stephen J. Turnbull wrote: The CPython representation uses trailing surrogates only[1], so it's never possible to interpret them as anything but non-characters -- as soon as you encounter them you know that it's a lone
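
For readers who have not used PEP 383, the "trailing surrogates only" point can be seen with the standard `surrogateescape` error handler. This is a minimal sketch, assuming CPython 3; the sample bytes are made up for illustration.

```python
# Sketch: PEP 383 maps each undecodable byte 0x80..0xFF to a lone trailing
# (low) surrogate U+DC80..U+DCFF, and maps it back on encoding.
raw = b'abc\x83xyz'  # 0x83 is not valid ASCII

text = raw.decode('ascii', 'surrogateescape')

# Collect every surrogate code point that ended up in the decoded str.
smuggled = [ch for ch in text if 0xD800 <= ord(ch) <= 0xDFFF]

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode('ascii', 'surrogateescape')
```

Here the byte 0x83 becomes U+DC83, every smuggled code point falls in the trailing-surrogate range, and encoding with the same handler gives back the original bytes.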

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread R. David Murray
Sorry for the mojibake. I've not yet gotten around to actually using the email package to write a smarter replacement for nmh, which is what I use for email, and I always forget that I need to manually tell nmh when there is non-ASCII in the message... On Wed, 17 Sep 2014 03:02:33 -0400, "R. David M

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-17 Thread R. David Murray
On Wed, 17 Sep 2014 14:42:56 +1000, Steven D'Aprano wrote: > On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote: > > On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray > > wrote: > > > > Basically, we are pretending that each smuggled > > > byte is a single character for string pars