python 3.3 repr

2013-11-15 Thread Robin Becker
I'm trying to understand what's going on with this simple program: if __name__=='__main__': print("repr=%s" % repr(u'\xc1')) print("%%r=%r" % u'\xc1') On my Windows XP box this fails miserably if run directly at a terminal: C:\tmp> \Python33\python.exe bang.py Traceback (most recent

Re: python 3.3 repr

2013-11-15 Thread Ned Batchelder
On Friday, November 15, 2013 6:28:15 AM UTC-5, Robin Becker wrote: I'm trying to understand what's going on with this simple program: if __name__=='__main__': print("repr=%s" % repr(u'\xc1')) print("%%r=%r" % u'\xc1') On my Windows XP box this fails miserably if run directly at a

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
On 15/11/2013 11:38, Ned Batchelder wrote: .. In Python3, repr() will return a Unicode string, and will preserve existing Unicode characters in its arguments. This has been controversial. To get the Python 2 behavior of a pure-ascii representation, there is the new builtin ascii(),
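The difference between repr() and ascii() described above can be checked directly on Python 3. This is a minimal sketch rather than code from the thread; the variable names are only illustrative.

```python
# Python 3: repr() keeps non-ASCII characters, ascii() escapes them.
s = u'\xc1'  # LATIN CAPITAL LETTER A WITH ACUTE

unicode_repr = repr(s)  # contains the actual character
ascii_repr = ascii(s)   # pure-ASCII form, similar to Python 2's repr

# Only the escaped form is printed, so this is safe on any console encoding.
print(ascii_repr)
```

Printing unicode_repr instead depends on whether the console encoding can represent the character.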

Re: python 3.3 repr

2013-11-15 Thread Ned Batchelder
On Friday, November 15, 2013 7:16:52 AM UTC-5, Robin Becker wrote: On 15/11/2013 11:38, Ned Batchelder wrote: .. In Python3, repr() will return a Unicode string, and will preserve existing Unicode characters in its arguments. This has been controversial. To get the Python 2

Re: python 3.3 repr

2013-11-15 Thread Roy Smith
In article b6db8982-feac-4036-8ec4-2dc720d41...@googlegroups.com, Ned Batchelder n...@nedbatchelder.com wrote: In Python3, repr() will return a Unicode string, and will preserve existing Unicode characters in its arguments. This has been controversial. To get the Python 2 behavior of a

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
On 15/11/2013 13:54, Ned Batchelder wrote: . No, but I've found that significant programs that run on both 2 and 3 need to have some shims to make the code work anyway. You could do this: try: repr = ascii except NameError: pass yes I tried that, but

Re: python 3.3 repr

2013-11-15 Thread Serhiy Storchaka
15.11.13 15:54, Ned Batchelder wrote: No, but I've found that significant programs that run on both 2 and 3 need to have some shims to make the code work anyway. You could do this: try: repr = ascii except NameError: pass and then use repr throughout. Or

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
.. I'm still stuck on Python 2, and while I can understand the controversy (It breaks my Python 2 code!), this seems like the right thing to have done. In Python 2, unicode is an add-on. One of the big design drivers in Python 3 was to make unicode the standard. The idea behind

Re: python 3.3 repr

2013-11-15 Thread Joel Goldstick
Some of us have been doing this long enough to remember when just plain text meant only a single case of the alphabet (and a subset of ascii punctuation). On an ASR-33, your C program would print like: MAIN() \( PRINTF("HELLO, ASCII WORLD"); \) because ASR-33's didn't have curly

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
On 15/11/2013 14:40, Serhiy Storchaka wrote: .. and then use repr throughout. Or rather try: ascii except NameError: ascii = repr and then use ascii throughout. apparently you can import ascii from future_builtins and the print() function is available
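Laid out on separate lines, the shim quoted above looks like the sketch below. The names `value` and `escaped` are added only for illustration; the try/except itself is the one from the message.

```python
# Compatibility shim: make ascii() available on both Python 2 and 3.
try:
    ascii            # already a builtin on Python 3
except NameError:
    ascii = repr     # Python 2: repr() already gives a pure-ASCII result

value = u'\xc1'
escaped = ascii(value)  # safe to print regardless of console encoding
print(escaped)
```

The two versions still differ in detail: Python 2's repr() puts a u prefix on unicode strings, so the text is ASCII-only on both but not identical.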

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
... became popular. Really? you cried and laughed over 7 vs. 8 bits? That's lovely (?). ;). That eighth bit sure was less confusing than codepoint translations no we had 6 bits in 60 bit words as I recall; extracting the nth character involved division by 6; smart people did

Re: python 3.3 repr

2013-11-15 Thread Joel Goldstick
On Fri, Nov 15, 2013 at 10:03 AM, Robin Becker ro...@reportlab.com wrote: ... became popular. Really? you cried and laughed over 7 vs. 8 bits? That's lovely (?). ;). That eighth bit sure was less confusing than codepoint translations no we had 6 bits in 60 bit words as I

Re: python 3.3 repr

2013-11-15 Thread Ned Batchelder
On Friday, November 15, 2013 9:43:17 AM UTC-5, Robin Becker wrote: Things went wrong when utf8 was not adopted as the standard encoding thus requiring two string types, it would have been easier to have a len function to count bytes as before and a glyphlen to count glyphs. Now as I

Re: python 3.3 repr

2013-11-15 Thread Chris Angelico
On Sat, Nov 16, 2013 at 1:43 AM, Robin Becker ro...@reportlab.com wrote: .. I'm still stuck on Python 2, and while I can understand the controversy (It breaks my Python 2 code!), this seems like the right thing to have done. In Python 2, unicode is an add-on. One of the big design

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
On 15/11/2013 15:07, Joel Goldstick wrote: Cool, someone here is older than me! I came in with the 8080, and I remember split octal, but sixes are something I missed out on. The pdp 10/15 had 18 bit words and could be organized as 3*6 or 2*9, pdp 8s had 12 bits I think, then

Re: python 3.3 repr

2013-11-15 Thread Roy Smith
On Nov 15, 2013, at 10:18 AM, Robin Becker wrote: The pdp 10/15 had 18 bit words and could be organized as 3*6 or 2*9 I don't know about the 15, but the 10 had 36 bit words (18-bit halfwords). One common character packing was 5 7-bit characters per 36 bit word (with the sign bit left over).
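The arithmetic behind that packing is 5 × 7 = 35 bits, which leaves one bit of a 36-bit word unused. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and placing the characters in the low 35 bits is an assumption, not a claim about how any particular machine laid them out.

```python
def pack5x7(text):
    """Pack exactly five 7-bit characters into one integer (35 bits used)."""
    if len(text) != 5:
        raise ValueError("need exactly five characters")
    word = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit character: %r" % ch)
        word = (word << 7) | code  # shift in the next 7-bit code
    return word


def unpack5x7(word):
    """Recover the five characters packed by pack5x7()."""
    return ''.join(chr((word >> shift) & 0x7F) for shift in range(28, -1, -7))


packed = pack5x7("HELLO")
print(unpack5x7(packed))
```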

Re: python 3.3 repr

2013-11-15 Thread Robin Becker
. Dealing with bytes and Unicode is complicated, and the 2-3 transition is not easy, but let's please not spread the misunderstanding that somehow the Flexible String Representation is at fault. However you store Unicode code points, they are different than bytes, and it is complex

Re: python 3.3 repr

2013-11-15 Thread Antoon Pardon
On 15-11-13 16:39, Robin Becker wrote: . Dealing with bytes and Unicode is complicated, and the 2-3 transition is not easy, but let's please not spread the misunderstanding that somehow the Flexible String Representation is at fault. However you store Unicode code points, they are

Re: python 3.3 repr

2013-11-15 Thread Chris Angelico
On Sat, Nov 16, 2013 at 2:39 AM, Robin Becker ro...@reportlab.com wrote: Dealing with bytes and Unicode is complicated, and the 2-3 transition is not easy, but let's please not spread the misunderstanding that somehow the Flexible String Representation is at fault. However you store Unicode

Re: python 3.3 repr

2013-11-15 Thread William Ray Wing
On Nov 15, 2013, at 10:18 AM, Robin Becker ro...@reportlab.com wrote: On 15/11/2013 15:07, Joel Goldstick wrote: Cool, someone here is older than me! I came in with the 8080, and I remember split octal, but sixes are something I missed out on. The pdp 10/15 had 18 bit

Re: python 3.3 repr

2013-11-15 Thread Gene Heskett
On Friday 15 November 2013 11:28:19 Joel Goldstick did opine: On Fri, Nov 15, 2013 at 10:03 AM, Robin Becker ro...@reportlab.com wrote: ... became popular. Really? you cried and laughed over 7 vs. 8 bits? That's lovely (?). ;). That eighth bit sure was less confusing

Re: python 3.3 repr

2013-11-15 Thread Zero Piraeus
: On Fri, Nov 15, 2013 at 10:32:54AM -0500, Roy Smith wrote: Anybody remember RAD-50? It let you represent a 6-character filename (plus a 3-character extension) in a 16 bit word. RT-11 used it, not sure if it showed up anywhere else. Presumably 16 is a typo, but I just had a moderate amount

Re: python 3.3 repr

2013-11-15 Thread Chris Angelico
On Sat, Nov 16, 2013 at 4:06 AM, Zero Piraeus z...@etiol.net wrote: : On Fri, Nov 15, 2013 at 10:32:54AM -0500, Roy Smith wrote: Anybody remember RAD-50? It let you represent a 6-character filename (plus a 3-character extension) in a 16 bit word. RT-11 used it, not sure if it showed up

Re: python 3.3 repr

2013-11-15 Thread Steven D'Aprano
On Fri, 15 Nov 2013 14:43:17 +, Robin Becker wrote: Things went wrong when utf8 was not adopted as the standard encoding thus requiring two string types, it would have been easier to have a len function to count bytes as before and a glyphlen to count glyphs. Now as I understand it we

Re: python 3.3 repr

2013-11-15 Thread Chris Angelico
On Sat, Nov 16, 2013 at 4:10 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: No, UTF-8 is okay for writing to files, but it's not suitable for text strings. Correction: It's _great_ for writing to files (and other fundamentally byte-oriented streams, like network connections).
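A small sketch of that division of labour: text stays as a Unicode string inside the program and is encoded to UTF-8 only at the file boundary. The temporary file and the variable names are illustrative assumptions, not something from the thread.

```python
import io
import os
import tempfile

text = u'\xc1 caf\xe9'  # Unicode string held in memory

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Encoding happens at the boundary, when the text is written out.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Reading in binary mode shows the UTF-8 bytes actually stored.
    with io.open(path, 'rb') as f:
        raw = f.read()
    # Reading in text mode decodes them back to the same string.
    with io.open(path, 'r', encoding='utf-8') as f:
        round_trip = f.read()
finally:
    os.remove(path)

print(len(text), len(raw))
```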

Re: python 3.3 repr

2013-11-15 Thread Serhiy Storchaka
15.11.13 17:32, Roy Smith wrote: Anybody remember RAD-50? It let you represent a 6-character filename (plus a 3-character extension) in a 16 bit word. RT-11 used it, not sure if it showed up anywhere else. In three 16-bit words. -- https://mail.python.org/mailman/listinfo/python-list
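The correction follows from the arithmetic: a 40-character alphabet gives 40³ = 64000 combinations for three characters, which fits in one 16-bit word (65536 values), so nine characters need three words. The sketch below assumes the commonly cited PDP-11 character table (space, A-Z, $, ., one spare code shown here as %, then 0-9); the function names are hypothetical.

```python
# Assumed RAD-50 character table: 40 symbols, index = code value.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50_encode(text):
    """Encode text as a list of 16-bit values, three characters per value."""
    text = text.upper()
    while len(text) % 3:
        text += ' '  # pad with spaces to a multiple of three
    words = []
    for i in range(0, len(text), 3):
        value = 0
        for ch in text[i:i + 3]:
            value = value * 40 + RAD50_CHARS.index(ch)
        words.append(value)
    return words


def rad50_decode(words):
    """Inverse of rad50_encode(); padding spaces are kept."""
    chars = []
    for value in words:
        triple = []
        for _ in range(3):
            value, digit = divmod(value, 40)
            triple.append(RAD50_CHARS[digit])
        chars.extend(reversed(triple))
    return ''.join(chars)


# Six-character name plus three-character extension: nine characters in all.
words = rad50_encode("BANG  PY ")
print(len(words))
```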

Re: python 3.3 repr

2013-11-15 Thread Cousin Stanley
We don't say len({42: None}) to discover that the dict requires 136 bytes, why would you use len("heåvy") to learn that it uses 23 bytes? #!/usr/bin/env python # -*- coding: utf-8 -*- illustrate the difference in length of python objects and the size of their system
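The distinction can be seen by measuring the same string three ways. This is a minimal sketch, not the script from the message; the variable names are illustrative, and the in-memory size reported by sys.getsizeof() varies by Python version and platform, so no specific figure is claimed for it.

```python
import sys

text = u'he\xe5vy'

code_points = len(text)                  # number of code points
utf8_bytes = len(text.encode('utf-8'))   # bytes needed when encoded as UTF-8
object_size = sys.getsizeof(text)        # size of the object in memory

print(code_points, utf8_bytes, object_size)
```

Only the first two numbers are properties of the text; the third is a property of the interpreter's string object.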

Re: python 3.3 repr

2013-11-15 Thread Neil Cerutti
On 2013-11-15, Chris Angelico ros...@gmail.com wrote: Other languages _have_ gone for at least some sort of Unicode support. Unfortunately quite a few have done a half-way job and use UTF-16 as their internal representation. That means there's no difference between U+0012, U+0123, and U+1234,
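As background to the UTF-16 remark: code points up to U+FFFF occupy one 16-bit code unit, while code points above that need a surrogate pair. The helper name `utf16_units` below is hypothetical, and the sketch assumes Python 3.3 or later, where len() counts code points.

```python
def utf16_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode('utf-16-le')) // 2


for ch in (u'\x12', u'\u0123', u'\u1234', u'\U00012345'):
    # Only ASCII is printed, so this works on any console encoding.
    print('U+%04X needs %d code unit(s)' % (ord(ch), utf16_units(ch)))
```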

Re: python 3.3 repr

2013-11-15 Thread Mark Lawrence
On 15/11/2013 16:36, Gene Heskett wrote: On Friday 15 November 2013 11:28:19 Joel Goldstick did opine: On Fri, Nov 15, 2013 at 10:03 AM, Robin Becker ro...@reportlab.com wrote: ... became popular. Really? you cried and laughed over 7 vs. 8 bits? That's lovely (?). ;). That

Unicode stdin/stdout (was: Re: python 3.3 repr)

2013-11-15 Thread random832
Of course, the real solution to this issue is to replace sys.stdout on windows with an object that can handle Unicode directly with the WriteConsoleW function - the problem there is that it will break code that expects to be able to use sys.stdout.buffer for binary I/O. I also wasn't able to get

Re: python 3.3 repr

2013-11-15 Thread Gene Heskett
On Friday 15 November 2013 13:52:40 Mark Lawrence did opine: On 15/11/2013 16:36, Gene Heskett wrote: On Friday 15 November 2013 11:28:19 Joel Goldstick did opine: On Fri, Nov 15, 2013 at 10:03 AM, Robin Becker ro...@reportlab.com wrote: ... became popular. Really?

Re: python 3.3 repr

2013-11-15 Thread Terry Reedy
On 11/15/2013 6:28 AM, Robin Becker wrote: I'm trying to understand what's going on with this simple program: if __name__=='__main__': print("repr=%s" % repr(u'\xc1')) print("%%r=%r" % u'\xc1') On my Windows XP box this fails miserably if run directly at a terminal C:\tmp

Re: python 3.3 repr

2013-11-15 Thread Steven D'Aprano
On Fri, 15 Nov 2013 17:47:01 +, Neil Cerutti wrote: The unicode support I'm learning in Go is, "Everything is utf-8, right? RIGHT?!?" It also has the interesting behavior that indexing strings retrieves bytes, while iterating over them results in a sequence of runes. It comes with
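For comparison, Python 3 separates the two views into two types rather than two operations on one type. The sketch below is a Python analogue only, not Go code and not a description of Go's implementation; the variable names are illustrative.

```python
text = u'h\xe9llo'           # str: a sequence of code points
data = text.encode('utf-8')  # bytes: the UTF-8 encoding of the same text

# Indexing bytes yields individual byte values (ints on Python 3).
second_byte = data[1]

# Iterating over str yields whole characters, one per code point.
code_points = [ord(ch) for ch in text]

print(len(data), len(code_points))
```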