Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
Terry Reedy wrote:
> If orig_data were mutable (the new buffer, as proposed in the PEP), would not
>
>     for i in range(len(orig_data)):
>         orig_data[i] &= 0x1F
>
> do it in place? (I don't have .0a1 to try on the current bytes.)

Good catch!

Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>>> orig_data = b"abc"
>>> orig_data
b'abc'
>>> for i in range(len(orig_data)):
...     orig_data[i] &= 0x1F
...
>>> orig_data
b'\x01\x02\x03'

It'd be useful and more efficient if the new buffer type would support
the bitwise operations directly:

>>> orig_data &= 0x1F
TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
>>> orig_data &= b"\x1F"
TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Christian
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
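For later readers: the mutable type proposed in PEP 3137 shipped in Python 3.0 as bytearray, while bytes became immutable. The sketch below runs Terry's in-place loop against a bytearray and, for the immutable case, builds a new bytes object with bytes.translate() and a 256-entry lookup table. The helper names mask_in_place and masked_copy are hypothetical, not part of any library.

```python
def mask_in_place(buf: bytearray, mask: int) -> None:
    """AND every byte of a mutable buffer with `mask`, in place."""
    for i in range(len(buf)):
        # Indexing a bytearray yields an int, so &= works element by element.
        buf[i] &= mask


def masked_copy(data: bytes, mask: int) -> bytes:
    """Return a new bytes object with every byte ANDed with `mask`."""
    # Precompute the result for all 256 byte values; translate() maps each byte through it.
    table = bytes(b & mask for b in range(256))
    return data.translate(table)


orig_data = bytearray(b"abc")
mask_in_place(orig_data, 0x1F)
print(bytes(orig_data))             # b'\x01\x02\x03'
print(masked_copy(b"abc", 0x1F))    # b'\x01\x02\x03'
```

The table-based version does the per-byte work in C, which gives some of the efficiency Christian asks for without adding new operators to the type.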
Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
I am hereby accepting my own PEP 3137. The responses fell into three categories: enthusiastic +1s, textual corrections, and ideas for future enhancements. That's about as positive as it gets for any proposal. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
[Python-3000] Emacs22 python.el support for py3k
I've submitted patches to emacs for python 3000 support. It does not handle any new syntax but the emacs<->python interaction works again. This applies to the python.el that ships with emacs22, not python-mode.el. The changes are available in emacs cvs. If you don't want to build a new copy it should be sufficient to pull the files python.el, emacs.py, emacs2.py and emacs3.py.

--
Adam Hupp | http://hupp.org/adam/
Re: [Python-3000] Emacs22 python.el support for py3k
On 10/2/07, Adam Hupp <[EMAIL PROTECTED]> wrote:
> I've submitted patches to emacs for python 3000 support. It does not
> handle any new syntax but the emacs<->python interaction works again.
> This applies to the python.el that ships with emacs22, not
> python-mode.el.

Just curious -- how do python.el and python-mode.el differ?

> The changes are available in emacs cvs. If you don't want to build a
> new copy it should be sufficient to pull the files python.el,
> emacs.py, emacs2.py and emacs3.py.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-3000] Emacs22 python.el support for py3k
On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> Just curious -- how do python.el and python-mode.el differ?

Off the top of my head:

* python-mode.el did not play well with transient-mark-mode
  (mark-block didn't work). transient-mark-mode highlights the marked
  region and is required for other functions (e.g. comment-dwim).

* python-mode.el had problems with syntax highlighting in the
  presence of triple-quoted strings and in comments. python.el does
  not.

* python.el is supposed to be more consistent with other major modes,
  e.g. M-; for comment.

* python.el ships with Emacs. There are claims that python-mode.el
  was not as well maintained for FSF Emacs as for XEmacs.

--
Adam Hupp | http://hupp.org/adam/
Re: [Python-3000] Emacs22 python.el support for py3k
On Oct 2, 2007, at 11:28 AM, Adam Hupp wrote:

> On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>>
>> Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
> * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work). transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
> * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments. python.el does
> not.
>
> * python.el is supposed to be more consistent with other major modes.
> e.g. M-; for comment.
>
> * python.el ships with emacs. There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.

It would be nice if there were only one mode that worked with both FSF Emacs and XEmacs and merged the best qualities of both modes. I don't have much time to work on that, and I suspect Skip is pretty busy too. Adam, if you're interested, willing, and able to help develop such a merge, [EMAIL PROTECTED] would be the place to do so. I'd certainly be willing to test and I'd try to do a limited amount of XEmacs compatibility hacking.

-Barry
Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
"Christian Heimes" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
| Terry Reedy wrote:
| > If orig_data were mutable (the new buffer, as proposed in the PEP), would
| > not
| >
| >     for i in range(len(orig_data)):
| >         orig_data[i] &= 0x1F
| >
| > do it in place? (I don't have .0a1 to try on the current bytes.)
|
| Good catch!
|
| Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
| [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
| >>> orig_data = b"abc"
| >>> orig_data
| b'abc'
| >>> for i in range(len(orig_data)):
| ...     orig_data[i] &= 0x1F
| ...
| >>> orig_data
| b'\x01\x02\x03'

Thanks for testing this! Glad it worked. This sort of thing makes having
bytes/buffer[i] be an int a plus. (Just noticed: PEP accepted.)

| It'd be useful and more efficient if the new buffer type would support
| the bitwise operations directly:
|
| >>> orig_data &= 0x1F
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'

This sort of broadcast behavior seems like numpy territory to me, or better
suited to a buffer subclass. Write it first in Python, using loops like the one
above (partly for documentation and for other implementations), then in C when
interest and usage warrant.

| >>> orig_data &= b"\x1F"
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Ugh is my response. Stick with the first ;-).

Terry Jan Reedy
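Following Terry's suggestion to write the broadcast version first in Python as a buffer subclass, here is a minimal sketch. It assumes the mutable type is bytearray (the name PEP 3137's buffer eventually shipped under); MaskableBuffer is a hypothetical class name. Only an int operand is broadcast, so `&= b"\x1F"` still raises TypeError, in line with Terry's "stick with the first".

```python
class MaskableBuffer(bytearray):
    """Hypothetical bytearray subclass: `buf &= n` ANDs every byte with the int n."""

    def __iand__(self, mask):
        if not isinstance(mask, int):
            # Let Python fall back to __and__ and, failing that, raise TypeError.
            return NotImplemented
        for i in range(len(self)):
            self[i] &= mask  # plain per-element loop, as in the thread
        return self

    def __and__(self, mask):
        if not isinstance(mask, int):
            return NotImplemented
        result = MaskableBuffer(self)  # copy first, then mask the copy in place
        result &= mask
        return result


data = MaskableBuffer(b"abc")
data &= 0x1F
print(bytes(data))  # b'\x01\x02\x03'
```

Keeping the loop in pure Python makes the semantics easy to read and document; a C version could replace the loop later without changing the interface.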
Re: [Python-3000] Emacs22 python.el support for py3k
So is python.el a descendant of python-mode.el, or an independent development?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-3000] Emacs22 python.el support for py3k
On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> So is python.el a descendant of python-mode.el, or an independent development?

I've never seen a definitive statement, but I believe it was developed
independently.

--
Adam Hupp | http://hupp.org/adam/
Re: [Python-3000] Emacs22 python.el support for py3k
    Guido> So is python.el a descendant of python-mode.el, or an independent
    Guido> development?

    Adam> I've never seen a definitive statement but I believe it was
    Adam> developed independently.

Correct.

Skip
Re: [Python-3000] Python, int/long and GMP
On Fri, 28-09-2007 at 18:58 +0200, Victor Stinner wrote:
> I don't know GMP internals. I thought that GMP uses a hack for small
> integers.

It does not. (And I'm glad that it does not, because this allows for a
super-specialized representation of small integers where even the space
for mpz_t itself is not allocated. A GMP-internal optimization for the
same cases would be underutilized and thus wasteful.)

> I may also use Python's garbage collector for GMP memory allocations,
> since GMP allows me to use my own memory allocation functions.

This would make linking with another library which uses GMP impossible
(unless the allocator is compatible with malloc, reentrant, etc.).
Glasgow Haskell was unfortunate enough to go that way.

> GMP also has its own reference counter mechanism :-/

It does not.

--
 __("<         Marcin Kowalczyk
 \__/       [EMAIL PROTECTED]
  ^^     http://qrnik.knm.org.pl/~qrczak/
[Python-3000] Are strs sequences of characters or disguised byte strings?
In Python 3.0a1, exec() appears to normalize strings, but in other cases
they don't appear to be normalized, and this leads to results that
appear to be counter-intuitive in some cases, at least to me.

>>> c1 = "\u00C7"
>>> c2 = "C\u0327"
>>> c3 = "\u0043\u0327"
>>> c1, c2, c3
('\xc7', 'C\u0327', 'C\u0327')
>>> print(c1, c2)
Ç Ç

Clearly c1 and c2 are different at the byte level. But if we use them to
create variables using exec(), Python appears to normalize them:

>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
>>> exec("C\u0327 = 5")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
5
>>> exec("\u00C7 = -7")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
-7

This seems to be the right behaviour to me, since from the point of view
of a programmer, Ç is the name of the variable, no matter what underlying
byte encoding is used to represent the variable's name.

>>> print(c1, c2)
Ç Ç
>>> c1.encode("utf8") == c2.encode("utf8")
False

This is what I'd expect, since here I'm comparing the actual bytes.

But when I compare them as strings I really expect them to be compared
as sequences of characters (in a human sense), so this:

>>> c1 == c2
False

seems counter-intuitive to me. It is easy to fix:

>>> from unicodedata import normalize
>>> normalize("NFKD", c1) == normalize("NFKD", c2)
True

but isn't it asking a lot of Python users to use normalize() whenever
they want to perform such a basic operation as string comparison?

Another issue that arises is that you can end up with duplicate
dictionary keys and set elements. (The duplication is in human terms; in
byte terms the keys/set elements differ, of course):

>>> d = {c1: 1, c2: 2}
>>> d
{'C\u0327': 2, '\xc7': 1}
>>> for k, v in d.items():
...     print(k, v)
...
Ç 2
Ç 1

I think this is surprising.

>>> s = {c1, c2}
>>> s
{'C\u0327', '\xc7'}
>>> for x in s:
...     print(x)
...
Ç
Ç

And the same result applies to sets, of course.

I don't know what the performance costs would be for always normalizing
strings, but it seems to me that if strings are not normalized, then
they are really being treated as byte strings thinly disguised as
strings rather than as true sequences of characters whose byte
representation is a detail that programmers can ignore (unless they
choose to explicitly decode).

--
Mark Summerfield, Qtrac Ltd., www.qtrac.eu
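A minimal sketch of doing this normalization explicitly at the application level, using only the standard unicodedata module. same_text and NormalizedDict are hypothetical helpers, not library APIs. One caveat on the NFKD form used above: it also folds compatibility characters (the ligature U+FB01 becomes "fi"), so the sketch defaults to NFC, which merges only canonically equivalent sequences such as the two spellings of Ç.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


class NormalizedDict(dict):
    """Hypothetical dict that NFC-normalizes str keys on the way in and out.

    Only item access, `in` and get() are covered; update(), setdefault(),
    the constructor, etc. would need the same treatment in real code.
    """

    @staticmethod
    def _key(key):
        return unicodedata.normalize("NFC", key) if isinstance(key, str) else key

    def __setitem__(self, key, value):
        super().__setitem__(self._key(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._key(key))

    def __contains__(self, key):
        return super().__contains__(self._key(key))

    def get(self, key, default=None):
        return super().get(self._key(key), default)


c1 = "\u00C7"   # precomposed LATIN CAPITAL LETTER C WITH CEDILLA
c2 = "C\u0327"  # C followed by COMBINING CEDILLA

print(c1 == c2)           # False: different code point sequences
print(same_text(c1, c2))  # True: canonically equivalent

d = NormalizedDict()
d[c1] = 1
d[c2] = 2                 # same normalized key, so this overwrites
print(len(d), d[c1])      # 1 2

# Sets can be deduplicated the same way by normalizing elements first.
print(len({unicodedata.normalize("NFC", x) for x in (c1, c2)}))  # 1
```

Normalizing once at the boundary (on input, or when building keys) keeps the cost out of every comparison, which is one way to sidestep the performance question raised above.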
Re: [Python-3000] Are strs sequences of characters or disguised byte strings?
String objects are arrays of code units. They can represent normalized and unnormalized Unicode text just as easily, and even invalid data, like half a surrogate and other illegal code units. It is up to the application (or perhaps at some point the library) to implement various checks and normalizations. AFAIK this is the same stance that Java and C# take -- the String types there don't concern themselves with the higher levels of Unicode standard compliance. (Though those languages probably have more library support than Python does -- perhaps someone can contribute something, like wrappers for ICU?)

However, for identifiers occurring in source code, we *do* normalize before comparing them. PEP 3131 should explain this.

--Guido

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
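A short illustration of the split described here, assuming a current CPython 3 rather than the 3.0a1 build in the thread (since Python 3.3 and PEP 393 a str holds code points rather than UTF-16 code units, but it can still contain a lone surrogate). require_nfc is a hypothetical application-level check, not a standard-library function; the last part relies on PEP 3131 normalizing identifiers to NFKC.

```python
import unicodedata

# A str happily holds half a surrogate pair, which is not valid text on its own.
half = "\ud800"
print(len(half))  # 1

try:
    half.encode("utf-8")
except UnicodeEncodeError as exc:
    # The problem only surfaces when the data crosses an encoding boundary.
    print("cannot encode:", exc.reason)


def require_nfc(text: str) -> str:
    """Hypothetical check: reject text that is not already NFC-normalized."""
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("text is not NFC-normalized")
    return text


# Identifiers are different: the parser normalizes them (NFKC, per PEP 3131),
# so C + COMBINING CEDILLA binds the precomposed name '\xc7'.
ns = {}
exec("C\u0327 = 5", ns)
print("\u00c7" in ns)  # True
```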