Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer

2007-10-02 Thread Christian Heimes
Terry Reedy wrote:
> If orig_data were mutable (the new buffer, as proposed in the PEP), would 
> not
> 
> for i in range(len(orig_data)):
>   orig_data[i] &= 0x1F
> 
> do it in place? (I don't have .0a1 to try on the current bytes.)

Good catch!

Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>>> orig_data = b"abc"
>>> orig_data
b'abc'
>>> for i in range(len(orig_data)):
...   orig_data[i] &= 0x1F
...
>>> orig_data
b'\x01\x02\x03'

It'd be useful and more efficient if the new buffer type supported
bitwise operations directly:

>>> orig_data &= 0x1F
TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
>>> orig_data &= b"\x1F"
TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'
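
For readers without 3.0a1 at hand: in released Python 3 the mutable type
that PEP 3137 calls "buffer" is spelled bytearray, and bytes is immutable.
A minimal sketch of the same loop under that assumption (the names alias,
frozen and error are only for illustration):

```python
# Assumes a released Python 3, where PEP 3137's mutable "buffer" is bytearray.
orig_data = bytearray(b"abc")
alias = orig_data                 # second reference, to show the update is in place

for i in range(len(orig_data)):
    orig_data[i] &= 0x1F          # each item is an int, so &= works per element

# The immutable bytes type rejects the same item assignment.
frozen = b"abc"
error = None
try:
    frozen[0] &= 0x1F
except TypeError as exc:
    error = exc
```

The loop changes the existing object rather than building a new one, which
is the in-place behaviour Terry asked about.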

Christian

___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com


Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer

2007-10-02 Thread Guido van Rossum
I am hereby accepting my own PEP 3137. The responses fell into three
categories: enthusiastic +1s, textual corrections, and ideas for
future enhancements. That's about as positive as it gets for any
proposal. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Adam Hupp
I've submitted patches to emacs for python 3000 support.  It does not
handle any new syntax but the emacs<->python interaction works again.
This applies to the python.el that ships with emacs22, not
python-mode.el.

The changes are available in emacs cvs.  If you don't want to build a
new copy it should be sufficient to pull the files python.el,
emacs.py, emacs2.py and emacs3.py.

-- 
Adam Hupp | http://hupp.org/adam/


Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Guido van Rossum
On 10/2/07, Adam Hupp <[EMAIL PROTECTED]> wrote:
> I've submitted patches to emacs for python 3000 support.  It does not
> handle any new syntax but the emacs<->python interaction works again.
> This applies to the python.el that ships with emacs22, not
> python-mode.el.

Just curious -- how do python.el and python-mode.el differ?

> The changes are available in emacs cvs.  If you don't want to build a
> new copy it should be sufficient to pull the files python.el,
> emacs.py, emacs2.py and emacs3.py.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Adam Hupp
On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> Just curious -- how do python.el and python-mode.el differ?

Off the top of my head:

 * python-mode.el did not play well with transient-mark-mode
(mark-block didn't work).   transient-mark-mode highlights the marked
region and is required for other functions (e.g. comment-dwim).

 * python-mode.el had problems with syntax highlighting in the
presence of triple quoted strings and in comments.  python.el does
not.

 * python.el is supposed to be more consistent with other major modes.
 e.g. M-; for comment.

 * python.el ships with emacs.  There are claims that python-mode.el
was not as well maintained for FSF emacs as XEmacs.

-- 
Adam Hupp | http://hupp.org/adam/


Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Barry Warsaw

On Oct 2, 2007, at 11:28 AM, Adam Hupp wrote:

> On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>>
>> Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
>  * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work).   transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
>  * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments.  python.el does
> not.
>
>  * python.el is supposed to be more consistent with other major modes.
>  e.g. M-; for comment.
>
>  * python.el ships with emacs.  There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.

It would be nice if there were only one mode that worked with both  
FSF Emacs and XEmacs and merged the best qualities of both modes.  I  
don't have much time to work on that, and I suspect Skip is pretty  
busy too.  Adam, if you're interested, willing, and able to help  
develop such a merge, [EMAIL PROTECTED] would be the place to do  
so.

I'd certainly be willing to test and I'd try to do a limited amount  
of XEmacs compatibility hacking.

-Barry



Re: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer

2007-10-02 Thread Terry Reedy

"Christian Heimes" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
| Terry Reedy wrote:
| > If orig_data were mutable (the new buffer, as proposed in the PEP), would
| > not
| >
| > for i in range(len(orig_data)):
| >   orig_data[i] &= 0x1F
| >
| > do it in place? (I don't have .0a1 to try on the current bytes.)
|
| Good catch!
|
| Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
| [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
| >>> orig_data = b"abc"
| >>> orig_data
| b'abc'
| >>> for i in range(len(orig_data)):
| ...   orig_data[i] &= 0x1F
| ...
| >>> orig_data
| b'\x01\x02\x03'

Thanks for testing this!  Glad it worked.  This sort of thing makes having 
bytes/buffer[i] an int a plus.  (Just noticed, PEP accepted.)

| It'd be useful and more efficient if the new buffer type supported
| bitwise operations directly:
|
| >>> orig_data &= 0x1F
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'

This sort of broadcast behavior seems like numpy territory to me, or 
better suited to a buffer subclass.  Write it first in Python, using loops 
like those above (partly for documentation and for other implementations), 
then in C when interest and usage warrant.

| >>> orig_data &= b"\x1F"
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Ugh is my response.  Stick with the first ;-).

Terry Jan Reedy





Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Guido van Rossum
So is python.el a descendant of python-mode.el, or an independent development?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread Adam Hupp
On 10/2/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> So is python.el a descendant of python-mode.el, or an independent development?

I've never seen a definitive statement but I believe it was developed
independently.

-- 
Adam Hupp | http://hupp.org/adam/


Re: [Python-3000] Emacs22 python.el support for py3k

2007-10-02 Thread skip

Guido> So is python.el a descendant of python-mode.el, or an independent
Guido> development?

Adam> I've never seen a definitive statement but I believe it was
Adam> developed independently.

Correct.

Skip


Re: [Python-3000] Python, int/long and GMP

2007-10-02 Thread Marcin 'Qrczak' Kowalczyk
On Fri, 28-09-2007 at 18:58 +0200, Victor Stinner wrote:

> I don't know GMP internals. I thought that GMP uses a hack for small
> integers.

It does not.

(And I'm glad that it does not, because it allows for super-specialized
representation of small integers where even the space for mpz_t itself
is not allocated. A GMP-internal optimization for the same cases would
be underutilized and thus wasteful.)

> I may also use Python's garbage collector for GMP memory allocations,
> since GMP allows me to supply my own memory allocation functions.

This would make linking with another library which uses GMP impossible
(unless the allocator is compatible with malloc, reentrant etc.).
Glasgow Haskell was unfortunate enough to go that way.

> GMP also has its own reference counter mechanism :-/

It does not.

-- 
   __("< Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



[Python-3000] Are strs sequences of characters or disguised byte strings?

2007-10-02 Thread Mark Summerfield
In Python 3.0a1, exec() appears to normalize strings, but elsewhere
strings are not normalized, which leads to results that seem
counter-intuitive in some cases, at least to me.

>>> c1 = "\u00C7"
>>> c2 = "C\u0327"
>>> c3 = "\u0043\u0327"
>>> c1, c2, c3
('\xc7', 'C\u0327', 'C\u0327')
>>> print(c1, c2)
Ç Ç

Clearly c1 and c2 are different at the byte level. But if we use them to
create variables using exec(), Python appears to normalize them:

>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
>>> exec("C\u0327 = 5")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
5
>>> exec("\u00C7 = -7")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
-7

This seems to be the right behaviour to me, since from the point of view
of a programmer, Ç is the name of the variable, no matter what
underlying byte encoding is used to represent the variable's name.

>>> print(c1, c2)
Ç Ç
>>> c1.encode("utf8") == c2.encode("utf8")
False

This is what I'd expect, since here I'm comparing the actual bytes.

But when I compare them as strings I really expect them to be compared
as sequences of characters (in a human sense), so this:

>>> c1 == c2
False

seems counter-intuitive to me. It is easy to fix:

>>> from unicodedata import normalize
>>> normalize("NFKD", c1) == normalize("NFKD", c2)
True

but isn't it asking a lot of Python users to use normalize() whenever
they want to perform such a basic operation as string comparison?
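
For what it's worth, the comparison can be wrapped once and reused.  A
minimal sketch, assuming canonical equivalence is what is wanted: nfc_equal
is a hypothetical helper name, and it uses NFC rather than the NFKD form
above, because the compatibility forms also fold characters such as the
"fi" ligature into plain letters.

```python
import unicodedata


def nfc_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


c1 = "\u00C7"      # precomposed C with cedilla
c2 = "C\u0327"     # C followed by a combining cedilla

plain_equal = (c1 == c2)            # compares the stored code points
canonical_equal = nfc_equal(c1, c2) # compares the normalized forms
```

Whether NFC or one of the compatibility forms is appropriate depends on
the application, which is part of why the choice is hard to make globally.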

Another issue that arises is that you can end up with duplicate
dictionary keys and set elements. (The duplication is in human terms; in
byte terms the keys/set elements differ, of course.)

>>> d = {c1: 1, c2: 2}
>>> d
{'C\u0327': 2, '\xc7': 1}
>>> for k, v in d.items():
...     print(k, v)
...
Ç 2
Ç 1

I think this is surprising.

>>> s = {c1, c2}
>>> s
{'C\u0327', '\xc7'}
>>> for x in s:
...     print(x)
...
Ç
Ç

And the same result applies to sets of course.
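
One way to avoid the duplicates today is to normalize keys and elements on
the way in.  A sketch under that assumption (nfc is a hypothetical helper,
and NFC is just one possible choice of form):

```python
from unicodedata import normalize


def nfc(text):
    """Hypothetical helper: return the NFC form of a string."""
    return normalize("NFC", text)


c1 = "\u00C7"
c2 = "C\u0327"

# Normalize each key before storing it, so both spellings share one slot.
d = {}
for key, value in [(c1, 1), (c2, 2)]:
    d[nfc(key)] = value

# The same idea for sets: normalize each element as it is added.
s = {nfc(x) for x in (c1, c2)}
```

The later assignment overwrites the earlier one, so the caller has to
decide whether silently merging such keys is acceptable.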

I don't know what the performance costs would be for always normalizing
strings, but it seems to me that if strings are not normalized, then
they are really being treated as byte strings thinly disguised as
strings rather than as true sequences of characters whose byte
representation is a detail that programmers can ignore (unless they
choose to explicitly decode).

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu



Re: [Python-3000] Are strs sequences of characters or disguised byte strings?

2007-10-02 Thread Guido van Rossum
String objects are arrays of code units. They can represent normalized
and unnormalized Unicode text just as easily, and even invalid data,
like half a surrogate and other illegal code units. It is up to the
application (or perhaps at some point the library) to implement
various checks and normalizations. AFAIK this is the same stance that
Java and C# take -- the String types there don't concern themselves
with the higher levels of Unicode standard compliance. (Though those
languages probably have more library support than Python does --
perhaps someone can contribute something, like wrappers for ICU?)
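
A small illustration of what that means in practice, written as a sketch
for a current CPython 3 (where str items are code points): the two
spellings of Ç simply have different lengths, and a lone surrogate is
accepted as a string but cannot be encoded to UTF-8.

```python
c1 = "\u00C7"       # one code point
c2 = "C\u0327"      # two code points: base letter plus combining cedilla
lengths = (len(c1), len(c2))

# Half a surrogate pair is a legal str value...
lone = "\ud800"
lone_length = len(lone)

# ...but it is not valid Unicode text, so strict UTF-8 encoding refuses it.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```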

However, for identifiers occurring in source code, we *do* normalize
before comparing them. PEP 3131 should explain this.
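
A sketch of the identifier case, assuming a released Python 3: PEP 3131
specifies that identifiers are converted to NFKC while parsing, so both
spellings bind the same name.  The namespace dict is only there to keep
the example self-contained.

```python
namespace = {}

# Decomposed spelling: C followed by a combining cedilla.
exec("C\u0327 = 5", namespace)
# Precomposed spelling: rebinds the same (normalized) name.
exec("\u00C7 = -7", namespace)

# exec() adds __builtins__ to the dict; filter it out to see our names.
names = sorted(name for name in namespace if name != "__builtins__")
```

Only the parser does this; the string "C\u0327" used as a dict key or
compared with == is left exactly as written.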

--Guido

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)