Re: [Python-3000] issue 3187 and decoding filenames

2008-08-21 Thread Victor Stinner
On Thursday 21 August 2008 15:30:22, Benjamin Peterson wrote:
> Issue 3187 is a case where os.listdir tries to decode filenames with
> the default file system encoding, but failing that, simply returns the
> unencoded bytestring. This was obviously ok in 2.x, but not so good in
> py3k where bytes are cleanly separated from unicode.

I'm trying to write a workaround in Python, but I'm unable to write my own 
class which is compatible with "buffer interface"... What is this interface?

It works with bytes, but I don't want to inherit bytes (nor str).

  class MyBuffer(bytes):
      def __new__(cls, data):
          obj = bytes.__new__(cls, data)
          obj.myattribute = 42
          return obj


-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com


[Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
Hi,

After reading the previous discussion, here is a new proposal.

Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX 
(eg. Linux or *BSD) is affected.

Some systems are broken, but Python has to be able to open/copy/move/remove 
files with an "invalid filename".

The issue can wait for Python 3.0.1 / 3.1.

Windows
---

On Windows, we might reject bytes filenames for all file operations: open(), 
unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)

POSIX OS
---

The default behaviour should be to use unicode and raise an error if 
conversion to unicode fails. It should also be possible to use bytes using 
bytes arguments and optional arguments (for getcwd).

 - listdir(unicode) -> unicode and raise an error on invalid filename
 - listdir(bytes) -> bytes
 - getcwd() -> unicode
 - getcwd(bytes=True) -> bytes
 - open(): accept bytes or unicode

os.path.*() should accept operations on bytes filenames, but maybe not on 
bytes+unicode arguments. os.path.join('directory', b'filename'): raise an 
error (or use *implicit* conversion to bytes)?

When the user wants to display a filename on the screen, he can use:
   text = str(filename, fs_encoding, "replace")
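
The proposed behaviour can be sketched with a small script. This is only an illustration, assuming a recent Python 3 where os.listdir() accepts both types; the temporary directory, the file name "a.txt" and the use of os.fsencode() are assumptions made for the demo, not part of the proposal.

```python
import os
import sys
import tempfile

# Hypothetical demo directory containing one ASCII-named file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()

    str_names = os.listdir(tmp)                  # str in -> str out
    bytes_names = os.listdir(os.fsencode(tmp))   # bytes in -> bytes out

# Decode bytes only for display; the "replace" handler never raises.
fs_encoding = sys.getfilesystemencoding()
display = [str(name, fs_encoding, "replace") for name in bytes_names]
print(str_names, bytes_names, display)
```

The bytes names stay usable for open() or unlink(); only the decoded copy is meant for the screen.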

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 06:43:55, you wrote:
> It will make users happy, and it's simple enough to implement for
> python 3.0.

I dislike your argument. A "quick and dirty hack" is always faster to 
implement than a real solution, but we may hit new issues later if we don't 
choose the right solution.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
Patches are already available in issue #3187 (os.listdir):

On Monday 29 September 2008 14:07:55, Victor Stinner wrote:
>  - listdir(unicode) -> unicode and raise an error on invalid filename

Need raise_decoding_errors.patch (don't clear Unicode error)

>  - listdir(bytes) -> bytes

Always working.

>  - getcwd() -> unicode
>  - getcwd(bytes=True) -> bytes

Need merge_os_getcwd_getcwdu.patch

Note that the current implementation of getcwd() uses PyUnicode_FromString() to 
decode the directory, whereas getcwdu() uses the correct code 
(PyUnicode_Decode). So I merged both functions to keep only the correct 
version: getcwdu() => getcwd().

>  - open(): accept bytes or unicode

Need io_byte_filename.patch (just remove a check)

> os.path.*() should accept operations on bytes filenames, but maybe not on
> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an
> error (or use *implicit* conversion to bytes)?

os.path.join() already rejects mixing bytes + str.

But os.path.join(), glob.glob(), fnmatch.*(), etc. don't support bytes. I 
wrote some patches like:
 - glob1_bytes.patch: Fix glob.glob() to accept invalid directory name
 - fnmatch_bytes.patch: Patch fnmatch.filter() to accept bytes filenames

But I dislike both patches since they mix bytes and str. So this part still 
needs some work.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
> >  - getcwd() -> unicode
> >  - getcwd(bytes=True) -> bytes
>
> Please let's not introduce boolean flags like this. How about
> ``getcwdb`` in parallel with the old ``getcwdu``?

Yeah, you're right. So I wrote a new patch: os_getcwdb.patch

With my patch we get (Python3):
 * os.getcwd() -> unicode
 * os.getcwdb() -> bytes

Previously in Python2 it was:
 * os.getcwd() -> str (bytes)
 * os.getcwdu() -> unicode
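
A minimal check of the Python3 pair, assuming an interpreter that ships os.getcwdb() (released Python 3 versions do):

```python
import os

cwd_text = os.getcwd()     # str (unicode)
cwd_bytes = os.getcwdb()   # bytes, as returned by the operating system

print(type(cwd_text).__name__, type(cwd_bytes).__name__)
```

Having two functions avoids the boolean flag: the caller picks the return type by name.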

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>
> I know I keep flipflopping on this one, but the more I think about it
> the more I believe it is better to drop those names than to raise an
> exception. Otherwise a "naive" program that happens to use
> os.listdir() can be rendered completely useless by a single non-UTF-8
> filename. Consider the use of os.listdir() by the glob module. If I am
> globbing for *.py, why should the presence of a file named b'\xff'
> cause it to fail?

It would be hard for a newbie programmer to understand why he's unable to find 
his very important file ("important r?port.doc") using os.listdir(). And yes, 
if your file system is broken, glob() will fail.

If we choose to support bytes on Linux, a robust and portable program has to 
use only bytes filenames on Linux to always be able to list and open files.

A full example to list files and display filenames:

  import os
  import os.path
  import sys
  if os.path.supports_unicode_filenames:
      cwd = os.getcwd()
  else:
      cwd = os.getcwdb()
      encoding = sys.getfilesystemencoding()
  for filename in os.listdir(cwd):
      if os.path.supports_unicode_filenames:
          text = filename
      else:
          # bytes filename: decode it for display only
          text = str(filename, encoding, "replace")
      print("=== File {0} ===".format(text))
      for line in open(filename):
          ...

We need an "if" to choose the directory. The second "if" is only needed to 
display the filename. Using bytes, it would be possible to write better code: 
detect the real charset (eg. ISO-8859-1 in a UTF-8 file system) and so 
display the filename correctly and/or propose to rename the file. Would it 
be possible using UTF-8b / PUA hacks?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.

If I understand this solution correctly, the idea is to change the default 
file system encoding, right? Eg. if your filesystem is UTF-8, use ISO-8859-1 
to make sure that the conversion to unicode will never fail.

Let's try with an ugly directory on my UTF-8 file system:
$ find
.
./têste
./ô
./a?b
./dossié
./dossié/abc
./dir?name
./dir?name/xyz

Python3 using encoding=ISO-8859-1:
>>> import os; os.listdir(b'.')
[b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>> files=os.listdir('.'); files
['têste', 'ô', 'aÿb', 'dossié', 'dirÿname']
>>> open(files[0]).close()
>>> os.listdir(files[-1])
['xyz']

Ok, I have unicode filenames and I'm able to open a file and list a directory. 
The problem is now to display the filenames correctly.

For me "unicode" sounds like "text (characters) encoded in the correct 
charset". In this case, unicode is just a storage for *bytes* in a custom 
charset.

How can we mix  with ? Eg. os.path.join('dossié', "fichié") : first argument is encoded 
in ISO-8859-1 whereas the second argument is encoded in Unicode. It's 
something like that:
   str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9'

Whereas the correct (unicode) result should be: 
   'dossié/fichié'
as bytes in UTF-8:
   b'dossi\xc3\xa9/fichi\xc3\xa9'
as bytes in ISO-8859-1:
   b'dossi\xe9/fichi\xe9'

Changing the default file system encoding to store bytes in Unicode is like 
introducing a new Python type: .
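
The mix described above can be reproduced in a few lines. This is a sketch only; the variable names are made up for the illustration.

```python
raw = b'dossi\xc3\xa9'              # UTF-8 bytes coming from the file system
fake = str(raw, 'ISO-8859-1')       # "bytes stored in unicode"
real = 'fichi\xe9'                  # genuine unicode text: 'fichié'

mixed = fake + '/' + real
print(ascii(mixed))

# Encoding back with ISO-8859-1 gives a path that mixes two encodings:
# the directory part is UTF-8, the file part is ISO-8859-1.
print(mixed.encode('ISO-8859-1'))
```

No exception is raised at any step, which is exactly the problem: the two kinds of string look like the same type.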

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Victor Stinner
On Tuesday 30 September 2008 01:31:45, Adam Olsen wrote:
> The alternative is not be valid unicode, but since we can't use such 
> objects with external libs, can't even print them, we might as well 
> call them something else.  We already have a name for that: bytes.

:-)


[Python-3000] Patch for an initial support of bytes filename in Python3

2008-09-29 Thread Victor Stinner
Hi,

See attached patch: python3_bytes_filename.patch

Using the patch, you will get:
 - open() supports bytes
 - listdir(unicode) -> only unicode, *skip* invalid filenames 
   (as asked by Guido)
 - remove os.getcwdu()
 - create os.getcwdb() -> bytes
 - glob.glob() supports bytes
 - fnmatch.filter() supports bytes
 - posixpath.join() and posixpath.split() support bytes

Mixing bytes and str is invalid. Examples raising a TypeError:
 - posixpath.join(b'x', 'y')
 - fnmatch.filter([b'x', 'y'], '*')
 - fnmatch.filter([b'x', b'y'], '*')
 - glob.glob1('.', b'*')
 - glob.glob1(b'.', '*')
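
The rule can be checked against a released Python 3, which rejects mixed types in much the same way. This is a sketch, not the patch itself; raises_type_error() is a hypothetical helper written for the demo.

```python
import fnmatch
import posixpath

# Homogeneous arguments work for both types.
print(posixpath.join(b'dir', b'file'))
print(fnmatch.filter([b'x.py', b'y.txt'], b'*.py'))

def raises_type_error(func, *args):
    """Return True if func(*args) raises TypeError (demo helper)."""
    try:
        func(*args)
    except TypeError:
        return True
    return False

# Mixing bytes and str is rejected.
mixed_join = raises_type_error(posixpath.join, b'x', 'y')
mixed_filter = raises_type_error(fnmatch.filter, [b'x', b'y'], '*')
print(mixed_join, mixed_filter)
```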

$ diffstat ~/python3_bytes_filename.patch
 Lib/fnmatch.py        |    7 +++-
 Lib/glob.py           |   15 ++---
 Lib/io.py             |    2 -
 Lib/posixpath.py      |   20
 Modules/posixmodule.c |   83
 5 files changed, 62 insertions(+), 65 deletions(-)

TODO:
 - review this patch :-)
 - support non-ASCII bytes in fnmatch.filter()
 - fix other functions, eg. posixpath.isabs() and fnmatch.fnmatchcase()
 - fix functions written in C: grep FileSystemDefaultEncoding
 - make sure that mixing bytes and str is rejected

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
Index: Lib/posixpath.py
===
--- Lib/posixpath.py	(revision 66687)
+++ Lib/posixpath.py	(working copy)
@@ -59,14 +59,18 @@
     """Join two or more pathname components, inserting '/' as needed.
     If any component is an absolute path, all previous path components
     will be discarded."""
+    if isinstance(a, bytes):
+        sep = b'/'
+    else:
+        sep = '/'
     path = a
     for b in p:
-        if b.startswith('/'):
+        if b.startswith(sep):
             path = b
-        elif path == '' or path.endswith('/'):
+        elif not path or path.endswith(sep):
             path +=  b
         else:
-            path += '/' + b
+            path += sep + b
     return path
 
 
@@ -78,10 +82,14 @@
 def split(p):
     """Split a pathname.  Returns tuple "(head, tail)" where "tail" is
     everything after the final slash.  Either part may be empty."""
-    i = p.rfind('/') + 1
+    if isinstance(p, bytes):
+        sep = b'/'
+    else:
+        sep = '/'
+    i = p.rfind(sep) + 1
     head, tail = p[:i], p[i:]
-    if head and head != '/'*len(head):
-        head = head.rstrip('/')
+    if head and head != sep*len(head):
+        head = head.rstrip(sep)
     return head, tail
 
 
Index: Lib/glob.py
===
--- Lib/glob.py	(revision 66687)
+++ Lib/glob.py	(working copy)
@@ -27,7 +27,7 @@
         return
     dirname, basename = os.path.split(pathname)
     if not dirname:
-        for name in glob1(os.curdir, basename):
+        for name in glob1(None, basename):
             yield name
         return
     if has_magic(dirname):
@@ -49,9 +49,8 @@
 def glob1(dirname, pattern):
     if not dirname:
         dirname = os.curdir
-    if isinstance(pattern, str) and not isinstance(dirname, str):
-        dirname = str(dirname, sys.getfilesystemencoding() or
-                                   sys.getdefaultencoding())
+        if isinstance(pattern, bytes):
+            dirname = dirname.encode("ASCII")
     try:
         names = os.listdir(dirname)
     except os.error:
@@ -73,6 +72,12 @@
 
 
 magic_check = re.compile('[*?[]')
+magic_check_bytes = re.compile(b'[*?[]')
 
 def has_magic(s):
-    return magic_check.search(s) is not None
+    if isinstance(s, bytes):
+        match = magic_check_bytes.search(s)
+    else:
+        match = magic_check.search(s)
+    return match is not None
+
Index: Lib/fnmatch.py
===
--- Lib/fnmatch.py	(revision 66687)
+++ Lib/fnmatch.py	(working copy)
@@ -43,7 +43,12 @@
     result=[]
     pat=os.path.normcase(pat)
     if not pat in _cache:
-        res = translate(pat)
+        if isinstance(pat, bytes):
+            pat_str = str(pat, "ASCII")
+            res_str = translate(pat_str)
+            res = res_str.encode("ASCII")
+        else:
+            res = translate(pat)
         _cache[pat] = re.compile(res)
     match=_cache[pat].match
     if os.path is posixpath:
Index: Lib/io.py
===
--- Lib/io.py	(revision 66687)
+++ Lib/io.py	(working copy)
@@ -180,7 +180,7 @@
     opened in a text mode, and for bytes a BytesIO can be used like a file
     opened in a binary mode.
     """
-    if not isi

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Victor Stinner
Hi,

> This is the most sane contribution I've seen so far :).

Oh thanks.

> Do I understand properly that (listdir(bytes) -> bytes)?

Yes, os.listdir(bytes)->bytes. It's already the current behaviour.

But with Python3 trunk, os.listdir(str) -> str ... or bytes (if unicode 
conversion fails).

> If so, this seems basically sane to me, since it provides text behavior
> where possible and allows more sophisticated filesystem wrappers (i.e.
> Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
> separating filenames for display to the user and filenames for exchange
> with the FS.

It's the goal of my patch. Let people do what they want with bytes: rename the 
file, try the best charset to display the filename, etc.

> >- remove os.getcwdu()
> >- create os.getcwdb() -> bytes
> >- glob.glob() support bytes
> >- fnmatch.filter() support bytes
> >- posixpath.join() and posixpath.split() support bytes
>
> It sounds like maybe there should be some 2to3 fixers in here somewhere,
> too?

IMHO a programmer should not use bytes for filenames. Only specific programs 
used to fix a broken system (eg. convmv program), a backup program, etc. 
should use bytes. So the "default" type (type and not charset) for filenames 
should be str in Python3.

If my patch is applied, 2to3 has to replace getcwdu() with getcwd(). 
That's all.

> Not necessarily as part of this patch, but somewhere related?  I 
> don't know what they would do, but it does seem quite likely that code
> which was previously correct under 2.6 (using bytes) would suddenly be
> mixing bytes and unicode with these APIs.

It looks like 2to3 converts all text '...' or u'...' to unicode (str). So 
converted programs will use str for filenames.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Victor Stinner
On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> >> Change the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: .
> >
> > Exactly. Seems like the best solution to me, despite your polemics.
>
> Martin, I don't understand why you are in favor of storing raw bytes
> encoded as Latin-1 in Unicode string objects, which clearly gives rise
> to mojibake. In the past you have always been staunchly opposed to API
> changes or practices that could lead to mojibake (and you had me quite
> convinced).

If I understood correctly, the goal of Python3 is the clear *separation* of 
bytes and characters. Storing bytes in Unicode is practical because it doesn't 
require changing the existing code, but it doesn't fix the problem: it just 
moves the problems, which will be raised later.

I didn't get an answer to my question: what is the result  + ? I guess that the result is 
 instead of raising an error 
(invalid types). So again: why introduce a new type instead of reusing 
existing Python types?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


[Python-3000] Filename: unicode normalization

2008-09-30 Thread Victor Stinner
Since it's hard to follow the filename thread on two mailing lists, I'm 
starting a new thread only on python-3000 about unicode normalization of 
filenames.

Bad news: it looks like Linux doesn't normalize filenames. So if you used NFC 
to create a file, you have to reuse NFC to open your file (and the same for 
NFD).

Python2 example to create files in the different forms:
>>> name=u'xäx'
>>> from unicodedata import normalize
>>> open(u'NFD-' + normalize('NFD', name), 'w').close()
>>> open(u'NFC-' + normalize('NFC', name), 'w').close()
>>> open(u'NFKC-' + normalize('NFKC', name), 'w').close()
>>> open(u'NFKD-' + normalize('NFKD', name), 'w').close()
>>> import os
>>> os.listdir('.')
['NFD-xa\xcc\x88x', 'NFC-x\xc3\xa4x', 'NFKC-x\xc3\xa4x', 'NFKD-xa\xcc\x88x']
>>> os.listdir(u'.')
[u'NFD-xa\u0308x', u'NFC-x\xe4x', u'NFKC-x\xe4x', u'NFKD-xa\u0308x']

Directory listing using Python3:
>>> import os
>>> [ name.encode('utf-8') for name in  os.listdir('.') ]
[b'NFD-xa\xcc\x88x', b'NFC-x\xc3\xa4x', b'NFKC-x\xc3\xa4x', 
b'NFKD-xa\xcc\x88x']
>>> os.listdir('.')
['NFD-xäx', 'NFC-xäx', 'NFKC-xäx', 'NFKD-xäx']

Same results, correct. Then try to open files:
>>> open(normalize('NFC', 'NFC-xäx')).close()
>>> open(normalize('NFD', 'NFC-xäx')).close()
IOError: [Errno 2] No such file or directory: 'NFC-xäx'
>>> open(normalize('NFD', 'NFD-xäx')).close()
>>> open(normalize('NFC', 'NFD-xäx')).close()
IOError: [Errno 2] No such file or directory: 'NFD-xäx'

If the user chooses a result from os.listdir(): no problem (if he has good 
eyes and he's able to find the difference between 'xäx' (NFD) and 'xäx' 
(NFC) :-D).

If the user enters the filename using the keyboard (on the command line or a 
GUI dialog), you have to hope that the keyboard input uses the same 
normalization form as the one used to encode the filename...
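
A short sketch of the mismatch and one possible way to compare names. same_filename() is a hypothetical helper for the illustration, not an existing function; it only compares strings and does not look at the disk.

```python
from unicodedata import normalize

name = 'x\xe4x'                  # 'xäx' with a precomposed ä
nfc = normalize('NFC', name)
nfd = normalize('NFD', name)

print(nfc == nfd)                                  # different code points
print(nfc.encode('utf-8'), nfd.encode('utf-8'))    # different bytes on disk

def same_filename(a, b, form='NFC'):
    """Compare two names after normalizing both to the same form."""
    return normalize(form, a) == normalize(form, b)

print(same_filename(nfc, nfd))
```

To open a file typed by the user, a program could look for a match like this among the names returned by os.listdir() and then use the listed name unchanged.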

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Victor Stinner
On Wednesday 01 October 2008 00:28:22, Martin v. Löwis wrote:
> I don't think we will manage to release Python 3.0 this year if that
> change is to be implemented. And then, I don't think the release manager
> will agree to such a delay.

The minimum change is to disallow bytes/str mix:
 - os.listdir(unicode)->unicode and ignore invalid files
   (current behaviour is to return unicode and bytes)
 - os.readlink(unicode)->unicode or raise an error
   (current behaviour is to return unicode or bytes)
 - remove os.getcwdu() (use its code -which is better- for getcwd) 
   and fix the test_unicode_file.py

The listdir() change (ignore invalid filenames) is important to avoid strange 
bugs in os.path.*(), glob.*() or when displaying a filename.

I can generate a specific patch for these issues. It's just a subset of my 
last patch.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-10-01 Thread Victor Stinner
On Wednesday 01 October 2008 04:06:25, [EMAIL PROTECTED] wrote:
> b = gtk.Button(u"\u/hello/world")
>
> which emits this message:
> TypeError: OGtkButton.__init__() argument 1 must be string without
> null bytes or None, not unicode
>
> SQLite has a similar problem with NULLs, and I'm definitely sticking
> paths in there, too.

I think that you can say "all C libraries".

Would it be possible to convert the encoded string to bytes just before calling 
Gtk? (job done by some Python internals, not as an explicit conversion)

I don't know if it would help the discussion, but Java uses its own modified 
UTF-8 encoding:
 * NUL byte is encoded as 0xc0 0x80 instead of 0x00
 * Java doesn't support unicode > 0x (boh!)
http://java.sun.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8
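
The NUL rule alone can be sketched in Python. This is an illustration of that single rule, not an implementation of Java's full format; both function names are made up for the example.

```python
def encode_nul_safe(text):
    """Encode to UTF-8, writing NUL as the two bytes 0xC0 0x80."""
    return text.encode('utf-8').replace(b'\x00', b'\xc0\x80')

def decode_nul_safe(data):
    """Reverse encode_nul_safe(); 0xC0 never occurs in valid UTF-8."""
    return data.replace(b'\xc0\x80', b'\x00').decode('utf-8')

encoded = encode_nul_safe('a\x00b')
print(encoded)
print(decode_nul_safe(encoded) == 'a\x00b')
```

The encoded form contains no 0x00 byte, so it can pass through C functions that expect NUL-terminated strings.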

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


[Python-3000] PEP: Python3 and UnicodeDecodeError

2008-10-02 Thread Victor Stinner
This is a PEP describing the behaviour of Python3 on UnicodeDecodeError. It's 
a *draft*, don't hesitate to comment on it. This document supposes that my 
patch to allow bytes filenames is accepted, which is not the case today.

While I was writing this document I found potential problems in Python3. So 
here is a TODO list (things to be checked):

FIXME: PyUnicode_DecodeFSDefaultAndSize(): errors="replace"!
FIXME: import.c uses ASCII if the default file system encoding is unknown,
   whereas other functions use UTF-8
FIXME: Write a function in Python3 to convert a bytes filename to a nice
   string
FIXME: When is bytearray accepted or not?
FIXME: Allow bytes/str mix for shutil.copy*()? Will the ignore callback get
   bytes or unicode?
FIXME: Use a shorter title for this PEP :-)

Can anyone write a section about bytes encoding in Unicode using escape 
sequence?

What is the best tool to work on a PEP? I hate email threads, and I would 
prefer SVN / Mercurial / anything else.
---

Title: Python3 and UnicodeDecodeError for the command line, 
   environment variables and filenames

Introduction
============

Python3 does its best to give you text as valid unicode character strings.
When it hits an invalid bytes sequence (according to the charset used), it has
two choices: drop the value or raise a UnicodeDecodeError. This document
presents the behaviour of Python3 for the command line, environment variables
and filenames.

Example of an invalid bytes sequence: ::

    >>> str(b'\xff', 'utf8')
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xff (...)

whereas the same byte sequence is valid in another charset like ISO-8859-1: ::

    >>> str(b'\xff', 'iso-8859-1')
    'ÿ'


Default encoding
================

Python uses "UTF-8" as the default Unicode encoding. You can read the default
charset using sys.getdefaultencoding(). The "default encoding" is used by
PyUnicode_FromStringAndSize().

A function sys.setdefaultencoding() exists, but it raises a ValueError for
any charset other than UTF-8 since the charset is hardcoded in
PyUnicode_FromStringAndSize().


Command line
============

Python creates a nice unicode list for sys.argv using mbstowcs(): ::

    $ ./python -c 'import sys; print(sys.argv)' 'Ho hé !'
    ['-c', 'Ho hé !']

On Linux, mbstowcs() uses the LC_CTYPE environment variable to choose the
encoding. On an invalid bytes sequence, Python quits directly with an exit
code 1. Example with UTF-8 locale: ::

    $ python3.0 $(echo -e 'invalid:\xff')
    Could not convert argument 1 to string


Environment variables
=====================

Python uses "_wenviron" on Windows, which contains unicode (UTF-16-LE)
strings.  On other OS, it uses the "environ" variable and the UTF-8 charset. It
drops a variable if its key or value is not convertible to unicode.
Example: ::

    env -i HOME=/home/my PATH=$(echo -e "\xff") python
    >>> import os; list(os.environ.items())
    [('HOME', '/home/my')]

Both keys and values are unicode strings. Empty keys and/or values are allowed.


Filenames
=========

Introduction
------------

Python2 uses byte filenames everywhere, but it was also possible to use
unicode filenames. Examples:
 - os.getcwd() gives bytes whereas os.getcwdu() always returns unicode
 - os.listdir(unicode) creates bytes or unicode filenames (fallback to bytes
   on UnicodeDecodeError), os.readlink() has the same behaviour
 - glob.glob() converts the unicode pattern to bytes, and so creates bytes
   filenames
 - open() supports bytes and unicode

Since listdir() mixes bytes and unicode, you are not able to manipulate
filenames easily: ::

    >>> path=u'.'
    >>> for name in os.listdir(path):
    ...  print repr(name)
    ...  print repr(os.path.join(path, name))
    ...
    u'valid'
    u'./valid'
    'invalid\xff'
    Traceback (most recent call last):
      ...
      File "/usr/lib/python2.5/posixpath.py", line 65, in join
        path += '/' + b
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff (...)

Python3 supports both types, bytes and unicode, but disallows mixing them. If
you ask for unicode, you will always get unicode or an exception is raised.

You should only use unicode filenames, except if you are writing a program
fixing file system encoding, a backup tool or your users are unable to fix
their broken system.

Windows
-------

Microsoft Windows since Windows 95 only uses Unicode (UTF-16-LE) filenames.
So you should only use unicode filenames.

Non Windows (POSIX)
-------------------

POSIX OSes like Linux use bytes for historical reasons. In the best case, all
filenames will be encoded as valid UTF-8 strings and Python creates valid
unicode strings. But since system calls use bytes, the file system may
return an invalid filename, or a program can create a file with an invalid
filename.

An invalid filename is a string which can not be decoded to unicode using the
default file system encoding (which is UTF-8 most of the time).

A robust program have to use only the bytes t

Re: [Python-3000] PEP: Python3 and UnicodeDecodeError

2008-10-02 Thread Victor Stinner
On Thursday 02 October 2008 14:07:50, M.-A. Lemburg wrote:
> On 2008-10-02 13:50, Victor Stinner wrote:
> > This is a PEP (...)
>
> The PEP doesn't appear to address any potential changes. Wouldn't
> it be better to add such information to the Python3 documentation
> itself ?!

I don't know the right name of this document. Yeah, it may move to Doc/ in 
Python3 source code.

> > Example of an invalid bytes sequence: ::
> > >>> str(b'\xff', 'utf8')
> > UnicodeDecodeError
> >
> > >>> str(b'\xff', 'iso-8859-1')
> > 'ÿ'
>
> You have left out all the options you have by using a different
> error handling mechanism (using a third parameter to str()), e.g.
> 'replace', 'ignore', etc.

Yes, I can explain why replace and ignore can *not* be used in this case. If 
you use ignore or replace, filenames will be valid unicode strings, but you 
will be unable to open / copy / remove your file.
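
A small sketch of why: the decoded name no longer encodes back to the bytes stored on disk. The filename below is a made-up example.

```python
raw = b'invalid\xff'                       # hypothetical bytes filename

replaced = str(raw, 'utf-8', 'replace')    # 'invalid\ufffd'
ignored = str(raw, 'utf-8', 'ignore')      # 'invalid'

# Neither result encodes back to the original bytes, so the name
# passed to open() would not match the file on disk.
print(replaced.encode('utf-8') == raw)
print(ignored.encode('utf-8') == raw)
```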

> > Default encoding
> > 
> >
> > Python uses "UTF-8" as the default Unicode encoding. You can read the
> > default charset using sys.getdefaultencoding(). The "default encoding" is
> > used by PyUnicode_FromStringAndSize().
>
> Not only there: the C API makes various assumptions on the default
> encoding as well. We should probably drop the term "default encoding"
> altogether and replace it with "utf-8".

The concept of "default encoding" is unclear in Python. Yes, we might remove 
sys.getdefaultencoding() and write that PyUnicode_FromStringAndSize() uses 
the UTF-8 charset.

> sys.setdefaultencoding() should probably be dropped altogether from
> Python3.

Yes.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


[Python-3000] Issues about Python script encoding

2008-10-02 Thread Victor Stinner
Python3 tracebacks have bugs that make debugging harder:

[Py3k] line number is wrong after encoding declaration
   http://bugs.python.org/issue2384

PyTraceBack_Print() doesn't respect # coding: xxx header
   http://bugs.python.org/issue3975

Both issues have a patch + testcase.

--

About the coding header, IDLE doesn't read #coding: header. Here is a fix (use 
tokenize.detect_encoding):
http://bugs.python.org/issue4008

And finally, two more patches for encoding detection in:
http://bugs.python.org/issue4016
 -> use tokenize.detect_encoding() in linecache (instead of duplicate,
incomplete code to detect the encoding, eg. no UTF-8 BOM support)
 -> reuse codecs.BOM_UTF8 in tokenize

That's all for today :)

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] PEP: Python3 and UnicodeDecodeError

2008-10-02 Thread Victor Stinner
On Thursday 02 October 2008 14:31:06, you wrote:
> Victor - the Python wiki is also one of the easiest places to work on
> early PEP drafts. See
> http://wiki.python.org/moin/PythonEnhancementProposals.

Ok, I converted the document to the wiki syntax:
http://wiki.python.org/moin/Python3UnicodeDecodeError

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] Issues about Python script encoding

2008-10-02 Thread Victor Stinner
On Thursday 02 October 2008 22:32:43, Martin v. Löwis wrote:
> > About the coding header, IDLE doesn't read #coding: header. Here is a fix
> > (use tokenize.detect_encoding):
> > http://bugs.python.org/issue4008
>
> Are you really sure about that? It did in the past.

Try IDLE in an ASCII terminal:
   python Tools/scripts/idle idle-3.0rc1-quits-when-run.py

(the .py file is attached to the issue).

IDLE uses open(filename, 'r') without setting the encoding. The io module is 
not aware of the #coding: header.

The issue is maybe related to the terminal locale since IDLE uses a "locale 
encoding" (import IOBinding; IOBinding.encoding) which is marked 
as "deprecated" in IDLE source code.

(We should use the bug tracker to discuss this issue)

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] Proposed Python 3.0 schedule

2008-10-07 Thread Victor Stinner
Hi,

First of all, please read my document:
http://wiki.python.org/moin/Python3UnicodeDecodeError

I moved the document to a public wiki to allow anyone to edit it!

On Tuesday 07 October 2008 05:22:09, James Y Knight wrote:
> On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
> > I'm not sure we do. Correct me if I'm wrong, but the "big ticket",
> > issue bytes/unicode filepaths, has been resolved.

Python3 now accepts bytes for os.listdir(), open() (io.open()), os.unlink(), 
os.path.*(), etc. But it's not enough to say that Python3 can use bytes 
everywhere. It would take months or *years* to fix all issues related to 
bytes and unicode. Remember, this task started in 2000 with Python *2.0* 
(creation of the unicode type).
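
A small sketch of what "accepts bytes" means for os.listdir(): the type of
the argument selects the type of the result. The file name is arbitrary, and
os.fsencode() is an assumption of this example (it was added to Python
later than this discussion):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    open(os.path.join(dirname, "example.txt"), "w").close()

    # str argument -> str results (names are decoded)
    names_text = os.listdir(dirname)

    # bytes argument -> bytes results (names are returned undecoded)
    names_bytes = os.listdir(os.fsencode(dirname))
```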

> Well, if you mean that the resolution decided upon is to "simply"
> allow access to all system APIs using either byte or unicode strings,
> then it seems to me that there's a rather large amount of work left to
> do...

If you know of a problem, open a ticket and propose a solution. It's not 
possible to list all new problems since we don't know them yet :-)

>   - Having os.getcwdb isn't much use when you can't even run python in
> the first place when the current directory has "bad" bytes in it.

My python3.0 works correctly in a directory with an invalid name. What is your 
OS / locale / Python version? Please create a ticket if needed.

>   - I'd think "find . -type f -print0 | xargs -0 python -c 'pass'"
> ought to work (with files with "bad" bytes being returned by find),

First, fix your home directory :-) There are good tools (convmv?) to fix 
invalid filenames.

> which means that Python shouldn't blow up and refuse to start when
> there's a non-properly-encoding argv ("Could not convert argument 1 to
> string" and exiting isn't appropriate behavior)

Why not? Breaking compatibility in order to refuse invalid byte sequences is 
a good idea. You can still use the command line, an input file or a GUI to 
read raw byte sequences.

>   - Of course, just being able to start the interpreter isn't quite
> enough: you'll want to be able to access that argument list too,
> somehow (add sys.argvb?).

If we create sys.argvb, what should be done if the creation of sys.argv 
fails? Would sys.argv be empty or unset? Or would some values be removed (so 
that argv[2] becomes argv[1])? I think that many programs assume that 
sys.argv exists and "is valid". If you introduce a special case (sometimes 
sys.argv doesn't exist or is truncated!?), it will introduce new issues.

>   - There's no os.environb for bytewise access to the environment.
> Seems important.

It would be strange if you could put a bytes variable into os.environb while 
os.environ did not get the key. I know two major uses of the environment:
 (1) read a variable in Python
 (2) set a variable for a child process 

(1) can be done with os.getenv(), which returns None if the variable (key or 
value) is an invalid byte sequence.

(2) can be done with subprocess.Popen(). subprocess doesn't support bytes yet 
but I wrote patches: #4035 and #4036.

>   - Isn't it a potential security issue that " 'WHATEVER' in
> os.environ" can return False if WHATEVER had some "bad" bytes in it,
> but spawning a subprocess actually will include WHATEVER in the
> subprocess's environment?

Yes. Python should remove the key while creating os.environ.

> - Shouldn't this work? subprocess.call(b'/bin/echo')

Yes. Most programs (at least on Linux and Mac) support bytes, so you 
should be able to use bytes arguments in their command lines; see issues 
#4035 and #4036.

>   - I suppose sys.path should handle bytestrings on the path, and
> should be populated using the bytes-version of os.environ so that
> PYTHONPATH gets read in properly. Which of course implies that all the
> importers need to handle byte filenames.

If your file system is broken, rename your directory but don't introduce a 
special case for sys.path. 

>   - zipfile.ZipFile(b'whatever.zip') doesn't work.

Since zipfile uses bytes in its file structure, zipfile should accept bytes. 
But the right question is: should this issue block Python3 or can we wait for 
Python 3.1 (maybe 3.0.1)?

--

People want to try the new Python version! Python3 introduces amazing new 
features like "keyword-only arguments". The bytes/unicode problem is old and 
only affects broken systems.

Windows (90% of the computers in the world?) only uses characters for the 
filenames, environment and command line. Mac and Linux use UTF-8 most of the 
time, and slowly everything speaks UTF-8! Python3 should not be delayed 
because of this problem.

About the ini

Re: [Python-3000] No rc2 tonight

2008-10-17 Thread Victor Stinner
> 1210 imaplib does not run under Python 3
> 3727 poplib module broken by str to unicode conversion
>- These both have patches that need review
> 3714 nntplib module broken by str to unicode conversion
>- This issue seems pretty far from resolution

I worked on these modules. First I tried to use unicode everywhere but then I 
realized that each email can use a different encoding. Using a fixed charset 
is meaningless, which is why I wrote new patches (for poplib and imaplib) to 
return emails (and other status messages) as byte strings.

Since nntplib also transports emails, I think that my current patch 
(nntplib_unicode.patch) is invalid and I should write another one using 
bytes. If I don't have time to fix it quickly, please leave 3714 in the 
"deferred blocker" state.

Barry: you closed issue #4125, but the specified revision number is the 
commit fixing issue #3988. runtests.sh has to use the -bb flag!

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


[Python-3000] email libraries: use byte or unicode strings?

2008-10-28 Thread Victor Stinner
Hi,

I worked on poplib, imaplib and nntplib to fix them in Python3. First I tried 
to use unicode everywhere because I love unicode and I don't want to care 
about the charset. So I used a default charset (ISO-8859-1), but it doesn't 
work because each email can use a different charset. The charset is written 
in the email header but I don't want to hack the libraries to parse the 
headers: poplib should only support the POP3 protocol, email parsing is 
complex and should be done by another module (later, after fetching the 
email).
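
To make the split concrete, here is a minimal sketch of "fetch bytes first,
parse later". The raw message is invented for the example, and
email.message_from_bytes() is an assumption: it is the API of later Python 3
releases, not something that existed when this mail was written.

```python
import email

# What a POP3/IMAP/NNTP client would hand back: raw bytes, charset unknown.
raw = (b"Content-Type: text/plain; charset=iso-8859-1\r\n"
       b"Content-Transfer-Encoding: 8bit\r\n"
       b"Subject: test\r\n"
       b"\r\n"
       b"caf\xe9\r\n")

# Only the email parser looks at the headers to find the charset.
msg = email.message_from_bytes(raw)
charset = msg.get_content_charset()
text = msg.get_payload(decode=True).decode(charset)
```

The protocol library never needs to guess a charset; the decision is made
per message, after the fetch.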

Current status: poplib, imaplib and nntplib are broken

--

I wrote patches for poplib and imaplib to use only byte strings. 
I "backported" poplib tests from python trunk and I used different POP3 and 
IMAP servers to test the libraries.

Can anyone review my patches? Issues #1210 and #3727.

--

I don't know the NNTP protocol and so I'm unable to test it. But nntplib 
should also use byte strings only.

Note: imaplib and nntplib have no tests :-(

--

What about smtplib or smtpd?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] close() on open(fd, closefd=False)

2008-10-31 Thread Victor Stinner
> Rightnow close() doesn't do anything and you can still write 
> or read after close(). This behavior is surprising to the user.
> I like to change close() to set the internal fd attribute 
> to -1 (meaning close) but keep the fd open.

Let's take an example:
---
passwd = open('/etc/passwd', 'rb')

readonly = open(passwd.fileno(), closefd=False)
print("readonly: {0!r}".format(readonly.readline()))
# close the readonly stream, but not passwd
readonly.close()
try:
    readonly.readline()
    print("ERROR: read() on a closed file!")
except Exception as err:
    # Expected behaviour
    pass

# passwd is not closed
print("passwd: {0!r}".format(passwd.readline()))
passwd.close()
---

The current behaviour is to accept read/write on a closed file (and 
passwd.readline() works). Sorry Benjamin, but it's not a feature: it's a 
bug :-)

I wrote a patch to implement your suggestion, crys, and it works as expected: 
when the readonly stream is closed, reading is blocked but passwd.readline() 
still works. I will attach my patch to issue 4233.
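
For reference, a self-contained sketch of the behaviour described above,
as it can be checked on a current Python 3. It uses a temporary file and a
raw descriptor instead of /etc/passwd; the variable names are only for this
example:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"first\nsecond\n")
    os.lseek(fd, 0, os.SEEK_SET)

    # Wrap the descriptor without taking ownership of it.
    wrapper = open(fd, "rb", closefd=False)
    first = wrapper.readline()
    wrapper.close()

    # The wrapper is closed: reading through it must fail...
    try:
        wrapper.readline()
        read_after_close = "allowed"
    except ValueError:
        read_after_close = "refused"

    # ...but the descriptor itself is still open and usable.
    os.lseek(fd, 0, os.SEEK_SET)
    still_readable = os.read(fd, 5)
finally:
    os.close(fd)
    os.remove(path)
```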

Victor


Re: [Python-3000] bug in idle on rc1

2008-11-05 Thread Victor Stinner
On Monday 03 November 2008 12:12:57, [EMAIL PROTECTED] wrote:
> in run.py in Python_30\Lib\idlelib
> the line:sockthread.set_daemon(True)
> has to be changed to:sockthread.setDaemon(True)

It's already fixed in the Python trunk:
   http://svn.python.org/view?rev=66518&view=rev

That's why we are all waiting on Barry for Python 3.0rc2 :-)

Thanks for the report, but next time, please use the tracker:
   http://bugs.python.org/

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] [Python-Dev] RELEASED Python 3.0rc2

2008-11-07 Thread Victor Stinner
Hi,

Great job Barry and all contributors who fixed the "last" bugs ;-)

On Friday 07 November 2008 04:53:35, Barry Warsaw wrote:
> We expect only critical bugs to be fixed between now and the 
> final release, currently planned for 03-Dec-2008.

The document "What's new in Python 3.0" should be updated:
   http://docs.python.org/dev/3.0/whatsnew/3.0.html

"PEP 352: Exceptions must derive from BaseException. This is the root of the 
exception hierarchy."
   I prefer to derive from Exception to be able to use 
   "except Exception as err: ..." which catches neither SystemExit 
   nor KeyboardInterrupt.

"PEP 3134: Exception chaining. (The __context__ feature from the PEP hasn’t 
been implemented yet in 3.0a2.)"
   The feature is now implemented!

"PEP 237: long renamed to int. (...) sys.maxint was also removed since the int 
type has no maximum value anymore."
   What about the new sys.maxsize constant? Oh, it's written at the bottom, 
   "Removed sys.maxint. Use sys.maxsize." Both paragraphs should be merged.

"Optimizations (...) 33% slower (...) we expect to be optimizing string and 
integer operations significantly before the final 3.0 release!"
   I don't expect "significant" changes before the final release. I worked on
   some patches for the int type (use base 2^30 instead of 2^15, GMP, etc.),
   but all of them optimize large integers (more than 1 digit, or more than
   20 digits) whereas most integers in Python are very small.

   About str=>unicode, I also don't expect changes. One character is now 
   4 bytes (or 2 bytes), whereas Python2 used 1 byte. This introduces an
   overhead. Most functions at the C level use a conversion from a byte
   string (ASCII) to Unicode (eg. PyErr_SetString). We should use wide
   strings directly (eg. PyErr_SetWideString?).

"Porting to Python 3.0"
   This section is very small and gives little information. There is nothing
   about 2to3 (just two references in the document). I also read somewhere
   that someone wrote a document explaining how to port a C extension to
   Python3.

What about a link to the "What's new in Python 2.6" document? Most people are 
still using Python 2.4 or 2.5. And Python3 is Python 2.5 +  + .
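
The remark above about deriving from Exception rather than BaseException
can be sketched as follows. AppError and classify() are names invented for
this example:

```python
class AppError(Exception):
    """Application error: derives from Exception, not BaseException."""


def classify(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except Exception:
        return "except Exception"
    except BaseException:
        # SystemExit and KeyboardInterrupt end up here: they derive
        # directly from BaseException.
        return "except BaseException"
```

A generic "except Exception" handler therefore catches application errors
while leaving Ctrl+C and sys.exit() alone.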

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] None in Comparisons

2008-11-11 Thread Victor Stinner
On Tuesday 11 November 2008 15:21:03, M.-A. Lemburg wrote:
> Because None is already special, has had this feature for a very
> long time (...)

Yeah, Python3 breaks compatibility by removing old dummy behaviour like 
comparison between bytes and characters, or between an integer and None ;-)

I like the new behaviour, it helps to detect bugs earlier! I hope that 
the -bb option will be enabled by default in Python 2.7 :-)

You can use an explicit comparison to None as a workaround for your problem:
   (x is None) or (x < y)
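
A tiny sketch of that workaround; the function name none_first_less() is
made up, and treating None as smaller than everything is the assumption
being encoded:

```python
def none_first_less(x, y):
    """Return True if x sorts before y, treating None as the smallest value."""
    return (x is None) or (x < y)


# In Python 3 the direct comparison is an error rather than a silent result.
try:
    None < 1
    direct_comparison = "allowed"
except TypeError:
    direct_comparison = "TypeError"
```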

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


[Python-3000] Status of the email package ? (or: email package and unicode)

2008-11-12 Thread Victor Stinner
Hi,

poplib, imaplib and nntplib are fixed in Python 3.0rc2, cool.

I tested the smtplib module. It looks like .sendmail() requires 
an ASCII message (7 bits).

I tried to use the email package to encode my message. But the problem is that 
I'm unable to use characters outside the ASCII charset! See the 
reported bugs at:
   http://bugs.python.org/issue4306

Before Python 3.0 final, we have to test the email package with unicode 
characters! I wrote two small patches, one of which includes a little test :-)
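
For comparison, this is roughly what the use case looks like once the email
package handles unicode. It is a sketch against a current Python 3, not
against 3.0rc2, and the message text is invented:

```python
import email
from email.header import Header
from email.mime.text import MIMEText

# Body and subject both contain non-ASCII characters.
msg = MIMEText("Voilà un café", "plain", "utf-8")
msg["Subject"] = Header("Réunion", "utf-8")

# The serialized message is pure 7-bit ASCII, as .sendmail() expects.
wire = msg.as_string()
wire_bytes = wire.encode("ascii")

# Parsing it back recovers the original text.
parsed = email.message_from_string(wire)
body = parsed.get_payload(decode=True).decode("utf-8")
```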

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] 2.6.1 and 3.0

2008-11-18 Thread Victor Stinner
On Tuesday 18 November 2008 11:03:02, Facundo Batista wrote:
> 2008/11/17 Barry Warsaw <[EMAIL PROTECTED]>:
> > Martin suggests, and I agree, that we should release Python 3.0 final and
> > 2.6.1 at the same time.  Makes sense to me.  That would mean that Python
> > 2.6.1 should be ready on 03-Dec (well, if Python 3.0 is ready then!).
>
> 2.6.1 only two months after 2.6? Why so quickly?

Release Early, Release Often?

I love releases :-) I don't like waiting months to see the bugfixes applied 
everywhere.

Victor


Re: [Python-3000] Possible py3k problem.

2008-11-19 Thread Victor Stinner
> Attached program works with

GSL is needed. Debian package: libgsl0-dev

dump.py works correctly on my computer:
- Debian Sid
- python 3.0 trunk
- i386

Problem specific to x86_64?

Where is the issue? :-)

Victor


Re: [Python-3000] Possible py3k problem.

2008-11-19 Thread Victor Stinner
On Wednesday 19 November 2008 10:21:16, Victor Stinner wrote:
> > Attached program works with
>
> GSL is needed. Debian package: libgsl0-dev
>
> dump.py works correctly on computer:

Ooops, "./python dump.py" is ok but "./python dump.py 1" does crash (on i386 
and x86_64).

On i386, ffi_closure_SYSV_inner() crashes at "cif = closure->cif;" because 
closure is NULL.

Victor


Re: [Python-3000] Possible py3k problem.

2008-11-19 Thread Victor Stinner
On Wednesday 19 November 2008 15:39:43, Martin (gzlist) wrote:
> This is covered in the documentation, isn't it?
>
> 
>
> Important note for callback functions:
>
> Make sure you keep references to CFUNCTYPE objects as long as they are
> used from C code. ctypes doesn't, and if you don't, they may be
> garbage collected, crashing your program when a callback is made.

Oh yes, I remember this problem... It took me hours/days to understand it.

Is there a FAQ for ctypes, listing the most common problems? The bug is in 
the documentation :-)
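
A minimal sketch of the rule quoted above: keep the CFUNCTYPE object alive
for as long as C code may call it, for instance by storing it on the object
that owns the callback. The Handler class and the CALLBACK prototype are
invented for this example; no C library is involved, the callback is simply
invoked through ctypes:

```python
import ctypes

# Prototype of the callback: int callback(int)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


class Handler:
    def __init__(self):
        # Store the CFUNCTYPE object on self. If it were only a temporary
        # passed to a C function, it could be garbage collected while the
        # C side still holds the function pointer.
        self._c_callback = CALLBACK(self._on_event)

    def _on_event(self, value):
        return value * 2

    @property
    def c_callback(self):
        """The object to hand to C code; valid as long as self is alive."""
        return self._c_callback


handler = Handler()
result = handler.c_callback(21)
```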

Victor


Re: [Python-3000] unicode_test

2008-11-24 Thread Victor Stinner
On Monday 24 November 2008 12:13:54, Ali art wrote:
> (...) IDLE 3.0rc3 -> File -> New Window -> i wrote "print('çğışü')" without
> qutes-> File -> Save -> Python30 -> i gave file name "Unicode_test" without
> qutes-> and Run -> Run Module -> it gives error "invalid character in
> identifier"

This bug may be related to:
   http://bugs.python.org/issue4323

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com


[Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes

2007-08-08 Thread Victor Stinner
Hi,

I heard Guido's request to fix the last py3k-struni bugs. I downloaded the 
subversion trunk and started to work on the ctypes tests.

The problem is that ctypes c_char (and c_char_p) create a unicode string 
instead of a byte string. I attached a proposed patch to change this 
behaviour (use bytes for c_char).

So in the next example, it will display 'bytes' and not 'str':
  from ctypes import c_buffer, c_char
  buf = c_buffer("abcdef")
  print (type(buf[0]))

Other behaviour changes:
 - repr(c_char) adds a "b"
   eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')"
 - bytes is mutable whereas str is not: 
   this may break some modules based on ctypes
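
For readers on a current Python 3, the proposed behaviour can be checked
like this. Note that the sketch reflects today's ctypes (where the buffer
must be created from bytes), not the 2007 py3k-struni branch:

```python
from ctypes import c_char, create_string_buffer

buf = create_string_buffer(b"abcdef")   # 6 bytes plus a trailing NUL
first = buf[0]                          # indexing yields a 1-byte bytes object
char_repr = repr(c_char(b"x"))          # the repr carries the "b" prefix
```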

Victor Stinner aka haypo
http://hachoir.org/
Index: Modules/_ctypes/cfield.c
===
--- Modules/_ctypes/cfield.c	(revision 56831)
+++ Modules/_ctypes/cfield.c	(working copy)
@@ -1156,7 +1156,7 @@
 static PyObject *
 c_get(void *ptr, Py_ssize_t size)
 {
-	return PyUnicode_FromStringAndSize((char *)ptr, 1);
+	return PyBytes_FromStringAndSize((char *)ptr, 1);
 }
 
 #ifdef CTYPES_UNICODE
Index: Lib/ctypes/test/test_buffers.py
===
--- Lib/ctypes/test/test_buffers.py	(revision 56831)
+++ Lib/ctypes/test/test_buffers.py	(working copy)
@@ -7,21 +7,21 @@
         b = create_string_buffer(32)
         self.failUnlessEqual(len(b), 32)
         self.failUnlessEqual(sizeof(b), 32 * sizeof(c_char))
-        self.failUnless(type(b[0]) is str)
+        self.failUnless(type(b[0]) is bytes)
 
         b = create_string_buffer("abc")
         self.failUnlessEqual(len(b), 4) # trailing nul char
         self.failUnlessEqual(sizeof(b), 4 * sizeof(c_char))
-        self.failUnless(type(b[0]) is str)
-        self.failUnlessEqual(b[0], "a")
+        self.failUnless(type(b[0]) is bytes)
+        self.failUnlessEqual(b[0], b"a")
         self.failUnlessEqual(b[:], "abc\0")
 
     def test_string_conversion(self):
         b = create_string_buffer("abc")
         self.failUnlessEqual(len(b), 4) # trailing nul char
         self.failUnlessEqual(sizeof(b), 4 * sizeof(c_char))
-        self.failUnless(type(b[0]) is str)
-        self.failUnlessEqual(b[0], "a")
+        self.failUnless(type(b[0]) is bytes)
+        self.failUnlessEqual(b[0], b"a")
         self.failUnlessEqual(b[:], "abc\0")
 
         try:
Index: Lib/ctypes/test/test_arrays.py
===
--- Lib/ctypes/test/test_arrays.py	(revision 56831)
+++ Lib/ctypes/test/test_arrays.py	(working copy)
@@ -48,12 +48,12 @@
         # CharArray("abc")
         self.assertRaises(TypeError, CharArray, "abc")
 
-        self.failUnlessEqual(ca[0], "a")
-        self.failUnlessEqual(ca[1], "b")
-        self.failUnlessEqual(ca[2], "c")
-        self.failUnlessEqual(ca[-3], "a")
-        self.failUnlessEqual(ca[-2], "b")
-        self.failUnlessEqual(ca[-1], "c")
+        self.failUnlessEqual(ca[0], b"a")
+        self.failUnlessEqual(ca[1], b"b")
+        self.failUnlessEqual(ca[2], b"c")
+        self.failUnlessEqual(ca[-3], b"a")
+        self.failUnlessEqual(ca[-2], b"b")
+        self.failUnlessEqual(ca[-1], b"c")
 
         self.failUnlessEqual(len(ca), 3)
 
Index: Lib/ctypes/test/test_repr.py
===
--- Lib/ctypes/test/test_repr.py	(revision 56831)
+++ Lib/ctypes/test/test_repr.py	(working copy)
@@ -22,7 +22,7 @@
 self.failUnlessEqual("___


Re: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes

2007-08-08 Thread Victor Stinner
On Wednesday 08 August 2007 18:45:38 you wrote:
> Thanks! Would you mind submitting to SF and assigning to Thomas Heller
> (theller I think)?
>
> And update the wiki (http://wiki.python.org/moin/Py3kStrUniTests)

Thomas Heller did it. Thanks ;-)

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] tp_bytes and __bytes__ magic method

2007-08-08 Thread Victor Stinner
On Thursday 09 August 2007 00:22:51 Christian Heimes wrote:
> Hey Pythonistas!
>
> Victor Stinner just made a good point at #python. The py3k has no magic
> method and type slot for bytes.

And another problem: the mix of __str__ and __unicode__ methods.

class A:
  def __str__(self): return '__str__'

class B:
  def __str__(self): return '__str__'
  def __unicode__(self): return '__unicode__'

print (repr(str( A() )))  # display '__str__'
print (repr(str( B() )))  # display '__unicode__'


Proposition:

  __str__() -> str (2.x) becomes __bytes__() -> bytes (3000)
  __unicode__() -> unicode (2.x) becomes __str__() -> str (3000)
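
Under that proposal a class would carry both representations, roughly as in
this sketch. The Message class here is a toy invented for illustration, not
email.message.Message, and the choice of UTF-8 is an assumption:

```python
class Message:
    def __init__(self, subject):
        self.subject = subject

    def __str__(self):
        # Text representation, used by str()
        return "Subject: " + self.subject

    def __bytes__(self):
        # Byte representation, used by bytes()
        return str(self).encode("utf-8")


msg = Message("café")
as_text = str(msg)
as_bytes = bytes(msg)
```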

Victor Stinner aka haypo


Re: [Python-3000] tp_bytes and __bytes__ magic method

2007-08-08 Thread Victor Stinner
On Thursday 09 August 2007 00:54:47 Guido van Rossum wrote:
> On 8/8/07, Christian Heimes <[EMAIL PROTECTED]> wrote:
> > Victor Stinner just made a good point at #python. The py3k has no magic
> > method and type slot for bytes (...)
> > I can think of a bunch of use cases for a magic method.
>
> Such as?

I'm working on the email module and I guess that some __str__ methods should 
return bytes instead of str (and so should be renamed to __bytes__). Maybe 
the one of the Message class (Lib/email/message.py).

Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] fix email module for bytes/str

2007-08-08 Thread Victor Stinner
Hi,

I started to work on the email module, but I have trouble understanding 
whether a function should return bytes or str (because I don't know the 
email module).

Header.encode() -> bytes?
Message.as_string() -> bytes?
decode_header() -> list of (bytes, str|None) or (str, str|None)?
base64MIME.encode() -> bytes?

message_from_string() <- bytes?

Message.get_payload() -> bytes or str?

A charset name type is str, right?

---

Things to change to get bytes:
 - replace StringIO with BytesIO
 - add 'b' prefix, eg. '' becomes b''
 - replace "%s=%s" % (x, y) with b''.join((x, b'=', y))
   => is it the best method to concatenate bytes?

Problems (to port Python 2.x code to 3000):
 - When obj.lower() is used, I expect obj to be str but it's bytes
 - obj.strip() doesn't work when obj is a bytes object, it requires an
   argument but I don't know the right value! Maybe b'\n\r\v\t '?
 - iterating over a bytes object gives numbers and not bytes objects, eg.
  for c in b"small text":
     if re.match("(\n|\r)", c): ...
   Is it possible to use a 'bytes' regex? re.compile(b"x") raises an
   exception
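
For context, here is how the operations from the two lists above behave on
a current Python 3. This is a sketch of today's bytes type, not of the 2007
branch, where several of these calls were still missing; the %-formatting
line needs Python 3.5 or later:

```python
line = b"  Subject: Small Text\r\n"

stripped = line.strip()                    # no argument: strips ASCII whitespace
lowered = stripped.lower()                 # bytes has lower() as well
joined = b"".join((b"key", b"=", b"value"))  # concatenation with join()
formatted = b"%s=%s" % (b"key", b"value")    # %-formatting also works on bytes

as_ints = list(b"ab")                          # iteration yields integers
as_bytes = [b"ab"[i:i + 1] for i in range(2)]  # slicing yields bytes objects
```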

-- 
Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] bytes regular expression?

2007-08-08 Thread Victor Stinner
Hi,

Since Python 3000 regular expressions are now Unicode by default, how can I 
use a bytes regex? Very simplified example of my problem:
  import re
  print( re.sub("(.)", b"[\\1]", b'abc') )

This code fails with exception:
  File "(...)/py3k-struni/Lib/re.py", line 241, in _compile_repl
 p = _cache_repl.get(key)
  TypeError: unhashable type: 'bytes'

Does a "frozen bytes" (immutable) type exist, so that a cache can be used?
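
For comparison, on a current Python 3 the re module accepts bytes as long
as pattern, replacement and subject are all bytes; mixing str and bytes is
refused. A short sketch:

```python
import re

# All three arguments are bytes: this works and returns bytes.
result = re.sub(b"(.)", b"[\\1]", b"abc")

# A str pattern applied to a bytes subject is rejected.
try:
    re.search("b", b"abc")
    mixed = "allowed"
except TypeError:
    mixed = "TypeError"
```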

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] bytes regular expression?

2007-08-09 Thread Victor Stinner
Hi,

On Thursday 09 August 2007 06:07:12 Guido van Rossum wrote:
> A quick temporary hack is to use buffer(b'abc') instead. (buffer() is
> so incredibly broken that it lets you hash() even if the underlying
> object is broken. :-)

I prefer str8 which looks to be a good candidate for "frozenbytes" type.

> The correct solution is to fix the re library to avoid using hash()
> directly on the underlying data type altogether; that never had sound
> semantics (as proven by the buffer() hack above).

re module uses a dictionary to store compiled expressions and the key is a 
tuple (pattern, flags) where pattern is a bytes (str8) or str and flags is an 
int.

re module bugs:
 1. _compile() doesn't support bytes
 2. escape() doesn't support bytes

My attached patch fixes both bugs:
 - convert bytes to str8 in _compile() to be able to hash it
 - add a special version of escape() for bytes

I don't know the best method to build a bytes object in a for loop. In 
Python 2.x, the best method is to use a list() and ''.join(). Since bytes is 
mutable I chose to use append() and concatenation (a += b).
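
The same loop, written as a sketch for a current Python 3 where bytes is
immutable and the mutable type is called bytearray. escape_bytes() is a
name made up for this example; it mirrors the bytes branch of the patched
escape() and is not the re.escape() shipped with Python:

```python
import re

_ALNUM_BYTES = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def escape_bytes(pattern):
    """Escape every non-alphanumeric byte of a bytes pattern."""
    out = bytearray()
    for c in pattern:              # c is an int in range(256)
        if c in _ALNUM_BYTES:
            out.append(c)
        elif c == 0:
            out += b"\\000"        # NUL is written as an octal escape
        else:
            out.append(92)         # 92 is the backslash
            out.append(c)
    return bytes(out)
```

Accumulating into a bytearray and converting once at the end avoids
creating a new bytes object on every iteration.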

I also added a new unit test for the escape() function with a bytes argument.

You may not want to apply my patch directly. I don't know Python 3000 very 
well, nor the Python coding style. But my patch should help to fix the 
bugs ;-)

-

Why does the re module have code for Python < 2.2 (optional finditer() 
function)? Since the code is now specific to Python 3000, we should use new 
types like set (use a set for _alphanum instead of a dictionary) and 
functions like enumerate (in _escape for the str block).

Victor Stinner
http://hachoir.org/
Index: Lib/re.py
===
--- Lib/re.py	(revision 56838)
+++ Lib/re.py	(working copy)
@@ -177,6 +177,9 @@
 
 def compile(pattern, flags=0):
     "Compile a regular expression pattern, returning a pattern object."
+    if isinstance(pattern, bytes):
+        # Use str8 instead of bytes because bytes isn't hashable
+        pattern = str8(pattern)
     return _compile(pattern, flags)
 
 def purge():
@@ -193,18 +196,34 @@
     _alphanum[c] = 1
 del c
 
+_alphanum_bytes = set(b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890')
+
 def escape(pattern):
     "Escape all non-alphanumeric characters in pattern."
-    s = list(pattern)
-    alphanum = _alphanum
-    for i in range(len(pattern)):
-        c = pattern[i]
-        if c not in alphanum:
-            if c == "\000":
-                s[i] = "\\000"
+    if isinstance(pattern, bytes):
+        alphanum = _alphanum_bytes
+        s = b''
+        for c in pattern:
+            if c not in alphanum:
+                if not c:
+                    s += b"\\000"
+                else:
+                    s.append(92)
+                    s.append(c)
             else:
-                s[i] = "\\" + c
-    return pattern[:0].join(s)
+                s.append(c)
+        return s
+    else:
+        alphanum = _alphanum
+        s = list(pattern)
+        for i in range(len(pattern)):
+            c = pattern[i]
+            if c not in alphanum:
+                if c == "\000":
+                    s[i] = "\\000"
+                else:
+                    s[i] = "\\" + c
+        return ''.join(s)
 
 # 
 # internals
Index: Lib/test/test_re.py
===
--- Lib/test/test_re.py	(revision 56838)
+++ Lib/test/test_re.py	(working copy)
@@ -397,18 +397,32 @@
         self.assertEqual(re.search("\s(b)", " b").group(1), "b")
         self.assertEqual(re.search("a\s", "a ").group(0), "a ")
 
-    def test_re_escape(self):
-        p=""
-        for i in range(0, 256):
-            p = p + chr(i)
-            self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
-                             True)
-            self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
+    def _test_re_escape(self, use_bytes):
+        if use_bytes:
+            p=bytes()
+            for i in range(0, 256):
+                p.append(i)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
+                                 True)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
+        else:
+            p=""
+            for i in range(0, 256):
+                p = p + chr(i)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
+                                 True)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
 
         pat=re.compile(re.escape(p))
         self.as

Re: [Python-3000] bytes regular expression?

2007-08-09 Thread Victor Stinner
On Thursday 09 August 2007 17:40:58 I wrote:
> My attached patch fix both bugs:
>  - convert bytes to str8 in _compile() to be able to hash it
>  - add a special version of escape() for bytes

My first try was buggy for this code snippet:
   import re
   assert type(re.sub(b'', b'', b'')) is bytes
   assert type(re.sub(b'(x)', b'[\\1]', b'x')) is bytes

My first patch mixes bytes and str8, so re.sub fails in some cases.

So here is a new patch using str8 in the dictionary key and str in the regex 
parsing (sre_parse.py) (and then converting back to bytes for the 'literals' 
variable).

Victor Stinner
http://hachoir.org/
Index: Lib/re.py
===
--- Lib/re.py	(revision 56838)
+++ Lib/re.py	(working copy)
@@ -193,18 +193,34 @@
     _alphanum[c] = 1
 del c
 
+_alphanum_bytes = set(b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890')
+
 def escape(pattern):
     "Escape all non-alphanumeric characters in pattern."
-    s = list(pattern)
-    alphanum = _alphanum
-    for i in range(len(pattern)):
-        c = pattern[i]
-        if c not in alphanum:
-            if c == "\000":
-                s[i] = "\\000"
+    if isinstance(pattern, bytes):
+        alphanum = _alphanum_bytes
+        s = b''
+        for c in pattern:
+            if c not in alphanum:
+                if not c:
+                    s += b"\\000"
+                else:
+                    s.append(92)
+                    s.append(c)
             else:
-                s[i] = "\\" + c
-    return pattern[:0].join(s)
+                s.append(c)
+        return s
+    else:
+        alphanum = _alphanum
+        s = list(pattern)
+        for i in range(len(pattern)):
+            c = pattern[i]
+            if c not in alphanum:
+                if c == "\000":
+                    s[i] = "\\000"
+                else:
+                    s[i] = "\\" + c
+        return ''.join(s)
 
 # 
 # internals
@@ -218,7 +234,10 @@
 
 def _compile(*key):
     # internal: compile pattern
-    cachekey = (type(key[0]),) + key
+    if isinstance(key[0], bytes):
+        cachekey = (type(key[0]), str8(key[0]), key[1])
+    else:
+        cachekey = (type(key[0]),) + key
     p = _cache.get(cachekey)
     if p is not None:
         return p
@@ -236,12 +255,20 @@
     _cache[cachekey] = p
     return p
 
-def _compile_repl(*key):
+def _compile_repl(repl, pattern):
     # internal: compile replacement pattern
+    if isinstance(repl, bytes):
+        cacherepl = str8(repl)
+    else:
+        cacherepl = repl
+    if isinstance(pattern, bytes):
+        cachepattern = str8(pattern)
+    else:
+        cachepattern = pattern
+    key = (cacherepl, cachepattern)
     p = _cache_repl.get(key)
     if p is not None:
         return p
-    repl, pattern = key
     try:
         p = sre_parse.parse_template(repl, pattern)
     except error as v:
Index: Lib/test/test_re.py
===
--- Lib/test/test_re.py	(revision 56838)
+++ Lib/test/test_re.py	(working copy)
@@ -397,18 +397,32 @@
         self.assertEqual(re.search("\s(b)", " b").group(1), "b")
         self.assertEqual(re.search("a\s", "a ").group(0), "a ")
 
-    def test_re_escape(self):
-        p=""
-        for i in range(0, 256):
-            p = p + chr(i)
-            self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
-                             True)
-            self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
+    def _test_re_escape(self, use_bytes):
+        if use_bytes:
+            p=bytes()
+            for i in range(0, 256):
+                p.append(i)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
+                                 True)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
+        else:
+            p=""
+            for i in range(0, 256):
+                p = p + chr(i)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)) is not None,
+                                 True)
+                self.assertEqual(re.match(re.escape(chr(i)), chr(i)).span(), (0,1))
 
         pat=re.compile(re.escape(p))
         self.assertEqual(pat.match(p) is not None, True)
         self.assertEqual(pat.match(p).span(), (0,256))
 
+    def test_re_escape_str(self):
+        self._test_re_escape(False)
+
+    def test_re_escape_bytes(self):
+        self._test_re_escape(True)
+
     def test_pickling(self):
         import pickle
         self.pickle_test(pickle)
Index: Lib/sre_compile.py

Re: [Python-3000] bytes regular expression?

2007-08-10 Thread Victor Stinner
On Thursday 09 August 2007 19:21:27 Thomas Heller wrote:
> Victor Stinner schrieb:
> > I prefer str8 which looks to be a good candidate for "frozenbytes" type.
>
> I love this idea!  Leave str8 as it is, maybe extend Python so that it
> understands the s"..." literals and we are done.

Hum, today str8 sits between the bytes and str types. str8 has more methods 
(e.g. lower()) than bytes, it behaves differently in comparisons (b'a' != 'a' 
but str8('a') == 'a'), and issubclass(str8, basestring) is True.

I think that a frozenbytes type is required for backward compatibility (in 
Python 2.x, "a" is immutable), e.g. to use bytes as a dictionary key (which 
looks to be needed in the re and dbm modules).
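
A minimal sketch of the dictionary-key use case, assuming an immutable bytes 
type (which is how bytes behaves in released Python 3, with bytearray as the 
mutable variant). The cache name and its keys are made up for illustration:

```python
# Hypothetical cache keyed by byte strings, as re or dbm might need.
cache = {}
cache[b'\x89PNG'] = 'png'            # immutable bytes is hashable

try:
    cache[bytearray(b'GIF8')] = 'gif'    # a mutable buffer is not hashable
    bytearray_ok = True
except TypeError:
    bytearray_ok = False
```

The mutable type is rejected as a key, which is the reason a frozen variant 
is needed for caches such as the one in re.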

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str)

2007-08-10 Thread Victor Stinner
Hi,

On Thursday 09 August 2007 02:41:08 Victor Stinner wrote:
> I started to work on email module to port it for Python 3000, but I have
> trouble to understand if a function should returns bytes or str (because I
> don't know email module).

It's really hard to convert the email module to Python 3000 because it mixes 
byte strings and (unicode) character strings...

I wrote some notes about bytes/str to help people migrate Python 2.x code 
to Python 3000, or at least to explain the difference between the Python 
2.x "str" type and the Python 3000 "bytes" type:
   http://wiki.python.org/moin/BytesStr

About the email module, some deductions:
 test_email.py: openfile() must use 'rb' file mode for all tests
 base64MIME.decode() and base64MIME.encode() should accept bytes and str
 base64MIME.decode() result type is bytes
 base64MIME.encode() result type should be... bytes or str, no idea

Other decode() and encode() functions should follow the same rules about types.

The Python modules binascii and base64 chose the bytes type for the encode result.

Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] bytes: compare bytes to integer

2007-08-10 Thread Victor Stinner
Hi,

I don't like the behaviour of Python 3000 when we compare against a bytes 
string of length 1:
   >>> b'xyz'[0] == b'x'
   False

The code can be seen as:
   >>> ord(b'x') == b'x'
   False

or also:
   >>> 120 == b'x'
   False


Two solutions:
 1. b'xyz'[0] returns a new bytes object (b'x' instead of 120), 
like b'xyz'[0:1] does
 2. allow comparing a bytes string of 1 byte with an integer

I prefer (2) since (1) is wrong: bytes contains integers and not bytes!
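
A small sketch of the behaviour, and of the two comparisons that do work, 
assuming the bytes-of-integers semantics described above (the variable names 
are only for illustration):

```python
data = b'xyz'

first = data[0]                        # indexing yields an int, not a 1-byte string
item_equals_bytes = (first == b'x')    # int compared with bytes: not equal

# Two comparisons that do work:
slice_matches = (data[0:1] == b'x')    # compare two 1-byte bytes objects
ord_matches = (first == ord(b'x'))     # compare two integers
```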

Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] Fix imghdr module for bytes

2007-08-10 Thread Victor Stinner
Hi,

I just saw that the what() function of the imghdr module requires the str 
type for argument h, which is totally wrong! An image file is composed of 
bytes and not characters.

The attached patch should fix it. Notes:
 - I used .startswith() instead of h[:len(s)] == s
 - I used h[0] == ord(b'P') instead of h[0] == b'P' because the second syntax 
doesn't work (see my other email "bytes: compare bytes to integer")
 - str is allowed but doesn't work: what() always returns None

I dislike "h[0] == ord(b'P')"; in Python 2.x it's simply "h[0] == 'P'". A 
shorter syntax would be "h[0] == 80" but I prefer an explicit test. Maybe 
that's silly: we manipulate bytes and not characters, so "h[0] == 80" is 
acceptable... maybe with a comment?
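
To make the notes concrete, here is a standalone sketch of the PBM check from 
the patch. The function name sniff_pbm is hypothetical (the module itself 
calls it test_pbm and takes an extra file argument):

```python
def sniff_pbm(h):
    """Return 'pbm' if the header h looks like a portable bitmap, else None."""
    if (len(h) >= 3
            and h[0] == ord(b'P')      # item is an int, so compare with an int
            and h[1] in b'14'          # an int can be tested for membership in bytes
            and h[2] in b' \t\n\r'):
        return 'pbm'
    return None


bytes_result = sniff_pbm(b'P1\n# comment')   # bytes header is recognised
str_result = sniff_pbm('P1\n# comment')      # str header: 'P' == 80 is False
```

The str call shows the last note: nothing raises, the comparison just fails 
and the function returns None.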


Is imghdr covered by the unit tests?


Victor Stinner
http://hachoir.org/
Index: Lib/imghdr.py
===
--- Lib/imghdr.py	(revision 56910)
+++ Lib/imghdr.py	(working copy)
@@ -36,14 +36,14 @@
 
 def test_rgb(h, f):
 """SGI image library"""
-if h[:2] == '\001\332':
+if h[:2] == b'\001\332':
 return 'rgb'
 
 tests.append(test_rgb)
 
 def test_gif(h, f):
 """GIF ('87 and '89 variants)"""
-if h[:6] in ('GIF87a', 'GIF89a'):
+if h[:6] in (b'GIF87a', b'GIF89a'):
 return 'gif'
 
 tests.append(test_gif)
@@ -51,7 +51,7 @@
 def test_pbm(h, f):
 """PBM (portable bitmap)"""
 if len(h) >= 3 and \
-h[0] == 'P' and h[1] in '14' and h[2] in ' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'14' and h[2] in b' \t\n\r':
 return 'pbm'
 
 tests.append(test_pbm)
@@ -59,7 +59,7 @@
 def test_pgm(h, f):
 """PGM (portable graymap)"""
 if len(h) >= 3 and \
-h[0] == 'P' and h[1] in '25' and h[2] in ' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'25' and h[2] in b' \t\n\r':
 return 'pgm'
 
 tests.append(test_pgm)
@@ -67,55 +67,54 @@
 def test_ppm(h, f):
 """PPM (portable pixmap)"""
 if len(h) >= 3 and \
-h[0] == 'P' and h[1] in '36' and h[2] in ' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'36' and h[2] in b' \t\n\r':
 return 'ppm'
 
 tests.append(test_ppm)
 
 def test_tiff(h, f):
 """TIFF (can be in Motorola or Intel byte order)"""
-if h[:2] in ('MM', 'II'):
+if h[:2] in (b'MM', b'II'):
 return 'tiff'
 
 tests.append(test_tiff)
 
 def test_rast(h, f):
 """Sun raster file"""
-if h[:4] == '\x59\xA6\x6A\x95':
+if h[:4] == b'\x59\xA6\x6A\x95':
 return 'rast'
 
 tests.append(test_rast)
 
 def test_xbm(h, f):
 """X bitmap (X10 or X11)"""
-s = '#define '
-if h[:len(s)] == s:
+if h.startswith(b'#define '):
 return 'xbm'
 
 tests.append(test_xbm)
 
 def test_jpeg(h, f):
 """JPEG data in JFIF format"""
-if h[6:10] == 'JFIF':
+if h[6:10] == b'JFIF':
 return 'jpeg'
 
 tests.append(test_jpeg)
 
 def test_exif(h, f):
 """JPEG data in Exif format"""
-if h[6:10] == 'Exif':
+if h[6:10] == b'Exif':
 return 'jpeg'
 
 tests.append(test_exif)
 
 def test_bmp(h, f):
-if h[:2] == 'BM':
+if h[:2] == b'BM':
 return 'bmp'
 
 tests.append(test_bmp)
 
 def test_png(h, f):
-if h[:8] == "\211PNG\r\n\032\n":
+if h[:8] == b"\211PNG\r\n\032\n":
 return 'png'
 
 tests.append(test_png)


[Python-3000] Fix sndhdr module for bytes

2007-08-10 Thread Victor Stinner
Hi,

Like imghdr, the sndhdr tests were still based on Unicode strings instead of bytes.

The attached patch should fix the module. I'm sorry, I was unable to test it.

Note: I replaced aifc.openfp with aifc.open since it's the new public 
function.

sndhdr requires some cleanup: it doesn't check for division by zero in the 
functions test_hcom and test_voc. I think that a division by zero means that 
the file is invalid. I didn't want to fix these bugs in the same patch, so 
first I'm waiting for your comments about this one :-)
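
A possible shape for that cleanup, as a sketch only: the name sniff_hcom is 
hypothetical, the file argument is dropped, and treating a zero divisor as 
"not an HCOM file" plus the use of integer division are my assumptions, not 
what the module does today.

```python
def get_long_be(s):
    """Decode 4 bytes as a big-endian unsigned integer."""
    return (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3]


def sniff_hcom(h):
    if h[65:69] != b'FSSD' or h[128:132] != b'HCOM':
        return None
    divisor = get_long_be(h[128 + 16:128 + 20])
    if divisor == 0:
        # Assumption: a zero divisor means the header is invalid.
        return None
    return 'hcom', 22050 // divisor, 1, -1, 8


# Build a fake 148-byte header for illustration.
header = bytearray(148)
header[65:69] = b'FSSD'
header[128:132] = b'HCOM'
header[144:148] = (2).to_bytes(4, 'big')
valid = sniff_hcom(bytes(header))

header[144:148] = b'\0\0\0\0'
invalid = sniff_hcom(bytes(header))
```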

Victor Stinner
http://hachoir.org/
Index: Lib/sndhdr.py
===
--- Lib/sndhdr.py	(revision 56910)
+++ Lib/sndhdr.py	(working copy)
@@ -57,17 +57,17 @@
 
 def test_aifc(h, f):
 import aifc
-if h[:4] != 'FORM':
+if h[:4] != b'FORM':
 return None
-if h[8:12] == 'AIFC':
+if h[8:12] == b'AIFC':
 fmt = 'aifc'
-elif h[8:12] == 'AIFF':
+elif h[8:12] == b'AIFF':
 fmt = 'aiff'
 else:
 return None
 f.seek(0)
 try:
-a = aifc.openfp(f, 'r')
+a = aifc.open(f, 'r')
 except (EOFError, aifc.Error):
 return None
 return (fmt, a.getframerate(), a.getnchannels(), \
@@ -77,9 +77,9 @@
 
 
 def test_au(h, f):
-if h[:4] == '.snd':
+if h[:4] == b'.snd':
 f = get_long_be
-elif h[:4] in ('\0ds.', 'dns.'):
+elif h[:4] in (b'\0ds.', b'dns.'):
 f = get_long_le
 else:
 return None
@@ -106,7 +106,7 @@
 
 
 def test_hcom(h, f):
-if h[65:69] != 'FSSD' or h[128:132] != 'HCOM':
+if h[65:69] != b'FSSD' or h[128:132] != b'HCOM':
 return None
 divisor = get_long_be(h[128+16:128+20])
 return 'hcom', 22050/divisor, 1, -1, 8
@@ -115,12 +115,12 @@
 
 
 def test_voc(h, f):
-if h[:20] != 'Creative Voice File\032':
+if h[:20] != b'Creative Voice File\032':
 return None
 sbseek = get_short_le(h[20:22])
 rate = 0
-if 0 <= sbseek < 500 and h[sbseek] == '\1':
-ratecode = ord(h[sbseek+4])
+if 0 <= sbseek < 500 and h[sbseek] == 1:
+ratecode = h[sbseek+4]
 rate = int(100.0 / (256 - ratecode))
 return 'voc', rate, 1, -1, 8
 
@@ -129,7 +129,7 @@
 
 def test_wav(h, f):
 # 'RIFF'  'WAVE' 'fmt ' 
-if h[:4] != 'RIFF' or h[8:12] != 'WAVE' or h[12:16] != 'fmt ':
+if h[:4] != b'RIFF' or h[8:12] != b'WAVE' or h[12:16] != b'fmt ':
 return None
 style = get_short_le(h[20:22])
 nchannels = get_short_le(h[22:24])
@@ -141,7 +141,7 @@
 
 
 def test_8svx(h, f):
-if h[:4] != 'FORM' or h[8:12] != '8SVX':
+if h[:4] != b'FORM' or h[8:12] != b'8SVX':
 return None
 # Should decode it to get #channels -- assume always 1
 return '8svx', 0, 1, 0, 8
@@ -150,7 +150,7 @@
 
 
 def test_sndt(h, f):
-if h[:5] == 'SOUND':
+if h[:5] == b'SOUND':
 nsamples = get_long_le(h[8:12])
 rate = get_short_le(h[20:22])
 return 'sndt', rate, 1, nsamples, 8
@@ -159,7 +159,7 @@
 
 
 def test_sndr(h, f):
-if h[:2] == '\0\0':
+if h[:2] == b'\0\0':
 rate = get_short_le(h[2:4])
 if 4000 <= rate <= 25000:
 return 'sndr', rate, 1, -1, 8
@@ -172,16 +172,16 @@
 #-#
 
 def get_long_be(s):
-return (ord(s[0])<<24) | (ord(s[1])<<16) | (ord(s[2])<<8) | ord(s[3])
+return (s[0]<<24) | (s[1]<<16) | (s[2]<<8) | s[3]
 
 def get_long_le(s):
-return (ord(s[3])<<24) | (ord(s[2])<<16) | (ord(s[1])<<8) | ord(s[0])
+return (s[3]<<24) | (s[2]<<16) | (s[1]<<8) | s[0]
 
 def get_short_be(s):
-return (ord(s[0])<<8) | ord(s[1])
+return (s[0]<<8) | s[1]
 
 def get_short_le(s):
-return (ord(s[1])<<8) | ord(s[0])
+return (s[1]<<8) | s[0]
 
 
 ##


Re: [Python-3000] bytes: compare bytes to integer

2007-08-12 Thread Victor Stinner
On Sunday 12 August 2007 19:11:18 Bill Janssen wrote:
> Why not just write
>
>b'xyz'[0:1] == b'x'

It's just strange to write:
   'abc'[0] == 'a'
for character string and:
   b'abc'[0:1] == b'a'
for byte string.

The problem in my head is that str is a special case, since a str item is 
also a string, whereas a bytes item is an integer.

It's clear that "[5, 9, 10][0] == [5]" is wrong, but for bytes and str it's 
not intuitive because of the b'...' syntax. If I had to write [120, 121, 122] 
instead of b'xyz', it would be easier to understand that the first value is an 
integer and not the *letter* X or the *string* X.


I dislike b'xyz'[0:1] == b'x' since I want to check the first item, not 
compare substrings.
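
The analogy with lists can be spelled out as a short sketch (variable names 
are only for illustration):

```python
data = b'xyz'
text = 'xyz'

as_list = list(data)                              # the integers behind b'xyz'
list_item_vs_list = ([5, 9, 10][0] == [5])        # an item is not a 1-item list
bytes_item_vs_slice = (data[0] == data[0:1])      # same situation for bytes
str_item_vs_slice = (text[0] == text[0:1])        # str is the special case
```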

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str)

2007-08-12 Thread Victor Stinner
On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> compare what they read out of the files against proper byte
> literals.

So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr module 
for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last 
Saturday. But well, my patches look similar.

Barry's patch is incomplete: test_voc() is wrong.

I attached a new patch:
 - fix "h[sbseek] == b'\1'" and "ratecode = ord(h[sbseek+4])" in test_voc()
 - avoid division by zero
 - use startswith method: replace h[:2] == b'BM' by h.startswith(b'BM')
 - use aifc.open() instead of old aifc.openfp()
 - use ord(b'P') instead of ord('P')

Victor Stinner aka haypo
http://hachoir.org/
Index: Lib/imghdr.py
===
--- Lib/imghdr.py	(revision 56969)
+++ Lib/imghdr.py	(working copy)
@@ -36,7 +36,7 @@
 
 def test_rgb(h, f):
 """SGI image library"""
-if h[:2] == b'\001\332':
+if h.startswith(b'\001\332'):
 return 'rgb'
 
 tests.append(test_rgb)
@@ -51,7 +51,7 @@
 def test_pbm(h, f):
 """PBM (portable bitmap)"""
 if len(h) >= 3 and \
-h[0] == ord('P') and h[1] in b'14' and h[2] in b' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'14' and h[2] in b' \t\n\r':
 return 'pbm'
 
 tests.append(test_pbm)
@@ -59,7 +59,7 @@
 def test_pgm(h, f):
 """PGM (portable graymap)"""
 if len(h) >= 3 and \
-h[0] == ord('P') and h[1] in b'25' and h[2] in b' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'25' and h[2] in b' \t\n\r':
 return 'pgm'
 
 tests.append(test_pgm)
@@ -67,7 +67,7 @@
 def test_ppm(h, f):
 """PPM (portable pixmap)"""
 if len(h) >= 3 and \
-h[0] == ord('P') and h[1] in b'36' and h[2] in b' \t\n\r':
+h[0] == ord(b'P') and h[1] in b'36' and h[2] in b' \t\n\r':
 return 'ppm'
 
 tests.append(test_ppm)
@@ -81,41 +81,33 @@
 
 def test_rast(h, f):
 """Sun raster file"""
-if h[:4] == b'\x59\xA6\x6A\x95':
+if h.startswith(b'\x59\xA6\x6A\x95'):
 return 'rast'
 
 tests.append(test_rast)
 
 def test_xbm(h, f):
 """X bitmap (X10 or X11)"""
-s = b'#define '
-if h[:len(s)] == s:
+if h.startswith(b'#define '):
 return 'xbm'
 
 tests.append(test_xbm)
 
 def test_jpeg(h, f):
-"""JPEG data in JFIF format"""
-if h[6:10] == b'JFIF':
+"""JPEG data in JFIF or Exif format"""
+if h[6:10] in (b'JFIF', b'Exif'):
 return 'jpeg'
 
 tests.append(test_jpeg)
 
-def test_exif(h, f):
-"""JPEG data in Exif format"""
-if h[6:10] == b'Exif':
-return 'jpeg'
-
-tests.append(test_exif)
-
 def test_bmp(h, f):
-if h[:2] == b'BM':
+if h.startswith(b'BM'):
 return 'bmp'
 
 tests.append(test_bmp)
 
 def test_png(h, f):
-if h[:8] == b'\211PNG\r\n\032\n':
+if h.startswith(b'\211PNG\r\n\032\n'):
 return 'png'
 
 tests.append(test_png)
Index: Lib/sndhdr.py
===
--- Lib/sndhdr.py	(revision 56969)
+++ Lib/sndhdr.py	(working copy)
@@ -57,7 +57,7 @@
 
 def test_aifc(h, f):
 import aifc
-if h[:4] != b'FORM':
+if not h.startswith(b'FORM'):
 return None
 if h[8:12] == b'AIFC':
 fmt = 'aifc'
@@ -67,7 +67,7 @@
 return None
 f.seek(0)
 try:
-a = aifc.openfp(f, 'r')
+a = aifc.open(f, 'r')
 except (EOFError, aifc.Error):
 return None
 return (fmt, a.getframerate(), a.getnchannels(),
@@ -77,7 +77,7 @@
 
 
 def test_au(h, f):
-if h[:4] == b'.snd':
+if h.startswith(b'.snd'):
 func = get_long_be
 elif h[:4] in (b'\0ds.', b'dns.'):
 func = get_long_le
@@ -100,7 +100,11 @@
 else:
 sample_bits = '?'
 frame_size = sample_size * nchannels
-return filetype, rate, nchannels, data_size / frame_size, sample_bits
+if frame_size:
+nframe = data_size / frame_size
+else:
+nframe = -1
+return filetype, rate, nch

Re: [Python-3000] bytes regular expression?

2007-08-12 Thread Victor Stinner
On Thursday 09 August 2007 19:39:50 you wrote:
> So why not just skip caching for anything that doesn't hash()?  If
> you're really worried about efficiency, simply re.compile() the
> expression once and don't rely on the re module's internal cache.

I tried to keep backward compatibility.

Why are character strings "optimized" (cached) but not byte strings? Since 
regex parsing is slow, it's a good idea to avoid recomputation in re.compile().

Regular expressions for bytes are useful for file, network, picture, etc. 
manipulation.
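
As an illustration of the file use case, a sketch assuming a re module that 
accepts bytes patterns (as the released Python 3 does). The names 
PNG_SIGNATURE and looks_like_png are made up for the example:

```python
import re

# Compile once and keep the object, so no internal cache is needed.
PNG_SIGNATURE = re.compile(rb'\x89PNG\r\n\x1a\n')


def looks_like_png(header):
    """Return True if the bytes header starts with the PNG signature."""
    return PNG_SIGNATURE.match(header) is not None


escaped = re.escape(b'a.b')     # escaping works on bytes too

try:
    PNG_SIGNATURE.match('\x89PNG\r\n\x1a\n')    # str subject, bytes pattern
    mixed_ok = True
except TypeError:
    mixed_ok = False            # bytes patterns and str subjects don't mix
```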

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str)

2007-08-13 Thread Victor Stinner
Hi,

On Monday 13 August 2007 19:51:18 Guido van Rossum wrote:
> Checked in. But next time please do use SF to submit patches (and feel
> free to assign them to me and mail the list about it).

Ah yes, you already asked to use SF. I will use it next time.

> On 8/12/07, Victor Stinner <[EMAIL PROTECTED]> wrote:
> > On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> > > In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> > > compare what they read out of the files against proper byte
> > > literals.
> >
> > So nobody read my patches?
> > (...) 
> > I attached a new patch 
> > (...)
> >  - use ord(b'P') instead of ord('P')
>
> This latter one is questionable. If you really want to compare to
> bytes, perhaps write h[:1] == b'P' instead of b[0] == ord(b'P')?

Someone proposed a c'P' syntax for ord(b'P'), which would be like an alias 
for 80.

I prefer letters to numbers when the letters make sense.

I also think (I may be wrong) that b'xyz'[0] == 80 is faster than b'xyz'[:1] 
== b'x' since b'xyz'[:1] creates a new object. If we keep the speed argument, 
b'xyz'[0] == ord(b'P') may be slower than b'xyz'[:1] == b'x' since ord(b'P') 
is recomputed each time (is that right?).

But well, the speed argument is stupid since it's a micro-optimization :-)
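
For anyone who wants to measure instead of guessing, a rough sketch with 
timeit. The statement list and the precomputed constant P are my own 
additions, and the timings depend entirely on the machine and interpreter, so 
none are quoted here:

```python
import timeit

# Candidate spellings of "does the data start with the byte P?"
STATEMENTS = [
    "data[0] == 80",
    "data[0] == ord(b'P')",
    "data[:1] == b'P'",
    "data[0] == P",             # ord() hoisted into a constant
]
namespace = {'data': b'P1\n', 'P': ord(b'P')}

for stmt in STATEMENTS:
    seconds = timeit.timeit(stmt, globals=namespace, number=100_000)
    print("%-24s %.4f s" % (stmt, seconds))
```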

Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] Questions about email bytes/str (python 3000)

2007-08-13 Thread Victor Stinner
Hi,

After many tests, I'm unable to convert the email module to Python 3000. I'm 
also unable to decide on the best type for some contents.



(1) Should email parts be stored as byte or character strings?

Related methods: Generator class, Message.get_payload(), Message.as_string().

Let's take an example: a multipart (MIME) email with latin-1 and base64 
(ascii) sections. Mixing latin-1 and ascii means mixing bytes, so the best 
type should be bytes.

=> bytes



(2) Parsing a file (raw string): use bytes or str in parsing?

The parser uses methods related to str like splitlines(), lower(), strip(). 
But it should be easy to rewrite/avoid these methods. I think that low-level 
parsing should be done on bytes. At the end, or when we know the charset, we 
can convert to str.

=> bytes
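
A sketch of what "parse on bytes, decode late" could look like, assuming a 
bytes type that offers partition(), strip() and lower(). The helper name 
parse_header_line and the sample message are hypothetical and not part of the 
email package:

```python
def parse_header_line(raw):
    """Split a raw header line into (lowercased name, value), both bytes."""
    name, _, value = raw.partition(b':')
    return name.strip().lower(), value.strip()


raw_header = b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
raw_body = b'caf\xe9'

name, value = parse_header_line(raw_header)

# Only once the charset is known do we convert the payload to str.
charset = 'iso-8859-1' if b'iso-8859-1' in value else 'ascii'
body = raw_body.decode(charset)
```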



About base64, I agree with Bill Janssen:
 - base64MIME.decode converts string to bytes
 - base64MIME.encode converts bytes to string

But decode may accept bytes as input (as the base64 module does): use 
str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').


I wrote 4 different (non-working) patches. So if you want to work on the 
email module and Python 3000, please contact me first. When I get a better 
patch, I will submit it.


Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] Questions about email bytes/str (python 3000)

2007-08-15 Thread Victor Stinner
On Wednesday 15 August 2007 03:44:54 Bill Janssen wrote:
> > (...) I think that base64MIME.encode() may have to accept strings.
>
> Personally, I think it would avoid more errors if it didn't.

Yeah, how can you guess which charset the user wants to use? For most users, 
there is only one charset: latin-1. So if you use UTF-8, they will not 
understand the conversion errors.

Another argument: I like bidirectional codecs:
   decode(encode(x)) == x
   encode(decode(x)) == x

So if you mix bytes and str, these relations will no longer hold.
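
The two relations can be checked with the standard base64 module, used here 
as a stand-in for base64MIME (an assumption: base64MIME may end up with 
different types):

```python
import base64

raw = bytes(range(256))                 # every possible byte value

encoded = base64.b64encode(raw)         # bytes in, bytes out
decoded = base64.b64decode(encoded)     # bytes in, bytes out

decode_of_encode = (decoded == raw)                         # decode(encode(x)) == x
encode_of_decode = (base64.b64encode(decoded) == encoded)   # encode(decode(x)) == x
```

Because both directions keep the same type, no charset has to be guessed at 
any point.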

Victor Stinner aka haypo
http://hachoir.org/


[Python-3000] format() method and % operator

2007-08-17 Thread Victor Stinner
Hi,

I read many people saying that
   "{0} {1}".format('Hello', 'World')
is easier to read than
   "%s %s" % ('Hello', 'World')


But to me it looks more complex: we have to maintain indexes (0, 1, 
2, ...), each marker is different ({0} != {1}), etc.


I didn't read the PEP nor all the email discussions. So can you tell me if it 
would be possible to simply write:
   "{} {}".format('Hello', 'World')

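A side-by-side sketch of the three spellings, assuming a Python version whose 
str.format() supports automatic field numbering (3.1 and later, as far as I 
know):

```python
explicit = "{0} {1}".format('Hello', 'World')    # indexes maintained by hand
automatic = "{} {}".format('Hello', 'World')     # fields numbered implicitly
percent = "%s %s" % ('Hello', 'World')           # the old % operator
```
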

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] python 3 closes sys.stdout

2007-08-26 Thread Victor Stinner
On Sunday 26 August 2007 23:23:37 Amaury Forgeot d'Arc wrote:
>  internal_close(PyFileIOObject *self)
>  {
> int save_errno = 0;
> -   if (self->fd >= 0) {
> +   if (self->fd >= 3) {
> int fd = self->fd;
> self->fd = -1;
> Py_BEGIN_ALLOW_THREADS

Hum, a better fix would be to add an option to choose whether the file should 
be closed or not on object destruction.
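
At the Python level, such an option could look like the closefd parameter 
that open() has in released Python 3. A sketch using a pipe, so that the 
standard streams are not touched:

```python
import os

read_fd, write_fd = os.pipe()

# Wrap the descriptor without taking ownership of it.
wrapper = open(write_fd, 'wb', closefd=False)
wrapper.write(b'x')
wrapper.close()                 # flushes, but leaves write_fd open

os.write(write_fd, b'y')        # still usable after the wrapper is closed
os.close(write_fd)

received = os.read(read_fd, 2)
os.close(read_fd)
```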

Victor Stinner aka haypo
http://hachoir.org/


Re: [Python-3000] Immutable bytes -- looking for volunteer

2007-09-19 Thread Victor Stinner
Hi,

On Tuesday 18 September 2007 04:18:01 Guido van Rossum wrote:
> I'm considering the following option: bytes would always be immutable,
> (...) make b[0] return a bytes array of length 1 instead of a small int

Great idea! That will help migration from Python 2.x to Python 3.0. Choosing 
between byte and character strings is already a difficult choice, so also 
choosing between a mutable (current bytes type) and an immutable string 
(current str type) makes it even more difficult.

And the change in substring behaviour (Python 2.x => 3) was also strange for 
Python programmers.

>>> 'xyz'[0]
'x'
>>> b"xyz"[0]
120

This result is not symmetric. I would prefer what Guido proposes:

>>> 'xyz'[0]
'x'
>>> b"xyz"[0]
b'x'

And so be able to write such tests:

>>> b"xyz"[:2] == b'xy'
True
>>> b"xyz"[0:1] == b'x'
True
>>> b"xyz"[0] == b'x'
True

Victor Stinner
http://hachoir.org/


Re: [Python-3000] Unicode and OS strings

2007-09-19 Thread Victor Stinner
Hi,

On Thursday 13 September 2007 18:22:12 Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?

On Linux, filenames are *byte* strings and not *character* strings. I always 
have this problem with Python 2.x. I converted the filename (argv[x]) to 
Unicode to be able to format error messages in full unicode... but it's not 
possible. Linux allows invalid utf8 filenames even on a full utf8 installation 
(ubuntu), see Marcin's examples.

So I propose to keep sys.argv as an array of byte strings. If you try to 
create unicode strings, you will be unable to write a program to convert a 
filesystem with "broken" filenames (see the convmv program for example) or to 
open a file with a "broken" filename (broken: invalid byte sequence for the 
UTF/JIS/Big5/... charset).

---

For Python 2.x, my solution is to keep byte strings for I/O and use unicode 
strings for error messages. Here is a function to convert any byte string 
(filename string) to Unicode:

def unicodeFilename(filename, charset=None):
    if not charset:
        charset = getTerminalCharset()
    try:
        return unicode(filename, charset)
    except UnicodeDecodeError:
        return makePrintable(filename, charset, to_unicode=True)

makePrintable() replaces invalid byte sequences with escape strings, for example:

>>> from hachoir_core.tools import makePrintable
>>> makePrintable("a\x80", "utf8", to_unicode=True)
u'a\\x80'
>>> print makePrintable("a\x80", "utf8", to_unicode=True)
a\x80

Source code of function makePrintable:
http://hachoir.org/browser/trunk/hachoir-core/hachoir_core/tools.py#L225

Source code of function getTerminalCharset():
http://hachoir.org/browser/trunk/hachoir-core/hachoir_core/i18n.py#L23
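
For comparison, a rough Python 3 equivalent of the same idea, as a sketch 
only: it assumes Python 3.5 or later, where the backslashreplace error handler 
also works for decoding, and the name printable_filename is hypothetical (it 
is not part of hachoir):

```python
def printable_filename(raw, charset='utf-8'):
    """Decode a byte-string filename for display, escaping invalid bytes."""
    return raw.decode(charset, errors='backslashreplace')


broken = printable_filename(b'a\x80')                  # invalid UTF-8 byte
valid = printable_filename('caf\xe9'.encode('utf-8'))  # valid UTF-8 input
```

The byte string itself stays untouched for I/O; only the message shown to the 
user goes through this conversion.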

Victor Stinner
http://hachoir.org/


Re: [Python-3000] Python, int/long and GMP

2007-09-28 Thread Victor Stinner
On Friday 28 September 2007 18:44:43 you wrote:
> > GMP doesn't have a concept of a non-complex structure. It always
> > allocates memory. (...)

I don't know GMP internals. I thought that GMP used a hack for small 
integers.

> > Also, removing python's caching of integers < 100 as you did in this
> > patch is surely a *huge* killer of performance.

Oh yes, I removed the cache because I wanted to quickly get a working 
Python version. It took me two weeks to write the patch. It's not easy to get 
into the CPython source code! And integer is one of the most important types!

> I can vouch for that.  Allocation can easily dominate performance.  It
> invalidates the rest of the benchmark.

I may also use the Python garbage collector for GMP memory allocations, since 
GMP allows me to use my own memory allocation functions.

GMP also has its own reference counter mechanism :-/

Victor
-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-3000] Python, int/long and GMP

2007-09-30 Thread Victor Stinner
Hi,

I wrote another patch with two improvements: it uses the small integer cache 
and the Python memory allocation functions. Now the GMP overhead (pystones 
result) is only -2% and not -20% (previous patch).

Since the patch is huge, I prefer to leave a copy on my server:
http://www.haypocalc.com/tmp/py3k-long_gmp-v2.patch

Victor
-- 
Victor Stinner
http://hachoir.org/