date:20100916

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Hagen Fürstenau

> Why not? Since the I/O speed problem is fixed, I have no idea what you
> are referring to.  Please do be concrete.

There's still a performance issue with pickling, but if issue 3873 could
be resolved, Python 3 would actually be faster there.

- Hagen



___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] standards for distribution names

2010-09-16 Thread Chris Withers


Hi All,

Following on from this question:

http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html

...I'd thought that the "correct names" for distributions would have 
been documented in one of:


http://www.python.org/dev/peps/pep-0345
http://www.python.org/dev/peps/pep-0376
http://www.python.org/dev/peps/pep-0386

...but having read them, I drew a blank.

Where are the standards for this or is it still a case of "whatever 
setuptools does"?


Chris

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Antoine Pitrou

On Wed, 15 Sep 2010 19:55:16 -0500
Jacob Kaplan-Moss  wrote:
> On Wed, Sep 15, 2010 at 6:31 PM, Jesse Noller  wrote:
> > My goal (personally) is to make sure python 3.2 is perfectly good for use 
> > in web applications, and is therefore a much more interesting porting 
> > target for web projects/libraries and frameworks.
> 
> To try (again) to make things concrete here:
> 
> I didn't work to get Django running on Python 3.0 because it was just too 
> slow.
> 
> I'm not working to get Django running on Python 3.1 because I don't
> feel confident I'll be able to put any apps I write into production.
> 
> If Python 3.2 is the same, I won't feel any motivation to target it
> and I'll get to be lazy and wait for Python 3.3.

Why won't you feel confident? Are there any specific issues (apart from
the lack of a WSGI PEP)?
If they are technical problems, they should be reported on the bug
tracker.
If they are representational, cultural or psychological issues, I'm
not sure what we can do. But delaying the release won't solve them.

Regards

Antoine.



Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Jesse Noller

On Thu, Sep 16, 2010 at 8:26 AM, Antoine Pitrou  wrote:
> On Wed, 15 Sep 2010 19:55:16 -0500
> Jacob Kaplan-Moss  wrote:
>> On Wed, Sep 15, 2010 at 6:31 PM, Jesse Noller  wrote:
>> > My goal (personally) is to make sure python 3.2 is perfectly good for use 
>> > in web applications, and is therefore a much more interesting porting 
>> > target for web projects/libraries and frameworks.
>>
>> To try (again) to make things concrete here:
>>
>> I didn't work to get Django running on Python 3.0 because it was just too 
>> slow.
>>
>> I'm not working to get Django running on Python 3.1 because I don't
>> feel confident I'll be able to put any apps I write into production.
>>
>> If Python 3.2 is the same, I won't feel any motivation to target it
>> and I'll get to be lazy and wait for Python 3.3.
>
> Why won't you feel confident? Are there any specific issues (apart from
> the lack of a WSGI PEP)?
> If they are technical problems, they should be reported on the bug
> tracker.
> If they are representational, cultural or psychological issues, I'm
> not sure what we can do. But delaying the release won't solve them.
>
> Regards
>
> Antoine.

Can we please give it a little bit of time to hear from the WSGI /
Web-Sig folks? I've encouraged bugs to be filed, and discussions to
happen here, so we know whether things should be fixed (and what those
things are). Is there any need, other than our current schedule, to push
3.2 out until we can at least get some feedback from the interested
parties?

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Nick Coghlan

On Thu, Sep 16, 2010 at 10:26 PM, Antoine Pitrou  wrote:
> Why won't you feel confident? Are there any specific issues (apart from
> the lack of a WSGI PEP)?
> If they are technical problems, they should be reported on the bug
> tracker.
> If they are representational, cultural or psychological issues, I'm
> not sure what we can do. But delaying the release won't solve them.

There are some APIs that should be able to handle bytes *or* strings,
but the current use of string literals in their implementation means
that bytes don't work. This turns out to be a PITA for some networking
related code which really wants to be working with raw bytes (e.g.
URLs coming off the wire).

For example:

>>> import urllib.parse as parse
>>> parse.urlsplit("http://www.ubuntu.com")
SplitResult(scheme='http', netloc='www.ubuntu.com', path='', query='', fragment='')
>>> parse.urlsplit(b"http://www.ubuntu.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/urllib/parse.py", line 178, in urlsplit
    i = url.find(':')
TypeError: expected an object with the buffer interface

There's no real reason urlsplit (and similar urllib.parse APIs)
shouldn't support bytes, but the internal use of string literals
currently prevents it.
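
One way to picture the bytes-in, bytes-out behaviour being asked for is a thin wrapper that decodes the input, reuses the existing str implementation, and encodes the parts again. This is only a sketch: `urlsplit_same_type` is a hypothetical helper name, not part of urllib.parse, and the strict ASCII decode is an assumption about what a reasonable contract might be.

```python
from urllib.parse import urlsplit


def urlsplit_same_type(url):
    """Hypothetical helper: split a URL and return parts in the input's type.

    Assumption: bytes input is pure ASCII (e.g. already percent-encoded);
    anything else raises UnicodeDecodeError rather than guessing an encoding.
    """
    if isinstance(url, str):
        return tuple(urlsplit(url))
    # Decode strictly, reuse the str implementation, then encode each part back.
    parts = urlsplit(url.decode("ascii"))
    return tuple(part.encode("ascii") for part in parts)


print(urlsplit_same_type("http://www.ubuntu.com"))
print(urlsplit_same_type(b"http://www.ubuntu.com"))
```

Refusing non-ASCII bytes keeps the helper from silently producing mojibake, at the cost of rejecting input that a more permissive API might accept.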

We don't seem to have created a tracker issue from the discussion back
in June where this came up, so I went ahead and created one just now:
http://bugs.python.org/issue9873

I think there were other APIs mentioned back then beyond the
urllib.parse ones, but I didn't find them when I went trawling through
the list archives yesterday. If anyone else thinks of any APIs that
should allow bytes as well as strings (or vice-versa) feel free to add
them to that issue.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Barry Warsaw

On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:

>There are some APIs that should be able to handle bytes *or* strings,
>but the current use of string literals in their implementation means
>that bytes don't work. This turns out to be a PITA for some networking
>related code which really wants to be working with raw bytes (e.g.
>URLs coming off the wire).

Note that email has exactly the same problem.  A general solution -- even if
embodied in *well documented* best-practices and convention -- would really
help make the stdlib work consistently, and I bet third party libraries too.

-Barry


Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Paul Moore

On 16 September 2010 07:16, Terry Reedy  wrote:
>> I'm not working to get Django running on Python 3.1 because I don't
>> feel confident I'll be able to put any apps I write into production.
>
> Why not? Since the I/O speed problem is fixed, I have no idea what you are
> referring to.  Please do be concrete.

At the risk of putting words into Jacob's mouth, I understood him to
mean that "production quality" WSGI servers either do not exist, or do
not implement a consistently defined spec (i.e., everyone is doing
their own thing to adapt WSGI to Python 3).

There is something of a chicken and egg situation here as with
everywhere else (scientific users weren't moving until scipy did, lots
of projects based round Twisted can't go until Twisted does, ...) but
in the case of web/WSGI, there's a standard, defined in a PEP, with a
reference implementation (wsgiref) in the stdlib. So the core has a
greater interest.

Personally, I don't write web applications (not even in Python :-)) so
my interest is minimal. But I think the issue is real, and it's valid
for the core team to be concerned. Whether I'd want to delay 3.2, I'm
not so sure - certainly not indefinitely, there should be a "put up or
shut up" deadline. But I'd be sad if Python 3 saw a reversion to the
days of "Python isn't a good web development language because there's
no standard infrastructure" comments that was the situation before
WSGI existed...

Paul.

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Thu, 16 Sep 2010 09:52:48 -0400, Barry Warsaw  wrote:
> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> >There are some APIs that should be able to handle bytes *or* strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
> 
> Note that email has exactly the same problem.  A general solution -- even if
> embodied in *well documented* best-practices and convention -- would really
> help make the stdlib work consistently, and I bet third party libraries too.

Allowing bytes-in -> bytes-out where possible would definitely be a help
(and Guido has endorsed this, IIUC), but some care has to be taken to
understand the API contract of the method in question before blindly
applying it.  Are you "merely" allowing bytes to be processed as ASCII
strings, or does processing the bytes *correctly* imply that you are
converting from an ASCII encoding of text in order to process it?
In Python2, the latter might not generate unicode yet still produce
a correct result most of the time, but a big point of Python3 is to
eliminate that "most of the time", so we need to be careful not to
reintroduce it.  This was all covered in the thread Nick refers to;
I just want to emphasize that one needs to look at the API contract
carefully before making it polymorphic (in Guido's sense of the term).

If the way to do this is well documented best practices, we first
have to figure out what those best practices are.   To do that we have
to write some real-world code.  I'm trying one approach in email6:
Bytes and String subclasses, where the subclasses have an attribute
named 'literals' derived from a utility module that does this:

    literals = dict(
        empty = '',
        colon = ':',
        newline = '\n',
        space = ' ',
        tab = '\t',
        fws = ' \t',
        headersep = ': ',
        )

    class _string_literals:
        pass
    class _bytes_literals:
        pass

    for name, value in literals.items():
        setattr(_string_literals, name, value)
        setattr(_bytes_literals, name, bytes(value, 'ASCII'))
    del literals, name, value

And the subclasses do:

    class BytesHeader(BaseHeader):
        lit = email.utils._bytes_literals

    class StringHeader(BaseHeader):
        lit = email.utils._string_literals

And then BaseHeader uses self.lit.colon, etc, when manipulating strings.
It also has to use slice notation rather than indexing when looking at
individual characters, which is a PITA but not terrible.
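
To make the shape of that concrete, here is a self-contained sketch of the same idea. It is not the email6 code: the `split` method, the trimmed literal table, and the class names `_StringLiterals` and `_BytesLiterals` are assumptions made for illustration. It also shows why slicing is needed: indexing bytes yields an int, while slicing yields bytes.

```python
class _StringLiterals:
    pass


class _BytesLiterals:
    pass


# Same kind of table as in the message, trimmed to what this sketch uses.
for _name, _value in {"colon": ":", "space": " ", "fws": " \t"}.items():
    setattr(_StringLiterals, _name, _value)
    setattr(_BytesLiterals, _name, _value.encode("ascii"))


class BaseHeader:
    lit = None  # set by subclasses

    def __init__(self, raw):
        self.raw = raw

    def split(self):
        """Return (name, value) in the same type as self.raw."""
        name, _, value = self.raw.partition(self.lit.colon)
        # Slice, don't index: b"ab"[0] is the int 97, but b"ab"[:1] is b"a".
        while value[:1] and value[:1] in self.lit.fws:
            value = value[1:]
        return name, value


class BytesHeader(BaseHeader):
    lit = _BytesLiterals


class StringHeader(BaseHeader):
    lit = _StringLiterals


print(StringHeader("Subject: hello").split())
print(BytesHeader(b"Subject: \thello").split())
```

The shared method body never mentions a literal directly, so the same code path serves both types.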

I'm not saying this is the best approach, since this is all experimental
code at the moment, but it is *an* approach.

--
R. David Murray  www.bitdance.com

Re: [Python-Dev] r84771 - python/branches/release27-maint/Lib/test/test_io.py

2010-09-16 Thread Georg Brandl

Maybe you want to mention *who* warns?

Georg

Am 13.09.2010 10:20, schrieb florent.xicluna:
> Author: florent.xicluna
> Date: Mon Sep 13 10:20:19 2010
> New Revision: 84771
> 
> Log:
> Silence warning about 1/0
> 
> Modified:
>python/branches/release27-maint/Lib/test/test_io.py
> 
> Modified: python/branches/release27-maint/Lib/test/test_io.py
> ==
> --- python/branches/release27-maint/Lib/test/test_io.py   (original)
> +++ python/branches/release27-maint/Lib/test/test_io.py   Mon Sep 13 
> 10:20:19 2010
> @@ -2484,7 +2484,7 @@
>  signal.signal(signal.SIGALRM, self.oldalrm)
>  
>  def alarm_interrupt(self, sig, frame):
> -1/0
> +1 // 0
>  
>  @unittest.skipUnless(threading, 'Threading required for this test.')
>  def check_interrupted_write(self, item, bytes, **fdopen_kwargs):


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


Re: [Python-Dev] r84771 - python/branches/release27-maint/Lib/test/test_io.py

2010-09-16 Thread Antoine Pitrou

On Thu, 16 Sep 2010 17:27:50 +0200
Georg Brandl  wrote:
> Maybe you want to mention *who* warns?

I suppose it's the -3 flag:

$ ~/cpython/27/python -3 -c "1/0"
-c:1: DeprecationWarning: classic int division
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
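
As a small illustration of why the commit swaps `/` for `//` (this is a sketch, not the test_io.py code; the default arguments are added here so the handler can be called directly): floor division is spelled the same way in Python 2 and 3, so it still raises ZeroDivisionError without using the classic division operator that -3 warns about.

```python
def alarm_interrupt(sig=None, frame=None):
    # Floor division means the same thing in Python 2 and 3, so it avoids
    # the "classic int division" warning while still raising on zero.
    1 // 0


try:
    alarm_interrupt()
except ZeroDivisionError as exc:
    raised = type(exc).__name__

print(raised)
```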




Re: [Python-Dev] r84847 - python/branches/py3k/Doc/library/re.rst

2010-09-16 Thread Georg Brandl

That reminds me of the undocumented re.Scanner -- which is meant to do
exactly this.  Wouldn't it be about time to document or remove it?
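
For readers who have not met it, a rough sketch of how re.Scanner can be used follows. The class is undocumented, so the behaviour described in the comments is my understanding of the CPython implementation rather than a guaranteed API; the token names are borrowed from the tokenizer example in the commit.

```python
import re

# Each lexicon entry is (pattern, action). A callable action receives the
# scanner and the matched text; an action of None drops the match.
# Assumption: patterns avoid capturing groups, which Scanner uses internally
# to work out which entry matched.
scanner = re.Scanner([
    (r"\d+\.\d+|\d+", lambda sc, text: ("NUMBER", text)),
    (r":=",           lambda sc, text: ("ASSIGN", text)),
    (r";",            lambda sc, text: ("END", text)),
    (r"[A-Za-z]+",    lambda sc, text: ("ID", text)),
    (r"[+*/\-]",      lambda sc, text: ("OP", text)),
    (r"\s+",          None),
])

# scan() returns the list of action results plus any unmatched remainder.
tokens, remainder = scanner.scan("tax := price * 0.05;")
for token in tokens:
    print(token)
```

Unlike the generator in the commit, this returns all tokens at once and reports unmatched input as a leftover string instead of raising.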

Georg

Am 16.09.2010 14:02, schrieb raymond.hettinger:
> Author: raymond.hettinger
> Date: Thu Sep 16 14:02:17 2010
> New Revision: 84847
> 
> Log:
> Add tokenizer example to regex docs.
> 
> Modified:
>python/branches/py3k/Doc/library/re.rst
> 
> Modified: python/branches/py3k/Doc/library/re.rst
> ==
> --- python/branches/py3k/Doc/library/re.rst   (original)
> +++ python/branches/py3k/Doc/library/re.rst   Thu Sep 16 14:02:17 2010
> @@ -1282,3 +1282,66 @@
> <_sre.SRE_Match object at ...>
> >>> re.match("", r"\\")
> <_sre.SRE_Match object at ...>
> +
> +
> +Writing a Tokenizer
> +^^^
> +
> +A `tokenizer or scanner `_
> +analyzes a string to categorize groups of characters.  This is a useful first
> +step in writing a compiler or interpreter.
> +
> +The text categories are specified with regular expressions.  The technique is
> +to combine those into a single master regular expression and to loop over
> +successive matches::
> +
> +    Token = collections.namedtuple('Token', 'typ value line column')
> +
> +    def tokenize(s):
> +        tok_spec = [
> +            ('NUMBER', r'\d+(.\d+)?'),  # Integer or decimal number
> +            ('ASSIGN', r':='),          # Assignment operator
> +            ('END', ';'),               # Statement terminator
> +            ('ID', r'[A-Za-z]+'),       # Identifiers
> +            ('OP', r'[+*\/\-]'),        # Arithmetic operators
> +            ('NEWLINE', r'\n'),         # Line endings
> +            ('SKIP', r'[ \t]'),         # Skip over spaces and tabs
> +        ]
> +        tok_re = '|'.join('(?P<%s>%s)' % pair for pair in tok_spec)
> +        gettok = re.compile(tok_re).match
> +        line = 1
> +        pos = line_start = 0
> +        mo = gettok(s)
> +        while mo is not None:
> +            typ = mo.lastgroup
> +            if typ == 'NEWLINE':
> +                line_start = pos
> +                line += 1
> +            elif typ != 'SKIP':
> +                yield Token(typ, mo.group(typ), line, mo.start()-line_start)
> +            pos = mo.end()
> +            mo = gettok(s, pos)
> +        if pos != len(s):
> +            raise RuntimeError('Unexpected character %r on line %d' %(s[pos], line))
> +
> +    >>> statements = '''\
> +            total := total + price * quantity;
> +            tax := price * 0.05;
> +    '''
> +    >>> for token in tokenize(statements):
> +    ...     print(token)
> +    ...
> +    Token(typ='ID', value='total', line=1, column=8)
> +    Token(typ='ASSIGN', value=':=', line=1, column=14)
> +    Token(typ='ID', value='total', line=1, column=17)
> +    Token(typ='OP', value='+', line=1, column=23)
> +    Token(typ='ID', value='price', line=1, column=25)
> +    Token(typ='OP', value='*', line=1, column=31)
> +    Token(typ='ID', value='quantity', line=1, column=33)
> +    Token(typ='END', value=';', line=1, column=41)
> +    Token(typ='ID', value='tax', line=2, column=9)
> +    Token(typ='ASSIGN', value=':=', line=2, column=13)
> +    Token(typ='ID', value='price', line=2, column=16)
> +    Token(typ='OP', value='*', line=2, column=22)
> +    Token(typ='NUMBER', value='0.05', line=2, column=24)
> +    Token(typ='END', value=';', line=2, column=28)


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Toshio Kuratomi

On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> 
> >There are some APIs that should be able to handle bytes *or* strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
> 
> Note that email has exactly the same problem.  A general solution -- even if
> embodied in *well documented* best-practices and convention -- would really
> help make the stdlib work consistently, and I bet third party libraries too.
> 
I too await a solution with bated breath :-) I've been working on
documenting best practices for APIs and Unicode, and for this type of
function (take bytes or unicode and output the same type), knowing the
encoding seems like a requirement in most cases:

http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type

I'd love to add another strategy there that shows how you can robustly
operate on bytes without knowing the encoding but from writing that, I think
that anytime you simplify your API you have to accept limitations on the
data you can take in.  (For instance, some simplifications can handle
anything except ASCII-incompatible encodings).

-Toshio



Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Antoine Pitrou

On Thu, 16 Sep 2010 11:30:12 -0400
"R. David Murray"  wrote:
> 
> And then BaseHeader uses self.lit.colon, etc, when manipulating strings.
> It also has to use slice notation rather than indexing when looking at
> individual characters, which is a PITA but not terrible.
> 
> I'm not saying this is the best approach, since this is all experimental
> code at the moment, but it is *an* approach

Out of curiosity, can you explain why polymorphism is needed for
e-mail? I would assume that headers are bytes until they are parsed, at
which point they become a pair of unicode strings (one for the header
name and one for its value).

Regards

Antoine.



Re: [Python-Dev] r84847 - python/branches/py3k/Doc/library/re.rst

2010-09-16 Thread Michael Foord


On 16/09/2010 16:37, Georg Brandl wrote:
> That reminds me of the undocumented re.Scanner -- which is meant to do
> exactly this.  Wouldn't it be about time to document or remove it?

There was a long discussion about this on the bug tracker (the
suggestion to document it was rejected at the time).

http://bugs.python.org/issue5337

Michael Foord







--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.



Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Guido van Rossum

On Thu, Sep 16, 2010 at 8:42 AM, Toshio Kuratomi  wrote:
> On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
>> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
>>
>> >There are some APIs that should be able to handle bytes *or* strings,
>> >but the current use of string literals in their implementation means
>> >that bytes don't work. This turns out to be a PITA for some networking
>> >related code which really wants to be working with raw bytes (e.g.
>> >URLs coming off the wire).
>>
>> Note that email has exactly the same problem.  A general solution -- even if
>> embodied in *well documented* best-practices and convention -- would really
>> help make the stdlib work consistently, and I bet third party libraries too.
>>
> I too await a solution with bated breath :-) I've been working on
> documenting best practices for APIs and Unicode, and for this type of
> function (take bytes or unicode and output the same type), knowing the
> encoding seems like a requirement in most cases:
>
> http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type
>
> I'd love to add another strategy there that shows how you can robustly
> operate on bytes without knowing the encoding but from writing that, I think
> that anytime you simplify your API you have to accept limitations on the
> data you can take in.  (For instance, some simplifications can handle
> anything except ASCII-incompatible encodings).

In all cases I can imagine where such polymorphic functions make
sense, the necessary and sufficient assumption should be that the
encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
Latin-N variants, and AFAIK also the popular CJK encodings other than
UTF-16. This is the same assumption made by Python's byte type when
you use "character-based" methods like lower().
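
A short illustration of that assumption, using only built-in bytes and str methods (the sample strings are made up for the example): the character-based bytes methods touch only bytes in the ASCII range, so data in an ASCII-superset encoding such as UTF-8 passes through intact, while UTF-16 breaks byte-level matching against ASCII literals.

```python
# lower() on bytes only changes ASCII letters, so the two-byte UTF-8
# sequence for "É" is left alone while C, O, L, E are lowercased.
raw = "ÉCOLE".encode("utf-8")
lowered = raw.lower()
print(lowered)
print(lowered.decode("utf-8"))

# UTF-16 is not an ASCII superset: each ASCII character is padded with a
# NUL byte, so an ASCII byte literal no longer matches the encoded text.
utf16 = "a:b".encode("utf-16-le")
print(utf16)
print(b"a:b" in utf16)
```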

--Guido

__
(*) In my mind ASCII and 7-bit are synonymous, but unfortunately there
are droves of naive users who believe that ASCII includes all 256
possible 8-bit bytes using some encoding -- typically the default
encoding of their DOS or Windows box. :-(

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Martin (gzlist)

On 16/09/2010, Guido van Rossum  wrote:
>
> In all cases I can imagine where such polymorphic functions make
> sense, the necessary and sufficient assumption should be that the
> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> Latin-N variants, and AFAIK also the popular CJK encodings other than
> UTF-16. This is the same assumption made by Python's byte type when
> you use "character-based" methods like lower().

Well, depends on what exactly you're doing, it's pretty easy to go wrong:

Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.path.split("C:\\十")
('C:\\', '十')
>>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
(b'C:\\\x8f', b'')

Similar things can catch out web developers once they step outside the
percent encoding.

Martin

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Guido van Rossum

On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist)  wrote:
> On 16/09/2010, Guido van Rossum  wrote:
>>
>> In all cases I can imagine where such polymorphic functions make
>> sense, the necessary and sufficient assumption should be that the
>> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
>> Latin-N variants, and AFAIK also the popular CJK encodings other than
>> UTF-16. This is the same assumption made by Python's byte type when
>> you use "character-based" methods like lower().
>
> Well, depends on what exactly you're doing, it's pretty easy to go wrong:
>
> Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, sys
> >>> os.path.split("C:\\十")
> ('C:\\', '十')
> >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> (b'C:\\\x8f', b'')
>
> Similar things can catch out web developers once they step outside the
> percent encoding.

Well, that character is not 7-bit ASCII. Of course things will go
wrong there. That's the whole point of what I said, isn't it?

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] standards for distribution names

2010-09-16 Thread P.J. Eby


At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:

> Following on from this question:
>
> http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html
>
> ...I'd thought that the "correct names" for distributions would have
> been documented in one of:
>
> ...
>
> Where are the standards for this or is it still a case of "whatever
> setuptools does"?


Actually, in this case, it's "whatever distutils does".  If you don't 
build your .exe's with Distutils, or if you rename them after the 
fact, then setuptools won't recognize them as things it can consume.


FYI, Twisted has a long history of releasing distribution files that 
are either built using non-distutils tools or else renamed after being built.


Note, too, that if the Windows exe's they're providing aren't built 
by the distutils bdist_wininst command, then setuptools is probably 
not going to be able to consume them, no matter what they're called.







Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Toshio Kuratomi

On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
> On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist)  
> wrote:
> > On 16/09/2010, Guido van Rossum  wrote:
> >>
> >> In all cases I can imagine where such polymorphic functions make
> >> sense, the necessary and sufficient assumption should be that the
> >> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> >> Latin-N variants, and AFAIK also the popular CJK encodings other than
> >> UTF-16. This is the same assumption made by Python's byte type when
> >> you use "character-based" methods like lower().
> >
> > Well, depends on what exactly you're doing, it's pretty easy to go wrong:
> >
> > Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import os, sys
> > >>> os.path.split("C:\\十")
> > ('C:\\', '十')
> > >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> > (b'C:\\\x8f', b'')
> >
> > Similar things can catch out web developers once they step outside the
> > percent encoding.
> 
> Well, that character is not 7-bit ASCII. Of course things will go
> wrong there. That's the whole point of what I said, isn't it?
> 
You were talking about encodings that were supersets of 7-bit ASCII.
I think Martin was demonstrating a byte string that was a superset of 7-bit
ASCII being fed to a stdlib function which went wrong.
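
The failure can be reproduced on any platform with a short sketch. It assumes the Windows machine in Martin's example used code page 932 (Shift-JIS), which is consistent with the `\x8f` in his output but is not stated in the thread; `ntpath` is used so the Windows splitting rules apply regardless of the host OS.

```python
import ntpath  # Windows path rules, importable on any platform

# In cp932 the character is two bytes, and the second one is 0x5C,
# which is also the ASCII backslash.
encoded = "十".encode("cp932")
print(encoded)

# Byte-level splitting cannot tell that trailing 0x5C from a path separator,
# so the file name is lost; the str version splits correctly.
bytes_result = ntpath.split("C:\\十".encode("cp932"))
str_result = ntpath.split("C:\\十")
print(bytes_result)
print(str_result)
```

So the encoding is an ASCII superset in the sense that ASCII text encodes to itself, yet non-ASCII characters can still contain bytes that look like ASCII punctuation.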

-Toshio



Re: [Python-Dev] standards for distribution names

2010-09-16 Thread P.J. Eby


At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:
> ...I'd thought that the "correct names" for distributions would have
> been documented in one of:
>
> http://www.python.org/dev/peps/pep-0345
> http://www.python.org/dev/peps/pep-0376
> http://www.python.org/dev/peps/pep-0386
>
> ...but having read them, I drew a blank.


Forgot to mention: see distinfo_dirname() in PEP 376 for an 
explanation of distribution-name normalization.


(Case-insensitivity and OS-specific case handling are not addressed in 
the PEPs, though, AFAICT.)



[Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Tom Browder

I am trying to rebuild the 2.7 maintenance branch and get this error
on Ubuntu 10.04.1 LTS:

XXX lineno: 743, opcode: 0
Traceback (most recent call last):
 File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
   import os
 File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
   def urandom(n):
SystemError: unknown opcode

I installed it successfully once so I may be getting conflicts, but I
can't figure out why.  There were some similar bugs reported in
previous versions but I didn't see a clear solution.

I have done "make distclean" and "./configure".  I have unset my
PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.

I guess my next step will be to manually remove the installed python
2.7 unless I hear some other suggestions.

And I will file a bug report soon unless that is inappropriate.

Thanks,

-Tom

Thomas M. Browder, Jr.
Niceville, Florida
USA

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Brett Cannon

On Thu, Sep 16, 2010 at 06:28, Nick Coghlan  wrote:
> On Thu, Sep 16, 2010 at 10:26 PM, Antoine Pitrou  wrote:
>> Why won't you feel confident? Are there any specific issues (apart from
>> the lack of a WSGI PEP)?
>> If they are technical problems, they should be reported on the bug
>> tracker.
>> If they are representational, cultural or psychological issues, I'm
>> not sure what we can do. But delaying the release won't solve them.
>
> There are some APIs that should be able to handle bytes *or* strings,
> but the current use of string literals in their implementation means
> that bytes don't work. This turns out to be a PITA for some networking
> related code which really wants to be working with raw bytes (e.g.
> URLs coming off the wire).
>
> For example:
>
> >>> import urllib.parse as parse
> >>> parse.urlsplit("http://www.ubuntu.com")
> SplitResult(scheme='http', netloc='www.ubuntu.com', path='', query='',
> fragment='')
> >>> parse.urlsplit(b"http://www.ubuntu.com")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/home/ncoghlan/devel/py3k/Lib/urllib/parse.py", line 178, in urlsplit
>    i = url.find(':')
> TypeError: expected an object with the buffer interface
>
> There's no real reason urlsplit (and similar urllib.parse APIs)
> shouldn't support bytes, but the internal use of string literals
> currently prevents it.
>
> We don't seem to have created a tracker issue from the discussion back
> in June where this came up, so I went ahead and created one just now:
> http://bugs.python.org/issue9873

When I do my two months of PSF-sponsored core work (expected to be
Jan/Feb) I was planning on (finally) redoing the dev docs, writing a
HOWTO for maintaining a Python 2/3 code base, and cleaning up the test
suite. But I am starting to think I should change the last one to
solving this polymorphism problem in a way that can be applied across
the board in the stdlib.

>
> I think there were other APIs mentioned back then beyond the
> urllib.parse ones, but I didn't find them when I went trawling through
> the list archives yesterday. If anyone else thinks of any APIs that
> should allow bytes as well as strings (or vice-versa) feel free to add
> them to that issue.

Or create separate issues and make them dependencies for issue9873.
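
For illustration only, here is a minimal sketch of one way a function can
avoid hard-coded string literals and so accept either bytes or strings.
The helper name split_scheme is hypothetical and is not part of
urllib.parse; it only shows the idea of choosing the literal to match the
input type.

```python
def split_scheme(url):
    """Split 'scheme:rest' for str or bytes input (hypothetical helper)."""
    # Pick the literal that matches the input type instead of hard-coding ':'.
    if isinstance(url, str):
        colon = ":"
    elif isinstance(url, (bytes, bytearray)):
        colon = b":"
    else:
        raise TypeError("expected str or bytes, got %s" % type(url).__name__)
    scheme, sep, rest = url.partition(colon)
    if not sep:
        # No colon found: return an empty scheme of the same type as the input.
        return url[:0], url
    return scheme, rest


print(split_scheme("http://www.ubuntu.com"))
print(split_scheme(b"http://www.ubuntu.com"))
```

The result always has the same type as the argument, so callers working
with raw bytes off the wire never see an implicit decode.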

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Brett Cannon

Go ahead and file the bug, but chances are that some other installed
Python is executing the code and picking up the .pyc files which have
bytecode new to Python 2.7.
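
One way to check for that kind of mismatch is to compare the magic number
at the start of a .pyc file with the one the running interpreter writes.
The sketch below is an assumption-laden illustration: it uses Python 3's
importlib.util.MAGIC_NUMBER (on 2.x the equivalent value came from
imp.get_magic()), and the helper name is made up for this example.

```python
import importlib.util


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc at pyc_path was written by this interpreter version."""
    with open(pyc_path, "rb") as f:
        magic = f.read(4)
    # MAGIC_NUMBER is the 4-byte tag the running interpreter puts at the
    # start of every bytecode file it writes.
    return magic == importlib.util.MAGIC_NUMBER
```

A False result for a .pyc next to a source file means the bytecode was
produced by a different Python version than the one doing the check.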

On Thu, Sep 16, 2010 at 11:41, Tom Browder  wrote:
> I am trying to rebuild the 2.7 maintenance branch and get this error
> on Ubuntu 10.04.1 LTS:
>
> XXX lineno: 743, opcode: 0
> Traceback (most recent call last):
>  File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
>   import os
>  File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
>   def urandom(n):
> SystemError: unknown opcode
>
> I installed it successfully once so I may be getting conflicts, but I
> can't figure out why.  There were some similar bugs reported in
> previous versions but I didn't see a clear solution.
>
> I have done "make distclean" and "./configure".  I have unset my
> PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.
>
> I guess my next step will be to manually remove the installed python
> 2.7 unless I hear some other suggestions.
>
> And I will file a bug report soon unless that is inappropriate.
>
> Thanks,
>
> -Tom
>
> Thomas M. Browder, Jr.
> Niceville, Florida
> USA

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Tom Browder

On Thu, Sep 16, 2010 at 13:48, Brett Cannon  wrote:
> Go ahead and file the bug, but chances are that some other installed
> Python is executing the code and picking up the .pyc files which have
> bytecode new to Python 2.7.

But isn't that a problem with the build system?  It seems to me it
should be using all modules from within the build, thus there should
be no such error.

Regards,

-Tom

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Jacob Kaplan-Moss

On Thu, Sep 16, 2010 at 9:59 AM, Paul Moore  wrote:
> On 16 September 2010 07:16, Terry Reedy  wrote:
>>> I'm not working to get Django running on Python 3.1 because I don't
>>> feel confident I'll be able to put any apps I write into production.
>>
>> Why not? Since the I/O speed problem is fixed, I have no idea what you are
>> referring to.  Please do be concrete.
>
> At the risk of putting words into Jacob's mouth, I understood him to
> mean that "production quality" WSGI servers either do not exist, or do
> not implement a consistently defined spec (i.e., everyone is doing
> their own thing to adapt WSGI to Python 3).

Yup, exactly.

Deploying web apps under Python 2 right now is actually pretty
awesome. There's a clear leader in mod_wsgi that's fast, stable, easy
to use, and under active development. There's a few great lightweight
pure-Python servers, some new-hotness (Gunicorn) and some
tried-and-true (CherryPy). There's a fast-as-hell bleeding-edge option
(nginx + uwsgi). And those are just the ones I've successfully put
into production -- there're still *more* options if one of those won't
cut it.

The key here is that switching between all of these deployment
situations is *incredibly* easy. Actually, this very afternoon I'm
planning to experiment with a switch from mod_wsgi to gunicorn. I'm
confident enough with the inter-op that I'm going to make the switch
on a production web server, monitor it for a bit, then switch back.
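
To make the interoperability point concrete, here is a minimal sketch of
a WSGI application callable. The same callable can be handed unchanged to
any compliant server. The names application and call_app are illustrative
only, and the bytes body follows the conventions later written down in
PEP 3333, which were not yet settled for Python 3 at the time of this
thread.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable; any compliant server can host it unchanged."""
    body = b"Hello from a server-agnostic WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app):
    """Drive the app the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Switching servers then only changes how the callable is located and
launched, not the application code itself.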

I've budgeted an hour for this, and I'll probably end up spending half
that time playing Minecraft while I gather statistics.

Python 3 offers me none of this. I don't have a wide variety of tools
to choose from. Worse, I don't even have a guarantee of
interoperability between the tools that *do* exist.

---

I'm sorry if I'm coming across as a complainer here. It's a
frustrating situation for me: I want to start using Python 3, but
until there's a working web stack waiting for me I just can't justify
the time. And unfortunately I'm just not familiar enough with the
problem(s) to have any real shot at working towards a solution, and
I'm *certainly* not enough of an expert to work on a PEP or spec. So
all I can really do is agitate.

Jacob

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Guido van Rossum

On Thu, Sep 16, 2010 at 11:16 AM, Toshio Kuratomi  wrote:
> On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
>> On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist)  
>> wrote:
>> > On 16/09/2010, Guido van Rossum  wrote:
>> >>
>> >> In all cases I can imagine where such polymorphic functions make
>> >> sense, the necessary and sufficient assumption should be that the
>> >> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
>> >> Latin-N variant, and AFAIK also the popular CJK encodings other than
>> >> UTF-16. This is the same assumption made by Python's byte type when
>> >> you use "character-based" methods like lower().
>> >
>> > Well, depends on what exactly you're doing, it's pretty easy to go wrong:
>> >
>> > Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on 
>> > win32
>> > Type "help", "copyright", "credits" or "license" for more information.
>> > >>> import os, sys
>> > >>> os.path.split("C:\\十")
>> > ('C:\\', '十')
>> > >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
>> > (b'C:\\\x8f', b'')
>> >
>> > Similar things can catch out web developers once they step outside the
>> > percent encoding.
>>
>> Well, that character is not 7-bit ASCII. Of course things will go
>> wrong there. That's the whole point of what I said, isn't it?
>>
> You were talking about encodings that were supersets of 7-bit ASCII.
> I think Martin was demonstrating a byte string that was a superset of 7-bit
> ASCII being fed to a stdlib function which went wrong.

Whoops, sorry. I don't have access to Windows so I can't reproduce
this though. I also don't understand it. What is the Unicode codepoint
for that 十 character? What is sys.getfilesystemencoding()? What is the
value of "C:\\十".encode(sys.getfilesystemencoding())?

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Brett Cannon

Please file the bug and it can be discussed further there.

On Thu, Sep 16, 2010 at 12:05, Tom Browder  wrote:
> On Thu, Sep 16, 2010 at 13:48, Brett Cannon  wrote:
>> Go ahead and file the bug, but chances are that some other installed
>> Python is executing the code and picking up the .pyc files which have
>> bytecode new to Python 2.7.
>
> But isn't that a problem with the build system?  It seems to me it
> should be using all modules from within the build, thus there should
> be no such error.
>
> Regards,
>
> -Tom
>

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Barry Warsaw

On Sep 16, 2010, at 01:41 PM, Tom Browder wrote:

>I am trying to rebuild the 2.7 maintenance branch and get this error
>on Ubuntu 10.04.1 LTS:

I just tried this on my vanilla 10.04.1 system.  I checked out release27-maint
and ran configure && make.  It built without problem.

>XXX lineno: 743, opcode: 0
>Traceback (most recent call last):
> File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
>  import os
> File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
>  def urandom(n):
>SystemError: unknown opcode
>
>I installed it successfully once so I may be getting conflicts, but I
>can't figure out why.  There were some similar bugs reported in
>previous versions but I didn't see a clear solution.

I installed Python 2.7 to /usr/local, then did a make distclean, configure,
make.  Again, successfully.

>I have done "make distclean" and "./configure".  I have unset my
>PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.
>
>I guess my next step will be to manually remove the installed python
>2.7 unless I hear some other suggestions.

When you say "installed python 2.7" do you mean the one you installed to
/usr/local from a from-source build, or something else (e.g. a Python 2.7
package perhaps)?

>And I will file a bug report soon unless that is inappropriate.

Sure.  Please +nosy me.  But I think something else is going on.
-Barry



Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Martin (gzlist)

On 16/09/2010, Guido van Rossum  wrote:
> On Thu, Sep 16, 2010 at 11:16 AM, Toshio Kuratomi 
> wrote:
>> You were talking about encodings that were supersets of 7-bit ASCII.
>> I think Martin was demonstrating a byte string that was a superset of
>> 7-bit
>> ASCII being fed to a stdlib function which went wrong.
>
> Whoops, sorry. I don't have access to Windows so I can't reproduce
> this though. I also don't understand it. What is the Unicode codepoint
> for that 十 character? What is sys.getfilesystemencoding()? What is the
> value of "C:\\十".encode(sys.getfilesystemencoding())?

My fault, should have been clearer. I was trying to demonstrate that
there's a difference between the unix-friendly encodings, like UTF-8
and the EUC codecs, which only use high-bit bytes for non-ASCII
text, and encodings like the ISO-2022 codecs and Shift JIS, which do not.

In the example I gave, 十 encodes in CP932 as '\x8f\\', and the
function gets confused by the second byte. Obviously the right answer
there is just to use unicode, rather than write a function that works
with weird multibyte codecs.
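
The effect can be reproduced on any platform with a short sketch. It
assumes the cp932 codec is available (it ships with standard CPython
builds) and uses ntpath, the Windows flavour of os.path, so no Windows
machine is needed.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

text_path = "C:\\十"
byte_path = text_path.encode("cp932")

# The second byte of the encoded character is 0x5C, the ASCII backslash.
trail_byte_is_backslash = byte_path.endswith(b"\x8f\\")

str_split = ntpath.split(text_path)    # splits on the real separator only
bytes_split = ntpath.split(byte_path)  # also splits inside the character

print(trail_byte_is_backslash)
print(str_split)
print(bytes_split)
```

The str version keeps the character intact as the tail, while the bytes
version treats the trailing 0x5C as a path separator and returns an empty
tail.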

Martin

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Tom Browder

On Thu, Sep 16, 2010 at 14:36, Barry Warsaw  wrote:
> On Sep 16, 2010, at 01:41 PM, Tom Browder wrote:
>
>>I am trying to rebuild the 2.7 maintenance branch and get this error
>>on Ubuntu 10.04.1 LTS:
>
> I just tried this on my vanilla 10.04.1 system.  I checked out release27-maint
> and ran configure && make.  It built without problem.
>
>>XXX lineno: 743, opcode: 0
>>Traceback (most recent call last):
>> File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
>>  import os
>> File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
>>  def urandom(n):
>>SystemError: unknown opcode
...

> When you say "installed python 2.7" do you mean the one you installed to
> /usr/local from a from-source build, or something else (e.g. a Python 2.7
> package perhaps)?

It was the released source tarball for 2.7, and I get the same error
when I try it from that directory.

-Tom

Thomas M. Browder, Jr.
Niceville, Florida
USA

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Steve Holden

On 9/16/2010 3:07 PM, Jacob Kaplan-Moss wrote:
> On Thu, Sep 16, 2010 at 9:59 AM, Paul Moore  wrote:
>> On 16 September 2010 07:16, Terry Reedy  wrote:
>>>> I'm not working to get Django running on Python 3.1 because I don't
>>>> feel confident I'll be able to put any apps I write into production.
>>>
>>> Why not? Since the I/O speed problem is fixed, I have no idea what you are
>>> referring to.  Please do be concrete.
>>
>> At the risk of putting words into Jacob's mouth, I understood him to
>> mean that "production quality" WSGI servers either do not exist, or do
>> not implement a consistently defined spec (i.e., everyone is doing
>> their own thing to adapt WSGI to Python 3).
> 
> Yup, exactly.
> 
> Deploying web apps under Python 2 right now is actually pretty
> awesome. There's a clear leader in mod_wsgi that's fast, stable, easy
> to use, and under active development. There's a few great lightweight
> pure-Python servers, some new-hotness (Gunicorn) and some
> tried-and-true (CherryPy). There's a fast-as-hell bleeding-edge option
> (nginx + uwsgi). And those are just the ones I've successfully put
> into production -- there're still *more* options if one of those won't
> cut it.
> 
> The key here is that switching between all of these deployment
> situations is *incredibly* easy. Actually, this very afternoon I'm
> planning to experiment with a switch from mod_wsgi to gunicorn. I'm
> confident enough with the inter-op that I'm going to make the switch
> on a production web server, monitor it for a bit, then switch back.
> 
> I've budgeted an hour for this, and I'll probably end up spending half
> that time playing Minecraft while I gather statistics.
> 
> Python 3 offers me none of this. I don't have a wide variety of tools
> to choose from. Worse, I don't even have a guarantee of
> interoperability between the tools that *do* exist.
> 
> ---
> 
> I'm sorry if I'm coming across as a complainer here. It's a
> frustrating situation for me: I want to start using Python 3, but
> until there's a working web stack waiting for me I just can't justify
> the time. And unfortunately I'm just not familiar enough with the
> problem(s) to have any real shot at working towards a solution, and
> I'm *certainly* not enough of an expert to work on a PEP or spec. So
> all I can really do is agitate.
> 
I think you are entitled to describe real-world use cases that Python 3
needs to start solving to be accepted as production-ready.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
DjangoCon US September 7-9, 2010    http://djangocon.us/
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/

Re: [Python-Dev] 3.x as the official release

2010-09-16 Thread Éric Araujo

On 15/09/2010 21:45, Tarek Ziadé wrote:
> Could we remove the wsgiref.egg-info file in any case? Since we've
> been working on a new format for that (PEP 376), which should
> start to get used in the coming years, it would be a bit
> nonsensical to ship that metadata file in the stdlib with 3.2.

On a related subject: Would it make sense not to run install_egg_info
from install anymore?  We probably can’t remove the command because of
backward compat, but we could stop running it (thus creating egg-info
files) by default.

Regards


Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Barry Warsaw

On Sep 16, 2010, at 02:56 PM, Tom Browder wrote:

>On Thu, Sep 16, 2010 at 14:36, Barry Warsaw  wrote:
>> When you say "installed python 2.7" do you mean the one you
>> installed to /usr/local from a from-source build, or something else
>> (e.g. a Python 2.7 package perhaps)?
>
>It was the released source tarball for 2.7, and I get the same error
>when I try it from that directory.

Yep, sorry, I still cannot reproduce it.

-Barry



Re: [Python-Dev] 3.x as the official release

2010-09-16 Thread P.J. Eby


At 10:18 PM 9/16/2010 +0200, Éric Araujo wrote:
> On 15/09/2010 21:45, Tarek Ziadé wrote:
> > Could we remove the wsgiref.egg-info file in any case? Since we've
> > been working on a new format for that (PEP 376), which should
> > start to get used in the coming years, it would be a bit
> > nonsensical to ship that metadata file in the stdlib with 3.2.
>
> On a related subject: Would it make sense not to run install_egg_info
> from install anymore?  We probably can't remove the command because of
> backward compat, but we could stop running it (thus creating egg-info
> files) by default.


If you're talking about distutils2 on Python 3, then of course 
anything goes: backward compatibility isn't an issue.  For 2.x, not 
writing the files would indeed produce backward compatibility problems.



Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Thu, 16 Sep 2010 17:40:53 +0200, Antoine Pitrou  wrote:
> On Thu, 16 Sep 2010 11:30:12 -0400
> "R. David Murray"  wrote:
> > 
> > And then BaseHeader uses self.lit.colon, etc, when manipulating strings.
> > It also has to use slice notation rather than indexing when looking at
> > individual characters, which is a PITA but not terrible.
> > 
> > I'm not saying this is the best approach, since this is all experimental
> > code at the moment, but it is *an* approach
> 
> Out of curiosity, can you explain why polymorphism is needed for
> e-mail? I would assume that headers are bytes until they are parsed, at
> which point they become a pair of unicode strings (one for the header
> name and one for its value).

Currently email accepts strings as input, and produces strings as output.

It needs to also accept bytes as input, and emit bytes as output, because
unicode can only be used as a 7-bit clean data transmission channel,
and that's too restrictive for many email applications (many of which
need to deal with "dirty" (non-RFC conformant) 8bit data). [1]

Backward compatibility says "case closed".

If we were designing from scratch, we could insist that input to the
parser is always bytes, and when the model is serialized it always
produces bytes.  It is possible that one could live with that, but I
don't think it is optimal.

Given a message, there are many times you want to serialize it as text
(for example, for presentation in a UI).  You could provide alternate
serialization methods to get text out on demand, but then what if
someone wants to push that text representation back in to email to
rebuild a model of the message?  So now we have both a bytes parser
and a string parser.

What do we store in the model?  We could say that the model is always
text.  But then we lose information about the original bytes message,
and we can't reproduce it.  For various reasons (mailman being a big one),
this is not acceptable.  So we could say that the model is always bytes.
But we want access to (for example) the header values as text, so header
lookup should take string keys and return string values[2].  But for
certain types of processing, particularly examination of "dirty",
non-RFC conforming input data, you need to be able to access the raw
bytes data.

What about email files on disk?  They could be bytes, or they could be,
effectively, text (for example, utf-8 encoded).  On disk, using utf-8,
one might store the text representation of the message, rather than
the wire-format (ASCII encoded) version.  We might want to write such
messages from scratch.  As I said above, we could insist that files on
disk be in wire-format, and for many applications that would work fine,
but I think people would get mad at us if we didn't support text files[3].

So, after much discussion, what we arrived at (so far!) is a model
that mimics the Python3 split between bytes and strings.  If you
start with bytes input, you end up with a BytesMessage object.
If you start with string input to the parser, you end up with a
StringMessage.  If you have a BytesMessage and you want to do
something with the text version of the message, you decode it:

    print(mymsg.decode())

If the message is RFC conformant, the message contains all the information
needed to decode it correctly.  If it's not conformant, email does the
best it can and registers defects for the non-conformant bits (or,
optionally, email6 will raise errors when the policy is set to strict).

If you have a StringMessage and you want to use it where wire-format is
needed, you encode it:

    outmsg = mymsg.encode()
    smtpserver.sendmail(
        bytes(outmsg['from']),
        [bytes(x) for x in itertools.chain(
            outmsg['to'], outmsg['cc'], outmsg['bcc'])],
        outmsg.serialize(policy=email.policy.SMTP))

Encoding uses the utf-8 character set by default, but this can be modified
by changing the policy.  The trick for gathering the list of addresses is
how I *think* that part of the API is going to work:  iterating the object
that models an address header gives you a list of address objects, and
converting one of those to a bytes string gives you the wire-format byte
string representing a single address.  Also note that this is the new API;
in compatibility mode (which is controlled by the policy) you'd get the
old behavior of just getting the string representation of the whole header
back (but then you'd have to parse it to turn it into a list of addresses).

The point here is that because we've encoded the message to a
BytesMessage, what we get when we turn the pieces into a bytes string
are the wire-format byte strings that are required for transmission;
for example, non-ASCII characters will be encoded according to
the policy and then RFC2047 transfer encoded as needed.
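
As a side illustration of the RFC 2047 step only (separate from the
sendmail snippet, and not using the new API described here), the existing
email.header module can already turn non-ASCII text into a wire-safe
encoded-word and back. The helper name rfc2047_roundtrip is made up for
this sketch.

```python
from email.header import Header, decode_header


def rfc2047_roundtrip(text, charset="utf-8"):
    """Encode text as an RFC 2047 encoded-word, then decode it back."""
    # Header.encode() returns an ASCII-only str that is safe on the wire.
    encoded = Header(text, charset).encode()
    raw, used_charset = decode_header(encoded)[0]
    if isinstance(raw, bytes):
        # Encoded words come back as bytes plus the charset they declared.
        raw = raw.decode(used_charset or "ascii")
    return encoded, raw


encoded, decoded = rfc2047_roundtrip("Grüße")
print(encoded)
print(decoded)
```

The encoded form contains only ASCII characters, which is what makes it
usable in a header that must survive a 7-bit transport.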

At this point you may notice there's a problem with the example above.
We actually need to decode each of those byte strings using the ASCII
codec before passing them as arguments to smt

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Tom Browder

I'm attempting to file a bug but keep getting:

An error has occurred

A problem was encountered processing your request. The tracker
maintainers have been notified of the problem.

-Tom

Thomas M. Browder, Jr.
Niceville, Florida
USA

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Victor Stinner

On Thursday 16 September 2010 23:10:22, Tom Browder wrote:
> I'm attempting to file a bug but keep getting:

File another bug about this bug!

-- 
Victor Stinner
http://www.haypocalc.com/

Re: [Python-Dev] 3.x as the official release

2010-09-16 Thread Éric Araujo

> If you're talking about distutils2 on Python 3, then of course 
> anything goes: backward compatibility isn't an issue.  For 2.x, not 
> writing the files would indeed produce backward compatibility problems.

I was talking about distutils in 3.2 (or in the release where
wsgiref.egg-info goes away).  install_egg_info.py has already been
turned into install_distinfo.py in distutils2, following PEP 376.

Thank you for your reply, I withdraw my suggestion.

Regards


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Antoine Pitrou

On Thu, 16 Sep 2010 16:51:58 -0400
"R. David Murray"  wrote:
>
> What do we store in the model?  We could say that the model is always
> text.  But then we lose information about the original bytes message,
> and we can't reproduce it.  For various reasons (mailman being a big one),
> this is not acceptable.  So we could say that the model is always bytes.
> But we want access to (for example) the header values as text, so header
> lookup should take string keys and return string values[2].

Why can't you have both in a single class? If you create the class
using a bytes source (a raw message sent by SMTP, for example), the
class automatically parses and decodes it to unicode strings; if you
create the class using a unicode source (the text body of the e-mail
message and the list of recipients, for example), the class
automatically creates the bytes representation.

(of course all processing can be done lazily for performance reasons)

> What about email files on disk?  They could be bytes, or they could be,
> effectively, text (for example, utf-8 encoded). 

Such a file can be two things:
- the raw encoding of a whole message (including headers, etc.), then
  it should be fed as a bytes object
- the single text body of a hypothetical message, then it should be fed
  as a unicode object

I don't see any possible middle-ground.

> On disk, using utf-8,
> one might store the text representation of the message, rather than
> the wire-format (ASCII encoded) version.  We might want to write such
> messages from scratch.

But then the user knows the encoding (by "user" I mean what/whoever
calls the email API) and mentions it to the email package.

What I'm having an issue with is that you are talking about a bytes
representation and a unicode representation of a message. But they
aren't representations of the same thing:
- if it's a bytes representation, it will be the whole, raw message
  including envelope / headers (also, MIME sections etc.)
- if it's a unicode representation, it will only be a section of the
  message decodable as such (a text/plain MIME section, for example;
  or a decoded header value; or even a single e-mail address part of a
  decoded header)

So, there doesn't seem to be any reason for having both a BytesMessage
and a UnicodeMessage at the same abstraction level. They represent
different things at different abstraction levels. I don't
see any potential for confusion: raw assembled e-mail message = bytes;
decoded text section of a message = unicode.

As for the problem of potential "bogus" raw e-mail data
(e.g., undecodable headers), well, I guess the library has to make a
choice between purity and practicality, or perhaps let the user choose
themselves, for example through a `strict` flag. If `strict` is true,
raise an error as soon as a non-decodable byte appears in a header; if
`strict` is false, decode it through a default (encoding, errors)
convention which can be overridden by the user (a sensible possibility
being "utf-8, surrogateescape" to allow for lossless round-tripping).
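
A minimal sketch of such a flag, assuming a hypothetical helper named
decode_header_bytes (it is not part of the email package, and the
parameter names are illustrative only):

```python
def decode_header_bytes(raw, *, strict=False, encoding="utf-8",
                        errors="surrogateescape"):
    """Decode raw header bytes to str (hypothetical helper)."""
    if strict:
        # Purity: refuse anything that is not valid in the given encoding.
        return raw.decode(encoding, "strict")
    # Practicality: apply the caller-overridable (encoding, errors) pair.
    return raw.decode(encoding, errors)


bogus = b"From: bogus\xffname"

# Lenient mode keeps the bad byte as a lone surrogate, so nothing is lost.
lenient_text = decode_header_bytes(bogus)
round_tripped = lenient_text.encode("utf-8", "surrogateescape")

# Strict mode raises on the first undecodable byte.
try:
    decode_header_bytes(bogus, strict=True)
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False
```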

> As I said above, we could insist that files on
> disk be in wire-format, and for many applications that would work fine,
> but I think people would get mad at us if we didn't support text files[3].

Again, this simply seems to be two different abstraction levels:
pre-generated raw email messages including headers, or a single text
waiting to be embedded in an actual e-mail.

> Anyway, what polymorphism means in email is that if you put in bytes,
> you get a BytesMessage, if you put in strings you get a StringMessage,
> and if you want the other one you convert.

And then you have two separate worlds while ultimately the same
concepts are underlying. A library accepting BytesMessage will crash
when a program wants to give a StringMessage and vice-versa. That
doesn't sound very practical.

> [1] Now that surrogateescape exists, one might suppose that strings
> could be used as an 8bit channel, but that only works if you don't need
> to *parse* the non-ASCII data, just transmit it.

Well, you can parse it, precisely. Not only that, but it round-trips if you
unparse it again:

>>> header_bytes = b"From: bogus\xFFname "
>>> name, value = header_bytes.decode("utf-8", "surrogateescape").split(":")
>>> name
'From'
>>> value
' bogus\udcffname '
>>> "{0}:{1}".format(name, value).encode("utf-8", "surrogateescape")
b'From: bogus\xffname '

In the end, what I would call a polymorphic best practice is "try to
avoid bytes/str polymorphism if your domain is well-defined
enough" (which I admit URLs aren't necessarily; but there's no
question a single text/XXX e-mail section is text, and a whole
assembled e-mail message is bytes).

Regards

Antoine.

Re: [Python-Dev] Python 2.7 Won't Build

2010-09-16 Thread Tom Browder

On Thu, Sep 16, 2010 at 16:36, Victor Stinner
 wrote:
> Le jeudi 16 septembre 2010 23:10:22, Tom Browder a écrit :
>> I'm attempting to file a bug but keep getting:
>
> File another bug about this bug!

I did, and eventually discovered the problem: I tried to "nosy" Barry
as requested by adding his e-mail address, but that causes an error in
the tracker.  After I finally figured that out, I successfully entered
the original bug (and reported it on the "tracker bug").

-Tom

Thomas M. Browder, Jr.
Niceville, Florida

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Glyph Lefkowitz

On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:

> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI).  You could provide alternate
> serialization methods to get text out on demand, but then what if
> someone wants to push that text representation back in to email to
> rebuild a model of the message?

You tell them "too bad, make some bytes out of that text."  Leave it up to the 
application.  Period, the end, it's not the library's job.  If you pushed the 
text out to a 'view message source' UI representation, then the vicissitudes of 
the system clipboard and other encoding and decoding things may corrupt it in 
inscrutable ways.  You can't fix it.  Don't try.

> So now we have both a bytes parser and a string parser.

Why do so many messages on this subject take this for granted?  It's wrong for 
the email module just like it's wrong for every other package.

There are plenty of other (better) ways to deal with this problem.  Let the 
application decide how to fudge the encoding of the characters back into bytes 
that can be parsed.  "In the face of ambiguity, refuse the temptation to guess" 
and all that.  The application has more of an idea of what's going on than the 
library here, so let it make encoding decisions.

Put another way, there's nothing wrong with having a text parser, as long as it 
just encodes the text according to some known encoding and then parses the 
bytes :).

> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings.  If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.

That may be a handy way to deal with some grotty internal implementation 
details, but having a 'decode()' method is broken.  The thing I care about, as 
a consumer of this API, is that there is a clearly defined "Message" interface, 
which gives me a uniform-looking place where I can ask for either characters 
(if I'm displaying them to the user) or bytes (if I'm putting them on the 
wire).  I don't particularly care where those bytes came from.  I don't care 
what decoding tricks were necessary to produce the characters.

Now, it may be worthwhile to have specific normalization / debrokenifying 
methods which deal with specific types of corrupt data from the wire; 
encoding-guessing, replacement-character insertion or whatever else are fine 
things to try.  It may also be helpful to keep around a list of errors in the 
message, for inspection.  But as we know, there are lots of ways that MIME data 
can go bad other than encoding, so that's just one variety of error that we 
might want to keep around.

(Looking at later messages as I'm about to post this, I think this all sounds 
pretty similar to Antoine's suggestions, with respect to keeping the 
implementation within a single class, and not having 
BytesMessage/UnicodeMessage at the same abstraction level.)

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Barry Warsaw

On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:

>That may be a handy way to deal with some grotty internal
>implementation details, but having a 'decode()' method is broken.  The
>thing I care about, as a consumer of this API, is that there is a
>clearly defined "Message" interface, which gives me a uniform-looking
>place where I can ask for either characters (if I'm displaying them to
>the user) or bytes (if I'm putting them on the wire).  I don't
>particularly care where those bytes came from.  I don't care what
>decoding tricks were necessary to produce the characters.

But first you have to get to that Message interface.  This is why the current
email package separates parsing and generating from the representation model.
You could conceivably have a parser that rot13's all the payload, or just
parses the headers and leaves the payload as a blob of bytes.  But the parser
tries to be lenient in what it accepts, so that one bad header doesn't cause
it to just punt on everything that follows.  Instead, it parses what it can
and registers a defect on that header, which the application can then reason
about, because it has a Message object.  If it were to just throw up its hands
(i.e. raise an exception), you'd basically be left with a blob of useless crap
that will just get /dev/null'd.
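
This lenient behaviour can be observed with the email package in current
Python 3 releases. The sketch below uses a made-up message that claims to
be multipart but gives no boundary; in the stdlib as it stands today the
defect is recorded on the message object rather than on an individual
header.

```python
import email
from email import errors

raw = (
    b"Subject: hello\n"
    b"Content-Type: multipart/mixed\n"   # claims multipart, but no boundary
    b"\n"
    b"just some text\n"
)

# No exception is raised; the parser recovers what it can.
msg = email.message_from_bytes(raw)

# The good header is still usable ...
subject = msg["Subject"]

# ... and the problem is recorded for the application to reason about.
defect_types = [type(d) for d in msg.defects]
```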

>Now, it may be worthwhile to have specific normalization /
>debrokenifying methods which deal with specific types of corrupt data
>from the wire; encoding-guessing, replacement-character insertion or
>whatever else are fine things to try.  It may also be helpful to keep
>around a list of errors in the message, for inspection.  But as we
>know, there are lots of ways that MIME data can go bad other than
>encoding, so that's just one variety of error that we might want to
>keep around.

Right.  The middle ground IMO is what the current parser does.  It recognizes
the problem, registers a defect, and tries to recover, but it doesn't fix the
corrupt data.  So for example, if you had a valid RFC 2047 encoded Subject but
a broken X-Foo header, you'd at least still end up with a Message object.  The
value of the good headers would be things from which you can get the unicode
value, the raw bytes value, parse its parameters, munge it, etc. while the bad
header might be something you can only get the raw bytes from.

-Barry


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Glyph Lefkowitz


On Sep 16, 2010, at 7:34 PM, Barry Warsaw wrote:

> On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:
> 
>> That may be a handy way to deal with some grotty internal
>> implementation details, but having a 'decode()' method is broken.  The
>> thing I care about, as a consumer of this API, is that there is a
>> clearly defined "Message" interface, which gives me a uniform-looking
>> place where I can ask for either characters (if I'm displaying them to
>> the user) or bytes (if I'm putting them on the wire).  I don't
>> particularly care where those bytes came from.  I don't care what
>> decoding tricks were necessary to produce the characters.
> 
> But first you have to get to that Message interface.  This is why the current
> email package separates parsing and generating from the representation model.
> You could conceivably have a parser that rot13's all the payload, or just
> parses the headers and leaves the payload as a blob of bytes.  But the parser
> tries to be lenient in what it accepts, so that one bad header doesn't cause
> it to just punt on everything that follows.  Instead, it parses what it can
> and registers a defect on that header, which the application can then reason
> about, because it has a Message object.  If it were to just throw up its hands
> (i.e. raise an exception), you'd basically be left with a blob of useless crap
> that will just get /dev/null'd.

Oh, absolutely.  Please don't interpret anything I say as meaning that the 
email API should not handle broken data.  I'm just saying that you should not 
expect broken data to round-trip through translation to characters and back, 
any more than you should expect a broken PNG to round-trip through a 
translation to a 2d array of pixels and back.
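
A small sketch of that loss, assuming the decoder uses replacement
characters (one of several possible encoding fixes); the header content
is invented for the example:

```python
# A Latin-1 byte (0xE9) that is not valid UTF-8.
broken = b"Subject: caf\xe9"

# Decoding with errors="replace" substitutes U+FFFD for the bad byte.
as_text = broken.decode("utf-8", errors="replace")

# Encoding the characters again does not give back the original bytes.
back_to_bytes = as_text.encode("utf-8")
```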

>> Now, it may be worthwhile to have specific normalization /
>> debrokenifying methods which deal with specific types of corrupt data
>> from the wire; encoding-guessing, replacement-character insertion or
>> whatever else are fine things to try.  It may also be helpful to keep
>> around a list of errors in the message, for inspection.  But as we
>> know, there are lots of ways that MIME data can go bad other than
>> encoding, so that's just one variety of error that we might want to
>> keep around.
> 
> Right.  The middle ground IMO is what the current parser does.  It recognizes
> the problem, registers a defect, and tries to recover, but it doesn't fix the
> corrupt data.  So for example, if you had a valid RFC 2047 encoded Subject but
> a broken X-Foo header, you'd at least still end up with a Message object.  The
> value of the good headers would be things from which you can get the unicode
> value, the raw bytes value, parse its parameters, munge it, etc. while the bad
> header might be something you can only get the raw bytes from.


My take on this would be that you should always be able to get bytes or 
characters, but characters are always suspect, in that once you've decoded, if 
you had invalid bytes, then they're replacement characters (or your choice of 
encoding fix).

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Thu, 16 Sep 2010 18:11:30 -0400, Glyph Lefkowitz  
wrote:
> On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:
> 
> > Given a message, there are many times you want to serialize it as text
> > (for example, for presentation in a UI).  You could provide alternate
> > serialization methods to get text out on demand, but then what if
> > someone wants to push that text representation back in to email to
> > rebuild a model of the message?
> 
> You tell them "too bad, make some bytes out of that text."  Leave it up
> to the application.  Period, the end, it's not the library's job.  If
> you pushed the text out to a 'view message source' UI representation,
> then the vicissitudes of the system clipboard and other encoding and
> decoding things may corrupt it in inscrutable ways.  You can't fix it. 
> Don't try.

Say we start with this bytes input:

To: Glyph Lefkowitz 
From: "R. David Murray" 
Subject: =?utf-8?q?p=C3=B6stal?=

A simple message.

Part of the responsibility of the email module is to provide that
in text form on demand, so the application gets:

To: Glyph Lefkowitz 
From: "R. David Murray" 
Subject: pöstal

A simple message.

Now the application allows the user to do some manipulation of that, and
we have:

To: "R. David Murray" 
From: Glyph Lefkowitz 
Subject: Re: pöstal

A simple reply.

How does the application "make some bytes out of that text" before passing
it back to email?  The application shouldn't have to know how to do
RFC2047 encoding, certainly, that's one of the jobs of the email module.
If the application just encodes the above as UTF8, then it also has to
be calling an email API that knows it is getting bytes input that has
not been transfer-encoded, and needs to be told the encoding, so that
it can do the correct transfer encoding.  In that case why not have
the API be "pass in the text", with an optional override for the default
utf-8 encoding that email will otherwise use?
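
For comparison, here is a sketch of the "pass in the text" style using
the EmailMessage API found in current Python 3 releases, which postdates
this thread. The addresses are placeholders. The application only ever
handles str, and the RFC 2047 encoding happens inside the email package
when the message is serialized.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

reply = EmailMessage()
reply["To"] = "rdm@example.invalid"
reply["From"] = "glyph@example.invalid"
reply["Subject"] = "Re: pöstal"          # plain unicode text
reply.set_content("A simple reply.")

# Serializing produces ASCII-only wire format; the package picks the
# header encoding, not the application.
wire = reply.as_bytes()

# Parsing the wire format back gives the unicode subject again.
parsed = message_from_bytes(wire, policy=policy.default)
```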

Perhaps some of the disconnect here with Antoine (and Jean-Paul, on IRC)
is that the email-sig feels that the format of data handled by the email
module (RFC 822-style headers, perhaps with a body, perhaps including MIME
attachments) is of much wider utility than just handling email, and that
since the email module already has to be very liberal in what it accepts,
it isn't much of a stretch to have it handle those use cases as well (and
in Python2 it does, in the same 'most of the time' way it handles other
non-ASCII byte issues).  In that context, it seems perfectly reasonable to
expect it to parse string (unicode) headers containing non-ascii data.
In such use cases there might be no reason to encode to email RFC
wire-format, and thus an encode-to-bytes-and-tell-me-the-encoding
interface wouldn't serve the use case particularly well because the
application wouldn't want the RFC2047 encoding in the file version of
the data.

We could conceivably drop those use cases if it simplified the API and
implementation, but right now it doesn't feel like it does.  Further,
Python2 serves these use cases, because you can read the non-ascii
data and process it as binary data and it would all just work (most of
the time).  So such use cases probably do exist out in the wild (but
no, we don't have any specific pointers, though I myself was working
on such an app once that never got to production).  If Python3 email
parses only bytes, then it could serve the use case in somewhat the
same way as Python2: the application would encode the data as, say,
utf8 and pass it to the 'wire format bytes' input interface, which would
then register a defect but otherwise pass the data along to the 'wire'
(the file in this case).  On read it would again register a defect, and
the application could pull the data out using the 'give me the wire-bytes'
interface and decode it itself.

But this feels yucky to me, like a regression to Python2's conflation
of bytes and text.  This type of application really wants to work with
unicode, not to have to futz with bytes.

> > So now we have both a bytes parser and a string parser.
> 
> Why do so many messages on this subject take this for granted?  It's
> wrong for the email module just like it's wrong for every other package.
> 
> There are plenty of other (better) ways to deal with this problem.  Let
> the application decide how to fudge the encoding of the characters back
> into bytes that can be parsed.  "In the face of ambiguity, refuse the
> temptation to guess" and all that.  The application has more of an idea
> of what's going on than the library here, so let it make encoding
> decisions.
> 
> Put another way, there's nothing wrong with having a text parser, as
> long as it just encodes the text according to some known encoding and
> then parses the bytes :).

See above for why I don't think that serves all the use cases for text
parsing.

Perhaps another difference is that in my mind *as an application
developer*, the "real" email messag

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Barry Warsaw

On Sep 16, 2010, at 09:34 PM, R. David Murray wrote:

>Say we start with this bytes input:
>
>To: Glyph Lefkowitz 
>From: "R. David Murray" 
>Subject: =?utf-8?q?p=C3=B6stal?=
>
>A simple message.
>
>Part of the responsibility of the email module is to provide that
>in text form on demand, so the application gets:
>
>To: Glyph Lefkowitz 
>From: "R. David Murray" 
>Subject: pöstal
>
>A simple message.
>
>Now the application allows the user to do some manipulation of that,
>and we have:
>
>To: "R. David Murray" 
>From: Glyph Lefkowitz 
>Subject: Re: pöstal
>
>A simple reply.

And of course, what happens if the original subject is in one charset and the
prefix is in an incompatible one?  Then you end up with a wire format of two
RFC 2047 encoded words separated by whitespace.  You have to keep those chunks
separate all the way through to do that properly.  (I know RDM knows this. :)
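
A sketch of what keeping the chunks separate looks like with
email.header from the standard library. The three chunks and their
charsets are invented for the example; the exact encoded form is left to
the library.

```python
from email.header import Header, decode_header

# Build one header from chunks that each carry their own charset.
subject = Header()
subject.append("Re:", "us-ascii")
subject.append("pöstal", "iso-8859-1")
subject.append("日本語", "utf-8")

# The wire form is ASCII, with one encoded word per non-ASCII chunk.
wire = subject.encode()

# Decoding yields (bytes, charset) pairs, one per chunk, so the charsets
# survive the round trip.
chunks = decode_header(wire)
charsets = [charset for _, charset in chunks]
text = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in chunks
)
```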

>But I *am* open to being convinced otherwise.  If everyone hates the
>BytesMessage/StringMessage API design, then that should certainly not
>be what we implement in email.

Just as a point of order, to the extent that we're discussing generic
approaches to similar problems across multiple modules, it's okay that we're
having this discussion on python-dev.  But the email-sig has put in a lot of
work on specific API and implementation designs for the email package, so any
deviation really needs to be debated, discussed, and decided there.

-Barry


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Steven D'Aprano

On Fri, 17 Sep 2010 11:34:26 am R. David Murray wrote:
> Perhaps another difference is that in my mind *as an application
> developer*, the "real" email message consists of unicode headers and
> message bodies, with attachments that are sometimes binary, and that
> the wire-format is this formalized encoding we have to use to be able
> to send it from place to place.  In that mental model it seems to
> make perfect sense to have a StringMessage that I have to encode to
> transmit, and a BytesMessage that I receive and have to decode to
> work with.

+1


-- 
Steven D'Aprano

Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Terry Reedy


On 9/16/2010 3:07 PM, Jacob Kaplan-Moss wrote:
> On 16 September 2010 07:16, Terry Reedy wrote:
>>> I'm not working to get Django running on Python 3.1 because I don't
>>> feel confident I'll be able to put any apps I write into production.
>>
>> Why not? Since the I/O speed problem is fixed, I have no idea what you are
>> referring to.  Please do be concrete.
>
> Deploying web apps under Python 2 right now is actually pretty
> awesome. ...

And will remain so for years.

> The key here is that switching between all of these deployment
> situations is *incredibly* easy. ...
>
> Python 3 offers me none of this. I don't have a wide variety of tools
> to choose from. Worse, I don't even have a guarantee of
> interoperability between the tools that *do* exist.

That last needs an updated standard, which may require a bit of nudging 
to get agreement on *something*, along with an updated reference 
implementation. I would expect a usable variety of production 
implementations to gradually follow thereafter, as they have for 2.x.

> I'm sorry if I'm coming across as a complainer here.

No. You answered my question quite well.
--
Terry Jan Reedy


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Fri, 17 Sep 2010 00:05:12 +0200, Antoine Pitrou  wrote:
> On Thu, 16 Sep 2010 16:51:58 -0400
> "R. David Murray"  wrote:
> >
> > What do we store in the model?  We could say that the model is always
> > text.  But then we lose information about the original bytes message,
> > and we can't reproduce it.  For various reasons (mailman being a big one),
> > this is not acceptable.  So we could say that the model is always bytes.
> > But we want access to (for example) the header values as text, so header
> > lookup should take string keys and return string values[2].
> 
> Why can't you have both in a single class? If you create the class
> using a bytes source (a raw message sent by SMTP, for example), the
> class automatically parses and decodes it to unicode strings; if you
> create the class using a unicode source (the text body of the e-mail
> message and the list of recipients, for example), the class
> automatically creates the bytes representation.
> 
> (of course all processing can be done lazily for performance reasons)

Certainly we could do that.  There are methods, though, whose
implementation is the same except for the detail of whether they are
processing bytes or string, so the dual class structure allows that
implementation to be shared.  So even if we changed the API to be single
class, I might well retain the dual class implementation under the
hood.   I'd have to explore which looked better when the time came.
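
One way to picture the single-class, lazy option is the toy below. The
class HeaderValue and its keyword arguments are hypothetical, and the
sketch deliberately ignores RFC 2047 and everything else the real
package has to handle; it only shows one public type offering both
views and converting on first access.

```python
class HeaderValue:
    """Hypothetical value type: one public class, two lazy views."""

    def __init__(self, *, raw=None, text=None, encoding="utf-8"):
        if (raw is None) == (text is None):
            raise ValueError("pass exactly one of raw= or text=")
        self._raw = raw
        self._text = text
        self._encoding = encoding

    @property
    def text(self):
        # Decode only on first access, then cache the result.
        if self._text is None:
            self._text = self._raw.decode(self._encoding, "surrogateescape")
        return self._text

    @property
    def raw(self):
        # Encode only on first access, then cache the result.
        if self._raw is None:
            self._raw = self._text.encode(self._encoding, "surrogateescape")
        return self._raw


from_wire = HeaderValue(raw=b"bogus\xffname")   # created from bytes
from_app = HeaderValue(text="pöstal")           # created from text
```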

> > What about email files on disk?  They could be bytes, or they could be,
> > effectively, text (for example, utf-8 encoded). 
> 
> Such a file can be two things:
> - the raw encoding of a whole message (including headers, etc.), then
>   it should be fed as a bytes object
> - the single text body of a hypothetical message, then it should be fed
>   as a unicode object
> I don't see any possible middle-ground.

It's not a middle ground, but as I discussed in my response to Glyph,
it could be a series of headers and a body in, say, utf-8 where the
application wants to treat them as unicode, not bytes (ie: *not*
an email).  Python2 supports this use case, albeit with the same
"works most of the time" as it does with other non-ascii edge cases.

> > On disk, using utf-8,
> > one might store the text representation of the message, rather than
> > the wire-format (ASCII encoded) version.  We might want to write such
> > messages from scratch.
> 
> But then the user knows the encoding (by "user" I mean what/whoever
> calls the email API) and mentions it to the email package.

Yes?  And then?  The email package still has to parse the file, and it
can't use its normal parse-the-RFC-data parser because the file could
contain *legitimate* non-ASCII header data.  So there has to be a separate
parser for this case that will convert the non-ASCII data into RFC2047
encoded data.  At that point you have two parsers that share a bunch of
code...and my current implementation lets the input to the second parser
be text, which is the natural representation of that data, the one the
user or application writer is going to expect.  I *could* implement it
as a variant bytes parser, and have the application call the variant
parser with encoded bytes, but why?  What's the benefit?  If the API
takes text, it is *obvious* that non-ascii data is allowed and is going
to get wire-encoded.  If it takes bytes, there is more mental overhead
in figuring out which bytes-parser interface one should call, depending
on whether one has 'wire format' data or encoded non-ascii data.  I can
just imagine someone using the bytes-that-need-transfer-encoding parser
to try to parse a file containing RFC encoded data that he knows is
stored in a utf-8 encoded file, because that's the interface that accepts
an encoding parameter.  And then the RFC2047 encoded words wouldn't get
decoded.

Overall it seems simpler to me that text file == pass text to the text
parser, RFC-encoded bytes data == pass bytes data to the bytes parser.
This also separates opening the file correctly (specify the encoding on
open) from encoding the data as you prefer (encoding specified to the
email package when telling it to encode to wire format).
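
The split can be sketched with the parser entry points in current Python
3 releases (message_from_file for text, message_from_binary_file for
bytes). The file names and contents are invented, and the default
compatibility policy is assumed, under which a wire-format header is
returned still RFC 2047 encoded and has to be decoded explicitly.

```python
import email
import os
import tempfile
from email.header import decode_header, make_header

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "note.txt")
    wire_path = os.path.join(tmp, "note.eml")

    # A "text" file: unicode headers and body, stored as UTF-8.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("Subject: pöstal\n\nA simple message.\n")

    # A wire-format file: ASCII bytes with an RFC 2047 encoded word.
    with open(wire_path, "wb") as f:
        f.write(b"Subject: =?utf-8?q?p=C3=B6stal?=\n\nA simple message.\n")

    # Text file: the encoding is given to open(); the parser sees str.
    with open(text_path, encoding="utf-8") as f:
        from_text = email.message_from_file(f)

    # Wire file: opened in binary mode; the parser sees bytes.
    with open(wire_path, "rb") as f:
        from_wire = email.message_from_binary_file(f)

text_subject = from_text["Subject"]
wire_subject = str(make_header(decode_header(from_wire["Subject"])))
```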

> What I'm having an issue with is that you are talking about a bytes
> representation and a unicode representation of a message. But they
> aren't representations of the same things:
> - if it's a bytes representation, it will be the whole, raw message
>   including envelope / headers (also, MIME sections etc.)
> - if it's a unicode representation, it will only be a section of the
>   message decodable as such (a text/plain MIME section, for example;
>   or a decoded header value; or even a single e-mail address part of a
>   decoded header)

Conceptually, a BytesMessage is a model of the entire message with all
the parts encoded in RFC wire-format.  When you access pieces of it,
you get the RFC encoded byte strings.  Conceptually a StringMessage
is a model of the entire message with all the parts decoded as fa

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Terry Reedy

Based on the discussion so far, I think you should go ahead and 
implement the API agreed on by the mail sig both because it *has* been 
agreed on (and thinking about the wsgi discussion, that seems to be a 
major achievement) and because it seems sensible to me also, as far as I 
understand it. The proof of the API will be in the testing. As long as 
you *think* it covers all intended use cases, I am not sure that 
abstract discussion can go much further.


I do have a thought about space and duplication. For normal messages, it 
is not an issue. For megabyte (or in the future, gigabyte?) attachments, 
it is. So if possible, there should only be one extracted blob for both 
bytes and string versions of parsed messages. Or even make the 
extraction from the raw stream lazy, when specifically requested.


--
Terry Jan Reedy


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Thu, 16 Sep 2010 21:53:17 -0400, Barry Warsaw  wrote:
> And of course, what happens if the original subject is in one charset and the
> prefix is in an incompatible one?  Then you end up with a wire format of two
> RFC 2047 encoded words separated by whitespace.  You have to keep those chunks
> separate all the way through to do that properly.  (I know RDM knows this. :)

Heh, my example got messed up because my current mailer didn't MIME
encode it properly.  That is, I emitted a non-RFC-compliant email :(

This is actually a pretty interesting issue in a number of ways, though
I'm not sure it relates to any other part of the stdlib.  A header can
contain encoded words encoded with different charsets.  An MUA that sorts
by subject and takes prefixes ('Re:') into account, for example, might
be decoding the header entirely before doing header matching/sorting, or
it might be matching against the RFC2047 encoded header.  Hopefully the
former, these days, but don't count on it.  So when emitting a reply, a
careful MUA would want to *only* attach the 'Re:' to the front, and not
otherwise change the header.  If it is going to do that, though, it is
going to have to (a) make sure it preserves the original bytes version
of the header and (b) refold the line if necessary.  This means knowing
lots of stuff about header encoding.  So, really, that job should be
done by the email package, or at least the email package should provide
tools to do this.

The naive way (decode the header to unicode, attach the prefix, re-encode
using your favorite charset) is going to work most of the time, and
that's what it will be easiest to do with email6.  Tacking the Re: on
the front of the bytes version of the header and having email6 refold
it will probably work about as well as it currently does in the old
email package, which is to say that sometimes the unfolded header is
otherwise unchanged, and sometimes it isn't.
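
The naive way can be sketched with email.header from the existing
package. The helper name naive_reply_subject is hypothetical; note that
the original charset (iso-8859-1 in this invented example) is not
preserved, which is exactly the limitation described above.

```python
from email.header import Header, decode_header, make_header


def naive_reply_subject(wire_subject, charset="utf-8"):
    """Decode, prefix, re-encode (hypothetical helper)."""
    # Decode every encoded word to unicode, whatever its charset was.
    text = str(make_header(decode_header(wire_subject)))
    if not text.lower().startswith("re:"):
        text = "Re: " + text
    # Re-encode the whole header in a single charset of our choosing.
    return Header(text, charset).encode()


original = "=?iso-8859-1?q?p=F6stal?="
reply = naive_reply_subject(original)
decoded_reply = str(make_header(decode_header(reply)))
```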

> >But I *am* open to being convinced otherwise.  If everyone hates the
> >BytesMessage/StringMessage API design, then that should certainly not
> >be what we implement in email.
> 
> Just as a point of order, to the extent that we're discussing generic
> approaches to similar problems across multiple modules, it's okay that we're
> having this discussion on python-dev.  But the email-sig has put in a lot of
> work on specific API and implementation designs for the email package, so any
> deviation really needs to be debated, discussed, and decided there.

I am also finding it useful to have the API exposed to a wider audience
for feedback, but I agree, any substantive change would need to be
discussed on the email-sig, not here.

--David

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread R. David Murray

On Thu, 16 Sep 2010 23:45:12 -0400, Terry Reedy  wrote:
> Based on the discussion so far, I think you should go ahead and 
> implement the API agreed on by the mail sig both because it *has* been 
> agreed on (and thinking about the wsgi discussion, that seems to be a 
> major achievement) and because it seems sensible to me also, as far as I 
> understand it. The proof of the API will be in the testing. As long as 
> you *think* it covers all intended use cases, I am not sure that 
> abstract discussion can go much further.
> 
> I do have a thought about space and duplication. For normal messages, it 
> is not an issue. For megabyte (or in the future, gigabyte?) attachments, 
> it is. So if possible, there should only be one extracted blob for both 
> bytes and string versions of parsed messages. Or even make the 
> extraction from the raw stream lazy, when specifically requested.

Our intent is to have conversions be as lazy as possible.  There will
doubtless be some interesting heuristics to develop as to what to convert
when and what to cache when, and consequent problems to solve when it
comes to garbage collection...

There's also slated to be a back-end API for storing parts of messages
elsewhere than in memory, though I haven't worked out what that is going
to look like yet.

But we are definitely getting off topic now :)

--David