Re: [Python-Dev] (Not) delaying the 3.2 release
> Why not? Since the I/O speed problem is fixed, I have no idea what you
> are referring to. Please do be concrete.
There's still a performance issue with pickling, but if issue 3873 could be
resolved, Python 3 would actually be faster there.
- Hagen
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] standards for distribution names
Hi All,
Following on from this question:
http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html
...I'd thought that the "correct names" for distributions would have been
documented in one of:
http://www.python.org/dev/peps/pep-0345
http://www.python.org/dev/peps/pep-0376
http://www.python.org/dev/peps/pep-0386
...but having read them, I drew a blank.
Where are the standards for this or is it still a case of "whatever
setuptools does"?
Chris
Re: [Python-Dev] (Not) delaying the 3.2 release
On Wed, 15 Sep 2010 19:55:16 -0500 Jacob Kaplan-Moss wrote:
> On Wed, Sep 15, 2010 at 6:31 PM, Jesse Noller wrote:
> > My goal (personally) is to make sure python 3.2 is perfectly good for use
> > in web applications, and is therefore a much more interesting porting
> > target for web projects/libraries and frameworks.
>
> To try (again) to make things concrete here:
>
> I didn't work to get Django running on Python 3.0 because it was just too
> slow.
>
> I'm not working to get Django running on Python 3.1 because I don't
> feel confident I'll be able to put any apps I write into production.
>
> If Python 3.2 is the same, I won't feel any motivation to target it
> and I'll get to be lazy and wait for Python 3.3.
Why won't you feel confident? Are there any specific issues (apart from
the lack of a WSGI PEP)?
If they are technical problems, they should be reported on the bug
tracker.
If they are representational, cultural or psychological issues, I'm
not sure what we can do. But delaying the release won't solve them.
Regards
Antoine.
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 8:26 AM, Antoine Pitrou wrote:
> On Wed, 15 Sep 2010 19:55:16 -0500
> Jacob Kaplan-Moss wrote:
>> On Wed, Sep 15, 2010 at 6:31 PM, Jesse Noller wrote:
>> > My goal (personally) is to make sure python 3.2 is perfectly good for use
>> > in web applications, and is therefore a much more interesting porting
>> > target for web projects/libraries and frameworks.
>>
>> To try (again) to make things concrete here:
>>
>> I didn't work to get Django running on Python 3.0 because it was just too
>> slow.
>>
>> I'm not working to get Django running on Python 3.1 because I don't
>> feel confident I'll be able to put any apps I write into production.
>>
>> If Python 3.2 is the same, I won't feel any motivation to target it
>> and I'll get to be lazy and wait for Python 3.3.
>
> Why won't you feel confident? Are there any specific issues (apart from
> the lack of a WSGI PEP)?
> If they are technical problems, they should be reported on the bug
> tracker.
> If they are representational, cultural or psychological issues, I'm
> not sure what we can do. But delaying the release won't solve them.
>
> Regards
>
> Antoine.
Can we please give it a little bit of time to hear from the WSGI / Web-Sig
folks? I've encouraged bugs to be filed, and discussions to happen here, so
we know whether things should be fixed (and what those things are). Is there
any need, other than our current schedule, to push 3.2 out before we can at
least get some feedback from the interested parties?
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 10:26 PM, Antoine Pitrou wrote:
> Why won't you feel confident? Are there any specific issues (apart from
> the lack of a WSGI PEP)?
> If they are technical problems, they should be reported on the bug
> tracker.
> If they are representational, cultural or psychological issues, I'm
> not sure what we can do. But delaying the release won't solve them.
There are some APIs that should be able to handle bytes *or* strings,
but the current use of string literals in their implementation means
that bytes don't work. This turns out to be a PITA for some networking
related code which really wants to be working with raw bytes (e.g.
URLs coming off the wire).
For example:
>>> import urllib.parse as parse
>>> parse.urlsplit("http://www.ubuntu.com")
SplitResult(scheme='http', netloc='www.ubuntu.com', path='', query='', fragment='')
>>> parse.urlsplit(b"http://www.ubuntu.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/urllib/parse.py", line 178, in urlsplit
    i = url.find(':')
TypeError: expected an object with the buffer interface
There's no real reason urlsplit (and similar urllib.parse APIs)
shouldn't support bytes, but the internal use of string literals
currently prevents it.
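One way around the literal problem, as a minimal sketch and not the actual urllib.parse fix: choose the separator literals to match the type of the argument, so str input only ever meets str literals and bytes input only meets bytes literals. The helper name `split_scheme` is hypothetical, and it handles only the scheme part of a URL.

```python
def split_scheme(url):
    """Split 'scheme:rest' into (scheme, rest), returning the input's own type."""
    # Pick literals of the same type as the argument; mixing b":" with str
    # (or ":" with bytes) is what raises TypeError in the session above.
    if isinstance(url, bytes):
        colon, empty = b":", b""
    else:
        colon, empty = ":", ""
    scheme, sep, rest = url.partition(colon)
    if not sep:
        # No colon at all: treat the whole input as the remainder.
        return empty, url
    return scheme, rest


print(split_scheme("http://www.ubuntu.com"))
print(split_scheme(b"http://www.ubuntu.com"))
```

The same function body serves both types because `partition` exists on str and bytes alike; only the literals differ.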
We don't seem to have created a tracker issue from the discussion back
in June where this came up, so I went ahead and created one just now:
http://bugs.python.org/issue9873
I think there were other APIs mentioned back then beyond the
urllib.parse ones, but I didn't find them when I went trawling through
the list archives yesterday. If anyone else thinks of any APIs that
should allow bytes as well as strings (or vice-versa) feel free to add
them to that issue.
Cheers,
Nick.
--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] (Not) delaying the 3.2 release
On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
>There are some APIs that should be able to handle bytes *or* strings,
>but the current use of string literals in their implementation means
>that bytes don't work. This turns out to be a PITA for some networking
>related code which really wants to be working with raw bytes (e.g.
>URLs coming off the wire).
Note that email has exactly the same problem. A general solution -- even if
embodied in *well documented* best-practices and convention -- would really
help make the stdlib work consistently, and I bet third party libraries too.
-Barry
Re: [Python-Dev] (Not) delaying the 3.2 release
On 16 September 2010 07:16, Terry Reedy wrote:
>> I'm not working to get Django running on Python 3.1 because I don't
>> feel confident I'll be able to put any apps I write into production.
>
> Why not? Since the I/O speed problem is fixed, I have no idea what you are
> referring to. Please do be concrete.
At the risk of putting words into Jacob's mouth, I understood him to mean
that "production quality" WSGI servers either do not exist, or do not
implement a consistently defined spec (i.e., everyone is doing their own
thing to adapt WSGI to Python 3).
There is something of a chicken and egg situation here as with everywhere
else (scientific users weren't moving until scipy did, lots of projects based
round Twisted can't go until Twisted does, ...) but in the case of web/WSGI,
there's a standard, defined in a PEP, with a reference implementation
(wsgiref) in the stdlib. So the core has a greater interest.
Personally, I don't write web applications (not even in Python :-)) so my
interest is minimal. But I think the issue is real, and it's valid for the
core team to be concerned. Whether I'd want to delay 3.2, I'm not so sure -
certainly not indefinitely, there should be a "put up or shut up" deadline.
But I'd be sad if Python 3 saw a reversion to the days of "Python isn't a
good web development language because there's no standard infrastructure"
comments that was the situation before WSGI existed...
Paul.
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 09:52:48 -0400, Barry Warsaw wrote:
> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> >There are some APIs that should be able to handle bytes *or* strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
>
> Note that email has exactly the same problem. A general solution -- even if
> embodied in *well documented* best-practices and convention -- would really
> help make the stdlib work consistently, and I bet third party libraries too.
Allowing bytes-in -> bytes-out where possible would definitely be a help
(and Guido has endorsed this, IIUC), but some care has to be taken to
understand the API contract of the method in question before blindly
applying it. Are you "merely" allowing bytes to be processed as ASCII
strings, or does processing the bytes *correctly* imply that you are
converting from an ASCII encoding of text in order to process it? In
Python2, the latter might not generate unicode yet still produce a correct
result most of the time, but a big point of Python3 is to eliminate that
"most of the time", so we need to be careful not to reintroduce it.
This was all covered in the thread Nick refers to; I just want to emphasize
that one needs to look at the API contract carefully before making it
polymorphic (in Guido's sense of the term).
If the way to do this is well documented best practices, we first have to
figure out what those best practices are. To do that we have to write some
real-world code.
I'm trying one approach in email6: Bytes and String subclasses, where the
subclasses have an attribute named 'literals' derived from a utility module
that does this:

    literals = dict(
        empty = '',
        colon = ':',
        newline = '\n',
        space = ' ',
        tab = '\t',
        fws = ' \t',
        headersep = ': ',
        )

    class _string_literals:
        pass

    class _bytes_literals:
        pass

    for name, value in literals.items():
        setattr(_string_literals, name, value)
        setattr(_bytes_literals, name, bytes(value, 'ASCII'))
    del literals, name, value

And the subclasses do:

    class BytesHeader(BaseHeader):
        lit = email.utils._bytes_literals

    class StringHeader(BaseHeader):
        lit = email.utils._string_literals

And then BaseHeader uses self.lit.colon, etc, when manipulating strings.
It also has to use slice notation rather than indexing when looking at
individual characters, which is a PITA but not terrible.
I'm not saying this is the best approach, since this is all experimental
code at the moment, but it is *an* approach
--
R. David Murray www.bitdance.com
Re: [Python-Dev] r84771 - python/branches/release27-maint/Lib/test/test_io.py
Maybe you want to mention *who* warns?
Georg
On 13.09.2010 10:20, florent.xicluna wrote:
> Author: florent.xicluna
> Date: Mon Sep 13 10:20:19 2010
> New Revision: 84771
>
> Log:
> Silence warning about 1/0
>
> Modified:
>    python/branches/release27-maint/Lib/test/test_io.py
>
> Modified: python/branches/release27-maint/Lib/test/test_io.py
> ==
> --- python/branches/release27-maint/Lib/test/test_io.py (original)
> +++ python/branches/release27-maint/Lib/test/test_io.py Mon Sep 13 10:20:19 2010
> @@ -2484,7 +2484,7 @@
>          signal.signal(signal.SIGALRM, self.oldalrm)
>
>      def alarm_interrupt(self, sig, frame):
> -        1/0
> +        1 // 0
>
>      @unittest.skipUnless(threading, 'Threading required for this test.')
>      def check_interrupted_write(self, item, bytes, **fdopen_kwargs):
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
Re: [Python-Dev] r84771 - python/branches/release27-maint/Lib/test/test_io.py
On Thu, 16 Sep 2010 17:27:50 +0200 Georg Brandl wrote:
> Maybe you want to mention *who* warns?
I suppose it's the -3 flag:
$ ~/cpython/27/python -3 -c "1/0"
-c:1: DeprecationWarning: classic int division
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
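A small sketch of why the change is safe for the test, written for Python 3 (the commit itself targets 2.7): both spellings still raise ZeroDivisionError, which is all the signal handler needs, and only the `/` form on two ints is what the -3 flag complains about on 2.7. The helper name `raises_zero_division` is made up for this example.

```python
def raises_zero_division(func):
    """Return True if calling func() raises ZeroDivisionError."""
    try:
        func()
    except ZeroDivisionError:
        return True
    return False


# True division and floor division by zero fail the same way.
print(raises_zero_division(lambda: 1 / 0))
print(raises_zero_division(lambda: 1 // 0))
```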
Re: [Python-Dev] r84847 - python/branches/py3k/Doc/library/re.rst
That reminds me of the undocumented re.Scanner -- which is meant to do
exactly this. Wouldn't it be about time to document or remove it?
Georg
On 16.09.2010 14:02, raymond.hettinger wrote:
> Author: raymond.hettinger
> Date: Thu Sep 16 14:02:17 2010
> New Revision: 84847
>
> Log:
> Add tokenizer example to regex docs.
>
> Modified:
>    python/branches/py3k/Doc/library/re.rst
>
> Modified: python/branches/py3k/Doc/library/re.rst
> ==
> --- python/branches/py3k/Doc/library/re.rst (original)
> +++ python/branches/py3k/Doc/library/re.rst Thu Sep 16 14:02:17 2010
> @@ -1282,3 +1282,66 @@
> <_sre.SRE_Match object at ...>
> >>> re.match("", r"\\")
> <_sre.SRE_Match object at ...>
> +
> +
> +Writing a Tokenizer
> +^^^
> +
> +A `tokenizer or scanner `_
> +analyzes a string to categorize groups of characters. This is a useful first
> +step in writing a compiler or interpreter.
> +
> +The text categories are specified with regular expressions. The technique is
> +to combine those into a single master regular expression and to loop over
> +successive matches::
> +
> +Token = collections.namedtuple('Token', 'typ value line column')
> +
> +def tokenize(s):
> +    tok_spec = [
> +        ('NUMBER', r'\d+(.\d+)?'), # Integer or decimal number
> +        ('ASSIGN', r':='), # Assignment operator
> +        ('END', ';'), # Statement terminator
> +        ('ID', r'[A-Za-z]+'), # Identifiers
> +        ('OP', r'[+*\/\-]'),# Arithmetic operators
> +        ('NEWLINE', r'\n'), # Line endings
> +        ('SKIP', r'[ \t]'), # Skip over spaces and tabs
> +    ]
> +    tok_re = '|'.join('(?P<%s>%s)' % pair for pair in tok_spec)
> +    gettok = re.compile(tok_re).match
> +    line = 1
> +    pos = line_start = 0
> +    mo = gettok(s)
> +    while mo is not None:
> +        typ = mo.lastgroup
> +        if typ == 'NEWLINE':
> +            line_start = pos
> +            line += 1
> +        elif typ != 'SKIP':
> +            yield Token(typ, mo.group(typ), line, mo.start()-line_start)
> +        pos = mo.end()
> +        mo = gettok(s, pos)
> +    if pos != len(s):
> +        raise RuntimeError('Unexpected character %r on line %d' %(s[pos], line))
> +
> +>>> statements = '''\
> +total := total + price * quantity;
> +tax := price * 0.05;
> +'''
> +>>> for token in tokenize(statements):
> +... print(token)
> +...
> +Token(typ='ID', value='total', line=1, column=8)
> +Token(typ='ASSIGN', value=':=', line=1, column=14)
> +Token(typ='ID', value='total', line=1, column=17)
> +Token(typ='OP', value='+', line=1, column=23)
> +Token(typ='ID', value='price', line=1, column=25)
> +Token(typ='OP', value='*', line=1, column=31)
> +Token(typ='ID', value='quantity', line=1, column=33)
> +Token(typ='END', value=';', line=1, column=41)
> +Token(typ='ID', value='tax', line=2, column=9)
> +Token(typ='ASSIGN', value=':=', line=2, column=13)
> +Token(typ='ID', value='price', line=2, column=16)
> +Token(typ='OP', value='*', line=2, column=22)
> +Token(typ='NUMBER', value='0.05', line=2, column=24)
> +Token(typ='END', value=';', line=2, column=28)
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote: > On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote: > > >There are some APIs that should be able to handle bytes *or* strings, > >but the current use of string literals in their implementation means > >that bytes don't work. This turns out to be a PITA for some networking > >related code which really wants to be working with raw bytes (e.g. > >URLs coming off the wire). > > Note that email has exactly the same problem. A general solution -- even if > embodied in *well documented* best-practices and convention -- would really > help make the stdlib work consistently, and I bet third party libraries too. > I too await a solution with abated breath :-) I've been working on documenting best practices for APIs and Unicode and for this type of function (take bytes or unicode and output the same type), knowing the encoding is seems like a requirement in most cases: http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type I'd love to add another strategy there that shows how you can robustly operate on bytes without knowing the encoding but from writing that, I think that anytime you simplify your API you have to accept limitations on the data you can take in. (For instance, some simplifications can handle anything except ASCII-incompatible encodings). -Toshio pgpAJSHDGRHtD.pgp Description: PGP signature ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 11:30:12 -0400 "R. David Murray" wrote: > > And then BaseHeader uses self.lit.colon, etc, when manipulating strings. > It also has to use slice notation rather than indexing when looking at > individual characters, which is a PITA but not terrible. > > I'm not saying this is the best approach, since this is all experimental > code at the moment, but it is *an* approach Out of curiousity, can you explain why polymorphism is needed for e-mail? I would assume that headers are bytes until they are parsed, at which point they become a pair of unicode strings (one for the header name and one for its value). Regards Antoine. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] r84847 - python/branches/py3k/Doc/library/re.rst
On 16/09/2010 16:37, Georg Brandl wrote:
> That reminds me of the undocumented re.Scanner -- which is meant to do
> exactly this. Wouldn't it be about time to document or remove it?
There was a long discussion about this on the bug tracker (the
suggestion to document it was rejected at the time).
http://bugs.python.org/issue5337
Michael Foord
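For readers who have not met the re.Scanner under discussion, a rough sketch of how it can tokenize the same kind of statement as the new docs example. Because the class is undocumented, the usage here is an assumption based on how it behaves in CPython: a list of (pattern, action) pairs, where the action is called with the scanner and the matched text, and `None` discards the match. The token names mirror the docs example; non-capturing groups are used deliberately so that the patterns add no groups of their own.

```python
import re

# Assumed usage of the undocumented re.Scanner: (pattern, action) pairs.
scanner = re.Scanner([
    (r"\d+(?:\.\d+)?", lambda s, tok: ("NUMBER", tok)),  # integer or decimal
    (r":=",            lambda s, tok: ("ASSIGN", tok)),  # assignment operator
    (r";",             lambda s, tok: ("END", tok)),     # statement terminator
    (r"[A-Za-z]+",     lambda s, tok: ("ID", tok)),      # identifiers
    (r"[+*/\-]",       lambda s, tok: ("OP", tok)),      # arithmetic operators
    (r"\s+",           None),                            # skip whitespace
])

# scan() returns the collected action results plus any unmatched tail.
tokens, remainder = scanner.scan("tax := price * 0.05;")
for token in tokens:
    print(token)
print("unmatched:", repr(remainder))
```

Unlike the generator in the docs example, this form tracks no line or column numbers; it only shows the dispatch-on-pattern idea.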
--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 8:42 AM, Toshio Kuratomi wrote:
> On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
>> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
>>
>> >There are some APIs that should be able to handle bytes *or* strings,
>> >but the current use of string literals in their implementation means
>> >that bytes don't work. This turns out to be a PITA for some networking
>> >related code which really wants to be working with raw bytes (e.g.
>> >URLs coming off the wire).
>>
>> Note that email has exactly the same problem. A general solution -- even if
>> embodied in *well documented* best-practices and convention -- would really
>> help make the stdlib work consistently, and I bet third party libraries too.
>>
> I too await a solution with abated breath :-) I've been working on
> documenting best practices for APIs and Unicode and for this type of
> function (take bytes or unicode and output the same type), knowing the
> encoding seems like a requirement in most cases:
>
> http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type
>
> I'd love to add another strategy there that shows how you can robustly
> operate on bytes without knowing the encoding but from writing that, I think
> that anytime you simplify your API you have to accept limitations on the
> data you can take in. (For instance, some simplifications can handle
> anything except ASCII-incompatible encodings).
In all cases I can imagine where such polymorphic functions make
sense, the necessary and sufficient assumption should be that the
encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
Latin-N variant, and AFAIK also the popular CJK encodings other than
UTF-16. This is the same assumption made by Python's byte type when
you use "character-based" methods like lower().
--Guido
__
(*) In my mind ASCII and 7-bit are synonymous, but unfortunately there
are droves of naive users who believe that ASCII includes all 256
possible 8-bit bytes using some encoding -- typically the default
encoding of their DOS or Windows box. :-(
--
--Guido van Rossum (python.org/~guido)
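A small illustration of the ASCII-superset assumption, offered as a sketch and not as anything from the stdlib: a bytes-only helper that splits on an ASCII colon and lowercases the name works unchanged for UTF-8 and Latin-1 input, but misaligns UTF-16 input. The helper `split_header` and the sample header are invented for this example.

```python
def split_header(raw):
    """Split b'Name: value' on the first colon; lowercase the name."""
    name, _sep, value = raw.partition(b":")
    # bytes.lower() and bytes.strip() only touch ASCII letters/whitespace,
    # so non-ASCII bytes pass through untouched.
    return name.lower(), value.strip()


text = "Content-Type: Café"

# Encodings that are supersets of ASCII round-trip cleanly.
for codec in ("utf-8", "latin-1"):
    name, value = split_header(text.encode(codec))
    print(codec, name.decode(codec), value.decode(codec))

# UTF-16 is not ASCII-compatible: the split lands in the middle of a
# two-byte code unit, so the value can no longer be decoded.
name, value = split_header(text.encode("utf-16-le"))
try:
    value.decode("utf-16-le")
except UnicodeDecodeError:
    print("utf-16-le: value is no longer decodable")
```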
Re: [Python-Dev] (Not) delaying the 3.2 release
On 16/09/2010, Guido van Rossum wrote:
>
> In all cases I can imagine where such polymorphic functions make
> sense, the necessary and sufficient assumption should be that the
> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> Latin-N variant, and AFAIK also the popular CJK encodings other than
> UTF-16. This is the same assumption made by Python's byte type when
> you use "character-based" methods like lower().
Well, depends on what exactly you're doing, it's pretty easy to go wrong:
Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.path.split("C:\\十")
('C:\\', '十')
>>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
(b'C:\\\x8f', b'')
Similar things can catch out web developers once they step outside the
percent encoding.
Martin
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist) wrote:
> On 16/09/2010, Guido van Rossum wrote:
>>
>> In all cases I can imagine where such polymorphic functions make
>> sense, the necessary and sufficient assumption should be that the
>> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
>> Latin-N variant, and AFAIK also the popular CJK encodings other than
>> UTF-16. This is the same assumption made by Python's byte type when
>> you use "character-based" methods like lower().
>
> Well, depends on what exactly you're doing, it's pretty easy to go wrong:
>
> Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, sys
> >>> os.path.split("C:\\十")
> ('C:\\', '十')
> >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> (b'C:\\\x8f', b'')
>
> Similar things can catch out web developers once they step outside the
> percent encoding.
Well, that character is not 7-bit ASCII. Of course things will go
wrong there. That's the whole point of what I said, isn't it?
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] standards for distribution names
At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:
> Following on from this question:
> http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html
> ...I'd thought that the "correct names" for distributions would have been
> documented in one of:
> ...
> Where are the standards for this or is it still a case of "whatever
> setuptools does"?
Actually, in this case, it's "whatever distutils does". If you don't build
your .exe's with Distutils, or if you rename them after the fact, then
setuptools won't recognize them as things it can consume.
FYI, Twisted has a long history of releasing distribution files that are
either built using non-distutils tools or else renamed after being built.
Note, too, that if the Windows exe's they're providing aren't built by the
distutils bdist_wininst command, then setuptools is probably not going to be
able to consume them, no matter what they're called.
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
> On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist) wrote:
> > On 16/09/2010, Guido van Rossum wrote:
> >>
> >> In all cases I can imagine where such polymorphic functions make
> >> sense, the necessary and sufficient assumption should be that the
> >> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> >> Latin-N variant, and AFAIK also the popular CJK encodings other than
> >> UTF-16. This is the same assumption made by Python's byte type when
> >> you use "character-based" methods like lower().
> >
> > Well, depends on what exactly you're doing, it's pretty easy to go wrong:
> >
> > Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on
> > win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import os, sys
> > >>> os.path.split("C:\\十")
> > ('C:\\', '十')
> > >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> > (b'C:\\\x8f', b'')
> >
> > Similar things can catch out web developers once they step outside the
> > percent encoding.
>
> Well, that character is not 7-bit ASCII. Of course things will go
> wrong there. That's the whole point of what I said, isn't it?
>
You were talking about encodings that were supersets of 7-bit ASCII.
I think Martin was demonstrating a byte string that was a superset of 7-bit
ASCII being fed to a stdlib function which went wrong.
-Toshio
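Martin's session can be reproduced on any platform with a short sketch, under one loudly stated assumption: that his filesystem encoding was the Japanese Windows code page cp932, which the session itself does not say. In that encoding the second byte of 十 is 0x5C, the same value as an ASCII backslash, so byte-level path splitting treats it as a separator. `ntpath` is used so the Windows rules apply regardless of the host OS.

```python
import ntpath

name = "十"
# Assumption: cp932 stands in for sys.getfilesystemencoding() on Martin's box.
encoded = name.encode("cp932")
print(encoded)

# As text, the path splits where a reader would expect.
print(ntpath.split("C:\\" + name))

# As bytes, the trailing 0x5C of the encoded character looks like a
# backslash, so the "file name" comes back empty.
print(ntpath.split(("C:\\" + name).encode("cp932")))
```

So the encoding is not a strict superset of ASCII in the sense that matters here: ASCII byte values can reappear inside multi-byte characters.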
Re: [Python-Dev] standards for distribution names
At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:
> ...I'd thought that the "correct names" for distributions would have been
> documented in one of:
> http://www.python.org/dev/peps/pep-0345
> http://www.python.org/dev/peps/pep-0376
> http://www.python.org/dev/peps/pep-0386
> ...but having read them, I drew a blank.
Forgot to mention: see distinfo_dirname() in PEP 376 for an explanation of
distribution-name normalization. (Case-insensitivity and os-specific case
handling is not addressed in the PEPs, though, AFAICT.)
[Python-Dev] Python 2.7 Won't Build
I am trying to rebuild the 2.7 maintenance branch and get this error on Ubuntu 10.04.1 LTS:
XXX lineno: 743, opcode: 0
Traceback (most recent call last):
  File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
    import os
  File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
    def urandom(n):
SystemError: unknown opcode
I installed it successfully once so I may be getting conflicts, but I can't figure out why. There were some similar bugs reported in previous versions but I didn't see a clear solution.
I have done "make distclean" and "./configure". I have unset my PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.
I guess my next step will be to manually remove the installed python 2.7 unless I hear some other suggestions.
And I will file a bug report soon unless that is inappropriate.
Thanks,
-Tom
Thomas M. Browder, Jr.
Niceville, Florida
USA
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 06:28, Nick Coghlan wrote:
> On Thu, Sep 16, 2010 at 10:26 PM, Antoine Pitrou wrote:
>> Why won't you feel confident? Are there any specific issues (apart from
>> the lack of a WSGI PEP)?
>> If they are technical problems, they should be reported on the bug
>> tracker.
>> If they are representational, cultural or psychological issues, I'm
>> not sure what we can do. But delaying the release won't solve them.
>
> There are some APIs that should be able to handle bytes *or* strings,
> but the current use of string literals in their implementation means
> that bytes don't work. This turns out to be a PITA for some networking
> related code which really wants to be working with raw bytes (e.g.
> URLs coming off the wire).
>
> For example:
>
> >>> import urllib.parse as parse
> >>> parse.urlsplit("http://www.ubuntu.com")
> SplitResult(scheme='http', netloc='www.ubuntu.com', path='', query='',
> fragment='')
> >>> parse.urlsplit(b"http://www.ubuntu.com")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/ncoghlan/devel/py3k/Lib/urllib/parse.py", line 178, in urlsplit
> i = url.find(':')
> TypeError: expected an object with the buffer interface
>
> There's no real reason urlsplit (and similar urllib.parse APIs)
> shouldn't support bytes, but the internal use of string literals
> currently prevents it.
>
> We don't seem to have created a tracker issue from the discussion back
> in June where this came up, so I went ahead and created one just now:
> http://bugs.python.org/issue9873
When I do my two months of PSF-sponsored core work (expected to be
Jan/Feb) I was planning on (finally) redoing the dev docs, writing a
HOWTO for maintaining a Python 2/3 code base, and cleaning up the test
suite. But I am starting to think I should change the last one to
solving this polymorphism problem in a way that can be applied across
the board in the stdlib.
>
> I think there were other APIs mentioned back then beyond the
> urllib.parse ones, but I didn't find them when I went trawling through
> the list archives yesterday. If anyone else thinks of any APIs that
> should allow bytes as well as strings (or vice-versa) feel free to add
> them to that issue.
Or create separate issues and make them dependencies for issue9873.
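One possible shape for the "bytes or strings" polymorphism discussed above, as a minimal sketch: decode bytes as ASCII at the boundary, run the str-based logic once, and encode the results back. The helper name split_scheme is hypothetical and is not part of urllib.parse; it only illustrates the pattern.

```python
def split_scheme(url):
    """Split 'scheme:rest', returning the same type (str or bytes) as given.

    Hypothetical helper for illustration only.
    """
    if isinstance(url, (bytes, bytearray)):
        # Assumes an ASCII-compatible wire format. Non-ASCII bytes raise
        # UnicodeDecodeError rather than being silently guessed at.
        scheme, rest = split_scheme(bytes(url).decode("ascii"))
        return scheme.encode("ascii"), rest.encode("ascii")
    # From here on the code only ever sees str, so string literals are fine.
    scheme, sep, rest = url.partition(":")
    if not sep:
        return "", url
    return scheme, rest


text_result = split_scheme("http://www.ubuntu.com")
bytes_result = split_scheme(b"http://www.ubuntu.com")
```

The design choice here is that the literals (":" and "") live in exactly one code path, so the implementation does not need a bytes twin for every constant.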
Re: [Python-Dev] Python 2.7 Won't Build
Go ahead and file the bug, but chances are that some other installed Python is executing the code and picking up the .pyc files which have bytecode new to Python 2.7.
On Thu, Sep 16, 2010 at 11:41, Tom Browder wrote:
> I am trying to rebuild the 2.7 maintenance branch and get this error
> on Ubuntu 10.04.1 LTS:
>
> XXX lineno: 743, opcode: 0
> Traceback (most recent call last):
> File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
> import os
> File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
> def urandom(n):
> SystemError: unknown opcode
>
> I installed it successfully once so I may be getting conflicts, but I
> can't figure out why. There were some similar bugs reported in
> previous versions but I didn't see a clear solution.
>
> I have done "make distclean" and "./configure". I have unset my
> PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.
>
> I guess my next step will be to manually remove the installed python
> 2.7 unless I hear some other suggestions.
>
> And I will file a bug report soon unless that is inappropriate.
>
> Thanks,
>
> -Tom
>
> Thomas M. Browder, Jr.
> Niceville, Florida
> USA
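When hunting for stale or foreign .pyc files, one thing that can be checked is the magic number stamped at the start of each file. The sketch below is written for current Python 3 (on 2.7 the equivalent value came from imp.get_magic()); the helper name and the fake "foreign" file are made up for the demonstration, and a mismatch is only a hint, not a diagnosis of the error above.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc header carries this interpreter's magic number."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER


with tempfile.TemporaryDirectory() as tmp:
    # A .pyc freshly written by the running interpreter should match.
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    fresh = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
    fresh_ok = pyc_matches_interpreter(fresh)

    # A file with a different header stands in for one written by
    # some other Python version.
    foreign = os.path.join(tmp, "foreign.pyc")
    with open(foreign, "wb") as f:
        f.write(b"\x00\x00\r\n" + b"\x00" * 12)
    foreign_ok = pyc_matches_interpreter(foreign)
```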
Re: [Python-Dev] Python 2.7 Won't Build
On Thu, Sep 16, 2010 at 13:48, Brett Cannon wrote:
> Go ahead and file the bug, but chances are that some other installed
> Python is executing the code and picking up the .pyc files which have
> bytecode new to Python 2.7.
But isn't that a problem with the build system? It seems to me it should be using all modules from within the build, thus there should be no such error.
Regards,
-Tom
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 9:59 AM, Paul Moore wrote:
> On 16 September 2010 07:16, Terry Reedy wrote:
>>> I'm not working to get Django running on Python 3.1 because I don't
>>> feel confident I'll be able to put any apps I write into production.
>>
>> Why not? Since the I/O speed problem is fixed, I have no idea what you are
>> referring to. Please do be concrete.
>
> At the risk of putting words into Jacob's mouth, I understood him to
> mean that "production quality" WSGI servers either do not exist, or do
> not implement a consistently defined spec (i.e., everyone is doing
> their own thing to adapt WSGI to Python 3).
Yup, exactly.
Deploying web apps under Python 2 right now is actually pretty awesome. There's a clear leader in mod_wsgi that's fast, stable, easy to use, and under active development. There's a few great lightweight pure-Python servers, some new-hotness (Gunicorn) and some tried-and-true (CherryPy). There's a fast-as-hell bleeding-edge option (nginx + uwsgi). And those are just the ones I've successfully put into production -- there're still *more* options if one of those won't cut it.
The key here is that switching between all of these deployment situations is *incredibly* easy. Actually, this very afternoon I'm planning to experiment with a switch from mod_wsgi to Gunicorn. I'm confident enough with the inter-op that I'm going to make the switch on a production web server, monitor it for a bit, then switch back.
I've budgeted an hour for this, and I'll probably end up spending half that time playing Minecraft while I gather statistics.
Python 3 offers me none of this. I don't have a wide variety of tools to choose from. Worse, I don't even have a guarantee of interoperability between the tools that *do* exist.
---
I'm sorry if I'm coming across as a complainer here. It's a frustrating situation for me: I want to start using Python 3, but until there's a working web stack waiting for me I just can't justify the time. And unfortunately I'm just not familiar enough with the problem(s) to have any real shot at working towards a solution, and I'm *certainly* not enough of an expert to work on a PEP or spec. So all I can really do is agitate.
Jacob
Re: [Python-Dev] (Not) delaying the 3.2 release
On Thu, Sep 16, 2010 at 11:16 AM, Toshio Kuratomi wrote:
> On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
>> On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist)
>> wrote:
>> > On 16/09/2010, Guido van Rossum wrote:
>> >>
>> >> In all cases I can imagine where such polymorphic functions make
>> >> sense, the necessary and sufficient assumption should be that the
>> >> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
>> >> Latin-N variant, and AFAIK also the popular CJK encodings other than
>> >> UTF-16. This is the same assumption made by Python's byte type when
>> >> you use "character-based" methods like lower().
>> >
>> > Well, depends on what exactly you're doing, it's pretty easy to go wrong:
>> >
>> > Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on
>> > win32
>> > Type "help", "copyright", "credits" or "license" for more information.
>> > >>> import os, sys
>> > >>> os.path.split("C:\\十")
>> > ('C:\\', '十')
>> > >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
>> > (b'C:\\\x8f', b'')
>> >
>> > Similar things can catch out web developers once they step outside the
>> > percent encoding.
>>
>> Well, that character is not 7-bit ASCII. Of course things will go
>> wrong there. That's the whole point of what I said, isn't it?
>>
> You were talking about encodings that were supersets of 7-bit ASCII.
> I think Martin was demonstrating a byte string that was a superset of 7-bit
> ASCII being fed to a stdlib function which went wrong.
Whoops, sorry. I don't have access to Windows so I can't reproduce
this though. I also don't understand it. What is the Unicode codepoint
for that 十 character? What is sys.getfilesystemencoding()? What is the
value of "C:\\十".encode(sys.getfilesystemencoding())?
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 2.7 Won't Build
Please file the bug and it can be discussed further there.
On Thu, Sep 16, 2010 at 12:05, Tom Browder wrote:
> On Thu, Sep 16, 2010 at 13:48, Brett Cannon wrote:
>> Go ahead and file the bug, but chances are that some other installed
>> Python is executing the code and picking up the .pyc files which have
>> bytecode new to Python 2.7.
>
> But isn't that a problem with the build system? It seems to me it
> should be using all modules from within the build, thus there should
> be no such error.
>
> Regards,
>
> -Tom
Re: [Python-Dev] Python 2.7 Won't Build
On Sep 16, 2010, at 01:41 PM, Tom Browder wrote:
>I am trying to rebuild the 2.7 maintenance branch and get this error
>on Ubuntu 10.04.1 LTS:
I just tried this on my vanilla 10.04.1 system. I checked out release27-maint and ran configure && make. It built without problem.
>XXX lineno: 743, opcode: 0
>Traceback (most recent call last):
> File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
> import os
> File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
> def urandom(n):
>SystemError: unknown opcode
>
>I installed it successfully once so I may be getting conflicts, but I
>can't figure out why. There were some similar bugs reported in
>previous versions but I didn't see a clear solution.
I installed Python 2.7 to /usr/local, then did a make distclean, configure, make. Again, successfully.
>I have done "make distclean" and "./configure". I have unset my
>PYTHONPATH and LD_LIBRARY_PATH, but python2.7 is my default python.
>
>I guess my next step will be to manually remove the installed python
>2.7 unless I hear some other suggestions.
When you say "installed python 2.7" do you mean the one you installed to /usr/local from a from-source build, or something else (e.g. a Python 2.7 package perhaps)?
>And I will file a bug report soon unless that is inappropriate.
Sure. Please +nosy me. But I think something else is going on.
-Barry
Re: [Python-Dev] (Not) delaying the 3.2 release
On 16/09/2010, Guido van Rossum wrote:
> On Thu, Sep 16, 2010 at 11:16 AM, Toshio Kuratomi
> wrote:
>> You were talking about encodings that were supersets of 7-bit ASCII.
>> I think Martin was demonstrating a byte string that was a superset of
>> 7-bit
>> ASCII being fed to a stdlib function which went wrong.
>
> Whoops, sorry. I don't have access to Windows so I can't reproduce
> this though. I also don't understand it. What is the Unicode codepoint
> for that 十 character? What is sys.getfilesystemencoding()? What is the
> value of "C:\\十".encode(sys.getfilesystemencoding())?
My fault, should have been clearer. I was trying to demonstrate that there's a difference between the unix-friendly encodings like UTF-8 and the EUC codecs which only use high-bit characters for non-ascii text, and the ISO-2022 codecs and Shift JIS. In the example I gave, 十 encodes in CP932 as '\x8f\\', and the function gets confused by the second byte.
Obviously the right answer there is just to use unicode, rather than write a function that works with weird multibyte codecs.
Martin
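The behaviour can be reproduced without a Windows machine by calling ntpath (the Windows flavour of os.path) directly and naming the codec explicitly. This is a small sketch under the assumption that the filesystem encoding in the original session was CP932; the UTF-8 line is added only for contrast.

```python
import ntpath

# In CP932 (Shift JIS), the trail byte of this character is 0x5C,
# which is the same byte as an ASCII backslash.
encoded = "十".encode("cp932")

# str input: the separator search works on characters, so it is correct.
text_split = ntpath.split("C:\\十")

# CP932 bytes: the trail byte is mistaken for a path separator.
sjis_split = ntpath.split("C:\\十".encode("cp932"))

# UTF-8 bytes: every byte of a non-ASCII character has the high bit set,
# so byte-wise separator searches stay safe.
utf8_split = ntpath.split("C:\\十".encode("utf-8"))
```

So "superset of 7-bit ASCII" is not quite enough for byte-wise processing: the encoding also must never reuse ASCII byte values inside multibyte sequences.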
Re: [Python-Dev] Python 2.7 Won't Build
On Thu, Sep 16, 2010 at 14:36, Barry Warsaw wrote:
> On Sep 16, 2010, at 01:41 PM, Tom Browder wrote:
>
>>I am trying to rebuild the 2.7 maintenance branch and get this error
>>on Ubuntu 10.04.1 LTS:
>
> I just tried this on my vanilla 10.04.1 system. I checked out release27-maint
> and ran configure && make. It built without problem.
>
>>XXX lineno: 743, opcode: 0
>>Traceback (most recent call last):
>> File "/usr/local/src/python-2.7-maint-svn/Lib/site.py", line 62, in <module>
>> import os
>> File "/usr/local/src/python-2.7-maint-svn/Lib/os.py", line 743, in <module>
>> def urandom(n):
>>SystemError: unknown opcode
...
> When you say "installed python 2.7" do you mean the one you installed to
> /usr/local from a from-source build, or something else (e.g. a Python 2.7
> package perhaps)?
It was the released source tarball for 2.7, and I get the same error when I try it from that directory.
-Tom
Thomas M. Browder, Jr.
Niceville, Florida
USA
Re: [Python-Dev] (Not) delaying the 3.2 release
On 9/16/2010 3:07 PM, Jacob Kaplan-Moss wrote:
> On Thu, Sep 16, 2010 at 9:59 AM, Paul Moore wrote:
>> On 16 September 2010 07:16, Terry Reedy wrote:
>>>> I'm not working to get Django running on Python 3.1 because I don't
>>>> feel confident I'll be able to put any apps I write into production.
>>>
>>> Why not? Since the I/O speed problem is fixed, I have no idea what you are
>>> referring to. Please do be concrete.
>>
>> At the risk of putting words into Jacob's mouth, I understood him to
>> mean that "production quality" WSGI servers either do not exist, or do
>> not implement a consistently defined spec (i.e., everyone is doing
>> their own thing to adapt WSGI to Python 3).
>
> Yup, exactly.
>
> Deploying web apps under Python 2 right now is actually pretty
> awesome. There's a clear leader in mod_wsgi that's fast, stable, easy
> to use, and under active development. There's a few great lightweight
> pure-Python servers, some new-hotness (Gunicorn) and some
> tried-and-true (CherryPy). There's a fast-as-hell bleeding-edge option
> (nginx + uwsgi). And those are just the ones I've successfully put
> into production -- there're still *more* options if one of those won't
> cut it.
>
> The key here is that switching between all of these deployment
> situations is *incredibly* easy. Actually, this very afternoon I'm
> planning to experiment with a switch from mod_wsgi to Gunicorn. I'm
> confident enough with the inter-op that I'm going to make the switch
> on a production web server, monitor it for a bit, then switch back.
>
> I've budgeted an hour for this, and I'll probably end up spending half
> that time playing Minecraft while I gather statistics.
>
> Python 3 offers me none of this. I don't have a wide variety of tools
> to choose from. Worse, I don't even have a guarantee of
> interoperability between the tools that *do* exist.
>
> ---
>
> I'm sorry if I'm coming across as a complainer here. It's a
> frustrating situation for me: I want to start using Python 3, but
> until there's a working web stack waiting for me I just can't justify
> the time. And unfortunately I'm just not familiar enough with the
> problem(s) to have any real shot at working towards a solution, and
> I'm *certainly* not enough of an expert to work on a PEP or spec. So
> all I can really do is agitate.
>
I think you are entitled to describe real-world use cases that Python 3 needs to start solving to be accepted as production-ready.
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
DjangoCon US September 7-9, 2010 http://djangocon.us/
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
Re: [Python-Dev] 3.x as the official release
On 15/09/2010 21:45, Tarek Ziadé wrote:
> Could we remove in any case the wsgiref.egg-info file ? Since we've
> been working on a new format for that (PEP 376), that should be
> starting to get used in the coming years, it'll be a bit of a
> non-sense to have that metadata file in the stdlib shipped with 3.2
On a related subject: Would it make sense not to run install_egg_info from install anymore? We probably can’t remove the command because of backward compat, but we could stop running it (thus creating egg-info files) by default.
Regards
Re: [Python-Dev] Python 2.7 Won't Build
On Sep 16, 2010, at 02:56 PM, Tom Browder wrote:
>On Thu, Sep 16, 2010 at 14:36, Barry Warsaw wrote:
>> When you say "installed python 2.7" do you mean the one you
>> installed to /usr/local from a from-source build, or something else
>> (e.g. a Python 2.7 package perhaps)?
>
>It was the released source tarball for 2.7, and I get the same error
>when I try it from that directory.
Yep, sorry, I still cannot reproduce it.
-Barry
Re: [Python-Dev] 3.x as the official release
At 10:18 PM 9/16/2010 +0200, Éric Araujo wrote:
> On 15/09/2010 21:45, Tarek Ziadé wrote:
> > Could we remove in any case the wsgiref.egg-info file ? Since we've
> > been working on a new format for that (PEP 376), that should be
> > starting to get used in the coming years, it'll be a bit of a
> > non-sense to have that metadata file in the stdlib shipped with 3.2
> On a related subject: Would it make sense not to run install_egg_info
> from install anymore? We probably can’t remove the command because of
> backward compat, but we could stop running it (thus creating egg-info
> files) by default.
If you're talking about distutils2 on Python 3, then of course anything goes: backward compatibility isn't an issue. For 2.x, not writing the files would indeed produce backward compatibility problems.
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 17:40:53 +0200, Antoine Pitrou wrote:
> On Thu, 16 Sep 2010 11:30:12 -0400
> "R. David Murray" wrote:
> >
> > And then BaseHeader uses self.lit.colon, etc, when manipulating strings.
> > It also has to use slice notation rather than indexing when looking at
> > individual characters, which is a PITA but not terrible.
> >
> > I'm not saying this is the best approach, since this is all experimental
> > code at the moment, but it is *an* approach
>
> Out of curiosity, can you explain why polymorphism is needed for
> e-mail? I would assume that headers are bytes until they are parsed, at
> which point they become a pair of unicode strings (one for the header
> name and one for its value).
Currently email accepts strings as input, and produces strings as output. It needs to also accept bytes as input, and emit bytes as output, because unicode can only be used as a 7-bit clean data transmission channel, and that's too restrictive for many email applications (many of which need to deal with "dirty" (non-RFC conformant) 8bit data). [1]
Backward compatibility says "case closed".
If we were designing from scratch, we could insist that input to the parser is always bytes, and when the model is serialized it always produces bytes. It is possible that one could live with that, but I don't think it is optimal. Given a message, there are many times you want to serialize it as text (for example, for presentation in a UI). You could provide alternate serialization methods to get text out on demand, but then what if someone wants to push that text representation back in to email to rebuild a model of the message?
So now we have both a bytes parser and a string parser.
What do we store in the model? We could say that the model is always text. But then we lose information about the original bytes message, and we can't reproduce it. For various reasons (mailman being a big one), this is not acceptable. So we could say that the model is always bytes. But we want access to (for example) the header values as text, so header lookup should take string keys and return string values[2]. But for certain types of processing, particularly examination of "dirty", non-RFC conforming input data, you need to be able to access the raw bytes data.
What about email files on disk? They could be bytes, or they could be, effectively, text (for example, utf-8 encoded). On disk, using utf-8, one might store the text representation of the message, rather than the wire-format (ASCII encoded) version. We might want to write such messages from scratch. As I said above, we could insist that files on disk be in wire-format, and for many applications that would work fine, but I think people would get mad at us if we didn't support text files[3].
So, after much discussion, what we arrived at (so far!) is a model that mimics the Python3 split between bytes and strings. If you start with bytes input, you end up with a BytesMessage object. If you start with string input to the parser, you end up with a StringMessage. If you have a BytesMessage and you want to do something with the text version of the message, you decode it:
    print(mymsg.decode())
If the message is RFC conformant, the message contains all the information needed to decode it correctly. If it's not conformant, email does the best it can and registers defects for the non-conformant bits (or, optionally, email6 will raise errors when the policy is set to strict).
If you have a StringMessage and you want to use it where wire-format is needed, you encode it:
    outmsg = mymsg.encode()
    smtpserver.sendmail(
        bytes(outmsg['from']),
        [bytes(x) for x in itertools.chain(
            outmsg['to'], outmsg['cc'], outmsg['bcc'])],
        outmsg.serialize(policy=email.policy.SMTP))
Encoding uses the utf-8 character set by default, but this can be modified by changing the policy. The trick for gathering the list of addresses is how I *think* that part of the API is going to work: iterating the object that models an address header gives you a list of address objects, and converting one of those to a bytes string gives you the wire-format byte string representing a single address. Also note that this is the new API; in compatibility mode (which is controlled by the policy) you'd get the old behavior of just getting the string representation of the whole header back (but then you'd have to parse it to turn it into a list of addresses).
The point here is that because we've encoded the message to a BytesMessage, what we get when we turn the pieces into a bytes string are the wire-format byte strings that are required for transmission; for example, non-ASCII characters will be encoded according to the policy and then RFC2047 transfer encoded as needed.
At this point you may notice there's a problem with the example above. We actually need to decode each of those byte strings using the ASCII codec before passing them as arguments to smt
Re: [Python-Dev] Python 2.7 Won't Build
I'm attempting to file a bug but keep getting:
An error has occurred
A problem was encountered processing your request. The tracker maintainers have been notified of the problem.
-Tom
Thomas M. Browder, Jr.
Niceville, Florida
USA
Re: [Python-Dev] Python 2.7 Won't Build
On Thursday 16 September 2010 23:10:22, Tom Browder wrote:
> I'm attempting to file a bug but keep getting:
File another bug about this bug!
--
Victor Stinner
http://www.haypocalc.com/
Re: [Python-Dev] 3.x as the official release
> If you're talking about distutils2 on Python 3, then of course
> anything goes: backward compatibility isn't an issue. For 2.x, not
> writing the files would indeed produce backward compatibility problems.
I was talking about distutils in 3.2 (or in the release where wsgiref.egg-info goes away). install_egg_info.py has already been turned into install_distinfo.py in distutils2, following PEP 376.
Thank you for your reply, I withdraw my suggestion.
Regards
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 16:51:58 -0400
"R. David Murray" wrote:
>
> What do we store in the model? We could say that the model is always
> text. But then we lose information about the original bytes message,
> and we can't reproduce it. For various reasons (mailman being a big one),
> this is not acceptable. So we could say that the model is always bytes.
> But we want access to (for example) the header values as text, so header
> lookup should take string keys and return string values[2].
Why can't you have both in a single class? If you create the class
using a bytes source (a raw message sent by SMTP, for example), the
class automatically parses and decodes it to unicode strings; if you
create the class using a unicode source (the text body of the e-mail
message and the list of recipients, for example), the class
automatically creates the bytes representation.
(of course all processing can be done lazily for performance reasons)
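As a point of reference for the "one class, both directions" idea, here is a sketch using the email API as it exists in current Python 3 releases. This is not the email6 design being discussed in this thread, and the addresses are placeholders; it only shows that a single message type can be parsed from bytes and read as text, or built from text and serialized to bytes.

```python
import email
from email import policy
from email.message import EmailMessage

# Direction 1: raw bytes in, text out.
raw = (b"From: a@example.com\r\n"
       b"To: b@example.com\r\n"
       b"Subject: hi\r\n"
       b"\r\n"
       b"body\r\n")
parsed = email.message_from_bytes(raw, policy=policy.default)
subject = parsed["subject"]    # header value as str
body = parsed.get_content()    # decoded text payload as str

# Direction 2: text in, wire-format bytes out.
built = EmailMessage()
built["From"] = "a@example.com"
built["To"] = "b@example.com"
built["Subject"] = "caf\u00e9"   # non-ASCII header text
built.set_content("text body")
wire = built.as_bytes(policy=policy.SMTP)
```

With the SMTP policy the non-ASCII subject is emitted as an RFC 2047 encoded word, so the serialized form stays 7-bit clean.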
> What about email files on disk? They could be bytes, or they could be,
> effectively, text (for example, utf-8 encoded).
Such a file can be two things:
- the raw encoding of a whole message (including headers, etc.), then
it should be fed as a bytes object
- the single text body of a hypothetical message, then it should be fed
as a unicode object
I don't see any possible middle-ground.
> On disk, using utf-8,
> one might store the text representation of the message, rather than
> the wire-format (ASCII encoded) version. We might want to write such
> messages from scratch.
But then the user knows the encoding (by "user" I mean what/whoever
calls the email API) and mentions it to the email package.
What I'm having an issue with is that you are talking about a bytes
representation and a unicode representation of a message. But they
aren't representations of the same things:
- if it's a bytes representation, it will be the whole, raw message
including envelope / headers (also, MIME sections etc.)
- if it's a unicode representation, it will only be a section of the
message decodable as such (a text/plain MIME section, for example;
or a decoded header value; or even a single e-mail address part of a
decoded header)
So, there doesn't seem to be any reason for having both a BytesMessage
and a UnicodeMessage at the same abstraction level. They
represent different things at different abstraction levels. I don't
see any potential for confusion: raw assembled e-mail message = bytes;
decoded text section of a message = unicode.
As for the problem of potential "bogus" raw e-mail data
(e.g., undecodable headers), well, I guess the library has to make a
choice between purity and practicality, or perhaps let the user choose
themselves. For example, through a `strict` flag: if `strict` is true,
raise an error as soon as a non-decodable byte appears in a header; if
`strict` is false, decode it through a default (encoding, errors)
convention which can be overridden by the user (a sensible possibility
being "utf-8, surrogateescape" to allow for lossless round-tripping).
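A minimal sketch of that `strict` convention, assuming a hypothetical helper named decode_header_bytes (nothing in the email package has this name):

```python
def decode_header_bytes(raw, strict=False,
                        encoding="utf-8", errors="surrogateescape"):
    """Decode raw header bytes to str.

    Hypothetical helper: with strict=True any undecodable byte raises
    UnicodeDecodeError; otherwise the (encoding, errors) convention
    applies, defaulting to a lossless surrogateescape decode.
    """
    if strict:
        return raw.decode(encoding, "strict")
    return raw.decode(encoding, errors)


bogus = b"From: bogus\xffname"

# Lenient mode keeps the bad byte as a lone surrogate...
lenient = decode_header_bytes(bogus)
# ...which encodes back to exactly the original bytes.
round_trip = lenient.encode("utf-8", "surrogateescape")

# Strict mode refuses the same input.
try:
    decode_header_bytes(bogus, strict=True)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```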
> As I said above, we could insist that files on
> disk be in wire-format, and for many applications that would work fine,
> but I think people would get mad at us if we didn't support text files[3].
Again, this simply seems to be two different abstraction levels:
pre-generated raw email messages including headers, or a single text
waiting to be embedded in an actual e-mail.
> Anyway, what polymorphism means in email is that if you put in bytes,
> you get a BytesMessage, if you put in strings you get a StringMessage,
> and if you want the other one you convert.
And then you have two separate worlds while ultimately the same
concepts are underlying. A library accepting BytesMessage will crash
when a program wants to give a StringMessage and vice-versa. That
doesn't sound very practical.
> [1] Now that surrogateescape exists, one might suppose that strings
> could be used as an 8bit channel, but that only works if you don't need
> to *parse* the non-ASCII data, just transmit it.
Well, you can parse it, precisely. Not only that, but it round-trips if you
unparse it again:
>>> header_bytes = b"From: bogus\xFFname "
>>> name, value = header_bytes.decode("utf-8", "surrogateescape").split(":")
>>> name
'From'
>>> value
' bogus\udcffname '
>>> "{0}:{1}".format(name, value).encode("utf-8", "surrogateescape")
b'From: bogus\xffname '
In the end, what I would call a polymorphic best practice is "try to
avoid bytes/str polymorphism if your domain is well-defined
enough" (which I admit URLs aren't necessarily; but there's no
question a single text/XXX e-mail section is text, and a whole
assembled e-mail message is bytes).
Regards
Antoine.
Re: [Python-Dev] Python 2.7 Won't Build
On Thu, Sep 16, 2010 at 16:36, Victor Stinner wrote:
> On Thursday 16 September 2010 23:10:22, Tom Browder wrote:
>> I'm attempting to file a bug but keep getting:
>
> File another bug about this bug!
I did, and eventually discovered the problem: I tried to "nosy" Barry as
requested by adding his e-mail address, but that causes an error in the
tracker.
After I finally figured that out, I successfully entered the original bug
(and reported it on the "tracker bug").
-Tom
Thomas M. Browder, Jr.
Niceville, Florida
USA
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:
> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI). You could provide alternate
> serialization methods to get text out on demand... but then what if
> someone wants to push that text representation back in to email to
> rebuild a model of the message?
You tell them "too bad, make some bytes out of that text." Leave it up to
the application. Period, the end, it's not the library's job. If you pushed
the text out to a 'view message source' UI representation, then the
vicissitudes of the system clipboard and other encoding and decoding things
may corrupt it in inscrutable ways. You can't fix it. Don't try.
> So now we have both a bytes parser and a string parser.
Why do so many messages on this subject take this for granted? It's wrong
for the email module just like it's wrong for every other package.
There are plenty of other (better) ways to deal with this problem. Let the
application decide how to fudge the encoding of the characters back into
bytes that can be parsed. "In the face of ambiguity, refuse the temptation
to guess" and all that. The application has more of an idea of what's going
on than the library here, so let it make encoding decisions.
Put another way, there's nothing wrong with having a text parser, as long
as it just encodes the text according to some known encoding and then
parses the bytes :).
> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings. If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.
That may be a handy way to deal with some grotty internal implementation
details, but having a 'decode()' method is broken. The thing I care about,
as a consumer of this API, is that there is a clearly defined "Message"
interface, which gives me a uniform-looking place where I can ask for
either characters (if I'm displaying them to the user) or bytes (if I'm
putting them on the wire). I don't particularly care where those bytes came
from. I don't care what decoding tricks were necessary to produce the
characters.
Now, it may be worthwhile to have specific normalization / debrokenifying
methods which deal with specific types of corrupt data from the wire;
encoding-guessing, replacement-character insertion or whatever else are
fine things to try. It may also be helpful to keep around a list of errors
in the message, for inspection. But as we know, there are lots of ways that
MIME data can go bad other than encoding, so that's just one variety of
error that we might want to keep around.
(Looking at later messages as I'm about to post this, I think this all
sounds pretty similar to Antoine's suggestions, with respect to keeping the
implementation within a single class, and not having
BytesMessage/UnicodeMessage at the same abstraction level.)
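The "text parser that just encodes and then parses the bytes" idea can be
sketched with today's stdlib. This is only an illustration: `parse_text`
is a hypothetical wrapper, not an email package API, and it uses the
modern `email.message_from_bytes`, not the email6 design discussed in this
thread.

```python
import email


def parse_text(text, encoding="utf-8"):
    """Hypothetical wrapper: encode with a known encoding, parse bytes."""
    return email.message_from_bytes(text.encode(encoding))


source = "To: someone\nSubject: test\n\nA p\u00f6stal message.\n"
msg = parse_text(source)

# The application, which chose the encoding, also decodes the body.
body = msg.get_payload(decode=True).decode("utf-8")
```

The design choice here is that the library never guesses: the caller names
the encoding on the way in and again on the way out.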
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:
>That may be a handy way to deal with some grotty internal
>implementation details, but having a 'decode()' method is broken. The
>thing I care about, as a consumer of this API, is that there is a
>clearly defined "Message" interface, which gives me a uniform-looking
>place where I can ask for either characters (if I'm displaying them to
>the user) or bytes (if I'm putting them on the wire). I don't
>particularly care where those bytes came from. I don't care what
>decoding tricks were necessary to produce the characters.
But first you have to get to that Message interface. This is why the current
email package separates parsing and generating from the representation model.
You could conceivably have a parser that rot13's all the payload, or just
parses the headers and leaves the payload as a blob of bytes. But the parser
tries to be lenient in what it accepts, so that one bad header doesn't cause
it to just punt on everything that follows. Instead, it parses what it can
and registers a defect on that header, which the application can then reason
about, because it has a Message object. If it were to just throw up its hands
(i.e. raise an exception), you'd basically be left with a blob of useless crap
that will just get /dev/null'd.
>Now, it may be worthwhile to have specific normalization /
>debrokenifying methods which deal with specific types of corrupt data
>from the wire; encoding-guessing, replacement-character insertion or
>whatever else are fine things to try. It may also be helpful to keep
>around a list of errors in the message, for inspection. But as we
>know, there are lots of ways that MIME data can go bad other than
>encoding, so that's just one variety of error that we might want to
>keep around.
Right. The middle ground IMO is what the current parser does. It recognizes
the problem, registers a defect, and tries to recover, but it doesn't fix the
corrupt data. So for example, if you had a valid RFC 2047 encoded Subject but
a broken X-Foo header, you'd at least still end up with a Message object. The
value of the good headers would be things from which you can get the unicode
value, the raw bytes value, parse its parameters, munge it, etc. while the bad
header might be something you can only get the raw bytes from.
-Barry
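As a small illustration of "register a defect and keep going", here is a
sketch using the stdlib parser as it exists today. The broken input is a
made-up example (a multipart message with no boundary parameter, not the
broken X-Foo header mentioned above), chosen only because it reliably
produces a defect:

```python
import email

raw = (
    b"Subject: hello\n"
    b"Content-Type: multipart/mixed\n"  # broken: no boundary parameter
    b"\n"
    b"body text\n"
)

# No exception is raised; the problem is recorded on the message instead.
msg = email.message_from_bytes(raw)
defect_names = [type(d).__name__ for d in msg.defects]
```

The application still gets a Message object, can read the good Subject
header, and can inspect `msg.defects` to decide what to do about the rest.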
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 7:34 PM, Barry Warsaw wrote:
> On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:
>
>> That may be a handy way to deal with some grotty internal
>> implementation details, but having a 'decode()' method is broken. The
>> thing I care about, as a consumer of this API, is that there is a
>> clearly defined "Message" interface, which gives me a uniform-looking
>> place where I can ask for either characters (if I'm displaying them to
>> the user) or bytes (if I'm putting them on the wire). I don't
>> particularly care where those bytes came from. I don't care what
>> decoding tricks were necessary to produce the characters.
>
> But first you have to get to that Message interface. This is why the current
> email package separates parsing and generating from the representation model.
> You could conceivably have a parser that rot13's all the payload, or just
> parses the headers and leaves the payload as a blob of bytes. But the parser
> tries to be lenient in what it accepts, so that one bad header doesn't cause
> it to just punt on everything that follows. Instead, it parses what it can
> and registers a defect on that header, which the application can then reason
> about, because it has a Message object. If it were to just throw up its hands
> (i.e. raise an exception), you'd basically be left with a blob of useless crap
> that will just get /dev/null'd.
Oh, absolutely. Please don't interpret anything I say as meaning that the
email API should not handle broken data. I'm just saying that you should
not expect broken data to round-trip through translation to characters and
back, any more than you should expect a broken PNG to round-trip through a
translation to a 2d array of pixels and back.
>> Now, it may be worthwhile to have specific normalization /
>> debrokenifying methods which deal with specific types of corrupt data
>> from the wire; encoding-guessing, replacement-character insertion or
>> whatever else are fine things to try. It may also be helpful to keep
>> around a list of errors in the message, for inspection. But as we
>> know, there are lots of ways that MIME data can go bad other than
>> encoding, so that's just one variety of error that we might want to
>> keep around.
>
> Right. The middle ground IMO is what the current parser does. It recognizes
> the problem, registers a defect, and tries to recover, but it doesn't fix the
> corrupt data. So for example, if you had a valid RFC 2047 encoded Subject but
> a broken X-Foo header, you'd at least still end up with a Message object. The
> value of the good headers would be things from which you can get the unicode
> value, the raw bytes value, parse its parameters, munge it, etc. while the bad
> header might be something you can only get the raw bytes from.
My take on this would be that you should always be able to get bytes or
characters, but characters are always suspect, in that once you've decoded,
if you had invalid bytes, then they're replacement characters (or your
choice of encoding fix).
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 18:11:30 -0400, Glyph Lefkowitz wrote:
> On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:
>
> > Given a message, there are many times you want to serialize it as text
> > (for example, for presentation in a UI). You could provide alternate
> > serialization methods to get text out on demand... but then what if
> > someone wants to push that text representation back in to email to
> > rebuild a model of the message?
>
> You tell them "too bad, make some bytes out of that text." Leave it up
> to the application. Period, the end, it's not the library's job. If
> you pushed the text out to a 'view message source' UI representation,
> then the vicissitudes of the system clipboard and other encoding and
> decoding things may corrupt it in inscrutable ways. You can't fix it.
> Don't try.
Say we start with this bytes input:
    To: Glyph Lefkowitz
    From: "R. David Murray"
    Subject: =?utf-8?q?p=F6stal?=

    A simple message.
Part of the responsibility of the email module is to provide that
in text form on demand, so the application gets:
    To: Glyph Lefkowitz
    From: "R. David Murray"
    Subject: pöstal

    A simple message.
Now the application allows the user to do some manipulation of that,
and we have:
    To: "R. David Murray"
    From: Glyph Lefkowitz
    Subject: Re: pöstal

    A simple reply.
How does the application "make some bytes out of that text" before
passing it back to email? The application shouldn't have to know how
to do RFC2047 encoding, certainly, that's one of the jobs of the email
module. If the application just encodes the above as UTF8, then it also
has to be calling an email API that knows it is getting bytes input that
has not been transfer-encoded, and needs to be told the encoding, so that
it can do the correct transfer encoding. In that case why not have the
API accept the text directly, with an optional override for the default
utf-8 encoding that email will otherwise use?
Perhaps some of the disconnect here with Antoine (and Jean-Paul, on IRC)
is that the email-sig feels that the format of data handled by the email
module (rfcx822-style headers, perhaps with a body, perhaps including MIME
attachments) is of much wider utility than just handling email, and that
since the email module already has to be very liberal in what it accepts,
it isn't much of a stretch to have it handle those use cases as well (and
in Python2 it does, in the same 'most of the time' way it handles other
non-ASCII byte issues). In that context, it seems perfectly reasonable to
expect it to parse string (unicode) headers containing non-ascii data.
In such use cases there might be no reason to encode to email RFC
wire-format, and thus an encode-to-bytes-and-tell-me-the-encoding interface
wouldn't serve the use case particularly well because the application
wouldn't want the RFC2047 encoding in the file version of the data.
We could conceivably drop those use cases if it simplified the API and
implementation, but right now it doesn't feel like it does.
Further, Python2 serves these use cases, because you can read the non-ascii
data and process it as binary data and it would all just work (most of the
time). So such use cases probably do exist out in the wild (but no, we
don't have any specific pointers, though I myself was working on such an
app once that never got to production).
If Python3 email parses only bytes, then it could serve the use case in
somewhat the same way as Python2: the application would encode the data as,
say, utf8 and pass it to the 'wire format bytes' input interface, which
would then register a defect but otherwise pass the data along to the
'wire' (the file in this case). On read it would again register a defect,
and the application could pull the data out using the 'give me the
wire-bytes' interface and decode it itself. But this feels yucky to me,
like a regression to Python2's conflation of bytes and text. This type of
application really wants to work with unicode, not to have to futz with
bytes.
> > So now we have both a bytes parser and a string parser.
>
> Why do so many messages on this subject take this for granted? It's
> wrong for the email module just like it's wrong for every other package.
>
> There are plenty of other (better) ways to deal with this problem. Let
> the application decide how to fudge the encoding of the characters back
> into bytes that can be parsed. "In the face of ambiguity, refuse the
> temptation to guess" and all that. The application has more of an idea
> of what's going on than the library here, so let it make encoding
> decisions.
>
> Put another way, there's nothing wrong with having a text parser, as
> long as it just encodes the text according to some known encoding and
> then parses the bytes :).
See above for why I don't think that serves all the use cases for text
parsing.
Perhaps another difference is that in my mind *as an application
developer*, the "real" email messag
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 09:34 PM, R. David Murray wrote:
>Say we start with this bytes input:
>
>To: Glyph Lefkowitz
>From: "R. David Murray"
>Subject: =?utf-8?q?p=F6stal?=
>
>A simple message.
>
>Part of the responsibility of the email module is to provide that
>in text form on demand, so the application gets:
>
>To: Glyph Lefkowitz
>From: "R. David Murray"
>Subject: pöstal
>
>A simple message.
>
>Now the application allows the user to do some manipulation of that,
>and we have:
>
>To: "R. David Murray"
>From: Glyph Lefkowitz
>Subject: Re: pöstal
>
>A simple reply.
And of course, what happens if the original subject is in one charset and the
prefix is in an incompatible one? Then you end up with a wire format of two
RFC 2047 encoded words separated by whitespace. You have to keep those chunks
separate all the way through to do that properly. (I know RDM knows this. :)
>But I *am* open to being convinced otherwise. If everyone hates the
>BytesMessage/StringMessage API design, then that should certainly not
>be what we implement in email.
Just as a point of order, to the extent that we're discussing generic
approaches to similar problems across multiple modules, it's okay that we're
having this discussion on python-dev. But the email-sig has put in a lot of
work on specific API and implementation designs for the email package, so any
deviation really needs to be debated, discussed, and decided there.
-Barry
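The "two encoded words separated by whitespace" case can be sketched with
`email.header` from the stdlib. The Greek reply prefix and the choice of
charsets below are made up for illustration; they are not taken from the
thread.

```python
from email.header import Header, decode_header

subject = Header()
subject.append("\u0391\u03c0:", "iso-8859-7")  # illustrative Greek prefix
subject.append("p\u00f6stal", "iso-8859-1")    # original subject text
wire = subject.encode()

# Each encoded word comes back with its own charset, so the chunks have
# to be decoded separately before they can be joined as text.
chunks = decode_header(wire)
decoded = [raw.decode(charset) for raw, charset in chunks if charset]
charsets = {charset for _, charset in chunks}
```

The point of the sketch is only that the wire form carries one charset
label per chunk, which is why the chunks have to stay separate all the way
through.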
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Fri, 17 Sep 2010 11:34:26 am R. David Murray wrote:
> Perhaps another difference is that in my mind *as an application
> developer*, the "real" email message consists of unicode headers and
> message bodies, with attachments that are sometimes binary, and that
> the wire-format is this formalized encoding we have to use to be able
> to send it from place to place. In that mental model it seems to
> make perfect sense to have a StringMessage that I have encode to
> transmit, and a BytesMessage that I receive and have to decode to
> work with.
+1
--
Steven D'Aprano
Re: [Python-Dev] (Not) delaying the 3.2 release
On 9/16/2010 3:07 PM, Jacob Kaplan-Moss wrote:
> On 16 September 2010 07:16, Terry Reedy wrote:
>>> I'm not working to get Django running on Python 3.1 because I don't
>>> feel confident I'll be able to put any apps I write into production.
>>
>> Why not? Since the I/O speed problem is fixed, I have no idea what you
>> are referring to. Please do be concrete.
>
> Deploying web apps under Python 2 right now is actually pretty awesome.
...
And will remain so for years.
> The key here is that switching between all of these deployment
> situations is *incredibly* easy.
...
> Python 3 offers me none of this. I don't have a wide variety of tools
> to choose from. Worse, I don't even have a guarantee of
> interoperability between the tools that *do* exist.
That last needs an updated standard, which may require a bit of nudging to
get agreement on *something*, along with an updated reference
implementation. I would expect a usable variety of production
implementations to gradually follow thereafter, as they have for 2.x.
> I'm sorry if I'm coming across as a complainer here.
No. You answered my question quite well.
--
Terry Jan Reedy
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Fri, 17 Sep 2010 00:05:12 +0200, Antoine Pitrou wrote:
> On Thu, 16 Sep 2010 16:51:58 -0400
> "R. David Murray" wrote:
> >
> > What do we store in the model? We could say that the model is always
> > text. But then we lose information about the original bytes message,
> > and we can't reproduce it. For various reasons (mailman being a big one),
> > this is not acceptable. So we could say that the model is always bytes.
> > But we want access to (for example) the header values as text, so header
> > lookup should take string keys and return string values[2].
>
> Why can't you have both in a single class? If you create the class
> using a bytes source (a raw message sent by SMTP, for example), the
> class automatically parses and decodes it to unicode strings; if you
> create the class using an unicode source (the text body of the e-mail
> message and the list of recipients, for example), the class
> automatically creates the bytes representation.
>
> (of course all processing can be done lazily for performance reasons)
Certainly we could do that. There are methods, though, whose implementation
is the same except for the detail of whether they are processing bytes or
string, so the dual class structure allows that implementation to be
shared. So even if we changed the API to be single class, I might well
retain the dual class implementation under the hood. I'd have to explore
which looked better when the time came.
> > What about email files on disk? They could be bytes, or they could be,
> > effectively, text (for example, utf-8 encoded).
>
> Such a file can be two things:
> - the raw encoding of a whole message (including headers, etc.), then
> it should be fed as a bytes object
> - the single text body of a hypothetical message, then it should be fed
> as a unicode object
> I don't see any possible middle-ground.
It's not a middle ground, but as I discussed in my response to Glyph, it
could be a series of headers and a body in, say, utf-8 where the
application wants to treat them as unicode, not bytes (ie: *not* an email).
Python2 supports this use case, albeit with the same "works most of the
time" as it does with other non-ascii edge cases.
> > On disk, using utf-8,
> > one might store the text representation of the message, rather than
> > the wire-format (ASCII encoded) version. We might want to write such
> > messages from scratch.
>
> But then the user knows the encoding (by "user" I mean what/whoever
> calls the email API) and mentions it to the email package.
Yes? And then? The email package still has to parse the file, and it can't
use its normal parse-the-RFC-data parser because the file could contain
*legitimate* non-ASCII header data. So there has to be a separate parser
for this case that will convert the non-ASCII data into RFC2047 encoded
data. At that point you have two parsers that share a bunch of code...and
my current implementation lets the input to the second parser be text,
which is the natural representation of that data, the one the user or
application writer is going to expect.
I *could* implement it as a variant bytes parser, and have the application
call the variant parser with encoded bytes, but why? What's the benefit?
If the API takes text, it is *obvious* that non-ascii data is allowed and
is going to get wire-encoded. If it takes bytes... there is more mental
overhead in figuring out which bytes-parser interface one should call,
depending on whether one has 'wire format' data or encoded non-ascii data.
I can just imagine someone using the bytes-that-need-transfer-encoding
to try to parse a file containing RFC encoded data that he knows is stored
in a utf-8 encoded file, because that's the interface that accepts an
encoding parameter. And then the RFC2047 encoded words wouldn't get decoded.
Overall it seems simpler to me that text file == pass text to the text
parser, RFC-encoded bytes data == pass bytes data to the bytes parser.
This also separates opening the file correctly (specify the encoding on
open) from encoding the data as you prefer (encoding specified to the email
package when telling it to encode to wire format).
> What I'm having an issue with is that you are talking about a bytes
> representation and an unicode representation of a message. But they
> aren't representations of the same things:
> - if it's a bytes representation, it will be the whole, raw message
> including envelope / headers (also, MIME sections etc.)
> - if it's an unicode representation, it will only be a section of the
> message decodable as such (a text/plain MIME section, for example;
> or a decoded header value; or even a single e-mail address part of a
> decoded header)
Conceptually, a BytesMessage is a model of the entire message with all the
parts encoded in RFC wire-format. When you access pieces of it, you get the
RFC encoded byte strings.
Conceptually a StringMessage is a model of the entire message with all the
parts decoded as fa
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
Based on the discussion so far, I think you should go ahead and implement
the API agreed on by the mail sig both because it *has* been agreed on (and
thinking about the wsgi discussion, that seems to be a major achievement)
and because it seems sensible to me also, as far as I understand it. The
proof of the API will be in the testing. As long as you *think* it covers
all intended use cases, I am not sure that abstract discussion can go much
further.
I do have a thought about space and duplication. For normal messages, it
is not an issue. For megabyte (or in the future, gigabyte?) attachments,
it is. So if possible, there should only be one extracted blob for both
bytes and string versions of parsed messages. Or even make the extraction
from the raw stream lazy, when specifically requested.
--
Terry Jan Reedy
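One way to picture the "single blob, extracted lazily" idea is the sketch
below. Everything in it is hypothetical: `Blob` and `LazyPart` are
illustrative names, not part of the email package or of the email-sig
design, and the extraction callable stands in for reading from the raw
stream.

```python
class Blob:
    """Hypothetical shared storage for one extracted attachment."""

    def __init__(self, raw):
        self._raw = raw

    def as_bytes(self):
        return self._raw  # no copy: the same bytes object every time

    def as_text(self, charset="utf-8"):
        # Decoded on demand; nothing is cached here.
        return self._raw.decode(charset, "surrogateescape")


class LazyPart:
    """Hypothetical part that extracts its blob only when first asked."""

    def __init__(self, extract):
        self._extract = extract  # callable returning bytes
        self._blob = None

    @property
    def blob(self):
        if self._blob is None:
            self._blob = Blob(self._extract())
        return self._blob


calls = []


def extract():
    calls.append(1)  # record that the (pretend) raw stream was read
    return b"attachment \xff data"


part = LazyPart(extract)
```

Nothing is read until `part.blob` is first accessed, and both the bytes
view and the text view are served from the one stored bytes object.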
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 21:53:17 -0400, Barry Warsaw wrote:
> And of course, what happens if the original subject is in one charset and the
> prefix is in an incompatible one? Then you end up with a wire format of two
> RFC 2047 encoded words separated by whitespace. You have to keep those chunks
> separate all the way through to do that properly. (I know RDM knows this. :)
Heh, my example got messed up because my current mailer didn't MIME
encode it properly. That is, I emitted a non-RFC-compliant email :(
This is actually a pretty interesting issue in a number of ways, though
I'm not sure it relates to any other part of the stdlib. A header can
contain encoded words encoded with different charsets. An MUA that sorts
by subject and takes prefixes ('Re:') into account, for example, might
be decoding the header entirely before doing header matching/sorting, or
it might be matching against the RFC2047 encoded header. Hopefully the
former, these days, but don't count on it. So when emitting a reply, a
careful MUA would want to *only* attach the 'Re:' to the front, and not
otherwise change the header. If it is going to do that, though, it is
going to have to (a) make sure it preserves the original bytes version
of the header and (b) refold the line if necessary. This means knowing
lots of stuff about header encoding. So, really, that job should be
done by the email package, or at least the email package should provide
tools to do this.
The naive way (decode the header to unicode, attach the prefix, re-encode
using your favorite charset) is going to work most of the time, and
that's what it will be easiest to do with email6. Tacking the Re: on
the front of the bytes version of the header and having email6 refold
it will probably work about as well as it currently does in the old
email package, which is to say that sometimes the unfolded header is
otherwise unchanged, and sometimes it isn't.
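For illustration, the naive way can be sketched with the existing stdlib
`email.header` module (not email6, whose API is not shown here). The
function name `naive_reply_subject` is hypothetical, and the sketch ignores
refolding and preservation of the original bytes:

```python
from email.header import Header, decode_header, make_header


def naive_reply_subject(wire_subject, charset="utf-8"):
    """Hypothetical helper: decode fully, prefix, re-encode in one charset."""
    text = str(make_header(decode_header(wire_subject)))
    if not text.lower().startswith("re:"):
        text = "Re: " + text
    return Header(text, charset).encode()


reply = naive_reply_subject("=?iso-8859-1?q?p=F6stal?=")
reply_text = str(make_header(decode_header(reply)))
```

Note that the result is re-encoded in the reply's charset, so the wire
bytes of the original subject are not preserved, which is exactly the
trade-off described above.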
> >But I *am* open to being convinced otherwise. If everyone hates the
> >BytesMessage/StringMessage API design, then that should certainly not
> >be what we implement in email.
>
> Just as a point of order, to the extent that we're discussing generic
> approaches to similar problems across multiple modules, it's okay that we're
> having this discussion on python-dev. But the email-sig has put in a lot of
> work on specific API and implementation designs for the email package, so any
> deviation really needs to be debated, discussed, and decided there.
I am also finding it useful to have the API exposed to a wider audience
for feedback, but I agree, any substantive change would need to be
discussed on the email-sig, not here.
--David
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Thu, 16 Sep 2010 23:45:12 -0400, Terry Reedy wrote:
> Based on the discussion so far, I think you should go ahead and
> implement the API agreed on by the mail sig both because it *has* been
> agreed on (and thinking about the wsgi discussion, that seems to be a
> major achievement) and because it seems sensible to me also, as far as I
> understand it. The proof of the API will be in the testing. As long as
> you *think* it covers all intended use cases, I am not sure that
> abstract discussion can go much further.
>
> I do have a thought about space and duplication. For normal messages, it
> is not an issue. For megabyte (or in the future, gigabyte?) attachments,
> it is. So if possible, there should only be one extracted blob for both
> bytes and string versions of parsed messages. Or even make the
> extraction from the raw stream lazy, when specifically requested.
Our intent is to have conversions be as lazy as possible. There will
doubtless be some interesting heuristics to develop as to what to convert
when and what to cache when, and consequent problems to solve when it comes
to garbage collection...
There's also slated to be a back-end API for storing parts of messages
elsewhere than in memory, though I haven't worked out what that is going to
look like yet.
But we are definitely getting off topic now :)
--David
