Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Toshio Kuratomi
On Tue, Jun 22, 2010 at 11:58:57AM +0900, Stephen J. Turnbull wrote: Toshio Kuratomi writes: One comment here -- you can also have uri's that aren't decodable into their true textual meaning using a single encoding. Apache will happily serve out uris that have utf-8, shift-jis,
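
A minimal sketch of the situation Toshio describes, assuming a hypothetical request path whose first segment is UTF-8 and whose second is Shift-JIS. `urllib.parse.unquote_to_bytes` recovers the raw octets losslessly, but no single codec turns the whole path into its true text.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request path: "%E2%82%AC" is the euro sign in UTF-8,
# "%82%A0" is HIRAGANA LETTER A in Shift-JIS.
quoted_path = "/%E2%82%AC/%82%A0"

# Percent-decoding to bytes loses nothing.
raw = unquote_to_bytes(quoted_path)
first, second = raw.strip(b"/").split(b"/")


def decodes_as(data, encoding):
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Each segment needs its own codec; the path as a whole is not valid UTF-8.
segments = [first.decode("utf-8"), second.decode("shift_jis")]
print(ascii(segments))           # escaped so it prints on any console
print(decodes_as(raw, "utf-8"))  # False
```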

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz
On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: The RFC says that URIs are text, and therefore they can (and IMO should) be operated on as text in the stdlib. No, *blue* is the best color for a shed. Oops, wait, let me try that again. While I broadly agree with this statement, it

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
Michael Urman writes: It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str, If you want something idempotent, it's already the case that bytes(b'abc') == b'abc'. What might be desirable is to

[Python-Dev] UserDict in 2.7

2010-06-22 Thread Raymond Hettinger
There's an entry in whatsnew for 2.7 to the effect of The UserDict class is now a new-style class. I had thought there was a conscious decision to not change any existing classes from old-style to new-style. IIRC, Martin had championed this idea and had rejected all of the proposals to make

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Ronald Oussoren
On 21 Jun, 2010, at 22:25, Antoine Pitrou wrote: On Monday 21 June 2010 at 21:13 +0100, Michael Foord wrote: If OS X is a supported and important platform for Python then fixing all problems that it reveals (or being willing to) should definitely not be a pre-requisite of providing a

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Raymond Hettinger
On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. Thanks Glyph.

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
P.J. Eby writes: I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. I don't need to wrap my head around it. It's been deeply embedded, point first, and the nasty barbs ensure that I have no desire to pull it back out.

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Stephen J. Turnbull
Glyph Lefkowitz writes: On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: Note also that the complete solution argument cuts both ways. Eg, a complete solution should implement UTS 39 confusables detection[1] and IDNA[2]. Good luck doing that with bytes! And good luck
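
As a small illustration of why IDNA is naturally a text operation: Python's built-in `idna` codec (which implements IDNA 2003 per RFC 3490, not the newer IDNA 2008) takes a `str` hostname and produces its ASCII-compatible bytes form. The hostname is a made-up example; UTS 39 confusables detection has no stdlib equivalent and is not shown.

```python
# Hypothetical internationalized hostname ("bücher.example").
hostname = "b\u00fccher.example"

# The stdlib "idna" codec (IDNA 2003 / RFC 3490) maps text labels to ASCII.
ascii_form = hostname.encode("idna")
print(ascii_form)  # b'xn--bcher-kva.example'

# Decoding returns text again.
round_trip = ascii_form.decode("idna")
```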

[Python-Dev] adding new function

2010-06-22 Thread lesni bleble
Hello, how can I simply add new functions to a module after its initialization (Py_InitModule())? I'm missing something like PyModule_AddCFunction(). Thank you, L. ___ Python-Dev mailing list Python-Dev@python.org

Re: [Python-Dev] adding new function

2010-06-22 Thread Daniel Fetchinson
How can I simply add new functions to a module after its initialization (Py_InitModule())? I'm missing something like PyModule_AddCFunction(). This type of question really belongs on python-list aka comp.lang.python, which I CC'd now. Please keep the discussion on that list. Cheers, Daniel --

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Nick Coghlan
On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org wrote: Which works if and only if your outputs are truly unicode-able. With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway.
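
A minimal sketch of the PEP 383 mechanism Nick refers to, using the `surrogateescape` error handler (available since Python 3.1): undecodable bytes travel through `str` as lone surrogates and come back unchanged on encode. The sample bytes are invented for illustration.

```python
# Invented input: valid UTF-8 followed by two bytes that never occur in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe"

# PEP 383: each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary text operations still work on the result.
word, junk = text.split(" ")

# Encoding with the same handler reproduces the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```

Note that encoding `text` with the default `strict` handler would raise `UnicodeEncodeError`; the "garbage" only round-trips if both ends agree on the handler.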

[Python-Dev] [OT] glyphs [was Re: email package status in 3.X]

2010-06-22 Thread Steven D'Aprano
On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote: 3. Unicode disclaims direct representation of glyphic variants (though again, exceptions were made for Asian acceptance). For example, in English, mechanically printed 'a' and 'g' are different from manually printed 'a' and 'g'. Representing

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Stephen J. Turnbull
Toshio Kuratomi writes: I'll definitely buy that. Would urljoin(b_base, b_subdir) = bytes and urljoin(u_base, u_subdir) = unicode be acceptable though? Probably. But it doesn't matter what I say, since Guido has defined that as polymorphism and approved it in principle. (I think,

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
Nick Coghlan writes: On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org wrote: Which works if and only if your outputs are truly unicode-able. With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Fred Drake
On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger raymond.hettin...@gmail.com wrote: I had thought there was a conscious decision to not change any existing classes from old-style to new-style. I thought so as well. Changing any class from old-style to new-style risks breaking applications in

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Benjamin Peterson
2010/6/22 Raymond Hettinger raymond.hettin...@gmail.com: There's an entry in whatsnew for 2.7 to the effect of The UserDict class is now a new-style class. I had thought there was a conscious decision to not change any existing classes from old-style to new-style. IIRC, Martin had championed

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Laurens Van Houtven
On Tue, Jun 22, 2010 at 2:40 PM, Fred Drake fdr...@acm.org wrote: On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger raymond.hettin...@gmail.com wrote: I had thought there was a conscious decision to not change any existing classes from old-style to new-style. I thought so as well.  Changing

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Michael Urman
On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull step...@xemacs.org wrote: Michael Urman writes: It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str, If you want something idempotent,

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Guido van Rossum
[Just addressing one little issue here; generally I'm just happy that we're discussing this issue in such detail from so many points of view.] On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi a.bad...@gmail.com wrote: [...] Would urljoin(b_base, b_subdir) = bytes and urljoin(u_base, u_subdir) =
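
For reference, this is roughly the polymorphism that later shipped: since Python 3.2 the `urllib.parse` functions accept either `str` or `bytes` (but not a mix) and return the same type they were given. A minimal sketch with made-up URLs:

```python
from urllib.parse import urljoin

# str in, str out.
text_url = urljoin("http://example.com/a/b/", "c")

# bytes in, bytes out.
bytes_url = urljoin(b"http://example.com/a/b/", b"c")

# Mixing the two types is rejected rather than guessed at.
mixed_ok = True
try:
    urljoin("http://example.com/a/b/", b"c")
except TypeError:
    mixed_ok = False
```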

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Tres Seaver
Jesse Noller wrote: On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote: Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Ronald Oussoren
On 22 Jun, 2010, at 3:38, Alexander Belopolsky wrote: On Mon, Jun 21, 2010 at 6:16 PM, Martin v. Löwis mar...@v.loewis.de wrote: The test_posix failure is a regression from 2.6 (but it only shows up on some machines - it is caused by a fairly braindead implementation of a couple of posix

[Python-Dev] State of json in 2.7

2010-06-22 Thread Dirkjan Ochtman
It looks like simplejson 2.1.0 and 2.1.1 have been released: http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ It looks like any changes that didn't come from the Python tree didn't go into the Python tree, either. I guess

Re: [Python-Dev] State of json in 2.7

2010-06-22 Thread Benjamin Peterson
2010/6/22 Dirkjan Ochtman dirk...@ochtman.nl: I guess we can't put these changes into 2.7 anymore? How can we make this better next time? Never have externally maintained packages. -- Regards, Benjamin

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Raymond Hettinger
On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: 2010/6/22 Raymond Hettinger raymond.hettin...@gmail.com: There's an entry in whatsnew for 2.7 to the effect of The UserDict class is now a new-style class. I had thought there was a conscious decision to not change any existing classes

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Ian Bicking
On Tue, Jun 22, 2010 at 6:31 AM, Stephen J. Turnbull step...@xemacs.org wrote: Toshio Kuratomi writes: I'll definitely buy that. Would urljoin(b_base, b_subdir) = bytes and urljoin(u_base, u_subdir) = unicode be acceptable though? Probably. But it doesn't matter what I say, since

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Alexander Belopolsky
On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren ronaldousso...@mac.com wrote: .. Both are valid fixes, both have both advantages and disadvantages. Your proposal: * Reverts to the behavior in 2.6 * Ensures that posix.getgroups and posix.setgroups are internally consistent It is also very

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Benjamin Peterson
2010/6/22 Raymond Hettinger raymond.hettin...@gmail.com: On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: 2010/6/22 Raymond Hettinger raymond.hettin...@gmail.com: There's an entry in whatsnew for 2.7 to the effect of The UserDict class is now a new-style class. I had thought there was

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Bill Janssen
Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Mon, Jun 21, 2010 at 9:26 PM, Bill Janssen jans...@parc.com wrote: .. Though, isn't that behavior of urllib.proxy_bypass another bug? I don't know. Ask Ronald. Hmmm. I brought up the System Preferences panel on my Mac, and

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Toshio Kuratomi
On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote: Toshio Kuratomi writes: unicode handling redesign. I'm stating my reading of the RFC not to defend the use case Philip has, but because I think that the outlook that non-text uris (before being percentencoded) are

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Raymond Hettinger
On Jun 22, 2010, at 10:08 AM, Benjamin Peterson wrote: . There was a typo in abc.py which prevented it from raising errors when non new-style class objects were passed in. For 2.x, that was probably a good thing, a happy accident that made it possible to register existing mapping classes as a
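
For context on what registering an existing mapping class means: `ABCMeta.register` makes a class a "virtual subclass" of an ABC without inheritance and without checking its methods. In Python 3 every class is new-style, so this always works; the sketch below uses a hypothetical `LegacyMapping` class and does not reproduce the 2.x old-style-class bug itself.

```python
from collections.abc import MutableMapping


class LegacyMapping:
    """Hypothetical mapping class written without inheriting from any ABC."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Register as a virtual subclass: isinstance/issubclass now succeed,
# but no MutableMapping mixin methods (update, pop, ...) are inherited.
MutableMapping.register(LegacyMapping)

is_mapping = isinstance(LegacyMapping(), MutableMapping)
inherits = MutableMapping in LegacyMapping.__mro__
```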

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Guido van Rossum
On Mon, Jun 21, 2010 at 10:28 PM, Stephen J. Turnbull step...@xemacs.org wrote: Michael Urman writes: It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str, If you want something

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread James Y Knight
On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: Similarly I'd expect (from experience) that a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations. Yeah. This is a real issue I have with the direction Python3 went: it pushes you

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread M.-A. Lemburg
Guido van Rossum wrote: [Just addressing one little issue here; generally I'm just happy that we're discussing this issue in such detail from so many points of view.] On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi a.bad...@gmail.com wrote: [...] Would urljoin(b_base, b_subdir) = bytes

Re: [Python-Dev] State of json in 2.7

2010-06-22 Thread Brett Cannon
[cc'ing Bob on his gmail address; didn't have any other address handy so I don't know if this will actually get to him] On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman dirk...@ochtman.nl wrote: It looks like simplejson 2.1.0 and 2.1.1 have been released:

Re: [Python-Dev] State of json in 2.7

2010-06-22 Thread Bob Ippolito
On Tuesday, June 22, 2010, Brett Cannon br...@python.org wrote: [cc'ing Bob on his gmail address; didn't have any other address handy so I don't know if this will actually get to him] On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman dirk...@ochtman.nl wrote: It looks like simplejson 2.1.0 and

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Terry Reedy
On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote: The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in Asia would present characters differently if they were originally encoded in a Japanese encoding versus a Chinese encoding, even

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Terry Reedy
On 6/22/2010 9:24 AM, Michael Urman wrote: By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, errors) that would pass an instance of bytes through, or encode an instance of str. And of course a to_str that performs similarly, passing str through and decoding bytes. While
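
A minimal sketch of the two helpers as Michael describes them. The names and parameter order follow his message; the UTF-8 and `strict` defaults are assumptions, and neither function exists as a builtin.

```python
def to_bytes(str_or_bytes, encoding="utf-8", errors="strict"):
    """Hypothetical helper: pass bytes through unchanged, encode str."""
    if isinstance(str_or_bytes, bytes):
        return str_or_bytes
    if isinstance(str_or_bytes, str):
        return str_or_bytes.encode(encoding, errors)
    raise TypeError(f"expected str or bytes, got {type(str_or_bytes).__name__}")


def to_str(str_or_bytes, encoding="utf-8", errors="strict"):
    """Hypothetical helper: pass str through unchanged, decode bytes."""
    if isinstance(str_or_bytes, str):
        return str_or_bytes
    if isinstance(str_or_bytes, bytes):
        return str_or_bytes.decode(encoding, errors)
    raise TypeError(f"expected str or bytes, got {type(str_or_bytes).__name__}")
```

Applying either helper twice gives the same result as applying it once, which is the idempotence being asked for.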

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Terry Reedy
On 6/22/2010 12:53 PM, Guido van Rossum wrote: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of

Re: [Python-Dev] [OT] glyphs [was Re: email package status in 3.X]

2010-06-22 Thread Terry Reedy
On 6/22/2010 6:52 AM, Steven D'Aprano wrote: On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote: 3. Unicode disclaims direct representation of glyphic variants (though again, exceptions were made for Asian acceptance). For example, in English, mechanically printed 'a' and 'g' are different from

[Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities

2010-06-22 Thread Craig Younkins
Hello, The method in question: http://docs.python.org/library/cgi.html#cgi.escape http://svn.python.org/view/python/tags/r265/Lib/cgi.py?view=markup # at the bottom Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Ian Bicking
On Tue, Jun 22, 2010 at 1:07 PM, James Y Knight f...@fuhm.net wrote: The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Terry Reedy
Tres, I am a Python3 enthusiast and realist. I did not expect major adoption for about 3 years (more optimistic than the 5 years of some). If you are feeling pressured to 'move' to Python3, it is not from me. I am sure you will do so on your own, perhaps even with enthusiasm, when it will be

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Guido van Rossum
On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote: Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) build it and they will come

Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities

2010-06-22 Thread Bill Janssen
Craig Younkins cyounk...@gmail.com wrote: cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping. Did you file a
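
In Python 3 the usual replacement is `html.escape` (added in 3.2; `cgi.escape` was deprecated at the same time and removed in 3.8), which escapes single quotes as well as double quotes by default. A minimal sketch with a made-up attribute payload:

```python
import html

# Hypothetical attacker-controlled value destined for a single-quoted attribute.
user_value = "x' onmouseover='alert(1)"

# html.escape converts &, < and >, and with quote=True (the default)
# also " and ', so the value cannot close the attribute early.
safe = html.escape(user_value, quote=True)
markup = f"<input value='{safe}'>"
```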

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Robert Collins
On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg m...@egenix.com wrote: return constant.encode('utf-8') So now you can write x.split(literal_as('', x)). This polymorphism is what we used in Python2 a lot to write code that works for both Unicode and 8-bit strings. Unfortunately,
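
A minimal sketch of what a `literal_as` helper along these lines might look like; the function body and the `split_fields` caller are hypothetical reconstructions, not the code from the message. The separator literal did not survive archiving, so the comma below is an arbitrary stand-in, and this version encodes with ASCII rather than the UTF-8 in the quoted fragment, so that a non-ASCII constant fails loudly instead of silently assuming `x` is UTF-8.

```python
def literal_as(constant, x):
    """Hypothetical helper: return str literal `constant` in the same type as `x`.

    Assumes the constant is ASCII, so it means the same thing in any
    ASCII-compatible encoding of `x`.
    """
    if isinstance(x, bytes):
        return constant.encode("ascii")
    return constant


def split_fields(x):
    # Works for str and bytes alike, returning pieces of the input's type.
    return x.split(literal_as(",", x))
```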

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Martin v. Löwis
This effectively substitutes getgrouplist called on the current user for getgroups. In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available. As a policy, adding non-POSIX functions to the posix module is
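
Python 3.3 did later add `os.getgrouplist` (Unix only). A minimal sketch comparing it with `os.getgroups()` for the current user; the helper name is hypothetical, results depend on the platform and account setup, and on Mac OS X the two lists may differ, which is the behavior under discussion here.

```python
import os
import pwd


def current_user_groups():
    """Return sorted (getgroups(), getgrouplist()) for the current user.

    The second element is None if the uid has no passwd entry,
    which can happen in some containers.
    """
    process_groups = sorted(os.getgroups())
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        return process_groups, None
    # getgrouplist() consults the group database; the gid passed in
    # is always included in the result.
    db_groups = sorted(os.getgrouplist(user, os.getgid()))
    return process_groups, db_groups


process_groups, db_groups = current_user_groups()
if db_groups is not None and set(process_groups) != set(db_groups):
    print("getgroups() and getgrouplist() disagree:", process_groups, db_groups)
```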

Re: [Python-Dev] State of json in 2.7

2010-06-22 Thread Fred Drake
On Tue, Jun 22, 2010 at 12:56 PM, Benjamin Peterson benja...@python.org wrote: Never have externally maintained packages. Seriously! I concur with this. Fortunately, it's not a real problem in this case. There's the (maintained) simplejson package, and the unmaintained json package. And
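
The usual way application code copes with having both a maintained `simplejson` and a stdlib `json` is an import fallback. A common idiom, sketched here; it assumes the two APIs stay compatible for the calls actually used, which is exactly what can drift when the two copies diverge.

```python
try:
    # Prefer the externally maintained package when it is installed...
    import simplejson as json
except ImportError:
    # ...and fall back to the stdlib copy otherwise.
    import json

payload = json.dumps({"b": 1, "a": [1, 2]}, sort_keys=True)
decoded = json.loads(payload)
```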

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Nick Coghlan
On Wed, Jun 23, 2010 at 2:17 AM, Guido van Rossum gu...@python.org wrote: (1) Literals. If you write something like x.split('') you are implicitly assuming x is text. I don't see a very clean way to overcome this; you'll have to implement some kind of type check e.g. x.split('') if

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Nick Coghlan
On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg m...@egenix.com wrote: It would be great if we could have something like the above as builtin method: x.split(''.as(x)) As per my other message, another possible (and reasonably intuitive) spelling would be: x.split(x.coerce('')) Writing it

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Greg Ewing
Benjamin Peterson wrote: IIRC this was because UserDict tries to be a MutableMapping but abcs require new style classes. Are there any use cases for UserList and UserDict in new code, now that list and dict can be subclassed? If not, I don't think it would be a big problem if they were left

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Michael Foord
On 23/06/2010 00:03, Greg Ewing wrote: Benjamin Peterson wrote: IIRC this was because UserDict tries to be a MutableMapping but abcs require new style classes. Are there any use cases for UserList and UserDict in new code, now that list and dict can be subclassed? Inheriting from list or
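
One frequently cited reason, sketched here in Python 3 and not necessarily the point made in the truncated reply: `dict`'s own constructor and `update()` do not call an overridden `__setitem__`, whereas `UserDict` goes through the mapping protocol. Both classes are hypothetical examples.

```python
from collections import UserDict


class LowerDict(dict):
    """dict subclass that tries to lowercase keys on assignment."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerUserDict(UserDict):
    """UserDict subclass with the same override."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d1 = LowerDict(Spam=1)      # dict.__init__ bypasses the override
d1.update(Eggs=2)           # so does dict.update
d2 = LowerUserDict(Spam=1)  # UserDict routes through __setitem__
d2.update(Eggs=2)
```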

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Michael Foord
On 22/06/2010 22:40, Robert Collins wrote: On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg m...@egenix.com wrote: return constant.encode('utf-8') So now you can write x.split(literal_as('', x)). This polymorphism is what we used in Python2 a lot to write code that works

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Raymond Hettinger
On Jun 22, 2010, at 3:59 PM, Michael Foord wrote: On 23/06/2010 00:03, Greg Ewing wrote: Benjamin Peterson wrote: IIRC this was because UserDict tries to be a MutableMapping but abcs require new style classes. Are there any use cases for UserList and UserDict in new code, now that list

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Michael Foord
On 22/06/2010 19:07, James Y Knight wrote: On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: Similarly I'd expect (from experience) that a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations. Yeah. This is a real issue I have with

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Ian Bicking
On Tue, Jun 22, 2010 at 11:17 AM, Guido van Rossum gu...@python.org wrote: (2) Data sources. These can be functions that produce new data from non-string data, e.g. str(int), read it from a named file, etc. An example is read() vs. write(): it's easy to create a (hypothetical) polymorphic
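
A minimal sketch of the sink/source asymmetry being described, using a hypothetical `PolyBuffer` class (not a stdlib API): `write()` can choose its behavior from the argument's type, but a reading method has no input to inspect, so the result type has to be chosen some other way, here by offering two separate methods.

```python
import io


class PolyBuffer:
    """Hypothetical in-memory buffer whose write() accepts str or bytes."""

    def __init__(self, encoding="utf-8"):
        self._buf = io.BytesIO()
        self.encoding = encoding

    def write(self, data):
        # A data sink can dispatch on the type of its argument.
        if isinstance(data, str):
            data = data.encode(self.encoding)
        return self._buf.write(data)

    def read_bytes(self):
        # A data source has nothing to dispatch on, so the caller
        # has to ask for a specific result type.
        return self._buf.getvalue()

    def read_text(self):
        return self._buf.getvalue().decode(self.encoding)


buf = PolyBuffer()
buf.write("abc")
buf.write(b"def")
```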

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread P.J. Eby
At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote: Then my example above could be made polymorphic (for ASCII compatible encodings) by writing: [x for x in seq if x.endswith(x.coerce(b))] I'm trying to see downsides to this idea, and I'm not really seeing any (well, other than 2.7 being almost

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz
On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: This is a common pain-point for porting software to 3.x - you had a string, it kinda worked

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz
On Jun 22, 2010, at 2:07 PM, James Y Knight wrote: Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz
On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote: This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Michael Urman
On Tue, Jun 22, 2010 at 15:32, Terry Reedy tjre...@udel.edu wrote: On 6/22/2010 9:24 AM, Michael Urman wrote: These are trivial functions; I just don't fully understand why the capability isn't baked in. Possible reasons: They are special purpose functions easily built on the basic functions

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Mike Klaas
On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking i...@colorstudy.com wrote: This reminds me of the optimization ElementTree and lxml made in Python 2 (not sure what they do in Python 3?) where they use str when a string is ASCII to avoid the memory and performance overhead of unicode. An

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Robert Collins
On Wed, Jun 23, 2010 at 12:25 PM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead.  I believe that encoding
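
Measuring the encoding cost for one's own workload is cheap. A minimal sketch using `timeit` on an invented, roughly 1 MB mostly-ASCII payload; absolute numbers vary by machine and Python version, and a micro-benchmark like this is no substitute for profiling a real application.

```python
import timeit

# Invented payload: roughly 1 MB of mostly-ASCII text with one non-ASCII char per line.
text = "caf\u00e9 and plain ASCII text\n" * 40_000
data = text.encode("utf-8")


def time_per_call(stmt, number=20):
    """Average seconds per call of the callable `stmt` over `number` runs."""
    return timeit.timeit(stmt, number=number) / number


encode_s = time_per_call(lambda: text.encode("utf-8"))
decode_s = time_per_call(lambda: data.decode("utf-8"))
print(f"encode: {encode_s * 1e3:.3f} ms, decode: {decode_s * 1e3:.3f} ms")
```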

Re: [Python-Dev] red buildbots on 2.7

2010-06-22 Thread Bill Janssen
Bill Janssen jans...@parc.com wrote: Considering that we've just released 2.7rc2, there are an awful lot of red buildbots for 2.7. In fact, I don't remember having seen a green buildbot for OS X and 2.7. Shouldn't these be fixed? Thanks to some action by Ronald, my two PPC OS X buildbots

Re: [Python-Dev] UserDict in 2.7

2010-06-22 Thread Fred Drake
On Tue, Jun 22, 2010 at 7:17 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: Benjamin fixed the UserDict and ABC problem earlier today in r82155. It is now the same as it was in Py2.6. Thanks, Benjamin! -Fred -- Fred L. Drake, Jr. fdrake at gmail.com A storm broke loose in my