Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Stephen J. Turnbull
Greg Ewing writes: > The use cases I had in mind for a 1-byte build are those for > which the alternative would be keeping everything in bytes. > Applications using a 1-byte build would need to be aware of > the fact and take care to slice strings at valid places. If > they were using bytes,
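
A minimal sketch of what "slicing at valid places" could involve, assuming the storage in question is UTF-8 as proposed elsewhere in this thread. The helper `safe_cut` is hypothetical and not part of any Python API; it only moves a cut point back to the nearest character boundary.

```python
# Hypothetical helper (not a Python API): move a cut point back to the
# nearest UTF-8 character boundary so the slice stays decodable.
def safe_cut(data: bytes, index: int) -> int:
    index = min(index, len(data))
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while 0 < index < len(data) and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index

encoded = "naïve".encode("utf-8")  # the 'ï' occupies two bytes

# A naive slice can land in the middle of a multi-byte character.
try:
    encoded[:3].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

cut = safe_cut(encoded, 3)
prefix = encoded[:cut].decode("utf-8")
```

The same care is needed whether the application holds `bytes` or a string type backed by UTF-8; the difference lies only in who is expected to do the boundary check.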

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Antoine Pitrou
On Wed, 07 Jul 2010 11:13:09 +0200 "M.-A. Lemburg" wrote: > > And finally: RAM is cheap and today's CPUs work better with 16- or > 32-bit values than 8-bit characters. The latter is wrong. There is no cost in accessing bytes rather than words on modern CPUs. (actually, bytes are cheaper overall

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Greg Ewing
M.-A. Lemburg wrote: Note that using UTF-8 as internal storage format would not work in Python, since Python is a Unicode producer, i.e. it needs to be able to generate and work with code points that are not allowed in UTF-8, e.g. lone surrogates. Well, it wouldn't strictly be UTF-8, any more
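
The point about lone surrogates can be checked directly on a current Python 3 interpreter. This is only an illustration of the strict codec versus the `surrogatepass` error handler, not a statement about how any internal representation was or would be implemented.

```python
lone = "\ud800"  # a lone high surrogate: a valid str element, not valid UTF-8

# The strict UTF-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'surrogatepass' handler emits the three-byte form anyway, which is
# "UTF-8-like" but not conforming UTF-8.
relaxed = lone.encode("utf-8", "surrogatepass")
round_trip = relaxed.decode("utf-8", "surrogatepass")
```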

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread M.-A. Lemburg
Ronald Oussoren wrote: > > On 27 Jun, 2010, at 11:48, Greg Ewing wrote: > >> Stefan Behnel wrote: >>> Greg Ewing, 26.06.2010 09:58: Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? >>> It would break Py_UNICODE, because th

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-06 Thread Stefan Behnel
Ronald Oussoren, 06.07.2010 16:51: On 27 Jun, 2010, at 11:48, Greg Ewing wrote: Stefan Behnel wrote: Greg Ewing, 26.06.2010 09:58: Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? It would break Py_UNICODE, because the internal

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-06 Thread Ronald Oussoren
On 27 Jun, 2010, at 11:48, Greg Ewing wrote: > Stefan Behnel wrote: >> Greg Ewing, 26.06.2010 09:58: >>> Would there be any sanity in having an option to compile >>> Python with UTF-8 as the internal string representation? >> It would break Py_UNICODE, because the internal size of a unicode chara

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-27 Thread R. David Murray
On Fri, 25 Jun 2010 15:40:52 -0700, Bill Janssen wrote: > Guido van Rossum wrote: > > So you're really just worried about space consumption. I'd like to see > > a lot of hard memory profiling data before I got overly worried about > > that. > > While I've seen some big Web pages, I think the ema

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-27 Thread Greg Ewing
Eric Smith wrote: But isn't this currently ignored everywhere in python's code? It's true that code using a utf-8 build would have to be aware of the fact much more often. But I'm thinking of applications that would otherwise want to keep all their strings encoded to save memory. If they do th

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-27 Thread Eric Smith
On 6/27/2010 5:48 AM, Greg Ewing wrote: Stefan Behnel wrote: Greg Ewing, 26.06.2010 09:58: Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? It would break Py_UNICODE, because the internal size of a unicode character would no lo

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-27 Thread Greg Ewing
Stefan Behnel wrote: Greg Ewing, 26.06.2010 09:58: Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? It would break Py_UNICODE, because the internal size of a unicode character would no longer be fixed. It's not fixed anyway w
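
As general background to the fixed-width question (not a reconstruction of the truncated reply above), UTF-16 is one familiar encoding in which a character may take one or two code units. The helper `utf16_units` below is a hypothetical name used only for illustration.

```python
def utf16_units(s: str) -> int:
    """Count the 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp = "é"              # inside the Basic Multilingual Plane
astral = "\U0001F600"  # outside the BMP, needs a surrogate pair

bmp_units = utf16_units(bmp)
astral_units = utf16_units(astral)
```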

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Nick Coghlan
On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy wrote: > I can imagine that inter-operation, when appropriate, might work better with > addition of a couple of missing __rxxx__ methods, such as the mentioned > __rcontains__. Although adding such would affect the implementation of a > core syntax fea

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Terry Reedy
The several posts in this and other threads got me thinking about text versus number computing (which I am more familiar with). For numbers, we have in Python three builtins, the general purpose ints and floats and the more specialized complex. Two other rational types can be imported for speci

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stephen J. Turnbull
Greg Ewing writes: > Would there be any sanity in having an option to compile > Python with UTF-8 as the internal string representation? Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the beginning of the pain. If Emacs's experience is any guide, the cost in speed and complexit

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stefan Behnel
Greg Ewing, 26.06.2010 09:58: Tres Seaver wrote: I do know for a fact that using a UCS2-compiled Python instead of the system's UCS4-compiled Python leads to measurable, noticeable drop in memory consumption of long-running webserver processes using Unicode Would there be any sanity in having

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stefan Behnel
Ian Bicking, 26.06.2010 00:26: On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote: On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz I'd like a version of 'decode' which would give me a type that was, in every respect, unicode, and responded to all protocols exactly as other unicode objec

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Greg Ewing
Tres Seaver wrote: I do know for a fact that using a UCS2-compiled Python instead of the system's UCS4-compiled Python leads to measurable, noticeable drop in memory consumption of long-running webserver processes using Unicode Would there be any sanity in having an option to compile Python wit
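
A rough way to see why the width of the storage unit matters is to compare encoded sizes of the same text. This sketch measures encoded byte counts only; it says nothing about the actual in-memory layout of `str` objects in any particular Python build, and the sample text is an arbitrary assumption.

```python
text = "héllo wörld"  # 11 characters, two of them non-ASCII

# Bytes needed per encoding: variable-width, 2-byte units, 4-byte units.
sizes = {
    enc: len(text.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}
```

For mostly-ASCII text such as typical web markup, the 4-byte form is roughly four times the size of the UTF-8 form, which is the kind of gap the UCS2 versus UCS4 observation is about.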

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Steve Holden
Glyph Lefkowitz wrote: > > On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote: > >> But you'd still have to validate it, right? You wouldn't want to go on >> using what you thought was wrapped UTF-8 if it wasn't actually valid >> UTF-8 (or you'd be worse off than in Python 2). So you're really j

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Bill Janssen
Guido van Rossum wrote: > On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz > wrote: > > > > On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: > > > > Regarding the proposal of a String ABC, I hope this isn't going to > > become a backdoor to reintroduce the Python 2 madness of allowing > > eq

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Ian Bicking
On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote: > On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz > > I'd like a version of 'decode' which would give me a type that was, in > every > > respect, unicode, and responded to all protocols exactly as other > > unicode objects (or "str objects

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Tres Seaver
Guido van Rossum wrote: > But you'd still have to validate it, right? You wouldn't want to go on > using what you thought was wrapped UTF-8 if it wasn't actually valid > UTF-8 (or you'd be worse off than in Python 2). So you're really just > worried a

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Glyph Lefkowitz
On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote: > But you'd still have to validate it, right? You wouldn't want to go on > using what you thought was wrapped UTF-8 if it wasn't actually valid > UTF-8 (or you'd be worse off than in Python 2). So you're really just > worried about space consum

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Guido van Rossum
On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz wrote: > > On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: > > Regarding the proposal of a String ABC, I hope this isn't going to > become a backdoor to reintroduce the Python 2 madness of allowing > equivalency between text and bytes for *some

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Glyph Lefkowitz
On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: > Regarding the proposal of a String ABC, I hope this isn't going to > become a backdoor to reintroduce the Python 2 madness of allowing > equivalency between text and bytes for *some* strings of bytes and not > others. For my part, what I wan

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Ian Bicking
On Fri, Jun 25, 2010 at 11:30 AM, Stephen J. Turnbull wrote: > Ian Bicking writes: > > > I'm proposing these specials would be used in polymorphic functions, > like > > the functions in urllib.parse. I would not personally use them in my > own > > code (unless of course I was writing my own po

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Stephen J. Turnbull
Ian Bicking writes: > I'm proposing these specials would be used in polymorphic functions, like > the functions in urllib.parse. I would not personally use them in my own > code (unless of course I was writing my own polymorphic functions). > > This also makes it less important that the obj

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Ian Bicking
On Fri, Jun 25, 2010 at 5:06 AM, Stephen J. Turnbull wrote: > > So with this idea in mind it makes more sense to me that *specific > pieces of > > text* can be reasonably treated as both bytes and text. All the string > > literals in urllib.parse.urlunsplit() for example. > > > > The semanti

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Stephen J. Turnbull
Ian Bicking writes: > We've set up a system where we think of text as natively unicode, with > encodings to put that unicode into a byte form. This is certainly > appropriate in a lot of cases. But there's a significant class of problems > where bytes are the native structure. Network protoc

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Greg Ewing
Terry Reedy wrote: On 6/24/2010 1:38 PM, Bill Janssen wrote: We have separate types for int, float, Decimal, etc. But they're all numbers, and they all cross-operate. No they do not. Decimal only mixes properly with ints, but not with anything else I think there are also some important di
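
The claim that the numeric types do not all cross-operate is easy to check. The sketch below shows only the arithmetic case on a current Python 3 interpreter: `Decimal` mixes with `int` but refuses `float`, whereas `int` and `float` mix freely.

```python
from decimal import Decimal

# Decimal and int combine without complaint.
with_int = Decimal("1.5") + 2

# Decimal and float do not: arithmetic raises TypeError.
try:
    Decimal("1.5") + 2.0
    float_ok = True
except TypeError:
    float_ok = False

# The builtin int and float types do cross-operate.
int_plus_float = 1 + 2.0
```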

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Terry Reedy
On 6/24/2010 4:59 PM, Guido van Rossum wrote: But I wouldn't go so far as to claim that interpreting the protocols as text is wrong. After all we're talking exclusively about protocols that are designed intentionally to be directly "human readable" I agree that the claim "':' is just a byte" i

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Guido van Rossum
On Thu, Jun 24, 2010 at 2:44 PM, Ian Bicking wrote: > I think we'll avoid a lot of the confusion that was present with Python 2 by > not making the coercions transitive.  For instance, here's something that > would work in Python 2: > >   urlunsplit(('http', 'example.com', '/foo', u'bar=baz', ''))
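
For comparison, this is how `urllib.parse.urlunsplit` treats the three cases on current Python 3 releases. That behaviour postdates this thread, so the sketch illustrates where the library ended up rather than anything proposed in the message above.

```python
from urllib.parse import urlunsplit

# All-text components produce text.
text_url = urlunsplit(("http", "example.com", "/foo", "bar=baz", ""))

# All-bytes components produce bytes.
bytes_url = urlunsplit((b"http", b"example.com", b"/foo", b"bar=baz", b""))

# Mixing the two is rejected rather than silently coerced.
try:
    urlunsplit(("http", "example.com", "/foo", b"bar=baz", ""))
    mixed_ok = True
except TypeError:
    mixed_ok = False
```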

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Terry Reedy
On 6/24/2010 1:38 PM, Bill Janssen wrote: Secondly, maybe the string situation in 2.x wasn't as broken as we thought it was. In particular, those who deal with lots of encoded strings seemed to find it handy, and miss it in 3.x. Perhaps strings are more like numbers than we think. We have sep

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Antoine Pitrou
On Thu, 24 Jun 2010 20:07:41 +0100 Michael Foord wrote: > > Although it would require changes for builtin types like file to work > with a new string ABC, right? There is no builtin file type in 3.x. Besides, it is not an ABC-level problem; the IO layer is written in C (although there's still t

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Ian Bicking
On Thu, Jun 24, 2010 at 3:59 PM, Guido van Rossum wrote: > The protocol specs typically go out of their way to specify what byte > values they use for syntactically significant positions (e.g. ':' in > headers, or '/' in URLs), while hand-waving about the meaning of "what > goes in between" since

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Guido van Rossum
I see it a little differently (though there is probably a common concept lurking in here). The protocols you mention are intentionally designed to be encoding-neutral as long as the encoding is an ASCII superset. This covers ASCII itself, Latin-1, Latin-N for other values of N, MacRoman, Microsoft
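
The "ASCII superset" property can be illustrated by encoding the syntactically significant characters under several codecs. The list of codecs below is an assumption chosen for the example; UTF-16 is included as a counterexample that is not an ASCII superset.

```python
# Codecs chosen for illustration; all are ASCII supersets.
supersets = ("ascii", "latin-1", "iso8859-15", "mac_roman", "cp1252", "utf-8")
delimiters = ":/"

# Every delimiter encodes to the same single byte under each superset.
same_bytes = all(
    ch.encode(enc) == ch.encode("ascii")
    for enc in supersets
    for ch in delimiters
)

# UTF-16 is not an ASCII superset: the colon becomes two bytes.
utf16_colon = ":".encode("utf-16-le")
```

This is why a parser can split on `b":"` or `b"/"` without knowing which of these encodings the bytes in between use.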

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Ian Bicking
On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote: > Here are a couple of ideas I'm taking away from the bytes/string > discussion. > > First, it would probably be a good idea to have a String ABC. > > Secondly, maybe the string situation in 2.x wasn't as broken as we > thought it was. In par

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Brett Cannon
On Thu, Jun 24, 2010 at 12:07, Michael Foord wrote: > On 24/06/2010 19:11, Brett Cannon wrote: >> >> On Thu, Jun 24, 2010 at 10:38, Bill Janssen  wrote: >> [SNIP] >> >>> >>> The language moratorium kind of makes this all theoretical, but building >>> a String ABC still would be a good start, and p

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Michael Foord
On 24/06/2010 19:11, Brett Cannon wrote: On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote: [SNIP] The language moratorium kind of makes this all theoretical, but building a String ABC still would be a good start, and presumably isn't forbidden by the moratorium. Because a new ABC wo

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Brett Cannon
On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote: [SNIP] > The language moratorium kind of makes this all theoretical, but building > a String ABC still would be a good start, and presumably isn't forbidden > by the moratorium. Because a new ABC would go into the stdlib (I assume in collections

[Python-Dev] thoughts on the bytes/string discussion

2010-06-24 Thread Bill Janssen
Here are a couple of ideas I'm taking away from the bytes/string discussion. First, it would probably be a good idea to have a String ABC. Secondly, maybe the string situation in 2.x wasn't as broken as we thought it was. In particular, those who deal with lots of encoded strings seemed to find
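
A minimal sketch of what a String ABC could look like, assuming nothing beyond the standard `abc` module. The class name `String` and the decision about which types to register are hypothetical; whether `bytes` should qualify is exactly what the rest of the thread debates, so it is left unregistered here.

```python
from abc import ABC


class String(ABC):
    """Hypothetical marker ABC for text-like types (not a stdlib class)."""


# Registration makes isinstance() checks succeed without inheritance.
String.register(str)

text_is_string = isinstance("abc", String)
bytes_is_string = isinstance(b"abc", String)
```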