Greg Ewing writes:
> The use cases I had in mind for a 1-byte build are those for
> which the alternative would be keeping everything in bytes.
> Applications using a 1-byte build would need to be aware of
> the fact and take care to slice strings at valid places. If
> they were using bytes,
On Wed, 07 Jul 2010 11:13:09 +0200
"M.-A. Lemburg" wrote:
>
> And finally: RAM is cheap and today's CPUs work better with 16- or
> 32-bit values than 8-bit characters.
The latter is wrong. There is no cost in accessing bytes
rather than words on modern CPUs.
(actually, bytes are cheaper overall
M.-A. Lemburg wrote:
Note that using UTF-8 as internal storage format would not work
in Python, since Python is a Unicode producer, i.e. it needs to
be able to generate and work with code points that are not allowed
in UTF-8, e.g. lone surrogates.
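The lone-surrogate point can be checked in a current Python 3 interpreter. The following is a minimal sketch, not anything posted in the thread; the variable names are illustrative, and the `surrogatepass` error handler is used only to show what a non-strict encoding would have to produce.

```python
# A lone (unpaired) surrogate is a legal element of a Python str,
# but strict UTF-8 has no encoding for it.
lone = "\ud800"

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" error handler emits the three-byte form anyway.
relaxed = lone.encode("utf-8", "surrogatepass")

# A strict UTF-8 decoder rejects those bytes again.
try:
    relaxed.decode("utf-8")
    roundtrip_ok = True
except UnicodeDecodeError:
    roundtrip_ok = False

print(strict_ok, relaxed, roundtrip_ok)
```

So an internal format able to hold every Python string would have to be a relaxed variant of UTF-8 rather than UTF-8 proper.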
Well, it wouldn't strictly be UTF-8, any more
Ronald Oussoren wrote:
>
> On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
>
>> Stefan Behnel wrote:
>>> Greg Ewing, 26.06.2010 09:58:
Would there be any sanity in having an option to compile
Python with UTF-8 as the internal string representation?
>>> It would break Py_UNICODE, because th
Ronald Oussoren, 06.07.2010 16:51:
On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
Stefan Behnel wrote:
Greg Ewing, 26.06.2010 09:58:
Would there be any sanity in having an option to compile Python
with UTF-8 as the internal string representation?
It would break Py_UNICODE, because the internal
On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
> Stefan Behnel wrote:
>> Greg Ewing, 26.06.2010 09:58:
>>> Would there be any sanity in having an option to compile
>>> Python with UTF-8 as the internal string representation?
>> It would break Py_UNICODE, because the internal size of a unicode chara
On Fri, 25 Jun 2010 15:40:52 -0700, Bill Janssen wrote:
> Guido van Rossum wrote:
> > So you're really just worried about space consumption. I'd like to see
> > a lot of hard memory profiling data before I got overly worried about
> > that.
>
> While I've seen some big Web pages, I think the ema
Eric Smith wrote:
But isn't this currently ignored everywhere in python's code?
It's true that code using a utf-8 build would have to be
aware of the fact much more often. But I'm thinking of
applications that would otherwise want to keep all their
strings encoded to save memory. If they do th
On 6/27/2010 5:48 AM, Greg Ewing wrote:
Stefan Behnel wrote:
Greg Ewing, 26.06.2010 09:58:
Would there be any sanity in having an option to compile
Python with UTF-8 as the internal string representation?
It would break Py_UNICODE, because the internal size of a unicode
character would no lo
Stefan Behnel wrote:
Greg Ewing, 26.06.2010 09:58:
Would there be any sanity in having an option to compile
Python with UTF-8 as the internal string representation?
It would break Py_UNICODE, because the internal size of a unicode
character would no longer be fixed.
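To illustrate the "no longer fixed" concern, here is a small sketch (sample characters and the helper name `byte_offset` are assumptions made for illustration) showing that UTF-8 uses between one and four bytes per code point, so locating the N-th character means scanning rather than multiplying by a constant width.

```python
# Code points from different ranges need different numbers of UTF-8 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
widths = [len(ch.encode("utf-8")) for ch in samples]


def byte_offset(text, index):
    """Return the UTF-8 byte offset of the code point at `index`.

    Hypothetical helper: it encodes the prefix, which is O(index),
    whereas a fixed-width representation would compute index * width.
    """
    return len(text[:index].encode("utf-8"))


text = "".join(samples)
offsets = [byte_offset(text, i) for i in range(len(text))]
print(widths, offsets)
```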
It's not fixed anyway w
On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy wrote:
> I can imagine that inter-operation, when appropriate, might work better with
> addition of a couple of missing __rxxx__ methods, such as the mentioned
> __rcontains__. Although adding such would affect the implementation of a
> core syntax fea
The several posts in this and other threads got me to think about text
versus number computing (which I am more familiar with).
For numbers, we have in Python three builtins, the general purpose ints
and floats and the more specialized complex. Two other rational types
can be imported for speci
Greg Ewing writes:
> Would there be any sanity in having an option to compile
> Python with UTF-8 as the internal string representation?
Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the
beginning of the pain.
If Emacs's experience is any guide, the cost in speed and complexit
Greg Ewing, 26.06.2010 09:58:
Tres Seaver wrote:
I do know for a fact that using a UCS2-compiled Python instead of the
system's UCS4-compiled Python leads to measurable, noticeable drop in
memory consumption of long-running webserver processes using Unicode
Would there be any sanity in having
Ian Bicking, 26.06.2010 00:26:
On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote:
On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
I'd like a version of 'decode' which would give me a type that was, in
every respect, unicode, and responded to all protocols exactly as other
unicode objec
Tres Seaver wrote:
I do know for a fact that using a UCS2-compiled Python instead of the
system's UCS4-compiled Python leads to measurable, noticeable drop in
memory consumption of long-running webserver processes using Unicode
Would there be any sanity in having an option to compile
Python wit
Glyph Lefkowitz wrote:
>
> On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote:
>
>> But you'd still have to validate it, right? You wouldn't want to go on
>> using what you thought was wrapped UTF-8 if it wasn't actually valid
>> UTF-8 (or you'd be worse off than in Python 2). So you're really j
Guido van Rossum wrote:
> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> wrote:
> >
> > On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
> >
> > Regarding the proposal of a String ABC, I hope this isn't going to
> > become a backdoor to reintroduce the Python 2 madness of allowing
> > eq
On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote:
> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was, in
> > every respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects
Guido van Rossum wrote:
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried a
On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote:
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consum
On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
wrote:
>
> On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
>
> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some
On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.
For my part, what I wan
On Fri, Jun 25, 2010 at 11:30 AM, Stephen J. Turnbull wrote:
> Ian Bicking writes:
>
> > I'm proposing these specials would be used in polymorphic functions,
> > like the functions in urllib.parse. I would not personally use them in
> > my own code (unless of course I was writing my own po
Ian Bicking writes:
> I'm proposing these specials would be used in polymorphic functions, like
> the functions in urllib.parse. I would not personally use them in my own
> code (unless of course I was writing my own polymorphic functions).
>
> This also makes it less important that the obj
On Fri, Jun 25, 2010 at 5:06 AM, Stephen J. Turnbull wrote:
> > So with this idea in mind it makes more sense to me that *specific
> > pieces of text* can be reasonably treated as both bytes and text. All
> > the string literals in urllib.parse.urlunsplit() for example.
> >
> > The semanti
Ian Bicking writes:
> We've set up a system where we think of text as natively unicode, with
> encodings to put that unicode into a byte form. This is certainly
> appropriate in a lot of cases. But there's a significant class of problems
> where bytes are the native structure. Network protoc
Terry Reedy wrote:
On 6/24/2010 1:38 PM, Bill Janssen wrote:
We have separate types for int,
float, Decimal, etc. But they're all numbers, and they all
cross-operate.
No they do not. Decimal only mixes properly with ints, but not with
anything else
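The Decimal behaviour described here is easy to confirm with the standard library. This is a minimal sketch with illustrative variable names; it only shows arithmetic mixing, not comparisons.

```python
from decimal import Decimal
from fractions import Fraction

d = Decimal("1.5")

# Mixing with int works and stays exact.
int_sum = d + 2

# Mixing with float is refused with a TypeError.
try:
    d + 0.5
    float_mix_ok = True
except TypeError:
    float_mix_ok = False

# Mixing with Fraction is refused as well.
try:
    d + Fraction(1, 2)
    fraction_mix_ok = True
except TypeError:
    fraction_mix_ok = False

print(int_sum, float_mix_ok, fraction_mix_ok)
```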
I think there are also some important di
On 6/24/2010 4:59 PM, Guido van Rossum wrote:
But I wouldn't go so far as to claim that interpreting the protocols
as text is wrong. After all we're talking exclusively about protocols
that are designed intentionally to be directly "human readable"
I agree that the claim "':' is just a byte" i
On Thu, Jun 24, 2010 at 2:44 PM, Ian Bicking wrote:
> I think we'll avoid a lot of the confusion that was present with Python 2 by
> not making the coercions transitive. For instance, here's something that
> would work in Python 2:
>
> urlunsplit(('http', 'example.com', '/foo', u'bar=baz', ''))
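For comparison, here is a rough sketch of how the equivalent call behaves with `urllib.parse` in Python 3 (3.2 or later is assumed, since that is when bytes arguments were accepted). The variable names are illustrative; a bytes query stands in for the mixed-type argument in the Python 2 example above.

```python
from urllib.parse import urlunsplit

# All-str components give a str result.
text_url = urlunsplit(("http", "example.com", "/foo", "bar=baz", ""))

# All-bytes components give a bytes result.
bytes_url = urlunsplit((b"http", b"example.com", b"/foo", b"bar=baz", b""))

# Mixing str and bytes in one call is rejected instead of being coerced.
try:
    urlunsplit(("http", "example.com", "/foo", b"bar=baz", ""))
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text_url, bytes_url, mixed_ok)
```

The function is polymorphic over the two types but never converts between them within a single call.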
On 6/24/2010 1:38 PM, Bill Janssen wrote:
Secondly, maybe the string situation in 2.x wasn't as broken as we
thought it was. In particular, those who deal with lots of encoded
strings seemed to find it handy, and miss it in 3.x. Perhaps strings
are more like numbers than we think. We have sep
On Thu, 24 Jun 2010 20:07:41 +0100
Michael Foord wrote:
>
> Although it would require changes for builtin types like file to work
> with a new string ABC, right?
There is no builtin file type in 3.x.
Besides, it is not an ABC-level problem; the IO layer is written in C
(although there's still t
On Thu, Jun 24, 2010 at 3:59 PM, Guido van Rossum wrote:
> The protocol specs typically go out of their way to specify what byte
> values they use for syntactically significant positions (e.g. ':' in
> headers, or '/' in URLs), while hand-waving about the meaning of "what
> goes in between" since
I see it a little differently (though there is probably a common
concept lurking in here).
The protocols you mention are intentionally designed to be
encoding-neutral as long as the encoding is an ASCII superset. This
covers ASCII itself, Latin-1, Latin-N for other values of N, MacRoman,
Microsoft
On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote:
> Here are a couple of ideas I'm taking away from the bytes/string
> discussion.
>
> First, it would probably be a good idea to have a String ABC.
>
> Secondly, maybe the string situation in 2.x wasn't as broken as we
> thought it was. In par
On Thu, Jun 24, 2010 at 12:07, Michael Foord wrote:
> On 24/06/2010 19:11, Brett Cannon wrote:
>>
>> On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
>> [SNIP]
>>
>>>
>>> The language moratorium kind of makes this all theoretical, but building
>>> a String ABC still would be a good start, and p
On 24/06/2010 19:11, Brett Cannon wrote:
On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
[SNIP]
The language moratorium kind of makes this all theoretical, but building
a String ABC still would be a good start, and presumably isn't forbidden
by the moratorium.
Because a new ABC wo
On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
[SNIP]
> The language moratorium kind of makes this all theoretical, but building
> a String ABC still would be a good start, and presumably isn't forbidden
> by the moratorium.
Because a new ABC would go into the stdlib (I assume in collections
Here are a couple of ideas I'm taking away from the bytes/string
discussion.
First, it would probably be a good idea to have a String ABC.
Secondly, maybe the string situation in 2.x wasn't as broken as we
thought it was. In particular, those who deal with lots of encoded
strings seemed to find