Re: [Python-3000] string C API

Josiah Carlson Fri, 15 Sep 2006 10:43:47 -0700

"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> Interning may get awkward if multiple encodings are allowed within a
> program, regardless of whether they're allowed for single strings.  It
> might make sense to intern only strings that are in the same encoding
> as the source code.  (Or whose values are limited to ASCII?)


Why?  If the text hash function is defined on *code points*, then
interning, or really any arbitrary dictionary lookup, works the same as it
always has.
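
A minimal Python sketch of this point, assuming a hypothetical `Text` class that can keep its data in any of several encodings. Because `__hash__` and `__eq__` are computed from the decoded code points, an ordinary dict used as an intern table finds the same entry whichever encoding a given instance happens to use. `Text` and `intern_text` are illustrative names only, not part of any proposed C API.

```python
class Text:
    """Hypothetical text object: storage encoding varies, identity is code points."""

    def __init__(self, data: bytes, encoding: str):
        self._data = data          # raw storage bytes
        self._encoding = encoding  # e.g. "latin-1", "utf-16-le", "utf-32-le"

    def code_points(self) -> tuple:
        # Decode the storage into a tuple of integer code points.
        return tuple(ord(ch) for ch in self._data.decode(self._encoding))

    def __eq__(self, other):
        if not isinstance(other, Text):
            return NotImplemented
        return self.code_points() == other.code_points()

    def __hash__(self):
        # Depends only on the code points, never on the storage encoding.
        return hash(self.code_points())


_interned = {}  # hypothetical intern table: just an ordinary dict


def intern_text(t: Text) -> Text:
    # Return the first-seen object with the same code points.
    return _interned.setdefault(t, t)


a = Text("caf\u00e9".encode("latin-1"), "latin-1")
b = Text("caf\u00e9".encode("utf-32-le"), "utf-32-le")
```

Here `a` and `b` hold different bytes, yet `intern_text(a) is intern_text(b)` is true: the lookup never looks at the storage encoding.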


> There should be only one reference to a string until it is constructed,
> and after that, its data should be immutable.  Recoding that results
> in different bytes should not be in-place.  Either it returns a new
> string (no problem) or it doesn't change the databuffer-and-encoding
> pointer until the new databuffer is fully constructed.
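
The construct-then-freeze rule in the quoted paragraph can be sketched in Python as follows. `FrozenText` is a hypothetical class: its buffer and encoding are set once in `__init__`, attribute assignment is refused afterwards, and `recode` builds the complete new buffer before wrapping it in a new object, so no other reference ever observes a half-recoded string.

```python
class FrozenText:
    """Hypothetical immutable text: buffer and encoding are fixed at construction."""

    __slots__ = ("_data", "_encoding")

    def __init__(self, data: bytes, encoding: str):
        # During construction this object is the only reference to the buffer.
        data.decode(encoding)  # reject bytes that are invalid in this encoding
        object.__setattr__(self, "_data", bytes(data))
        object.__setattr__(self, "_encoding", encoding)

    def __setattr__(self, name, value):
        raise AttributeError("FrozenText is immutable")

    def recode(self, encoding: str) -> "FrozenText":
        # Fully construct the new buffer first, then return a *new* object;
        # self keeps its original buffer and encoding.
        new_data = self._data.decode(self._encoding).encode(encoding)
        return FrozenText(new_data, encoding)
```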

What about never recoding?  The benefit of the latin-1/ucs-2/ucs-4
method I previously described is that each of the encodings offers a
minimal representation of the code points that the text object contains. 
Certain operations would require a bit of work to handle the comparison
of code points stored in an x-bit-wide representation with code points
stored in a y-bit-wide representation.
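
A rough Python sketch of the latin-1/ucs-2/ucs-4 idea, using the standard `array` module to stand in for the C-level buffers. `narrowest` picks the smallest element width that can hold the largest code point, and `compare` does a three-way comparison element by element, which works whichever widths the two operands use. Both function names are illustrative assumptions, not a proposed API.

```python
from array import array

# An unsigned typecode with at least 4 bytes per item
# ("I" on common platforms, otherwise "L").
_WIDE = "I" if array("I").itemsize >= 4 else "L"


def narrowest(text: str) -> array:
    """Store text at the smallest fixed width that fits every code point."""
    top = max(map(ord, text), default=0)
    if top <= 0xFF:
        typecode = "B"    # 1 byte per code point: the latin-1 range
    elif top <= 0xFFFF:
        typecode = "H"    # 2 bytes per code point: the UCS-2 range
    else:
        typecode = _WIDE  # 4+ bytes per code point: the UCS-4 range
    return array(typecode, map(ord, text))


def compare(x: array, y: array) -> int:
    """Three-way comparison of code point sequences stored at any widths."""
    for a, b in zip(x, y):
        if a != b:
            return -1 if a < b else 1
    # Common prefix is equal: the shorter sequence sorts first.
    return (len(x) > len(y)) - (len(x) < len(y))
```

One consequence of always storing the minimal width: two equal strings necessarily have the same maximum code point and therefore the same width, so a width mismatch alone proves inequality. Ordering comparisons and searches still need the cross-width loop.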


> So adding boilerplate to treat text as bytes "for efficiency" may
> become a standard recipe?  Not so good.

Presumably there is going to be a mechanism to open files as bytes
(reads return bytes), and for things like web servers, file servers, etc.,
serving the content up as just a bunch of bytes is really the only thing
that makes sense.
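
Python 3 as eventually released does provide this mechanism: opening a file with mode `"rb"` makes reads return `bytes`, with no decoding step. A minimal sketch of a file server's inner loop under that model, where `serve_file` is a hypothetical helper and `out` is any binary file-like object (for example a socket's `makefile("wb")`, or `io.BytesIO` in a test):

```python
import shutil
from typing import BinaryIO


def serve_file(path: str, out: BinaryIO) -> None:
    # Binary mode: reads return bytes, so content is copied to the client
    # exactly as stored on disk, with no decode/encode round trip.
    with open(path, "rb") as src:
        shutil.copyfileobj(src, out)
```

The bytes pass through untouched even when they are not valid text in any particular encoding, which is exactly the case where a text-mode read would fail or alter the data.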

 - Josiah

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000