On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum <gu...@python.org> wrote:

> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was, in
> every
> > respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects", if you prefer py3 nomenclature ;-))
> do,
> > but wouldn't actually copy any of that memory unless it really needed to
> > (for example, to pass to a C API that expected native wide characters),
> and
> > that would hold on to the original bytes so that it could produce them on
> > demand if encoded to the same encoding again. So, as others in this
> thread
> > have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> > I'm not sure about the exact performance impact of such a class, which is
> > why I'd like the ability to implement it *outside* of the stdlib and see
> how
> > it works on a project, and return with a proposal along with some data.
> >  There are also different ways to implement this, and other optimizations
> > (like ropes) which might be better.
> > You can almost do this today, but the lack of things like the
> hypothetical
> > "__rcontains__" does make it impossible to be totally transparent about
> it.
>
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.
>

It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically
benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and
found that using Python 2 strs in some cases provided notable advantages.  I
know Stefan copied ElementTree in this regard in lxml, maybe he also did a
benchmark or knows of one?

-- 
Ian Bicking  |  http://blog.ianbicking.org
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to