On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum <gu...@python.org> wrote:
> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was,
> > in every respect, unicode, and responded to all protocols exactly as
> > other unicode objects (or "str objects", if you prefer py3
> > nomenclature ;-)) do, but wouldn't actually copy any of that memory
> > unless it really needed to (for example, to pass to a C API that
> > expected native wide characters), and that would hold on to the
> > original bytes so that it could produce them on demand if encoded to
> > the same encoding again. So, as others in this thread have
> > mentioned, the 'ABC' really implies some stuff about C APIs as well.
> >
> > I'm not sure about the exact performance impact of such a class,
> > which is why I'd like the ability to implement it *outside* of the
> > stdlib and see how it works on a project, and return with a proposal
> > along with some data. There are also different ways to implement
> > this, and other optimizations (like ropes) which might be better.
> >
> > You can almost do this today, but the lack of things like the
> > hypothetical "__rcontains__" does make it impossible to be totally
> > transparent about it.
>
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.
>

It wasn't my profiling, but I seem to recall that Fredrik Lundh
specifically benchmarked ElementTree with all-unicode and
sometimes-ascii-bytes, and found that using Python 2 strs in some cases
provided notable advantages. I know Stefan copied ElementTree in this
regard in lxml, maybe he also did a benchmark or knows of one?

--
Ian Bicking | http://blog.ianbicking.org
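
For readers following the quoted exchange, here is a rough sketch of the
kind of wrapper Glyph describes, written for Python 3. It is only an
illustration under stated assumptions, not anything proposed in the
thread: the class name LazyText and all of its methods are hypothetical.
It validates up front (Guido's point), keeps only the original bytes
until text is actually needed, and returns those bytes unchanged when
asked to encode to the same encoding.

```python
import codecs


class LazyText:
    """Hypothetical wrapper: holds the original bytes, decodes on demand."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = bytes(raw)
        # Normalise aliases such as "utf8" / "UTF-8" to one canonical name.
        self._encoding = codecs.lookup(encoding).name
        self._text = None
        # Validate immediately; the decoded result is discarded, so only
        # the bytes stay in memory. Raises UnicodeDecodeError if invalid.
        self._raw.decode(self._encoding)

    def _decoded(self):
        # Decode once, on first use, and cache the result.
        if self._text is None:
            self._text = self._raw.decode(self._encoding)
        return self._text

    def encode(self, encoding="utf-8"):
        # Same encoding as the source: hand back the original bytes.
        if codecs.lookup(encoding).name == self._encoding:
            return self._raw
        return self._decoded().encode(encoding)

    def __str__(self):
        return self._decoded()

    def __len__(self):
        return len(self._decoded())

    def __contains__(self, item):
        return str(item) in self._decoded()

    def __eq__(self, other):
        if isinstance(other, (LazyText, str)):
            return self._decoded() == str(other)
        return NotImplemented

    def __hash__(self):
        return hash(self._decoded())
```

This also shows the transparency gap mentioned in the quoted text: a
plain str can be tested for membership in a LazyText, but the reverse
test (a LazyText on the left of "in" with a str on the right) raises
TypeError, because str decides that check and there is no reflected
hook for the wrapper to intercept it.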
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com