On Sun, Sep 7, 2008 at 2:23 PM, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>> All in all, given the advantage (half the number of allocations) of
>> the proposal I think there would have to be *very* good arguments
>> against before we reject this outright. I'd like to understand
>> Marc-Andre's reasons too.
>
> As Stefan notes, because of the frequency with which strings are
> manipulated in C code via PyString_* / PyUnicode_* calls, it is a data
> type where "accept no substitutes" prevails.
>
> MAL's primary concern appears to be that having Unicode as a plain
> PyObject leaves the type more open to subclass-based optimisations that
> have been rejected for the builtin types themselves.
Hm. I don't have any particularly insightful imagination as to what
those optimizations might be. Have any been implemented (in 3rd party
code) in the 8 years that the Unicode object has existed?

> Having
> PyString/PyBytes as PyVarObjects means that subclasses are more limited
> in what they can do.

True.

> One possibility that occurs to me is to use a PyVarObject variant that
> allocates space for an additional void pointer before the variable sized
> section of the object. The builtin type would leave that pointer NULL,
> but subtypes could perform the second allocation needed to populate it.
>
> The question is whether the 4-8 bytes wasted per object would be worth
> the fact that only one memory allocation would be needed.

I believe that 4-8 bytes is more than the overhead of an extra memory
allocation from the obmalloc heap. It is probably about the same as the
overhead for a memory allocation from the regular malloc heap. So for
short strings (of which there are often a lot) it would be more
expensive; for longer objects it would probably work out just about the
same.

There could be a different approach though, whereby the offset from the
start of the object to the start of the character array wasn't a
constant but a value stored in the class object. (In fact, tp_basicsize
could probably be used for this.) It would slow down access to the
characters a bit though -- a classic time-space trade-off that would
require careful measurement in order to decide which is better.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
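The alternative Guido describes, where the offset from the start of the object to the character array is read from the type object (possibly reusing tp_basicsize) instead of being a compile-time constant, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CPython code: `demo_type`, `demo_str_head`, `demo_sub_head`, `demo_new` and `demo_chars` are hypothetical names, and `char` stands in for the real character type. Each object still needs only one allocation; a subtype simply gets a larger header, and every character access pays for the extra loads through the type pointer. Nick's variant would instead put a `void *` slot into every object (4 bytes on 32-bit builds, 8 on 64-bit), which is where the "4-8 bytes" figure comes from.

```c
/* Hypothetical sketch only: these are NOT CPython's real object layouts. */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a type object. basicsize plays the role tp_basicsize
   might play: the byte offset at which the character data starts. */
typedef struct {
    const char *name;
    size_t basicsize;
} demo_type;

/* Header shared by the base string type and all of its subtypes. */
typedef struct {
    const demo_type *type;
    size_t length;
} demo_str_head;

/* A subtype extends the header with its own (hypothetical) field;
   the characters then simply start further into the object. */
typedef struct {
    demo_str_head head;
    double extra;
} demo_sub_head;

static const demo_type base_type = {"str", sizeof(demo_str_head)};
static const demo_type sub_type = {"substr", sizeof(demo_sub_head)};

/* One allocation per object: a header whose size depends on the type,
   immediately followed by the NUL-terminated characters. */
static demo_str_head *demo_new(const demo_type *tp, const char *s) {
    size_t n = strlen(s);
    demo_str_head *obj = malloc(tp->basicsize + n + 1);
    if (obj == NULL)
        return NULL;
    obj->type = tp;
    obj->length = n;
    memcpy((char *)obj + tp->basicsize, s, n + 1);
    return obj;
}

/* The time cost of the trade-off: instead of a constant offset, access
   loads the type pointer and then its basicsize before reaching the data. */
static const char *demo_chars(const demo_str_head *obj) {
    return (const char *)obj + obj->type->basicsize;
}
```

With a fixed layout, `demo_chars` would compile down to adding a constant; measuring that difference against the saved allocation is exactly the careful benchmarking the message calls for.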