Josiah Carlson wrote:
>> That places a burden on all creators of strings to ensure
>> that they are in the minimal format, which could be
>> inconvenient for some operations, e.g. taking a substring
>> could require making an extra pass to re-code the data.
>
> If Martin says it's not a big deal, I'm not really all that concerned.
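
To make the quoted concern concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it supposes three widths (1, 2 or 4 bytes per character) picked from the largest code point in the string, and the names minimal_width and substring are hypothetical, not part of any actual proposal or of the interpreter.

```python
def minimal_width(s: str) -> int:
    """Return the assumed bytes-per-character (1, 2 or 4) needed to store s.

    Hypothetical helper: the width is chosen from the largest code point.
    """
    top = max(map(ord, s), default=0)  # one full pass over the data
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


def substring(s: str, start: int, stop: int):
    """Slice s and report the width the slice would need on its own.

    The slice cannot simply inherit the parent's width: dropping the only
    wide character means the result should be re-coded to a narrower form,
    which is the extra pass the quoted text worries about.
    """
    piece = s[start:stop]
    return piece, minimal_width(piece)


text = "abc\u20ac"                      # the euro sign needs more than 8 bits
parent_width = minimal_width(text)      # 2 under the assumptions above
piece, piece_width = substring(text, 0, 3)  # "abc" fits in 1 byte per character
```

Whether that extra pass matters in practice is exactly the kind of question that only measurement can settle.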

I was thinking about codecs specifically: they often need to make
multiple passes anyway. In general, only measurements can tell the
performance impact of a design decision (e.g. it's non-obvious how
often the various string operations occur, and what each one costs).

There is also an issue of convenience here; however, with three
different representations, library functions could be provided to
support all cases.

> It is ultimately about space savings, and in the case of names (since
> all will be 8-bit), perhaps even a bit faster to look up in the
> interning table (I believe it is easier to hash 8 chars than 8 shorts).

That you need to demonstrate through profiling. First, strings will
likely continue to keep their hash. Second, it seems plausible that
the cost of hashing is in the computation and the loop, not in the
memory access, and that the computation is carried out in 32-bit
registers regardless of character width.

Regards,
Martin