Patches item #1569040, was opened at 2006-10-02 04:04
Message generated for change (Tracker Item Submitted) made by Item Submitter

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1569040&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: Speed up using + for string concatenation

Initial Comment:
The core concept: adding two strings together no longer returns a pure "string" object. Instead, it returns a "string concatenation" object, which holds references to the two strings but does not actually concatenate them... yet. The strings are concatenated only when someone requests the string's value. At that point the object allocates all the space it needs and renders the concatenated string all at once.

More to the point, if you add multiple strings together (a + b + c), it *doesn't* compute the intermediate strings (a + b).

Upsides to this approach:

* String concatenation using + is now the fastest way to concatenate strings (that I know of).
* In particular, prepending is *way* faster than it used to be. It used to be a pathological case: each prepend copied the entire string, so building a string by repeated prepending was quadratic. Now it's linear.
* Throw off the shackles of "".join([]); you don't need it anymore.
* Did I mention it was faster?

Downsides to this approach:

* Changes how PyStringObjects are stored internally; ob_sval is no longer a char[1], but a char *. This makes each StringObject four bytes larger.
* Adds another memory dereference in order to get the value of a string, which is a teensy-weensy slowdown.
* Would force a recompile of all C modules that deal directly with string objects (which I imagine is most of them).
* Also, *requires* that C modules use the PyString_AS_STRING() macro, rather than casting the object and grabbing ob_sval directly. (I was pleased to see that the Python source was very good about using this macro; if all Python C modules are this well-behaved, this point is happily moot.)
* On a related note, the file Mac/Modules/MacOS.c implies that there are Mac-specific Python scripts that peer directly into string objects. These would have to be changed to understand the new semantics.
* String concatenation objects are 36 bytes larger than string objects, and this space will often go unreclaimed after the string is rendered.
* When rendered, string concatenation objects storing long strings will allocate a second buffer from the heap to store the string. This adds some minor allocation overhead, though the overall speed gain of the approach offsets it.
* Will definitely need heavy review before it could go in; in particular, I worry that I got the semantics surrounding "interned" strings wrong.

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches