Patches item #1629305, was opened at 2007-01-06 10:37
Message generated for change (Comment added) made by lemburg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Core (C code)
Group: Python 3000
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: The Unicode "lazy strings" patches

Initial Comment:
These are patches to add lazy processing to Unicode strings for Python
3000. I plan to post separate patches for "lazy concatenation" and
"lazy slices", as I suspect "lazy concatenation" has a much higher
chance of being accepted.

There is a long discussion about "lazy concatenation" here:
http://mail.python.org/pipermail/python-dev/2006-October/069224.html

And another long discussion about "lazy slices" here:
http://mail.python.org/pipermail/python-dev/2006-October/069506.html

Note that, unlike the 8-bit-character string patches, I don't expect
the "lazy slices" patch to depend on the "lazy concatenation" patch.
Unicode objects are stored differently: they already use a pointer to
a separately-allocated buffer. That was the big (and mildly
controversial) change made by the 8-bit-character "lazy concatenation"
patch, and "lazy slices" needed it too. Since Unicode objects already
look like that, the two Unicode lazy patches should be independent.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-08 11:59

Message:
Logged In: YES
user_id=38388
Originator: NO

While I don't think the added complexity in the implementation is
worth it, given that there are other ways of achieving the same kind
of performance (e.g. a list of Unicode strings), some comments:

* You add a long field to every Unicode object, so every single
string object in the system pays 4-8 bytes for this small performance
advantage.

* Unicode objects are often referenced using PyUnicode_AS_UNICODE();
this operation has no way to pass back errors, yet your lazy
evaluation approach can raise memory errors at that point. How are
you going to deal with them? (Currently you don't even test for
them.)

* The lazy approach keeps all partial Unicode objects alive until
they finally get concatenated. If you have lots of them (e.g. if you
use x += y in a loop), then you pay the complete Python object
overhead for every single partial Unicode object in the list of
strings. Given that most such operations involve short strings, you
are likely creating a memory overhead far greater than the total
length of all the strings.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-07 06:08

Message:
Logged In: YES
user_id=341410
Originator: NO

What are the performance characteristics of each operation? I presume
that a + b for Unicode strings a and b is O(1) time (if I understand
your implementation correctly). But according to my reading,
(a + b + c + ...)[i] is O(number of concatenations performed). Is
this correct?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches
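
For readers following the thread, the "lazy concatenation" scheme in
the initial comment can be modeled in a few lines of Python. This is a
minimal sketch only; the names (LazyCat, render) are made up here, and
the actual patch implements the idea at the C level inside the Unicode
object itself.

    # A minimal Python model of "lazy concatenation": concatenation
    # records its operands in O(1); the flat buffer is built on first
    # access.  Illustrative only, not the patch's actual C code.

    class LazyCat:
        """Concatenation node: records operands, renders on demand."""

        __slots__ = ("left", "right", "value")

        def __init__(self, left, right):
            self.left, self.right = left, right
            self.value = None              # flat buffer, built lazily

        def __add__(self, other):
            return LazyCat(self, other)    # O(1): no characters copied

        def render(self):
            # First access pays the full O(total length) cost.  This is
            # also where an out-of-memory error would surface, which is
            # the crux of Lemburg's PyUnicode_AS_UNICODE() objection:
            # that macro has no way to report failure.
            if self.value is None:
                pieces, stack = [], [self]
                while stack:
                    node = stack.pop()
                    if isinstance(node, LazyCat) and node.value is None:
                        stack.append(node.right)  # popped second
                        stack.append(node.left)   # popped first
                    elif isinstance(node, LazyCat):
                        pieces.append(node.value)
                    else:
                        pieces.append(node)       # plain str leaf
                self.value = "".join(pieces)
                self.left = self.right = None     # release the operands
            return self.value

        def __str__(self):
            return self.render()

        def __getitem__(self, i):
            return self.render()[i]

    s = LazyCat("spam", "eggs") + "ham"   # both concatenations are O(1)
    print(s[4], str(s))                   # prints "e spameggsham"

Note that, as in Lemburg's third point, the node keeps both operands
alive (and their per-object overhead) until render() runs.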
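
The alternative Lemburg points to, a list of Unicode strings,
presumably means the standard build-with-join idiom. A small sketch
with made-up data:

    # Each append is amortized O(1) and the single join is
    # O(total length), so the loop is linear overall, versus quadratic
    # for repeated x += y on ordinary (non-lazy) strings.
    parts = []
    for piece in ("alpha", "bravo", "charlie"):   # stand-in data
        parts.append(piece)
    result = "".join(parts)
    assert result == "alphabravocharlie"

This gets comparable asymptotic behavior today without adding a field
to every Unicode object.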
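
Josiah's indexing question in miniature: if an implementation keeps
the n operands unrendered and scans them to resolve an index, each
lookup is O(n), matching his reading; if it instead renders the flat
buffer on first access (as in the LazyCat sketch above), the first
lookup costs O(total length) and later ones O(1). Which of these the
patch actually does is exactly what he is asking. A toy version of the
linear scan, with hypothetical names:

    # Linear-scan indexing over unrendered pieces:
    # O(number of pieces) per lookup in the worst case.
    def char_at(pieces, i):
        for piece in pieces:
            if i < len(piece):
                return piece[i]
            i -= len(piece)
        raise IndexError("string index out of range")

    assert char_at(["ab", "cd", "ef"], 3) == "d"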