Patches item #1629305, was opened at 2007-01-06 10:37
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

Please note that this message contains a full copy of the comment
thread for this request, including the initial issue submission, not
just the latest update.
Category: Core (C code)
Group: Python 3000
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: The Unicode "lazy strings" patches

Initial Comment:
These are patches to add lazy processing to Unicode strings for Python 3000.  I 
plan to post separate patches for both "lazy concatenation" and "lazy slices", 
as I suspect "lazy concatenation" has a much higher chance of being accepted.

There is a long discussion about "lazy concatenation" here:
http://mail.python.org/pipermail/python-dev/2006-October/069224.html
And another long discussion about "lazy slices" here:
http://mail.python.org/pipermail/python-dev/2006-October/069506.html

Note that, unlike the 8-bit-character string patches, I don't expect the 
"lazy slices" patch to depend on the "lazy concatenation" patch.  Unicode 
objects are stored differently, and already use a pointer to a 
separately-allocated buffer.  This was the big (and mildly controversial) 
change made by the 8-bit-character "lazy concatenation" patch, and "lazy 
slices" needed it too.  Since Unicode objects already look like that, the 
Unicode lazy patches should be independent.
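
To make the idea concrete, here is a rough pure-Python sketch of what
"lazy concatenation" means.  The real patch does the equivalent inside the
C Unicode object; the class and method names below are purely illustrative
and are not taken from the patch:

    class LazyConcat:
        """Illustrative stand-in for a string with deferred concatenation."""

        def __init__(self, left, right):
            self.left = left       # each operand: a str or another LazyConcat
            self.right = right
            self._value = None     # flat string, filled in on first use

        def __add__(self, other):
            # '+' is O(1): build a new node, copy no characters yet.
            return LazyConcat(self, other)

        def _render(self):
            # Flatten the whole concatenation tree exactly once.
            if self._value is None:
                parts, stack = [], [self]
                while stack:
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.append(node.right)   # left gets popped first
                        stack.append(node.left)
                    else:
                        parts.append(node)
                self._value = "".join(parts)
            return self._value

        def __str__(self):
            return self._render()

        def __len__(self):
            return len(self._render())

    s = LazyConcat("lazy ", "strings") + " for " + "Python 3000"
    print(str(s), len(s))          # flattening happens here, exactly once

A "lazy slice" is roughly analogous: rather than copying characters, the
slice object keeps a reference to the original object plus an offset and
a length, which is why it benefits from the buffer already being
separately allocated.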

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-08 11:59

Message:
Logged In: YES 
user_id=38388
Originator: NO

While I don't think the added implementation complexity is worth it,
given that there are other ways of achieving the same kind of
performance, e.g. building up a list of Unicode strings and joining it
once (see the sketch after these comments), here are some comments:

 * you add a long field to every Unicode object - so every single Unicode
object in the system pays 4-8 extra bytes for the small performance
advantage

 * Unicode objects are often accessed via PyUnicode_AS_UNICODE(); this
operation doesn't allow passing back errors, yet your lazy evaluation
approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them)

 * the lazy approach keeps all partial Unicode objects alive until they
finally get concatenated; if you have lots of those (e.g. if you use x +=
y in a loop), then you pay the complete Python object overhead for every
single partial Unicode object in the list of strings - given that most
such operations use short strings, you are likely creating a memory
overhead far greater than the total length of all the strings
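
For comparison, the list-of-strings idiom mentioned above looks like this
(a minimal sketch, unrelated to the patch):

    parts = []
    for i in range(1000):
        parts.append(str(i))     # appending to the list is O(1) amortized
    result = "".join(parts)      # one pass; each character is copied once

The total cost is linear in the length of the result, and it needs no
extra field on the Unicode object itself.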



----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-07 06:08

Message:
Logged In: YES 
user_id=341410
Originator: NO

What are the performance characteristics of each operation?  I presume
that a + b for unicode strings a and b is O(1) time (if I understand your
implementation correctly).  But according to my reading, (a + b + c +
...)[i] is O(number of concatenations performed).  Is this correct?
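
For what it's worth, here is a rough sketch (mine, not taken from the
patch) of why indexing a chain of never-joined pieces would cost
O(number of pieces), assuming the pieces are kept in a flat list:

    def lazy_getitem(pieces, i):
        # Walk the pieces until the one containing index i is found.
        for piece in pieces:
            if i < len(piece):
                return piece[i]
            i -= len(piece)
        raise IndexError("string index out of range")

    pieces = ["spam", "eggs", "ham"]   # conceptually a + b + c, never joined
    print(lazy_getitem(pieces, 6))     # 'g': had to skip past "spam" first

An implementation could of course flatten the string on the first
__getitem__ and make later lookups O(1), at the cost of one
O(total length) rendering step; I assume that is the trade-off in
question.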

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches
