On 29 May 2013 16:38, boB Stepp <robertvst...@gmail.com> wrote: > On Tue, May 28, 2013 at 9:34 PM, Steven D'Aprano <st...@pearwood.info> wrote: >> >> However, a word of warning: although you *can* assemble a new string >> character by character like that, you should not, because it risks being >> very slow. *Painfully* slow. If you want to hear the details, please ask, >> but it risks being slow for much the same reason as the infamous Shlemiel >> the Painter Algorithm: >> >> http://www.joelonsoftware.com/articles/fog0000000319.html > > Okay, I will come out of lurking and ask: What are the details?
Here is the code that Steven was referring to: ham = '' for char in 'spam': ham = ham + char print(ham) What this code does is to create an empty string ham. Then for each character in 'spam' is creates a new string by appending the character to ham. However Python cannot append strings it can only create new strings since strings are immutable*. On the first iteration of the loop the zero length string is combined with the character and a string of length 1 is created. On the second a new string of length 2 is created. On the third a new string of length 3 is created. On the fourth a new string of length 4 is created. So four strings are create with lengths 0, 1, 2, 3, 4. If we did this with N characters then N strings would be created with lengths 0, 1, 2, 3, ... , N. If cost of creating each string is proportional to its length then the cost of the operation as a whole is proportional to 0 + 1 + 2 + 3 + ... + N. The general formula to compute this sum is (N**2 + N)/2. In other words the whole operation is quadratic O(N**2) in the number of characters that are combined. The Shlemiel the painter story describes a similar operation in which the time taken to paint each section increases linearly as more sections get painted. The result is that the time taken for Shlemiel to paint a stretch of road is proportional to the square of its length when it should just be proportional to its length. I don't know exactly how str.join is implemented but it does not use this quadratic algorithm. For example if str.join would first compute the length of the resulting string first then it can allocate memory for exactly one string of that length and copy each substring to the appropriate place (actually I imagine it uses an exponentially resizing buffer but this isn't important). * CPython actually has an optimisation that can append strings in precisely this situation. However it is an implementation detail of CPython that may change and it does not work in other interpreters e.g. Jython. Using this kind of code can damage portability since your program may run fine in CPython but fail in other interpreters. Oscar _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor