Thanks for the excellent feedback and very informative. I guess I just did not consider the memory side of things and did not think about just how much extra addition was having to occur. Additionally I also found this supporting link: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation
On 06/02/2012 04:29 PM, Steven D'Aprano wrote: > Jordan wrote: > >> #Another version might look like this: >> >> def join_strings2(string_list): >> final_string = '' >> for string in string_list: >> final_string += string >> print(final_string) >> return final_string > > Please don't do that. This risks becoming slow. REALLY slow. Painfully > slow. Like, 10 minutes versus 3 seconds slow. Seriously. > > The reason for this is quite technical, and the reason why you might > not notice is even more complicated, but the short version is this: > > Never build up a long string by repeated concatenation of short strings. > Always build up a list of substrings first, then use the join method to > assemble them into one long string. > > > Why repeated concatenation is slow: > > Suppose you want to concatenation two strings, "hello" and "world", to > make a new string "helloworld". What happens? > > Firstly, Python has to count the number of characters needed, which > fortunately is fast in Python, so we can ignore it. In this case, we > need 5+5 = 10 characters. > > Secondly, Python sets aside enough memory for those 10 characters, > plus a little bit of overhead: ---------- > > Then it copies the characters from "hello" into the new area: hello----- > > followed by the characters of "world": helloworld > > and now it is done. Simple, right? Concatenating two strings is pretty > fast. You can't get much faster. > > Ah, but what happens if you do it *repeatedly*? Suppose we have SIX > strings we want to concatenate, in a loop: > > words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham'] > result = '' > for word in words: > result = result + word > > How much work does Python have to do? > > Step one: add '' + 'hello', giving result 'hello' > Python needs to copy 0+5 = 5 characters. > > Step two: add 'hello' + 'world', giving result 'helloworld' > Python needs to copy 5+5 = 10 characters, as shown above. > > Step three: add 'helloworld' + 'foo', giving 'helloworldfoo' > Python needs to copy 10+3 = 13 characters. > > Step four: add 'helloworldfoo' + 'bar', giving 'helloworldfoobar' > Python needs to copy 13+3 = 16 characters. > > Step five: add 'helloworldfoobar' + 'spam', giving 'helloworldfoobarspam' > Python needs to copy 16+4 = 20 characters. > > Step six: add 'helloworldfoobarspam' + 'ham', giving > 'helloworldfoobarspamham' > Python needs to copy 20+3 = 23 characters. > > So in total, Python has to copy 5+10+13+16+20+23 = 87 characters, just > to build up a 23 character string. And as the number of loops > increases, the amount of extra work needed just keeps expanding. Even > though a single string concatenation is fast, repeated concatenation > is painfully SLOW. > > In comparison, ''.join(words) one copies each substring once: it > counts out that it needs 23 characters, allocates space for 23 > characters, then copies each substring into the right place instead of > making a whole lot of temporary strings and redundant copying. > > So, join() is much faster than repeated concatenation. But you may > never have noticed. Why not? > > Well, for starters, for small enough pieces of data, everything is > fast. The difference between copying 87 characters (the slow way) and > 23 characters (the fast way) is trivial. > > But more importantly, some years ago (Python 2.4, about 8 years ago?) > the Python developers found a really neat trick that they can do to > optimize string concatenation so it doesn't need to repeatedly copy > characters over and over and over again. I won't go into details, but > the thing is, this trick works well enough that repeated concatenation > is about as fast as the join method MOST of the time. > > Except when it fails. Because it is a trick, it doesn't always work. > And when it does fail, your repeated string concatenation code will > suddenly drop from running in 0.1 milliseconds to a full second or > two; or worse, from 20 seconds to over an hour. (Potentially; the > actual slow-down depends on the speed of your computer, your operating > system, how much memory you have, etc.) > > Because this is a cunning trick, it doesn't always work, and when it > doesn't work, and you have slow code and no hint as to why. > > What can cause it to fail? > > - Old versions of Python, before 2.4, will be slow. > > - Other implementations of Python, such as Jython and IronPython, will > not have the trick, and so will be slow. > > - The trick is highly-dependent on internal details of the memory > management of Python and the way it interacts with the operating > system. So what's fast under Linux may be slow under Windows, or the > other way around. > > - The trick is highly-dependent on specific circumstances to do with > the substrings being added. Without going into details, if those > circumstances are violated, you will have slow code. > > - The trick only works when you are adding strings to the end of the > new string, not if you are building it up from the beginning. > > > So even though your function works, you can't rely on it being fast. > > > > _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor