"Guido van Rossum" <[EMAIL PROTECTED]> wrote: > OK, point taken, for this particular set of parameters (building a 16 > MB string from 1K identical blocks). > > But how much slower will the list.append version be if the blocks are > 10 bytes instead of 1024? That could make a huge difference. (In fact, > I timed something similar to what you posted, and the doubling > approach is actually faster when the buffer is 256 bytes or less. > > My conclusion: we need to agree on a real benchmark before giving up.
In my programming efforts, I've found two cases that use
''.join(strings) quite often:

1. socket reads
2. content generation

In the socket reading case, I tend to use s.read(4096) or so, though I
have seen calls in the 512-65536 range. It generally depends on how
much data the particular application expects to read at any one time.

In my experience, content generation tends to involve a bunch of
relatively small strings (maybe 10-100 bytes), which also tends to kill
the string += operation.

Regardless of what ends up being faster in a microbenchmark (which I
agree we should have, to compare approaches from a performance
perspective), from a memory allocator perspective I think the fewer
reallocs needed to come up with a single string-like representation of
the data, the better, as reallocs do tend to fragment address space (an
issue I've had to deal with recently).

> > Note that removing the string[:] copy in the list.append
> > version only reduces the running time by about .07 seconds.
>
> That's because a string slice that returns the whole string is
> optimized to an INCREF operation. So you were really copying the same
> buffer over and over, which adds to locality and makes a huge
> difference in memory performance.

Good point. Making the input 1025 bytes and performing block[:-1]
resulted in a running time of 13.94 seconds. Doing a similar thing for
the x += x case, making it x += x[:-1], pushed that one to 11.69
seconds. And finally, doing the same thing with the array version,
x.extend(x[:-1]), gets 11.68 seconds.

 - Josiah

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
