On 13 Nov 2005 22:57:50 -0800, "MackS" <[EMAIL PROTECTED]> wrote:
>Hello everyone
>
>I am faced with the following problem. For the first time I've asked
>myself "might this actually be easier to code in C rather than in
>python?", and I am not looking at device drivers. : )
>
>This program is meant to process relatively long strings (10-20 MB) by
>selectively modifying small chunks one at a time. Eg, it locates
>approx. 1000-2000 characters and modifies them. Currently I was doing
>this using a string object but it is getting very slow: although I only
>modify a tiny bit of the string at a time, a new entire string gets
>created whenever I "merge" it with the rest. Eg,
>
>shortstr = longstr[beg:end]
>
># edit shortstr...
>
>longstr = longstr[:beg] + shortstr + longstr[end:] # new huge string is
>created!!
>
The usual way is to accumulate the edited short pieces of the new version
of longstr in a list and then join them once, if you really need the new
longstr in a single piece for something. I.e., (untested sketch)

    chunks_of_new_longstr = []
    for chunk in chunker(longstr):
        # edit chunk (your shortstr)
        chunks_of_new_longstr.append(chunk)
        # or do several appends of pieces from the editing of a chunk
    longstr = ''.join(chunks_of_new_longstr)

But if you don't really need it except to write it to output and the next
thing would be

    open('longstr.txt','wb').write(longstr) # might want 'w' instead of 'wb' for plain text data

then don't bother joining into a new longstr but do

    open('longstr.txt','wb').writelines(chunks_of_new_longstr)

instead. But if you are going to do that, why not just

    fout = open('longstr.txt','wb')

before the loop, and

    fout.write(chunk)

in place of

    chunks_of_new_longstr.append(chunk)

Of course, if longstr is coming from a file, maybe you can have the chunker
operate on a file instead of a longstr in memory.

>Can I get over this performance problem without reimplementing the
>whole thing using a barebones list object?
>I thought I was being "smart"
>by avoiding editing the long list, but then it struck me that I am
>creating a second object of the same size when I put the modified
>shorter string in place...
>
I imagine you should be able to change a very few lines to switch between
ways of getting your input stream of editable chunks and accumulating
your output. OTOH, this is all guesswork without more context ;-)

Regards,
Bengt Richter
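
For what it's worth, here is a minimal runnable sketch of the accumulate-and-join idea. The chunker signature, the spans list of (beg, end) index pairs, and the rebuild helper are all assumptions made for illustration, since how the real program locates its chunks isn't known.

```python
def chunker(longstr, spans):
    """Yield (piece, is_editable) pairs covering longstr in order.

    spans is a hypothetical list of sorted, non-overlapping (beg, end)
    index pairs marking the regions to edit.
    """
    pos = 0
    for beg, end in spans:
        if pos < beg:
            yield longstr[pos:beg], False  # untouched text before the span
        yield longstr[beg:end], True       # the "shortstr" to edit
        pos = end
    if pos < len(longstr):
        yield longstr[pos:], False         # untouched tail


def rebuild(longstr, spans, edit):
    """Apply edit() to each marked span and join all pieces exactly once."""
    pieces = []
    for piece, is_editable in chunker(longstr, spans):
        pieces.append(edit(piece) if is_editable else piece)
    return ''.join(pieces)


if __name__ == '__main__':
    text = 'aaa bbb ccc ddd'
    print(rebuild(text, [(4, 7), (12, 15)], str.upper))
```

Each piece is sliced and appended once, so the work is roughly proportional to the length of longstr rather than to (number of edits) x (length of longstr). Swapping pieces.append(...) for a write to an open file gives the streaming variant.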