Walter Dörwald wrote: > Martin v. Löwis wrote: > >> Walter Dörwald wrote: >> >>> I think a maxsplit argument (just as for unicode.split()) would help >>> too. >> >> >> Correct - that would allow to get rid of the quadratic part. > > > OK, such a patch should be rather simple. I'll give it a try.
Actually, on a second thought - it would not remove the quadratic aspect. You would still copy the rest string completely on each split. So on the first split, you copy N lines (one result line, and N-1 lines into the rest string), on the second split, N-2 lines, and so on, totalling N*N/2 line copies again. The only thing you save is the join (as the rest is already joined), and the IsLineBreak calls (which are necessary only for the first line). Please see python.org/sf/1268314; it solves the problem by keeping the splitlines result. It only invokes IsLineBreak once per character, and also copies each character only once, and allocates each line only once, totalling in O(N) for these operations. It still does contain a quadratic operation: the lines are stored in a list, and the result list is removed from the list with del lines[0]. This copies N-1 pointers, result in N*N/2 pointer copies. That should still be much faster than the current code. > unicodelinebreaks = u"".join(unichr(c) for c in xrange(0, > sys.maxunicode) if unichr(c).islinebreak()) That is very inefficient. I would rather add a static list to the string module, and have a test that says assert str.unicodelinebreaks == u"".join(ch for ch in (unichr(c) for c in xrange(0, sys.maxunicode)) if unicodedata.bidirectional(ch)=='B' or unicodedata.category(ch)=='Zl') unicodelinebreaks could then be defined as # u"\r\n\x1c\x1d\x1e\x85\u2028\u2029 '\n\r\x1c\x1d\x1e\xc2\x85\xe2\x80\xa8\xe2\x80\xa9'.decode("utf-8") > OK, this would mean we'd have to distinguish between a direct call to > read() and one done by readline() (which we do anyway through the > firstline argument). See my patch. If we have cached lines, we don't need to call .read at all. Regards, Martin _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com