Martin v. Löwis wrote:
> Walter Dörwald wrote:
>
>>At least it would remove the quadratic number of calls to
>>_PyUnicodeUCS2_IsLinebreak(). For each character it would be called only
>>once.
>
> Correct. However, I very much doubt that this is the cause of the
> slowdown.

Probably. We'd need a test with the original Argon source to really know.

>>The last part of the patch seems to be more related to bug #1235646.
>
> You mean the last chunk (linebuffer=None)? This is just the extension
> to reset.

Ouch, you're right: that part of the "cvs diff" came from my checkout,
not your patch. I have so many Python checkouts that I sometimes forget
which is which! ;)

>>With the patch test_pep263 and test_codecs fail (and test_parser, but
>>this might be unrelated):
>
> Oops, I thought I ran the test suite, but apparently with the patch
> removed. New version uploaded.

Looks much better now.

>>Using collections.deque() should get rid of this problem.
>
> Alright. There are so many types in Python I've never heard of :-)

The problem is that unicode.splitlines() returns a list, so the push/pop
performance advantage of collections.deque might be eaten by having to
create a collections.deque object in the first place.

>>You mean, in the test suite?
>
> Right.
>
>>BTW, why the decode() call? For a Python without unicode?
>
> Right. Not sure what people think whether this should still be
> supported, but I keep supporting it whenever I think of it.

OK, so should we add this for 2.4.2 or only for 2.5?

Should this really be put into string.py, or should it be a class
attribute of unicode? (At least that's what was proposed for the other
strings in string.py (string.whitespace etc.) too.)

>>I wonder what happens, if calls to read() and readline() are mixed (e.g.
>>if I'm reading Fortran source or anything with a fixed line header):
>>read() would be used to read the first n characters (which joins the line
>>buffer) and readline() reads the rest (which would split it again) etc.
>>(Of course this could be done via a single readline() call).
>
> Then performance would drop again - it should still be correct, though.
>
> If this becomes a frequent problem, we could satisfy read requests
> from the split lines as well (i.e.
> join as many lines as you need).
> However, I would rather expect that callers of read() typically want
> the entire file, or want to read in large chunks (with no line
> orientation at all).

Agreed! Don't fix a bug that hasn't been reported! ;)

>>But, I think a maxsplit argument for splitlines() would make sense
>>independent of this problem.
>
> I'm not so sure anymore. It is good for consistency, but I doubt there
> are actual use cases: how often do you want only the first n lines
> of some string? Reading the first n lines of a file might be an
> application, but then, you would rather use .readline() directly.

Not every unicode string is read from a StreamReader.

> For readline, I don't think there is a clear case for splitting of
> only the first line (unless you want to return an index instead of
> the rest string): if the application eventually wants all of the
> data, we better split it right away into individual strings, instead
> of dealing with a gradually decreasing trailer.

True, this would be best for a readline loop.

Another solution would be to have a unicode.itersplitlines() and store
the iterator. Then we wouldn't need a maxsplit, because you can simply
stop iterating once you have what you want.

> Anyway, I don't think we should go back to C's readline/fgets. This
> is just too messy wrt. buffering and text vs. binary mode.

I don't know about C's readline, but StreamReader.read() and
StreamReader.readline() are messy enough. But at least it's something we
can fix ourselves.

> I wish Python would stop using stdio entirely.

So reverting to the 2.3 behaviour for simple codecs is out?

Bye,
   Walter Dörwald
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev