Walter Dörwald wrote:
> At least it would remove the quadratic number of calls to
> _PyUnicodeUCS2_IsLinebreak(). For each character it would be called only
> once.
Correct. However, I very much doubt that this is the cause of the slowdown.

> The last part of the patch seems to be more related to bug #1235646.

You mean the last chunk (linebuffer=None)? That just extends reset() so that it also clears the line buffer.

> With the patch test_pep263 and test_codecs fail (and test_parser, but
> this might be unrelated):

Oops, I thought I had run the test suite, but apparently I ran it with the patch removed. New version uploaded.

> Using collections.deque() should get rid of this problem.

Alright. There are so many types in Python I've never heard of :-)

> You mean, in the test suite?

Right.

> BTW, why the decode() call? For a Python without unicode?

Right. I'm not sure whether people think this should still be supported, but I keep supporting it whenever I think of it.

> I wonder what happens, if calls to read() and readline() are mixed (e.g.
> if I'm reading Fortran source or anything with a fixed line header):
> read() would be used to read the first n characters (which joins the line
> buffer) and readline() reads the rest (which would split it again) etc.
> (Of course this could be done via a single readline() call).

Then performance would drop again; it should still be correct, though. If this becomes a frequent problem, we could satisfy read requests from the split lines as well (i.e. join only as many lines as are needed). However, I would rather expect that callers of read() typically want the entire file, or want to read in large chunks (with no line orientation at all).

> But, I think a maxsplit argument for splitlines() would make sense
> independent of this problem.

I'm not so sure anymore. It would be good for consistency, but I doubt there are actual use cases: how often do you want only the first n lines of some string? Reading the first n lines of a file might be an application, but then you would rather use .readline() directly.
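The buffering scheme discussed in this thread (split the decoded text once, hand out complete lines from a collections.deque, and join the buffer back together when read() is called) can be sketched as follows. This is a minimal illustration in modern Python 3, not the actual patch: the class name LineBufferedReader is hypothetical, and an iterable of already-decoded text chunks stands in for a real codecs StreamReader and its byte stream.

```python
import collections
from typing import Iterable, Iterator


class LineBufferedReader:
    """Hypothetical sketch: serve readline() from lines that were split once.

    ``chunks`` stands in for the decoded text a codecs StreamReader would
    produce from its byte stream; it is an assumption for illustration.
    """

    def __init__(self, chunks: Iterable[str]) -> None:
        self._chunks: Iterator[str] = iter(chunks)
        self._lines = collections.deque()  # complete lines, line breaks kept
        self._tail = ""  # text after the last line break seen so far

    def _fill(self) -> bool:
        """Split one more chunk into lines; return False at end of input."""
        try:
            chunk = next(self._chunks)
        except StopIteration:
            return False
        parts = (self._tail + chunk).splitlines(True)  # keepends=True
        # Every part except the last ends in a line break. The last part is
        # held back: it may be unfinished, or it may end in "\r" whose "\n"
        # only arrives with the next chunk.
        self._tail = parts.pop() if parts else ""
        self._lines.extend(parts)
        return True

    def readline(self) -> str:
        """Return the next line (with its line break), or "" at end of input."""
        while not self._lines:
            if not self._fill():
                # End of input: the tail is the last, possibly unterminated, line.
                line, self._tail = self._tail, ""
                return line
        return self._lines.popleft()  # O(1), unlike list.pop(0)

    def read(self) -> str:
        """Return all remaining text, joining the line buffer back together."""
        data = "".join(self._lines) + self._tail + "".join(self._chunks)
        self._lines.clear()
        self._tail = ""
        return data


if __name__ == "__main__":
    reader = LineBufferedReader(["first\nsec", "ond\r", "\nthird"])
    print(repr(reader.readline()))  # 'first\n'
    print(repr(reader.readline()))  # 'second\r\n'
    print(repr(reader.readline()))  # 'third'

    # Mixed use: read() joins whatever readline() had already split.
    mixed = LineBufferedReader(["ab\ncd\n", "ef"])
    print(repr(mixed.readline()))  # 'ab\n'
    print(repr(mixed.read()))      # 'cd\nef'
```

Holding back the last part keeps a "\r\n" pair that straddles two chunks together; the price is that a single very long line spanning many chunks is rescanned as it grows. As noted above, mixing read() and readline() stays correct but redoes some splitting work.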
For readline, I don't think there is a clear case for splitting off only the first line (unless you want to return an index instead of the rest of the string): if the application eventually wants all of the data, we had better split it right away into individual strings, instead of dealing with a gradually shrinking trailer that gets copied again for every line.

Anyway, I don't think we should go back to C's readline/fgets. That is just too messy with respect to buffering and text vs. binary mode. I wish Python would stop using stdio entirely.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev