New submission from Thomas Barnet-Lamb <tbarnetl...@gmail.com>:

It appears that StreamReader's readlines method behaves in a strange manner if the StreamReader has, in a previous read operation, decoded more characters than the user asked for; this happens when both the chars and size parameters are used, but only in some circumstances.

See the following:

Python 2.7.2 (default, Jun 26 2011, 02:56:25)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>>
>>> ## First make a file
... with codecs.open('temp.tmp','wb', encoding='utf8') as f:
...     f.write(u'This\u00ab is a test line\nThis is another test line\n')
...
>>>
>>> ## Now open it for reading
... UTF8Reader = codecs.getreader('utf-8')
>>> with UTF8Reader(codecs.open('temp.tmp','rb')) as f:
...     print(repr(f.read(size=5, chars=5)))
...     print(f.readlines())
...
u'This\xab'
[u' is ']

# The expected output is
# u'This\xab'
# [u' is a test line\n', u'This is another test line\n']

I believe the culprit is codecs.py, line 466-467 (the two starred lines below). I think they ought to be replaced with 'pass'.

            if chars < 0:
                if size < 0:
*                   if self.charbuffer:
*                       break
                elif len(self.charbuffer) >= size:
                    break

Best wishes,
Thomas

PS - I will apologize in advance for any oversights or mistakes in the formatting etc. of this bug report---this is my first time!

----------
components: Unicode
messages: 139457
nosy: Thomas.Barnet-Lamb
priority: normal
severity: normal
status: open
title: StreamReader Readlines
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12446>
_______________________________________
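
To make the effect of the two starred lines easier to see in isolation, here is a minimal sketch of a buffering reader. It is not the code from codecs.py: the class ToyReader, its early_break flag, and the helper demo are hypothetical names introduced only for this illustration, and the read loop is a simplified model assumed to follow the shape of the excerpt quoted in the report. With early_break=True the loop contains the starred check; with early_break=False that check is effectively replaced by 'pass', as the report proposes.

```python
import codecs
import io

# Same text as in the report, encoded to UTF-8 bytes.
DATA = u"This\u00ab is a test line\nThis is another test line\n".encode("utf-8")


class ToyReader(object):
    """Hypothetical, stripped-down model of a buffering stream reader."""

    def __init__(self, stream, early_break):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder("utf-8")()
        self.charbuffer = u""
        # early_break=True mimics the two starred lines from the report.
        self.early_break = early_break

    def read(self, size=-1, chars=-1):
        while True:
            # Can the request be satisfied from the character buffer?
            if chars < 0:
                if size < 0:
                    if self.early_break and self.charbuffer:
                        break  # hands back leftovers without reading the stream
                elif len(self.charbuffer) >= size:
                    break
            elif len(self.charbuffer) >= chars:
                break
            # Otherwise pull more bytes and decode them incrementally.
            newdata = self.stream.read() if size < 0 else self.stream.read(size)
            self.charbuffer += self.decoder.decode(newdata)
            if not newdata:
                break  # end of stream
        if chars < 0:
            result, self.charbuffer = self.charbuffer, u""
        else:
            result = self.charbuffer[:chars]
            self.charbuffer = self.charbuffer[chars:]
        return result

    def readlines(self):
        # Read "everything" and split it, keeping the line endings.
        return self.read().splitlines(True)


def demo(early_break):
    """Run read(size=5, chars=5) followed by readlines() on the sample data."""
    reader = ToyReader(io.BytesIO(DATA), early_break)
    first = reader.read(size=5, chars=5)
    return first, reader.readlines()
```

In this model, demo(True) returns (u'This\xab', [u' is ']), matching the output in the report: the first read needs two 5-byte chunks to obtain 5 characters, which leaves u' is ' in the buffer, and the starred check then returns that leftover as if it were the rest of the stream. demo(False) returns (u'This\xab', [u' is a test line\n', u'This is another test line\n']), the output the report expects.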