Bugs item #706595, was opened at 2003-03-19 20:02
Message generated for change (Comment added) made by facundobatista
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=706595&group_id=5470
Category: Python Library
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Todd Reed (toddreed)
Assigned to: M.-A. Lemburg (lemburg)
Summary: codecs.open and iterators

Initial Comment:
Greg Aumann originally posted this problem in comp.lang.python on Nov 4, 2002, but I could not find a bug report. I've simply copied his news post, which explains the problem:

-----------
Recently I figured out how to use iterators and generators. Quite easy to use and a great improvement.

But when I refactored some of my code I came across a discrepancy that seems like it must be a bug. If you use the old file reading idiom with a codec the lines are converted to unicode but if you use the new iterators idiom then they retain the original encoding and the line is returned in non unicode strings. Surely using the new "for line in file:" idiom should give the same result as the old, "while 1: ...."

I came across this when using the pythonzh Chinese codecs but the below code uses the cp1252 encoding to illustrate the problem because everyone should have those codecs. The symptoms are the same with both codecs. I am using python 2.2.2 on win2k.

Is this definitely a bug, or is it an undocumented 'feature' of the codecs module?

Greg Aumann

The following code illustrates the problem:

------------------------------------------------------------------------
"""Check readline iterator using a codec."""

import codecs

fname = 'tmp.txt'
f = file(fname, 'w')
for i in range(0x82, 0x8c):
    f.write( '%x, %s\n' % (i, chr(i)))
f.close()

def test_iter():
    print '\ntesting codec iterator.'
    f = codecs.open(fname, 'r', 'cp1252')
    for line in f:
        l = line.rstrip()
        print repr(l)
        print repr(l.decode('cp1252'))
    f.close()

def test_readline():
    print '\ntesting codec readline.'
    f = codecs.open(fname, 'r', 'cp1252')
    while 1:
        line = f.readline()
        if not line:
            break
        l = line.rstrip()
        print repr(l)
        try:
            print repr(l.decode('cp1252'))
        except AttributeError, msg:
            print 'AttributeError', msg
    f.close()

test_iter()
test_readline()
------------------------------------------------------------------------
This code gives the following output:
------------------------------------------------------------------------
testing codec iterator.
'82, \x82'
u'82, \u201a'
'83, \x83'
u'83, \u0192'
'84, \x84'
u'84, \u201e'
'85, \x85'
u'85, \u2026'
'86, \x86'
u'86, \u2020'
'87, \x87'
u'87, \u2021'
'88, \x88'
u'88, \u02c6'
'89, \x89'
u'89, \u2030'
'8a, \x8a'
u'8a, \u0160'
'8b, \x8b'
u'8b, \u2039'

testing codec readline.
u'82, \u201a'
AttributeError 'unicode' object has no attribute 'decode'
u'83, \u0192'
AttributeError 'unicode' object has no attribute 'decode'
u'84, \u201e'
AttributeError 'unicode' object has no attribute 'decode'
u'85, \u2026'
AttributeError 'unicode' object has no attribute 'decode'
u'86, \u2020'
AttributeError 'unicode' object has no attribute 'decode'
u'87, \u2021'
AttributeError 'unicode' object has no attribute 'decode'
u'88, \u02c6'
AttributeError 'unicode' object has no attribute 'decode'
u'89, \u2030'
AttributeError 'unicode' object has no attribute 'decode'
u'8a, \u0160'
AttributeError 'unicode' object has no attribute 'decode'
u'8b, \u2039'
AttributeError 'unicode' object has no attribute 'decode'
------------------------------------------------------------------------

----------------------------------------------------------------------

>Comment By: Facundo Batista (facundobatista)
Date: 2005-01-15 14:38

Message:
Logged In: YES
user_id=752496

Can not test it so far, all I got is:

testing codec iterator.
u'82, \u201a'
Traceback (most recent call last):
  ...
  File "C:\Python24\lib\encodings\cp1252.py", line 22, in decode
    return codecs.charmap_decode(input,errors,decoding_map)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201a' in position 4: ordinal not in range(128)

I'm on Win2k, sp2, with Py2.4

----------------------------------------------------------------------

Comment By: Facundo Batista (facundobatista)
Date: 2005-01-15 14:38

Message:
Logged In: YES
user_id=752496

Please, could you verify if this problem persists in Python 2.3.4 or 2.4? If yes, in which version? Can you provide a test case?

If the problem is solved, from which version?

Note that if you fail to answer in one month, I'll close this bug as "Won't fix".

Thank you!

.    Facundo

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-03-20 06:35

Message:
Logged In: YES
user_id=38388

That's a bug in the iterator support which was added to the codecs module: the .next() methods should not call the .next() methods on the reader directly, but instead redirect to the .readline() method.

----------------------------------------------------------------------
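
The redirect Lemburg describes (iteration going through .readline() instead of the underlying stream's own iterator) might look roughly like the sketch below. This is only an illustration under stated assumptions: DecodingReader is a hypothetical stand-in, not the actual codecs module code, and it is written for Python 3 (where the iterator method is __next__ rather than .next()) so that it can be run today. It decodes one raw line at a time, which is only safe for single-byte encodings such as cp1252.

```python
import io


class DecodingReader:
    """Hypothetical stand-in for a codec stream reader (not the real codecs.StreamReader)."""

    def __init__(self, stream, encoding):
        self.stream = stream        # underlying byte stream
        self.encoding = encoding

    def readline(self):
        # Read one raw line from the byte stream and decode it to text.
        return self.stream.readline().decode(self.encoding)

    def __iter__(self):
        return self

    def __next__(self):
        # Redirect to readline() rather than to the byte stream's own
        # iterator, so that iteration returns decoded text as well.
        line = self.readline()
        if not line:
            raise StopIteration
        return line


# Two of the sample lines from the report, as raw cp1252 bytes.
raw = io.BytesIO(b"82, \x82\n83, \x83\n")
lines = list(DecodingReader(raw, "cp1252"))
```

With this delegation, "for line in reader:" and repeated reader.readline() calls return the same decoded strings, which is the behaviour the original report expected.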