New submission from STINNER Victor <[EMAIL PROTECTED]>: I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead of IDLE builtin charset detection, I tried to use tokenize.detect_encoding() but this function doesn't work with script using Mac new line (b"\r").
Code to detect the encoding of a Python script: ---- def pythonEncoding(filename): with open(filename, 'rb') as fp: encoding, lines = detect_encoding(fp.readline) return encoding ---- Example to reproduce the problem with Mac script: ---- fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r') encoding, lines = detect_encoding(fp.readline) print(encoding, lines) ---- => Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r'] The problem occurs at "line_string = line.decode('ascii')". Since "line" contains a non-ASCII character (b"\xe8"), the conversion fails. ---------- components: Library (Lib) messages: 76176 nosy: haypo severity: normal status: open title: tokenize.detect_encoding() and Mac newline versions: Python 3.0 _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue4377> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com