New submission from Florent Xicluna <florent.xicl...@gmail.com>:

The function tokenize.detect_encoding() detects the encoding either from the coding cookie or from the BOM. If no encoding is found, it falls back to 'utf-8'.
When the result is 'utf-8', there is no (easy) way to know whether the encoding was really detected in the file or whether it is the fallback default. Cases (with utf-8):
- UTF-8 BOM found: returns ('utf-8-sig', [])
- cookie on 1st line: returns ('utf-8', [line1])
- cookie on 2nd line: returns ('utf-8', [line1, line2])
- no cookie found: returns ('utf-8', [line1, line2])

The proposal is to allow calling the function with a different default value (None or ''), so the caller can tell whether the encoding was really detected. For example, this function could be used by the Tools/scripts/findnocoding.py script. Patch attached.

----------
components: Library (Lib)
files: detect_encoding_default.diff
keywords: patch
messages: 115567
nosy: flox
priority: normal
severity: normal
stage: patch review
status: open
title: add an optional "default" argument to tokenize.detect_encoding
type: feature request
versions: Python 3.2
Added file: http://bugs.python.org/file18745/detect_encoding_default.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9771>
_______________________________________