New submission from Florent Xicluna <florent.xicl...@gmail.com>:

The function tokenize.detect_encoding() detects the encoding either from the coding cookie or from the BOM. If no encoding is found, it falls back to 'utf-8'.
When the result is 'utf-8', there is no (easy) way to know whether the encoding was really detected in the file or whether it is the fallback default. Cases (with utf-8):
- UTF-8 BOM found: returns ('utf-8-sig', [])
- cookie on 1st line: returns ('utf-8', [line1])
- cookie on 2nd line: returns ('utf-8', [line1, line2])
- no cookie found: returns ('utf-8', [line1, line2])

The proposal is to allow calling the function with a different default value (None or ''), so the caller can tell whether the encoding was really detected. For example, this function could be used by the Tools/scripts/findnocoding.py script. Patch attached.

----------
components: Library (Lib)
files: detect_encoding_default.diff
keywords: patch
messages: 115567
nosy: flox
priority: normal
severity: normal
stage: patch review
status: open
title: add an optional "default" argument to tokenize.detect_encoding
type: feature request
versions: Python 3.2
Added file: http://bugs.python.org/file18745/detect_encoding_default.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9771>
_______________________________________