New submission from STINNER Victor:

The Python parser (Parser/tokenizer.c) has a translate_into_utf8() function
that decodes a string from the input encoding and re-encodes it as UTF-8.

This function is unnecessary if the input string is already encoded in UTF-8,
which is common nowadays: Linux, Mac OS X and many other operating systems
now use UTF-8 as the default locale encoding, and UTF-8 is the default
encoding for Python scripts. The compile(), eval() and exec() functions also
pass UTF-8 encoded strings to the parser.
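
To illustrate why the transcoding step is redundant for UTF-8 input, here is a
rough Python sketch. It is not the tokenizer code (which is C), and the helper
name transcode_to_utf8 is made up for this example; it only mimics the
decode-then-encode round trip described above.

```python
# Illustrative only: the real work happens in C in Parser/tokenizer.c.
# transcode_to_utf8 is a hypothetical helper, not a CPython function.
def transcode_to_utf8(data: bytes, encoding: str) -> bytes:
    """Decode from the input encoding, then re-encode as UTF-8."""
    return data.decode(encoding).encode("utf-8")


utf8_source = "name = 'café'\n".encode("utf-8")
latin1_source = "name = 'café'\n".encode("latin-1")

# For input that is already valid UTF-8, the round trip yields the same bytes,
# so the work is wasted.
assert transcode_to_utf8(utf8_source, "utf-8") == utf8_source

# For a different input encoding, the bytes really do change.
assert latin1_source != utf8_source
assert transcode_to_utf8(latin1_source, "latin-1") == utf8_source
```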

The attached patch adds an input_is_utf8 flag to the tokenizer so that
translate_into_utf8() is skipped if the input string is already encoded in
UTF-8.
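
The idea behind the flag can be sketched in Python as follows. This is only a
model of the control flow, not the patch itself: TokenizerState and
prepare_input are hypothetical names, and only input_is_utf8 comes from the
report.

```python
from dataclasses import dataclass


@dataclass
class TokenizerState:
    # Hypothetical stand-in for the C tokenizer state.
    encoding: str = "utf-8"
    input_is_utf8: bool = False  # flag name taken from the report


def prepare_input(tok: TokenizerState, data: bytes) -> bytes:
    """Return UTF-8 bytes for the parser, skipping the transcode when possible."""
    if tok.input_is_utf8:
        # The caller guarantees UTF-8, so the bytes are passed through untouched.
        return data
    # Otherwise decode from the declared encoding and re-encode as UTF-8.
    return data.decode(tok.encoding).encode("utf-8")
```

With the flag set, the input is returned as-is without any decoding; without
it, the usual decode and re-encode step still runs.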

----------
files: input_is_utf8.patch
keywords: patch
messages: 202331
nosy: benjamin.peterson, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file32526/input_is_utf8.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19519>
_______________________________________