New submission from STINNER Victor:

The Python parser (Parser/tokenizer.c) has a translate_into_utf8() function that decodes a string from the input encoding and re-encodes it to UTF-8.
This conversion is unnecessary when the input string is already encoded in UTF-8, which is common nowadays: Linux, Mac OS X and many other operating systems use UTF-8 as the default locale encoding, and UTF-8 is the default encoding for Python scripts. The compile(), eval() and exec() functions also pass UTF-8 encoded strings to the parser.

The attached patch adds an input_is_utf8 flag to the tokenizer so that translate_into_utf8() is skipped when the input string is already encoded in UTF-8.

----------
files: input_is_utf8.patch
keywords: patch
messages: 202331
nosy: benjamin.peterson, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file32526/input_is_utf8.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19519>
_______________________________________
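
For readers without the patch at hand, the idea of skipping the transcoding step can be sketched in Python. This is only an illustration, not the code from input_is_utf8.patch: the real change is in C inside Parser/tokenizer.c, and the helper name prepare_source() and its signature are hypothetical. Only translate_into_utf8 and input_is_utf8 are names taken from the message.

```python
def translate_into_utf8(data: bytes, encoding: str) -> bytes:
    """Decode bytes from the input encoding, then re-encode them as UTF-8."""
    return data.decode(encoding).encode("utf-8")


def prepare_source(data: bytes, encoding: str, input_is_utf8: bool = False) -> bytes:
    """Hypothetical helper: return source bytes in UTF-8 for the parser.

    When the caller sets input_is_utf8, the bytes are trusted to be UTF-8
    already and are returned unchanged, so no decode/encode round trip runs.
    """
    if input_is_utf8:
        return data
    return translate_into_utf8(data, encoding)


# A Latin-1 source still goes through the conversion.
latin1_source = "caf\u00e9".encode("latin-1")
converted = prepare_source(latin1_source, "latin-1")

# A UTF-8 source with the flag set is passed through untouched.
utf8_source = "caf\u00e9".encode("utf-8")
passed_through = prepare_source(utf8_source, "utf-8", input_is_utf8=True)
```

In this sketch the fast path also skips any validation of the bytes; the message does not say whether the actual patch validates the input, so that point is left open here.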