Replacing sys.stdout and sys.stderr can cause obscure crashes when
trying to write non unicode data. The interpreter is terminated with
SIGINT without any specific error writen on the console.
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
{0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0
This happens easily when there is an untrapped exception which should
lead to printing a traceback on stderr.
The only way to prevent UnicodeEncodeErrors is to make sure that
PYTHONIOENCODING is set with the ':replace' suffix and this can only be
done *before* starting the interpreter as the initialization is made
very early on and the encoding cannot be set or modified after.
From the official documentation:
PYTHONIOENCODING
Overrides the encoding used for stdin/stdout/stderr, in the syntax
encodingname:errorhandler. The :errorhandler part is optional and
has the same meaning as in str.encode().
https://docs.python.org/2/using/cmdline.html
https://docs.python.org/3/using/cmdline.html
Of course, for proper encoding of unicode characters, one of the
locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.) must
be set. Python will use the correct encoding accordingly and
PYTHONIOENCODING will be set to "encoding:replace".
Examples:
$ grep utf8 ~/.Xresources
xterm*utf8: 2
$ env - PYTHONIOENCODING=utf-8:replace python2 -c "print u's\u00e9duisante'"
séduisante
$ env - PYTHONIOENCODING=utf-8:replace python3 -c "print('s\u00e9duisante')"
séduisante
$ env - PYTHONIOENCODING=ascii:replace python2 -c "print u's\u00e9duisante'"
s?duisante
$ env - PYTHONIOENCODING=ascii:replace python3 -c "print('s\u00e9duisante')"
s?duisante
$ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print
u's\u00e9duisante'"
s�duisante
$ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c
"print('s\u00e9duisante')"
s�duisante
Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
Signed-off-by: Robin Jarry <[email protected]>
---
v2:
- Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to prevent from
UnicodeEncodeErrors
patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
index ed0351bf5288..5a7b6723afe3 100755
--- a/patchwork/bin/pwclient
+++ b/patchwork/bin/pwclient
@@ -41,16 +41,7 @@ except ImportError:
import shutil
import re
import io
-import locale
-if sys.version_info.major == 2:
- # hack to make writing unicode to standard output/error work on Python 2
- OUT_ENCODING = (sys.stdout.encoding or locale.getpreferredencoding() or
- os.getenv('PYTHONIOENCODING', 'utf-8'))
- sys.stdout = io.open(sys.stdout.fileno(), mode='w',
- encoding=OUT_ENCODING, errors='replace')
- sys.stderr = io.open(sys.stderr.fileno(), mode='w',
- encoding=OUT_ENCODING, errors='replace')
# Default Patchwork remote XML-RPC server URL
# This script will check the PW_XMLRPC_URL environment variable
@@ -821,5 +812,31 @@ def main():
sys.exit(1)
+def force_io_encoding():
+ """
+ Force PYTHONIOENCODING ":errorhandler" to avoid UnicodeEncodeErrors. The
+ only way to do it is to set the environment variable *before* starting the
+ interpreter. From the python docs:
+
+ PYTHONIOENCODING
+
+ Overrides the encoding used for stdin/stdout/stderr, in the syntax
+ encodingname:errorhandler. The :errorhandler part is optional and has the
+ same meaning as in str.encode().
+
+ Note that this only prevents interpreter crashes, it does not exempt from
+ correctly setting the LANG or LC_ALL variables in order to have valid
+ output.
+ """
+ if 'PYTHONIOENCODING' in os.environ:
+ return
+
+ encoding = sys.stdout.encoding or 'utf-8'
+
+ os.environ['PYTHONIOENCODING'] = encoding + ':replace'
+ os.execvp(sys.executable, [sys.executable] + sys.argv) # no return
+
+
if __name__ == "__main__":
+ force_io_encoding()
main()
--
2.11.0.193.g1d1bdafd6426
_______________________________________________
Patchwork mailing list
[email protected]
https://lists.ozlabs.org/listinfo/patchwork