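I think the issue is:
 a.) there is no default locale set in the system that subiquity installed.
 b.) python3 subprocess is doing an 'encode' (str to bytes) for each argument in the command list.
python2's default encoding *is* supposed to be based on the environment [1], but python3's default encoding is not: python3's default is supposed to be utf-8 [2]. However, subprocess does not use that default. On POSIX it converts each str argument with the filesystem encoding (sys.getfilesystemencoding()), which Python 3.6 derives from the locale. In the trace above we are down in C code where it is clearly doing 'ascii' encoding.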
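The encoding step described above can be reproduced without running a command at all. This is a minimal sketch for illustration, assuming Python 3.6 on POSIX. `encode_arg` is a hypothetical helper, not part of subprocess. It encodes one argument with an explicitly chosen codec, standing in for the locale-derived filesystem encoding that subprocess applies to each str argument.

```python
import sys


# Hypothetical helper (not part of subprocess): encode one argument the way
# subprocess does on POSIX, but with an explicitly chosen codec so the effect
# of the locale can be shown without changing the environment.
def encode_arg(arg: str, encoding: str) -> bytes:
    # 'surrogateescape' is the error handler used for the filesystem encoding
    # on POSIX; it only rescues lone surrogates, not characters like '\xe9'.
    return arg.encode(encoding, "surrogateescape")


name = "Andr\u00e9 DSilva"

# The codec this interpreter would use for subprocess arguments
# (derived from the locale in Python 3.6).
print("filesystem encoding:", sys.getfilesystemencoding())

# With a utf-8 locale the argument converts cleanly.
print(encode_arg(name, "utf-8"))  # b'Andr\xc3\xa9 DSilva'

# With no LANG or LANG=C, Python 3.6 uses 'ascii' and the conversion fails.
try:
    encode_arg(name, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)
```

The failure message printed in the last step has the same form as the UnicodeEncodeError in the traceback.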
[1] https://docs.python.org/2/library/sys.html?highlight=getdefaultencoding#sys.getdefaultencoding
[2] https://docs.python.org/3/library/stdtypes.html?highlight=decode#str.encode

You can see the problem generally below. I only use 'json' as a convenient way to pass in utf-8 characters. You can see that either an unset LANG or LANG=C causes the issue. I guess I never thought that subprocess would be converting an argument list of strings to bytes, though that does make some sense.

So I think there are actually two changes:
 a.) subiquity (via either curtin or cloud-init) should be setting a utf-8 default locale (Ubuntu installs generally do that). I'm not sure why the image being installed didn't have one set.
 b.) cloud-init's subp should probably do the conversion to bytes itself for whatever argument list it gets for the command, and always encode strings as utf-8.

$ cat go.py
#!/usr/bin/python3
import json, subprocess, sys
cmd = json.loads(sys.argv[1])
print("cmd=%s" % [x.encode("utf-8") for x in cmd])
subprocess.check_call(cmd)

# my default lang is en_US.utf-8
$ ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
André DSilva

$ LANG=en_US.utf-8 ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
André DSilva

$ env -u LANG ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
Traceback (most recent call last):
  File "./go.py", line 5, in <module>
    subprocess.check_call(cmd)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

$ LANG=C ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
Traceback (most recent call last):
  File "./go.py", line 5, in <module>
    subprocess.check_call(cmd)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

** Changed in: cloud-init
       Status: New => Confirmed

** Changed in: cloud-init
   Importance: Undecided => Medium

** Also affects: cloud-init (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: cloud-init (Ubuntu)
       Status: New => Confirmed

** Also affects: subiquity
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1751051

Title:
  UnicodeEncodeError when creating user with non-ascii chars

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1751051/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
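Change (b) proposed in the comment could look roughly like this. It is a minimal sketch assuming a POSIX system, where subprocess accepts bytes arguments. `to_bytes_args` and `subp_utf8` are hypothetical helpers, not cloud-init's actual `subp` API. The idea is that every str argument is encoded as UTF-8 before the list reaches subprocess, so the locale-derived filesystem encoding never has to convert it.

```python
import subprocess
from typing import List, Sequence, Union

Arg = Union[str, bytes]


# Hypothetical helper illustrating change (b); not cloud-init's real subp().
def to_bytes_args(args: Sequence[Arg]) -> List[bytes]:
    """Encode every str argument as UTF-8, leaving bytes arguments untouched."""
    return [a if isinstance(a, bytes) else a.encode("utf-8") for a in args]


def subp_utf8(args: Sequence[Arg]) -> bytes:
    """Run a command with UTF-8-encoded arguments and return its stdout.

    Because every argument is already bytes, subprocess does not need the
    locale-derived filesystem encoding to convert it, so an unset LANG or
    LANG=C no longer raises UnicodeEncodeError for the arguments.
    """
    return subprocess.check_output(to_bytes_args(args))


if __name__ == "__main__":
    print(subp_utf8(["echo", "Andr\u00e9 DSilva"]))
```

This only covers the argument list. Setting a utf-8 default locale, change (a), is still needed for anything else in the installed system that relies on the locale.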
