Recently I discovered an issue with our Jenkins test environment. It was 
failing in hooksTests.py because my Gerrit name contains diacritics and our code 
tried to decode it as ASCII.

Traceback (most recent call last):
  File "/usr/lib64/python2.6/unittest.py", line 278, in run
  File "/ephemeral0/vdsm_unit_tests_gerrit_el/tests/hooksTests.py", line 125, in test_deviceCustomProperties
    params={'customProperty': ' rocks!'})
  File "/ephemeral0/vdsm_unit_tests_gerrit_el/vdsm/hooks.py", line 70, in 
    scriptenv[k] = unicode(v).encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)

The relevant code is here:


60              scriptenv = os.environ.copy()
69              for k, v in scriptenv.iteritems():
70                  scriptenv[k] = unicode(v).encode('utf-8')
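
To illustrate why line 70 fails: in Python 2, unicode(v) on a byte string
decodes it with the default codec, which is normally ASCII. The sketch below
reproduces that with an explicit decode; raw_value is a made-up sample value,
not the actual content of the Jenkins environment.

```python
# Hypothetical environment value: UTF-8 bytes for a name with a diacritic.
raw_value = b'Siv\xc3\xa1k'

# Decoding as ASCII fails on the first byte >= 0x80, which is what
# unicode(v) does implicitly when the default codec is ASCII.
try:
    raw_value.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the encoding the bytes were written in round-trips cleanly.
roundtrip = raw_value.decode('utf-8').encode('utf-8')
```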

My first instinct was to decode it using the proper encoding:

source_encoding = sys.stdin.encoding or locale.getpreferredencoding()
for k, v in scriptenv.iteritems():
    scriptenv[k] = v.decode(source_encoding).encode('utf-8')

But it still did not work. So I tried to print out the environment and 
encodings that are used when make check is being run and got this:

sys.stdin.encoding == None
locale.getpreferredencoding() -> ANSI_X3.4-1968
os.environ['LC_ALL'] == 'C'
os.environ['LANG'] == 'en_US.UTF-8'
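
For anyone who wants to check their own environment, values like the ones
above can be collected with something along these lines. This is only a
sketch; collect_encoding_info is a name made up for this example and is not
part of vdsm.

```python
import locale
import os
import sys


def collect_encoding_info():
    """Return the encoding-related settings the current process sees."""
    return {
        # None when stdin is not a terminal (or is missing entirely)
        'stdin_encoding': getattr(sys.stdin, 'encoding', None),
        # Derived from the locale settings; LC_ALL overrides LANG
        'preferred_encoding': locale.getpreferredencoding(),
        'LC_ALL': os.environ.get('LC_ALL'),
        'LANG': os.environ.get('LANG'),
    }


for name, value in sorted(collect_encoding_info().items()):
    print('%s = %r' % (name, value))
```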

Please note the encodings: my system and terminal are using UTF-8, but vdsm 
reads the environment values using ANSI_X3.4-1968, which is plain ASCII. That is 
obviously wrong and can't work.

So I investigated further and found out that we force LC_ALL to C in 
vdsmd.init, run_tests.sh.in and run_tests_local.sh.in.

I also found the commit that introduced this - 
107644dbad9af250c00e7f25fc51a92c6250d442 - and finally understood where the 
issue was.

Although I understand the reasons for the patch, I do not agree with it. If we 
are executing other tools and parsing their output, we should be preparing and 
passing the updated locale _only_ to those tools. We should not be setting the 
locale we need for parsing for the whole vdsm daemon.
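
As a rough sketch of what I mean, the C locale can be applied to a single
child process through the env argument of subprocess.Popen, leaving the
environment of the daemon itself untouched. run_with_c_locale is a
hypothetical helper name, not something that exists in vdsm today.

```python
import os
import subprocess
import sys


def run_with_c_locale(argv):
    """Run argv with LC_ALL=C set only in the child's environment."""
    child_env = os.environ.copy()
    child_env['LC_ALL'] = 'C'  # affects the child only, not this process
    proc = subprocess.Popen(argv, env=child_env,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err


# Usage: the child sees LC_ALL=C regardless of what the parent has set.
rc, out, err = run_with_c_locale(
    [sys.executable, '-c', 'import os; print(os.environ["LC_ALL"])'])
```

The tool's output is then predictable for parsing, while the daemon keeps the
system locale for everything else.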

Our current practice of setting LC_ALL to C, regardless of the terminal or 
system vdsmd is started on, is causing the issue described above, because 
the environment can (and does) contain data in the system encoding. This 
essentially prevents anybody with UTF-8 characters in their name from 
submitting anything to vdsm.

So I would like to start a discussion about this that will lead to the 
necessary fixes and change in our current practice :)

Martin Sivák
Red Hat Czech

vdsm-devel mailing list
