Re: [vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-25 Thread David Caro Estevez
The changes have been made in the jobs (vdsm_unit_tests, vdsm_unit_tests_gerrit,
vdsm_unit_tests_el).

Please let me know when the underlying problem is solved so I can remove those fixes.



___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-23 Thread Dan Kenigsberg
On Thu, Jun 20, 2013 at 12:39:22PM -0400, David Caro Estevez wrote:
> Yes, that should be easy, if you decide to do that, it can be done in
> 30min (smallest fraction of time for a task).

Please do that, as a quick mitigation of the real problem.
It *is* important that people can use their real name when contributing
to vdsm code.


Re: [vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-20 Thread David Caro Estevez

- Original Message -
> From: "Dan Kenigsberg" 
> To: "Martin Sivak" , dc...@redhat.com
> Cc: vdsm-devel@lists.fedorahosted.org
> Sent: Thursday, June 20, 2013 3:08:29 PM
> Subject: Re: environment encoding, LC_ALL and vdsm tests
> 
> No doubt that we have to fix it. The easiest hack is to ask our Jenkins
> job to clear the Jenkins env vars before calling `make check`. I'm sure
> David (CCed) can do it quite easily.

Yes, that should be easy. If you decide to do that, it can be done in 30 minutes
(the smallest unit of time for a task).



Re: [vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-20 Thread Dan Kenigsberg
On Thu, Jun 20, 2013 at 05:50:16AM -0400, Martin Sivak wrote:
> Although I understand the reasons for the patch, I do not agree with
> it. If we are executing other tools and parse their output, we should
> be preparing and passing the updated locale _only_ to those tools. We
> should not be setting the locale we need for parsing stuff to the
> whole vdsm daemon.

Since vdsm is not intended for direct human control, I actually like the
idea of turning off all locale noise by a global LC_ALL=C. The
alternative, of setting it to C before each application with parsed
output seems tedious and easily forgotten.

>
> Our current practice of setting LC_ALL to C no matter on what terminal
> or system we are starting vdsmd is causing us the above mentioned
> issue, because the environment can (and does) contain data in the
> system encoding. This essentially prevents anybody with utf-8 chars in
> their names to submit anything to vdsm.

No doubt that we have to fix it. The easiest hack is to ask our Jenkins
job to clear the Jenkins env vars before calling `make check`. I'm sure
David (CCed) can do it quite easily.
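
A minimal sketch of what such a scrubbing step could look like, assuming the
job can wrap `make check` in a small Python launcher. The helper name
scrubbed_env and the GERRIT_ prefix are illustrative assumptions, not the
actual Jenkins configuration; the sketch targets Python 3.

```python
def scrubbed_env(env, prefixes=("GERRIT_",)):
    """Return a copy of `env` without variables whose name starts with
    one of `prefixes`. The prefix list is an assumption; adjust as needed."""
    drop = tuple(prefixes)
    return {k: v for k, v in env.items() if not k.startswith(drop)}

# Intended use in the job (not executed here):
#   import os, subprocess
#   subprocess.check_call(["make", "check"], env=scrubbed_env(os.environ))
```

The original mapping is left untouched, so only the test run sees the reduced
environment.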

>
> So I would like to start a discussion about this that will lead to the
> necessary fixes and change in our current practice :)

Unfortunately, I have no idea beyond exterminating non-7-bit chars from
the environment and setting LC_ALL=C in n+1 places.

The first approach may not be so horrible as it seems: I'm not sure we
should pass every vdsm env variable to the hook scripts. Passing only
ascii ones may be good enough.

Obviously, unicode custom properties should continue to be explicitly
added, with utf-8 encoding, to the script environment, as this is a
documented vdsm API.
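
As a rough illustration of that first approach (not the actual hooks.py code),
the sketch below keeps only pure-ASCII variables from the inherited
environment and then adds the custom properties encoded as UTF-8. It assumes
Python 3 and a bytes-to-bytes environment mapping; build_hook_env is a
hypothetical name.

```python
def build_hook_env(base_env, custom_props):
    """Return a bytes -> bytes environment for a hook script.

    base_env: mapping of bytes to bytes (the raw process environment).
    custom_props: mapping of str to str (unicode custom properties).
    """
    env = {}
    for key, value in base_env.items():
        try:
            key.decode("ascii")
            value.decode("ascii")
        except UnicodeDecodeError:
            continue  # drop variables that are not pure 7-bit ASCII
        env[key] = value
    # Custom properties are always passed on, explicitly encoded as UTF-8.
    for key, value in custom_props.items():
        env[key.encode("utf-8")] = value.encode("utf-8")
    return env
```

With this shape, a variable carrying a name with diacritics is silently
dropped, while a unicode custom property still reaches the script.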

Dan.


Re: [vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-20 Thread Petr Sebek
Hi,

I had precisely this problem too. I had to change my name to use only ASCII characters.
It's not such a big deal, but I still have to mangle my name, so I can't bring
honour to it ;-).

I'm totally for changing this behaviour if it is relatively achievable.

Petr Šebek



[vdsm] environment encoding, LC_ALL and vdsm tests

2013-06-20 Thread Martin Sivak
Hi,

Recently I discovered an issue with our Jenkins test environment. It was
failing in hooksTests.py because my Gerrit name contains diacritics and our code
tried to decode it as ASCII.

Traceback (most recent call last):
  File "/usr/lib64/python2.6/unittest.py", line 278, in run
    testMethod()
  File "/ephemeral0/vdsm_unit_tests_gerrit_el/tests/hooksTests.py", line 125, in test_deviceCustomProperties
    params={'customProperty': ' rocks!'})
  File "/ephemeral0/vdsm_unit_tests_gerrit_el/vdsm/hooks.py", line 70, in _runHooksDir
    scriptenv[k] = unicode(v).encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)

The relevant code is here:

hooks.py:

60      scriptenv = os.environ.copy()
...
69      for k, v in scriptenv.iteritems():
70          scriptenv[k] = unicode(v).encode('utf-8')
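
For illustration, the failure at line 70 can be reproduced outside vdsm. This
is a minimal sketch rather than vdsm code: it assumes Python 3, where the
implicit ASCII decode that Python 2's unicode(v) performs on a byte string has
to be written out, and the sample name is only an example value.

```python
# Python 2's unicode(v) on a byte string implicitly decodes with the
# 'ascii' codec; here that decode is spelled out explicitly (Python 3).
value = "Martin Siv\u00e1k".encode("utf-8")  # example bytes, as found in an env var

try:
    value.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the UTF-8 lead byte 0xc3 is not valid ASCII

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = value.decode("utf-8")
```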

My first instinct was to decode it using the proper encoding:

source_encoding = sys.stdin.encoding or locale.getpreferredencoding()
for k, v in scriptenv.iteritems():
    scriptenv[k] = v.decode(source_encoding).encode('utf-8')

But it still did not work. So I tried to print out the environment and 
encodings that are used when make check is being run and got this:

sys.stdin.encoding == None
locale.getpreferredencoding() -> ANSI_X3.4-1968
os.environ['LC_ALL'] == 'C'
os.environ['LANG'] == 'en_US.UTF-8'

Please notice the encoding part: my system and terminal are using UTF-8, but
vdsm reads the environment values as ANSI_X3.4-1968, i.e. plain ASCII. That is
obviously wrong and can't work.
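
The values above are consistent with the usual POSIX precedence, in which
LC_ALL overrides LC_CTYPE, which in turn overrides LANG. A small sketch of
that lookup, assuming Python 3 (effective_ctype_locale is a hypothetical
helper, not part of vdsm or of the locale module):

```python
import codecs

def effective_ctype_locale(env):
    """Return the locale name that governs character encoding for `env`.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG; unset or empty
    variables are skipped. Falls back to "C" when none is set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"

# The environment seen during `make check`, as printed above.
observed = effective_ctype_locale({"LC_ALL": "C", "LANG": "en_US.UTF-8"})

# ANSI_X3.4-1968 is an alias for the ASCII codec.
codec_name = codecs.lookup("ANSI_X3.4-1968").name
```

So with LC_ALL=C in place, LANG=en_US.UTF-8 has no effect on the preferred
encoding.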

So I tried to investigate it further and found out we force LC_ALL to C in
vdsmd.init, run_tests.sh.in and run_tests_local.sh.in.

I also found the commit that introduced this - 
107644dbad9af250c00e7f25fc51a92c6250d442 - and finally understood where the 
issue was.

Although I understand the reasons for the patch, I do not agree with it. If we 
are executing other tools and parse their output, we should be preparing and 
passing the updated locale _only_ to those tools. We should not be setting the 
locale we need for parsing stuff to the whole vdsm daemon.
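
A minimal sketch of the per-tool approach, assuming Python 3 and the standard
subprocess module; run_with_c_locale is a hypothetical wrapper, not an
existing vdsm helper:

```python
import os
import subprocess
import sys

def run_with_c_locale(cmd):
    """Run `cmd` with LC_ALL=C and return its stdout as bytes.

    Only the child's environment is changed; os.environ is left as it is.
    """
    child_env = dict(os.environ, LC_ALL="C")
    return subprocess.check_output(cmd, env=child_env)

# Usage: the child sees LC_ALL=C regardless of the parent's locale.
out = run_with_c_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"])
```

The cost is that every call site that parses tool output has to go through
such a wrapper.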

Our current practice of setting LC_ALL to C, no matter on what terminal or
system we are starting vdsmd, is causing the above-mentioned issue, because
the environment can (and does) contain data in the system encoding. This
essentially prevents anybody with non-ASCII characters in their name from
submitting anything to vdsm.

So I would like to start a discussion about this that will lead to the 
necessary fixes and change in our current practice :)

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
