[Bug 58784] jsub and utf8

2014-08-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Marc A. Pelletier m...@uberbox.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #10 from Marc A. Pelletier m...@uberbox.org ---
Left without comment for six months; reopen if the issue is still relevant.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 58784] jsub and utf8

2014-03-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #9 from Marc A. Pelletier m...@uberbox.org ---
Is this still a relevant issue?



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |legoktm.wikipe...@gmail.com
     Ever confirmed|0                           |1



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #1 from Tim Landscheidt t...@tim-landscheidt.de ---
I can't reproduce either claim:

| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| import sys
| print(sys.stdout.encoding)
| Your job 1933102 (test) has been submitted
| UTF-8
| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| print("Talk:Gülen movement")
| Your job 1933103 (test) has been submitted
| Talk:Gülen movement
| scfc@tools-login:~$

Please provide a minimal example.

(Just to clear up some confusion: jsub doesn't actually execute the script; it
just submits it to the job grid aka SGE/OGS.)
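
For anyone putting together such a minimal example, a small diagnostic script can help. The sketch below is not from the thread, and the function name describe_io is made up for illustration. It reports the encoding Python picked for stdout next to the environment variables that influence it, so the output of an interactive run and of a grid job can be compared line by line.

```python
#!/usr/bin/python3
# Diagnostic sketch (hypothetical helper, not part of jsub or this thread).
import locale
import os
import sys


def describe_io():
    """Collect the settings that decide how Python encodes stdout."""
    info = {
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    for name in ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING"):
        # A variable that is not set is reported as None.
        info[name] = os.environ.get(name)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_io().items()):
        print("%s=%r" % (key, value))
```

Running it once directly and once through jsub, then diffing the two outputs, should show which of these settings differ on the exec hosts.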



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
Partially reproduced it.

Using the first script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933479 (test) has been submitted
ANSI_X3.4-1968

Second script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933488 (test) has been submitted
Talk:Gülen movement
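
A note on the first result: ANSI_X3.4-1968 is the formal name of plain ASCII, which is what the C library reports when no UTF-8 locale is in effect. Python's codec registry knows the name as an alias, which a short check (not from the thread) can confirm:

```python
import codecs

# "ANSI_X3.4-1968" is the charset name reported for the POSIX/C locale.
codec = codecs.lookup("ANSI_X3.4-1968")

# The canonical codec name shows which encoder Python would actually use.
print(codec.name)
```

The lookup resolves to the ascii codec, which is why the tracebacks later in this thread mention 'ascii' rather than the longer name.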



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #3 from Merlijn van Deen valhall...@arctus.nl ---
I think this should be a more generic request to make sure the environment on
the exec hosts is the same as what someone has when testing in the interactive
shell.

In any case, the problem is the following:

valhallasw@tools-login:~$ cat > test.sh
#!/bin/bash
locale
valhallasw@tools-login:~$ chmod +x test.sh
valhallasw@tools-login:~$ ./test.sh
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

valhallasw@tools-login:~$ jsub ./test.sh
valhallasw@tools-login:~$ cat test.out
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=



Setting LANG=en_US.UTF-8 (or any other UTF-8 locale) should solve this issue.
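
The two locale listings follow the usual POSIX lookup order: LC_ALL overrides everything, then the category variable (LC_CTYPE for character encoding), then LANG; if none of them is set, the category falls back to POSIX. A minimal sketch of that lookup, with effective_ctype as a made-up name and the two dictionaries standing in for the environments shown above:

```python
def effective_ctype(env):
    """Return the locale name that decides character encoding for `env`.

    Mirrors the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values count as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "POSIX"


# Simplified stand-ins for the interactive shell and the grid job.
interactive = {"LANG": "en_US.UTF-8", "LC_ALL": ""}
grid_job = {"LANG": "", "LC_ALL": ""}

print(effective_ctype(interactive))
print(effective_ctype(grid_job))
```

This is only a model of the lookup; the real decision is made by the C library when the program calls setlocale().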



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Merlijn van Deen valhall...@arctus.nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |valhall...@arctus.nl

--- Comment #4 from Merlijn van Deen valhall...@arctus.nl ---
Oh, and to reproduce the issues: compare

LANG=C python -c "print u'\xe4'"

to 

LANG=en_US.UTF-8 python -c "print u'\xe4'"
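
What the two commands exercise is the encoding step that print performs on the way to stdout. The sketch below uses Python 3 syntax (the commands above are Python 2) and calls the encoders directly, so it behaves the same regardless of the locale it runs under:

```python
text = "\xe4"  # LATIN SMALL LETTER A WITH DIAERESIS

# UTF-8 represents the character as a two-byte sequence.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# ASCII, the charset selected by LANG=C, has no representation for it.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as error:
    ascii_ok = False
    print("ascii failed:", error.reason)
```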



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #5 from Tim Landscheidt t...@tim-landscheidt.de ---
(In reply to comment #3)
> [...]
> Setting LANG=en_US.UTF-8 (or any other UTF-8 locale) should solve this
> issue.

It apparently does, because I have:

| export LANG=de_DE.UTF-8

in ~/.profile, and for me:

| scfc@tools-login:~$ diff -u test.out <(./test.sh)
| scfc@tools-login:~$

My test account, however, shows LANG=en_US.UTF-8 interactively, while
"jsub locale" gives "LANG=", even after "export LANG".  The same occurs if I
set the locale to something other than en_US.UTF-8 before jsub, with
"export LANG=de_DE.UTF-8".

My assumption (and fear :-)) is that SGE sources ~/.profile before job
execution, which means that there will be a *lot* of confusion on where to
configure locales and how they are evaluated.

I don't want to go down that road if it can be avoided.  Is it possible to
explicitly set the locale in Python?  Otherwise we could change jsub so that
users can use qsub's -v option to set the locale in the environment:

| scfc-test@tools-login:~$ qsub -b y -N locale-en -v LANG=en_US.UTF-8 locale
| Your job 1934859 (locale-en) has been submitted
| scfc-test@tools-login:~$ qsub -b y -N locale-de -v LANG=de_DE.UTF-8 locale
| Your job 1934865 (locale-de) has been submitted
| scfc-test@tools-login:~$ fgrep LANG locale-*.o*
| locale-de.o1934865:LANG=de_DE.UTF-8
| locale-de.o1934865:LANGUAGE=
| locale-en.o1934859:LANG=en_US.UTF-8
| locale-en.o1934859:LANGUAGE=
| scfc-test@tools-login:~$

However that does not seem to solve the Python error:

| scfc-test@tools-login:~$ cat test.py 
| #!/usr/bin/python
| print u"\xe4"
| scfc-test@tools-login:~$ qsub -b y -N python-locale-en -v LANG=en_US.UTF-8
./test.py 
| Your job 1934872 (python-locale-en) has been submitted
| scfc-test@tools-login:~$ cat python-locale-en.*
| Traceback (most recent call last):
|   File "./test.py", line 2, in <module>
|     print u"\xe4"
| UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
0: ordinal not in range(128)
| scfc-test@tools-login:~$

And for the dbreps tool I indeed had to use:

| # Wrap sys.stdout into a StreamWriter to allow writing unicode.
| sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

But that is Python 2.7.3 (cf.
http://stackoverflow.com/questions/1473577/writing-unicode-strings-via-sys-stdout-in-python,
http://pythonhosted.org/kitchen/unicode-frustrations.html,
https://wiki.python.org/moin/PrintFails).

I don't know what the situation is for Python 3+.
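
On the open Python 3 question: sys.stdout there is already a text stream, so the codecs.getwriter wrapper is not needed, but the encoding can still be pinned explicitly instead of being left to the locale. The following is a sketch assuming Python 3; utf8_writer is a made-up helper name, while io.TextIOWrapper is standard library. An in-memory buffer stands in for stdout so the result can be inspected.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is always UTF-8."""
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        newline="\n",          # write "\n" unchanged on every platform
        line_buffering=True,   # flush after each complete line
    )


# Demonstration with an in-memory byte buffer in place of stdout.
buffer = io.BytesIO()
out = utf8_writer(buffer)
out.write("Talk:G\xfclen movement\n")
out.flush()
print(buffer.getvalue())
```

In a real script, utf8_writer(sys.stdout.buffer) would take the place of sys.stdout; on Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") achieves the same without a wrapper.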



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #6 from Merlijn van Deen valhall...@arctus.nl ---
Ahh, there's another catch.

valhallasw@tools-login:~$ python ./test.py | tee
Traceback (most recent call last):
  File "./test.py", line 2, in <module>
    print u"\xe4"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0:
ordinal not in range(128)

valhallasw@tools-login:~$ PYTHONIOENCODING=utf-8 python ./test.py | tee
ä


but that's painful to say the least.


Python 3 has no issues -- it will just use utf-8 if the LANG says so:

(test.py: print("\xe4") -- remember, str in py3 is unicode in py2)

valhallasw@tools-login:~$ python3 ./test.py | tee
ä
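
PYTHONIOENCODING does not have to be typed on every command line; it can be set once for a child process. The sketch below assumes Python 3 on both sides and uses the made-up helper name run_with_utf8_stdio. It reproduces the piped case from this comment with subprocess, under a deliberately bare LANG=C environment.

```python
import os
import subprocess
import sys


def run_with_utf8_stdio(code):
    """Run `code` in a child Python whose stdout is a pipe, forcing UTF-8."""
    env = dict(os.environ)
    env["LANG"] = "C"                   # simulate the grid's bare environment
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["PYTHONIOENCODING"] = "utf-8"   # overrides whatever the locale implies
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child prints U+00E4; the parent receives the raw bytes from the pipe.
print(run_with_utf8_stdio("print('\\xe4')"))
```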



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #7 from Tim Landscheidt t...@tim-landscheidt.de ---
(In reply to comment #5)
> [...]
> My assumption (and fear :-)) is that SGE sources ~/.profile before job
> execution, which means that there will be a *lot* of confusion on where to
> configure locales and how they are evaluated.
>
> I don't want to go down that road if it can be avoided.  Is it possible to
> explicitly set the locale in Python?  Otherwise we could change jsub so that
> users can use qsub's -v option to set the locale in the environment:
> [...]

No, we can't, as a test on my account with LANG set to de_DE.UTF-8 in
~/.profile shows:

| scfc@tools-login:~$ qsub -b y -v LANG=it_IT.UTF-8 env
| Your job 1935416 (env) has been submitted
| scfc@tools-login:~$ fgrep LANG env.o1935416 
| LANG=de_DE.UTF-8
| scfc@tools-login:~$

In bug #48811 we encountered a similar problem: We need -b y for binary
programs, but -b y adds a (login) shell to the call stack:

| scfc@tools-login:~$ { echo '#!/usr/bin/python'; echo 'import os'; echo 'print os.environ["LANG"]'; } > env-test.py && chmod +x env-test.py
| scfc@tools-login:~$ qsub -N test-without-b-y -v LANG=it_IT.UTF-8
./env-test.py 
| Your job 1935503 (test-without-b-y) has been submitted
| scfc@tools-login:~$ qsub -N test-with-b-y -b y -v LANG=it_IT.UTF-8
./env-test.py 
| Your job 1935504 (test-with-b-y) has been submitted
| scfc@tools-login:~$ grep . test-with*-b-y.*
| test-with-b-y.o1935504:de_DE.UTF-8
| test-without-b-y.o1935503:it_IT.UTF-8

There is a configuration variable login_shells in sge_conf(5), but I'll need to
whip Toolsbeta into shape to evaluate options.

For the time being I suggest wrapper scripts.
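
The effect of the extra shell can be imitated without a grid. The sketch below is only a local simulation, not what SGE does internally: a plain sh that exports LANG stands in for the login shell reading ~/.profile, and the names PROFILE_THEN_EXEC, SHOW_LANG and lang_seen_by_job are made up. It assumes Python 3 and a POSIX sh on the PATH.

```python
import os
import subprocess
import sys

# Stand-in for a login shell whose ~/.profile exports LANG before the job runs.
PROFILE_THEN_EXEC = 'LANG=de_DE.UTF-8; export LANG; exec "$@"'
SHOW_LANG = "import os; print(os.environ.get('LANG'))"


def lang_seen_by_job(via_shell):
    """Return the LANG a child sees when it is started with LANG=it_IT.UTF-8."""
    env = dict(os.environ, LANG="it_IT.UTF-8")
    command = [sys.executable, "-c", SHOW_LANG]
    if via_shell:
        # "sh -c SCRIPT NAME ARGS..." runs SCRIPT with ARGS available as "$@".
        command = ["sh", "-c", PROFILE_THEN_EXEC, "sh"] + command
    output = subprocess.check_output(command, env=env)
    return output.decode("ascii").strip()


print(lang_seen_by_job(via_shell=False))
print(lang_seen_by_job(via_shell=True))
```

Started directly, the child keeps the value it was given; started through the shell, it sees the value the shell exported, which matches the test-without-b-y and test-with-b-y results above.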



[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #8 from Marc A. Pelletier m...@uberbox.org ---
Part of the difficulty is that there is a combinatorial explosion of starting
environments depending on more factors than you can shake a stick at (given the
gridengine's propensity to try to guess at what you're trying to do, and to
(silently) add a shell anytime it thinks you need to evaluate shell arguments).

The best rule of thumb is if you need something specific in your environment,
set it explicitly.  I would recommend that one /always/ uses a shell wrapper
that sets the environment; a simple generic one might be:

#! /bin/bash

export STUFF_I_NEED=foobar
export PATH=/all:/the/places
exec "$@"

This sets STUFF_I_NEED and then execs the program given as argument without
needlessly keeping a subshell around.  The same script can then be used to
launch everything reliably.

I *could* make a globally available script that relies on sourcing, say,
.bashrc:

#! /bin/bash

. ~/.bashrc
exec "$@"

Which everyone could then use.  I could even have jsub invoke it implicitly
as needed.
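
The wrapper does not have to be a shell script. A rough Python equivalent of the first wrapper might look like the sketch below, assuming Python 3. STUFF_I_NEED is the placeholder from the example above, the LANG value is an assumption added because of the subject of this bug, and job_environment is a made-up name.

```python
#!/usr/bin/python3
import os
import sys


def job_environment(base):
    """Return a copy of `base` with the settings the job relies on."""
    env = dict(base)
    env["LANG"] = "en_US.UTF-8"      # assumption: the job wants a UTF-8 locale
    env["STUFF_I_NEED"] = "foobar"   # placeholder from the shell wrapper
    return env


if __name__ == "__main__" and len(sys.argv) > 1:
    # Replace this process with the requested program, like "exec" in bash,
    # so no extra interpreter stays around while the job runs.
    os.execvpe(sys.argv[1], sys.argv[1:], job_environment(os.environ))
```

Saved as, say, wrapper.py (a hypothetical name), it would be submitted with the real program and its arguments following the wrapper on the command line.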
