[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Marc A. Pelletier changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #10 from Marc A. Pelletier ---
Left without comment for >six months; reopen if the issue is still relevant.

--
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #9 from Marc A. Pelletier ---
Is this still a relevant issue?
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #8 from Marc A. Pelletier ---
Part of the difficulty is that there is a combinatorial explosion of starting environments, depending on more factors than you can shake a stick at (given the gridengine's propensity to try to "guess" at what you're trying to do, and to (silently) add a shell anytime it thinks you need to evaluate shell arguments).

The best rule of thumb is "if you need something specific in your environment, set it explicitly".

I would recommend that one /always/ uses a shell wrapper that sets the environment; a simple generic one might be:

  #! /bin/bash
  export STUFF_I_NEED="foobar"
  export PATH="/all:/the/places"
  exec "$@"

This sets STUFF_I_NEED and PATH, then execs the program given as argument without needlessly keeping a subshell around. The same script can then be used to launch everything reliably.

I *could* make a globally available script that relies on sourcing, say, .bashrc:

  #! /bin/bash
  . ~/.bashrc
  exec "$@"

which everyone could then use. I could even have it invoked implicitly by jsub at need.
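For tools that are themselves written in Python, the same "set it explicitly, then exec" idea from comment #8 can be sketched without a shell. This is only an illustration, assuming Python 3; `build_env` and `exec_with_env` are hypothetical helper names that do not exist in jsub or the grid engine, and the LANG value in the usage comment is just an example.

```python
import os


def build_env(overrides, base=None):
    """Return a copy of the base environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def exec_with_env(argv, overrides):
    """Replace the current process with argv, run under the merged environment.

    Like `exec "$@"` in the shell wrapper, this does not return on success,
    so no extra parent process is left waiting.
    """
    os.execvpe(argv[0], argv, build_env(overrides))


# Example use (commented out because it would replace the running interpreter):
# exec_with_env(["locale"], {"LANG": "en_US.UTF-8"})
```

Keeping the environment merge in its own function means the settings can be checked without actually launching anything.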
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #7 from Tim Landscheidt ---
(In reply to comment #5)
> [...]
> My assumption (and fear :-)) is that SGE sources ~/.profile before job
> execution, which means that there will be a *lot* of confusion on where to
> configure locales and how they are evaluated.
> I don't want to go down that road if it can be avoided. Is it possible to
> explicitly set the locale in Python? Otherwise we could change jsub so that
> users can use qsub's "-v" option to set the locale in the environment:
> [...]

No, we can't, as a test on my account with LANG set to de_DE.UTF-8 in ~/.profile shows:

| scfc@tools-login:~$ qsub -b y -v LANG=it_IT.UTF-8 env
| Your job 1935416 ("env") has been submitted
| scfc@tools-login:~$ fgrep LANG env.o1935416
| LANG=de_DE.UTF-8
| scfc@tools-login:~$

In bug #48811 we encountered a similar problem: We need "-b y" for binary programs, but "-b y" adds a (login) shell to the call stack:

| scfc@tools-login:~$ { echo '#!/usr/bin/python'; echo 'import os'; echo 'print os.environ["LANG"]'; } > env-test.py && chmod +x env-test.py
| scfc@tools-login:~$ qsub -N test-without-b-y -v LANG=it_IT.UTF-8 ./env-test.py
| Your job 1935503 ("test-without-b-y") has been submitted
| scfc@tools-login:~$ qsub -N test-with-b-y -b y -v LANG=it_IT.UTF-8 ./env-test.py
| Your job 1935504 ("test-with-b-y") has been submitted
| scfc@tools-login:~$ grep . test-with*-b-y.*
| test-with-b-y.o1935504:de_DE.UTF-8
| test-without-b-y.o1935503:it_IT.UTF-8

There is a configuration variable login_shells in sge_conf(5), but I'll need to whip Toolsbeta into shape to evaluate options. For the time being I suggest wrapper scripts.
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #6 from Merlijn van Deen ---
Ahh, there's another catch.

valhallasw@tools-login:~$ python ./test.py | tee
Traceback (most recent call last):
  File "./test.py", line 2, in <module>
    print u"\xe4"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
valhallasw@tools-login:~$ PYTHONIOENCODING=utf-8 python ./test.py | tee
ä

but that's painful to say the least. Python 3 has no issues -- it will just use utf-8 if the LANG says so:

(test.py: print("\xe4") -- remember, str in py3 is unicode in py2)

valhallasw@tools-login:~$ python3 ./test.py | tee
ä
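The effect of PYTHONIOENCODING shown in comment #6 can also be checked from Python itself. The sketch below assumes Python 3 and starts a fresh interpreter twice, once with the variable pinned to ascii and once to utf-8; it uses PYTHONIOENCODING rather than LANG because newer Python 3 releases may treat a plain C locale differently from what the thread describes. `run_child` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# One-line child program: print a single non-ASCII character (U+00E4).
CHILD = r'print("\xe4")'


def run_child(io_encoding):
    """Run CHILD in a fresh interpreter with PYTHONIOENCODING pinned."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_child("ascii")
utf8_run = run_child("utf-8")

# The ascii run fails with UnicodeEncodeError; the utf-8 run emits raw bytes.
print("ascii:", ascii_run.returncode, b"UnicodeEncodeError" in ascii_run.stderr)
print("utf-8:", utf8_run.returncode, utf8_run.stdout)
```

Capturing stdout through a pipe mirrors the `| tee` in the session above: the child cannot ask a terminal for its encoding and has to rely on the environment.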
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #5 from Tim Landscheidt ---
(In reply to comment #3)
> [...]
> Setting LANG="en_US.UTF-8" (or any other UTF-8 locale) should solve this
> issue.

It apparently does, because I have:

| export LANG=de_DE.UTF-8

in ~/.profile, and for me:

| scfc@tools-login:~$ diff -u test.out <(./test.sh)
| scfc@tools-login:~$

But my test account shows "LANG=en_US.UTF-8" interactively, while "jsub locale" gives "LANG=", even after "export LANG". The same occurs if I set a locale other than "en_US.UTF-8" before jsub with "export LANG=de_DE.UTF-8".

My assumption (and fear :-)) is that SGE sources ~/.profile before job execution, which means that there will be a *lot* of confusion on where to configure locales and how they are evaluated.

I don't want to go down that road if it can be avoided. Is it possible to explicitly set the locale in Python? Otherwise we could change jsub so that users can use qsub's "-v" option to set the locale in the environment:

| scfc-test@tools-login:~$ qsub -b y -N locale-en -v LANG=en_US.UTF-8 locale
| Your job 1934859 ("locale-en") has been submitted
| scfc-test@tools-login:~$ qsub -b y -N locale-de -v LANG=de_DE.UTF-8 locale
| Your job 1934865 ("locale-de") has been submitted
| scfc-test@tools-login:~$ fgrep LANG locale-*.o*
| locale-de.o1934865:LANG=de_DE.UTF-8
| locale-de.o1934865:LANGUAGE=
| locale-en.o1934859:LANG=en_US.UTF-8
| locale-en.o1934859:LANGUAGE=
| scfc-test@tools-login:~$

However, that does not seem to solve the Python error:

| scfc-test@tools-login:~$ cat test.py
| #!/usr/bin/python
| print u"\xe4"
| scfc-test@tools-login:~$ qsub -b y -N python-locale-en -v LANG=en_US.UTF-8 ./test.py
| Your job 1934872 ("python-locale-en") has been submitted
| scfc-test@tools-login:~$ cat python-locale-en.*
| Traceback (most recent call last):
|   File "./test.py", line 2, in <module>
|     print u"\xe4"
| UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
| scfc-test@tools-login:~$

And for the dbreps tool I indeed had to use:

| # Wrap sys.stdout into a StreamWriter to allow writing unicode.
| sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

But that is Python 2.7.3 (cf. http://stackoverflow.com/questions/1473577/writing-unicode-strings-via-sys-stdout-in-python, http://pythonhosted.org/kitchen/unicode-frustrations.html, https://wiki.python.org/moin/PrintFails). I don't know what the situation is for Python 3+.
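A rough Python 3 counterpart to the dbreps workaround in comment #5 is to wrap a binary stream in an `io.TextIOWrapper` with a fixed encoding instead of trusting the locale. This is a sketch under that assumption, not what dbreps actually does; `utf8_writer` is a hypothetical name, and the demonstration writes to an in-memory buffer so that it does not depend on the environment it runs in.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is always encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstrate on an in-memory buffer; a script would pass sys.stdout.buffer.
buffer = io.BytesIO()
out = utf8_writer(buffer)
print("\xe4", file=out)
out.flush()

# The bytes that would reach the job's output file.
encoded = buffer.getvalue()
print(encoded)
```

In a real script the wrapper would be assigned once at startup, e.g. `sys.stdout = utf8_writer(sys.stdout.buffer)`; on Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` achieves the same without replacing the object.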
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Merlijn van Deen changed:

           What    |Removed                     |Added
                 CC|                            |valhall...@arctus.nl

--- Comment #4 from Merlijn van Deen ---
Oh, and to reproduce the issues: compare

LANG=C python -c "print u'\xe4'"

to

LANG=en_US.UTF-8 python -c "print u'\xe4'"
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #3 from Merlijn van Deen ---
I think this should be a more generic request to make sure the environment on the exec hosts is the same as what someone has when testing in the interactive shell.

In any case, the problem is the following:

valhallasw@tools-login:~$ cat > test.sh
#!/bin/bash
locale
valhallasw@tools-login:~$ chmod +x test.sh
valhallasw@tools-login:~$ ./test.sh
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
valhallasw@tools-login:~$ jsub ./test.sh
valhallasw@tools-login:~$ cat test.out
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Setting LANG="en_US.UTF-8" (or any other UTF-8 locale) should solve this issue.
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #2 from Kunal Mehta (Legoktm) ---
Partially reproduced it. Using the first script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933479 ("test") has been submitted
ANSI_X3.4-1968

Second script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933488 ("test") has been submitted
Talk:Gülen movement
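The `ANSI_X3.4-1968` reported by the first script in comment #2 is a formal name for ASCII, which is why it counts as reproducing the problem. A minimal check, assuming Python 3 and using only the standard `codecs` module:

```python
import codecs

# "ANSI_X3.4-1968" is the name the job reported for sys.stdout.encoding.
reported = "ANSI_X3.4-1968"
canonical = codecs.lookup(reported).name
print(canonical)

# Under strict error handling, a non-ASCII character cannot be encoded with it.
try:
    "\xfc".encode(reported)
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```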
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #1 from Tim Landscheidt ---
I can't reproduce either claim:

| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| import sys
| print(sys.stdout.encoding)
| Your job 1933102 ("test") has been submitted
| UTF-8
| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| print("Talk:Gülen movement")
| Your job 1933103 ("test") has been submitted
| Talk:Gülen movement
| scfc@tools-login:~$

Please provide a minimal example.

(Just to clear up some confusion: jsub doesn't actually execute the script; it just submits it to the job grid aka SGE/OGS.)
[Bug 58784] jsub and utf8
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Kunal Mehta (Legoktm) changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
                 CC|                            |legoktm.wikipe...@gmail.com
     Ever confirmed|0                           |1