[Bug 58784] jsub and utf8

2014-08-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Marc A. Pelletier  changed:

       What|Removed     |Added
-----------+------------+------------
     Status|NEW         |RESOLVED
 Resolution|---         |WORKSFORME

--- Comment #10 from Marc A. Pelletier  ---
Left without comment for more than six months; reopen if the issue is still relevant.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 58784] jsub and utf8

2014-03-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #9 from Marc A. Pelletier  ---
Is this still a relevant issue?


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #8 from Marc A. Pelletier  ---
Part of the difficulty is that there is a combinatorial explosion of starting
environments, depending on more factors than you can shake a stick at (given
gridengine's propensity to try to "guess" what you're trying to do, and to
(silently) add a shell any time it thinks you need to evaluate shell arguments).

The best rule of thumb is "if you need something specific in your environment,
set it explicitly".  I would recommend that one /always/ use a shell wrapper
that sets the environment; a simple generic one might be:

#! /bin/bash

export STUFF_I_NEED="foobar"
export PATH="/all:/the/places"
exec "$@"

This sets STUFF_I_NEED and then execs the program given as arguments,
without needlessly keeping a subshell around.  That same script can then be
used to launch everything reliably.
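For jobs that are themselves Python, the same "set it explicitly, then exec" idea can be sketched without a shell. This is a minimal sketch, not anything jsub provides: `build_env`, `run_with_env` and the values in `OVERRIDES` are hypothetical, chosen for illustration. `os.execvpe` replaces the current process, as `exec "$@"` does, and looks the program up on the PATH of the new environment.

```python
#!/usr/bin/env python3
"""Hypothetical Python counterpart of the shell wrapper: set the
environment explicitly, then exec the real program."""
import os
import sys

# Assumed values for illustration; set whatever your job actually needs.
OVERRIDES = {
    "LANG": "en_US.UTF-8",
    "STUFF_I_NEED": "foobar",
}


def build_env(base, overrides):
    """Return a copy of `base` with `overrides` applied; `base` is not modified."""
    env = dict(base)
    env.update(overrides)
    return env


def run_with_env(argv, overrides=OVERRIDES):
    """Replace this process with argv[0] (like `exec "$@"`).

    execvpe searches the PATH of the *new* environment, so PATH can be
    overridden here too.
    """
    env = build_env(os.environ, overrides)
    os.execvpe(argv[0], argv, env)


# Only exec when a program was actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    run_with_env(sys.argv[1:])
```

Started as `./wrapper.py ./test.py`, it would run test.py with LANG set by the wrapper itself; since the wrapper runs after any shell the grid engine inserts, its settings are applied last.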

I *could* make a globally available script that relies on sourcing, say,
.bashrc:

#! /bin/bash

. ~/.bashrc
exec "$@"

Which everyone could then use.  I could even have jsub invoke it implicitly
when needed.


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #7 from Tim Landscheidt  ---
(In reply to comment #5)
> [...]
> My assumption (and fear :-)) is that SGE sources ~/.profile before job
> execution, which means that there will be a *lot* of confusion on where to
> configure locales and how they are evaluated.

> I don't want to go down that road if it can be avoided.  Is it possible to
> explicitly set the locale in Python?  Otherwise we could change jsub so that
> users can use qsub's "-v" option to set the locale in the environment:
> [...]

No, we can't, as a test on my account with LANG set to de_DE.UTF-8 in
~/.profile shows:

| scfc@tools-login:~$ qsub -b y -v LANG=it_IT.UTF-8 env
| Your job 1935416 ("env") has been submitted
| scfc@tools-login:~$ fgrep LANG env.o1935416 
| LANG=de_DE.UTF-8
| scfc@tools-login:~$

In bug #48811 we encountered a similar problem: We need "-b y" for binary
programs, but "-b y" adds a (login) shell to the call stack:

| scfc@tools-login:~$ { echo '#!/usr/bin/python'; echo 'import os'; echo 'print os.environ["LANG"]'; } > env-test.py && chmod +x env-test.py
| scfc@tools-login:~$ qsub -N test-without-b-y -v LANG=it_IT.UTF-8 ./env-test.py
| Your job 1935503 ("test-without-b-y") has been submitted
| scfc@tools-login:~$ qsub -N test-with-b-y -b y -v LANG=it_IT.UTF-8 ./env-test.py
| Your job 1935504 ("test-with-b-y") has been submitted
| scfc@tools-login:~$ grep . test-with*-b-y.*
| test-with-b-y.o1935504:de_DE.UTF-8
| test-without-b-y.o1935503:it_IT.UTF-8

There is a configuration variable, login_shells, in sge_conf(5), but I'll need
to whip Toolsbeta into shape to evaluate the options.

For the time being I suggest wrapper scripts.
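To see what a job actually receives, a Python 3 variant of env-test.py could report both the locale variables and the encodings Python derived from them. This is a sketch: the `report` function and the particular set of variables it prints are assumptions, not part of any Toolforge tooling.

```python
#!/usr/bin/env python3
"""Hypothetical diagnostic: print the locale-related settings a job sees."""
import locale
import os
import sys

VARIABLES = ("LANG", "LANGUAGE", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING")


def report():
    """Return one 'name=value' line per setting; unset variables show as empty."""
    lines = [f"{name}={os.environ.get(name, '')}" for name in VARIABLES]
    # What Python derived from the environment at startup.
    lines.append(f"preferred={locale.getpreferredencoding(False)}")
    lines.append(f"stdout={getattr(sys.stdout, 'encoding', None)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Submitting it once with and once without "-b y" should show directly which LANG each variant ends up with.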


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #6 from Merlijn van Deen  ---
Ahh, there's another catch.

valhallasw@tools-login:~$ python ./test.py | tee
Traceback (most recent call last):
  File "./test.py", line 2, in <module>
    print u"\xe4"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)

valhallasw@tools-login:~$ PYTHONIOENCODING=utf-8 python ./test.py | tee
ä


but that's painful to say the least.
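The environment-variable workaround can at least be applied by a launcher rather than typed by hand each time. A minimal Python 3 sketch, assuming the child is also Python: `run_child` is a hypothetical helper that starts a child interpreter with stdout going to a pipe (as with `| tee`) and PYTHONIOENCODING set, then hands back the raw bytes.

```python
import os
import subprocess
import sys

# Child program: prints U+00E4 (ä), like test.py above.
CHILD = 'print("\\xe4")'


def run_child(io_encoding):
    """Run CHILD with stdout piped and PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    ok = run_child("utf-8")
    print(ok.returncode, ok.stdout)  # on POSIX: 0 b'\xc3\xa4\n'
    bad = run_child("ascii")
    print(bad.returncode)            # non-zero: UnicodeEncodeError in the child
```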


Python 3 has no issues; it just uses UTF-8 if LANG says so:

(test.py: print("\xe4"); remember, str in Python 3 corresponds to unicode in Python 2)

valhallasw@tools-login:~$ python3 ./test.py | tee
ä
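Where the job's LANG cannot be trusted, Python 3.7 and later (newer than the interpreters discussed here) also let a script choose its output encoding itself via `io.TextIOWrapper.reconfigure`. A minimal sketch; `force_utf8` is a hypothetical name, and for testability it acts on an in-memory ASCII text stream that stands in for stdout under the POSIX locale.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 (Python 3.7+); call before writing to it."""
    stream.reconfigure(encoding="utf-8")
    return stream


# Stand-in for sys.stdout under the POSIX locale: ASCII text over a byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
force_utf8(out)
out.write("\xe4\n")
out.flush()
print(raw.getvalue())  # b'\xc3\xa4\n'

# In a real job, the equivalent would be force_utf8(sys.stdout) at the top
# of the script, before anything is printed.
```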


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #5 from Tim Landscheidt  ---
(In reply to comment #3)
> [...]
> Setting LANG="en_US.UTF-8" (or any other UTF-8 locale) should solve this
> issue.

It apparently does, because I have:

| export LANG=de_DE.UTF-8

in ~/.profile, and for me:

| scfc@tools-login:~$ diff -u test.out <(./test.sh)
| scfc@tools-login:~$

However, my test account shows "LANG=en_US.UTF-8" interactively, while "jsub
locale" gives "LANG=", even after "export LANG".  The same occurs if I set the
locale to something other than "en_US.UTF-8" before running jsub, with "export
LANG=de_DE.UTF-8".

My assumption (and fear :-)) is that SGE sources ~/.profile before job
execution, which means that there will be a *lot* of confusion on where to
configure locales and how they are evaluated.

I don't want to go down that road if it can be avoided.  Is it possible to
explicitly set the locale in Python?  Otherwise we could change jsub so that
users can use qsub's "-v" option to set the locale in the environment:

| scfc-test@tools-login:~$ qsub -b y -N locale-en -v LANG=en_US.UTF-8 locale
| Your job 1934859 ("locale-en") has been submitted
| scfc-test@tools-login:~$ qsub -b y -N locale-de -v LANG=de_DE.UTF-8 locale
| Your job 1934865 ("locale-de") has been submitted
| scfc-test@tools-login:~$ fgrep LANG locale-*.o*
| locale-de.o1934865:LANG=de_DE.UTF-8
| locale-de.o1934865:LANGUAGE=
| locale-en.o1934859:LANG=en_US.UTF-8
| locale-en.o1934859:LANGUAGE=
| scfc-test@tools-login:~$

However, that does not seem to solve the Python error:

| scfc-test@tools-login:~$ cat test.py 
| #!/usr/bin/python
| print u"\xe4"
| scfc-test@tools-login:~$ qsub -b y -N python-locale-en -v LANG=en_US.UTF-8 ./test.py
| Your job 1934872 ("python-locale-en") has been submitted
| scfc-test@tools-login:~$ cat python-locale-en.*
| Traceback (most recent call last):
|   File "./test.py", line 2, in <module>
|     print u"\xe4"
| UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
| scfc-test@tools-login:~$

And for the dbreps tool I indeed had to use:

| # Wrap sys.stdout into a StreamWriter to allow writing unicode.
| sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

But that is Python 2.7.3 (cf.
http://stackoverflow.com/questions/1473577/writing-unicode-strings-via-sys-stdout-in-python,
http://pythonhosted.org/kitchen/unicode-frustrations.html,
https://wiki.python.org/moin/PrintFails).

I don't know what the situation is for Python 3+.
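For comparison, the dbreps trick carries over to Python 3 with one change: sys.stdout is already a text stream there, so the codec writer has to wrap its underlying byte buffer, sys.stdout.buffer. A sketch of that port, exercised against an in-memory byte stream instead of the real stdout; `wrap_utf8` is a hypothetical name, and hard-coding UTF-8 rather than calling locale.getpreferredencoding() is a deliberate assumption, since the preferred encoding is exactly what is wrong in a POSIX-locale job.

```python
import codecs
import io


def wrap_utf8(byte_stream):
    """Return a StreamWriter that encodes str to UTF-8 onto byte_stream."""
    return codecs.getwriter("utf-8")(byte_stream)


buf = io.BytesIO()
writer = wrap_utf8(buf)
writer.write("Talk:G\xfclen movement\n")
print(buf.getvalue())  # b'Talk:G\xc3\xbclen movement\n'

# With the real stdout in Python 3, the equivalent line would be:
#   sys.stdout = wrap_utf8(sys.stdout.buffer)
```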


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Merlijn van Deen  changed:

       What|Removed     |Added
-----------+------------+----------------------
         CC|            |valhall...@arctus.nl

--- Comment #4 from Merlijn van Deen  ---
Oh, and to reproduce the issues: compare

LANG=C python -c "print u'\xe4'"

to 

LANG=en_US.UTF-8 python -c "print u'\xe4'"
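What differs between these two commands is the codec Python 2 picks for stdout: ASCII under LANG=C, UTF-8 under en_US.UTF-8. The encoding step itself can be reproduced independently of the locale with a short Python 3 sketch (`try_encode` is a hypothetical helper):

```python
def try_encode(text, codec):
    """Encode text with codec; return the bytes, or the error message on failure."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


print(try_encode("\xe4", "ascii"))
# 'ascii' codec can't encode character '\xe4' in position 0: ordinal not in range(128)
print(try_encode("\xe4", "utf-8"))
# b'\xc3\xa4'
```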


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #3 from Merlijn van Deen  ---
I think this should be a more generic request to make sure the environment on
the exec hosts is the same as what someone has when testing in the interactive
shell.

In any case, the problem is the following:

valhallasw@tools-login:~$ cat > test.sh
#!/bin/bash
locale
valhallasw@tools-login:~$ chmod +x test.sh
valhallasw@tools-login:~$ ./test.sh
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

valhallasw@tools-login:~$ jsub ./test.sh
valhallasw@tools-login:~$ cat test.out
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=



Setting LANG="en_US.UTF-8" (or any other UTF-8 locale) should solve this issue.
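A job can also check its environment up front instead of failing on the first non-ASCII character. The sketch below parses output in the format `locale` prints above and reports whether character handling is UTF-8. The function names are hypothetical, and treating LC_CTYPE as the deciding category is an assumption (it is the category that governs character encoding).

```python
def parse_locale(text):
    """Turn `locale` output (NAME=value or NAME="value" lines) into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            settings[name] = value.strip('"')
    return settings


def is_utf8(settings):
    """True if LC_CTYPE names a UTF-8 locale (e.g. en_US.UTF-8 or C.utf8)."""
    ctype = settings.get("LC_CTYPE", "")
    return ctype.lower().replace("-", "").endswith("utf8")


# Excerpt of the job output shown above.
JOB_OUTPUT = """LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_ALL="""

print(is_utf8(parse_locale(JOB_OUTPUT)))                # False
print(is_utf8(parse_locale('LC_CTYPE="en_US.UTF-8"')))  # True
```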


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #2 from Kunal Mehta (Legoktm)  ---
Partially reproduced it.

Using the first script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933479 ("test") has been submitted
ANSI_X3.4-1968

Second script:

local-legobot@tools-login:~/$ jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
Your job 1933488 ("test") has been submitted
Talk:Gülen movement
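The "ANSI_X3.4-1968" printed by the first script is the name glibc reports for ASCII under the C/POSIX locale, so that job's stdout was effectively ASCII. Python's own codec registry maps the name to ASCII, which a one-line check confirms:

```python
import codecs

# glibc's nl_langinfo(CODESET) reports this name under the C/POSIX locale.
reported = "ANSI_X3.4-1968"
print(codecs.lookup(reported).name)  # ascii
```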


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

--- Comment #1 from Tim Landscheidt  ---
I can't reproduce either claim:

| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| import sys
| print(sys.stdout.encoding)
| Your job 1933102 ("test") has been submitted
| UTF-8
| scfc@tools-login:~$ cat > test.py && chmod +x test.py && rm -f test.{out,err} && jsub ./test.py && while ! job test > /dev/null; do sleep 1; done && cat test.{out,err}
| #!/usr/bin/python3
| print("Talk:Gülen movement")
| Your job 1933103 ("test") has been submitted
| Talk:Gülen movement
| scfc@tools-login:~$

Please provide a minimal example.

(Just to clear up some confusion: jsub doesn't actually execute the script; it
just submits it to the job grid, a.k.a. SGE/OGS.)


[Bug 58784] jsub and utf8

2013-12-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

Kunal Mehta (Legoktm)  changed:

           What|Removed     |Added
---------------+------------+-----------------------------
         Status|UNCONFIRMED |NEW
             CC|            |legoktm.wikipe...@gmail.com
 Ever confirmed|0           |1
