https://bugzilla.wikimedia.org/show_bug.cgi?id=58784

       Web browser: ---
            Bug ID: 58784
           Summary: jsub and utf8
           Product: Wikimedia Labs
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Unprioritized
         Component: tools
          Assignee: m...@uberbox.org
          Reporter: sigm...@gmail.com
                CC: benap...@gmail.com, t...@tim-landscheidt.de
    Classification: Unclassified
   Mobile Platform: ---

A Python 3 script containing this code was executed with jsub:

    import sys
    print(sys.stdout.encoding)

The resulting .out file contained "ANSI_X3.4-1968", which is another name for
plain ASCII. Most scripts are written on the assumption that the output encoding
is utf8. When it is not, printing any non-ASCII character fails.
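
To illustrate what that encoding name means in practice, here is a minimal
sketch. It only assumes a stock Python 3; the helper name can_encode is made up
for this example and is not part of jsub or of any library.

```python
import codecs

# Python resolves the locale-reported name "ANSI_X3.4-1968" to its ASCII codec.
codec_name = codecs.lookup("ANSI_X3.4-1968").name
print(codec_name)


def can_encode(text, encoding):
    """Return True if text can be encoded with the given encoding (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "\xfc" is the u-umlaut from the page title used in this report.
title = "Talk:G\xfclen movement"
print(can_encode(title, "ANSI_X3.4-1968"))
print(can_encode(title, "utf-8"))
```

A script could run a check like this at startup to detect the problem before it
tries to print anything.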

Another Python 3 script containing this code was executed with jsub:

    print("Talk:Gülen movement")

The resulting .err file contained this:

    Traceback (most recent call last):
      File "...", line 5, in <module>
        print("Talk:G\xfclen movement")
    UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 6: ordinal not in range(128)

jsub is written in Perl, which is perfectly capable of using utf8 as its output
encoding. Unicode support matters to all of us, so I propose that jsub be
changed to use utf8.

I am not a Perl expert, but I would try adding "use utf8;" and
"use open qw/:std :utf8/;" to the top of the file, right under "use warnings;".
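
Independently of any change to jsub, a script can protect itself by encoding
its own output as utf8. The following is only a sketch of that idea, not a
tested fix for this bug. The function name write_utf8 is made up for the
example, and an in-memory buffer stands in for the real output stream.

```python
import io


def write_utf8(binary_stream, text):
    """Write text plus a newline to a binary stream, always encoded as UTF-8."""
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    wrapper.write(text + "\n")
    wrapper.flush()
    # Detach so the underlying stream stays open once the wrapper is discarded.
    wrapper.detach()


# Demonstration against an in-memory buffer instead of the real stdout.
buf = io.BytesIO()
write_utf8(buf, "Talk:G\xfclen movement")
print(buf.getvalue())
```

In a real script the first argument would be sys.stdout.buffer, the byte stream
underneath sys.stdout in Python 3. Setting the environment variable
PYTHONIOENCODING=utf-8 for the job should have a similar effect without
touching the script, but I have not tried that under jsub.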

On a slightly related note, scripts running as regular CGI also use the
"ANSI_X3.4-1968" encoding. That may be outside the scope of this bug, though.

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
