[slurm-dev] Re: Incorrect handling of non-ASCII characters

Loris Bennett Mon, 06 Jun 2016 08:00:29 -0700

Hi Tim,

Tim Wickberg <[email protected]> writes:


> We've created enhancement bug 2791 [0] to address this, and included some of
> the discussion from the list. I agree that ASCII-only is an untenable
> solution for modern software, and would like to work towards proper UTF8
> support[1].
>
> Slurm is (mostly) UTF8 compatible. This is why the umlaut in Schrödinger is
> printed correctly.
>
> There are two issues that I plan to address in a future release:
>
> 1) Preventing the output routines from splitting in the middle of a multi-byte
> UTF8 character. Right now, 'squeue' and other commands may truncate a string
> in the middle of a multi-byte UTF8 character which results in garbage (usually
> little rectangles) instead of appropriate characters printed to the screen.
>
> 2) Terminal output width calculation for arbitrary UTF8 strings. This is the
> original reported issue - multi-byte Unicode characters have their widths
> miscalculated which throws the justification off making the terminal output
> 'ugly'.
>
> A further issue that I do not currently plan to address is normalization of
> arbitrary strings[2]. Right now, 'Schrödinger' (umlaut) and 'Schrodinger' (no
> umlaut) are treated as separate accounts, which could cause some confusion.
>
> - Tim
>
> [0] https://bugs.schedmd.com/show_bug.cgi?id=2791
> [1] http://utf8everywhere.org/
> [2] http://unicode.org/faq/normalization.html

Thanks for carrying this forward.
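
For anyone following the thread, the two planned fixes can be illustrated
with a rough sketch. This is Python rather than Slurm's C, and the helper
names (truncate_utf8, display_width, pad) are made up for illustration;
none of this is taken from the Slurm source. The width calculation is only
an approximation: it skips combining marks and counts East Asian wide and
fullwidth characters as two columns, but ignores control characters and
other special cases.

```python
import unicodedata


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without splitting a UTF-8 character."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx. If the cut would land on
    # one, back up until it sits on the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


def display_width(text: str) -> int:
    """Approximate number of terminal columns needed to print text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the column of their base char
        # Wide ("W") and fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad(text: str, columns: int) -> str:
    """Left-justify text in a field of the given column width."""
    return text + " " * max(0, columns - display_width(text))


name = "Schrödinger"
raw = name.encode("utf-8")        # 12 bytes, because 'ö' takes two

naive = raw[:5]                   # ends with half of 'ö'
safe = truncate_utf8(raw, 5)      # backs up to the character boundary
```

Here the naive slice cannot be decoded as UTF-8, whereas the safe one
decodes to "Schr". Padding based on display_width rather than on the byte
count keeps the following column aligned, since the name occupies 11
columns but 12 bytes.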

Regarding the normalisation, in our case, this is not an issue.  We get
our account information from the university's central identity
management system, in which account names can only contain alphanumeric
ASCII characters.  The proper names, however, are UTF-8 strings.
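
For sites where names are not restricted in this way, here is a small
sketch of what normalisation would involve, again in Python with
hypothetical helper names (same_name, strip_marks) and not based on
anything in Slurm. It assumes names are compared after conversion to NFC.
Note that normalisation only unifies different encodings of the same
character; treating 'ö' and 'o' as equal needs a separate, lossy folding
step.

```python
import unicodedata

# The same visible name in two Unicode forms:
nfc = unicodedata.normalize("NFC", "Schrödinger")   # 'ö' as one code point
nfd = unicodedata.normalize("NFD", "Schrödinger")   # 'o' + combining diaeresis


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(text: str) -> str:
    """Lossy folding: decompose, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Compared as raw strings, nfc and nfd differ, but same_name(nfc, nfd) is
true. same_name("Schrödinger", "Schrodinger") stays false; only
strip_marks maps the first onto the second.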

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
