Hi there,

after stumbling over the single German character "ß" unexpectedly showing
up as two characters on MacOSX, I looked into what the situation is on
Ubuntu Linux.

It turns out that the console on both of these systems returns non-English
(i.e. non-ASCII) characters encoded as UTF-8.

As a consequence, the single German character "sharp s" ("ß"), for
example, is encoded in UTF-8 as 16 bits (two bytes) with a hexadecimal
value of "C39F"x
(cf. <http://www.fileformat.info/info/unicode/char/df/index.htm>).
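For illustration, here is a minimal ooRexx sketch of how the code point U+00DF turns into the two bytes "C39F"x. It only covers the two-byte UTF-8 range (U+0080 to U+07FF), and the variable names are made up for this example:

```rexx
-- Sketch only: two-byte UTF-8 encoding, valid for code points U+0080..U+07FF
cp    = 'DF'~x2d             -- 223, the Unicode code point of "ß"
lead  = 192 + cp % 64        -- lead byte 110xxxxx carries the upper bits  -> 195 ('C3'x)
trail = 128 + cp // 64       -- trail byte 10xxxxxx carries the low 6 bits -> 159 ('9F'x)
utf8  = lead~d2c || trail~d2c
say "c2x:   " utf8~c2x       -- C39F
say "length:" utf8~length    -- 2
```

The string is built from numeric values rather than a literal "ß", so the result should not depend on the encoding of the source file or the console.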

Running the following Rexx program on Windows (codepage 1250) in a shell:

    parse version v
    say v
    a='ß'
    say a "length:" a~length
    say a "a~c2x: " a~c2x
      

yields:

    REXX-ooRexx_4.1.0(MT) 6.03 23 Aug 2010
    ß length: *1*
    ß a~c2x:  *DF*
      

Running the same ooRexx program under Ubuntu (and MacOSX) yields:

    REXX-ooRexx_4.1.0(MT) 6.03 23 Aug 2010
    ß length: *2*
    ß a~c2x:  *C39F*
      

Note that the single character "ß" suddenly has a length of "2" instead
of "1", as it does under Windows (and as has been the case for the past
30 years).
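To make the byte-versus-character difference concrete, here is a hedged sketch of a routine that counts characters in a UTF-8 byte string by skipping continuation bytes. The name utf8Length is hypothetical (it is not part of ooRexx), and the routine assumes its argument is well-formed UTF-8:

```rexx
s = 'C39F'x                          -- the UTF-8 bytes of "ß", given in hex
say "bytes:     " s~length           -- 2
say "characters:" utf8Length(s)      -- 1
exit

-- hypothetical helper, assumes well-formed UTF-8 input
::routine utf8Length
  use strict arg bytes
  count = 0
  do i = 1 to bytes~length
    b = bytes~substr(i, 1)~c2d
    -- continuation bytes look like 10xxxxxx (128..191); count every other byte
    if b < 128 | b > 191 then count = count + 1
  end
  return count
```

This only counts code points; it does not validate the input, and it does nothing for the other string built-ins (substr, pos, translate, ...) that also work byte by byte.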

This is a totally different result, which all of a sudden makes Rexx
programs behave inconsistently across platforms. It will break quite a
few Rexx programs in countries that have needed non-English characters
for the past 30 years (practically every country where English is not
the main or the only language, i.e. everywhere outside of the US, GB,
Australia)! So non-English Rexx programs running on ooRexx on those
systems will most likely break, possibly in a very subtle manner.

Although this problem has been known for some time, it is now
materializing on non-Windows platforms and needs to be addressed ASAP,
IMHO!

---rony

P.S.: Out of curiosity I wrote the following NetRexx program, which the
NetRexx compiler translates to Java (Java uses UTF-16, where "ß" is
represented by 16 bits with a value of "00DF"x), and ran it under
Windows with the codepage set to 1250:

    parse version v
    say v
    a="ß"
    say a "length:" a.length
    say a "a.c2x: " a.c2x
      

output:

    E:\java\scriptJars>java test
    NetRexx 2.05 14 Jan 2005
    ß length: *1*
    ß a.c2x:  DF
      

