I think we need to be very careful about evaluating the impact of this. We are
really talking about a physical vs. logical representation. For instance, all the
programs listed below are correct as they are, but there is a much more
important question.
If I write a string containing 16-bit characters, does the string get written to
(or read from) a file correctly? If the lineout and char functions use the
length function internally then I doubt that the correct number of bytes is being
written, especially with the char function. If the correct number of bytes is
being written then we only have a physical vs. logical representation problem.
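One way to check the file question is to write a known byte sequence and compare the length Rexx reports with the size of the resulting file. This is only a minimal sketch: the scratch file name is hypothetical, and the string is given as a hex literal so that the encoding of the source file does not influence the result.

```rexx
/* Sketch only: compare the string length with the bytes that reach the disk. */
file = 'sharp_s_test.tmp'                     -- hypothetical scratch file name
s    = 'C39F'x                                -- UTF-8 bytes of "ß", given in hex

call stream file, 'c', 'open write replace'   -- start from an empty file
call charout file, s                          -- write the string, no line terminator
call stream file, 'c', 'close'

say 'length as seen by Rexx:' s~length
say 'bytes in the file     :' stream(file, 'c', 'query size')
```

If both numbers agree, the bytes are passed through unchanged and only the logical character count is affected.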
So we need to dig a little deeper to know the full ramifications of this
"problem".
David Ashley
On 02/15/2011 05:40 AM, Rony G. Flatscher wrote:
Hi there,
"stumbling" over a surprising two characters for the single German character
"ß" on the MacOSX, I researched what the situation is on Ubuntu Linux.
It turns out that the console of all these systems returns non-English
characters as UTF-8 characters.
As a consequence, the single German character "sharp s" ("ß"), for example,
occupies 16 bits (two bytes) as a UTF-8 character, with a hexadecimal value of
"C39F"x (cf. <http://www.fileformat.info/info/unicode/char/df/index.htm>).
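The value "C39F"x follows from the two-byte UTF-8 pattern 110xxxxx 10xxxxxx applied to the code point U+00DF. A small sketch of that arithmetic, valid only for code points from U+0080 to U+07FF (the variable names are illustrative):

```rexx
/* Sketch only: derive the two UTF-8 bytes for U+00DF ("ß"). */
cp = x2d('DF')                -- code point U+00DF = 223
b1 = d2c(192 + cp % 64)       -- lead byte 110xxxxx: 192 + upper bits (223 % 64 = 3)
b2 = d2c(128 + cp // 64)      -- continuation byte 10xxxxxx: 128 + lower 6 bits (31)
say c2x(b1 || b2)             -- displays C39F
```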
Running the following Rexx program on Windows (codepage 1250) in a shell:
parse version v
say v
a='ß'
say a "length:" a~length
say a "a~c2x: " a~c2x
yields:
REXX-ooRexx_4.1.0(MT) 6.03 23 Aug 2010
ß length:*1*
ß a~c2x:*DF*
Running the same ooRexx program under Ubuntu (and MacOSX) yields:
REXX-ooRexx_4.1.0(MT) 6.03 23 Aug 2010
ß length:*2*
ß a~c2x:*C39F*
Note that the single character "ß" suddenly has a length of "2" instead of "1"
as under Windows (and as has been the case for the past 30 years).
This is a totally different result, suddenly yielding inconsistent
behaviour of Rexx programs on different platforms, which will break quite a
few Rexx programs in countries that have needed to use non-English
characters for the past 30 years (practically everyone living in a country
where English is either not the main language or not the only language, i.e.
everyone outside of the US, GB, Australia)! So non-English Rexx programs
running on ooRexx on those systems will most likely break, possibly in a very
subtle manner.
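To illustrate the difference between the byte length and the character count, here is a minimal sketch of a routine that counts UTF-8 characters by skipping continuation bytes. The routine name utf8Length is hypothetical and not part of ooRexx, and the sketch assumes its input is well-formed UTF-8.

```rexx
/* Sketch only: count characters, not bytes, in a UTF-8 encoded string. */
say utf8Length('C39F'x)              -- "ß": 2 bytes, displays 1
say utf8Length('47726FC39F65'x)      -- "Große": 6 bytes, displays 5

::routine utf8Length                 -- hypothetical helper, not an ooRexx built-in
  use arg s
  count = 0
  do i = 1 to length(s)
    b = c2d(substr(s, i, 1))         -- numeric value of byte i
    -- continuation bytes have the form 10xxxxxx, i.e. 128..191
    if b < 128 | b > 191 then count = count + 1
  end
  return count
```

Programs that rely on length, substr or pos for non-ASCII text would need this kind of care on a UTF-8 console.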
Although this problem has been known for some time, it is now materializing on
non-Windows platforms and needs to be addressed ASAP, IMHO!
---rony
P.S.: Out of curiosity I wrote the following NetRexx program, which gets
translated to Java by the NetRexx compiler (Java uses UTF-16, where "ß" is
represented by 16 bits with a value of "00df"x), and ran it under Windows with
the codepage set to 1250:
parse version v
say v
a="ß"
say a "length:" a.length
say a "a.c2x: " a.c2x
output:
E:\java\scriptJars>java test
NetRexx 2.05 14 Jan 2005
ß length:*1*
ß a.c2x: DF
------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel