Hi Uli,

On 14.03.2013 00:20, Uli Zinngrebe wrote:
> On Wednesday 13 Mar 2013 10:49:50 Rony G. Flatscher wrote:
>
> (1)
> For ASCII one byte equals one character, but unicode has multi-byte 
> characters. 
> With the string functions being developed for ASCII, they should show many 
> bugs when applied to UTF.
>
> (2)
> The byte representation of UTF multi byte characters is not unique, because 
> permutations of the byte sequence keep the same meaning,
> e.g. a-Umlaut:  a with "  means the same as  " with a.
>
> This means before comparing UTF characters for equality, one must normalise 
> the byte sequence.
Yes, that should be done by ooRexx by implementing/using the proper Unicode 
functions.
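
To make the normalisation point concrete, here is a minimal sketch in Python rather than ooRexx, since ooRexx has no such function today. It assumes NFC as the normalisation form, and `same_text` is a hypothetical helper name chosen for illustration:

```python
import unicodedata

precomposed = "\u00e4"   # a-Umlaut as one code point (LATIN SMALL LETTER A WITH DIAERESIS)
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS


def same_text(left, right):
    """Compare two strings after bringing both to NFC (assumed form)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


print(precomposed == decomposed)           # False: the code point sequences differ
print(same_text(precomposed, decomposed))  # True: equal once normalised
# The UTF-8 byte lengths differ as well (2 vs. 3 bytes).
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

Both strings render as the same character, yet a plain byte-wise or code-point-wise comparison reports them as different.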

Wherever caselessness plays a role in ooRexx, only Unicode support can make it
work for Rexx programmers who need non-English characters, e.g. "parse caseless"
and all "caseless" methods of the String class.
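
As an illustration of what Unicode-aware caselessness involves, the following Python sketch uses case folding plus NFC normalisation. It is a simplification of the full Unicode caseless-matching rules, and `caseless_equal` is a hypothetical name, not an existing ooRexx or Python function:

```python
import unicodedata


def caseless_equal(left, right):
    """Caseless comparison via case folding followed by NFC (simplified)."""
    def fold(text):
        return unicodedata.normalize("NFC", text.casefold())
    return fold(left) == fold(right)


print(caseless_equal("STRASSE", "straße"))     # sharp s folds to "ss"
print(caseless_equal("ÄRGER", "a\u0308rger"))  # precomposed vs. combining umlaut
print(caseless_equal("Rexx", "Java"))          # genuinely different strings
```

A translation table limited to A-Z and a-z cannot match either of the first two pairs.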

Another area is Unicode-encoded (especially UTF-16) XML files, which do turn up
and need to be processed.
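
For a sense of what such files look like at the byte level, here is a small Python sketch. The XML document is made up for illustration, and the sketch relies on the parser detecting the encoding from the byte order mark and the XML declaration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
document = '<?xml version="1.0" encoding="UTF-16"?><greeting>Grüße</greeting>'

# Python's "utf-16" codec writes a 2-byte byte order mark, then 2 bytes
# per character for text in the Basic Multilingual Plane.
raw = document.encode("utf-16")

# The parser works on the raw bytes and decodes them itself.
root = ET.fromstring(raw)

print(root.text)
print(len(document), len(raw))  # character count vs. byte count
```

Code that treats one byte as one character would see roughly twice as many "characters" as the document actually contains.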

(Short of Unicode support in ooRexx itself, one can use BSF4ooRexx and thereby
the Java Unicode support. But again, this might be overkill for most.)

Regards,

---rony


>> Subject says it all.
>>
>> ---rony


_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel
