Re: [Oorexx-devel] How to portably convert between 8-bit and UTF-8 and vice versa

Rony G. Flatscher Tue, 05 Jul 2011 01:44:22 -0700

Hi Jean-Louis,
>
> 2011/7/4 Rony G. Flatscher <rony.flatsc...@wu-wien.ac.at
> <mailto:rony.flatsc...@wu-wien.ac.at>>
>
>     Hi there,
>
>     in the process of creating an external ooRexx function library, I
>     sometimes have to transport strings as UTF-8, even if they contain
>     non-7-bit-ASCII characters (for non-English text).
>
>
> If you only need to transport UTF-8 strings, then strcpy and strlen
> should do the work. You will work on bytes, not on characters.
> If you need to work on characters and are looking for a lightweight
> library, then http://utfcpp.sourceforge.net/ may help. But your
> request was not about that :-)
:)


The problem is as follows: the library is supposed to open the dbus
world to ooRexx programmers. dbus implementations are (they claim for
security reasons) extremely wary of spoofing and therefore check
everything thoroughly. If an argument is wrong for whatever reason, the
message call is not carried out.

The current state is that transporting strings is fine as long as they
contain only 7-bit ASCII characters/bytes, i.e. only English letters.
As soon as a string contains German umlauts, which of course is very
common in a German-speaking country (as are French characters in your
country), dbus simply disconnects when it detects that the string is
not properly UTF-8-encoded! This makes ooRexx totally incompatible with
dbus (and with the rest of the world, which has been using UTF-8 as a
standard encoding).
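
To illustrate what "properly UTF-8-encoded" means at the byte level,
here is a minimal sketch of a well-formedness check in plain C++. It is
not dbus's actual validator, only an approximation of the kind of test
a strict receiver applies; the function name is_valid_utf8 is made up
for this example. A Latin-1 "ü" (the single byte 0xFC) fails such a
check, while its UTF-8 form (0xC3 0xBC) passes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if s consists only of well-formed
// UTF-8 sequences. Rejects stray continuation bytes, truncated sequences,
// overlong encodings, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t len = 0;      // total length of this sequence in bytes
        unsigned long cp = 0;     // decoded code point
        if (c < 0x80) { ++i; continue; }                        // 7-bit ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;        // continuation byte or invalid lead byte
        if (i + len > n) return false;                          // truncated
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;        // not 10xxxxxx
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;                // overlong
        if (len == 3 && cp < 0x800) return false;               // overlong
        if (len == 4 && cp < 0x10000) return false;             // overlong
        if (cp > 0x10FFFF) return false;                        // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;         // surrogate
        i += len;
    }
    return true;
}
```

Running such a check before handing a string to dbus would at least
allow raising a Rexx condition instead of losing the connection.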

As ooRexx (inexplicably!) still does not officially support
UTF-8/Unicode (in the meantime the entire world speaks UTF-8/Unicode,
text files are UTF-8/Unicode, arguments are UTF-8/Unicode etc.), I need
some means to at least produce proper UTF-8 encodings somehow. Hence
this request for help.

>     Is there a simple/easy way in C++ to create UTF-8 strings from
>     8-bit strings and to convert UTF-8 to 8-bit strings, such that the
>     code compiles for Windows as well as for gcc on the other platforms?
>
> That's more complicated... ICU supports plenty of character sets, but
> it's big.
> See also the library Glib used by GTK :
> http://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html.
> If your 8-bit string is always encoded in the current locale encoding
> (C runtime), then functions like
> g_locale_to_utf8 ()
> g_locale_from_utf8 ()
> from Glib are what you need.
Hmm, glib would cover at least GNOME-based Linuxes (plus systems on
which GTK apps happen to be installed, but that would be merely by
chance).

Would you know by any chance whether there are alternatives that work
on Linux, MacOSX and Windows?
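
For the special case where the 8-bit strings are ISO-8859-1 (Latin-1),
no library is needed at all, because each byte value equals its Unicode
code point. The following is a minimal sketch under exactly that
assumption; the function names latin1_to_utf8 and utf8_to_latin1 are
made up for this example. It is not a general solution: other 8-bit
encodings need a mapping table or a conversion library, and
Windows-1252 in particular differs from Latin-1 in the range 0x80-0x9F.

```cpp
#include <string>

// ASSUMPTION: the input is ISO-8859-1, so byte value == code point.
// Bytes below 0x80 are copied; all others become a two-byte sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));     // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));   // continuation
        }
    }
    return out;
}

// Reverse direction. Code points above U+00FF cannot be represented in
// Latin-1; they (and malformed bytes) are replaced by 'replacement'.
std::string utf8_to_latin1(const std::string& in, char replacement = '?') {
    std::string out;
    const std::string::size_type n = in.size();
    std::string::size_type i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
            ++i;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < n &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // 0xC2/0xC3 lead bytes cover exactly U+0080..U+00FF.
            const unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            out += replacement;
            ++i;
            // Skip the continuation bytes belonging to this sequence.
            while (i < n &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

This uses only std::string, so it should build unchanged with gcc and
with the Windows compilers. If the 8-bit data is in the current locale
encoding rather than Latin-1, something like the glib functions
mentioned above is still needed.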

---

This would not be a problem at all if ooRexx supported UTF-8/Unicode, as
every modern scripting language does nowadays!

---rony

P.S.: I am even contemplating using JNI (the Java Native Interface),
which offers UTF-8 encoding/decoding out of the box. That would mean
that the dbus library would have to become a part of BSF4ooRexx. Should
ooRexx ever get UTF-8/Unicode capabilities, I could adjust the
respective code then.

