[Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?

2013-03-13 Thread Rony G. Flatscher
Subject says it all.

---rony


--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
___
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel


Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?

2013-03-13 Thread Jean-Louis Faucher
Hi Rony,

I would say: that depends on your needs. Do you have some use cases in
mind?

If you want the existing methods of the native String class to support
Unicode, then you have to modify the internal representation of the strings:
- disruptive (for the C++ API and external libraries) but good performance:
use a wide-char format, utf-16 or utf-32.
- conservative (for the C++ API and external libraries) but lower
performance: use utf-8.
But once you have that, I am not sure you have made much progress towards
Unicode...
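
As an illustration of the trade-off between these representations, here is a
small sketch in Python (chosen only because it ships with all three codecs; it
says nothing about how the ooRexx kernel would store strings). The helper name
encoded_sizes is made up for this example.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" variants are used so that no byte order mark is counted.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

ascii_text = "Rexx"          # 4 characters, ASCII only
german_text = "Größe"        # 5 characters, 2 of them outside ASCII
emoji_text = "a\U0001F600"   # 2 characters, 1 outside the Basic Multilingual Plane

for sample in (ascii_text, german_text, emoji_text):
    print(repr(sample), len(sample), encoded_sizes(sample))
```

Only utf-32 uses a fixed number of bytes per code point. utf-8 and utf-16 are
both variable-width, so finding the n-th character needs a scan or an index.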

If you want access to character properties, locale support (date
format, number format, ...), Unicode in regex, transliteration, etc., then
you need a library like ICU, or Java through bsf4oorexx (but you know that
already :-).
There is no need to make the interpreter kernel dependent on ICU (unless you
want that): one or several wrapper classes will be enough. Now, if you want
all these services natively supported by the kernel, then the class String
will have to be adapted, new classes probably added, and the C++ API adapted.
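
To give an idea of what a thin wrapper class could expose, here is a sketch in
Python that uses the standard unicodedata module as a stand-in for ICU (it
covers only a small part of what ICU offers). The class name CharInfo and its
attribute names are invented for this example.

```python
import unicodedata

class CharInfo:
    """Hypothetical wrapper exposing a few Unicode properties of one character."""

    def __init__(self, char):
        if len(char) != 1:
            raise ValueError("expected exactly one character")
        self.char = char

    @property
    def name(self):
        # Official Unicode character name, or "" if the character has none.
        return unicodedata.name(self.char, "")

    @property
    def category(self):
        # Two-letter general category, e.g. "Lu" = uppercase letter.
        return unicodedata.category(self.char)

    @property
    def is_letter(self):
        # All letter categories (Lu, Ll, Lt, Lm, Lo) start with "L".
        return self.category.startswith("L")

info = CharInfo("Ä")
print(info.name, info.category, info.is_letter)
```

The point of the design is that the String class itself stays untouched: the
wrapper receives a string and asks the library about it.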

If you want a GUI which is Unicode-enabled:
- if you are a user of ooDialog, then you have to compile ooDialog in
wide-char mode, and convert to/from utf-16 at the boundaries (unless you
have an ooRexx kernel with native support for utf-16).
- if you are a user of a Java GUI, then you have already solved the problem
with bsf4oorexx (I think you manage the conversion from/to utf-16, right?)
- Other GUIs exist, but I can't say much about them. Gtk+ uses utf-8; Qt
supports Unicode, but I can't tell more.
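
The conversion at the boundaries can be sketched as follows, in Python and
assuming that the strings on the ooRexx side hold utf-8 bytes and that the GUI
side wants little-endian utf-16 without a byte order mark (the form Windows
wide-char APIs use). The function names to_wide and from_wide are made up.

```python
def to_wide(utf8_bytes):
    """Convert utf-8 bytes to utf-16 (little-endian, no BOM) for a wide-char API."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")

def from_wide(utf16_bytes):
    """Convert utf-16 (little-endian, no BOM) bytes coming back from the API to utf-8."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")

label = "Größe €".encode("utf-8")   # 7 characters, 11 bytes in utf-8
wide = to_wide(label)               # 14 bytes in utf-16
assert from_wide(wide) == label     # the round trip loses nothing
print(len(label), len(wide))
```

Both directions raise an exception on malformed input instead of passing
damaged text across the boundary.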

I can't tell for sqlite, except what I read in the FAQ (ICU is optionally
supported for case-insensitive comparisons; they don't want to bloat sqlite
by default).
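
What this means in practice can be tried with the sqlite3 module that comes
with Python: the built-in NOCASE collation folds only ASCII letters. The sketch
below registers a replacement collation based on Python's own str.casefold
(not ICU, so no locale-specific rules). The names UNICODE_NOCASE,
casefold_compare and equal_under are invented for this example.

```python
import sqlite3

def casefold_compare(left, right):
    """Collation callback: compare two strings ignoring case, Unicode-aware."""
    a, b = left.casefold(), right.casefold()
    return (a > b) - (a < b)   # -1, 0 or 1, as sqlite expects

conn = sqlite3.connect(":memory:")
conn.create_collation("UNICODE_NOCASE", casefold_compare)

def equal_under(collation, left, right):
    """Ask sqlite whether two strings are equal under the given collation."""
    # The collation name comes from this script, never from user input.
    sql = f"SELECT ? = ? COLLATE {collation}"
    return bool(conn.execute(sql, (left, right)).fetchone()[0])

print(equal_under("NOCASE", "ABC", "abc"))
print(equal_under("NOCASE", "ÄRGER", "ärger"))
print(equal_under("UNICODE_NOCASE", "ÄRGER", "ärger"))
```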

Personally, I don't need Unicode (currently). But if I wanted to work on a
concrete subject, I would probably work on wrapper classes for ICU, for
learning purposes.

Jean-Louis








Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?

2013-03-13 Thread Rony G. Flatscher
Bonsoir Jean-Louis,

thank you very much for your information!


On 13.03.2013 13:04, Jean-Louis Faucher wrote:
> I would say : That depends on your needs... Do you have some use cases in
> mind ?
Well, I am thinking about working with Unicode-encoded (XML) files and finally
getting the ability to have all glyphs available that are needed for European
information systems, which also means interfacing with databases that are
encoded in Unicode.


> If you want the existing methods of the native String class support Unicode,
> then you have to modify the internal representation of the strings :
> disruptive (for C++ API and external libraries) but good performance : use
> wide char format utf-16 or utf-32.
> conservative (for C++ API and external libraries) but less good perf : use
> utf-8.
> But once you have that, not sure you made a big progress towards Unicode...
That does not sound too bad! :)

> If you want access to the character properties, support locales (date format,
> number format, ...), Unicode in regex, transliteration, etc... then you need
> a library like ICU,
Yes, that would be one of the needed/expected features.

> or Java through bsf4oorex (but you know that already :-).
8-))

Of course, this is my current fallback: using BSF4ooRexx, which by its nature
is available for all operating systems ooRexx is officially built for.

> No need to make the interpreter kernel dependent on ICU (unless you want
> that) : one or several wrapper classes will be enough. Now, if you want all
> these services natively supported by the kernel, then the class String will
> be adapted, and new classes probably added, and the C++ API adapted.
That is probably what I would really be after.

Would you by any chance have an estimate of the size (code-wise, time-wise)
of such an endeavor, knowing that you have a lot of experience in Unicode?

> If you want a GUI which is Unicode-enabled :
> - if you are user of ooDialog, then you have to compile oodialog in
> wide-char mode, and convert to/from utf-16 at the boundaries (except if you
> have an ooRexx kernel with native support for utf-16).
That would be important for Europeans who need to create apps that can handle
all European glyphs at the interface.

> - if you are user of Java GUI, then you have already solved the problem with
> bsf4oorexx (I think you manage the conversion from/to utf-16, right ?)
Yes, that is what I have been doing and recommending so far. (I am using
Java's native code, but also the java.lang.String constructors, which allow
quite some freedom in creating UTF strings behind the curtain.)

> - Other GUI exist, but I can't tell a lot. Gtk+ uses utf-8, QT supports
> Unicode but can't tell more.
Well, the challenge is to use plain ooRexx and create the needed UTF encoding
on all operating systems from ooRexx' strings.

On Linux this is even mandatory if one wishes to automate/remote-control Linux
(as can be done on Windows using OLE) through the little-known DBus transport
service and sends strings as arguments, as I learned while creating an
external ooRexx library to support it.

Again, BSF4ooRexx can serve (and in the case of DBus actually does serve) as a
fallback here.

> Can't tell for sqlite, except what I read in the FAQ (ICU optionally
> supported for case-insensitive comparisons, they don't want to bloat sqlite
> by default).
Yes, but then, in a European context one is almost forced to use Unicode
encodings for strings stored in databases.

> Personally, I don't need (currently) Unicode. But if I wanted to work on a
> concrete subject, I would probably work on wrapper classes for ICU, for
> learning purpose.
Well, it would probably be very important to implement this efficiently in the
ooRexx kernel so as not to lose performance, if that is possible at all.

Again, how much effort would you estimate such an endeavor (incorporating
Unicode into the ooRexx kernel) to be?

---rony





Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?

2013-03-13 Thread Jean-Louis Faucher
2013/3/13 Rony G. Flatscher rony.flatsc...@wu.ac.at


>> I would say : That depends on your needs... Do you have some use cases in
>> mind ?
>
> Well, thinking about working on Unicode encoded (XML) files and finally
> getting the ability to have all glyphs available that are needed for
> European information systems, which also means interfacing with databases
> that are encoding in Unicode.


The usual encoding for XML is utf-8.
In my view there is no need for special support in ooRexx if your goal is to
read the strings as-is and pass them to the database API (except for changing
the encoding, if utf-8 is not supported by the database API).
"All glyphs available": for display, I suppose? Because otherwise, I don't
see a problem here.


>> No need to make the interpreter kernel dependent on ICU (unless you want
>> that) : one or several wrapper classes will be enough. Now, if you want all
>> these services natively supported by the kernel, then the class String will
>> be adapted, and new classes probably added, and the C++ API adapted.
>
> Probably that is what I would be really after.
>
> Would you have any estimates about the size (code-wise, time-wise) of such
> an endeavor by any chance, knowing that you have a lot of experience in
> Unicode?



Sorry Rony, I have no experience with ICU... so no idea :-)

ICU is large, but there is no need to wrap everything. Some services, taken
from the user guide:
- Unicode character properties
- Unicode normalization
- Code page conversion (encoding)
- Locale (language + region)
- Transliteration
- Date and time
- Formatting
- Searching and sorting (collation)
- Text analysis (positions of words, sentences, paragraphs, line wrapping,
regular expressions)
- Text layout (bidi: left to right, right to left)
- ...



> Well, the challenge is to use plain ooRexx and create the needed UTF on
> all operating systems from ooRexx' strings.
>
> On Linux it is even mandatory, if one wishes to automate/remote-control
> Linux (as can be done on Windows using OLE), taking advantage of the widely
> unknown DBus transport service when sending strings as arguments as I
> learned while creating an external ooRexx library to support it.
>
> Again BSF4ooRexx can (and in the case of DBus effectively) serves as a
> fallback here.


So I understand that the main need is to convert from the system default
encoding to utf-8, and vice versa.
ICU is probably not needed here. Each platform supported by ooRexx should
have services to do that (Windows and Linux: yes; others: I don't know).
But that would be an opportunity to analyze the ICU services for code page
conversion in depth.
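
As a rough sketch of that conversion, here it is in Python, whose standard
library already wraps the platform's notion of a default encoding. The
function names to_utf8 and from_utf8 are made up, and cp1252 below is only an
assumed example of a system default encoding.

```python
import locale

def to_utf8(raw, source_encoding=None):
    """Transcode bytes from `source_encoding` (default: system default) to utf-8."""
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding(False)
    # errors="strict" raises UnicodeDecodeError instead of silently corrupting data.
    return raw.decode(source_encoding, errors="strict").encode("utf-8")

def from_utf8(utf8_bytes, target_encoding=None):
    """Transcode utf-8 bytes to `target_encoding` (default: system default)."""
    if target_encoding is None:
        target_encoding = locale.getpreferredencoding(False)
    # Raises UnicodeEncodeError if the target cannot represent a character.
    return utf8_bytes.decode("utf-8").encode(target_encoding, errors="strict")

legacy = b"Gr\xf6\xdfe"              # "Größe" in Windows-1252
utf8 = to_utf8(legacy, "cp1252")
print(utf8)
print(from_utf8(utf8, "cp1252") == legacy)
```

The "vice versa" direction is the one that can fail: a utf-8 string may
contain characters that the system code page cannot represent.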


Jean-Louis


Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?

2013-03-13 Thread Uli Zinngrebe
On Wednesday 13 Mar 2013 10:49:50 Rony G. Flatscher wrote:

(1)
In ASCII, one byte equals one character, but Unicode encodings such as utf-8
have multi-byte characters.
Since the string functions were developed for one-byte characters, they are
likely to show many bugs when applied to UTF-encoded strings.

(2)
The representation of a character in Unicode is not unique, because
different code point sequences can have the same meaning,
e.g. a-Umlaut: the single code point U+00E4 means the same as the letter a
followed by the combining diaeresis U+0308.

This means that before comparing UTF strings for equality, one must normalise
both sequences.
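
Both points can be reproduced in a few lines. The sketch below is in Python,
not ooRexx, and the helper name same_text is made up. For point (2) it uses
the two encodings of a-Umlaut that Unicode defines: the single code point
U+00E4, and the letter a followed by the combining diaeresis U+0308.

```python
import unicodedata

# Point (1): a byte-oriented function can cut a multi-byte character in half.
utf8 = "Käse".encode("utf-8")      # 4 characters, 5 bytes
first_two_bytes = utf8[:2]         # b"K\xc3": "K" plus the first half of "ä"
try:
    first_two_bytes.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

# Point (2): the same visible character, encoded in two different ways.
precomposed = "\u00e4"             # ä as one code point
decomposed = "a\u0308"             # a followed by COMBINING DIAERESIS

def same_text(left, right):
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)

print(len("Käse"), len(utf8), cut_is_valid)
print(precomposed == decomposed, same_text(precomposed, decomposed))
```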

Cheers, Uli

> Subject says it all.
>
> ---rony