[Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?
Subject says it all. ---rony -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_mar ___ Oorexx-devel mailing list Oorexx-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/oorexx-devel
Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?
Hi Rony,

I would say: that depends on your needs. Do you have some use cases in mind?

If you want the existing methods of the native String class to support Unicode, then you have to modify the internal representation of the strings:
- disruptive (for the C++ API and external libraries) but good performance: use a wide-char format, utf-16 or utf-32.
- conservative (for the C++ API and external libraries) but lower performance: use utf-8.
But once you have that, I am not sure you have made much progress towards Unicode.

If you want access to character properties, locale support (date format, number format, ...), Unicode in regex, transliteration, etc., then you need a library like ICU, or Java through BSF4ooRexx (but you know that already :-). There is no need to make the interpreter kernel dependent on ICU (unless you want that): one or several wrapper classes will be enough. Now, if you want all these services natively supported by the kernel, then the String class will have to be adapted, new classes probably added, and the C++ API adapted.

If you want a GUI which is Unicode-enabled:
- if you are a user of ooDialog, then you have to compile ooDialog in wide-char mode and convert to/from utf-16 at the boundaries (unless you have an ooRexx kernel with native support for utf-16).
- if you are a user of a Java GUI, then you have already solved the problem with BSF4ooRexx (I think you manage the conversion from/to utf-16, right?)
- Other GUIs exist, but I can't say a lot. Gtk+ uses utf-8; Qt supports Unicode, but I can't say more.

I can't say for sqlite, except what I read in the FAQ (ICU is optionally supported for case-insensitive comparisons; they don't want to bloat sqlite by default).

Personally, I don't need Unicode (currently). But if I wanted to work on a concrete subject, I would probably work on wrapper classes for ICU, for learning purposes.

Jean-Louis

2013/3/13 Rony G. Flatscher rony.flatsc...@wu.ac.at
> Subject says it all.
> ---rony
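To make the trade-off between the candidate internal representations more concrete, here is a minimal sketch. It is written in Python purely for illustration (the thread itself contains no code, and nothing here reflects how ooRexx actually stores strings); the sample string and the helper name `encoded_sizes` are hypothetical.

```python
# Illustration only: compare how many bytes the same text needs in the
# three encodings mentioned above. `encoded_sizes` is a hypothetical helper.

def encoded_sizes(text: str) -> dict:
    """Return the number of code points and the encoded size in bytes."""
    return {
        "code_points": len(text),
        # utf-8: 1 to 4 bytes per code point, ASCII stays at 1 byte
        "utf-8": len(text.encode("utf-8")),
        # utf-16: 2 bytes per code point, 4 for code points above U+FFFF
        # (the "-le" variant avoids counting a byte order mark)
        "utf-16": len(text.encode("utf-16-le")),
        # utf-32: always 4 bytes per code point
        "utf-32": len(text.encode("utf-32-le")),
    }

sample = "Gr\u00fc\u00dfe \u20ac"   # "Grüße €": 7 code points
sizes = encoded_sizes(sample)
print(sizes)
```

With utf-32 (and with utf-16 as long as no character lies outside the Basic Multilingual Plane) the n-th character sits at a fixed offset, whereas with utf-8 it has to be found by scanning. That is presumably the performance difference referred to above.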
Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?
Bonsoir Jean-Louis,

thank you very much for your information!

On 13.03.2013 13:04, Jean-Louis Faucher wrote:
> I would say: that depends on your needs. Do you have some use cases in mind?

Well, I am thinking about working on Unicode-encoded (XML) files and finally having available all the glyphs that are needed for European information systems, which also means interfacing with databases that are encoded in Unicode.

> If you want the existing methods of the native String class to support Unicode, then you have to modify the internal representation of the strings: disruptive (for the C++ API and external libraries) but good performance: use a wide-char format, utf-16 or utf-32. Conservative (for the C++ API and external libraries) but lower performance: use utf-8. But once you have that, I am not sure you have made much progress towards Unicode.

That does not sound too bad! :)

> If you want access to character properties, locale support (date format, number format, ...), Unicode in regex, transliteration, etc., then you need a library like ICU,

Yes, that would be one of the needed/expected features.

> or Java through BSF4ooRexx (but you know that already :-).

8-)) Of course, this is my current fallback using BSF4ooRexx, which by its nature is available for all operating systems ooRexx is officially built for.

> There is no need to make the interpreter kernel dependent on ICU (unless you want that): one or several wrapper classes will be enough. Now, if you want all these services natively supported by the kernel, then the String class will have to be adapted, new classes probably added, and the C++ API adapted.

Probably that is what I would really be after. Would you by any chance have an estimate of the size (code-wise, time-wise) of such an endeavor, knowing that you have a lot of experience in Unicode?
> If you want a GUI which is Unicode-enabled:
> - if you are a user of ooDialog, then you have to compile ooDialog in wide-char mode and convert to/from utf-16 at the boundaries (unless you have an ooRexx kernel with native support for utf-16).

That would be important for Europeans who need to create apps that can handle all European glyphs at the interface.

> - if you are a user of a Java GUI, then you have already solved the problem with BSF4ooRexx (I think you manage the conversion from/to utf-16, right?)

Yes, that is what I have been doing and recommending so far. (I am using Java's native code, but also the java.lang.String constructors, which allow quite some freedom in creating UTF strings behind the curtain.)

> - Other GUIs exist, but I can't say a lot. Gtk+ uses utf-8; Qt supports Unicode, but I can't say more.

Well, the challenge is to use plain ooRexx and create the needed UTF on all operating systems from ooRexx' strings. On Linux it is even mandatory if one wishes to automate/remote-control Linux (as can be done on Windows using OLE) by taking advantage of the little-known DBus transport service and sending strings as arguments, as I learned while creating an external ooRexx library to support it. Again, BSF4ooRexx can serve (and in the case of DBus effectively does serve) as a fallback here.

> I can't say for sqlite, except what I read in the FAQ (ICU is optionally supported for case-insensitive comparisons; they don't want to bloat sqlite by default).

Yes, but then, in a European context one is almost forced to use Unicode encodings for strings stored in databases.

> Personally, I don't need Unicode (currently). But if I wanted to work on a concrete subject, I would probably work on wrapper classes for ICU, for learning purposes.

Well, it would probably be very important to implement this efficiently in the ooRexx kernel so as not to lose performance, if that is possible at all.
Again, how much effort would you estimate such an endeavor (incorporating Unicode into the ooRexx kernel) to be?

---rony
Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?
2013/3/13 Rony G. Flatscher rony.flatsc...@wu.ac.at
>> I would say: that depends on your needs. Do you have some use cases in mind?
> Well, thinking about working on Unicode encoded (XML) files and finally getting the ability to have all glyphs available that are needed for European information systems, which also means interfacing with databases that are encoding in Unicode.

The favorite encoding for XML is utf-8. To me, there is no need for special support in ooRexx if your goal is to read the strings as-is and pass them to the database API (except for changing the encoding, if utf-8 is not supported by the database API).

"All glyphs available": for display, I suppose? Because otherwise, I don't see a problem here.

>> No need to make the interpreter kernel dependent on ICU (unless you want that): one or several wrapper classes will be enough. Now, if you want all these services natively supported by the kernel, then the class String will be adapted, and new classes probably added, and the C++ API adapted.
> Probably that is what I would be really after. Would you have any estimates about the size (code-wise, time-wise) of such an endeavor by any chance, knowing that you have a lot of experience in Unicode?

Sorry Rony, I have no experience with ICU... so no idea :-)
ICU is large, but there is no need to wrap everything. Some services, taken from the user guide:
- Unicode character properties
- Unicode normalization
- Code page conversion (encoding)
- Locale (language + region)
- Transliteration
- Date and time
- Formatting
- Searching and sorting (collation)
- Text analysis (positions of words, sentences, paragraphs, line wrapping, regular expressions)
- Text layout (bidi: left to right, right to left)
- ...

> Well, the challenge is to use plain ooRexx and create the needed UTF on all operating systems from ooRexx' strings. On Linux it is even mandatory, if one wishes to automate/remote-control Linux (as can be done on Windows using OLE), taking advantage of the widely unknown DBus transport service when sending strings as arguments as I learned while creating an external ooRexx library to support it. Again BSF4ooRexx can (and in the case of DBus effectively) serves as a fallback here.

So I understand that the main need is to convert from the system default encoding to utf-8, and vice versa. ICU is probably not needed here: each platform supported by ooRexx should have services to do that (Windows and Linux: yes. Others? I don't know). But that would be an opportunity to analyze the ICU services for code page conversion in depth.

Jean-Louis
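The conversion between a system default encoding and utf-8 described above can be sketched as follows. This is only an illustration in Python, not ooRexx or ICU code: the helper `recode` is hypothetical, and "cp1252" is merely an assumed example of a system default encoding.

```python
# Illustration only: convert text between a legacy single-byte code page
# and utf-8. `recode` is a hypothetical helper, and "cp1252" is only an
# assumed example of a "system default encoding".

def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source).encode(target)

legacy = b"Stra\xdfe"                        # "Straße" in cp1252
as_utf8 = recode(legacy, "cp1252", "utf-8")  # code page -> utf-8
round_trip = recode(as_utf8, "utf-8", "cp1252")  # utf-8 -> code page

print(as_utf8)               # b'Stra\xc3\x9fe'
print(round_trip == legacy)  # True
```

The direction from utf-8 back to a code page is the lossy one: any character that the target code page does not contain cannot be represented, and in Python the `encode` call raises `UnicodeEncodeError` unless a replacement policy is chosen.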
Re: [Oorexx-devel] What is needed to get Unicode support from/with ooRexx ?
On Wednesday 13 Mar 2013 10:49:50 Rony G. Flatscher wrote:
> Subject says it all.
> ---rony

(1) For ASCII, one byte equals one character, but Unicode has multi-byte characters. Since the string functions were developed for ASCII, they can be expected to show many bugs when applied to UTF.

(2) The byte representation of a UTF character is not unique, because different sequences can have the same meaning, e.g. a-Umlaut: "a" followed by a combining diaeresis means the same as the single precomposed character "ä". This means that before comparing UTF characters for equality, one must normalise the byte sequence.

Cheers,
Uli
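Both points can be checked with a short sketch. It uses Python's standard `unicodedata` module purely as an illustration (ooRexx itself is not involved), and the variable names are made up for this example.

```python
import unicodedata

precomposed = "\u00e4"   # a-Umlaut as a single code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# Point (1): in utf-8, byte length and character count differ.
byte_len = len(precomposed.encode("utf-8"))  # 2 bytes
char_len = len(precomposed)                  # 1 code point

# Point (2): the two spellings compare unequal until both are normalised.
raw_equal = precomposed == decomposed
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(byte_len, char_len, raw_equal, nfc_equal)  # 2 1 False True
```

NFC composes to the precomposed form where one exists; NFD would work equally well for comparison, as long as both operands use the same normalisation form.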