Rony,

Off the top of my head, I would say that this is a product of how you are displaying the string, and/or creating the string, not a problem with String(). <grin>

On Sat, Aug 1, 2009 at 8:39 AM, Rony G. Flatscher<rony.flatsc...@wu-wien.ac.at> wrote:

> If some const *char string contains UTF chars, and creating a Rexx string of
> it, then it seems that the UTF chars get changed to question marks, rather
> than letting them untouched.

It's hard for me to follow what you are saying here, and I assume you are talking about using the native API, but let's go down to your example below.

> where:
> ├ñ (umlaut a) is actual the sequence of the two chars "0xffffffc3" followed
> by "0xffffffa4"
> ├ (umlaut u) is actual the sequence of the two chars "0xffffffc3" followed
> by "0xffffffb6"

The docs for the native APIs, for the String() functions, always specify that it is an ASCII-Z string. "0xffffffc3" followed by "0xffffffa4" is not two chars; each one is 4 chars (0xff 0xff 0xff 0xc3, and so on). So, let's take the sequence of chars:

  0xffffffc3 0xffffffa4 0xffffffc3 0xffffffb6

with a space to make it readable, but no space in the actual sequence of chars. Here is the print out I get:

  Got retStr: ├ ñ ├ ╢
  Length: 16
  char at 1 in decimal: 255
  char at 2 in decimal: 255
  char at 3 in decimal: 255
  char at 4 in decimal: 195
  char at 5 in decimal: 255
  char at 6 in decimal: 255
  char at 7 in decimal: 255
  char at 8 in decimal: 164
  char at 9 in decimal: 255
  char at 10 in decimal: 255
  char at 11 in decimal: 255
  char at 12 in decimal: 195
  char at 13 in decimal: 255
  char at 14 in decimal: 255
  char at 15 in decimal: 255
  char at 16 in decimal: 182

Which is exactly correct. Here is the Rexx code:

  retStr = rbMale~test(self)
  say 'Got retStr:' retStr
  say 'Length: ' retStr~length
  do i = 1 to retStr~length
    say 'char at' i "in decimal:" retStr~substr(i, 1)~c2d
  end

In the above, rbMale is an ooDialog radio button. I have a 'test' method written in the native API, to return a string.
Like so:

  RexxMethod1(RexxStringObject, bc_test, RexxObjectPtr, obj)
  {
      RexxMethodContext *c = context;

      // 16 data chars followed by the terminating 0x00.
      char retStr[17] = {0xff, 0xff, 0xff, 0xc3, 0xff, 0xff, 0xff, 0xa4,
                         0xff, 0xff, 0xff, 0xc3, 0xff, 0xff, 0xff, 0xb6, 0x00};
      char *str = retStr;

      // The two commented-out forms were tried as well.
      //RexxStringObject result = c->String(retStr, 16);
      //RexxStringObject result = c->String(retStr);
      RexxStringObject result = c->CString(retStr);
      return result;
  }

Any of the forms of the String() API above produces the same output.

> Using the ooRexx CString(str) function to create a Rexx string of such
> strings yields a string that displays two consecutive question marks (??)
> for each such umlaut.

Depending on how, or where, you are displaying this, it is probably what the display 'thing' is doing: it replaces unprintable characters with question marks. But, as my output above shows, if you look at what each character in the returned string is, converting it to decimal to be sure of what it is, it is exactly what it should be.

> It seems that the conversion destroys the embedded UTF
> chars. I would have expected that CString(str) would leave those chars
> untouched

It does leave the chars untouched. <grin>

> (such that one could feed such chars back to Java such that
> conversion to UTF would yield the original string on the Java side).

You can certainly do this, but it takes some forethought. For one thing, you have to make sure there is no 0x00 in the middle of the string. But even if there is one, the String(str, len) form will work for you, as long as you know the actual length.

--
Mark Miesfeld
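
P.S. To make the point about an embedded 0x00 concrete, here is a rough sketch in plain C. It deliberately does not use the ooRexx headers. The names asciiz_length(), dump_bytes(), demo() and the sample bytes are made up for illustration and are not part of the native API. It only shows why an ASCII-Z reader and a reader that is given the length can disagree about the same buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration; none of this is ooRexx API. */

/* UTF-8 bytes for a-umlaut (0xc3 0xa4), an embedded 0x00, then
   o-umlaut (0xc3 0xb6). Assumed sample data, not taken from the thread. */
static const unsigned char sample[] = {0xc3, 0xa4, 0x00, 0xc3, 0xb6};

/* Length seen by anything that treats buf as an ASCII-Z string:
   strlen() stops at the first 0x00 byte. */
static size_t asciiz_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Print every byte in decimal, like the Rexx c2d loop, using the
   caller-supplied length. Returns the number of bytes printed. */
static size_t dump_bytes(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        printf("char at %lu in decimal: %u\n",
               (unsigned long)(i + 1), (unsigned)buf[i]);
    return i;
}

/* Compare the two views of the same buffer. */
static void demo(void)
{
    printf("ASCII-Z length: %lu\n",
           (unsigned long)asciiz_length((const char *)sample));
    printf("Actual length:  %lu\n", (unsigned long)sizeof sample);
    dump_bytes(sample, sizeof sample);
}
```

Calling demo() from a main() reports an ASCII-Z length of 2 against an actual length of 5. That is the situation where I would expect you to need the String(str, len) form with the actual length, rather than one of the ASCII-Z forms.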
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel