Thanks for clearing that up. I took a look at the log for the file and saw that tridge expected the 'len' argument to init_unistr2() to be the character length, not the byte length of the string. So it appears the callers will have to be fixed, not the function as I thought.
It would be good to have a function that calculates the character length after conversion to UCS2, since that is much cheaper to compute (byte length / 2) than counting characters in a multi-byte charset. Maybe there already is one; I need to take a look.

Thanks,
Shirish

On Fri, 14 Feb 2003, Gerald (Jerry) Carter wrote:

> On Thu, 13 Feb 2003, Shirish Kalele wrote:
>
> > > > In init_unistr2, the string length for the UNISTR2 structure seems to
> > > > be set equal to the number of bytes occupied by the string when
> > > > encoded in the Unix charset (i.e. the value returned by strlen()).
> > > > This is not necessarily the number of characters in the string (given
> > > > UTF-8 and other variable-byte charsets).
> > > >
> > > > Shouldn't this actually be set to half the number of bytes occupied
> > > > by the string after encoding it in UCS2? Here's a patch that does
> > > > this.
> > >
> > > I think you might get into trouble here due to differences in the MS
> > > unicode marshalling "flexibility".
> >
> > I don't understand. Could you elaborate?
>
> i guess if (length_of_bytes_in_orig_string != num_character_in_string)
> then we would have a problem. Had to think through this a bit.
>
> I think I misunderstood you to start with. I thought we were talking
> about UNISTR2 length == num_characters. My point was that sometimes this
> is actually == num_characters*2 (as you mentioned).