DM Smith wrote:
> The thing I noticed in Sword's ICU filters is that it was not consistent
> in how it set up the UChar array or converted that back to a SWBuf.

Thanks for digging through everything. I will try to make things a little more consistent once I get UTF8NFC debugged.

> The setup may be wrong:
>     int32_t len = text.length() * 2;
>     source = new UChar[len + 1];
>     len = ucnv_toUChars(conv, source, len, text.c_str(), -1, &err);

Yes, that's where I'm focusing my attention.

> Many of the filters just use text.length(), one uses text.length()*2+1,
> another 5+text.length()*5 and only this one uses text.length()*2.

Well, here are some guesses as to where these might have come from:

X+1 is probably making room for a null terminator (probably unnecessary, since everything is null-terminated to begin with).

X*2 could be either doubling the byte size to accommodate conversion from 8-bit chars to 16-bit chars, OR an acknowledgment that characters we encounter might be represented as surrogate pairs in UTF-16. (ICU uses UTF-16 internally.)

X*5 is probably allowing for expansion from a character to its UTF-8 representation, which the author presumably took to be at most 5 bytes long. (Current UTF-8 tops out at 4 bytes per code point; the older definition allowed up to 6.)

I'll get it all sorted out eventually, but those are what those numbers probably represent.

I had a bit of difficulty getting BCB5 installed and working in Vista, but I think I've got everything running well enough for the moment that I can get to work on this.

--Chris
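
As a sanity check on the buffer-size guesses above, here is a minimal sketch (not Sword or ICU code) that counts how many UTF-16 code units a UTF-8 string needs. The helper names utf16UnitsNeeded, utf16CapacityFor and checkBound are hypothetical, and the counting assumes the input is well-formed UTF-8. Under that assumption the UTF-16 length never exceeds the UTF-8 byte count, which suggests text.length() + 1 would already be enough room for the UChar array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-16 code units needed to hold a
// well-formed UTF-8 string (terminator not included).
std::size_t utf16UnitsNeeded(const std::string &utf8) {
    std::size_t units = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80) {
            continue;                  // continuation byte: same code point
        }
        // A 4-byte lead (0xF0 and up) encodes a code point above U+FFFF,
        // which UTF-16 stores as a surrogate pair; all others take one unit.
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units;
}

// Hypothetical helper: a UChar capacity that is always sufficient for
// well-formed input, namely one unit per UTF-8 byte plus a terminator.
std::size_t utf16CapacityFor(const std::string &utf8) {
    return utf8.length() + 1;
}

// Checks the bound on one sample each of 1-, 2-, 3- and 4-byte sequences.
void checkBound() {
    for (const std::string s : {"abc", "\xC3\xA9", "\xE2\x82\xAC",
                                "\xF0\x9D\x84\x9E"}) {
        assert(utf16UnitsNeeded(s) <= s.length());
    }
}
```

The worst case for the bound is plain ASCII, where every byte becomes exactly one 16-bit unit. Multi-byte sequences only shrink the count, since a 4-byte sequence turns into just two units.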
