Just as a side note to this discussion, we've recently added to utilstr.h:

SWBuf assureValidUTF8(const char *buf);
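For anyone who wants to poke at the byte sequences from the quoted thread without building SWORD, here is a minimal, self-contained sketch of a strict UTF-8 well-formedness check. The function name isValidUtf8 is hypothetical; this is not the SWORD implementation of assureValidUTF8() and not GLib's g_utf8_validate(), just an illustration of the usual rules (lead byte, continuation bytes, no overlong forms, no surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of SWORD or GLib: returns true when every
// byte of `s` belongs to a well-formed UTF-8 sequence.
bool isValidUtf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;      // continuation bytes expected after the lead
        unsigned long cp = 0;       // code point being assembled
        unsigned long lowest = 0;   // smallest code point legal for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        lowest = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
        else return false;          // stray continuation byte or invalid lead

        if (s.size() - i <= extra) return false;   // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < lowest) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        i += extra + 1;
    }
    return true;
}
```

With this sketch, the precomposed form "Bokm\303\245l" and the decomposed form "Bokma\314\212l" both pass, while the ISO-8859-1 bytes "Bokm\345l" are rejected, because \345 (hex E5) looks like the lead byte of a three-byte sequence that never arrives. The SWORD function mentioned above, assureValidUTF8(), is the one of interest here.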
It would be interesting to pass "Bokmål" to assureValidUTF8() and see if it returns the same data. There is a test program you can try in the source tree at:

sword/tests/utf8norm
http://crosswire.org/svn/sword/trunk/tests/utf8norm.cpp

DM Smith wrote:
> On Nov 11, 2009, at 9:59 AM, Karl Kleinpaste wrote:
>
>> DM Smith <[email protected]> writes:
>>> U+00E5 is the unicode code point, not the encoding. In hex the utf-8
>>> encoding would be C3 A5. In ISO-8859-1, it would be E5.
>> XEmacs tells me that the buffer is UTF-8. Manually re-asserting it...
>>
>> M-x set-buffer-file-coding-system RET utf-8 RET
>>
>> ...and re-saving the file makes no change to the content, yet that's
>> exactly the mechanism I've used in the past to convert ISO-8859 to UTF-8.
>>
>>> So I'd suggest looking at a hex dump to see what the encoding is.
>> BTDT. "od -c" of this...
>>
>> # correct: Norwegian Bokmål
>> #nb	Norsk Bokmål
>> # a hack while g_utf8_validate() dislikes 'å': Norwegian Bokmaal
>> nb	Norsk Bokmaal
>>
>> ...produces this...
>>
>> 0007300 o e r o \n # c o r r e c t :
>> 0007320 N o r w e g i a n B o k m 303 245
>> 0007340 l \n # n b \t N o r s k B o k m
>> 0007360 303 245 l \n # a h a c k w h i
>> 0007400 l e g _ u t f 8 _ v a l i d a
>> 0007420 t e ( ) d i s l i k e s ' 303
>> 0007440 245 ' : N o r w e g i a n B o
>> 0007460 k m a a l \n n b \t N o r s k B
>>
>> For a-ring, the character map application observes...
>> C octal escaped UTF-8: \303\245
>> ...so I'm pretty well convinced that the content is right.
>
> You've convinced me. I'm curious as to whether this is a reported GTK bug?
>
> I'm also curious as to whether it handles the decomposed form. The following
> is \141\314\212:
> Bokmål
>
> In Him,
> DM
>
>
> _______________________________________________
> sword-devel mailing list: [email protected]
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
