On Thu, 4 Nov 2004 [EMAIL PROTECTED] wrote:
>Christian,
>
>> On Tue, 2 Nov 2004, Liz Steel wrote:
>> >To clarify: I have a database name with Swedish characters in, which are
>> >converted to multibyte characters, however, the filename that is created
>> >treats each of the characters separately, which then causes problems later.
>> >As an example, the string "Ändrad" is converted to "Ã"ndrad".
>>
>> The code to parse filenames is not UTF8 aware, and so will cause problems
>> when splitting a filename into directory and filename components if the
>> string is a UTF8 string. The offending function appears to be
>> sqlitepager_open in pager.c, which steps backwards through the path name a
>> character at a time looking for the directory separator character, which will
>> obviously be tripped up by a multi-byte character.
>
>I wonder if you could add some explanation for your comments above. UTF-8
>is a special unicode encoding that contains no null characters, preserves
>the ascii code range verbatim, and does not include any characters that
>"look like" ascii characters. That is to say, each byte is either an ascii
>character (0-127) or is in the high byte range (128-255) and therefore
>can't be confused with an ascii character. I would have thought that any
>special path characters (eg, '/', '\'...) would be a subset of the ascii
>range and therefore require no special unicode-aware handling. The
>function that sqlite calls to actually create the file, on the other hand,
>would have to be unicode-aware for such filenames to work.

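To make the point quoted above concrete, here is a minimal sketch of a
backwards, byte-at-a-time scan for the directory separator. It is not the
actual sqlitepager_open code; the helper name last_separator is made up for
illustration, and it assumes '/' is the only separator of interest.

```c
#include <string.h>

/* Hypothetical helper (not from pager.c): step backwards through zPath
** one byte at a time and return the index of the last '/' byte, or -1
** if the path contains no '/'.
**
** Every byte of a multi-byte UTF-8 sequence has its high bit set
** (value 128-255), so it can never compare equal to '/' (0x2F). The
** scan therefore needs no knowledge of character boundaries. */
static int last_separator(const char *zPath){
  int i;
  for(i=(int)strlen(zPath)-1; i>=0; i--){
    if( zPath[i]=='/' ) return i;
  }
  return -1;
}
```

For the path "/tmp/\xC3\x84ndrad.db" (the name "Ändrad.db" with its first
letter encoded as the two UTF-8 bytes C3 84), the scan returns 4, the index
of the second '/', and the bytes after it are the intact UTF-8 filename.
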
Of course, you are correct. I wrote the above before checking how UTF-8
encoding actually works in detail. My only previous i18n experience was with
a commercial library which did not assume UTF-8, and provided functions to
step through the characters of a string, whether they be UTF-8 or some other
encoding.

I thought I'd sent a correction to the list myself. It appears I didn't, so
I stand corrected now. Sorry for any confusion.

>
>Benjamin
>

Christian

-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \