On Thu, 4 Nov 2004 [EMAIL PROTECTED] wrote:

>Christian,
>
>> On Tue, 2 Nov 2004, Liz Steel wrote:
>> >To clarify: I have a database name with Swedish characters in, which are
>> >converted to multibyte characters, however, the filename that is created
>> >treats each of the characters separately, which then causes problems later.
>> >As an example, the string "Ändrad" is converted to "Ã"ndrad".
>>
>> The code to parse filenames is not UTF8 aware, and so will cause problems
>> when splitting a filename into directory and filename components if the
>> string is a UTF8 string. The offending function appears to be
>> sqlitepager_open in pager.c, which steps backwards through the path name a
>> character at a time looking for the directory separator character, which will
>> obviously be tripped up by a multi-byte character.
>
>I wonder if you could add some explanation for your comments above. UTF-8
>is a special unicode encoding that contains no null characters, preserves
>the ascii code range verbatim, and does not include any characters that
>"look like" ascii characters. That is to say, each byte is either an ascii
>character (0-127) or is in the high byte range (128-255) and therefore
>can't be confused with an ascii character. I would have thought that any
>special path characters (eg, '/', '\'...) would be a subset of the ascii
>range and therefore require no special unicode-aware handling. The
>function that sqlite calls to actually create the file, on the other hand,
>would have to be unicode-aware for such filenames to work.


Of course, you are correct. I wrote the above before checking how UTF-8
encoding actually works in detail.
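
For the archive, here is a minimal sketch of why a byte-at-a-time backward
scan for the separator is safe on a UTF-8 path. It is written in Python for
brevity rather than the C used in pager.c, and split_path is a hypothetical
helper for illustration only, not the actual sqlitepager_open code. The path
"/tmp/Ändrad.db" is likewise just an assumed example.

```python
def split_path(path_bytes, sep=b"/"):
    """Split a path at its last separator, scanning backwards byte by byte.

    Hypothetical helper: it mimics a naive, non-UTF-8-aware scan.
    """
    i = len(path_bytes) - 1
    # Step backwards one byte at a time until the separator byte is found.
    while i >= 0 and path_bytes[i:i + 1] != sep:
        i -= 1
    # If no separator exists, i is -1 and the directory part is empty.
    return path_bytes[:i + 1], path_bytes[i + 1:]


# Assumed example path containing a multi-byte character.
path = "/tmp/Ändrad.db".encode("utf-8")
directory, filename = split_path(path)

# Every byte of a multi-byte UTF-8 sequence is in the range 0x80-0xFF,
# so none of them can compare equal to '/' (0x2F) or '\' (0x5C).
multibyte = list("Ä".encode("utf-8"))
```

The scan never stops inside the two-byte encoding of "Ä", so the directory
and filename parts come out intact. Whether the file is then created with the
right name depends on the OS-level call interpreting those bytes as UTF-8.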

My only previous i18n experience was with a commercial library which did
not assume UTF-8, and provided functions to step through the characters
of a string, whether they be UTF-8 or some other encoding.

I thought I'd sent a correction to the list myself. It appears I didn't,
so I stand corrected now. Sorry for any confusion.


>
>Benjamin
>

Christian

-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \
