On Thu, 2003-10-23 at 13:31, Wayne Venables wrote:
> Unfortunately that still means there is a performance hit converting all 
> data in and out of the library from UTF-8 to UCS16.  A large number of 
> operating systems and programming languages store strings natively as UCS16.

If you're actually writing a portable application, you'll be happy to
know that your last statement may not be true: most operating systems
and programming languages do NOT store strings "natively as UCS16". I'm
aware of very few operating systems or languages that use a double-byte
Unicode encoding as their native character set. Even if you only meant
that more than 1% of operating systems and languages store strings
natively as UCS16, you'd still be incorrect.

That said, you're probably not writing a portable application. You can
trade space for speed by storing the UTF-8 form for sorting and
collating, and keeping a sqlite_binary_encode'd UCS-16 form in an
additional column. If your catalog is constant but the order of records
isn't, consider storing the UCS-16 strings in a constant database (or
rebuilding it periodically as a cache; google for CDB for source).
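
A minimal sketch of the two-column idea, in Python for brevity. It
assumes Python's bundled sqlite3 module, which stores byte strings as
BLOBs directly, so the sqlite_binary_encode step is not shown. The
table and column names are made up for illustration, and "utf-16-le"
stands in for whatever 16-bit form your application wants.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory table that holds every string twice."""
    conn = sqlite3.connect(":memory:")
    # name_utf8 is what the database sorts and collates on;
    # name_utf16 is the ready-to-use 16-bit form (hypothetical names).
    conn.execute("CREATE TABLE catalog (name_utf8 TEXT, name_utf16 BLOB)")
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?)",
        [(s, s.encode("utf-16-le")) for s in rows],
    )
    return conn


def sorted_utf16(conn):
    """Order by the UTF-8 column but return only the 16-bit bytes."""
    cur = conn.execute("SELECT name_utf16 FROM catalog ORDER BY name_utf8")
    return [row[0] for row in cur]
```

The caller never converts on the way out; the cost is paid once at
insert time, plus the extra storage for the second column.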

Another option (instead of using sqlite_binary_encode) is to select a
code point that occurs nowhere in the text you'll be using (I used one
of the user-defined code points); any repeating pair of octets will do.
Then XOR your string with it before storing and again after fetching.
This avoids keeping null bytes in the stored data. You'll still need to
keep it in a separate column, as collating and sorting won't work on
the XORed form (unless, of course, you don't need these things).
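
Roughly, the XOR trick could look like the sketch below (Python again;
the function names and the key 0xE0E0, a code point in the private-use
range, are assumptions for illustration, not what I actually ran). Note
that the check is per octet: a key octet that matches an octet of the
text at the same position still produces a null byte, so the sketch
refuses such input instead of storing it.

```python
def xor_units(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating two-octet key (applying it twice undoes it)."""
    if len(key) != 2 or len(data) % 2:
        raise ValueError("expected a 2-octet key and an even-length buffer")
    return bytes(b ^ key[i % 2] for i, b in enumerate(data))


def mask_for_storage(text: str, key: bytes = b"\xe0\xe0") -> bytes:
    """Encode text as 16-bit units and XOR it so no null bytes remain."""
    masked = xor_units(text.encode("utf-16-be"), key)
    if 0 in masked:
        # Some octet of the text equals the key octet at that position.
        raise ValueError("key collides with the text; pick another key")
    return masked


def unmask(masked: bytes, key: bytes = b"\xe0\xe0") -> str:
    """Reverse mask_for_storage after fetching the column."""
    return xor_units(masked, key).decode("utf-16-be")
```

With this key, plain ASCII text masks cleanly, while a string containing
U+00E0 is rejected because its low octet equals the key octet.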

