On 8/5/06, Cory Nelson <[EMAIL PROTECTED]> wrote:
On 8/4/06, Nuno Lucas <[EMAIL PROTECTED]> wrote:
> On 8/4/06, Cory Nelson <[EMAIL PROTECTED]> wrote:
> > IE, using memcmp() to compare strings.  I've been bitten by this
> > before, with SQLite producing unexpected results when using UTF-8.
> > Using UTF-16 has worked more reliably in my experience.
>
> SQLite only knows how to sort ASCII, so memcmp does that right (whether
> it is UTF-8 or UTF-16).
>
> If you think about it, the only way sorting will work 100% is by
> having some form of localization (because for each language different
> sorting rules apply, _even_ for words composed only of ASCII
> characters).
>
> Adding localization to SQLite is out of the question (it would
> probably need a library as big as SQLite itself), so it's up to the
> user to define its own localization functions and integrate them with
> sqlite (there are all the necessary hooks ready for that).

I was not talking about sorting in my post - I've had simple = index
comparisons fail in UTF-8.

You should have reported it. If it's true, it's a bug that needs to be
corrected.
But then again, I have never found a bug like that in SQLite.

But, since you brought it up - I have no expectations of SQLite
integrating a full Unicode locale library, however it would be a great
improvement if it would respect the current locale and use wcs*
functions when available, or at least order by standard Unicode order
instead of completely mangling things on UTF-8 codes.

If it respected the current locale, the database would become invalid
after being moved to, or used in, another locale (the affected indexes
would need to be rebuilt). With the COLLATE mechanism (which I have
never used, precisely because of the problem above) you can define your
own sort function that does what you want.
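
As a rough illustration of the COLLATE route, here is a minimal sketch
using Python's sqlite3 wrapper. The collation name demo_fold, the table
t and the folding rule (NFKD plus case folding) are assumptions made up
for this example, not anything SQLite ships with. Any index built with
such a collation is only usable by connections that register the same
function, which is the portability problem mentioned above.

```python
import sqlite3
import unicodedata


def casefold_collation(a: str, b: str) -> int:
    """Compare two strings after NFKD normalization and case folding.

    Returns a negative, zero or positive number, as SQLite expects.
    This is an illustrative rule, not a real locale-aware comparison.
    """
    ka = unicodedata.normalize("NFKD", a).casefold()
    kb = unicodedata.normalize("NFKD", b).casefold()
    return (ka > kb) - (ka < kb)


def sorted_names(names):
    """Insert names into an in-memory table and read them back sorted."""
    conn = sqlite3.connect(":memory:")
    # Register the hypothetical collation under the name "demo_fold".
    conn.create_collation("demo_fold", casefold_collation)
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT name FROM t ORDER BY name COLLATE demo_fold"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

With the default binary comparison, "Banana" would sort before "apple"
because uppercase ASCII letters have lower byte values; with the
collation above the order follows the folded text instead.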

On the second point, you may be right, and it could be considered a
bug. A sorted table should have exactly the same order whether the
database is using UTF-8 or UTF-16 internally (even if it doesn't follow
the Unicode order). At the very least, the consistency of query results
should be assured here.
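
To show how the two encodings can disagree, here is a small sketch that
sorts two strings by their raw encoded bytes, the way a memcmp()-style
comparison would. It assumes a plain byte-wise comparison and big-endian
UTF-16; the helper name byte_order and the two sample characters are
picked only for this illustration.

```python
def byte_order(strings, encoding):
    """Sort strings by their encoded bytes, like a memcmp()-style compare."""
    return sorted(strings, key=lambda s: s.encode(encoding))


bmp_char = "\uff5e"         # U+FF5E, a BMP character above the surrogate range
astral_char = "\U00010000"  # U+10000, stored as a surrogate pair in UTF-16

# UTF-8 bytes: EF BD 9E versus F0 90 80 80
utf8_order = byte_order([astral_char, bmp_char], "utf-8")

# UTF-16BE bytes: FF 5E versus D8 00 DC 00
utf16_order = byte_order([astral_char, bmp_char], "utf-16-be")
```

In UTF-8 the byte order matches code point order, so U+FF5E comes first.
In UTF-16 the surrogate pair starts with D8, which sorts below FF, so
U+10000 comes first and the two results are reversed.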

Maybe others have another point of view...


Regards,
~Nuno Lucas
