On Sat, Sep 19, 2009 at 12:14:37AM +0100, Simon Slavin scratched on the wall:
> 
> On 18 Sep 2009, at 9:57pm, Igor Tandetnik wrote:
> 
> > Simon Slavin
> > <slav...@hearsay.demon.co.uk> wrote:
> >> On 18 Sep 2009, at 9:07pm, Roger Binns wrote:
> >>
> >>> Simon Slavin wrote:
> >>>> * Unicode support from the ground up
> >>>
> >>> SQLite already has "unicode support from the ground up".  Try using
> >>> non-Unicode strings and you'll see!
> >>
> >> SQLite's indexing correctly understands how to order Unicode
> >> strings ?
> >
> > With ICU extension enabled and correct collation specified, yes. Note
> > that the correct ordering of Unicode strings is locale-dependent.
> 
> Okay.  So I create an indexed database in one locale.  I have a  
> thousand records in there.  The indexes are created using the locale I  
> set.  I then send a copy of this database to a client in another  
> place, and the client has different locale settings.  The client adds  
> another thousand records with their locale settings.  What happens  
> when I use WHERE clauses with '<' or '>' ?  Does the system vaguely  
> work, or does it get a mess ?

  You get a mess.  The locale is what defines the "rules" of the
  language and the meaning of the symbols that make up the string.
  This example is no different than if you added a bunch of records
  under NOCASE, then somehow changed the collation to be case sensitive
  and added a bunch more.  Your indexes are likely bogus because you
  changed the underlying assumptions.
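
  A minimal sketch of why that matters for '<' and '>', using Python's
  built-in sqlite3 module and the two stock collations BINARY and NOCASE.
  The table name "people" and the sample rows are made up for
  illustration.  The same WHERE clause selects different rows depending
  on which collation is in force, so an index built under one set of
  rules cannot be trusted under the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("apple",), ("Banana",), ("cherry",), ("Avocado",)],
)

# Same comparison, two different rule sets.
binary_hits = sorted(
    r[0] for r in conn.execute(
        "SELECT name FROM people WHERE name < 'b' COLLATE BINARY")
)
nocase_hits = sorted(
    r[0] for r in conn.execute(
        "SELECT name FROM people WHERE name < 'b' COLLATE NOCASE")
)

# Under BINARY, uppercase 'B' sorts before lowercase 'b', so "Banana"
# matches.  Under NOCASE it is compared as "banana", which is not
# less than "b".
print(binary_hits)
print(nocase_hits)
```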

  And that's exactly correct.  If you have one language where 'k' comes
  before 'm' and another language where it doesn't, what exactly do you
  expect to happen when you ask for a "sorted" column of strings?  Does
  'k' come before 'm' or not?  You have to pick a set of rules-- i.e. a
  locale.
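
  To make the 'k' versus 'm' question concrete, here is a rough sketch
  with a made-up collation registered through Python's
  sqlite3.Connection.create_collation().  The collation name
  "m_before_k", the table "words", and the rule itself are invented for
  the example and do not correspond to any real locale.

```python
import sqlite3

# Hypothetical rule set: pretend 'm' sorts where 'k' normally does and
# vice versa, by swapping the two letters before comparing.
SWAP = str.maketrans("km", "mk")

def m_before_k(a, b):
    ka, kb = a.translate(SWAP), b.translate(SWAP)
    return (ka > kb) - (ka < kb)  # negative, zero, or positive

conn = sqlite3.connect(":memory:")
conn.create_collation("m_before_k", m_before_k)
conn.execute("CREATE TABLE words (w TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("kiwi",), ("lychee",), ("melon",)],
)

default_order = [r[0] for r in conn.execute(
    "SELECT w FROM words ORDER BY w")]
swapped_order = [r[0] for r in conn.execute(
    "SELECT w FROM words ORDER BY w COLLATE m_before_k")]

print(default_order)
print(swapped_order)
```

  Neither ordering is "the" sorted column; each is correct only with
  respect to the rules it was asked to use.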

  I suppose you could record the locale for each specific string,
  essentially extending the "type" of the string to a string-locale.  But
  all that buys you is the ability to group similar locales together, in
  the same way that SQLite sorts integers, floats, strings, and BLOB
  types together, sorting them internally but considering them otherwise
  unmixable.  You still need to pick some arbitrary ordering of those
  types, and you're likely to get some very odd results if you do that
  with locales.
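
  For reference, a small sketch of the existing cross-type ordering,
  again using Python's sqlite3 module.  The table "v" and its values are
  invented for the example, and the column is declared without a type so
  that values are stored as given.  As far as I know, SQLite compares
  integers and floats with each other numerically, then places all text
  after all numbers, and all BLOBs after all text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (x)")  # no declared type: values kept as-is
conn.executemany(
    "INSERT INTO v VALUES (?)",
    [("10",), (2,), (b"\x00",), (1.5,), ("abc",), (b"zz",)],
)

rows = conn.execute("SELECT typeof(x), x FROM v ORDER BY x").fetchall()
types_in_order = [t for t, _ in rows]
values_in_order = [x for _, x in rows]

# Numbers sort among themselves, then text, then BLOBs; the text "10"
# lands after the integer 2 because the groups never mix.
print(types_in_order)
print(values_in_order)
```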



  Of course, this has nothing to do with Unicode.  Unicode is just an
  encoding system for strings that maps character-codes to bytes.
  That's it.  What those character-codes represent and the glyph that
  is associated with them is up to the locale.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
