> >  On Sat, Apr 5, 2008 at 11:42 PM, Cory Nelson <[EMAIL PROTECTED]> wrote:
> >  > Sort order is highly dependent on locale.  You can add custom
> >  > collations to do this.

> <[EMAIL PROTECTED]> wrote:
> > That was not was I was talking about. I was not talking about Sort Order but
> >  about Searches.
> >  Keith

On Sat, Apr 05, 2008 at 02:58:48PM -0700, Cory Nelson scratched on the wall:
> They are one and the same.  Look up collations.


  I'm a bit confused why everyone keeps pointing Keith at collations.
  He asked about searching and matching, and in specific the LIKE
  operator.  I'm not trying to be dense-- I read the collation web-page
  several times-- but I seem to be missing something.


  As I understand it, collations are used to define sort orders.  A
  collation function must define it's own version of greater-than,
  less-than, and equal.  All three of those definitions must be
  transitive (given A=B, B=C, then A=C; given A<B, B<C, then A<C, etc.)
  or sorting doesn't work.  Collations only work on literals.

  The LIKE operator, on the other hand, only defines needs to define
  "match."  This is not the same as "equal," as a LIKE operand can
  contain wildcards that allow the same pattern to match multiple
  un-equal targets.  In other words, LIKE operators do not need to be
  transitive.  They also don't have any concept of greater-than or
  less-than, only "match" or "no match."  Both of those issues mean
  search operators can't be used for sorting (not that you could
  easily define a concept of greater-than for something like "5%" in
  the number domain).



  Even in code, the two seem unrelated.  The LIKE operator doesn't use
  collation functions, not even the built-in ones.  Even the default
  implementations for the NOCASE collation and the default LIKE
  operator use different comparison mechanisms (not a surprise, as they
  have different goals).

  User defined functions are also different.  To define a new collation
  you use sqlite3_create_collation(), while the a user-defined LIKE
  operator uses the vanilla sqlite3_create_function().



  So unless I'm missing something, it seems like collations used for
  sorting and matching operators are different systems, working under
  different constraints (transitive or not), that produce different
  results (tri-state GT/LT/EQ vs boolean match/no-match), using
  different types of functions (collation functions vs user functions).

  There very well may be some more fundamental issue I'm missing, but
  these really don't seem "one in the same" to me.


  I admit there is a similarity in the fact that both system compare
  data at some abstract level, such as "ignore case."  In both cases
  that requires specific understanding of the data representation, and
  in the case of Unicode that's often going to require locale-specific
  information.  But that statement could be said of just about anything
  that uses Unicode.  Needing information about the representation is
  just as true of ASCII as it is of Unicode, it just happens that the
  rules for ASCII are pretty simple.
  
  It is also true that both the LIKE operator and the built-in NOCASE
  collation have issues dealing with anything beyond the 26 letters used
  by the English language.  But they are two different bugs.  Fixing one
  won't automatically fix the other.



  From my understanding of the original question, I think answer to
  Keith's question is actually on the language-expressions page
  (http://www.sqlite.org/lang_expr.html), which is the same page he
  quoted the bug from.  This page describes the LIKE, GLOB, REGEXP,
  and MATCH operators (all of which return a "match/no-match" value).
  Also see the like(), glob(), regexp(), and match() functions under
  the Core Functions section of the same page.

  If you need full Unicode support for matches and searches, it looks
  like your only option is to define a custom like() user function that
  implements the search and matching behavior you're looking for.
  If you need full Unicode support for sort-ordering, you also need to
  define a new collation.

  Or maybe you can find one that someone else has already written....
  A copy of the message Miha Vrhovnik mentioned can be found here:

  http://www.mail-archive.com/sqlite-users%40sqlite.org/msg30403.html

  It seems the .c file mentioned in this post has been updated since
  the link was first posted.  In addition to a new Unicode aware LIKE
  operator, the newer .c file also includes a new NOCASE collation
  function.  So you can get Unicode aware searching/matching *and*
  sort-ordering.  Perhaps that will fit your needs.

    -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to