On Wed, May 14, 2014 at 1:35 PM, Jan Slodicka <j...@resco.net> wrote:
> Simon Slavin-3 wrote
> > On 13 May 2014, at 5:21pm, Constantine Yannakopoulos wrote:
> >
> >> This is very interesting Jan. The only way this could fail is if the
> >> collation implementation does something funny if it encounters this
> >> character, e.g. choose to ignore it when comparing.
> >
> > That cuts out a very large number of collations. The solution works fine
> > for any collation which orders strings according to Unicode order. But
> > the point of creating a collation is that you don't want that order.
> >
> > Simon.
>
> Simon, I think that the most frequent point of making a collation is to get
> the Unicode order. At the bare minimum adding LIKE optimization to the ICU
> Sqlite extension would make sense, the savings are really huge.
>

There could be a flag in the eTextRep argument of
sqlite3_create_collation_v2(), much like the SQLITE_DETERMINISTIC flag of
sqlite3_create_function(), that marks the collation as a "unicode text"
collation. If this flag is set, the engine can perform the LIKE optimization
for that collation, using the U+10FFFD idea described in previous posts to
construct an upper limit for the range. Since this flag is not present in
existing calls to sqlite3_create_collation_v2(), the change would be
backward-compatible.

The alternative is the already mentioned idea of letting the application
specify the lower and upper bounds for the LIKE optimization manually,
perhaps by means of a callback in a hypothetical
sqlite3_create_collation_v3() variant.

And by the way, "unicode text" collations include all "strange" collations
like the accent-insensitive, mixed-codepage one I described in my original
post. I would expect these to be about 95% of all custom coded collations.

--Constantine
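
To make the U+10FFFD idea concrete, here is a minimal sketch in C of how an upper limit for the prefix range could be built. It is not SQLite code: like_upper_bound(), codepoint_collate() and in_prefix_range() are hypothetical names made up for illustration, and the collation is a plain byte-wise stand-in (for UTF-8, byte order equals code point order) with the same callback shape that sqlite3_create_collation_v2() expects. The flag proposed above does not exist, so nothing here depends on it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* UTF-8 encoding of U+10FFFD, the highest code point that is not a
 * noncharacter (U+10FFFE and U+10FFFF are noncharacters). */
static const char kMaxChar[] = "\xF4\x8F\xBF\xBD";

/* Hypothetical helper: returns a malloc'd copy of prefix with U+10FFFD
 * appended. The caller frees the result. Returns NULL if malloc fails. */
char *like_upper_bound(const char *prefix)
{
    assert(prefix != NULL);
    size_t n = strlen(prefix);
    size_t m = sizeof kMaxChar - 1;
    char *out = malloc(n + m + 1);
    if (out == NULL) return NULL;
    memcpy(out, prefix, n);
    memcpy(out + n, kMaxChar, m + 1);   /* also copies the terminating NUL */
    return out;
}

/* Stand-in collation with the usual SQLite callback shape. It compares
 * bytes, which for valid UTF-8 is the same as code point order. */
int codepoint_collate(void *unused, int n1, const void *s1,
                      int n2, const void *s2)
{
    (void)unused;
    int n = n1 < n2 ? n1 : n2;
    int c = memcmp(s1, s2, (size_t)n);
    return c != 0 ? c : n1 - n2;
}

/* Returns 1 if prefix <= value <= prefix + U+10FFFD under the collation,
 * i.e. if the range scan that replaces LIKE 'prefix%' would visit value. */
int in_prefix_range(const char *value, const char *prefix)
{
    char *upper = like_upper_bound(prefix);
    if (upper == NULL) return 0;
    int nv = (int)strlen(value);
    int ok = codepoint_collate(NULL, (int)strlen(prefix), prefix, nv, value) <= 0
          && codepoint_collate(NULL, nv, value, (int)strlen(upper), upper) <= 0;
    free(upper);
    return ok;
}
```

With this, in_prefix_range("abcdef", "abc") is true and in_prefix_range("abd", "abc") is false. A custom collation would replace codepoint_collate(); the range only stays correct if that collation neither ignores U+10FFFD nor sorts it below other characters, which is exactly the failure case discussed in the quoted text. A value that itself contains U+10FFFD right after the prefix can also fall outside the range.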