OK then, I can do it on the application side, but why doesn't SQLite provide simple functions, along with ICU, to normalize a string following a specified locale? Then I would be able to implement it on the application layer.
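For illustration, here is a minimal sketch of what that application-layer route can look like today with SQLite's existing user-defined-function mechanism. It uses Python's sqlite3 module, and the small German folding table is purely illustrative; a real implementation would presumably delegate the folding rules to ICU rather than hand-rolling them:

    # Sketch: register an application-defined normalization function with
    # SQLite via its UDF mechanism, then use it on both the stored values
    # and the user's search term. The folding table is illustrative only.
    import sqlite3
    import unicodedata

    # Illustrative locale-specific grapheme folding (German umlauts, eszett).
    GERMAN_FOLD = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

    def de_normalize(text):
        if text is None:
            return None
        # Compose first (NFC) so decomposed input ('a' + combining diaeresis)
        # folds the same way as a pre-composed a-umlaut.
        text = unicodedata.normalize("NFC", text)
        return "".join(GERMAN_FOLD.get(ch, ch) for ch in text)

    conn = sqlite3.connect(":memory:")
    conn.create_function("de_normalize", 1, de_normalize)

    conn.execute("CREATE TABLE words (raw TEXT)")
    conn.execute("INSERT INTO words VALUES ('Müller')")

    # Both the stored value and the search term go through the same function.
    row = conn.execute(
        "SELECT raw FROM words WHERE de_normalize(raw) = de_normalize(?)",
        ("Mueller",)).fetchone()
    print(row)  # ('Müller',)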
I think that trying to re-implement the ä=ae, ü=ue rules myself is a waste of time; at worst I would use ICU behind it ... I want to rely on something well tested :-)

May I ask for some functions inside SQLite to provide the normalization, etc.? That would be really great :-)

Best regards,
Sylvain

On Tue, Dec 22, 2009 at 5:44 PM, Nicolas Williams <nicolas.willi...@sun.com> wrote:

> On Tue, Dec 22, 2009 at 07:49:24AM -0500, Tim Romano wrote:
> > On 12/22/2009 5:31 AM, Sylvain Pointeau wrote:
> > > It cannot be done in the application layer...
> > >
> > You are wrong about that. I have written a full-text search application
> > to go against ancient Germanic texts where, for example, there were two
> > dozen ways to spell the word for modern English 'sister' -- spelling had
> > not yet been regularized but reflected regional dialect pronunciations
> > and regional scriptorium conventions. There is no way an ICU collation
> > could handle that crazy quilt and it had to be done in the application
> > layer.
> >
> > It is done in the application layer by *normalizing* the data on its
> > way into the database and then, of course, you must also normalize the
> > search terms as the user supplies them. So, for example, if you are
> > importing a-umlaut you store 'ae' and if you are given a-umlaut as a
> > search term by the user, you search for 'ae'. Normalization of
> > graphemes is analogous to Unicode decomposition of composite characters.
>
> Indeed.
>
> > However, if SQLite can flip an ICU German collation into Full
> > Normalization mode this could be done in the database.
>
> Sure, but you lose control in the process. Suppose you eventually need
> to add support for non-German locales yet you've been using a collation
> that performs these conversions -- oops. Or suppose there are
> well-known words of foreign origin to which this sort of normalization
> must not be applied: a generic toolkit like ICU, that works
> character-by-character or codepoint-by-codepoint, will not know about
> them -- why should it? And so on.
>
> > P.S. I recently asked for a lightweight raw "reverse-string"
> > (codepoint-by-codepoint) function to be added to the SQLite core
> > (because I don't have access to its UDF mechanism in Adobe
> > Flex/FlashBuilder) and do agree that there are often good reasons for
> > wanting something to be done in the database layer, provided it does not
> > slow the database down for everyone else.
>
> Yes. I believe that databases need to support Unicode normalization-
> insensitive/preserving behavior, at least as an option (most input
> methods produce pre-composed output, so often one can get away with not
> normalizing at all).
>
> Nico
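To make the quoted suggestion concrete, here is another small sketch (under the same illustrative Python/sqlite3 assumptions as above) of normalizing the data on its way into the database and normalizing the search term with the same function, so the comparison happens entirely on pre-normalized values:

    # Sketch of the normalize-on-the-way-in approach described above: keep a
    # normalized shadow column populated at insert time, index it, and fold
    # the user's search term through the same function at query time.
    import sqlite3
    import unicodedata

    FOLD = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}  # illustrative rules

    def normalize(term):
        # Compose first (NFC) so decomposed input folds like pre-composed input,
        # then apply the locale-specific grapheme folding.
        term = unicodedata.normalize("NFC", term.lower())
        return "".join(FOLD.get(ch, ch) for ch in term)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE texts (raw TEXT, norm TEXT)")

    for word in ("Müller", "Mueller", "Mahler"):
        conn.execute("INSERT INTO texts VALUES (?, ?)", (word, normalize(word)))
    conn.execute("CREATE INDEX texts_norm ON texts(norm)")

    # The user's search term is folded the same way as the stored data.
    hits = conn.execute("SELECT raw FROM texts WHERE norm = ?",
                        (normalize("Müller"),)).fetchall()
    print(hits)  # [('Müller',), ('Mueller',)]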