On 12/22/2009 5:31 AM, Sylvain Pointeau wrote:
> It cannot be done in the application layer...
>

You are wrong about that. I have written a full-text search application that runs against ancient Germanic texts where, for example, there were two dozen ways to spell the word for modern English 'sister'. Spelling had not yet been regularized; it reflected regional dialect pronunciations and regional scriptorium conventions. There is no way an ICU collation could handle that crazy quilt, and it had to be done in the application layer.
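Here is a minimal sketch of what "done in the application layer" can look like, written in Python with the standard sqlite3 module. Every word is folded to one canonical form before it is stored, and every search term passes through the same function before the query runs. The variant table, the table and column names (words, original, normalized) and the sample words are hypothetical. They were chosen for illustration and are not taken from the actual application.

```python
import sqlite3
import unicodedata

# Hypothetical variant-spelling table: each regional spelling maps to one
# canonical form. Illustrative entries only, not from a real corpus.
VARIANTS = {
    "suester": "swester",
    "svester": "swester",
    "schwester": "swester",
}

# Grapheme folding for umlauts and sharp s (a-umlaut is stored as 'ae').
GRAPHEMES = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def normalize(word: str) -> str:
    """Fold a word to the canonical form used for both storage and search."""
    # NFC first, so 'a' + combining diaeresis becomes the precomposed 'ä'.
    w = unicodedata.normalize("NFC", word).lower()
    for src, dst in GRAPHEMES.items():
        w = w.replace(src, dst)
    # Whole-word lookup of known variant spellings; unknown words pass through.
    return VARIANTS.get(w, w)


def store(conn: sqlite3.Connection, word: str) -> None:
    # Keep the original spelling for display, index the normalized one.
    conn.execute(
        "INSERT INTO words (original, normalized) VALUES (?, ?)",
        (word, normalize(word)),
    )


def search(conn: sqlite3.Connection, term: str) -> list:
    # The user's search term is normalized exactly like the stored data.
    rows = conn.execute(
        "SELECT original FROM words WHERE normalized = ? ORDER BY rowid",
        (normalize(term),),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (original TEXT, normalized TEXT)")
conn.execute("CREATE INDEX words_normalized ON words (normalized)")
for w in ["Swester", "suester", "Schwester", "Mädchen"]:
    store(conn, w)

print(search(conn, "SVESTER"))   # ['Swester', 'suester', 'Schwester']
print(search(conn, "Maedchen"))  # ['Mädchen']
```

Both sides go through the same normalize() function, so a plain indexed equality lookup finds every variant. Adding a newly discovered spelling only requires a change to the lookup table, with no change to the schema or the queries.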
It is done in the application layer by *normalizing* the data on its way into the database. You must then, of course, also normalize the search terms as the user supplies them. So, for example, if you are importing a-umlaut, you store 'ae', and if the user gives you a-umlaut as a search term, you search for 'ae'. Normalization of graphemes is analogous to Unicode decomposition of composite characters. However, if SQLite can flip an ICU German collation into full normalization mode, this could be done in the database.

P.S. I recently asked for a lightweight raw "reverse-string" (codepoint-by-codepoint) function to be added to the SQLite core, because I don't have access to its UDF mechanism in Adobe Flex/FlashBuilder. So I do agree that there are often good reasons for wanting something done in the database layer, provided it does not slow the database down for everyone else.

Regards
Tim Romano

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users