On 12/22/2009 5:31 AM, Sylvain Pointeau wrote:
> It cannot be done in the application layer...
>    
You are wrong about that. I have written a full-text search application 
to go against ancient Germanic texts where, for example, there were two 
dozen ways to spell the word for modern English 'sister'.  Spelling had 
not yet been regularized but reflected regional dialect pronunciations 
and regional scriptorium conventions.  There is no way an ICU collation 
could handle that crazy quilt, so it had to be done in the application 
layer.

It is done in the application layer by *normalizing* the data on its 
way into the database; you must then, of course, also normalize the 
search terms as the user supplies them.  So, for example, if you are 
importing a-umlaut (ä) you store 'ae', and if you are given a-umlaut as 
a search term by the user, you search for 'ae'.  Normalization of 
graphemes is analogous to Unicode decomposition of composite characters.
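
To make the idea concrete, here is a minimal sketch in Python of normalizing on the way in and again at query time. It is not the code from my application: the table name `words`, the helpers `normalize`, `insert_word` and `search`, and the tiny folding table are all assumptions made up for illustration, and a folding table for real historical texts would be far larger.

```python
import sqlite3
import unicodedata

# Hypothetical folding table (an assumption for this sketch, not a complete rule set).
FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def normalize(text: str) -> str:
    """Fold a string to the form that is stored and searched."""
    # Compose first, so 'a' + combining diaeresis is treated like precomposed 'ä'.
    composed = unicodedata.normalize("NFC", text)
    return composed.lower().translate(FOLD)

conn = sqlite3.connect(":memory:")
# Keep the original spelling for display and the normalized form for matching.
conn.execute("CREATE TABLE words (original TEXT, normalized TEXT)")

def insert_word(word: str) -> None:
    """Normalize the data on its way into the database."""
    conn.execute("INSERT INTO words VALUES (?, ?)", (word, normalize(word)))

def search(term: str) -> list:
    """Normalize the user's search term the same way before comparing."""
    rows = conn.execute(
        "SELECT original FROM words WHERE normalized = ? ORDER BY original",
        (normalize(term),),
    )
    return [row[0] for row in rows]

for w in ["Mädchen", "Maedchen", "Straße"]:
    insert_word(w)
```

With this in place, a search for "Mädchen" and a search for "maedchen" both compare against the same stored value, and no collation is involved at all.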

However, if SQLite can flip an ICU German collation into Full 
Normalization mode, this could be done in the database.

P.S. I recently asked for a lightweight raw "reverse-string" 
(codepoint-by-codepoint) function to be added to the SQLite core 
(because I don't have access to its UDF mechanism in Adobe 
Flex/FlashBuilder) and do agree that there are often good reasons for 
wanting something to be done in the database layer, provided it does not 
slow the database down for everyone else.

Regards
Tim Romano



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
