Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Peter Jacobi wrote:
> (1) Just as an OS abstraction layer is in place for I/O, wouldn't it
> be possible to use an OS abstraction layer for L10N?

SQLite allows multiple registrations of the same function if they take a different number of arguments. Consequently, the SQLite core implements upper/lower taking one argument, and the ICU extension implements upper/lower taking two arguments, with the second argument being the locale.

Why don't you write an extension that does the mapping into the Win32 API and then contribute it at http://sqlite.org/contrib? If it is small and works well, it could become part of the core for Windows.

> (2) I'm under the impression, that the problematic cases

ICU in SQLite does a lot more than just locale-specific upper/lower casing. It also does locale-specific sorting (which can't be done with a trivial lookup table) and LIKE/regular expressions.

Roger

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
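Roger's point about registering the same function name with different argument counts can be illustrated with a small sketch. This is not the Win32 mapping he suggests and not the ICU extension. It uses Python's standard sqlite3 module, and the helper names upper_unicode and upper_locale, as well as the minimal 'tr' handling, are assumptions made up for the illustration.

```python
import sqlite3

def upper_unicode(text):
    """One-argument upper(): Python's locale-independent Unicode mapping."""
    return None if text is None else str(text).upper()

def upper_locale(text, locale):
    """Two-argument upper(): hypothetical locale-aware variant.

    Only a Turkish-style rule is sketched here; a real extension would
    delegate to ICU or to an OS API instead.
    """
    if text is None:
        return None
    text = str(text)
    if locale is not None and str(locale).lower().startswith("tr"):
        # Turkish: small dotted i uppercases to capital dotted I (U+0130).
        text = text.replace("i", "\u0130")
    return text.upper()

conn = sqlite3.connect(":memory:")
# Same SQL name, two different argument counts: both registrations coexist.
conn.create_function("upper", 1, upper_unicode)
conn.create_function("upper", 2, upper_locale)

one_arg = conn.execute("SELECT upper('köck')").fetchone()[0]
two_arg = conn.execute("SELECT upper('izmir', 'tr_TR')").fetchone()[0]
print(one_arg, two_arg)
```

SQLite picks the implementation by the number of arguments in the call, so existing one-argument queries keep working while locale-aware callers pass the extra argument.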
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
I'm aware that ICU is able to provide a very general solution, but I'm wondering about two other options:

(1) Just as an OS abstraction layer is in place for I/O, wouldn't it be possible to use an OS abstraction layer for L10N? For example, uppercasing could be forwarded to LCMapString(LCMAP_UPPERCASE) on Win32. That would bring SQLite's behaviour in line with the handling in the application program itself (provided that it uses OS APIs and not ICU).

(2) I'm under the impression that the problematic cases (German sharp s, Turkic i) are few compared with all the cases where a simple lookup would make things work. If I'm not mistaken, a lookup table of 2048 entries handling all 2-byte UTF-8 characters would already cover the joint character repertoire of all ISO-8859-* encodings (and their Microsoft counterparts). Thai (in ISO 8859-11) uses three-byte UTF-8 but doesn't have upper/lower case.

Regards,
Peter
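The lookup-table idea in (2) can be sketched roughly as follows. This is only an illustration in Python, not something SQLite ships: the names UPPER_TABLE and simple_upper are hypothetical, and the table is filled from Python's own Unicode data rather than from any SQLite source.

```python
# Code points U+0000..U+07FF are exactly those that UTF-8 encodes in
# one or two bytes, which gives the 2048 entries mentioned above.
TABLE_SIZE = 0x800

def build_upper_table():
    table = []
    for cp in range(TABLE_SIZE):
        mapped = chr(cp).upper()
        # One slot holds one code point, so characters whose uppercase
        # form expands (sharp s becomes "SS") are left unchanged.
        table.append(ord(mapped) if len(mapped) == 1 else cp)
    return table

UPPER_TABLE = build_upper_table()

def simple_upper(text):
    """Uppercase via the table; characters outside the table pass through."""
    return "".join(
        chr(UPPER_TABLE[ord(ch)]) if ord(ch) < TABLE_SIZE else ch
        for ch in text
    )

print(simple_upper("köck"), simple_upper("straße"), simple_upper("i"))
```

The sketch shows both the appeal and the limits: umlauts are handled, but sharp s stays as it is, and "i" always maps to "I", which is not what a Turkic locale expects.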
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Thanks for that pointer to the ICU project. I did not know about that!

thomas

On Friday, 14.11.2008, at 15:27 +0200, Elefterios Stamatogiannakis wrote:
> Has anybody successfully compiled sqlite with icu for win32?
>
> I haven't managed to find a libicu for mingw. Any tips welcome.
>
> lefteris
>
> D. Richard Hipp wrote:
> > On Nov 14, 2008, at 8:08 AM, Martin Engelschalk wrote:
> >
> >> Hi all,
> >>
> >> the ICU project is a very powerful tool to handle codepages, and also
> >> supports regular expressions (using a class named "RegexMatcher", see
> >> http://icu-project.org/apiref/icu4c/classRegexMatcher.html).
> >> So, it should be relatively easy to replace the like() - function in
> >> sqlite (see http://www.sqlite.org/lang_corefunc.html#like and
> >> http://www.sqlite.org/c3ref/create_function.html)
> >>
> >
> > http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt&v=1.2
> >
> > D. Richard Hipp
> > [EMAIL PROTECTED]
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Has anybody successfully compiled SQLite with ICU for Win32?

I haven't managed to find a libicu for MinGW. Any tips welcome.

lefteris

D. Richard Hipp wrote:
> On Nov 14, 2008, at 8:08 AM, Martin Engelschalk wrote:
>
>> Hi all,
>>
>> the ICU project is a very powerful tool to handle codepages, and also
>> supports regular expressions (using a class named "RegexMatcher", see
>> http://icu-project.org/apiref/icu4c/classRegexMatcher.html).
>> So, it should be relatively easy to replace the like() - function in
>> sqlite (see http://www.sqlite.org/lang_corefunc.html#like and
>> http://www.sqlite.org/c3ref/create_function.html)
>>
>
> http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt&v=1.2
>
> D. Richard Hipp
> [EMAIL PROTECTED]
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
On Nov 14, 2008, at 8:08 AM, Martin Engelschalk wrote:

> Hi all,
>
> the ICU project is a very powerful tool to handle codepages, and also
> supports regular expressions (using a class named "RegexMatcher", see
> http://icu-project.org/apiref/icu4c/classRegexMatcher.html).
> So, it should be relatively easy to replace the like() - function in
> sqlite (see http://www.sqlite.org/lang_corefunc.html#like and
> http://www.sqlite.org/c3ref/create_function.html)

http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt&v=1.2

D. Richard Hipp
[EMAIL PROTECTED]
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Hi all,

the ICU project is a very powerful tool for handling codepages, and it also supports regular expressions (using a class named "RegexMatcher", see http://icu-project.org/apiref/icu4c/classRegexMatcher.html). So it should be relatively easy to replace the like() function in SQLite (see http://www.sqlite.org/lang_corefunc.html#like and http://www.sqlite.org/c3ref/create_function.html).

Martin

Igor Tandetnik wrote:
> "Thomas Mittelstaedt" <[EMAIL PROTECTED]> wrote in
> message news:[EMAIL PROTECTED]
>
>> Just did a search on my database using
>> SELECT * FROM ku2008 where "Empfaenger 1" like '%köck%';
>>
>> and nothing was found. Doing a SELECT * FROM ku2008 where "Empfaenger
>> 1" like '%kÖck%'; with the capital umlaut did find the record.
>>
>
> http://sqlite.org/lang_expr.html
>
> "SQLite only understands upper/lower case for 7-bit Latin characters.
> Hence the LIKE operator is case sensitive for 8-bit iso8859 characters
> or UTF-8 characters. For example, the expression 'a' LIKE 'A' is TRUE
> but 'æ' LIKE 'Æ' is FALSE."
>
> Apparently, it's possible to integrate SQLite with ICU
> (http://icu-project.org/) to support properly localized collation and
> case folding. I don't know the details, hopefully someone more
> knowledgeable will chime in.
>
> Igor Tandetnik

--
* Codeswift GmbH *
Traunstr. 30
A-5026 Salzburg-Aigen
Tel: +49 (0) 8662 / 494330
Mob: +49 (0) 171 / 4487687
Fax: +49 (0) 12120 / 204645
[EMAIL PROTECTED]
www.codeswift.com / www.swiftcash.at

Codeswift Professional IT Services GmbH
Company register no. FN 202820s
VAT ID no. ATU 50576309
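Martin's suggestion of replacing like() through the create-function interface could look roughly like the sketch below. It does not use ICU or RegexMatcher. Instead it assumes Python's standard sqlite3 and re modules, and the helper names like_to_regex and unicode_like are made up for the illustration. The ESCAPE clause (the three-argument form of like()) is not handled.

```python
import re
import sqlite3

def like_to_regex(pattern):
    """Translate a LIKE pattern (% and _) into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    # IGNORECASE on str patterns applies Unicode case-insensitive matching.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def unicode_like(pattern, value):
    """Replacement for the two-argument like(): 'X LIKE Y' calls like(Y, X)."""
    if pattern is None or value is None:
        return None
    return 1 if like_to_regex(str(pattern)).fullmatch(str(value)) else 0

conn = sqlite3.connect(":memory:")
# Registering a two-argument "like" overrides the built-in LIKE operator.
conn.create_function("like", 2, unicode_like)

matched = conn.execute("SELECT 'KÖCK GmbH' LIKE '%köck%'").fetchone()[0]
ligature = conn.execute("SELECT 'æ' LIKE 'Æ'").fetchone()[0]
print(matched, ligature)
```

The override applies per connection, so it has to be registered each time the database is opened. A user-defined like() may also prevent SQLite from using an index for prefix searches.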
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
"Thomas Mittelstaedt" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Just did a search on my database using > SELECT * FROM ku2008 where "Empfaenger 1" like '%köck%'; > > and nothing was found. Doing a SELECT * FROM ku2008 where "Empfaenger > 1" like '%kÖck%'; with the capital umlaut did find the record. http://sqlite.org/lang_expr.html "SQLite only understands upper/lower case for 7-bit Latin characters. Hence the LIKE operator is case sensitive for 8-bit iso8859 characters or UTF-8 characters. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE." Apparently, it's possible to integrate SQLite with ICU (http://icu-project.org/) to support properly localized collation and case folding. I don't know the details, hopefully someone more knowledgeable will chime in. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Thomas Mittelstaedt wrote:
> Hello,
>
> Just did a search on my database using
> SELECT * FROM ku2008 where "Empfaenger 1" like '%köck%';
>
> and nothing was found. Doing a SELECT * FROM ku2008 where "Empfaenger 1"
> like '%kÖck%'; with the capital umlaut did find the record.
> The data is utf-8! my sqlite version is 3.5.9 on ubuntu hardy.
>

This is a documented bug. See the SQLite expressions documentation page, http://www.sqlite.org/lang_expr.html, which states:

(A bug: SQLite only understands upper/lower case for 7-bit Latin characters. Hence the LIKE operator is case sensitive for 8-bit iso8859 characters or UTF-8 characters. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.)

But it's hard to fix, as you would need language information for the data to get upper/lower casing always correct (just think of the ß -> SS anomaly in German).

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.: +49 (421) 20153-80
Wiener Straße 1-3               Fax: +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: [EMAIL PROTECTED]

Registered office: Bremen
Managing directors: Karl Heinz Zachries, Ralf Holtgrefe
Registered in the commercial register of the Bremen district court under HRB 13215
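The ß -> SS anomaly Michael mentions is easy to see with Python's built-in string methods, which apply Unicode's default, language-independent case mappings. The snippet is only an illustration of why a one-to-one upper/lower table is not enough. The variable names are arbitrary.

```python
word = "straße"

# Uppercasing expands sharp s into two letters.
upper_form = word.upper()

# Lowercasing the result does not restore the original spelling.
roundtrip = upper_form.lower()

# Case folding maps both spellings to the same comparison key.
folded_equal = "STRASSE".casefold() == word.casefold()

# The default mapping is language-independent: "i" always becomes "I",
# which is not what Turkish expects.
default_i = "i".upper()

print(upper_form, roundtrip, folded_equal, default_i)
```

Comparing case-folded keys avoids the round-trip problem for German, but it still cannot know which language the data is in.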
Re: [sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Hello Thomas,

I have the same problem. There is no readily available function for converting UTF-8 characters outside 7-bit ASCII from lower to upper case, so SQLite does not use one. To achieve this, you have to write your own function and/or incorporate something like ICU into your project. I still have that work ahead of me.

Martin

Thomas Mittelstaedt wrote:
> Hello,
>
> Just did a search on my database using
> SELECT * FROM ku2008 where "Empfaenger 1" like '%köck%';
>
> and nothing was found. Doing a SELECT * FROM ku2008 where "Empfaenger 1"
> like '%kÖck%'; with the capital umlaut did find the record.
> The data is utf-8! my sqlite version is 3.5.9 on ubuntu hardy.
>
> thomas
[sqlite] bug? like-search with german umlaut is case-sensitive, should not be
Hello,

I just did a search on my database using

SELECT * FROM ku2008 where "Empfaenger 1" like '%köck%';

and nothing was found. Doing a

SELECT * FROM ku2008 where "Empfaenger 1" like '%kÖck%';

with the capital umlaut did find the record. The data is UTF-8! My SQLite version is 3.5.9 on Ubuntu Hardy.

thomas
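The report can be reproduced with a small script along these lines. The table and column names come from the message above. The stored value "Firma KÖCK" is a hypothetical row, since the original data is not shown, and the script uses Python's standard sqlite3 module, not the 3.5.9 command-line shell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ku2008 ("Empfaenger 1" TEXT)')
# Hypothetical row: the umlaut is stored in upper case.
conn.execute("INSERT INTO ku2008 VALUES (?)", ("Firma KÖCK",))

def count(pattern):
    """Number of rows whose "Empfaenger 1" column matches the LIKE pattern."""
    sql = 'SELECT count(*) FROM ku2008 WHERE "Empfaenger 1" LIKE ?'
    return conn.execute(sql, (pattern,)).fetchone()[0]

# ASCII letters are compared case-insensitively, so this pattern matches.
upper_hits = count("%kÖck%")

# Without ICU support, "ö" and "Ö" are treated as different characters.
lower_hits = count("%köck%")

print(upper_hits, lower_hits)
```

On an SQLite build without the ICU extension, the second count is expected to be zero. A build with ICU compiled in may report a match for both patterns.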