Re: [sqlite] Possible UNICODE LIKE, upper(), lower() function solution

Cory Nelson Thu, 03 Jan 2008 05:58:05 -0800

On Jan 3, 2008 4:10 AM, ioannis <[EMAIL PROTECTED]> wrote:
> Dear all SQLite3 users,
>
> Recently I have been working on a dictionary-style project that had to
> work with UNICODE non-latin1 strings. I did try the ICU project, but I
> wasn't satisfied with the extra baggage that came with it.
> I would like to recommend the following possible solution to the
> long-standing UNICODE issue. It was built as an ICU alternative
> (excluding collations) and could easily be included in the SQLite
> core as default behavior.
>
> http://ioannis.mpsounds.net/blog/?dl=sqlite3_unicode.c
>
> The above file contains mapping tables for lower(), upper(), title(),
> fold()* characters based on the UNICODE mapping tables described
> by the UNICODE standard v5.1.0 beta. The functions use these tables
> to transform characters to their respective case mappings.
> (These tables were built by a modified version of Loic Dachary's
> builder in order to include the required case transformations.)
> * UNICODE uses case folding mapping tables to implement
> case-insensitive comparisons (e.g. LIKE).
>
> The above file utilizes the existing ICU infrastructure built into
> SQLite in order to activate the extra functionality and automatically:
> - override the LIKE operation, to support full UNICODE
> case-insensitive comparison
> - override upper(), lower(), to support case transformation of UNICODE
> characters based on UNICODE mapping tables as described currently by
> the UNICODE standard v5.1.0 beta
> - provide title() and fold() functions, also based on UNICODE mapping
> tables as described currently by the UNICODE standard v5.1.0 beta
> - provide an unaccent() function (based on the unac library designed
> for Linux by Loic Dachary) to decompose UNICODE characters to their
> unaccented equivalents in order to perform simpler queries and return
> a wider range of results (e.g. ά -> α, æ -> ae; in the latter example
> the string will automatically grow by 1 character)
>
> Unlike ICU, no collation sequences have been implemented yet.
> The above functionalities have been designed to be included/excluded
> independently according to specific needs in order to minimize the
> size of the library.
> The total overhead added to the SQLite library size with all
> functionality enabled is approximately 70-80 KB.
>
> The above file has not been thoroughly tested, but I consider the
> implementation to be stable.
> You can leave comments, bug reports, suggestions on this board or at
> http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support
> (PS: I am not an SQLite expert, but I had to improvise to some extent
> on this matter.)
>
> Thank you very much.
>
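
For readers who want to see what the proposed LIKE override amounts to, here is a minimal sketch. It is not the code from sqlite3_unicode.c. It assumes Python's built-in sqlite3 module and uses str.casefold() as a stand-in for the fold() mapping tables; the names unicode_like and connect_with_unicode_like are hypothetical. SQLite evaluates "X LIKE Y" by calling like(Y, X), so the pattern arrives as the first argument.

```python
import re
import sqlite3


def unicode_like(pattern, value):
    """Case-insensitive LIKE based on Unicode case folding (illustrative only)."""
    if pattern is None or value is None:
        return None  # NULL in, NULL out, as with the built-in LIKE
    parts = []
    for ch in str(pattern).casefold():
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    matched = re.fullmatch(regex, str(value).casefold(), re.DOTALL)
    return 1 if matched else 0


def connect_with_unicode_like(path=":memory:"):
    """Open a connection whose two-argument LIKE uses unicode_like."""
    conn = sqlite3.connect(path)
    # Registering a function named "like" replaces the LIKE operator.
    conn.create_function("like", 2, unicode_like)
    return conn
```

With this in place, a query such as SELECT 'ΆΛΦΑ' LIKE 'άλφ%' matches, which the ASCII-only built-in LIKE would not do. The sketch does not cover the three-argument ESCAPE form, and overloading like() disables SQLite's LIKE index optimization.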


I guess I'm confused about what the purpose of this is.  I'm far from
a Unicode expert, but my understanding thus far is that there is no
One solution.

Locales are there for a reason - different places can use different
sort orders and case conversions.  Your blog makes using locales seem
like a detriment, but I'm not sure how you can get around them.
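
A concrete case of locale-dependent case conversion is Turkish (and Azerbaijani), where uppercase I lowercases to dotless ı and dotted İ lowercases to i. A single locale-independent table cannot produce both results. The sketch below is only an illustration in Python; lower_for_locale and its lang parameter are hypothetical names, and it handles just this one special case.

```python
def lower_for_locale(text, lang):
    """Lowercase text, applying the Turkish/Azerbaijani rule for I when asked."""
    if lang in ("tr", "az"):
        # U+0049 (I) -> U+0131 (dotless i); U+0130 (dotted I) -> U+0069 (i)
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    # str.lower() applies the default, locale-independent Unicode mapping.
    return text.lower()


english = lower_for_locale("ISTANBUL", "en")  # default mapping: I -> i
turkish = lower_for_locale("ISTANBUL", "tr")  # Turkish mapping: I -> dotless i
```

The same input yields two different, equally correct lowercase strings, so a table-driven lower() has to pick one convention for everyone.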

-- 
Cory Nelson
http://www.int64.org
