On Jan 3, 2008 4:10 AM, ioannis <[EMAIL PROTECTED]> wrote:
> Dear all SQLite3 users,
>
> Recently I have been working on a dictionary-style project that had to
> work with UNICODE non-latin1 strings. I did try the ICU project, but I
> wasn't satisfied with the extra baggage that came with it.
> I would like to recommend the following possible solution to the
> long-standing UNICODE issue. It was built as an ICU alternative
> (excluding collations) and could easily be included in the SQLite
> core as default behavior.
>
> http://ioannis.mpsounds.net/blog/?dl=sqlite3_unicode.c
>
> The above file contains mapping tables for the lower(), upper(), title()
> and fold()* functions, based on the UNICODE mapping tables as described
> currently by the UNICODE standard v5.1.0 beta. The functions use these
> tables to map characters to their respective case forms.
> (These tables were built by a modified version of Loic Dachary's builder
> in order to include the required case transformations.)
> * UNICODE uses case folding mapping tables to implement case-insensitive
> comparison (e.g. LIKE).
>
> The above file utilizes the existing ICU infrastructure built into
> SQLite in order to activate the extra functionality, to automatically:
> - override the LIKE operator, to support full UNICODE case-insensitive
> comparison
> - override upper() and lower(), to support case transformation of UNICODE
> characters based on UNICODE mapping tables as described currently by
> the UNICODE standard v5.1.0 beta
> - provide title() and fold() functions, also based on UNICODE mapping
> tables as described currently by the UNICODE standard v5.1.0 beta
> - provide an unaccent() function (based on the unac library designed for
> Linux by Loic Dachary) to decompose UNICODE characters to their
> unaccented equivalents, in order to perform simpler queries and return a
> wider range of results (e.g. ά -> α, æ -> ae; in the latter example the
> string automatically grows by one code point).
>
> In comparison to ICU, no collation sequences have been implemented yet.
> The above functionalities have been designed to be included/excluded
> independently according to specific needs, in order to minimize the
> size of the library.
> The total overhead over the SQLite library size with all functionality
> enabled is approximately 70~80KB.
>
> The above file has not been thoroughly tested, but I consider the
> implementation to be stable.
> You can leave comments, bug reports and suggestions on this board or at
> http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support
> (PS. I am not an SQLite expert, but I had to improvise to some extent
> on this matter.)
>
> Thank you very much.
>
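The quoted message describes table-driven case folding (used for a case-insensitive LIKE) and an unaccent() that can make a string longer. As a rough illustration only, the C sketch below shows that idea on a handful of code points. The tables, the function names (`fold_cp`, `fold_equal`, `unaccent`) and the choice to work on already-decoded code points rather than UTF-8 bytes are all assumptions for this example; they are not the contents of sqlite3_unicode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of a simple (1:1) case-folding table, sorted by code
 * point so it can be binary-searched. A table generated from the Unicode
 * CaseFolding data would hold well over a thousand entries. */
typedef struct { uint32_t from, to; } FoldEntry;

static const FoldEntry FOLD[] = {
    {0x00C6, 0x00E6}, /* Æ -> æ */
    {0x0386, 0x03AC}, /* Ά -> ά */
    {0x0391, 0x03B1}, /* Α -> α */
    {0x0392, 0x03B2}, /* Β -> β */
    {0x03A3, 0x03C3}, /* Σ -> σ */
    {0x03C2, 0x03C3}, /* ς (final sigma) -> σ */
};

/* Return the folded form of one code point, or the code point unchanged. */
uint32_t fold_cp(uint32_t c) {
    if (c >= 'A' && c <= 'Z') return c + 32;        /* ASCII fast path */
    size_t lo = 0, hi = sizeof FOLD / sizeof FOLD[0];
    while (lo < hi) {                                /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (FOLD[mid].from == c) return FOLD[mid].to;
        if (FOLD[mid].from < c) lo = mid + 1; else hi = mid;
    }
    return c;
}

/* Case-insensitive equality of two code-point strings: the core of a
 * Unicode-aware LIKE, minus the % and _ wildcard handling. */
int fold_equal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb) {
    if (na != nb) return 0;
    for (size_t i = 0; i < na; i++)
        if (fold_cp(a[i]) != fold_cp(b[i])) return 0;
    return 1;
}

/* Hypothetical excerpt of an unaccent table: each entry expands to 1 or 2
 * code points, so the output may be longer than the input (æ -> "ae"). */
typedef struct { uint32_t from; uint32_t to[2]; size_t n; } UnaccentEntry;

static const UnaccentEntry UNACCENT[] = {
    {0x00C6, {'A', 'E'}, 2},    /* Æ -> AE */
    {0x00E6, {'a', 'e'}, 2},    /* æ -> ae */
    {0x0386, {0x0391, 0}, 1},   /* Ά -> Α */
    {0x03AC, {0x03B1, 0}, 1},   /* ά -> α */
};

/* Write the unaccented form of in[0..n) to out and return the output length,
 * or (size_t)-1 if out_cap is too small. Linear search keeps it short. */
size_t unaccent(const uint32_t *in, size_t n, uint32_t *out, size_t out_cap) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const uint32_t *rep = &in[i];   /* default: copy the code point */
        size_t rep_n = 1;
        for (size_t k = 0; k < sizeof UNACCENT / sizeof UNACCENT[0]; k++) {
            if (UNACCENT[k].from == in[i]) {
                rep = UNACCENT[k].to;
                rep_n = UNACCENT[k].n;
                break;
            }
        }
        if (len + rep_n > out_cap) return (size_t)-1;
        for (size_t j = 0; j < rep_n; j++) out[len++] = rep[j];
    }
    return len;
}
```

This only does simple 1:1 folding. Unicode's full case folding can also change string length (for example ß folds to "ss"), which a real LIKE override would need to handle. Exposing such functions to SQL would go through SQLite's `sqlite3_create_function()` API.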
I guess I'm confused about the purpose of this. I'm far from a Unicode expert, but my understanding thus far is that there is no single solution. Locales are there for a reason: different places can use different sort orders and case conversions. Your blog makes using locales seem like a detriment, but I'm not sure how you can get around it.

-- 
Cory Nelson
http://www.int64.org
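To make the locale point concrete: Unicode's default case mapping lowercases I (U+0049) to i, but Turkish and Azeri lowercase it to dotless ı (U+0131) and uppercase i to dotted İ (U+0130). The C sketch below uses a hypothetical `Locale` enum and hypothetical `lower_cp`/`upper_cp` helpers that cover only ASCII and these four letters; it shows why one locale-independent table has to pick one behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical locale selector, for illustration only. */
typedef enum { LOCALE_ROOT, LOCALE_TURKISH } Locale;

/* Lowercase one code point. Handles ASCII plus the Turkish I letters;
 * everything else is returned unchanged. */
uint32_t lower_cp(uint32_t c, Locale loc) {
    if (c == 0x0049 && loc == LOCALE_TURKISH) return 0x0131; /* I -> ı */
    if (c == 0x0130) return 0x0069;                          /* İ -> i */
    if (c >= 'A' && c <= 'Z') return c + 32;
    return c;
}

/* Uppercase one code point, with the same limited coverage. */
uint32_t upper_cp(uint32_t c, Locale loc) {
    if (c == 0x0069 && loc == LOCALE_TURKISH) return 0x0130; /* i -> İ */
    if (c == 0x0131) return 0x0049;                          /* ı -> I */
    if (c >= 'a' && c <= 'z') return c - 32;
    return c;
}
```

Under the Turkish mapping a case-insensitive comparison of "I" and "i" fails, while under the default mapping it succeeds. Unicode's CaseFolding.txt lists these Turkic mappings as separate entries (status T) for this reason.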