Re: [sqlite] collation for german "Umlaute"

2013-10-31 Thread Jean-Christophe Deschamps



Search the Internet for an SQLite extension called "unifuzz.c" and see
if that does what you want in the way of character folding.  I have a
copy of the code on my other computer if you cannot find the original
authors original code.


Here is the download link:
https://dl.dropboxusercontent.com/u/26433628/unifuzz.zip

It is a very bad (yes!) collation collection, since none of them match 
any known locale collation requirement. If you need to collate German 
only but correctly, look at ICU.


What it offers is a set of locale-independant collations and functions, 
plus a bit more.


I wrote it after realizing there was no good way to collate and fuzzy 
search text from various languages intermixed in the same column, for 
instance (we had customers in 49 countries and suppliers from 10.) 
Since it's impossible to collate things correctly and simultaneously 
for several languages, the best was to case fold and/or unaccent data 
in the least damaging way.


The (largish) code has provision for dealing with the German eszet and 
several unique characters. I know it's being used in a number of 
countries, not all latin. It uses custom v5.1 Unicode tries in circa 180kb.


It also offers collation for Unicode digits (not only 0-9) and a good 
share of other string functions.


It is currently Windows-bound since it uses one Windows function, but 
I'm sure it can be made to work under Linux as well. Be sure to read 
the lengthy comment at top of the source before using or deciding it's 
worthless.


Please drop me a note if you find it useful or discover bugs. Feel free 
to use and abuse the code, but please don't release a distinct version 
under the same name.


--
j...@antichoc.net  


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] collation for german "Umlaute"

2013-10-30 Thread Keith Medcalf

Search the Internet for an SQLite extension called "unifuzz.c" and see
if that does what you want in the way of character folding.  I have a
copy of the code on my other computer if you cannot find the original
authors original code.

I didn't write it, but it basically implements a NOCASE collation which
also also folds accents and other non-english characters to ASCII
somewhat like the standard latin1_CI_AI in other SQL implementations.
 It also has some other related collations and folding functions.

You can compile as an extension or with very minor modifications
incorporate it directly into the amalgamation and compile your own
engine (and shell) with the extension and collations pre-loaded.

On Wed, 30 Oct 2013 13:51:26 +0100
 Ulrich Goebel  wrote:
>Hallo,
>
>for a SQLite db I would like to define a collation for german
>"Umlaute" (don't know the english word for that, sorry) and
>"Sonderzeichen" (äöü, ÄÖÜ, ß), so that:
>
>a=A=ä=Ä
>o=O=ö=Ö
>u=U=ü=Ü
>ß=s (or, better: ß=ss)
>
>I want to use such a collation even on columns which are indeces, so I
>would like to "connect" the collation to the column/index at the time
>creating the column/index, not only in the later SELECTs. And I
>suppose, that I really don't have to specify the collation in the
>SELECTs if it is connected to the index. Is that right?
>
>It should work like that:
>
>CREATE TABLE person (name text);
>CREATE INDEX idx_name on person (name COLLATE umlaute);
>
>and later on
>
>SELECT name FROM person ORDER BY name;
> /* whithout specifying the collation again! */
>
>should give
>
>   Ortin
>   Ötzkök
>   Pandulas
>   Zurmühlen
>
>instead of
>
>   Ortin
>   Pandulas
>   Zurmühlen
>   Ötzkök
>
>I'm sure, that I'm not the first fellow, who would like, but I don't
>find a solution for my problem.
>
>Any help will be very wellcome!
>
>By the way: I define my db, tables and indices using a SQL script,
>which I execute by the .read command in sqlite3 (under Ubuntu Linux).
>But I don't find any possibility to define any collation. I just found
>how to use an allredy existing collation. Isn't it possible, what I
>want?
>
>Another way could be possible: I use python as my programming
>language, with the apsw module to connect to the SQLite db. There is a
>method apsw.connection.createcollation, which registers a
>(python)-sorting-(collate-)function as a collation. That can be used
>later, for example in SELECTs. Could it also be used in CREATE TABLEs
>or CREATE INDEXs? In this case I could define my db, tables and
>indices within python/apsw instead of the SQL-script and .read. Right?
>
>The background is very simple: in my db I store german people with
>their name, address and so on, and there are these "Umlaute" in the
>names, and the row name has its own index.
>
>Now I brought more text then I would like to... but I hope for Your
>help!
>
>Ulrich
>
>
>-- 
>Ulrich Goebel
>Paracelsusstr. 120, 53177 Bonn
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


#include 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] collation for german "Umlaute"

2013-10-30 Thread Igor Tandetnik

On 10/30/2013 8:51 AM, Ulrich Goebel wrote:

I want to use such a collation even on columns which are indeces, so I
would like to "connect" the collation to the column/index at the time
creating the column/index, not only in the later SELECTs. And I suppose,
that I really don't have to specify the collation in the SELECTs if it
is connected to the index. Is that right?


If you specify the collation on the column in CREATE TABLE statement, 
this collation will be used by default by any indexes and expressions 
mentioning this column, unless explicitly overridden.



It should work like that:

CREATE TABLE person (name text);
CREATE INDEX idx_name on person (name COLLATE umlaute);


No, that's wrong. Move the COLLATE clause to CREATE TABLE.


and later on

SELECT name FROM person ORDER BY name;
 /* whithout specifying the collation again! */


As written, this would use the regular BINARY collation, because that's 
what's specified in the table. The index will not be used at all, since 
it's created with a different collation. Again, move the COLLATE clause 
to CREATE TABLE, then everything will work the way you expect.



By the way: I define my db, tables and indices using a SQL script, which
I execute by the .read command in sqlite3 (under Ubuntu Linux). But I
don't find any possibility to define any collation. I just found how to
use an allredy existing collation. Isn't it possible, what I want?


http://sqlite.org/c3ref/create_collation.html
http://sqlite.org/lang_corefunc.html#load_extension

You would have to write a shared library implementing your custom 
collation. Or, you could use collations provided by ICU:


http://www.sqlite.org/src/artifact?ci=trunk=ext/icu/README.txt

but I don't think it works quite the way you want. ICU collations are 
case sensitive, and the German one doesn't make A=Ä - it makes Ä sort 
right after A and before B.



Another way could be possible: I use python as my programming language,
with the apsw module to connect to the SQLite db. There is a method
apsw.connection.createcollation, which registers a
(python)-sorting-(collate-)function as a collation. That can be used
later, for example in SELECTs. Could it also be used in CREATE TABLEs or
CREATE INDEXs?


Yes. Note that, once you do that, you will be limited in how you can 
manipulate this database outside of your program (e.g. in sqlite3 
command line shell). Any statement that would need to utilize the 
collation would throw "unknown collation" error.



In this case I could define my db, tables and indices
within python/apsw instead of the SQL-script and .read. Right?


Right.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] collation for german "Umlaute"

2013-10-30 Thread Richard Hipp
On Wed, Oct 30, 2013 at 8:51 AM, Ulrich Goebel  wrote:

> Hallo,
>
> for a SQLite db I would like to define a collation for german "Umlaute"
> (don't know the english word for that, sorry) and "Sonderzeichen" (äöü,
> ÄÖÜ, ß), so that:
>
> a=A=ä=Ä
> o=O=ö=Ö
> u=U=ü=Ü
> ß=s (or, better: ß=ss)
>
> I want to use such a collation even on columns which are indeces, so I
> would like to "connect" the collation to the column/index at the time
> creating the column/index, not only in the later SELECTs. And I suppose,
> that I really don't have to specify the collation in the SELECTs if it is
> connected to the index. Is that right?
>
> It should work like that:
>
> CREATE TABLE person (name text);
> CREATE INDEX idx_name on person (name COLLATE umlaute);
>

If, instead, you say:

   CREATE TABLE person(name text COLLATE umlaute);
   CRETE INDEX idx_name ON person(name);

Then "umlaute" becomes the default collating sequence for the
person.namecolumn.  It is used everywhere, unless overridden.  So it
is used on the
index.  And it will get used in your ORDER BY clause.

If you specify the collating sequence on the index only, then it is used
only on the index.  Then you'd have to specify it again on the order by
clause ("... ORDER BY name COLLATE umlaute") in order to get it take effect
there.



>
> and later on
>
> SELECT name FROM person ORDER BY name;
> /* whithout specifying the collation again! */
>
> should give
>
>   Ortin
>   Ötzkök
>   Pandulas
>   Zurmühlen
>
> instead of
>
>   Ortin
>   Pandulas
>   Zurmühlen
>   Ötzkök
>
> I'm sure, that I'm not the first fellow, who would like, but I don't find
> a solution for my problem.
>
> Any help will be very wellcome!
>
> By the way: I define my db, tables and indices using a SQL script, which I
> execute by the .read command in sqlite3 (under Ubuntu Linux). But I don't
> find any possibility to define any collation. I just found how to use an
> allredy existing collation. Isn't it possible, what I want?
>
> Another way could be possible: I use python as my programming language,
> with the apsw module to connect to the SQLite db. There is a method
> apsw.connection.**createcollation, which registers a
> (python)-sorting-(collate-)**function as a collation. That can be used
> later, for example in SELECTs. Could it also be used in CREATE TABLEs or
> CREATE INDEXs? In this case I could define my db, tables and indices within
> python/apsw instead of the SQL-script and .read. Right?
>
> The background is very simple: in my db I store german people with their
> name, address and so on, and there are these "Umlaute" in the names, and
> the row name has its own index.
>
> Now I brought more text then I would like to... but I hope for Your help!
>
> Ulrich
>
>
> --
> Ulrich Goebel
> Paracelsusstr. 120, 53177 Bonn
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] collation for german "Umlaute"

2013-10-30 Thread Ulrich Goebel

Hallo,

for a SQLite db I would like to define a collation for german "Umlaute" 
(don't know the english word for that, sorry) and "Sonderzeichen" (äöü, 
ÄÖÜ, ß), so that:


a=A=ä=Ä
o=O=ö=Ö
u=U=ü=Ü
ß=s (or, better: ß=ss)

I want to use such a collation even on columns which are indeces, so I 
would like to "connect" the collation to the column/index at the time 
creating the column/index, not only in the later SELECTs. And I suppose, 
that I really don't have to specify the collation in the SELECTs if it 
is connected to the index. Is that right?


It should work like that:

CREATE TABLE person (name text);
CREATE INDEX idx_name on person (name COLLATE umlaute);

and later on

SELECT name FROM person ORDER BY name;
/* whithout specifying the collation again! */

should give

  Ortin
  Ötzkök
  Pandulas
  Zurmühlen

instead of

  Ortin
  Pandulas
  Zurmühlen
  Ötzkök

I'm sure, that I'm not the first fellow, who would like, but I don't 
find a solution for my problem.


Any help will be very wellcome!

By the way: I define my db, tables and indices using a SQL script, which 
I execute by the .read command in sqlite3 (under Ubuntu Linux). But I 
don't find any possibility to define any collation. I just found how to 
use an allredy existing collation. Isn't it possible, what I want?


Another way could be possible: I use python as my programming language, 
with the apsw module to connect to the SQLite db. There is a method 
apsw.connection.createcollation, which registers a 
(python)-sorting-(collate-)function as a collation. That can be used 
later, for example in SELECTs. Could it also be used in CREATE TABLEs or 
CREATE INDEXs? In this case I could define my db, tables and indices 
within python/apsw instead of the SQL-script and .read. Right?


The background is very simple: in my db I store german people with their 
name, address and so on, and there are these "Umlaute" in the names, and 
the row name has its own index.


Now I brought more text then I would like to... but I hope for Your help!

Ulrich


--
Ulrich Goebel
Paracelsusstr. 120, 53177 Bonn
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users