Hello,

The https://www.sqlite.org/fts3.html#tokenizer page says that the unicode61 tokenizer implements _full_ case folding (it doesn't emphasize the word, but it's there). ftp://unicode.org/Public/6.1.0/ucd/CaseFolding.txt has the following rules:

-- cut --
...
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
...
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
...
-- cut --

I.e., in _full_ case folding both "ẞ" (U+1E9E) and "ß" (U+00DF) are mapped to "ss", whereas in _simple_ case folding the first one is mapped to the second. SQLite 3.11.0 works according to the simple rules:

-- cut --
CREATE VIRTUAL TABLE t USING fts3tokenize(unicode61);
SELECT token FROM t WHERE input = "ẞ ß";
-- cut --

gives

-- cut --
ß
ß
-- cut --

So which one is correct, the documentation or the implementation? I also wonder what a native German speaker would expect in the full-text search case. (Google gives different result counts for "Schloß" and "Schloss", which actually surprises me a bit.)

-- Tomash Brechko
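
P.S. For comparison outside SQLite, here is a minimal Python sketch of the two behaviours. It assumes that str.casefold() is a fair stand-in for full case folding and that str.lower() approximates simple folding for these two characters only (that is not true in general). The helper names full_fold and simple_fold_approx are made up for illustration, and nothing here says anything about how SQLite implements unicode61.

```python
# Sketch only: compares full case folding with an approximation of simple
# folding for the two sharp-s characters discussed above.

CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
SMALL_SHARP_S = "\u00df"    # LATIN SMALL LETTER SHARP S


def full_fold(text: str) -> str:
    """Full case folding: a single character may expand to several."""
    return text.casefold()


def simple_fold_approx(text: str) -> str:
    """Hypothetical stand-in for simple folding: plain lowercasing.

    Assumption: this matches the 'S' rule for U+1E9E and leaves U+00DF
    unchanged; it is not a general implementation of simple folding.
    """
    return text.lower()


if __name__ == "__main__":
    for ch in (CAPITAL_SHARP_S, SMALL_SHARP_S):
        print(
            f"U+{ord(ch):04X}: "
            f"full={full_fold(ch)!r} simple~={simple_fold_approx(ch)!r}"
        )
```

Under full folding "Schloß" and "Schloss" compare equal, so a search for one would match the other; under simple folding they stay distinct.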

