On Mon, Nov 30, 2009 at 06:37:11PM +0000, Simon Slavin wrote:
>
> On 30 Nov 2009, at 5:51pm, Nicolas Williams wrote:
>
> > Consider a column that contains a person's last name.  Q: do proper
> > names have a language?  A: No, since people can be from all over and
> > even within a single country may have last names of various radically
> > different origins.
>
> But what is the purpose of collating a column ?  Why, to allow it to
> be indexed, of course.  And for it to be indexed every value in the
> column must be comparable to every other value.  So it might be
> sufficient to simply declare the column as having a language:
>
> ALTER TABLE ADD COLUMN familyname UNICODE LANG Deutsche
>
> Actually, we'd probably use ISO 639-3:
>
> ALTER TABLE ADD COLUMN familyname UNICODE LANG deu

There's already a COLLATE column-constraint.  Given the use of Unicode,
'collation' is approximately the same as 'language'.  One use of it is
to ensure that an index can be used to optimize ORDER BY clauses where
the ORDER BY clause's collation is defaulted or the same as in the
index, but it also affects comparisons, even equality comparisons.

> That would be sufficient to allow the standard SQL functions like
> indexing and comparison to be implemented.  The column 'language'
> could perhaps be absolute, or perhaps be used as a default if the
> individual values did not declare a language.  On the other hand, it
> might perhaps not be necessary to declare the language for each
> column: it's likely that all columns for any database would want to
> use the same language for collation.

See my previous message: it would make no sense to have a column with
data-dependent collations.  But perhaps I'm missing something.  Can you
describe the semantics of data-dependent collations?

> > Note too that Unicode has codepoints for specifying the language that
> > the subsequent text is written in.
>
> I did not know this !  This makes things simpler.  Are you talking
> about
>
> http://unicode.org/reports/tr35/
>
> This appears to be a way of specifying a language outside of the text
> stream, not inside it.

No, I meant this:

http://unicode.org/unicode/faq/languagetagging.html#2
http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#G17521 (section 16.9)

Note that use of Unicode language tags is discouraged.  The same reasons
apply here, IMO.

Nico
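
To illustrate the point above that the existing COLLATE column-constraint
affects equality comparisons as well as ordering, here is a minimal sketch
using Python's standard sqlite3 module.  It is only an illustration: the
table name `people`, the index name and the sample rows are made up, and it
uses SQLite's built-in NOCASE collation as a stand-in for a
language-specific Unicode collation, which stock SQLite does not ship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the collation is declared once, on the column.
conn.execute("CREATE TABLE people (familyname TEXT COLLATE NOCASE)")
# The index inherits the column's collation unless told otherwise.
conn.execute("CREATE INDEX people_familyname ON people (familyname)")
conn.executemany(
    "INSERT INTO people (familyname) VALUES (?)",
    [("williams",), ("Slavin",), ("WILLIAMS",)],
)

# Equality uses the column's collation, so both spellings match.
matches = [row[0] for row in conn.execute(
    "SELECT familyname FROM people "
    "WHERE familyname = 'Williams' ORDER BY rowid"
)]

# ORDER BY defaults to the column's collation as well; rowid breaks ties
# between values that the collation treats as equal.
ordered = [row[0] for row in conn.execute(
    "SELECT familyname FROM people ORDER BY familyname, rowid"
)]

# An explicit COLLATE on the comparison overrides the column's collation.
binary_matches = [row[0] for row in conn.execute(
    "SELECT familyname FROM people "
    "WHERE familyname = 'Williams' COLLATE BINARY"
)]

print(matches)         # ['williams', 'WILLIAMS']
print(ordered)         # ['Slavin', 'williams', 'WILLIAMS']
print(binary_matches)  # []
```

The same mechanism would presumably carry over to a custom collation
registered by the application (for example with
`Connection.create_collation` in Python), since SQLite treats built-in and
user-defined collations alike in column constraints.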