Daniel, I read your email and the responses to it with great interest. I was aware that Solr implicitly assumes each field is monolingual across the whole system. But your mail and the discussion that followed made me wonder whether this limitation is practical for multilingual search applications. For bilingual or trilingual search, we can have parallel fields (title_en, title_fr, title_de, for example), but this wouldn't scale well.
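Here is a minimal Java sketch of the parallel-field approach that makes the scaling concern concrete. It does not use the Lucene or Solr APIs. The class ParallelFields, its methods, and the three-language list are hypothetical, and the query string only imitates field:term syntax. Each supported language needs its own field, and a query for one term has to be expanded across every one of those fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class ParallelFields {

    // Hypothetical: languages for which the schema declares a title_xx field.
    static final List<String> SCHEMA_LANGUAGES = List.of("en", "fr", "de");

    // Maps a base field name and language code to its parallel field,
    // e.g. ("title", "fr") -> "title_fr".
    static String fieldFor(String baseName, String lang) {
        if (!SCHEMA_LANGUAGES.contains(lang)) {
            // Supporting another language means declaring another field
            // (and its analyzer) in the schema.
            throw new IllegalArgumentException("no field declared for language: " + lang);
        }
        return baseName + "_" + lang;
    }

    // A query for one term must be expanded across every parallel field.
    static String expandQuery(String baseName, String term) {
        StringJoiner or = new StringJoiner(" OR ");
        for (String lang : SCHEMA_LANGUAGES) {
            or.add(fieldFor(baseName, lang) + ":" + term);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // Insertion-ordered map standing in for a document's fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(fieldFor("title", "en"), "The Name of the Rose");
        doc.put(fieldFor("title", "fr"), "Le Nom de la rose");
        System.out.println(doc);
        System.out.println(expandQuery("title", "rose"));
    }
}
```

With three languages, expandQuery("title", "rose") yields "title_en:rose OR title_fr:rose OR title_de:rose". With 50 or more languages, both the field list and every expanded query grow to match.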
Suppose, for example, that we are building a search application for a multilingual library at a university in Japan. The application would have a book title field in Japanese, perhaps another title field in English for visiting scholars, and a title field in the original language. The language of that last field would vary among more than 50 modern languages (and some not-so-modern ones, like Latin). Solr may need some rearchitecting in this area.

I work for a company called Basis Technology (www.basistech.com), which develops a suite of language processing software, and I've written a module to integrate it with Solr (and Lucene in general). The module consists of a universal Tokenizer and Analyzers for English and Japanese, but they can easily be modified to handle any of the 16 languages we support. (Source code is provided.)

When I was developing this module, I thought of writing a super Analyzer that automatically detects the language and does the right thing. But I found that this doesn't fit well with the design of Lucene and Solr. For one thing, if the language is detected within the Analyzer, there is no way to save the detected language in the field. Lucene and Solr require that the language be known before an Analyzer can be instantiated, but in my design it's the Analyzer that detects the language. A second obstacle is that the set of Filters the Analyzer uses depends on the language, so it must be changed dynamically. This could be done programmatically, but it's not easy.

My big hope is that we can work together to come up with a way for the language detected within the Analyzer to be retrieved and stored in the field.

Anyway, if you are interested in trying my multilingual Analyzers, please contact me by private email.

Regards,
-kuro