I'm not sure what sort of "field" you mean for defining the
language.

If you plan to use a single search UI regardless of language,
we used to do this in Ultraseek, but it doesn't really work.
Queries are too short for reliable language ID (is "die" in
German, English, or Latin?), and language-specific processing
can be pretty different.
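
To make the short-query problem concrete, here is a toy sketch in
Python. The word lists and the guess_language function are made up
for illustration; this is not how Ultraseek or Solr identifies
languages, just a dictionary vote that shows why one word is not
enough evidence.

```python
# Hypothetical, tiny word lists; a real identifier would use far larger
# vocabularies or character statistics.
LEXICON = {
    "english": {"die", "the", "hard"},
    "german": {"die", "der", "katze"},
    "latin": {"die", "carpe", "diem"},
}


def guess_language(query):
    """Return every language tied for the most matching query words."""
    tokens = query.lower().split()
    scores = {
        lang: sum(1 for tok in tokens if tok in words)
        for lang, words in LEXICON.items()
    }
    best = max(scores.values())
    if best == 0:
        return []  # no evidence at all
    return sorted(lang for lang, score in scores.items() if score == best)


one_word = guess_language("die")
two_words = guess_language("die katze")
```

With these word lists, the one-word query ties across all three
languages, and it takes a second word to break the tie. Most real
queries are closer to the one-word case.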

We ran into surface words that collided in different languages.
As I remember, "mobile" is a plural noun in Dutch but a verb in
English.

Finally, Solr's linguistic support is OK for English, but not as
good for more heavily inflected languages. For German, you
really need to decompose compound words, something not available
in Solr.

The only semi-successful cross-language search seems to be with
n-gram indexing. That usually produces a larger index and somewhat
slower performance (because of the number of terms), but at least
it works.
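
For anyone unfamiliar with the idea, here is a rough sketch of
character n-gram indexing in plain Python. It is not Solr
configuration and not what Ultraseek did; the function names, the
"_" word-boundary padding, the trigram size, and the two sample
documents are all assumptions chosen for illustration.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams for each whitespace token."""
    grams = []
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def build_index(docs, n=3):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=3):
    """Rank documents by how many distinct query n-grams they share."""
    scores = defaultdict(int)
    for gram in set(char_ngrams(query, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample documents, one German and one English.
docs = {
    "de": "Donaudampfschiff legt ab",
    "en": "the steam ship departs",
}
index = build_index(docs)
hits = search(index, "schiff")
```

The query "schiff" finds the German document even though the word
only occurs inside a compound, with no language-specific analysis.
The cost is visible too: each word contributes one term per
character position rather than a single term, which is where the
larger index comes from.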

wunder

On 6/7/07 10:47 AM, "Daniel Alheiros" <[EMAIL PROTECTED]> wrote:

> I have to index and search content in several languages.
> 
> My scenario is a bit different from other that I've already read in this
> forum, as my client is the same to search any language and it could be
> accomplished using a field to define language.
