icult to get
the query language. Any thoughts/ideas on turning stemming on/off?
Thanks
Prabhu
-----Original Message-----
From: Dominique Bejean [mailto:dominique.bej...@eolya.fr]
Sent: 06 April 2012 10:58
To: solr-user@lucene.apache.org
Subject: Re: Choosing tokenizer based on language of document
Hi,
Yes, I agree it is not an easy issue. Indexing all languages with the
appropriate char filter, tokenizer and filters for each language is not
possible without developing a new text type and a new analyzer.
If you plan to index up to 10 different languages, I suggest one text
field per language
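
A rough sketch of how the one-field-per-language routing might look on the client side, in Python. The field names (text_en, text_fr, ...), the fallback field text_general, and the helper functions are all hypothetical; each field would still need its own field type and analysis chain defined in schema.xml, and the query terms are not escaped here.

```python
# Minimal sketch: route document text to a per-language Solr field.
# All field names below are assumptions, not Solr defaults; each one
# would need a matching field definition and analyzer in schema.xml.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "uk"}
FALLBACK_FIELD = "text_general"  # assumed catch-all field for other languages


def field_for(lang):
    """Return the (hypothetical) text field name for a language code."""
    code = (lang or "").strip().lower()
    if code in SUPPORTED_LANGUAGES:
        return f"text_{code}"
    return FALLBACK_FIELD


def build_doc(doc_id, lang, text):
    """Build the document to index, putting the text in the language's field."""
    return {"id": doc_id, "language": lang, field_for(lang): text}


def build_query(lang, terms):
    """Build a query string that targets only the field for the query language."""
    return f"{field_for(lang)}:({terms})"
```

With this layout, the same language code picks the field at index time and at query time, so both sides go through the same analysis chain. It only works if the language of the query is known or can be guessed.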
This is really difficult to imagine working well. Even if you
do choose the appropriate analysis chain (and it must
be a chain here), and manage to appropriately tokenize
for each language, what happens at query time?
How do you expect to get matches on, say, Ukrainian when
the tokens of the query
Hi,
I have documents in different languages and I want to choose the
tokenizer to use for a document based on the language of the document. The
language of the document is already known and is indexed in a field. What I
want to do is when I index the text in the document, I want to choose