It seems that Solr's query parser doesn't pass a single-term query
to the Analyzer for the field. For example, if I give it
2001年 (year 2001 in Japanese), the searcher returns 0 hits,
but if I enclose it in double quotes, it returns hits.
In this experiment, I configured schema.xml so that
the field in question uses the morphological Analyzer
my company makes, which is capable of splitting 2001年
into two tokens, 2001 and 年. I am guessing that this
Analyzer is called ONLY IF the query is a phrase.
Is my observation correct?
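
To make the tokenization I expect concrete, here is a toy Python
sketch. It is NOT the actual Analyzer and has nothing to do with
Solr's API; toy_analyze is a made-up name, and the digit/non-digit
rule is only an assumption that happens to reproduce the split
described above.

```python
import re

# Toy stand-in for the morphological Analyzer (an assumption, not the
# real one): it separates runs of ASCII digits from runs of other
# non-space characters.
_TOKEN = re.compile(r"[0-9]+|[^0-9\s]+")

def toy_analyze(text):
    """Return the tokens a simple digit/non-digit splitter would produce."""
    return _TOKEN.findall(text)

# The unquoted query term, as a user would type it.
print(toy_analyze("2001年"))
```

What I would like is for the unquoted term to go through this kind
of analysis too, so that it matches the same indexed tokens as the
quoted form.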

If so, is there any configuration parameter that I can tweak
to force every query on the text fields to be processed by
the Analyzer?

One might ask why users don't put a space between 2001 and 年.
If they were clearly two separate words, people would do that.
But 年 works more like a suffix in this case, and in many
Japanese speakers' minds, 2001年 is a single token, so
many people won't. (Remember that Japanese is normally written
without spaces.) Forcing the use of the Analyzer would also
be useful for the compound-word handling that is often desirable
for languages like German.

----
Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents
