[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by HossMan

Apache Wiki Tue, 14 Mar 2006 15:31:46 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The following page has been changed by HossMan:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

------------------------------------------------------------------------------
  
  }}}
  
+ Keep in mind that while the SynonymFilter will happily work with synonyms 
containing multiple words (e.g. "`sea biscuit, sea biscit, seabiscuit`"), the 
recommended approach for dealing with synonyms like this is to expand the 
synonym when indexing.  This is because there are two potential issues that can 
arise at query time:
+ 
+  1. The Lucene QueryParser tokenizes on white space before giving any text to 
the Analyzer, so if a person searches for the words `sea biscit` the analyzer 
will be given the words "sea" and "biscit" separately, and will not know that 
they match a synonym.
+  1. Phrase searching (e.g. `"sea biscit"`) will cause the QueryParser to pass 
the entire string to the analyzer, but if the SynonymFilter is configured to 
expand the synonyms, then when the QueryParser gets the resulting list of 
tokens back from the Analyzer, it will construct a MultiPhraseQuery that will 
not have the desired effect.  This is because of the limited mechanism 
available for the Analyzer to indicate that two terms occupy the same position: 
there is no way to indicate that a "phrase" occupies the same position as a 
term.  For our example the resulting MultiPhraseQuery would be `"(sea | sea | 
seabiscuit) (biscuit | biscit)"` which would not match the simple case of 
"seabiscuit" occurring in a document.
+
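
The two issues above can be sketched with a small toy model. This is not Lucene or Solr code: `SYNONYMS`, `expand` and `phrase_matches` are hypothetical names invented for illustration, and the model assumes that an expanding filter simply overlays every synonym variant position by position. Under those assumptions it shows why a query-time expansion of `"sea biscit"` cannot match a document containing only "seabiscuit", and why expanding at index time avoids the problem.

```python
# Toy model of the behaviour described above; this is not Lucene code.
# SYNONYMS, expand and phrase_matches are hypothetical names.
SYNONYMS = [["sea", "biscuit"], ["sea", "biscit"], ["seabiscuit"]]


def expand(tokens):
    """Return one set of terms per position, as an expanding filter might."""
    if tokens in SYNONYMS:
        width = max(len(variant) for variant in SYNONYMS)
        # Overlay every variant position by position; a one-word variant
        # can only occupy position 0, never the span of a two-word phrase.
        return [
            {variant[i] for variant in SYNONYMS if i < len(variant)}
            for i in range(width)
        ]
    return [{token} for token in tokens]


def phrase_matches(query_positions, doc_positions):
    """True if some window of the document lines up with every query position."""
    n = len(query_positions)
    return any(
        all(query_positions[j] & doc_positions[i + j] for j in range(n))
        for i in range(len(doc_positions) - n + 1)
    )


# Issue 1: the query parser splits on whitespace first, so the analyzer
# sees each word alone and no multi-word synonym is recognised.
separate = [expand([word]) for word in "sea biscit".split()]

# Issue 2: a phrase query reaches the analyzer whole and is expanded, but the
# overlaid positions cannot match a document holding only "seabiscuit".
expanded_query = expand(["sea", "biscit"])
plain_doc = [{"seabiscuit"}]
print(phrase_matches(expanded_query, plain_doc))

# Expanding at index time instead: the document carries every variant,
# so an unexpanded phrase query lines up with it.
expanded_doc = expand(["seabiscuit"])
print(phrase_matches([{"sea"}, {"biscit"}], expanded_doc))
```

In this sketch the expanded query needs two consecutive positions, while the unexpanded document has only one, which mirrors the `"(sea | sea | seabiscuit) (biscuit | biscit)"` query described above.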
