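You could presumably do it with solr.PatternTokenizerFactory as your <tokenizer>,
with pattern set to .* and group set to 0 so that each match becomes a token.
With the default group of -1 the pattern is treated as a delimiter, and since .*
matches the whole input, no tokens would come out at all.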
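
A minimal schema.xml sketch of that idea. The field type name "text_whole" is
hypothetical (not from this thread). The group="0" attribute is what makes each
regex match the token; note also that .* does not match across newlines, so a
multi-line value would presumably come out as one token per line.

```xml
<!-- Hypothetical field type: each regex match of .* (one line of input) becomes one token -->
<fieldType name="text_whole" class="solr.TextField">
  <analyzer>
    <!-- group="0": emit the whole match as the token; the default group="-1" would split on the pattern instead -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=".*" group="0"/>
  </analyzer>
</fieldType>
```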

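Or, maybe, you don't use any tokenizer at all? As far as I know, Solr expects a
tokenizer in every <analyzer> chain, but solr.KeywordTokenizerFactory comes
close: it passes the whole field value through as a single token.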
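
A sketch of a field type built on solr.KeywordTokenizerFactory, which keeps the
entire field value as one token. The field type name is hypothetical, and the
lowercasing step is my own assumption (handy if "LMS" and "lms" should match).

```xml
<!-- Hypothetical field type: the whole field value is kept as one token -->
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Assumed extra step: lowercase so "LMS" and "lms" produce the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keep in mind that with whole-value tokens, a synonym can only match when the
entire field value equals the synonym entry.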

Or, maybe you could use solr.WhitespaceTokenizerFactory, allowing it to split
up the words, along with solr.WordDelimiterFilterFactory with catenateWords="1"
to put them back together (with the other parameters set to 0). My guess is
that this will not work: WordDelimiterFilterFactory works on one token at a
time, so once the tokenizer has split up the words, the filter never sees them
together again.
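
For trying this out, here is a sketch of that configuration with a hypothetical
field type name. Which parameters count as "the other parameters" is my
assumption. Per the guess above, it most likely will not rejoin the words, but
the analysis page would confirm that either way.

```xml
<!-- Hypothetical field type reproducing the WhitespaceTokenizer + WordDelimiterFilter idea -->
<fieldType name="text_wdf" class="solr.TextField">
  <analyzer>
    <!-- Splits "learning management system" into three separate tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateWords joins word parts within ONE token (e.g. "wi-fi" -> "wifi"),
         so it likely cannot rejoin tokens the tokenizer already separated -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="0" catenateAll="0"/>
  </analyzer>
</fieldType>
```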

You can use the "analyze" capability on the /solr/admin page to see what will 
happen under various test scenarios without having to actually load up a bunch 
of documents.

Then you could add solr.SynonymFilterFactory as a <filter> to do your synonym
processing.
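
A hedged sketch of the synonym step for the index-time expansion described in
the original message. The field type name and the synonyms.txt file name are
assumptions. The tokenizerFactory attribute is the SynonymFilterFactory parameter
from the wiki excerpt quoted below; set to KeywordTokenizerFactory, it should keep
each comma-separated entry in synonyms.txt whole, so "learning management system"
stays one synonym token. That only helps if the field's own tokenizer also keeps
the phrase together, hence KeywordTokenizerFactory there too.

```xml
<!-- Hypothetical field type: index-time synonym expansion on whole-value tokens -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true": "lms" is indexed as both "lms" and "learning management system";
         tokenizerFactory controls how the entries in synonyms.txt are parsed -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The matching line in synonyms.txt would be something like:
lms, learning management system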



-----Original Message-----
From: Will Milspec [mailto:will.mils...@gmail.com] 
Sent: Wednesday, August 17, 2011 9:02 PM
To: solr-user@lucene.apache.org
Subject: Synonym and Whitespaces and optional TokenizerFactory

Hi all,

This may be obvious. My question pertains to the use of tokenizerFactory
together with SynonymFilterFactory: which tokenizerFactory does one use to
treat "synonyms with spaces" as one token?

For example, these two entries are synonyms: "lms", "learning management system"

Index-time expansion would expand "lms" to these terms:
           "lms"
           "learning management system"

i.e., not like this:
           "lms"
           "learning"
           "management"
           "system"

Excerpt from the wiki article:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
<quote>
The optional *tokenizerFactory* parameter names a tokenizer factory class to
analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which
can help with the synonym+stemming problem described in
http://search-lucene.com/m/hg9ri2mDvGk1 .
</quote>

thanks,

will
