You could presumably do it with solr.PatternTokenizerFactory, with the pattern set to .*, as your <tokenizer>.
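
As a rough, untested sketch of what that might look like in schema.xml: the field type name and the synonyms.txt file name below are made up for illustration, and the group="0" and tokenizerFactory settings are my assumptions about what is needed, not something confirmed in this thread. As far as I know, PatternTokenizerFactory splits on the pattern by default (group="-1"), so group="0" is what makes the match itself the token. solr.KeywordTokenizerFactory, which emits the whole input as a single token, may be a simpler choice for the same job.

```xml
<!-- Hypothetical field type; the name "text_phrase_syn" is illustrative only. -->
<fieldType name="text_phrase_syn" class="solr.TextField">
  <analyzer>
    <!-- Assumption: group="0" emits the regex match as the token.
         With the default group="-1" the pattern is used as a delimiter instead. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=".*" group="0"/>
    <!-- Assumption: synonyms.txt holds a line such as
         lms, learning management system
         and tokenizerFactory keeps each multi-word entry as one token
         when the synonyms file itself is parsed. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Whatever variant you try, it is worth running a few sample values through the analysis page before indexing anything, since I have not verified the resulting tokens myself.
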

Or, maybe, if Solr allows it, you don't use any tokenizer at all?

Or, maybe you could use solr.WhitespaceTokenizerFactory, allowing it to split up the words, along with solr.WordDelimiterFilterFactory with catenateWords="1" to put them back together (with the other parameters set to 0). My guess is that this will not work -- that once the tokenizer has split up the words, a filter doesn't see them all together after that.

You can use the "analyze" capability on the /solr/admin page to see what will happen under various test scenarios without having to actually load up a bunch of documents.

Then you could use solr.SynonymFilterFactory to do your synonym processing as a <filter>.

-----Original Message-----
From: Will Milspec [mailto:will.mils...@gmail.com]
Sent: Wednesday, August 17, 2011 9:02 PM
To: solr-user@lucene.apache.org
Subject: Synonym and Whitespaces and optional TokenizerFactory

Hi all,

This may be obvious. My question pertains to the use of tokenizerFactory together with SynonymFilterFactory. Which tokenizerFactory does one use to treat "synonyms with spaces" as one token?

Example: these two entries are synonyms: "lms", "learning management system"

Index-time expansion would expand "lms" to these terms:
"lms"
"learning management system"

i.e. not like this:
"lms"
"learning"
"management"
"system"

Excerpt from the wiki article:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

<quote>
The optional *tokenizerFactory* parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1 .
</quote>

thanks,
will