No, I don't know of a tokenizer that will do what you want. Really,
I suspect you need to back up a bit and rethink the problem. It looks
to me like you've taken a path that's going to cause you endless grief
when, as Jack says, phrase search is already a core feature of
tokenized text.
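
To make that concrete, here is a rough sketch in Python of why ordinary
whitespace tokenization plus synonyms still supports phrase matching. It
is a toy model, not Solr or Lucene code: the SYNONYMS table and the
analyze and phrase_match helpers are made up for illustration, and real
synonym handling is more involved (for example, "st" could just as
easily mean "street").

```python
# Toy model only: imitates whitespace tokenization, synonym
# normalization, and phrase matching. This is NOT Solr or Lucene code.

# Hypothetical synonym table; each variant maps to one canonical token.
SYNONYMS = {"ft": "fort", "st": "saint"}


def analyze(text):
    """Lowercase, split on whitespace, and map each token to its canonical form."""
    return [SYNONYMS.get(token, token) for token in text.lower().split()]


def phrase_match(query, document):
    """Return True if the analyzed query tokens occur consecutively in the document."""
    q, d = analyze(query), analyze(document)
    if not q:
        return False
    # Slide a window of len(q) tokens over the document tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))
```

With this, phrase_match("fort st john", "ft saint john") is true because
both sides analyze to the same three tokens in the same order, while the
same query does not match "marina former fort ord". A one-word query such
as "st" still matches in this toy model, so it does not by itself address
the worry about too many results.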

Best,
Erick


On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Query by phrase is a core feature of tokenized text in Lucene and Solr, so
> there is no need to use a pattern token filter for that purpose. And yes,
> doing so pretty much breaks most token filters that would assume that the
> text is tokenized.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: solr-user
> Sent: Wednesday, April 2, 2014 12:46 PM
> To: solr-user@lucene.apache.org
>
> Subject: Re: how do I get search for "fort st john" to match "ft saint john"
>
> Hi Erick.
>
> No, that doesn't fix the problem either (I have tested this previously and
> did so again just now).
>
> Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
> since I want the user to search by phrase), the phrase "marina former fort
> ord" (for example) does not get turned into four tokens ("marina", "former",
> "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
> for them (by design).
>
> The original question remains: is there a tokenizer/plugin that will allow
> me to apply synonyms to words in an unbroken phrase?
>
> Note: the reason I don't want to tokenize the data by whitespace is that it
> would cause way too many results to be returned if I, for example, search on
> "new" or "st". However, I still want to be able to include "fort saint
> john" in the results if the user searches for "ft st john" or "fort st john"
> or ...
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
> Sent from the Solr - User mailing list archive at Nabble.com.
