Thanks for the responses, guys:

Grant: I had a brief look at LingPipe. It looks quite interesting, but I'm
concerned that the licensing may prevent me from using it in my project.
Michael: I have used the Yahoo API in the past, but due to its generic
nature I wasn't entirely happy with the results in my test cases.
Yonik: This is the approach I had in mind. Will it still work if I put the
SynonymFilter after the word-delimiter filter in the schema config? Ideally
I want to strip out the underscore char before it gets indexed. Is that
possible by using a PatternReplaceFilterFactory after the SynonymFilter?
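
For reference, this is roughly the analyzer chain I have in mind. It is an
untested sketch: the field type name "keywords" and the file name
"synonyms.txt" are placeholders of mine, and I am assuming the usual
pattern/replacement/replace attributes on PatternReplaceFilterFactory.

    <fieldType name="keywords" class="solr.TextField">
      <analyzer>
        <!-- split on whitespace only, so underscores are left alone -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- synonyms.txt holds the phrase mappings, e.g.
             Microsoft Office => microsoft_office -->
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true"/>
        <!-- turn the joining underscore back into a space, keeping the
             phrase as a single token -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="_" replacement=" " replace="all"/>
      </analyzer>
    </fieldType>

There is deliberately no word-delimiter filter in this chain, since I assume
it would split the joined token on the underscore again.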

Cheers,
Piete



On 21/09/2007, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> > However, I'd like to be able to analyze documents more intelligently
> > to recognize phrase keywords such as "open source", "Microsoft
> > Office", "Bill Gates" rather than splitting each word into separate
> > tokens (the field is never used in search queries so matching is not
> > an issue).  I've been looking at SynonymFilterFactory as a possible
> > solution to this problem but haven't been able to work out the
> > specifics of how to configure it for phrase mappings.
>
> SynonymFilter works out-of-the-box with multi-token synonyms...
>
> Microsoft Office => microsoft_office
> Bill Gates, William Gates => bill_gates
>
> Just don't use a word-delimiter filter if you use underscore to join
> words.
>
> -Yonik
>
