You might investigate tools like Alias-i's LingPipe, or search for phrase recognition software.
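
If a fixed list of phrases is enough for your purposes, the simplest form of phrase recognition is a dictionary lookup that merges known multi-word phrases back into single keywords. Below is a rough Python sketch of that idea. It is not how LingPipe or Solr do it; the function name, the phrase list, and the simple regex tokenization are all assumptions made up for illustration.

```python
import re


def extract_phrases(text, phrases):
    """Split text into words, merging any known multi-word phrase into one token.

    Hypothetical helper for illustration only; not part of Solr or LingPipe.
    Matching is case-insensitive and greedy (the longest phrase wins).
    """
    # Store each known phrase as a tuple of lowercased words for fast lookup.
    known = {tuple(p.lower().split()) for p in phrases}
    longest = max((len(p) for p in known), default=1)

    # Naive tokenization: runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)

    tokens = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase first, down to two words.
        for n in range(min(longest, len(words) - i), 1, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if candidate in known:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here, so keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


if __name__ == "__main__":
    sample = "Bill Gates praised open source and Microsoft Office"
    print(extract_phrases(sample, ["open source", "Microsoft Office", "Bill Gates"]))
```

This only finds phrases you already know about. Discovering new phrases (collocations, named entities) is where a dedicated tool would come in.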

-Grant

On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents.  I've been experimenting with
MoreLikeThis and values returned by the "mlt.interestingTerms" parameter and
so far this approach has worked well.  However, I'd like to be able to
analyze documents more intelligently to recognize phrase keywords such as
"open source", "Microsoft Office", "Bill Gates" rather than splitting each
word into separate tokens (the field is never used in search queries so
matching is not an issue).  I've been looking at SynonymFilterFactory as a
possible solution to this problem but haven't been able to work out the
specifics of how to configure it for phrase mappings.

Has anybody else dealt with this problem before, or can anyone offer any
insights into achieving the desired results?

Thanks in advance,
Pieter

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

