You might look into tools like Alias-i's LingPipe, or search for
phrase recognition software.
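
To give a rough idea of what phrase recognition involves, here is a
minimal sketch in Python. It is not LingPipe and not anything built into
Solr; it only counts adjacent word pairs across documents and keeps the
ones that repeat. The function name, the stopword list, and the
min_count threshold are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be larger.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to", "by", "was"})


def candidate_phrases(docs, min_count=2, stopwords=STOPWORDS):
    """Return two-word phrases that occur at least min_count times in docs.

    Pairs containing a stopword are skipped. Sentence boundaries are
    ignored, so a pair may span two sentences of the same document.
    """
    counts = Counter()
    for text in docs:
        # Lowercase and keep only alphanumeric runs as words.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for first, second in zip(words, words[1:]):
            if first in stopwords or second in stopwords:
                continue
            counts[(first, second)] += 1
    return [" ".join(pair) for pair, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = [
        "Bill Gates founded Microsoft.",
        "Microsoft Office is not open source.",
        "Bill Gates praised open source.",
    ]
    print(candidate_phrases(sample))
```

Raw counts like this will favor frequent but uninteresting pairs on a
real corpus, so dedicated tools usually score candidates statistically
rather than by frequency alone.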
-Grant
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents. I've been experimenting with
MoreLikeThis and values returned by the "mlt.interestingTerms" parameter and
so far this approach has worked well. However, I'd like to be able to
analyze documents more intelligently to recognize phrase keywords such as
"open source", "Microsoft Office", "Bill Gates" rather than splitting each
word into separate tokens (the field is never used in search queries so
matching is not an issue). I've been looking at SynonymFilterFactory as a
possible solution to this problem but haven't been able to work out the
specifics of how to configure it for phrase mappings.
Has anybody else dealt with this problem before, or can anyone offer any
insight into how to achieve the desired results?
Thanks in advance,
Pieter
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ