Re: Term extraction

2007-09-22 Thread Brian Whitman
On Sep 21, 2007, at 3:37 AM, Pieter Berkel wrote:
> Thanks for the response guys: Grant: I had a brief look at LingPipe, it looks quite interesting but I'm concerned that the licensing may prevent me from using it in my project.
Does the OpenNLP license look good for you? It's LGPL. Not

Re: Term extraction

2007-09-21 Thread Pieter Berkel
Thanks for the response guys:
Grant: I had a brief look at LingPipe, it looks quite interesting but I'm concerned that the licensing may prevent me from using it in my project.
Michael: I have used the Yahoo API in the past but due to its generic nature, I wasn't entirely happy with the results

Re: Term extraction

2007-09-21 Thread Yonik Seeley
On 9/21/07, Pieter Berkel wrote:
> Yonik: This is the approach I had in mind, will it still work if I put the SynonymFilter after the word-delimiter filter in the schema config?
SynonymFilter doesn't currently have the capability to handle multiple tokens at the same position in

Re: Term extraction

2007-09-20 Thread Michael Kimsal
Not sure if this is in the same league or not, but Yahoo offers a term extraction web service.
http://developer.yahoo.com/search/content/V1/termExtraction.html
On 9/20/07, Grant Ingersoll wrote:
> You might investigate some tools like Alias-i's LingPipe or do some searches

Re: Term extraction

2007-09-20 Thread Yonik Seeley
On 9/19/07, Pieter Berkel wrote:
> However, I'd like to be able to analyze documents more intelligently to recognize phrase keywords such as "open source", "Microsoft Office", "Bill Gates" rather than splitting each word into separate tokens (the field is never used in search queries

Term extraction

2007-09-19 Thread Pieter Berkel
I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. I've been experimenting with MoreLikeThis and values returned by the mlt.interestingTerms parameter and so far this approach has worked well. However, I'd like to be able to analyze
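
For reference, a request of the kind described above might be built as follows. This is only a sketch, not Pieter's actual setup: the host, the `/mlt` handler path, the document id `doc42`, and the field name `content` are all assumptions made for illustration; only the `mlt.interestingTerms` parameter comes from the post.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_query, field, terms_mode="details"):
    """Build a MoreLikeThis request URL that asks Solr for interesting terms.

    base_url, doc_query and field are hypothetical values supplied by the
    caller; terms_mode is passed through as mlt.interestingTerms.
    """
    params = {
        "q": doc_query,                      # selects the source document
        "mlt.fl": field,                     # field(s) to mine for terms
        "mlt.interestingTerms": terms_mode,  # "list", "details" or "none"
        "rows": 0,                           # only the terms are wanted here
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, handler path, document id and field name.
url = build_mlt_url("http://localhost:8983/solr/mlt", "id:doc42", "content")
print(url)
```

The function only builds the URL; sending it and parsing the response depends on how the handler is configured in a given installation.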

Re: Term extraction

2007-09-19 Thread Brian Whitman
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
> I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents.
We do it manually (not in Solr, but we put the results in Solr). We do it the usual way - chunk (into n-grams, named entities
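
The message is cut off here, so the rest of Brian's pipeline is not known. As a rough illustration of the n-gram chunking step alone, a minimal Python sketch might look like this. The tokenizer, the function names, and ranking by raw frequency are all assumptions for the example, and named-entity recognition is not covered.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_terms(text, max_n=2, top=5):
    """Count all 1..max_n-grams in text and return the most frequent ones.

    Raw frequency is a stand-in scoring rule chosen for this sketch only.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # crude tokenizer
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts.most_common(top)


sample = "Open source search: open source tools make open source indexing easy."
print(candidate_terms(sample))
```

In practice the candidates would still need filtering (stop words, minimum counts, comparison against corpus-wide frequencies) before being stored as keywords.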

Re: Term extraction

2007-09-19 Thread Pieter Berkel
> On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
> > I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents.
> We do it manually (not in Solr, but we put the results in Solr). We do it the usual way - chunk (into n-grams, named entities noun