Finding Keywords/Phrases

2011-06-12 Thread Frank A
I have a single copyField that has a number of other fields copied to it.
I'm trying to extract a list of keywords and common terms. I realize it
may not be 100% dynamic and I may need to filter manually. Right now I
have tried using a CommonGrams filter. However, what I see is that it
creates tokens for the single words "hot" and "dog" as well as for
"hot dog". Is there any way, from within the Solr configuration, to count
"hot" only when it is not followed by "dog"? For example, right now I may
see these term frequencies:

hot       8
dog       6
hot dog   6

What I really want is:

hot dog   6
hot       2
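The adjustment described above amounts to subtracting each phrase's count from the counts of the words it contains. A minimal sketch of that post-processing step, done outside Solr on counts already read from the index, might look like this. The function name `residual_counts` and the hand-maintained phrase list are assumptions for illustration, not Solr features.

```python
from collections import Counter


def residual_counts(freqs, phrases):
    """Subtract each phrase's count from the counts of the words inside it.

    freqs:   term -> frequency as read from the index (words and phrases mixed)
    phrases: multi-word terms to treat as units (hypothetical, maintained by hand)
    """
    adjusted = Counter(freqs)
    for phrase in phrases:
        n = freqs.get(phrase, 0)
        for word in phrase.split():
            adjusted[word] -= n
    # Drop words whose occurrences were all inside phrases (e.g. "dog" here).
    return {term: n for term, n in adjusted.items() if n > 0}


print(residual_counts({"hot": 8, "dog": 6, "hot dog": 6}, ["hot dog"]))
# {'hot': 2, 'hot dog': 6}
```

This is exact only if the numbers are total occurrence counts. If they are document counts (as facet counts are), a document containing both a standalone "hot" and "hot dog" is counted once under each term, so the difference is approximate. Overlapping phrases that share a word occurrence would also be subtracted twice.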

Any ideas?


Re: Finding Keywords/Phrases

2011-06-12 Thread Adam Estrada
Hi Frank,

I have been working on something very similar, and I have come to believe
(though I could be totally wrong) that a pure Solr solution is not going
to do this. I would look at Mahout and experiment with some of the machine
learning algorithms it can run against a Lucene index. I have only
experimented with it so far, but it looks promising.
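For the narrower counting problem in the original question, one non-Solr alternative is a counting pass over the raw text that lets a known phrase consume its words, so "hot" is only counted when it is not part of "hot dog". The sketch below assumes a hand-supplied phrase list (the `phrases` argument is hypothetical) and uses a simple `\w+` tokenizer rather than a Solr analyzer; it is not Mahout and does not discover phrases on its own.

```python
import re
from collections import Counter


def count_terms(text, phrases):
    """Count words and known phrases, greedy longest match first.

    At each position, take the longest phrase from `phrases` that starts
    there; otherwise count the single word and move on.
    """
    words = re.findall(r"\w+", text.lower())
    # Phrases as word tuples, longest first; skip empty strings.
    candidates = sorted(
        {tuple(p.lower().split()) for p in phrases if p.split()},
        key=len,
        reverse=True,
    )
    counts = Counter()
    i = 0
    while i < len(words):
        for phrase in candidates:
            if tuple(words[i:i + len(phrase)]) == phrase:
                counts[" ".join(phrase)] += 1
                i += len(phrase)
                break
        else:  # no phrase starts here: count the single word
            counts[words[i]] += 1
            i += 1
    return counts


counts = count_terms("Hot dog stand sells hot dog and hot coffee", ["hot dog"])
print(counts["hot dog"], counts["hot"], counts["dog"])  # 2 1 0
```

Greedy longest match is a simplification: with overlapping phrases such as "hot dog" and "dog bun", whichever starts first wins.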

Adam
