Hi Frank,

I have been working on something very similar, and I have come to believe
(though I could be totally wrong) that a pure Solr solution is not going to
do this. I would look at Mahout and experiment with some of the machine
learning algorithms it can run against a Lucene index. I have only
experimented with it so far, but it looks promising.
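
In the meantime, if all you need is the adjusted counts from your example,
a post-processing step outside Solr can produce them: subtract each two-word
phrase's count from the counts of its two words (hot 8 - "hot dog" 6 = hot 2,
and dog drops to 0). Below is a minimal Python sketch of that arithmetic. It
is not something Solr or Mahout provides; `adjust_counts` and its `sep`
parameter are hypothetical names of my own. It assumes raw occurrence counts
and non-overlapping phrases. If your numbers are document frequencies, the
result is only an approximation.

```python
from collections import Counter


def adjust_counts(counts: dict[str, int], sep: str = " ") -> dict[str, int]:
    """Subtract each two-word phrase's count from the counts of its words.

    Assumes (hypothetically) that `counts` maps single words and two-word
    phrases (words joined by `sep`) to raw occurrence counts, and that
    phrases never overlap (e.g. no "hot dog dog"). Terms whose adjusted
    count falls to zero or below are dropped; the rest are ordered by count.
    """
    adjusted = Counter(counts)
    for term, n in counts.items():
        words = term.split(sep)
        if len(words) == 2:  # only adjust for two-word phrases
            for word in words:
                adjusted[word] -= n
    return {t: c for t, c in adjusted.most_common() if c > 0}


if __name__ == "__main__":
    # Frank's example counts.
    for term, n in adjust_counts({"hot": 8, "dog": 6, "hot dog": 6}).items():
        print(f"{term}\t{n}")
```

The separator is a parameter because the phrase tokens your analyzer emits
may join the two words with something other than a space.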

Adam

On Sun, Jun 12, 2011 at 10:20 AM, Frank A <fsa...@gmail.com> wrote:

> I have a single copyField that has a number of other fields copied to it.
> I'm trying to "extract" a list of keywords and common terms. I realize it
> may not be 100% dynamic and I may need to filter manually. Right now I
> tried using a CommonGrams filter. However, what I see is that it creates
> tokens for "hot", "dog", and "hot dog". Is there any way from within the
> Solr configuration to treat "hot"'s frequency as "hot when not followed by
> dog"?
> For example, right now I may see a term/frequency of:
>
> hot      8
> dog      6
> hot dog  6
>
> What I really want is:
>
> hot dog  6
> hot      2
>
> Any ideas?
>
