You can try indexing every word-level 2-gram, 3-gram, and 4-gram in your 
corpus as its own term. Then you can enumerate the terms in that index the 
same way you already do and rank the n-grams by how often they occur.
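
To make the idea concrete, here is a rough sketch in plain Python rather than the Lucene API. It splits each document into word n-grams and counts how many documents contain each one, which is roughly what reading document frequencies off an n-gram index would give you. The function names, the tokenizing regex, and the sample documents are all made up for illustration.

```python
from collections import Counter
import re


def word_ngrams(text, n):
    """Yield every run of n consecutive words in text, lowercased."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def ngram_document_frequency(docs, sizes=(2, 3, 4)):
    """Count, for each n-gram, how many documents contain it."""
    df = Counter()
    for text in docs:
        seen = set()  # count each n-gram at most once per document
        for n in sizes:
            seen.update(word_ngrams(text, n))
        df.update(seen)
    return df


# Hypothetical sample corpus, not taken from any real index.
docs = [
    "Flights to New Orleans resumed today",
    "New Orleans residents return home",
    "A new plan for the city",
]
df = ngram_document_frequency(docs)
print(df.most_common(1))  # [('new orleans', 2)]
```

On a real corpus you would probably want to drop n-grams made up mostly of stop words, since sequences like "of the" will otherwise dominate the top of the list.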

On 9/12/05, Wilkerson, Cory <[EMAIL PROTECTED]> wrote:
> 
> So...I've had good/great luck finding all terms in my index using the
> Lucene API - life is good. Now - I'm trying to take things a step
> further and find sequences of key words (maybe two/three/four word
> combinations). It's great that I can find "new" and "orleans", but I'm
> mostly interested in articles that contain "new orleans". I realize I
> can *search* for these terms but I'm more interested in writing an
> engine that says "Hey, these sequences seem to be fairly important
> because they're occurring quite a bit across this index."
> 
> Any suggestions?
> Cory Wilkerson
> 



-- 
Andy Liu
[EMAIL PROTECTED]
(301) 873-8458
