On Thu, May 27, 2010 at 12:03 PM, Grant Ingersoll <[email protected]> wrote:

>
> I just want to supplement my docs with some "high quality" collocations.
>  TF-IDF is good enough, just not clear how best to get them out at this
> point, on a per doc basis.
>

You could use the CollocDriver to get a sense of the LLR range for your
corpus and then provide a minLLR as an argument to seq2sparse -- that said,
it doesn't necessarily address the issue of collocations that have a high LLR
but are made up of words with a high frequency in the corpus. This might not
be an issue for you, however.
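
To make the thresholding idea concrete, here is a rough sketch in Python of
scoring candidate bigrams with Dunning's log-likelihood ratio and keeping only
those at or above a cutoff. This is not Mahout's code: the function names, the
counts, and the min_llr value are all made up for illustration, and the 2x2
table layout (k11 = both words together, k12 = first word without the second,
k21 = second word without the first, k22 = neither) is an assumption about how
you would tabulate the counts.

```python
import math


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    Computes 2 * sum(k_ij * ln(k_ij * N / (row_i * col_j))) over non-zero cells.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing (limit of k*ln(k) is 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def keep_collocations(tables, min_llr):
    """Return {bigram: score} for bigrams whose LLR is at least min_llr."""
    scored = {bigram: llr(*counts) for bigram, counts in tables.items()}
    return {bigram: s for bigram, s in scored.items() if s >= min_llr}


# Hypothetical counts, invented purely for illustration.
tables = {
    "new york": (10, 0, 0, 10),    # the two words only ever occur together
    "of the": (10, 10, 10, 10),    # co-occurrence exactly at chance level
}
kept = keep_collocations(tables, min_llr=5.0)
```

Looking at the spread of scores first (the role CollocDriver plays above) is
what tells you where a sensible cutoff sits for your corpus. Note that the
filter only looks at the score, so a bigram built from very frequent words
still passes if its LLR is high enough, which is the caveat mentioned above.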
