On Thu, May 27, 2010 at 12:03 PM, Grant Ingersoll <[email protected]> wrote:
>
> I just want to supplement my docs with some "high quality" collocations.
> TF-IDF is good enough, just not clear how best to get them out at this
> point, on a per doc basis.
>

You could use the CollocDriver to get a sense of the LLR range for your corpus and then provide a minLLR as an argument to seq2sparse. That said, it doesn't necessarily address the issue of collocations that have a high LLR but are made up of words with a high frequency in the corpus. This might not be an issue for you, however.
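
To make the minLLR idea concrete, here is a rough Python sketch of the log-likelihood ratio for a bigram, computed from a 2x2 contingency table, followed by a threshold filter. It is only an illustration of the statistic, not Mahout's code: the names llr and filter_by_min_llr, the counts, and the threshold are all made up for the example, and you would pick the real threshold after looking at the LLR range CollocDriver reports for your corpus.

```python
import math


def xlogx(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a bigram "A B".

    k11: A followed by B          k12: A followed by something else
    k21: something else then B    k22: neither A first nor B second
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def filter_by_min_llr(tables, min_llr):
    # Hypothetical helper: keep bigrams whose LLR reaches the threshold
    return {bigram: score
            for bigram, score in ((b, llr(*k)) for b, k in tables.items())
            if score >= min_llr}


# Made-up counts for illustration only
tables = {
    "support vector": (100, 10, 10, 100000),     # rare words, strongly tied
    "of the": (5000, 20000, 30000, 945000),      # very frequent words
    "red house": (10, 10, 10, 10),               # no association at all
}

kept = filter_by_min_llr(tables, min_llr=50.0)
for bigram, score in sorted(kept.items(), key=lambda kv: -kv[1]):
    print(bigram, round(score, 1))
```

With counts like these, a pair of very frequent words such as "of the" can still clear the threshold, because the LLR grows with the amount of evidence as well as with the strength of the association. That is the case a minLLR cutoff alone won't filter out.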
