For retrieval, I have had very good results from simply retaining high-LLR
(log-likelihood ratio) collocations and letting any subsequent processing
deal with the weighting.
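
A minimal sketch of that filtering step, in plain Python rather than Mahout,
in case it helps make the idea concrete. It scores adjacent word pairs with
the log-likelihood ratio on a 2x2 contingency table and keeps only the pairs
above a cutoff. The function names (llr, high_llr_bigrams) and the
thresholds (min_llr, min_count) are made up for illustration and are not
taken from any library; sensible cutoffs depend on the corpus.

```python
import math
from collections import Counter


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)) over the given counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts.

    k11: A followed by B      k12: A followed by something else
    k21: something else, B    k22: neither
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def high_llr_bigrams(tokens, min_llr=10.0, min_count=2):
    """Return [((a, b), score), ...] for bigrams passing both cutoffs.

    min_llr and min_count are illustrative assumptions, not recommendations.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (a, b), k in bigrams.items():
        left[a] += k    # times a appears as the first word of a bigram
        right[b] += k   # times b appears as the second word of a bigram
    n = sum(bigrams.values())

    kept = []
    for (a, b), k11 in bigrams.items():
        if k11 < min_count:
            continue
        k12 = left[a] - k11
        k21 = right[b] - k11
        k22 = n - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score >= min_llr:
            kept.append(((a, b), score))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

The scores are returned only for inspection. The retained pairs would be
added to the document as plain terms, and whatever weighting comes next
(TF-IDF, say) treats them like any other term.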

On the other hand, I just saw an article that tested collocations for
spam detection and got no lift, because the individual constituent words
were already carrying the weight
(http://www.aueb.gr/users/ion/docs/TR2004_updated.pdf, linked from
http://aclweb.org/aclwiki/index.php?title=Spam_filtering_datasets).

On Thu, May 27, 2010 at 9:03 AM, Grant Ingersoll <[email protected]> wrote:

> > There may be use cases for keeping LLR if only for diagnostic purposes.
>
> I just want to supplement my docs with some "high quality" collocations.
>  TF-IDF is good enough, just not clear how best to get them out at this
> point, on a per doc basis.
