Most correlation measures have trouble with small counts. They ascribe very high scores to mere coincidence (hence the title of the original paper, "Accurate Methods for the Statistics of Surprise and Coincidence").
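To make the small-count problem concrete, here is a minimal sketch comparing Pearson's χ² independence statistic (the kind of score the quoted message mentions for LingPipe) with the log-likelihood ratio G² on a 2x2 contingency table of bigram counts. The function names `pearson_chi2` and `llr` are hypothetical, the counts are made-up illustrative numbers, and the χ² formula is the textbook 2x2 form, not necessarily the exact code in LingPipe's `chiSquaredIndependence`.

```python
import math


def pearson_chi2(k11, k12, k21, k22):
    """Textbook Pearson chi-squared statistic for a 2x2 table (no continuity correction).

    k11: A followed by B, k12: A without B, k21: B without A, k22: neither.
    Assumes no row or column sums to zero.
    """
    n = k11 + k12 + k21 + k22
    num = n * (k11 * k22 - k12 * k21) ** 2
    den = (k11 + k12) * (k21 + k22) * (k11 + k21) * (k12 + k22)
    return num / den


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio G^2 = 2 * sum(k * ln(k / expected)) over the four cells."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for k, i, j in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if k > 0:  # empty cells contribute 0 (0 * ln 0 is taken as 0)
            expected = rows[i] * cols[j] / n
            g2 += k * math.log(k / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts out of 1,000,000 bigrams.
    singleton = (1, 0, 0, 999_999)   # two words seen once each, once together
    strong = (100, 100, 100, 999_700)  # a pair that co-occurs 100 times
    for name, table in (("singleton", singleton), ("strong", strong)):
        print(name, round(pearson_chi2(*table)), round(llr(*table), 1))
```

With these counts, χ² gives the single coincidental co-occurrence a score of exactly N (1,000,000), well above the pair that co-occurs 100 times (about 250,000). G² ranks them the other way round: roughly 30 for the singleton versus roughly 1,500 for the repeated pair.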
On Jun 21, 2012, at 2:01 PM, Nimrod Priell wrote:

> I did note that LingPipe uses a different type of scoring, Pearson χ² goodness
> of fit (it seems different from LLR, but I didn't dig deep), to do their
> collocation scoring:
> http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
> (the exact method is documented in the code,
> http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[])).
> Is that method a good way to capture what I'd like?
