Most correlation measures have trouble with small counts: they ascribe very
high scores to mere coincidences (hence the title of the original paper).
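
To make the small-count problem concrete, here is a minimal Python sketch (not from the thread) that scores a 2x2 contingency table of bigram counts with both Pearson's chi-squared statistic and the log-likelihood ratio G². The table layout, the example counts, and the function names are illustrative assumptions. Each statistic is computed in its textbook form, which is not necessarily how LingPipe or any other library implements it.

```python
import math

# Assumed layout for a bigram (A, B) over N bigram positions:
#   [[k11, k12],    k11 = count(A B),     k12 = count(A, not B)
#    [k21, k22]]    k21 = count(not A, B), k22 = count(neither)
# All row and column totals are assumed to be nonzero.

def expected(table):
    """Expected counts under independence: row_total * col_total / N."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def pearson_chi2(table):
    """Pearson's X^2 = sum over cells of (O - E)^2 / E."""
    e = expected(table)
    return sum((table[i][j] - e[i][j]) ** 2 / e[i][j]
               for i in range(2) for j in range(2))

def llr_g2(table):
    """Log-likelihood ratio G^2 = 2 * sum over cells of O * ln(O / E).

    Empty cells contribute 0 (the usual convention 0 * ln 0 = 0).
    """
    e = expected(table)
    return 2.0 * sum(table[i][j] * math.log(table[i][j] / e[i][j])
                     for i in range(2) for j in range(2)
                     if table[i][j] > 0)

# Hypothetical counts in a corpus of 1,000,000 bigrams.
rare = [[1, 0], [0, 999_999]]            # A and B each seen once, together
frequent = [[100, 900], [900, 998_100]]  # seen together 100x, expected 1x

if __name__ == "__main__":
    for name, t in (("rare", rare), ("frequent", frequent)):
        print(name, round(pearson_chi2(t), 1), round(llr_g2(t), 1))
```

With these made-up counts, the pair seen exactly once (and only together) gets a chi-squared score equal to N, about one million, far above the pair that co-occurs 100 times against an expected count of 1. G² ranks the two the other way round, giving the single coincidence a much smaller score than the repeated association.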

On Jun 21, 2012, at 2:01 PM, Nimrod Priell <[email protected]> wrote:

> 
> I did notice that LingPipe scores collocations differently, using Pearson's
> chi-squared (χ²) test of independence (it seems different from LLR, but I
> didn't dig deep):
> http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
> (the exact method is documented in the Javadoc,
> http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[])
> ). Is that method a good way to capture what I'd like?
