Sean, thank you very much. Yes, that is true: I can determine the direction
just by comparing P(A and B) with P(A)*P(B) (where P here is the sample
estimate of the probability of the event).
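
For the archive, here is a minimal Python sketch of that comparison. It is one
possible way to get a signed measure, not something the thread settled on: it
takes the square root of the LLR and attaches the sign of (observed k_11 minus
the k_11 expected under independence). The function names are hypothetical, and
the table layout is an assumption: k11 = A and B together, k12 = A without B,
k21 = B without A, k22 = neither.

```python
import math


def xlogx(x):
    # x * log(x), with the usual convention that 0 * log(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts:
    # N*log(N) - sum(k*log(k)), where N is the sum of the counts.
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def signed_root_llr(k11, k12, k21, k22):
    # Assumed layout: k11 = A and B, k12 = A only, k21 = B only, k22 = neither.
    n = k11 + k12 + k21 + k22
    if n == 0:
        return 0.0
    # Count of (A and B) expected if A and B were independent:
    # n * P(A) * P(B) = (row total) * (column total) / n.
    expected = (k11 + k12) * (k11 + k21) / n
    root = math.sqrt(llr(k11, k12, k21, k22))
    # Positive: co-occurrence over-represented; negative: under-represented.
    return root if k11 >= expected else -root
```

With this sketch, a table such as (20, 5, 5, 20) gives a positive score, the
mirrored table (5, 20, 20, 5) gives the same magnitude with a negative sign,
and a table that matches independence, such as (10, 10, 10, 10), gives a score
of approximately zero.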

Thanks,
Nimrod

On Jun 21, 2012, at 4:55 PM, Sean Owen wrote:

> Is this not just a matter of comparing the frequency of "the" with
> "the the"? If "the" is 1/n of the words, then under independence "the the"
> ought to be 1/n^2 of the word pairs. If it's less, it's under-represented.
> 
> On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[email protected]> 
> wrote:
>> I am wondering if there's a way to detect whether the deviation from
>> independence is of the type where the co-occurrence is under-represented or
>> over-represented w.r.t. random sampling. Ideally, I'd like a measure on, say,
>> (-inf, inf) where if the result is negative there is under-representation of
>> the class where both A and B occur, and if it is positive, there is an
>> overabundance of samples with (A intersection B).
>> 
>> My initial guess was that LLR(k_11, k_12, k_21, k_22) has a single minimum
>> with respect to k_11, i.e. keeping all other parameters fixed, it will be
>> decreasing with k_11 up to a point, then increasing. That minimum is
>> obviously where the co-occurrence is what random sampling would give.
