Sean, thank you very much. Yes, that is true: I can get the directionality just by comparing P(A and B) with P(A)*P(B), where P is the sample estimate of the probability of the event.
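For concreteness, here is a minimal sketch of how that comparison could be combined with the LLR to get a signed score on (-inf, inf). It assumes the usual 2x2 contingency layout (k11 = both A and B, k12 = A only, k21 = B only, k22 = neither) and the standard G^2 form of the log-likelihood ratio. The function name `signed_root_llr` is hypothetical, chosen only for this illustration, and taking the square root is one possible convention rather than the only one.

```python
import math


def signed_root_llr(k11, k12, k21, k22):
    """Signed square root of the G^2 log-likelihood ratio for a 2x2 table.

    Assumed layout: k11 = A and B, k12 = A without B,
    k21 = B without A, k22 = neither A nor B.
    Negative result: (A and B) is under-represented relative to independence.
    Positive result: (A and B) is over-represented.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)   # marginal counts for A, not A
    cols = (k11 + k21, k12 + k22)   # marginal counts for B, not B
    observed = ((k11, k12), (k21, k22))

    # G^2 = 2 * sum over cells of k * ln(k / expected); empty cells contribute 0.
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            k = observed[i][j]
            if k > 0:
                expected = rows[i] * cols[j] / n
                g2 += 2.0 * k * math.log(k / expected)
    g2 = max(g2, 0.0)  # guard against tiny negative values from rounding

    # Direction: P(A and B) < P(A)*P(B) is equivalent to k11*n < rows[0]*cols[0],
    # which keeps the comparison in exact integer arithmetic.
    sign = -1.0 if k11 * n < rows[0] * cols[0] else 1.0
    return sign * math.sqrt(g2)
```

With counts that match independence exactly, the score is zero; moving k11 above or below that point moves the score positive or negative, respectively.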
Thanks,
Nimrod

On Jun 21, 2012, at 4:55 PM, Sean Owen wrote:

> Is this not just a matter of comparing the frequency of "the" with
> "the the"? If "the" is 1/n of the words, then "the the" ought to be
> 1/n^2. If it's less, it's under-represented.
>
> On Thu, Jun 21, 2012 at 9:01 PM, Nimrod Priell <[email protected]>
> wrote:
>> I am wondering if there's a way to detect whether the deviation from
>> independence is of the type where the co-occurrence is under-represented or
>> over-represented w.r.t. random sampling. Ideally, I'd like a measure on, say,
>> (-inf, inf) where if the result is negative there is under-representation of
>> the class where both A and B occur, and if it is positive, there is an
>> overabundance of samples with (A intersection B).
>>
>> My initial guess was that LLR(k_11, k_12, k_21, k_22) has one minimum with
>> respect to k_11, i.e. keeping all other parameters fixed, it will be
>> decreasing with k_11 up to a point, then increasing. That minimum is
>> obviously where the co-occurrence is random.
