Which is why LLR would be really nice in the two-action cross-similarity case. The cross-correlation sparsification via cooccurrence is probably pretty weak, no?

On Aug 18, 2013, at 11:53 AM, Ted Dunning <[email protected]> wrote:

Outside of the context of your demo, suppose that you have events a, b, c
and d. Event a is the one we are centered on and is relatively rare. Event b
is not so rare, but has weak correlation with a. Event c is as rare as a,
but correlates strongly with it. Event d is quite common, but has no
correlation with a.

The 2x2 matrices that you would get would look something like this. In each
of these, a and NOT a are in rows while other and NOT other are in columns.

versus b, llrRoot = 8.03

             b        NOT b
  a          10       10
  NOT a      1000     99000

versus c, llrRoot = 11.5

             c        NOT c
  a          10       10
  NOT a      30       99970

versus d, llrRoot = 0

             d        NOT d
  a          10       10
  NOT a      50000    50000

Note that what we are holding constant here is the prevalence of a (20
times) and the distribution of a under the conditions of the other symbol.
What is being varied is the distribution of the other symbol in the "NOT a"
case.

On Sun, Aug 18, 2013 at 10:50 AM, B Lyon <[email protected]> wrote:

> Thanks folks for taking a look.
>
> I haven't sat down to try it yet, but wondering how hard it is to
> construct (realizable and realistic) k11, k12, k21, k22 values for three
> binary sequences X, Y, Z where (X,Y) and (Y,Z) have same co-occurrence,
> but you can tweak k12 and k21 so that the LLR values are extremely
> different in both directions. I assume that k22 doesn't matter much in
> practice since things are sparse and k22 is huge. Well, obviously, I guess
> you could simply switch the k12/k21 values between the two sequence pairs
> to flip the order at will... which is information that co-occurrence of
> course does not "know about".
>
>
> On Sat, Aug 17, 2013 at 10:30 PM, Ted Dunning <[email protected]> wrote:
>
>> This is nice. As you say, k11 is the only part that is used in
>> cooccurrence and it doesn't weight by prevalence, either.
>>
>> This size analysis is hard to demonstrate much difference because it is
>> hard to show interesting values of LLR without absurdly strong
>> coordination between items.
>>
>>
>> On Fri, Aug 16, 2013 at 8:21 PM, B Lyon <[email protected]> wrote:
>>
>>> As part of trying to get a better grip on recommenders, I have started
>>> a simple interactive visualization that begins with the raw data of
>>> user-item interactions and goes all the way to being able to twiddle
>>> the interactions in a test user vector to see the impact on recommended
>>> items. This is for simple "user interacted with an item" case rather
>>> than numerical preferences for items. The goal is to show the
>>> intermediate pieces and how they fit together via popup text on
>>> mouseovers and dynamic highlighting of the related pieces. I am of
>>> course interested in feedback as I keep tweaking on it - not sure I got
>>> all the terminology quite right yet, for example, and might have missed
>>> some other things I need to know about. Note that this material is
>>> covered in Chapter 6.2 in MIA in the discussion on distributed
>>> recommenders.
>>>
>>> It's on googledrive here (very much a work-in-progress):
>>>
>>> https://googledrive.com/host/0B2GQktu-wcTiWHRwZFJacjlqODA/
>>>
>>> (apologies to small resolution screens)
>>>
>>> This is based only on the co-occurrence matrix, rather than including
>>> the other similarity measures, although in working through this, it
>>> seems that the other ones can just be interpreted as having alternative
>>> definitions of what "*" means in matrix multiplication of A^T*A, where
>>> A is the user-item matrix... and as an aside to me begs the interesting
>>> question of [purely hypothetical?] situations where LLR and
>>> co-occurrence are at odds with each other in making recommendations, as
>>> co-occurrence seems to be just using the "k11" term that is part of the
>>> LLR calculation.
>>>
>>> My goal (at the moment at least) is to eventually continue this for the
>>> solr-recommender project that started a few weeks ago, where we have
>>> the additional cross-matrix, as well as a kind of regrouping of pieces
>>> for solr.
>>>
>>>
>>> --
>>> BF Lyon
>>> http://www.nowherenearithaca.com
>>>
>>
>
>
>
> --
> BF Lyon
> http://www.nowherenearithaca.com
>
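
For anyone who wants to check the three 2x2 tables quoted above (a versus b, c and d), here is a minimal sketch in Python. It is not Mahout's code: the helper names (x_log_x, entropy, llr, llr_root) are made up for this example, and the signed square root is an assumed convention for "llrRoot". It uses the usual G^2 statistic written with unnormalized entropies. All three tables share k11 = 10, so plain cooccurrence would score b, c and d identically, while the root LLR should come out near the 8.03, 11.5 and 0 given in the tables.

```python
import math


def x_log_x(x):
    """Return x * ln(x), taking 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: xlogx(total) - sum of xlogx(count)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values produced by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def llr_root(k11, k12, k21, k22):
    """Square root of the LLR, signed (assumed convention): negative when
    'other' is less frequent alongside a than alongside NOT a."""
    root = math.sqrt(llr(k11, k12, k21, k22))
    if k11 / (k11 + k12) < k21 / (k21 + k22):
        root = -root
    return root


# Counts copied from the three tables in the thread: (k11, k12, k21, k22),
# with a / NOT a in rows and other / NOT other in columns.
tables = {
    "b": (10, 10, 1000, 99000),
    "c": (10, 10, 30, 99970),
    "d": (10, 10, 50000, 50000),
}

scores = {name: llr_root(*counts) for name, counts in tables.items()}

for name, counts in tables.items():
    # Cooccurrence only sees k11, which is 10 in every table.
    print(f"versus {name}: cooccurrence={counts[0]}  llrRoot={scores[name]:.2f}")
```

The point of the comparison is that the cooccurrence column never changes, while llrRoot moves with the "NOT a" row, which is exactly the information cooccurrence does not "know about".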
