Thanks, folks, for taking a look. I haven't sat down to try it yet, but I'm wondering how hard it is to construct (realizable and realistic) k11, k12, k21, k22 values for three binary sequences X, Y, Z where (X,Y) and (Y,Z) have the same co-occurrence count, but k12 and k21 are tweaked so that the LLR values are extremely different in both directions. I assume k22 doesn't matter much in practice, since the data are sparse and k22 is huge. Well, obviously, I guess you could simply swap the k12/k21 values between the two pairs to flip the order at will... which is information that co-occurrence, of course, does not "know about".
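A minimal sketch of the construction asked about, assuming the unnormalized-entropy form of the G^2 log-likelihood ratio (as far as I know, this is how Mahout's LogLikelihood class computes it). The counts and the user total N are hypothetical, made up for illustration rather than taken from real data. Y is kept consistent across both pairs (20 users); the only difference is that Z is very popular, so its overlap with Y is exactly what independence predicts. With N fixed, k22 is determined by the other three cells.

```python
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) == 0, so empty cells contribute nothing.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: both items, k12: first item only, k21: second item only, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


N = 10_000  # hypothetical number of users

# Pair (X, Y): X and Y each appear for 20 users and overlap on 10 of them.
xy = (10, 10, 10, N - 30)
# Pair (Y, Z): same k11 and same Y (20 users), but Z appears for 5000 users,
# so 10 shared users is exactly the independence expectation (20 * 5000 / N).
yz = (10, 10, 4990, N - 5010)

for name, table in [("X,Y", xy), ("Y,Z", yz)]:
    print(name, "k11 =", table[0], "LLR =", round(llr(*table), 2))
```

Both pairs report the same co-occurrence (k11 = 10), but LLR is large for (X,Y) and essentially zero for (Y,Z). Note that LLR is symmetric under swapping k12 and k21 within one table (that is just transposing it), so what moves the score is how the marginals differ between the pairs.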
On Sat, Aug 17, 2013 at 10:30 PM, Ted Dunning wrote:

> This is nice. As you say, k11 is the only part that is used in
> cooccurrence and it doesn't weight by prevalence, either.
>
> With this size of example, it is hard to demonstrate much difference,
> because it is hard to show interesting values of LLR without absurdly
> strong coordination between items.
>
>
> On Fri, Aug 16, 2013 at 8:21 PM, B Lyon wrote:
>
> > As part of trying to get a better grip on recommenders, I have started a
> > simple interactive visualization that begins with the raw data of
> > user-item interactions and goes all the way to being able to twiddle the
> > interactions in a test user vector to see the impact on recommended
> > items. This is for the simple "user interacted with an item" case rather
> > than numerical preferences for items. The goal is to show the
> > intermediate pieces and how they fit together via popup text on
> > mouseovers and dynamic highlighting of the related pieces. I am of
> > course interested in feedback as I keep tweaking on it. I'm not sure I
> > got all the terminology quite right yet, for example, and might have
> > missed some other things I need to know about. Note that this material
> > is covered in Chapter 6.2 of MIA, in the discussion of distributed
> > recommenders.
> >
> > It's on Google Drive here (very much a work in progress):
> >
> > https://googledrive.com/host/0B2GQktu-wcTiWHRwZFJacjlqODA/
> >
> > (apologies to those with small-resolution screens)
> >
> > This is based only on the co-occurrence matrix, rather than including
> > the other similarity measures, although in working through this, it
> > seems that the other ones can just be interpreted as having alternative
> > definitions of what "*" means in the matrix multiplication A^T*A, where
> > A is the user-item matrix. As an aside, to me this raises the
> > interesting question of [purely hypothetical?] situations where LLR and
> > co-occurrence are at odds with each other in making recommendations, as
> > co-occurrence seems to be just using the "k11" term that is part of the
> > LLR calculation.
> >
> > My goal (at the moment at least) is to eventually continue this for the
> > solr-recommender project that started a few weeks ago, where we have
> > the additional cross-matrix, as well as a kind of regrouping of pieces
> > for solr.
> >
> >
> > --
> > BF Lyon
> > http://www.nowherenearithaca.com
>

--
BF Lyon
http://www.nowherenearithaca.com
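The thread suggests reading the other similarity measures as alternative definitions of "*" in A^T*A. A minimal sketch of that idea for LLR, assuming a small binary user-item matrix held as nested Python lists: plain A^T*A gives k11 for every item pair, and the LLR variant recovers k12, k21, and k22 from that same k11 plus the per-item user counts (the diagonal of A^T*A) and the number of users. The toy matrix and the function names are hypothetical, chosen to produce a case where co-occurrence and LLR rank candidates differently.

```python
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence(A):
    """A^T * A for a binary user x item matrix: cell [i][j] is k11 for items i, j."""
    n_items = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n_items)]
            for i in range(n_items)]


def llr_matrix(A):
    """Same shape as A^T * A, but each off-diagonal cell scores the full 2x2 table."""
    n_users = len(A)
    C = cooccurrence(A)
    users_per_item = [C[i][i] for i in range(len(C))]  # diagonal of A^T A
    L = [[0.0] * len(C) for _ in C]
    for i in range(len(C)):
        for j in range(len(C)):
            if i == j:
                continue  # diagonal left at 0.0 in this sketch
            k11 = C[i][j]
            k12 = users_per_item[i] - k11  # item i without item j
            k21 = users_per_item[j] - k11  # item j without item i
            k22 = n_users - users_per_item[i] - users_per_item[j] + k11  # neither
            L[i][j] = llr(k11, k12, k21, k22)
    return L


# Hypothetical toy data: 8 users x 3 items.
# Item 1 is interacted with by every user; item 2 tracks item 0 closely.
A = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

C = cooccurrence(A)
L = llr_matrix(A)
print("co-occurrence with item 0:", C[0][1], C[0][2])  # item 1 ranks higher
print("LLR with item 0:", round(L[0][1], 3), round(L[0][2], 3))  # item 2 ranks higher
```

Item 1 co-occurs with item 0 more often only because every user has it, which is the prevalence point in the reply; the LLR cell for that pair comes out as zero, while item 2 gets a positive score.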
