Both Elkan's work and Yahoo's paper are based on the notion (which is confirmed by SGD experience) that if we try to substitute missing data with neutral values, the whole learning falls apart. Sort of.
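To make the alternative concrete, here is a minimal toy sketch (hypothetical data and weights, not either paper's actual method): train an SGD learner on the always-known context A, freeze it, then train a second learner on the residual error using only the rows where context B is observed, so no neutral value is ever imputed for missing B:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: context A is always observed, context B only sometimes.
n = 1000
A = rng.normal(size=(n, 3))          # always-known features (static labels, dyadic ids)
B = rng.normal(size=(n, 2))          # side features, observed for only ~30% of rows
has_B = rng.random(n) < 0.3
y = (A @ np.array([1.0, -2.0, 0.5])
     + B @ np.array([3.0, 1.5])
     + rng.normal(scale=0.1, size=n))

def sgd_fit(X, y, epochs=50, lr=0.01):
    """Plain least-squares SGD; stands in for any learner we can freeze."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w += lr * (y[i] - X[i] @ w) * X[i]
    return w

# Step 1: learn on A alone, using every row (no made-up values for B).
w_A = sgd_fit(A, y)

# Step 2: freeze learner A; fit learner B on the residual error,
# and only on the rows where B is actually observed.
residual = y - A @ w_A
w_B = sgd_fit(B[has_B], residual[has_B])

def predict(a, b=None):
    """A-only score, plus the frozen-A correction when B is known."""
    score = a @ w_A
    if b is not None:
        score += b @ w_B
    return score
```

On this synthetic data the B correction should bring the squared error on B-observed rows well below the A-only model, which is the Bayesian reading above: the A learner answers "what can you say knowing only A", and the B learner answers "what correction would you make if you also knew B".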
That is, if we always know some context A (in this case, static labels and dyadic ids) and only sometimes some context B, then assuming neutral values for context B whenever that data is missing is invalid, because we are substituting unknown data with made-up data. This is why SGD produces higher errors than necessary on sparsified label data, and it is also (I think that's the consensus) why SVD recommenders produce higher errors on sparse sample data.

However, thinking in offline-ish mode: if we first learn from the samples using the A data, then freeze that learner, and train learner B on the error between the frozen A learner and the targets, using only the inputs that have context B, then we are not making the mistake above. At no point does either learner take any 'made-up' data. The whole notion follows the Bayesian inference process: what can you say if you only know A, and what correction would you make if you also knew B?

Both papers treat a corner case of this: there are two types of data, A and B; learn A, freeze learner A, then learn B where available. But the general case doesn't have to be just A and B. That is actually our case (our CEO calls it the 'trunk-branch-leaf' case): we always know some context A, sometimes B, and sometimes all of A, B and some additional context C. So there's a case to be made for generalizing the inference architecture: specify the hierarchy and then learn A/B/C, with SGD + log-linear or whatever else.

-d

On Wed, Feb 2, 2011 at 12:14 AM, Sebastian Schelter <[email protected]> wrote:
> Hi Ted,
>
> I looked through the paper a while ago. The approach seems to have great
> potential, especially because of the ability to include side information and
> to work with nominal and ordinal data. Unfortunately I have to admit that a
> lot of the mathematical details overextend my understanding. I'd be ready to
> assist anyone willing to build a recommender from that approach but it's not
> a thing I could tackle on my own.
>
> --sebastian
>
> PS: The algorithm took 7 minutes to learn from the movielens 1M dataset,
> not Netflix.
>
>
> On 01.02.2011 18:02, Ted Dunning wrote:
>>
>> Sebastian,
>>
>> Have you read the Elkan paper? Are you interested in (partially) content
>> based recommendation?
>>
>> On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <[email protected]<mailto:
>> [email protected]>> wrote:
>>
>> Hi Gökhan,
>>
>> I wanna point you to some papers I came across that deal with
>> similar problems:
>>
>> "Google News Personalization: Scalable Online Collaborative
>> Filtering" ( http://www2007.org/papers/paper570.pdf ), this paper
>> describes how Google uses three algorithms (two of which cluster
>> the users) to achieve online recommendation of news articles.
>>
>> "Feature-based recommendation system" (
>> http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ),
>> this approach didn't really convince me and I think the paper is
>> lacking a lot of details, but it might still be an interesting read.
>>
>> --sebastian
>>
>> On 01.02.2011 00:26, Gökhan Çapan wrote:
>>
>> Hi,
>>
>> I've made a search, sorry in case this is a double post.
>> Also, this question may not be directly related to Mahout.
>>
>> Within a domain which is entirely user generated and has a
>> very big item churn (lots of new items coming, while some others
>> leaving the system), what do you recommend to produce accurate
>> recommendations using Mahout (not just Taste)?
>>
>> I mean, as a concrete example, in the eBay domain, not Amazon's.
>>
>> Currently I am creating item clusters using LSH with MinHash
>> (I am not sure if it is in Mahout; I can contribute it if it is not),
>> and produce recommendations using these item clusters (profiles).
>> When a new item arrives, I find its nearest profile, and recommend
>> the item where its belonging profile is recommended to. Do you find
>> this approach good enough?
>>
>> If you have a theoretical idea, could you please point me to some
>> related papers?
>>
>> (As an MSc student, I can implement this as a Google Summer of Code
>> project, with your mentoring.)
>>
>> Thanks in advance
