Actually, our case is even a little more complex: our hierarchy may be A/[B/[C|D]], i.e. for some inputs the full hierarchy is A/B/C and for other inputs it is A/B/D, mutually exclusive. Technically, both hierarchies could be re-learned independently; but it stands to reason that the A and B learners should not have to be re-learned independently, if only to save on computation.
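To make the freeze-then-correct idea concrete, here is a minimal toy sketch (not Mahout code; the "learners" are just means and all names are made up): learner A is trained on every sample and frozen, then learner B is trained only on the samples that carry context B, fitting the residual A leaves behind.

```python
def train_mean(values):
    """A toy 'learner': just the mean of its training targets."""
    return sum(values) / len(values)

# Every sample has a target; only some carry the extra context B.
samples = [
    {"y": 10.0, "has_b": False},
    {"y": 12.0, "has_b": False},
    {"y": 20.0, "has_b": True},
    {"y": 22.0, "has_b": True},
]

# Step 1: learn A from all samples, then freeze it.
model_a = train_mean([s["y"] for s in samples])  # 16.0

# Step 2: learn B only where context B exists, fitting A's residual.
# No made-up "neutral" values are ever substituted for missing B data.
b_residuals = [s["y"] - model_a for s in samples if s["has_b"]]
model_b = train_mean(b_residuals)  # 5.0

def predict(has_b):
    """A alone when only A's context is known; A plus B's correction otherwise."""
    return model_a + (model_b if has_b else 0.0)

print(predict(False))  # 16.0
print(predict(True))   # 21.0
```

The same scheme extends down a hierarchy: freeze A and B, then fit C (or D) on the remaining residual for the inputs that have that context.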
Ted has mentioned there's a hierarchy in Mahout; I wonder if it can handle the case presented, and what class I might look at to see how to set this up.

-d

On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Both Elkan's work and Yahoo's paper are based on the notion (which is
> confirmed by SGD experience) that if we try to substitute missing data
> with neutral values, the whole learning falls apart. Sort of.
>
> I.e. if we always know some context A (in this case, static labels and
> dyadic ids) and only sometimes some context B, then assuming neutral
> values for context B when we are missing this data is invalid, because we
> are actually substituting unknown data with made-up data. Which is why
> SGD produces higher errors than necessary on sparsified label data. This
> is also the reason why SVD recommenders produce higher errors on sparse
> sample data as well (I think that's the consensus).
>
> However, thinking in offline-ish mode: if we learn based on samples with
> A data, then freeze that learner, and train learner B only on the error
> between the frozen A learner and the inputs that have context B, then we
> are not making the mistake above. At no point does our learner take any
> 'made-up' data.
>
> This whole notion is based on the Bayesian inference process: what can
> you say if you only know A, and what correction would you make if you
> also knew B.
>
> Both papers make a corner case out of this: we have two types of data, A
> and B, and we learn A, then freeze learner A, then learn B where
> available.
>
> But the general case doesn't have to be just A and B. That's actually our
> case (our CEO calls it the 'trunk-branch-leaf' case): we always know some
> context A, sometimes B, and also sometimes we know all of A, B and some
> additional context C.
>
> So there's a case to be made for generalizing the inference architecture:
> specify a hierarchy and then learn A/B/C, with SGD + log-linear or
> whatever else.
>
> -d
>
>
> On Wed, Feb 2, 2011 at 12:14 AM, Sebastian Schelter <[email protected]> wrote:
>
>> Hi Ted,
>>
>> I looked through the paper a while ago. The approach seems to have great
>> potential, especially because of the ability to include side information
>> and to work with nominal and ordinal data. Unfortunately I have to admit
>> that a lot of the mathematical details overextend my understanding. I'd
>> be ready to assist anyone willing to build a recommender from that
>> approach, but it's not a thing I could tackle on my own.
>>
>> --sebastian
>>
>> PS: The algorithm took 7 minutes to learn from the movielens 1M dataset,
>> not Netflix.
>>
>>
>> On 01.02.2011 18:02, Ted Dunning wrote:
>>
>>> Sebastian,
>>>
>>> Have you read the Elkan paper? Are you interested in (partially)
>>> content-based recommendation?
>>>
>>> On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <[email protected]> wrote:
>>>
>>> Hi Gökhan,
>>>
>>> I wanna point you to some papers I came across that deal with similar
>>> problems:
>>>
>>> "Google News Personalization: Scalable Online Collaborative Filtering"
>>> ( http://www2007.org/papers/paper570.pdf ), this paper describes how
>>> Google uses three algorithms (two of which cluster the users) to
>>> achieve online recommendation of news articles.
>>>
>>> "Feature-based recommendation system"
>>> ( http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ), this
>>> approach didn't really convince me and I think the paper is lacking a
>>> lot of details, but it might still be an interesting read.
>>>
>>> --sebastian
>>>
>>> On 01.02.2011 00:26, Gökhan Çapan wrote:
>>>
>>> Hi,
>>>
>>> I've made a search, sorry in case this is a double post. Also, this
>>> question may not be directly related to Mahout.
>>> Within a domain which is entirely user-generated and has very big item
>>> churn (lots of new items coming in while some others leave the system),
>>> what do you recommend to produce accurate recommendations using Mahout
>>> (not just Taste)?
>>>
>>> I mean, as a concrete example, in the eBay domain, not Amazon's.
>>>
>>> Currently I am creating item clusters using LSH with MinHash (I am not
>>> sure if it is in Mahout; I can contribute it if it is not), and I
>>> produce recommendations using these item clusters (profiles). When a
>>> new item arrives, I find its nearest profile, and recommend the item
>>> wherever its profile is recommended. Do you find this approach good
>>> enough?
>>>
>>> If you have a theoretical idea, could you please point me to some
>>> related papers?
>>>
>>> (As an MSc student, I can implement this as a Google Summer of Code
>>> project, with your mentoring.)
>>>
>>> Thanks in advance
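The MinHash profile routing described above can be sketched in a few lines of toy Python (illustrative only, not Mahout's implementation; the hash family, band width, and all names are made-up assumptions): items are feature sets, items whose MinHash signatures agree on a band share a "profile" bucket, and a brand-new item is routed to an existing profile by its own signature.

```python
import random

NUM_HASHES = 8
P = 2_147_483_647  # a large prime for the hash family h(x) = (a*x + b) mod P
random.seed(42)
HASHES = [(random.randrange(1, P), random.randrange(P)) for _ in range(NUM_HASHES)]

def minhash(features):
    """Signature: per hash function, the minimum hash over the feature set."""
    return tuple(min((a * hash(f) + b) % P for f in features) for a, b in HASHES)

profiles = {}  # band (signature prefix) -> list of item ids in that profile

def add_item(item_id, features):
    """Route an item to a profile bucket keyed by one crude 2-hash band."""
    band = minhash(features)[:2]
    profiles.setdefault(band, []).append(item_id)
    return band

# Near-duplicate items tend to collide in a band; identical ones always do.
b1 = add_item("item1", {"red", "shoe", "leather"})
b2 = add_item("item2", {"red", "shoe", "leather", "sale"})
b3 = add_item("item3", {"laptop", "intel", "15in"})
```

With real LSH one would use many bands of several hashes each, tuned so the collision probability curve matches the desired Jaccard-similarity threshold; a new item's recommendations then come from whatever its bucket's profile is recommended to.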
