Thanks Ted, Sean. I was initially hesitant about the validation process because there are many hyper-parameters to tune and the datasets are large.
Will surely explore your suggestions for parameter selection and tuning.

Thanks,
Rohit

On Wed, Sep 11, 2013 at 2:19 AM, Ted Dunning <[email protected]> wrote:

> On Wed, Sep 11, 2013 at 12:07 AM, Sean Owen <[email protected]> wrote:
>
> > > 2. Do we have to tune the "similarityclass" parameter in item-based CF?
> > > If so, do we compare the mean average precision values based on
> > > validation data, and then report the same for the test set?
> >
> > Yes, you are conceptually searching over the entire hyper-parameter
> > space. If the similarity metric is one of those, you are trying
> > different metrics. Grid search, just brute-force trying combinations,
> > works for 1-2 hyper-parameters. Otherwise I'd try randomly choosing
> > parameters, or else it will take way too long to explore. You try to
> > pick hyper-parameters 'nearer' to those that have yielded better scores.
>
> Or use a real exploration algorithm. For my favorite (hear that horn
> blowing?) see this article on recorded-step meta-mutation:
> http://arxiv.org/abs/0803.3838
> The idea is a randomized search, but with something akin to momentum.
> This lets you search nasty landscapes with pretty good robustness and
> smooth ones with fast convergence. The code and theory are simple and
> there is an implementation in Mahout.
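For anyone following along, Sean's suggestion of randomly choosing parameters can be sketched in a few lines. This is only an illustration: the parameter names, values, and the `evaluate` stub below are hypothetical stand-ins for training a recommender and measuring mean average precision on a validation set, not Mahout's actual flags or API.

```python
import random

# Hypothetical search space for an item-based CF recommender.
# Names and values are illustrative, not Mahout's exact options.
SPACE = {
    "similarity": ["cosine", "pearson", "loglikelihood"],
    "neighborhood_size": [10, 25, 50, 100, 200],
}

def evaluate(params):
    """Stand-in for: train on the training set, return mean average
    precision on the validation set for this parameter combination."""
    # Deterministic dummy score so the sketch is runnable on its own.
    r = random.Random(str(sorted(params.items())))
    return r.random()

def random_search(space, trials=20, seed=0):
    """Sample random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.choice(vals) for name, vals in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

params, score = random_search(SPACE)
print(params, score)
```

After picking the winner on validation data, you would retrain with those parameters and report the score once on the held-out test set, as discussed above.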

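Ted's "randomized search with something akin to momentum" can also be sketched. The toy below is a loose illustration of the recorded-step idea, not the Mahout implementation or the exact algorithm of the paper: each trial perturbs the current point by the last successful step plus Gaussian noise, so consecutive improvements in a consistent direction compound, while failed steps decay the recording. The `sphere` objective and all constants are hypothetical.

```python
import random

def sphere(x):
    # Smooth toy objective to minimize; in practice this would be
    # e.g. negative validation mean average precision.
    return sum(v * v for v in x)

def recorded_step_search(f, x0, iters=200, sigma=0.3, seed=1):
    """Randomized minimization where each candidate step is the last
    successful step plus noise, giving a momentum-like effect on
    smooth landscapes while still exploring nasty ones."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    step = [0.0] * len(x0)  # the recorded (last successful) step
    for _ in range(iters):
        trial_step = [s + rng.gauss(0.0, sigma) for s in step]
        cand = [xi + si for xi, si in zip(x, trial_step)]
        fc = f(cand)
        if fc < fx:
            # Improvement: accept the point and record the step.
            x, fx, step = cand, fc, trial_step
        else:
            # Failure: shrink the recorded step toward pure random search.
            step = [0.5 * s for s in step]
    return x, fx

best, val = recorded_step_search(sphere, [3.0, -2.0])
```

On a smooth objective like this the recorded step lets successive accepted moves grow in a consistent direction, which is the fast-convergence behavior Ted describes.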