Hello Sebastian,

could you give a more detailed explanation, or point me to an article that covers the subject?
Best regards, Niklas


2014-03-30 20:50 GMT+02:00 Sebastian Schelter <[email protected]>:

> Use k-fold cross-validation or hold-out tests for estimating the quality
> of different parameter combinations.
>
> --sebastian
>
>
> On 03/30/2014 11:53 AM, Niklas Ekvall wrote:
>
>> Hi,
>>
>> My name is Niklas Ekvall and I have an implementation of the recommender
>> algorithm "Large-scale Parallel Collaborative Filtering for the Netflix
>> Prize" and now I'm wondering how to choose the number of features and
>> lambda. Could any of you guys help me by explaining a stepwise strategy
>> to choose or optimize these two parameters?
>>
>> Best regards, Niklas
>>
>>
>> 2014-03-27 19:07 GMT+01:00 j.barrett Strausser <[email protected]>:
>>
>>> Thanks Ted,
>>>
>>> Yes for the time problem. We tend to use aggregations of session data.
>>> So instead of asking for user recommendations we do things like
>>> user+sessions recommendations.
>>>
>>> Of course, deciding when sessions start and stop isn't trivial.
>>> Ideally, what I would want to do is time-weight views using a kernel
>>> or convolution. That's a bit heavy, so we typically have a global
>>> model, that is, basically all preferences over time. Then these
>>> user+session type models. We can then combine these at another level
>>> to give recommendations based on what you like throughout time versus
>>> what you have been doing recently.
>>>
>>> -b
>>>
>>>
>>> On Thu, Mar 27, 2014 at 1:59 PM, Ted Dunning <[email protected]> wrote:
>>>
>>>> For the poly-syllable challenged,
>>>>
>>>> heteroscedasticity - degree of variation changes. This is common with
>>>> counts because you expect the standard deviation of count data to be
>>>> proportional to sqrt(n).
>>>>
>>>> time inhomogeneity - changes in behavior over time. One way to handle
>>>> this (roughly) is to first remove variation in personal and item
>>>> means over time (if using ratings) and then to segment user histories
>>>> into episodes. By including both short and long episodes you get some
>>>> repair for changes in personal preference. A great example of how
>>>> this works/breaks is Christmas music. On December 26th, you want to
>>>> *stop* recommending this music so it really pays to limit histories
>>>> at this point. By having an episodic user session that starts around
>>>> November and runs to Christmas, you can get good recommendations for
>>>> seasonal songs and not pollute the rest of the universe.
>>>>
>>>>
>>>> On Thu, Mar 27, 2014 at 8:30 AM, j.barrett Strausser <[email protected]> wrote:
>>>>
>>>>> For my team it has usually been heteroscedasticity and time
>>>>> inhomogeneity.
>>>>>
>>>>>
>>>>> On Thu, Mar 27, 2014 at 10:18 AM, Tevfik Aytekin <[email protected]> wrote:
>>>>>
>>>>>> Interesting topic,
>>>>>> Ted, can you give examples of those mathematical assumptions
>>>>>> under-pinning ALS which are violated by the real world?
>>>>>>
>>>>>> On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <[email protected]> wrote:
>>>>>>
>>>>>>> How can there be any other practical method? Essentially all of
>>>>>>> the mathematical assumptions under-pinning ALS are violated by the
>>>>>>> real world. Why would any mathematical consideration of the number
>>>>>>> of features be much more than heuristic?
>>>>>>>
>>>>>>> That said, you can make an information content argument. You can
>>>>>>> also make the argument that if you take too many features, it
>>>>>>> doesn't much hurt so you should always take as many as you can
>>>>>>> compute.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> does anyone know of a principled approach to choosing the number
>>>>>>>> of features for ALS (other than cross-validation)?
>>>>>>>>
>>>>>>>> --sebastian
>>>>>
>>>>> --
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>
>>> --
>>> https://github.com/bearrito
>>> @deepbearrito

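For what it's worth, here is a minimal sketch of the k-fold approach Sebastian suggests above for picking the number of features and lambda. It is not anyone's actual implementation: the function names (`train_als`, `cross_validate`, `grid_search`) are made up for illustration, RMSE on the held-out fold is an assumed quality measure, and the regularization is weighted by the number of ratings per user/item, which is how I understand the paper Niklas mentions. It assumes lambda > 0 and ratings given as (user, item, rating) triples with integer ids.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def update(fixed, obs_by_row, rank, lam, current):
    """One ALS half-step: re-solve each row's factors with the other side fixed."""
    for row, obs in obs_by_row.items():
        # Regularization weighted by the number of ratings in this row (assumed).
        A = [[lam * len(obs) * (i == j) for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, r in obs:
            v = fixed[col]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        current[row] = solve(A, b)

def train_als(train, n_users, n_items, rank, lam, iters=10, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_items)]
    by_user, by_item = {}, {}
    for u, i, r in train:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    for _ in range(iters):
        update(V, by_user, rank, lam, U)  # fix items, solve users
        update(U, by_item, rank, lam, V)  # fix users, solve items
    return U, V

def rmse(U, V, data):
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in data)
    return (se / len(data)) ** 0.5

def cross_validate(ratings, n_users, n_items, rank, lam, k=3, seed=0):
    """Mean held-out RMSE over k folds for one (rank, lambda) combination."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for f in range(k):
        test = shuffled[f::k]
        train = [x for idx, x in enumerate(shuffled) if idx % k != f]
        U, V = train_als(train, n_users, n_items, rank, lam)
        scores.append(rmse(U, V, test))
    return sum(scores) / k

def grid_search(ratings, n_users, n_items, ranks, lams, k=3):
    """Score every (rank, lambda) pair and return the one with the lowest mean RMSE."""
    results = {(rank, lam): cross_validate(ratings, n_users, n_items, rank, lam, k)
               for rank in ranks for lam in lams}
    best = min(results, key=results.get)
    return best, results
```

You would call something like `grid_search(ratings, n_users, n_items, ranks=[5, 10, 20], lams=[0.01, 0.05, 0.1])` and then retrain on all the data with the winning pair. The grid values here are placeholders, not recommendations; in practice people often start coarse and refine around the best cell.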