You should also be aware that the alpha parameter comes from a formula the authors introduce to measure the "confidence" in the observed values:
    confidence = 1 + alpha * observed_value

You can also change that formula in the code to something you see fit;
the paper even suggests alternative variants.

Best,
Sebastian

On 18.03.2013 18:06, Han JU wrote:
> Thanks for quick responses.
>
> Yes it's that dataset. What I'm using is triplets of "user_id song_id
> play_times", of ~ 1m users. No audio things, just plain text triples.
>
> It seems to me that the paper about "implicit feedback" matches well this
> dataset: no explicit ratings, but times of listening to a song.
>
> Thank you Sean for the alpha value, I think they use big numbers is because
> their values in the R matrix is big.
>
>
> 2013/3/18 Sebastian Schelter <[email protected]>
>
>> JU,
>>
>> are you referring to this dataset?
>>
>> http://labrosa.ee.columbia.edu/millionsong/tasteprofile
>>
>> On 18.03.2013 17:47, Sean Owen wrote:
>>> One word of caution, is that there are at least two papers on ALS and they
>>> define lambda differently. I think you are talking about "Collaborative
>>> Filtering for Implicit Feedback Datasets".
>>>
>>> I've been working with some folks who point out that alpha=40 seems to be
>>> too high for most data sets. After running some tests on common data sets,
>>> alpha=1 looks much better. YMMV.
>>>
>>> In the end you have to evaluate these two parameters, and the # of
>>> features, across a range to determine what's best.
>>>
>>> Is this data set not a bunch of audio features? I am not sure it works for
>>> ALS, not naturally at least.
>>>
>>>
>>> On Mon, Mar 18, 2013 at 12:39 PM, Han JU <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm wondering has someone tried ParallelALS with implicit feedback job on
>>>> million song dataset? Some pointers on alpha and lambda?
>>>>
>>>> In the paper alpha is 40 and lambda is 150, but I don't know what are their
>>>> r values in the matrix. They said is based on time units that users have
>>>> watched the show, so may be it's big.
>>>>
>>>> Many thanks!
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> UTC - Université de Technologie de Compiègne
>>>> *GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
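
PS: to make the confidence formula at the top of this mail concrete, here is a minimal Python sketch. It is not Mahout's code; the function names are hypothetical and only illustrate the arithmetic. The first function is the linear formula quoted above. The second is the logarithmic variant I believe the paper proposes, 1 + alpha * log(1 + r / epsilon), where epsilon is an extra scaling parameter; treat the default values for alpha and epsilon as placeholders, not recommendations.

```python
import math


def linear_confidence(observed_value, alpha=1.0):
    """Linear form from this thread: confidence = 1 + alpha * observed_value."""
    return 1.0 + alpha * observed_value


def log_confidence(observed_value, alpha=1.0, epsilon=1.0):
    """Logarithmic variant (assumed form): 1 + alpha * log(1 + r / epsilon).

    epsilon is a hypothetical default here; it rescales the raw count
    before the logarithm dampens it.
    """
    return 1.0 + alpha * math.log(1.0 + observed_value / epsilon)


if __name__ == "__main__":
    # Compare how confidence grows with play counts under both forms,
    # for the alpha values discussed in this thread (1 and 40).
    for plays in (0, 1, 10, 100):
        print(
            plays,
            linear_confidence(plays, alpha=1.0),
            linear_confidence(plays, alpha=40.0),
            round(log_confidence(plays, alpha=40.0), 2),
        )
```

With raw play counts the linear form grows quickly when alpha is large (5 plays at alpha=40 already gives a confidence of 201), which is one way to see why the scale of the values in the R matrix matters when picking alpha.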
