Thanks a bunch. That's very helpful.

On Friday, December 16, 2016, Sean Owen <so...@cloudera.com> wrote:

> That all looks correct.
>
> On Thu, Dec 15, 2016 at 11:54 PM Manish Tripathi <tr.man...@gmail.com> wrote:
>
>> OK, thanks. So here is what I understood:
>>
>> 1) The input to ALS.fit(implicitPrefs=True) is the actual strengths (count
>> data). So if I have a matrix of (user, item, views/purchases), I pass that
>> as the input, not the binarized (preference) one. This signifies the
>> strength.
>>
>> 2) Since we also pass the alpha parameter to this ALS.fit() method, Spark
>> internally creates the confidence matrix 1 + alpha*input_data (or some
>> similar alpha scaling).
>>
>> 3) The output it gives is basically a factorization of the 0/1 matrix
>> (the binarized initial input data), hence the output also resembles the
>> preference matrix (0/1), suggesting the interaction. So typically it
>> should be between 0 and 1, but if it is negative it means a very weak
>> preference/interaction.
>>
>> *Does all of the above sound correct?*
>>
>> If yes, then one last question:
>>
>> *For an explicit dataset, where we don't use implicitPrefs=True,* the
>> predicted ratings would be actual ratings, e.g. 2.3, 4.5, etc., and not an
>> interaction measure. That is because with explicit data we are not using
>> the confidence matrix and preference matrix concept; we use the actual
>> rating data. So any output from Spark ALS for explicit data would be a
>> rating prediction.
>>
>> On Thu, Dec 15, 2016 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, the inputs are weights or strengths. The output is a factorization
>>> of the binarization of that to 0/1, not probabilities, nor a
>>> factorization of the input. This explains the range of the output.
>>>
>>> On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:
>>>
>>>> When you say *implicit ALS "is" factoring the 0/1 matrix*, are you
>>>> saying that for the implicit feedback algorithm we need to pass the
>>>> input data as the preference matrix, i.e. a matrix of 0s and 1s?
>>>>
>>>> Then how will it calculate the confidence matrix, which is basically
>>>> 1 + alpha*count matrix? If we don't pass the actual count values
>>>> (views, etc.), then how does Spark calculate the confidence matrix?
>>>>
>>>> I was of the understanding that the input data for
>>>> als.fit(implicitPrefs=True) is the actual count matrix of the
>>>> views/purchases. Am I going wrong here? If yes, then how is Spark
>>>> calculating the confidence matrix if it doesn't have the actual count
>>>> data?
>>>>
>>>> The original paper on which the Spark algorithm is based needs the
>>>> actual count data to create a confidence matrix, and also needs the 0/1
>>>> matrix, since the objective function uses both the confidence matrix
>>>> and the 0/1 matrix to find the user and item factors.
>>>>
>>>> On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> No, you can't interpret the output as probabilities at all. In
>>>>> particular they may be negative. It is not predicting rating but
>>>>> interaction. Negative means very strongly not predicted to interact.
>>>>> No, implicit ALS *is* factoring the 0/1 matrix.
>>>>>
>>>>> On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com> wrote:
>>>>>
>>>>>> OK. So we can kind of interpret the output as probabilities even
>>>>>> though it is not modeling probabilities. This is in order to be able
>>>>>> to use it with the BinaryClassificationEvaluator.
>>>>>>
>>>>>> The way I understand it, and as per the algorithm, the predicted
>>>>>> matrix is basically a dot product of the user-factor and item-factor
>>>>>> matrices. But in what circumstances can the predicted ratings be
>>>>>> negative?
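As a toy illustration (made-up rank-3 factor vectors, not taken from any real model), the dot product of two latent-factor vectors can easily come out negative, since nothing in ALS constrains the factor entries to be non-negative:

```python
import numpy as np

# Hypothetical rank-3 latent factors, as ALS might learn them.
# The entries are unconstrained in sign, so the prediction
# (their dot product) can dip below zero.
user_factors = np.array([0.8, -1.2, 0.3])
item_factors = np.array([0.5, 0.9, -0.4])

prediction = float(user_factors @ item_factors)
print(prediction)  # negative: strongly "not predicted to interact"
```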
>>>>>> I can understand that if the individual user-factor and item-factor
>>>>>> vectors have negative entries, then the dot product can be negative.
>>>>>> But does a negative value make any practical sense? As per the
>>>>>> algorithm, the dot product is the predicted rating, so a rating
>>>>>> shouldn't be negative for it to make sense. Also, is a rating between
>>>>>> 0 and 1 a normalized rating? Typically we expect a rating to be a
>>>>>> real value like 2.3, 4.5, etc.
>>>>>>
>>>>>> Also, please note: for implicit-feedback ALS we don't feed a 0/1
>>>>>> matrix. We feed the count matrix (discrete count values), and I am
>>>>>> assuming Spark internally converts it into a preference matrix (1/0)
>>>>>> and a confidence matrix = 1 + alpha*count_matrix.
>>>>>>
>>>>>> On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>
>>>>>>> No, ALS is not modeling probabilities. The outputs are
>>>>>>> reconstructions of a 0/1 matrix. Most values will be in [0, 1], but
>>>>>>> it's possible to get values outside that range.
>>>>>>>
>>>>>>> On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi <tr.man...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I ran the ALS model for the implicit-feedback case. Then I used the
>>>>>>>> model's .transform method to predict the ratings for the original
>>>>>>>> dataset. My dataset is of the form (user, item, rating).
>>>>>>>>
>>>>>>>> I see something like below:
>>>>>>>>
>>>>>>>> predictions.show(5, truncate=False)
>>>>>>>>
>>>>>>>> Why is the last prediction value negative? Isn't the transform
>>>>>>>> method giving the prediction (probability) of seeing the rating as
>>>>>>>> 1? I had counts data for the rating (implicit feedback), and for
>>>>>>>> the validation dataset I binarized the rating (1 if >0 else 0).
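That binarization step is just a threshold; a quick sketch with made-up counts:

```python
import numpy as np

# (user, item, count) interactions; counts are raw view counts, so the
# binarized validation "rating" is 1 whenever any interaction occurred.
counts = np.array([3.0, 0.0, 7.0, 1.0, 0.0])
binarized = (counts > 0).astype(float)  # 1 if >0 else 0
print(binarized)  # [1. 0. 1. 1. 0.]
```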
>>>>>>>> My training data has positive ratings (basically the count of
>>>>>>>> views of a video).
>>>>>>>>
>>>>>>>> I used the following to train:
>>>>>>>>
>>>>>>>> als = ALS(rank=x, maxIter=15, regParam=y, implicitPrefs=True, alpha=40.0)
>>>>>>>> model = als.fit(self.train)
>>>>>>>>
>>>>>>>> What does a negative prediction mean here, and is it OK to have
>>>>>>>> that?
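To make the thread's conclusion concrete (implicit ALS factors the 0/1 preference matrix, weighting each squared error by a confidence of 1 + alpha*count, and nothing constrains the reconstruction to [0, 1]), here is a minimal single-machine sketch. All names and the toy data are illustrative, and Spark's distributed implementation differs in detail, but the alternating weighted ridge solves follow the same math:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (user x item) count matrix, like the view counts in the thread.
counts = np.array([[5.0, 0.0, 2.0, 0.0],
                   [0.0, 3.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 4.0]])
alpha, rank, reg, iters = 40.0, 2, 0.1, 15

P = (counts > 0).astype(float)   # preference: the 0/1 matrix being factored
C = 1.0 + alpha * counts         # confidence: weight on each squared error

X = 0.1 * rng.standard_normal((counts.shape[0], rank))  # user factors
Y = 0.1 * rng.standard_normal((counts.shape[1], rank))  # item factors

def solve_side(F, C_side, P_side, reg):
    """Weighted ridge solve for each row of one side, holding F fixed."""
    k = F.shape[1]
    out = np.empty((C_side.shape[0], k))
    for i in range(C_side.shape[0]):
        W = np.diag(C_side[i])                     # per-entry confidences
        A = F.T @ W @ F + reg * np.eye(k)
        out[i] = np.linalg.solve(A, F.T @ W @ P_side[i])
    return out

for _ in range(iters):
    X = solve_side(Y, C, P, reg)        # update user factors
    Y = solve_side(X, C.T, P.T, reg)    # update item factors

pred = X @ Y.T  # reconstructs P (the 0/1 matrix), NOT the raw counts:
                # observed cells land near 1, the rest near 0, and nothing
                # stops some values from falling outside [0, 1] or below 0
print(np.round(pred, 2))
```

Note that `pred` approximates the preference matrix, not the counts: the observed cell (0, 0) comes out near 1 even though its count was 5, which is exactly why Spark's implicit-ALS predictions are interaction scores rather than rating predictions.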