First, I agree with Ted that LLR is better. I've tried all of the similarity 
methods in Mahout on exactly the same dataset and got far higher 
cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 
1.0, but it is not supported in the Mahout 1.0 Spark jobs. 
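
Mahout's LLR similarity scores a pair of items with Ted Dunning's log-likelihood ratio (the G^2 statistic) over a 2x2 table of co-occurrence counts. The plain-Python sketch below illustrates that statistic only. It is not Mahout's code, and Mahout may transform the raw value further before using it as a similarity. The helper `vendor_llr` and its set-of-user-IDs input are hypothetical and exist only for illustration.

```python
import math

def x_log_x(x):
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy: N ln N - sum(k ln k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: users who interacted with both vendor A and vendor B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))

def vendor_llr(users_a, users_b, total_users):
    # Hypothetical helper: build the 2x2 table from the sets of user IDs
    # that interacted with each vendor
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = total_users - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)
```

Independent vendors (for example, counts 1, 1, 1, 1) score about 0, while vendors that always co-occur score high. In Ted's approach, the vendor pairs with the highest scores become the indicators for each vendor.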

If you have data in tables, you need to turn it into single interactions, one 
per line. These will look like:

user1,vendor1,rating
userN,vendorM,rating
...

If you are recommending vendors (not specific services of specific vendors), you 
need to map your IDs into IDs that the recommender can ingest. If the same user 
rated multiple services of the same vendor, you can't tell which of those 
separate ratings will be used, so you should decide which rating you want to 
use as input. 
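
A minimal sketch of building one interaction per (user, vendor) pair follows. It assumes each table row is a hypothetical `(user, vendor, service, rating)` tuple and uses one possible policy: keep the highest rating the user gave any service of that vendor. Taking the latest or the average rating would be equally valid choices.

```python
import csv
import io

def collapse_to_vendor_ratings(rows):
    """Collapse per-service ratings to one rating per (user, vendor).

    rows: iterable of (user, vendor, service, rating) tuples (assumed layout).
    Policy (an assumption): keep the highest rating for each pair.
    """
    best = {}
    for user, vendor, _service, rating in rows:
        key = (user, vendor)
        if key not in best or rating > best[key]:
            best[key] = rating
    return best

def to_interaction_csv(best):
    # One "user,vendor,rating" line per interaction, sorted for stable output
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (user, vendor), rating in sorted(best.items()):
        writer.writerow([user, vendor, rating])
    return out.getvalue()

rows = [
    ("user1", "vendor1", "service1", 4),
    ("user1", "vendor1", "service2", 2),
    ("user1", "vendor3", "service1", 5),
    ("user2", "vendor1", "service3", 3),
]
print(to_interaction_csv(collapse_to_vendor_ratings(rows)), end="")
```

The two ratings user1 gave to vendor1's services collapse into a single `user1,vendor1,4` line.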

You need to translate your IDs into Mahout IDs. Let's say you go through all of 
your users and assign the first one a Mahout ID of integer = 0; the next unique 
user you see gets Mahout ID = 1, and so on. You need to do this for your items 
(vendors) as well. Formatted as Mahout user ID, Mahout item ID, rating, your 
input files will contain:

0,0,1
0,2000,3
0,4,5
1,3,1
1000,2000,5
…
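
The translation step can be sketched as a small dictionary that hands out contiguous non-negative integers in first-seen order. Users and vendors each get their own dictionary. The `IdDictionary` class and `translate` function are hypothetical names for illustration, not part of Mahout. Keep both dictionaries (for example, by saving them to disk) so the recommender's integer IDs can be mapped back later.

```python
class IdDictionary:
    """Assigns contiguous non-negative integers to external IDs, in first-seen order."""

    def __init__(self):
        self.to_int = {}   # external ID -> Mahout integer ID
        self.to_ext = []   # position == Mahout integer ID

    def index(self, external_id):
        # Reuse the existing integer, or hand out the next unused one
        if external_id not in self.to_int:
            self.to_int[external_id] = len(self.to_ext)
            self.to_ext.append(external_id)
        return self.to_int[external_id]

    def external(self, internal_id):
        return self.to_ext[internal_id]

def translate(interactions):
    """interactions: iterable of (user, vendor, rating) using application IDs."""
    users, items = IdDictionary(), IdDictionary()
    lines = []
    for user, vendor, rating in interactions:
        lines.append(f"{users.index(user)},{items.index(vendor)},{rating}")
    return lines, users, items

lines, users, items = translate([
    ("user1", "vendor1", 1),
    ("user1", "vendor3", 3),
    ("user2", "vendor3", 5),
])
print("\n".join(lines))
```

Here user1 and vendor1 become 0, and vendor3 and user2 become 1, which produces the lines `0,0,1`, `0,1,3` and `1,1,5`.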

Then, after you run the Mahout item-based recommender, you will get back a list 
of recommendations for each user. The key will be an integer equal to the 
Mahout user ID. The value will be a list of Mahout item IDs with strengths. You 
will need to map the Mahout IDs back into your application IDs. Since you are 
recommending vendors, the vendors are items, so map all Mahout item IDs into your 
vendor IDs and the Mahout user IDs into your user IDs.
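
A sketch of the reverse mapping follows. It assumes each output line looks like `userID<TAB>[itemID:strength,itemID:strength,...]`, so check this against the files your job actually writes. It also assumes the forward translation was saved as plain lists in which the position equals the Mahout integer ID. The function names are hypothetical.

```python
def parse_recommendation_line(line):
    """Parse a line such as '0\t[4:4.5,7:3.2]' (assumed output layout)."""
    user_part, items_part = line.rstrip("\n").split("\t", 1)
    recs = []
    body = items_part.strip().strip("[]")
    if body:
        for pair in body.split(","):
            item_id, strength = pair.split(":")
            recs.append((int(item_id), float(strength)))
    return int(user_part), recs

def to_application_ids(line, user_ids, vendor_ids):
    """user_ids / vendor_ids: lists where position == Mahout integer ID."""
    mahout_user, recs = parse_recommendation_line(line)
    # Users map back to your user IDs; items map back to your vendor IDs
    return user_ids[mahout_user], [(vendor_ids[i], s) for i, s in recs]

user_ids = ["user1", "user2"]
vendor_ids = ["vendor1", "vendor2", "vendor3"]
print(to_application_ids("1\t[1:4.5,0:3.0]", user_ids, vendor_ids))
```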

On Sep 30, 2014, at 6:55 PM, vinayakb malagatti <[email protected]> 
wrote:

Thank you, @Ted, but my guide is suggesting that I go with what Pat is
suggesting. @Pat, could you please tell me how the data should be grouped if I
want to recommend vendors to the user from the table? You mentioned "*your
recs will be returned using the same integer IDs so you will have to
translate your “user1” and “vendor1-service1” into non-negative contiguous
integers*". I don't know about translation; could you please tell me more
about it?

Thanks and Regards,
Vinayak B


On Tue, Sep 30, 2014 at 10:36 AM, Ted Dunning <[email protected]> wrote:

> Yes.  But I strongly suggest that you not use Pearson Correlation.
> 
> Use the LLR similarity to compute indicator actions for each vendor.  Then
> use a user's history of actions to score vendors.  This is not only much
> simpler than what you are asking for, it will be more accurate.
> 
> You should also measure additional actions besides ratings.
> 
> 
> 
> On Mon, Sep 29, 2014 at 6:56 PM, vinayakb malagatti <
> [email protected]> wrote:
> 
>> @Pat and @Ted, thank you so much for the reply. I was looking for the
>> solution Pat suggested: I want to recommend to a user the vendors he has
>> not yet used, by taking that user's history and comparing it with other
>> users who have rated the same vendors. Taking the table as an example:
>> 
>>   - User 1 has rated Vendor 1, Vendor 3 and Vendor 4, and User 2
>>   has rated Vendor 1, Vendor 2 and Vendor 3.
>>   - User 1 and User 2 have Vendor 1 and Vendor 3 in common.
>>   - Assume the Pearson correlation between them is nearly 1; then we
>>   can recommend Vendor 2, which User 1 has not used, to User 1.
>> 
>> Can we do this using Apache Mahout? If yes, could you please give me
>> a brief idea of how.
>> 
>> Thanks and Regards,
>> Vinayak B
>> 
>> 
>> On Tue, Sep 30, 2014 at 2:10 AM, Ted Dunning <[email protected]>
>> wrote:
>> 
>>> I would recommend that you look at actions other than ratings as well.
>>> 
>>> Did a user expand and read 1 review?  Did they read >3 reviews?
>>> 
>>> Did they mark a rating as useful?
>>> 
>>> Did they ask for contact information?
>>> 
>>> You know your system better than I possibly could, but using other
>>> information in addition to ratings is very important for getting the
>>> highest quality predictive information.
>>> 
>>> You can start with ratings, but you should push to get other kinds of
>>> information as much as possible.  Ratings are often given by only a very
>>> small number of people.  That severely limits how much value you can add
>>> with a recommendation engine.  At the same time, most people are busy not
>>> giving you ratings; they are doing lots of other things that tell you what
>>> they are thinking and reacting to.  If you don't pay attention to that
>>> additional information, you are handicapping yourself severely.
>>> 
>>> 
>>> On Mon, Sep 29, 2014 at 9:53 AM, vinayakb malagatti <
>>> [email protected]> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have a table in my DB that looks something like this:
>>>> 
>>>> rating table
>>>> <https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web>
>>>> 
>>>> Thanks and Regards,
>>>> Vinayak B
>>>> 
>>> 
>> 
> 
