Re: which is a more appropriate form of ratings ?

Nick Pentreath Thu, 25 Feb 2016 03:52:55 -0800

Yes, ALS requires the aggregated version (A). You can use decimal or whole
numbers for the rating, depending on your application, as for implicit data
they are not "ratings" but rather "weights".
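
As a rough illustration of getting from form B to form A, here is a minimal
sketch in plain Python. It assumes that summing the per-view values is the
aggregation you want; the function name `aggregate_ratings` is hypothetical
and not part of MLlib.

```python
from collections import defaultdict

def aggregate_ratings(rows):
    """Collapse repeated (user_id, page_id) rows into one row per pair.

    Assumption: repeated rows are combined by summing their values.
    """
    totals = defaultdict(float)
    for user_id, page_id, rating in rows:
        totals[(user_id, page_id)] += rating
    # Sort only to make the output order deterministic.
    return [(u, p, r) for (u, p), r in sorted(totals.items())]

# Form B from the question: one row per page view.
form_b = [(1, 1, 0.1), (1, 1, 0.1), (1, 1, 0.1), (1, 2, 0.1), (1, 2, 0.1)]

# Form A: one row per (user, page), roughly 0.3 and 0.2 up to float rounding.
form_a = aggregate_ratings(form_b)
```

On an RDD the same idea would normally be expressed as a keyed reduction
rather than a local loop.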

A common approach is to apply different weightings to different user events
(such as 1.0 for a page view, 5.0 for a purchase, 2.0 for a like, etc).
That allows all user event data to be aggregated together in a fairly
principled manner. The weights, however, need to be specified up front in
order to do that aggregation (they could be selected via cross-validation,
domain knowledge, or the relative frequency of each event within a dataset,
for example).
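
A minimal sketch of that weighting step, in plain Python. The weights are
just the example figures above, and the event names, the `EVENT_WEIGHTS`
table and the `weight_events` helper are all hypothetical; in practice the
weights would be chosen for your own data.

```python
from collections import defaultdict

# Illustrative weights only (assumption): pick these via cross-validation,
# domain knowledge, or event frequencies for a real dataset.
EVENT_WEIGHTS = {"view": 1.0, "like": 2.0, "purchase": 5.0}

def weight_events(events, weights=EVENT_WEIGHTS):
    """Turn (user_id, item_id, event_type) records into one
    (user_id, item_id, weight) triple per user/item pair by summing
    the weight assigned to each event."""
    totals = defaultdict(float)
    for user_id, item_id, event_type in events:
        totals[(user_id, item_id)] += weights[event_type]
    # Sort only to make the output order deterministic.
    return [(u, i, w) for (u, i), w in sorted(totals.items())]

events = [
    (1, 10, "view"),
    (1, 10, "view"),
    (1, 10, "purchase"),
    (1, 20, "like"),
    (2, 10, "view"),
]

# One aggregated triple per (user, item) pair.
triples = weight_events(events)
```

The resulting triples are in the aggregated form (A), so, assuming the
RDD-based PySpark API, they could be parallelized and passed to
ALS.trainImplicit as the ratings input.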

On Thu, 25 Feb 2016 at 13:26 Sabarish Sasidharan <sabarish....@gmail.com>
wrote:

> I believe the ALS algo expects the ratings to be aggregated (A). I don't
> see why you have to use decimals for rating.
>
> Regards
> Sab
>
> On Thu, Feb 25, 2016 at 4:50 PM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hello.
>>
>> I just started working on CF in MLlib.
>> I am using trainImplicit because I only have implicit ratings like page
>> views.
>>
>> I am wondering which is a more appropriate form of ratings.
>> Let's assume that the view count is regarded as a rating, and that
>> user 1 views page 1 three times, page 2 twice, and so on.
>>
>> In this case, I think the ratings can be formatted in either of the
>> following 2 ways (of course, it is actually an RDD).
>>
>> A:
>> user_id,page_id,rating(page view)
>> 1,1,0.3
>> 1,2,0.2
>> ...
>>
>> B:
>> user_id,page_id,rating(page view)
>> 1,1,0.1
>> 1,1,0.1
>> 1,1,0.1
>> 1,2,0.1
>> 1,2,0.1
>> ...
>>
>> Is a form like B allowed?
>> If it is, which is better? (Is there any difference between them?)
>>
>> Best,
>> Hiro
>>
>>
>>
>>
>
