Thanks very much, Nick and Sabarish.
That helps me a lot.

Regards,
*Hiro*

On Thu, Feb 25, 2016 at 8:52 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> Yes, ALS requires the aggregated version (A). You can use decimal or whole
> numbers for the rating, depending on your application, as for implicit data
> they are not "ratings" but rather "weights".
>
> A common approach is to apply different weightings to different user
> events (such as 1.0 for a page view, 5.0 for a purchase, 2.0 for a like,
> etc). That allows all user event data to be aggregated together in a fairly
> principled manner. The weights however need to be specified upfront in
> order to do that aggregation (they could be selected via cross-validation,
> domain knowledge or the relative frequency of each event within a dataset,
> for example).
>
>
> On Thu, 25 Feb 2016 at 13:26 Sabarish Sasidharan <sabarish....@gmail.com>
> wrote:
>
>> I believe the ALS algo expects the ratings to be aggregated (A). I don't
>> see why you have to use decimals for rating.
>>
>> Regards
>> Sab
>>
>> On Thu, Feb 25, 2016 at 4:50 PM, Hiroyuki Yamada <mogwa...@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I just started working on CF in MLlib.
>>> I am using trainImplicit because I only have implicit ratings like page
>>> views.
>>>
>>> I am wondering which is a more appropriate form of ratings.
>>> Let's assume that view count is regarded as a rating and
>>> user 1 sees page 1 3 times and sees page 2 twice and so on.
>>>
>>> In this case, I think the ratings can be formatted in either of the
>>> following 2 ways. (Of course, it is actually an RDD.)
>>>
>>> A:
>>> user_id,page_id,rating(page view)
>>> 1,1,0.3
>>> 1,2,0.2
>>> ...
>>>
>>> B:
>>> user_id,page_id,rating(page view)
>>> 1,1,0.1
>>> 1,1,0.1
>>> 1,1,0.1
>>> 1,2,0.1
>>> 1,2,0.1
>>> ...
>>>
>>> Is it allowed to have data like B?
>>> If so, which is better? (Is there any difference between them?)
>>>
>>> Best,
>>> Hiro
>>>
>>>
>>>
>>>
>>

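For anyone reading this thread later, the weighted aggregation Nick describes above can be sketched roughly as follows. This is only an illustration in plain Python, not code from MLlib: the function name aggregate_events, the event-type strings and the sample events are all made up for the example, and the weights are simply the example values from Nick's message (1.0 for a page view, 2.0 for a like, 5.0 for a purchase).

```python
from collections import defaultdict

# Illustrative weights taken from the example values in the thread.
# In practice they would be chosen via cross-validation, domain knowledge, etc.
EVENT_WEIGHTS = {"view": 1.0, "like": 2.0, "purchase": 5.0}


def aggregate_events(events, weights=EVENT_WEIGHTS):
    """Collapse raw (user_id, item_id, event_type) rows into one
    (user_id, item_id, weight) triple per user/item pair (form A)."""
    totals = defaultdict(float)
    for user_id, item_id, event_type in events:
        # Each event contributes its weight to the pair's running total.
        totals[(user_id, item_id)] += weights[event_type]
    # Sort so the output order is deterministic.
    return sorted((u, i, w) for (u, i), w in totals.items())


# Hypothetical raw event log: one row per event (form B in the thread).
events = [
    (1, 1, "view"), (1, 1, "view"), (1, 1, "view"),
    (1, 2, "view"), (1, 2, "view"),
    (2, 1, "view"), (2, 1, "purchase"),
]

ratings = aggregate_events(events)
for row in ratings:
    print(row)
```

Each resulting triple could then be wrapped in a Rating and passed to trainImplicit. On an actual RDD, the same per-pair summation would presumably be done with something like reduceByKey on a ((user, item), weight) pair RDD rather than a local dictionary.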