Thanks very much, Nick and Sabarish.
That helps me a lot.

Regards,
*Hiro*

On Thu, Feb 25, 2016 at 8:52 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Yes, ALS requires the aggregated version (A). You can use decimal or whole
> numbers for the rating, depending on your application, as for implicit data
> they are not "ratings" but rather "weights".
>
> A common approach is to apply different weightings to different user
> events (such as 1.0 for a page view, 5.0 for a purchase, 2.0 for a like,
> etc). That allows all user event data to be aggregated together in a fairly
> principled manner. The weights, however, need to be specified upfront in
> order to do that aggregation (they could be selected via cross-validation,
> domain knowledge or the relative frequency of each event within a dataset,
> for example).
>
> On Thu, 25 Feb 2016 at 13:26 Sabarish Sasidharan <sabarish....@gmail.com>
> wrote:
>
>> I believe the ALS algo expects the ratings to be aggregated (A). I don't
>> see why you have to use decimals for rating.
>>
>> Regards
>> Sab
>>
>> On Thu, Feb 25, 2016 at 4:50 PM, Hiroyuki Yamada <mogwa...@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I just started working on CF in MLlib.
>>> I am using trainImplicit because I only have implicit ratings like page
>>> views.
>>>
>>> I am wondering which is the more appropriate form for the ratings.
>>> Let's assume that the view count is regarded as a rating, and that
>>> user 1 sees page 1 three times, sees page 2 twice, and so on.
>>>
>>> In this case, I think the ratings can be formatted in either of the
>>> following 2 ways. (Of course, it is actually an RDD.)
>>>
>>> A:
>>> user_id,page_id,rating(page view)
>>> 1,1,0.3
>>> 1,2,0.2
>>> ...
>>>
>>> B:
>>> user_id,page_id,rating(page view)
>>> 1,1,0.1
>>> 1,1,0.1
>>> 1,1,0.1
>>> 1,2,0.1
>>> 1,2,0.1
>>> ...
>>>
>>> Is it allowed to have data like B?
>>> If it is, which is better? (Is there any difference between them?)
>>>
>>> Best,
>>> Hiro
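
For reference, the aggregation discussed in this thread (turning per-event rows like B into one weighted row per user/page pair like A) can be sketched in plain Python. This is only an illustration under stated assumptions: the event names, the `EVENT_WEIGHTS` table, and the `aggregate_events` helper are hypothetical, and the weights are just the example values from Nick's reply, not recommended settings.

```python
from collections import defaultdict

# Hypothetical event weights, using the example values from the reply above.
EVENT_WEIGHTS = {"view": 1.0, "like": 2.0, "purchase": 5.0}


def aggregate_events(events, weights=EVENT_WEIGHTS):
    """Collapse per-event rows (form B) into one row per (user, item) pair (form A)."""
    totals = defaultdict(float)
    for user_id, item_id, event_type in events:
        # Each event contributes its weight to the pair's total.
        totals[(user_id, item_id)] += weights[event_type]
    # Return (user, item, weight) triples in a stable order.
    return sorted((user, item, weight) for (user, item), weight in totals.items())


# Form B: one row per raw event (illustrative data only).
events = [
    (1, 1, "view"), (1, 1, "view"), (1, 1, "view"),
    (1, 2, "view"), (1, 2, "view"),
    (2, 1, "purchase"), (2, 1, "like"),
]

# Form A: one aggregated weight per (user, item) pair.
ratings = aggregate_events(events)
for row in ratings:
    print(row)
```

On an actual RDD, the same idea would presumably be a `map` to `((user, item), weight)` pairs followed by `reduceByKey` with addition, with the results converted to `Rating` objects before being passed to `ALS.trainImplicit`.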