Re: taking mahout into production

Sebastian Schelter Fri, 20 May 2011 11:25:43 -0700

I recently published an article on my blog at http://ssc.io that deals with
scaling recommender systems; I'm sure it has some ideas you could adapt.


--sebastian
On 20.05.2011 20:02, "Ted Dunning" <[email protected]> wrote:
> Sean will be able to address scaling and configuration better than I, but I
> have built video recommendation systems before and found that
>
> a) ratings are nearly worthless, largely because so few people will rate
> things
>
> b) the best preference data we ever found was whether the user viewed the
> asset longer than 30 seconds. This is a binary preference and it helps to
> have it that way since you can make use of a number of economies.
>
> c) some randomization in recommendations is very important so that you
> preserve some exploratory behavior. I implemented this by adding small
> amounts of noise to recommendation scores to perturb the ranking.
>
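Points (b) and (c) above can be sketched roughly as follows. This is only an illustration, not Ted's actual implementation: the (user_id, asset_id, seconds_watched) log layout, the function names, the choice of Gaussian noise, and the noise_scale value are all assumptions made for the example.

```python
import random

VIEW_THRESHOLD_SECONDS = 30  # threshold taken from point (b) above


def binary_preferences(view_events, threshold=VIEW_THRESHOLD_SECONDS):
    """Return the set of (user, asset) pairs viewed longer than the threshold.

    view_events is an iterable of (user_id, asset_id, seconds_watched) tuples;
    this tuple layout is a hypothetical log format chosen for illustration.
    """
    return {
        (user, asset)
        for user, asset, seconds in view_events
        if seconds > threshold
    }


def perturb_ranking(scored_items, noise_scale=0.05, rng=None):
    """Re-rank (item, score) pairs after adding small noise to each score.

    noise_scale is an assumed tuning knob, and Gaussian noise is just one
    possible choice; the original message does not say which was used.
    """
    rng = rng or random.Random()
    noisy = [
        (item, score + rng.gauss(0.0, noise_scale))
        for item, score in scored_items
    ]
    # Highest perturbed score first.
    noisy.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in noisy]
```

With a small noise_scale, items whose scores are close together occasionally swap places while clearly better items stay near the top, which is one way to keep some exploration in the list.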
> On Fri, May 20, 2011 at 10:31 AM, Varnit Khanna <[email protected]> wrote:
>
>> Hi,
>> I have been considering using Mahout for our recommendation engine
>> needs and had a couple of questions about using it in production.
>>
>> Use Case:
>> We need to provide recommendations on video assets (similar to Hulu) to
>> a couple of million users, and we have over 100K assets. Since we are
>> experiencing growth in both users and assets, I am planning to use
>> Mahout on Hadoop.
>>
>> Preference Data:
>> Currently we do not have a ratings system built into our video
>> player/page but we do have logs on user impressions on video assets
>> which I will be feeding into RecommenderJob. Until we build a ratings
>> system I am planning on using the following preference data:
>>
>> Impressions | Rating
>> 1 | (empty)
>> 2 | 2
>> 3 | 3
>> 4 | 4
>> >=5 | 5
>>
>> Does this preference data make sense? I will be using the standard
>> RecommenderJob to generate recommendations until I get a better
>> understanding of mahout.
>>
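For reference, the impressions-to-rating table above could be applied to a raw impression log along these lines. This is a minimal sketch under stated assumptions: the function names and the (user_id, asset_id) log layout are made up for the example, "(empty)" is read as "emit no preference at all", and the output is the comma-separated userID,itemID,preference text form that RecommenderJob reads, with numeric IDs.

```python
from collections import Counter


def impressions_to_rating(impressions):
    """Map an impression count to a rating following the table above.

    A single impression yields None, which is this sketch's reading of
    "(empty)"; counts of 5 or more are capped at 5.
    """
    if impressions < 2:
        return None
    return min(impressions, 5)


def to_preference_lines(impression_log):
    """Turn (user_id, asset_id) impression events into preference lines.

    impression_log holds one tuple per impression; this layout is a
    hypothetical log format chosen for illustration.
    """
    counts = Counter(impression_log)
    lines = []
    for (user_id, asset_id), n in sorted(counts.items()):
        rating = impressions_to_rating(n)
        if rating is not None:  # skip pairs with no preference value
            lines.append(f"{user_id},{asset_id},{rating}")
    return lines
```

For example, a user with three impressions on asset 10 would produce the line "1,10,3", while a single impression produces no line.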
>> Questions:
>> 1) What will be the best approach to deal with cold start on new
>> assets and users?
>> 2) Is it typical to parse the entire dataset in production to generate
>> recommendations for new assets and users or can it be done
>> incrementally?
>> 3) Which is the better approach for this use case: item-based or user-based CF?
>> Also at some point in the future we would like to generate
>> recommendations on news assets so a single system might be beneficial.
>>
>> Thanks
>> -varnit
>>
