-------- Original Message --------
Subject: Re: Mahout recommendation in implicit feedback situation
Date: Mon, 05 May 2014 18:25:00 +0200
From: Alessandro Suglia <[email protected]>
To: Ted Dunning <[email protected]>
The standard MovieLens-100k dataset has five splits, each divided into
a training set and a test set.
For each split I train the recommender on the training set and then try
to test it on the test set.
The test phase is conducted with an external tool, but in order to run
it correctly I need to predict, for each user, a specific number of
items that are present in the test set (it is a top-n recommendation
task) and then rank them by their predicted values.
My problem is that there doesn't seem to be any way in Mahout (at least
in my experience) to get a value between 0 and 1 for a specific item.
Is that true?
On 05/05/14 01:36, Ted Dunning wrote:
I would second all of what Pat said.
I would add that off-line evaluation of recommenders is pretty tricky
because, in practice, recommenders generate their own training data.
This means that off-line evaluations, or even performance on the first
day, are not the entire story.
On Sun, May 4, 2014 at 6:16 PM, Pat Ferrel <[email protected]> wrote:
First, are you doing an offline precision test? With training set
and probe or test set?
You can remove some data from the dataset, i.e. withhold certain
preferences, then train and obtain recommendations for the users who
had some data withheld. Since the withheld data has not been used to
train and get recs, you can then compare what users actually preferred
to the predictions made by the recommender. If all of them match you
have 100% precision. Note that you are comparing recommendations to
actual but held-out preferences.
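
For reference, a minimal sketch of that kind of hold-out test with the
in-memory GenericRecommenderIRStatsEvaluator; the data file, similarity
and neighborhood size here are just placeholders, not recommendations:

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.IRStatistics;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class PrecisionTest {
      public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder file
        RecommenderBuilder builder = new RecommenderBuilder() {
          public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            UserSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
            UserNeighborhood hood = new NearestNUserNeighborhood(50, similarity, dataModel);
            return new GenericUserBasedRecommender(dataModel, hood, similarity);
          }
        };
        // Withholds some preferences per user, recommends, and compares
        // the recommendations against what was withheld.
        GenericRecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = evaluator.evaluate(builder, null, model, null,
            10,                                                   // precision/recall at 10
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // per-user relevance cutoff
            1.0);                                                 // evaluate all users
        System.out.println("precision@10 = " + stats.getPrecision());
        System.out.println("recall@10    = " + stats.getRecall());
      }
    }
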
If you are using some special tools you may be doing this to
compare algorithms, which is not an exact thing at all, no matter
what the Netflix prize may have led us to believe. If you are
using offline tests to tune a specific recommender you may have
better luck with the results.
In one installation we had real data and split it into test and
training sets by date. The older 90% of the data was used to train, and
the most recent 10% was used to test. This mimics the way data comes
in. We compared the recommendations from the training data against the
actual preferences in the held-out data and used
MAP@some-number-of-recs as the score. This allows you to measure
ranking quality, which RMSE does not. The MAP score led us to several
useful conclusions about tuning that were data dependent.
http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision
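
To make the metric concrete, a small self-contained sketch of MAP@k;
this is plain Java, nothing Mahout-specific, assuming the usual long
ids for items:

    import java.util.List;
    import java.util.Set;

    public final class MapAtK {

      // Average precision at k for one user's ranked recommendation
      // list, scored against that user's held-out (relevant) items.
      static double averagePrecision(List<Long> ranked, Set<Long> heldOut, int k) {
        double hits = 0.0;
        double sum = 0.0;
        int depth = Math.min(k, ranked.size());
        for (int i = 0; i < depth; i++) {
          if (heldOut.contains(ranked.get(i))) {
            hits += 1.0;
            sum += hits / (i + 1); // precision at this position
          }
        }
        double denom = Math.min(k, heldOut.size());
        return denom == 0.0 ? 0.0 : sum / denom;
      }

      // MAP@k is the mean of the per-user average precisions.
      static double mapAtK(List<List<Long>> allRanked, List<Set<Long>> allHeldOut, int k) {
        double total = 0.0;
        for (int u = 0; u < allRanked.size(); u++) {
          total += averagePrecision(allRanked.get(u), allHeldOut.get(u), k);
        }
        return total / allRanked.size();
      }
    }
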
On May 4, 2014, at 12:17 AM, Alessandro Suglia <[email protected]> wrote:
Unfortunately that is not what I need, because I'm using an external
tool to compute the metrics. I simply need to produce a list of
recommendations ranked by an estimated preference that I have to
compute for a specific user and for specific items (the items in the
test set).
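
Concretely, what I am trying to do looks like this sketch, where the
recommender, the user id and the test-set item ids are placeholders
(and estimatePreference throws TasteException); the problem is that the
estimates come back as 0.0 or 1.0 instead of a graded score:

    // Rank a fixed set of test-set items for one user by estimated
    // preference. Uses java.util.{Map,HashMap,List,ArrayList,
    // Collections,Comparator}.
    long userId = 42L;                     // placeholder user
    long[] testItems = {10L, 20L, 30L};    // placeholder test-set items
    final Map<Long, Float> score = new HashMap<Long, Float>();
    for (long itemId : testItems) {
      score.put(itemId, recommender.estimatePreference(userId, itemId));
    }
    List<Long> ranked = new ArrayList<Long>(score.keySet());
    Collections.sort(ranked, new Comparator<Long>() {
      public int compare(Long a, Long b) {
        return Float.compare(score.get(b), score.get(a)); // descending
      }
    });
    // write each id in 'ranked' plus score.get(id) to the output file
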
How is it possible that Mahout doesn't offer this?
Am I doing something wrong?
On 05/04/14 01:20, Pat Ferrel wrote:
> Are you doing this as an offline performance test? There is a test
> framework for the in-memory recommenders (non-Hadoop) that will hold
> out random preferences and then use the held-out ones to compute
> various quality metrics. Is this what you need?
>
> See this wiki page under Evaluation:
> https://mahout.apache.org/users/recommender/userbased-5-minutes.html
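>
> For example, a minimal sketch with the hold-out evaluator (the data
> file and the RecommenderBuilder are placeholders; the classes live
> under org.apache.mahout.cf.taste):
>
>   // Randomly holds out 10% of the preferences, estimates them with
>   // the trained recommender, and reports the mean absolute difference.
>   DataModel model = new FileDataModel(new File("ratings.csv"));
>   RecommenderEvaluator evaluator =
>       new AverageAbsoluteDifferenceRecommenderEvaluator();
>   double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
>   System.out.println("mean absolute difference = " + score);
>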
> On May 3, 2014, at 3:46 PM, Alessandro Suglia <[email protected]> wrote:
>
> This is the procedure that I adopted at first (incorrectly).
> But what I need is to estimate the preference for items that aren't
> in the training set. In particular, I'm working with the
> MovieLens-100k splits, so for each split I should train my
> recommender on the training set and test it (using some
> classification metrics) on the test set.
> I'm not using Mahout's default evaluator, so I need to predict the
> preferences and then write all the results to a specific file.
> Can you give an example of how to do this properly?
>
> Thank you in advance.
> Alessandro Suglia
>
> On 3 May 2014 at 23:06, Pat Ferrel <[email protected]> wrote:
>> Actually the regular cooccurrence recommender should work too. Your
>> example on Stack Overflow is calling the wrong method to get recs;
>> call .recommend(userId) to get an ordered list of item ids with
>> strengths.
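>>
>> For example (a sketch; building the recommender is elided and the
>> ids are placeholders):
>>
>>   // recommend(userId, n) returns the top-n RecommendedItem objects,
>>   // already ordered by decreasing strength (getValue()).
>>   List<RecommendedItem> recs = recommender.recommend(userId, 10);
>>   for (RecommendedItem rec : recs) {
>>     System.out.println(rec.getItemID() + "\t" + rec.getValue());
>>   }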
>>
>> It looks to me like you are getting preference data from the user,
>> which in this case is 1 or 0, not recommendations.
>>
>> On May 3, 2014, at 7:42 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> You should try the
>>
>> org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender
>>
>> which has been built to handle such data.
>>
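>> A minimal sketch of wiring it up; the data file and the neighborhood
>> size are placeholders, and Tanimoto similarity is just one reasonable
>> choice for 0/1 data:
>>
>>   // Classes are under org.apache.mahout.cf.taste.impl.*
>>   DataModel model = new FileDataModel(new File("ua.base"));
>>   UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
>>   UserNeighborhood neighborhood =
>>       new NearestNUserNeighborhood(25, similarity, model);
>>   Recommender recommender = new GenericBooleanPrefUserBasedRecommender(
>>       model, neighborhood, similarity);
>>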
>> Best,
>> Sebastian
>>
>>
>> On 05/03/2014 04:34 PM, Alessandro Suglia wrote:
>>> I have described it in the SO post:
>>> "When I execute this code, the result is a list of 0.0 or 1.0
>>> values, which are not useful for top-n recommendation in an implicit
>>> feedback context, because I have to obtain, for each item, an
>>> estimated score in the range [0, 1] in order to rank the list in
>>> decreasing order and construct the top-n recommendation
>>> appropriately."
>>> On 05/03/14 16:25, Sebastian Schelter wrote:
>>>> Hi Alessandro,
>>>>
>>>> What result do you expect, and what do you get? Can you give a
>>>> concrete example?
>>>>
>>>> --sebastian
>>>>
>>>> On 05/03/2014 12:11 PM, Alessandro Suglia wrote:
>>>>> Good morning,
>>>>> I've tried to create a recommender system using Mahout in an
>>>>> implicit feedback situation. What I'm trying to do is explained
>>>>> exactly in this post on Stack Overflow:
>>>>>
>>>>> http://stackoverflow.com/questions/23077735/mahout-recommendation-in-implicit-feedback-situation
>>>>>
>>>>>
>>>>> As you can see, I'm having a problem with it, simply because I
>>>>> cannot get the result I expect (a value between 0 and 1) when I
>>>>> try to predict a score for a specific item.
>>>>>
>>>>> Can someone here help me, please?
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> Alessandro Suglia
>>>>>
>>