So you're proposing that we separate the tasks of estimating
preferences for unknown items and recommending items for users to
click: the latter could include some items for which a preference has
already been expressed. It's a good idea to think that way, thanks for
the tip. I
would argue, though, that .recommend() is aimed at the latter task: it
predicts preferences, and sorts them, and returns the top N items. It
is a final step in a process that includes unknown preference estimation
as an intermediate step. This is built into Mahout as I see it, by
separating .recommend() and .estimatePreference(). That's why I still
think the most elegant solution is simply adding items with known
preference values back into the candidate set alongside the predicted
ones. AFAIK this is most easily accomplished with a custom
CandidateItemsStrategy. How would you go about it without having to
write your own sorting function?
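The merge described above can be sketched in plain Java, without Mahout types: take the delegate's estimates, overlay the known preferences, and sort once. (In Mahout proper this logic would live in a wrapper Recommender or a custom CandidateItemsStrategy; the maps and item IDs here are illustrative assumptions.)

```java
import java.util.*;

public class MergeRecommendations {
    // Merge known preference values back into the estimated ones, then
    // return the top-N items by value. Known values win over estimates
    // for the same item, since the real data is more trustworthy.
    static List<Map.Entry<Long, Double>> topN(Map<Long, Double> known,
                                              Map<Long, Double> estimated,
                                              int n) {
        Map<Long, Double> merged = new HashMap<>(estimated);
        merged.putAll(known); // known preferences override estimates
        List<Map.Entry<Long, Double>> list = new ArrayList<>(merged.entrySet());
        list.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return list.subList(0, Math.min(n, list.size()));
    }

    public static void main(String[] args) {
        Map<Long, Double> known = Map.of(1L, 1.0);                  // clicked item
        Map<Long, Double> estimated = Map.of(2L, 0.3, 3L, 0.7, 1L, 0.2);
        System.out.println(topN(known, estimated, 2)); // item 1 first, then item 3
    }
}
```

Note that no custom sorting function is needed beyond one comparator: the known and estimated values live on the same scale, so a single sort over the merged map suffices.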
About boolean recommenders: many of my users made no purchases, only
clicks. So if I use a generic rating-based recommender, it will make
essentially random recommendations, because my training data is
boolean. Has anyone else run into this problem? One solution I am
about to try is letting the rating value of a click decay with the time
elapsed since the click was made. I am not sure the resulting ratings
will vary enough for GenericRecommender to work, and I am also not sure
I am justified in reducing the similarity between two items just
because two users clicked on them at different times. Has anyone tried
a solution based on a regularized normalization of some sort?
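The decay idea could be as simple as an exponential curve over the click's age. A minimal sketch, assuming an exponential decay with a 30-day half-life (both the formula and the constant are my assumptions, not anything Mahout provides):

```java
public class ClickDecay {
    // A click starts at rating 1.0 and halves every HALF_LIFE_DAYS,
    // so older clicks contribute progressively weaker preferences.
    static final double HALF_LIFE_DAYS = 30.0;

    static double decayedRating(double ageInDays) {
        return Math.pow(0.5, ageInDays / HALF_LIFE_DAYS);
    }

    public static void main(String[] args) {
        System.out.println(decayedRating(0.0));   // 1.0   (fresh click)
        System.out.println(decayedRating(30.0));  // 0.5   (one half-life old)
        System.out.println(decayedRating(90.0));  // 0.125 (three half-lives old)
    }
}
```

Whether the spread this produces is meaningful to a similarity metric is exactly the open question raised above; the half-life would need tuning against held-out data.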
Thanks.
On 01/26/2012 01:28 PM, Sean Owen wrote:
It's correct to think of a recommender as something that can fill in the
blanks. If you transform your input into a numerical scale, it ought to
fill in estimated values for the items you don't have input for. It does
not repeat back to you the input you already have -- these are removed from
results -- on the theory that you already have that information.
It does not mean you can't use both the real and estimated data together,
in the end. You could add back in clicked items, with their known values,
and use that as the basis of something. You should not need to estimate
preference for already-rated items -- you already have that info, right?
So perhaps it is a question of setting the scale correctly? If a click = 1,
and maybe a purchase = 10, then an item estimated at 0.3 is judged to be
less interesting than clicked items. Something else is going wrong with the
data, or rating scale, or even algorithm if these results are consistently
unintuitive.
(Not all recommenders operate by estimating preferences, in particular the
ones that don't use preferences: the ones that deal with 'boolean' data. I
am not sure that is at play here though?)
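For context on those boolean-data recommenders: rather than correlating rating values, they compare users (or items) by the overlap of the sets they interacted with, e.g. Mahout's TanimotoCoefficientSimilarity. A plain-Java illustration of the underlying idea (not Mahout's actual implementation):

```java
import java.util.*;

public class TanimotoExample {
    // Tanimoto (Jaccard) coefficient between two sets of clicked item IDs:
    // |intersection| / |union|. No rating values are involved, which is
    // why this works on purely boolean click data.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<Long> userA = Set.of(1L, 2L, 3L);
        Set<Long> userB = Set.of(2L, 3L, 4L);
        System.out.println(tanimoto(userA, userB)); // 2 shared of 4 total = 0.5
    }
}
```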
On Thu, Jan 26, 2012 at 8:28 AM, Anatoliy Kats<[email protected]> wrote:
I have not seen this discussion from the beginning, but I think the
troubles I'm having are similar in nature. We are recommending items the
user can buy on our website. Our preferences are past purchases, and also
past clicks on the item's description. If a purchase was made, certainly
we do not want to recommend the item again, but if it was only a click, we
are even more confident that we should be recommending that item. Yet the
recommenders are hardcoded not to. I managed to get around this by
changing the recommender's CandidateItemsStrategy.
I also need to estimatePreference() of the items the user clicked on, or
at least I think I do. The unclicked items have an estimated preference of
around 0.3, whereas a click is treated as a rating of 1. Intuitively
that seems unfair: I'd essentially only be recommending items the user
clicked on. I have my own recommender class which uses
Generic...Recommender() as a delegate. So, I can override the
estimatePreference() to return something else, but this concerns me for two
reasons. First, this is not estimatePreference()'s intended usage, so I'm
afraid of breaking something. Second, many recommenders have a private
doEstimatePreference() method that I'd love to call for already-rated
items, but since it is a private method of my delegate, I cannot. That
makes me sad.
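One way to avoid touching the private doEstimatePreference() at all is to keep the delegate pattern but intercept estimatePreference(): return the stored preference when one exists, and only fall back to the delegate for unknown items. A minimal sketch, with a hypothetical Estimator interface standing in for Mahout's Recommender:

```java
import java.util.*;

public class KnownPrefEstimator {
    // Stand-in for Mahout's Recommender.estimatePreference(userID, itemID).
    interface Estimator {
        double estimatePreference(long userID, long itemID);
    }

    // Wraps a delegate: already-rated items get their stored value back,
    // so the delegate's estimate is only consulted for unknown items.
    static Estimator withKnownPrefs(Map<Long, Double> knownPrefsForUser,
                                    Estimator delegate) {
        return (userID, itemID) -> {
            Double known = knownPrefsForUser.get(itemID);
            return known != null ? known : delegate.estimatePreference(userID, itemID);
        };
    }

    public static void main(String[] args) {
        Estimator base = (u, i) -> 0.3;  // stub delegate: uniform estimate
        Estimator wrapped = withKnownPrefs(Map.of(7L, 1.0), base);
        System.out.println(wrapped.estimatePreference(42L, 7L)); // 1.0 (known click)
        System.out.println(wrapped.estimatePreference(42L, 8L)); // 0.3 (estimated)
    }
}
```

This keeps the delegate's intended contract intact: nothing inside the delegate is overridden, and already-rated items are never passed through to its estimation path.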
I hope this helps some of you, and I would appreciate some feedback on
whether what I'm doing is even a good idea, and how to go about it.
Thanks,
Anatoliy
On 01/25/2012 09:36 PM, Sean Owen wrote:
(moving to user@)
I think I understand more about what you are doing. It doesn't quite
make sense to say you will train a recommender on the output of the
recommender, but I understand that you mean you have some information
about what users have visited what attractions or shows.
This is classic recommendation. You put that in, and it can tell you
what other attractions, shows, etc. the user may like.
So going back to the beginning, I'm not yet clear on why that isn't
already the answer for you, since you have built this. Explain again
what else you are trying to do to filter or process the result?
On Wed, Jan 25, 2012 at 5:25 PM, Saikat Kanjilal<[email protected]>
wrote:
Putting back on the list. We want to recommend new items in the park;
an item could be: 1) attraction, 2) restaurant, 3) show, 4) ride, 5)
resort.
Our real data, if you will, is the recommendations that result from
understanding their preferences in more detail based on their
reservations and resort stays. So I wonder if our real data is our
training data that the recommender can use for training, to calculate
predicted data based on that.
Date: Wed, 25 Jan 2012 17:20:02 +0000
Subject: Re: Add on to itemsimilarity
From: [email protected]
To: [email protected]
(do you mind putting this back on the list? might be a good discussion
for
others)
What are you recommending to the user -- theme parks, rides at a theme
park?
Yes, you would always be recommending 'unknown' things to the user. You
already 'know' how much they like or dislike the things for which you
have data, so recommendations aren't of use to you.
Of course, you can use both real and predicted data in your system -- it
depends on what you are trying to accomplish. The recommender's role is
creating the predicted data.
On Wed, Jan 25, 2012 at 5:12 PM, Saikat Kanjilal<[email protected]>
wrote:
Actually, let me be more clear: we are building a recommendation engine
for a theme-park experience. The user preferences are something we
store based on the user's reservations and analytics; this is stored
before the user rates any items, and may or may not have a direct
relationship to the recommendations the user receives as they go around
the park. This is because those recommendations could be other rides or
attractions that exist outside of the actual preferences. It's not yet
clear to me how to tie these preferences into the item similarity
results.