Wow, that page should be reworded or removed. They are trying to talk about
ensemble models, which are a valid technique, but they badly misapply them
there. The application to multiple data types is just wrong, and I know
because I tried exactly what they are suggesting, with cross-validation
tests to measure how much worse things got.
For instance, if you use buy and dislike, what kind of result are you going
to get if you have 2 models? One set of results will recommend “buy”; the
other will tell you what a user is likely to “dislike”. How do you combine
them?
Ensembles are meant to use multiple *algorithms* and do something like
voting on recommendations. But you have to pay close attention to what the
algorithm uses as input and what it recommends. All members of the ensemble
must recommend the same action to the user.
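As a rough illustration of voting among ensemble members that all recommend the same action, here is a minimal sketch using a Borda-style count. The function name, the three member lists, and the item ids are hypothetical; real ensembles (including the Netflix prize one) blend scores in more elaborate ways.

```python
from collections import defaultdict


def borda_vote(ranked_lists, top_n=3):
    """Combine ranked lists that all recommend the SAME action (e.g. "buy").

    Each list gives its first item len(list) points, its second one point
    fewer, and so on. Items are returned by total points, highest first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Sort by score (descending), then item id so ties are deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:top_n]]


# Hypothetical output of three different algorithms, all trained on the
# same "buy" events and all answering "what will this user buy?".
member_a = ["i1", "i2", "i3"]
member_b = ["i2", "i1", "i4"]
member_c = ["i2", "i3", "i1"]

combined = borda_vote([member_a, member_b, member_c])
print(combined)  # ['i2', 'i1', 'i3']
```

The vote only means something because every member answers the same question. If one list ranked likely "dislikes", adding its points to the others would have no sensible interpretation.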
Whoever contributed this statement:
“The default algorithm described in DASE uses user-to-item view events as
training data. However, your application may have more than one type of
events which you want to take into account, such as buy, rate and like
events. One way to incorporate other types of events to improve the system
is to add another algorithm to process these events, build a separated
model and then combine the outputs of multiple algorithms.”
is patently wrong. Ensembles must recommend the same action to users, and
unless each algorithm in the ensemble is recommending the same thing
(albeit with slightly different internal logic) you will get gibberish
out. The winner of the Netflix prize did an ensemble with 107 (IIRC)
different algorithms, all using exactly the same input data. There is no
principle that says if you feed conflicting data into several ensemble
algorithms you will get diamonds out.
Furthermore, using view events is bad to begin with because the recommender
will recommend what it thinks you want to view. We did this once with a
large dataset from a big E-Com company where we did cross-validation tests
using “buy” alone, “view” alone, and ensembles of “buy” and “view”. We got
far better results using buy alone than using buy with ~100x as many
“views”. The intent of the user and how they find things to view is so
different from when they finally come to buy something that adding view
data got significantly worse results. This is because people have different
reasons to view: maybe a flashy image, maybe a promotion, maybe some
placement bias, etc. This type of browsing “noise” pollutes the data, which
can no longer be used to recommend “buy”s. We did several experiments,
including comparing several algorithm types with “buy” and “view” events.
“view” always lost to “buy” no matter the algo we used (they were all
unimodal). There may be some exception to this result out there but it will
be accidental, not because it is built into the algorithm. When I say this
worsened results I’m not talking about some tiny fraction of a %, I’m
talking about a decrease of 15-20%.
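For anyone wanting to run this kind of comparison, a minimal sketch of scoring recommendations against held-out “buy” events might look like the following. The metric (mean precision at k), the function name, and the toy data are assumptions chosen for illustration. They are not the metric or data from the experiments described above, and the toy numbers say nothing about real results.

```python
def mean_precision_at_k(recs_by_user, held_out_buys, k=10):
    """Average, over users with held-out buys, of hits-in-top-k divided by k.

    recs_by_user:  dict of user id -> ranked list of recommended item ids
    held_out_buys: dict of user id -> set of items the user bought in the
                   held-out (test) split
    """
    users = [u for u, items in held_out_buys.items() if items]
    if not users:
        return 0.0
    total = 0.0
    for user in users:
        top_k = recs_by_user.get(user, [])[:k]
        hits = sum(1 for item in top_k if item in held_out_buys[user])
        total += hits / k
    return total / len(users)


# Invented toy data, only to exercise the function.
held_out = {"u1": {"a", "b"}, "u2": {"c"}}
recs_model_a = {"u1": ["a", "b"], "u2": ["c", "x"]}
recs_model_b = {"u1": ["x", "a"], "u2": ["y", "z"]}

score_a = mean_precision_at_k(recs_model_a, held_out, k=2)
score_b = mean_precision_at_k(recs_model_b, held_out, k=2)
print(score_a, score_b)  # 0.75 0.25
```

The important design point is that every model, whatever events it was trained on, is scored against the same held-out conversions, since those are what the recommender is supposed to predict.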
You could argue that “buy”, “like”, and “rate” will produce similar results,
but from experience I can truly say that “view” and “dislike” will not.
Since the method described on the site is so sensitive to the user intent
recorded in events, I would never use something like that without doing
cross-validation tests, and then you are talking about a lot of work. There
is no theoretical or algorithmic correlation detection built into the
ensemble method, so you may or may not get good results. I can say
unequivocally that the exact thing they describe will give worse results
(or at least it did in our experiments). You cannot ignore the intent
behind the data you use as input unless this type of correlation detection
is built into the algorithm, and the ensemble method described ignores
this issue completely.
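To make “correlation detection” concrete: cross-occurrence approaches such as CCO are usually described as applying a log-likelihood ratio (LLR) test to the counts of users who did or did not perform two events. The sketch below is a plain-Python version of that standard test, with hypothetical function names. It is not the UR's or Mahout's actual code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users who did both A (e.g. viewed X) and B (e.g. bought Y)
    k12: users who did A but not B
    k21: users who did B but not A
    k22: users who did neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


print(llr(10, 10, 10, 10))  # independent events: approximately 0
print(llr(10, 0, 0, 10))    # strongly associated events: about 27.7
```

A secondary event like “view” would only be kept as an indicator for a given item when its score against the primary event is high enough. Note that LLR measures the strength of an association, not its direction, so how a signal like “dislike” is used still needs thought.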
The UR uses the Correlated Cross-Occurrence algorithm for this exact reason
and was invented to solve the problem we found using “buy” and “view” data
together. Let’s take a ridiculous extreme and use “dislikes” to recommend
“likes”. Does that even make sense? Check out an experiment with CCO where
we did this exact thing:
OK, rant over :-) Thanks for bringing up one of the key issues being
addressed by modern recommenders: multimodality. It is being addressed in
scientific ways; unfortunately, the page on PIO’s site gets it wrong.
From: KRISH MEHTA <krish14011...@gmail.com>
Reply: KRISH MEHTA <krish14011...@gmail.com>
Date: June 13, 2018 at 2:19:17 PM
To: Pat Ferrel <p...@occamsmachete.com>
Subject: Re: Few Queries Regarding the Recommendation Template
I understand, but if I just want the likes, dislikes and views then I can
combine the algorithms, right? As given in the link:
Hope this works.
On Jun 13, 2018, at 1:19 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
I would strongly recommend against using ratings. No one uses these as
input to recommenders anymore. Netflix doesn’t even show ratings. The best
input to a recommender is a conversion: buy, watch, listen, etc., depending
on the item type. But the recommender you are using only allows one of
these as input. ALS is unimodal. There is no way to combine different
inputs with weighting that is valid with plain matrix factorization. So
ratings (if you choose to ignore my advice) and views cannot be mixed. For
one thing, the math requires either implicit or explicit values for input
and cannot really mix the two; for another, as I said, it is unimodal.
If there are instructions that say you can mix different data like ratings
and views, they are wrong. A unimodal recommender can only find the user’s
intent from one type of signal at a time. If you train on views it will
recommend the user view something, and this may be very different from
buying something. I know this because I’ve done experiments on this issue.
The Universal Recommender is the only multimodal recommender that I know of
that works with PIO. Factorization Machines are also multimodal but much
harder to use and there is no PIO template for them anyway.
To use the UR I would suggest using conversions (buy), high ratings = like,
low ratings = dislike, and views (I assume you are talking about detail
page views) as boolean “did view” input. The UR will find correlations
between this multimodal data and make the best recommendations based on
this. You can also set “dislike” to filter out any recommendation where
the user has already expressed the fact that they dislike the item.
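The mapping from ratings to “like” and “dislike” signals suggested above could be sketched as follows. The thresholds (4 and above is a like, 2 and below is a dislike on a 1 to 5 scale), the helper name, and the event names are assumptions for illustration. The dictionary keys follow the general shape of a PredictionIO event; check the event names against your own engine configuration.

```python
def rating_to_event(user_id, item_id, rating, like_min=4, dislike_max=2):
    """Turn a numeric rating into a boolean "like" or "dislike" event.

    Ratings strictly between the two thresholds carry no clear intent in
    this sketch and are dropped (None is returned).
    """
    if rating >= like_min:
        name = "like"
    elif rating <= dislike_max:
        name = "dislike"
    else:
        return None
    return {
        "event": name,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }


# Hypothetical ratings: (user, item, stars).
ratings = [("u1", "i1", 5), ("u1", "i2", 1), ("u2", "i1", 3)]
events = [e for e in (rating_to_event(*r) for r in ratings) if e is not None]
print([(e["entityId"], e["event"], e["targetEntityId"]) for e in events])
# [('u1', 'like', 'i1'), ('u1', 'dislike', 'i2')]
```

Buys and detail-page views would be sent as their own events in the same shape, one event per occurrence, with no rating value attached.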
From: KRISH MEHTA <krish14011...@gmail.com>
Reply: email@example.com <firstname.lastname@example.org>
Date: June 13, 2018 at 12:06:16 PM
To: email@example.com <firstname.lastname@example.org>
Subject: Few Queries Regarding the Recommendation Template
I am new to PredictionIO and I have gone through the tutorial provided
regarding the customer buying and rating products. I have a few queries:
1. What if I change the rating of the product? Will it update the result in
the database? Like will it use the most recent rating?
2. What if I want to recommend a product using implicit as well as explicit
data? Is there a link which helps me understand this, or can anyone
help me with it? I have gone through the tutorial and it says that for
implicit data it adds up the number of views to decide whether the viewer
likes or dislikes an item. But what if I want to recommend to a user based
on their likes and dislikes as well as the number of views? For example,
even if the user has viewed a product thousands of times, if they dislike
it then that should affect the recommendation. Can anyone suggest a simpler
way, or do I have to make major changes in my code?
I hope my questions are genuine and not mundane.