Re: Question about spark-itemsimilarity

Pat Ferrel Wed, 14 Dec 2016 18:24:12 -0800

Cross-occurrence allows us to ask the question: are two events correlated?

To use the ecom example, purchase is the conversion or primary action. A detail 
page view might be related, but we must test each cross-occurrence to make sure. 
I know for a fact that with many ecom datasets it is impossible to treat these 
events as the same thing and get anything but a drop in quality of 
recommendations (I’ve tested this). People that use the ALS recommender in 
Spark’s MLlib sometimes tell you to weight the view less than the purchase, but 
this is nonsense (again, I’ve tested this). What is true is that *some* views 
lead to purchases and others do not, so giving every view the same weight, 
whatever that weight is, is pure garbage.
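
To make "test each cross-occurrence" concrete: the Mahout code behind 
spark-itemsimilarity uses a log-likelihood ratio (LLR) test for this. Below is 
a minimal sketch of that kind of test on a 2x2 table of user counts. The 
function names and the example counts are illustrative assumptions, not 
Mahout's API or real data.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of user counts.

    k11: users who viewed item A and purchased item B
    k12: users who viewed item A but did not purchase item B
    k21: users who purchased item B without viewing item A
    k22: users who did neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


# Hypothetical counts: the view shows up with the purchase far more often
# than chance would suggest, so the score is large.
score = llr(k11=40, k12=60, k21=10, k22=890)
print(f"LLR score: {score:.2f}")
```

A score near zero means the view and the purchase look independent, so that 
view tells us nothing about that purchase. Only pairs with a high score are 
kept as indicators, which is why a single global weight for "view" cannot do 
the same job.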

What CCO does is find the views that seem to lead to purchase. It can also find 
category-preferences that lead to certain purchases, as well as 
location-preference (triggered by a purchase when logged in from some 
location).  And so on. Just about anything you know about users or can phrase 
as a possible indicator of user taste can be used to get lift in quality of 
recommendation. 

So in your example below, purchase history is the conversion action, and likes 
and downloads are secondary actions looked at as cross-occurrences. Note that 
we don’t need to have the same IDs for all actions, which is why I mention 
location above: location IDs are not item IDs. 
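
A small sketch of what counting cross-occurrences looks like when the 
secondary action has its own ID space. Everything here (the function name, the 
user, item, and location IDs) is made up for illustration. It only counts raw 
overlaps and leaves out the correlation test that decides which pairs are kept.

```python
from collections import defaultdict
from itertools import product


def cross_occurrence(primary, secondary):
    """Count users who did both, for each (primary_id, secondary_id) pair.

    Both arguments map user ID -> set of IDs the user interacted with.
    The two ID spaces do not have to match.
    """
    counts = defaultdict(int)
    for user, primary_ids in primary.items():
        secondary_ids = secondary.get(user, set())
        for p, s in product(primary_ids, secondary_ids):
            counts[(p, s)] += 1
    return dict(counts)


# Hypothetical boolean event data: an event either happened or it did not.
purchases = {"u1": {"ipad"}, "u2": {"ipad", "case"}, "u3": {"case"}}
views = {"u1": {"ipad", "stylus"}, "u2": {"stylus"}, "u3": {"case"},
         "u4": {"stylus"}}
locations = {"u1": {"seattle"}, "u2": {"seattle"}, "u3": {"austin"}}

# Secondary action that shares the item ID space with purchases
purchase_view = cross_occurrence(purchases, views)
# Secondary action with a completely different ID space
purchase_location = cross_occurrence(purchases, locations)

print(purchase_view[("ipad", "stylus")])       # 2 (u1 and u2)
print(purchase_location[("ipad", "seattle")])  # 2 (u1 and u2)
```

Passing the purchases as both arguments gives plain cooccurrence, so 
cooccurrence is just the special case where primary and secondary are the same 
action.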

See this blog post and slide deck for more description of the algo: 
http://actionml.com/blog/cco

BTW to illustrate how powerful this idea is, I have a client that sells one 
item a year on average to a customer. It’s a very big item and has a lifetime 
of one year. So using ALS you could only train on the purchase and if you were 
gathering a year of data there would be precious little training data. Also 
when you have a user with no purchase it is impossible to recommend. ALS fails 
on all users with no purchase history. However with CCO, all the user journey 
and any data about the user you can gather along the way can be used to 
recommend something to purchase. So this client would be able to recommend to 
only 20% of their returning shoppers with ALS, and those recs would be of low 
quality, based on only one event far in the past. CCO, using all the 
clickstream (or important parts of it), can do quite well.
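
To sketch why a shopper with no purchases can still get recs: suppose each 
item carries, per event type, the set of IDs found to be correlated with 
purchasing it. A user's history for any of those event types can then be 
matched against the indicators. The data layout, the names, and the 
overlap-count scoring below are assumptions for illustration only, not how any 
particular recommender stores or scores things.

```python
def recommend(indicators, history, top_n=3):
    """Rank items by how many of their indicators appear in the user's history.

    indicators: item ID -> {event type -> set of correlated IDs}
    history:    event type -> set of IDs the user has interacted with
    """
    scores = {}
    for item, fields in indicators.items():
        score = sum(len(correlated & history.get(event, set()))
                    for event, correlated in fields.items())
        if score > 0:
            scores[item] = score
    # Highest score first, ties broken by item ID for a stable order
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical indicators for two items
indicators = {
    "item-A": {"purchase": {"item-B"}, "view": {"item-A-specs", "financing"}},
    "item-B": {"purchase": {"item-A"}, "view": {"item-B-specs"}},
}

# A shopper who has never purchased anything but has viewed two pages
new_shopper = {"view": {"financing", "item-A-specs"}}
print(recommend(indicators, new_shopper))  # ['item-A']

# With purchase indicators only, the same shopper gets nothing
purchase_only = {item: {"purchase": fields["purchase"]}
                 for item, fields in indicators.items()}
print(recommend(purchase_only, new_shopper))  # []
```

The second call is the situation described above for a purchase-only model: 
no purchase history, no recommendation.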

This may seem an edge case, but only in degree: every ecom app has data they 
are throwing away, and CCO addresses this.

On Dec 13, 2016, at 7:04 AM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:

Thanks Pat for that information!

I meant to handle number of clicks or number of downloads, not ratings. But
it is not a problem if Spark doesn't handle values; I have other algorithms
that can handle that. However, I am quite curious about the occurrences,
cooccurrences, and cross-occurrences concept.

Can the following be a way to handle different data types?

  - occurrences - purchase history
  - cooccurrences - purchase history/likes
  - cross-occurrences - purchase history/clicks or downloads

Best, Niklas

2016-12-01 18:47 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:

> No you can’t, the value is ignored. The algorithm looks at occurrences,
> cooccurrences, and cross-occurrences of several event types not values
> attached to events.
> 
> If you are trying to use rating info, this has been pretty much discarded
> as being not very useful. For instance you may like comedy movies but they
> always get lower ratings than drama (raters bias) so using ratings to
> recommend items is highly problematic, but if a user watched a movie, that
> is a good indicator that they liked it and that is a boolean value. With
> cross-occurrence you can also use dislike as an indicator of preference but
> this is also boolean—a thumbs down.
> 
> To see an end-to-end recommender with all the necessary surrounding
> infrastructure check the Apache-PredictionIO project and the Universal
> Recommender, which uses the code behind spark-itemsimilarity to serve
> recommendations. Read about the UR here: http://actionml.com/docs/ur
> 
> On Nov 30, 2016, at 6:58 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
> wrote:
> 
> I found that you can, so ignore my question!
> 
> Best regards, Niklas
> 
> 2016-11-30 15:42 GMT+01:00 Niklas Ekvall <niklas.ekv...@gmail.com>:
> 
>> Hello!
>> 
>> I'm using *spark-itemsimilarity* to produce related recommendations and
>> the input data has the form *userID, itemID*. Could I also use the form
>> *userID, itemID, value* (value > 0)? Or does *spark-itemsimilarity* only
>> handle binary values?
>> 
>> Best regards, Niklas
>> 
> 
>
