> On Jan 14, 2017, at 2:41 AM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:
> 
> Thanks again Pat!
> 
> Have some other question to you I hope you could help me with:
> 
> 
> *So in your example below purchase history is the conversion action;
> likes and downloads are secondary actions looked at as
> cross-occurrences. *

yes

> 
> We want to analyze data from an app, so we have other data types like:
> downloads, likes, recommendations shown, and recommendations ignored,
> and I guess these actions are quite good to use as secondary actions.
> Today we feed the algorithms with episodes that the users have
> consumed; before we do that we filter out episodes we don't want to
> recommend. Is it possible to do this type of filtering inside
> spark-itemsimilarity?

no, any filtering must be done when preparing your data. Also I’d avoid sending 
recs shown and recs ignored, because this sounds like it might cause overfitting. 
The recommender likes to see events that don’t *only* come from recommendations. 
Most apps have many ways to discover items so this is not a problem, but if you 
had an app that only showed recommendations, you would end up with 
self-fulfilling recommendations. This problem is called “overfitting” in the ML 
world. 
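To sketch the kind of pre-filtering I mean, here is a small Python illustration. It assumes comma-delimited "userID,itemID" input lines and a blocklist of episode IDs; the names and data layout are hypothetical examples, not part of spark-itemsimilarity itself:

```python
# Sketch: drop unwanted episodes while preparing input for
# spark-itemsimilarity. The event format and blocklist are assumed.

def filter_events(lines, excluded_items):
    """Yield only "userID,itemID" events whose itemID is not excluded."""
    for line in lines:
        user_id, item_id = line.strip().split(",")
        if item_id not in excluded_items:
            yield f"{user_id},{item_id}"

events = ["u1,ep1", "u1,ep2", "u2,ep1", "u3,ep3"]
excluded = {"ep2", "ep3"}  # episodes we never want to recommend
print(list(filter_events(events, excluded)))  # ['u1,ep1', 'u2,ep1']
```

The same idea applies whatever the storage format: filter at export time, then hand the cleaned file to the job.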

> 
> Finally, why and when do I want to use the following control option?
> 
> Algorithm control options:
>  -mppu <value> | --maxPrefs <value>
>        Max number of preferences to consider per user (optional). Default: 500
> 

This tells spark-itemsimilarity to subsample the data to use only a max of 500 
events per user. This is so the training time doesn’t increase forever with 
more data, and it has been shown with ecom data that the point of 
diminishing returns is about 500 events per user. 
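Roughly, the subsampling that --maxPrefs controls looks like this (a sketch in Python to show the idea; this is not spark-itemsimilarity's actual implementation):

```python
import random

# Keep at most max_prefs events per user so training cost stays bounded.
def downsample(events, max_prefs=500, seed=42):
    by_user = {}
    for user, item in events:
        by_user.setdefault(user, []).append(item)
    rng = random.Random(seed)
    sampled = []
    for user, items in by_user.items():
        if len(items) > max_prefs:
            items = rng.sample(items, max_prefs)  # random subset, no repeats
        sampled.extend((user, item) for item in items)
    return sampled

# u1 has 600 events and gets cut to 500; u2's single event is untouched.
events = [("u1", f"i{n}") for n in range(600)] + [("u2", "i1")]
kept = downsample(events, max_prefs=500)
print(sum(1 for u, _ in kept if u == "u1"))  # 500
```

Heavy users contribute diminishing information past this cutoff, so dropping their excess events trades almost no quality for a hard bound on training time.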

> Best regards, Niklas
> 
> 
> 2016-12-15 3:23 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:
> 
>> Cross-occurrence allows us to ask the question: are 2 events correlated.
>> 
>> To use the Ecom example, purchase is the conversion or primary action, a
>> detail page view might be related but we must test each cross-occurrence to
>> make sure. I know for a fact that with many ecom datasets it is impossible
>> to treat these events as the same thing and get anything but a drop in
>> quality of recommendations (I’ve tested this). People that use the ALS
>> recommender in Spark’s MLlib sometimes tell you to weight the view less
>> than the purchase. But this is nonsense (again I’ve tested this). What is
>> true is that *some* views lead to purchases and others do not. So treating
>> them all with the same weight is pure garbage.
>> 
>> What CCO does is find the views that seem to lead to purchase. It can also
>> find category-preferences that lead to certain purchases, as well as
>> location-preference (triggered by a purchase when logged in from some
>> location).  And so on. Just about anything you know about users or can
>> phrase as a possible indicator of user taste can be used to get lift in
>> quality of recommendation.
>> 
>> So in your example below purchase history is the conversion action, likes,
>> and downloads are secondary actions looked at as cross-occurrences. Note
>> that we don’t need to have the same IDs for all actions. This is why I
>> mention location above.
>> 
>> See this blog post and slide deck for more description of the algo:
>> http://actionml.com/blog/cco
>> 
>> 
>> BTW to illustrate how powerful this idea is, I have a client that sells
>> one item a year on average to a customer. It’s a very big item and has a
>> lifetime of one year. So using ALS you could only train on the purchase and
>> if you were gathering a year of data there would be precious little
>> training data. Also when you have a user with no purchase it is impossible
>> to recommend. ALS fails on all users with no purchase history. However with
>> CCO, all the user journey and any data about the user you can gather along
>> the way can be used to recommend something to purchase. So this client
>> would be able to recommend to only 20% of their returning shoppers with ALS
>> and those recs would be low in quality, based on only one event far in the
>> past. CCO using all the clickstream (or important parts of it) can do quite
>> well.
>> 
>> This may seem an edge case but only in degree, every ecom app has data
>> they are throwing away and CCO addresses this.
>> 
>> On Dec 13, 2016, at 7:04 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
>> wrote:
>> 
>> Thanks Pat for that information!
>> 
>> I meant to handle number of clicks or number of downloads, not
>> ratings. But this is not a problem if Spark doesn't handle values; I
>> have other algorithms that can handle that. However, I am quite curious
>> about the occurrences, cooccurrences, and cross-occurrences concepts.
>> 
>> Can the following be a way to handle different data types?
>> 
>>  - occurrences - purchase history
>>  - cooccurrences - purchase history/likes
>>  - cross-occurrences - purchase history/clicks or downloads
>> 
>> Best, Niklas
>> 
>> 2016-12-01 18:47 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:
>> 
>>> No you can’t, the value is ignored. The algorithm looks at occurrences,
>>> cooccurrences, and cross-occurrences of several event types not values
>>> attached to events.
>>> 
>>> If you are trying to use rating info, this has been pretty much discarded
>>> as being not very useful. For instance you may like comedy movies but they
>>> always get lower ratings than drama (rater's bias), so using ratings to
>>> recommend items is highly problematic; but if a user watched a movie, that
>>> is a good indicator that they liked it, and that is a boolean value. With
>>> cross-occurrence you can also use dislike as an indicator of preference, but
>>> this is also boolean—a thumbs down.
>>> 
>>> To see an end-to-end recommender with all the necessary surrounding
>>> infrastructure check the Apache-PredictionIO project and the Universal
>>> Recommender, which uses the code behind spark-itemsimilarity to serve
>>> recommendations. Read about the UR here: http://actionml.com/docs/ur
>>> 
>>> On Nov 30, 2016, at 6:58 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
>>> wrote:
>>> 
>>> I found that you can, so ignore my question!
>>> 
>>> Best regards, Niklas
>>> 
>>> 2016-11-30 15:42 GMT+01:00 Niklas Ekvall <niklas.ekv...@gmail.com>:
>>> 
>>>> Hello!
>>>> 
>>>> I'm using *spark-itemsimilarity* to produce related recommendations, and
>>>> the input data has the form *userID, itemID*. Could I also use the form
>>>> *userID, itemID, value* (value > 0)? Or does *spark-itemsimilarity* only
>>>> handle binary values?
>>>> 
>>>> Best regards, Niklas
>>>> 
>>> 
>>> 
>> 
>> 
> 
