Re: Like/No Rating/Dislike Dataset Representation to Mahout

Pat Ferrel Sun, 11 Oct 2015 17:42:56 -0700

Actually, there is another way to do this, but you need a multi-action 
recommender.

It turns out that dislikes can actually predict likes. Both are actions the 
user takes that give us some idea of their taste. To use both actions, we need 
to pick the one that is most indicative of a user’s positive preference for 
something; we’ll use “like” for this. Then test for a correlation between 
dislikes of items and likes of other items. 

This is done with Mahout’s SimilarityAnalysis.cooccurrence. It can take 
multiple input actions, test them for correlation, and output one set of 
“correlators” for each action. Then look for the likes and dislikes that most 
closely match the user’s history with a kNN similarity engine (a search engine). 
Some Mahout docs are here: 
http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
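
As a rough illustration of the correlation test described above, here is a 
minimal pure-Python sketch. It does not call Mahout; it scores each 
(disliked item, liked item) pair with a log-likelihood ratio (LLR), the kind 
of test Mahout's cooccurrence analysis is based on. The function names, the 
toy users, and the items A, B, C, X, Y are all made up for the example.

```python
from math import log

def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def cross_occurrence(secondary, primary, n_users):
    # Score every (secondary item, primary item) pair that co-occurs
    # in at least one user's history. Illustrative only, not Mahout's API.
    sec_items = {i for items in secondary.values() for i in items}
    pri_items = {i for items in primary.values() for i in items}
    scores = {}
    for s in sec_items:
        s_users = {u for u, items in secondary.items() if s in items}
        for p in pri_items:
            p_users = {u for u, items in primary.items() if p in items}
            k11 = len(s_users & p_users)   # did both actions
            if k11 == 0:
                continue                   # pair never co-occurs
            k12 = len(s_users) - k11       # disliked s, did not like p
            k21 = len(p_users) - k11       # liked p, did not dislike s
            k22 = n_users - k11 - k12 - k21
            scores[(s, p)] = llr(k11, k12, k21, k22)
    return scores

# Hypothetical toy data: user -> set of items.
likes = {"u1": {"A"}, "u2": {"A"}, "u3": {"A"},
         "u4": {"B"}, "u5": {"B"}, "u6": {"C"}}
dislikes = {"u1": {"X"}, "u2": {"X"}, "u3": {"X"},
            "u4": {"Y"}, "u6": {"Y"}}

scores = cross_occurrence(dislikes, likes, n_users=6)
best_pair = max(scores, key=scores.get)
```

In this toy data every user who disliked X also liked A, so that pair gets 
the highest score. A real system would keep only the strongest correlators 
per item and index them in the search engine.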

An entire running system with a REST interface has been created in the 
Universal Recommender here: 
https://github.com/pferrel/scala-parallel-universal-recommendation

It’s an engine with services built on Spark, Mahout, and Elasticsearch that 
takes input events of any number of types and can be queried for user-based or 
item-based recommendations, with boosts and filters based on item metadata. 

On Oct 11, 2015, at 4:43 PM, Shady Hanna <shadimamdouh...@gmail.com> wrote:

Now I got it, thank you so much again.

But is it possible to encode it like that in Mahout? As far as I
understand, I can only use one field for the ratings. In that case, I
would have to run a user-based recommender, for example, twice: once with
the data represented as (like 1, dislike 0, no rating 0) and once more
with the representation (like 1, dislike 1, no rating 0).

Thank you,
Regards,
Shady
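
A minimal sketch of the two binary encodings discussed in this thread 
(liked: like=1, dislike=0, no rating=0; rated: like=1, dislike=1, 
no rating=0). This is plain Python, not a Mahout data model, and the 
function name encode is only illustrative. The sample rows are the first 
three rows of the table in the quoted message below.

```python
def encode(rating):
    # Map a categorical rating to the pair (liked, rated).
    liked = 1 if rating == "Like" else 0
    rated = 1 if rating in ("Like", "Dislike") else 0
    return liked, rated

# (customer_id, product_id, rating)
rows = [(1, 1, "No Rating"), (2, 1, "Like"), (3, 2, "Dislike")]

# (customer_id, product_id, liked, rated)
encoded = [(c, p, *encode(r)) for c, p, r in rows]

# "Disliked" does not need its own column: it is rated - liked.
disliked = [rated - liked for _, _, liked, rated in encoded]
```

Each of the two columns can then be fed to a recommender as its own 
boolean-preference input.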




On Sun, Oct 11, 2015 at 10:50 PM, Angus Macnab <angus.mac...@gmail.com>
wrote:

> I just meant that you can encode the single categorical value that you
> have as two separate binary values.  You can accomplish this by asking two
> questions.  First you can ask: did the user like the product?
> (liked=1,dislike=0,no rating=0).  Next you can ask: did the user rate the
> product? (liked=1,dislike=1,no rating=0).
> 
> If this is your original data:
> 
> Customer ID          Product ID           Rating
> 1                           1                        No Rating
> 2                           1                        Like
> 3                           2                        Dislike
> 4                           1                        Like
> 5                           2                        Like
> 6                           2                        Dislike
> 7                           1                        No Rating
> 8                           1                        No Rating
> 9                           2                        Dislike
> 
> You could encode it like this:
> 
> Customer ID   Product ID   Rating      Liked   Rated
> 1             1            No Rating   0       0
> 2             1            Like        1       1
> 3             2            Dislike     0       1
> 4             1            Like        1       1
> 5             2            Like        1       1
> 6             2            Dislike     0       1
> 7             1            No Rating   0       0
> 8             1            No Rating   0       0
> 9             2            Dislike     0       1
> 
> And train on this dataset:
> 
> Customer ID   Product ID   Liked   Rated
> 1             1            0       0
> 2             1            1       1
> 3             2            0       1
> 4             1            1       1
> 5             2            1       1
> 6             2            0       1
> 7             1            0       0
> 8             1            0       0
> 9             2            0       1
> 
> There is no need to ask a third question (did the user dislike the product?),
> since the answer can be derived linearly from the other two fields
> (disliked = rated - liked), i.e. it is linearly dependent on them.
> 
> Hope this helps to clarify things.
> 
> Thanks,
> 
> Angus
> 
> On Sun, Oct 11, 2015 at 1:24 PM, Shady Hanna <shadimamdouh...@gmail.com>
> wrote:
> 
>> Thank you so much Angus for your help.
>> 
>> I did not quite get it, so if I have the following data:
>> 
>> Customer ID          Product ID           Rating
>> 1                           1                        No Rating (0,1)
>> 2                           1                        Like (1,0)
>> 3                           2                        Dislike (0,0)
>> 
>> If what I understood is correct, how can I represent it to Mahout, and is
>> it going to be a boolean pref data model?
>> 
>> Thank you so much again,
>> Best Regards,
>> Shady
>> 
>> On Sat, Oct 10, 2015 at 3:18 AM, Angus Macnab <angus.mac...@gmail.com>
>> wrote:
>> 
>>> Rather than trying to impose ordinality on your data, you can think of
>>> "like", "dislike", "did not rate" as a categorical feature with a
>>> cardinality of three, which can be encoded using two binary features.  All
>>> possible encodings are fine, but the most logical is probably:
>>> rated=(0,1) and liked=(0,1).
>>> 
>>> So you just need to come up with the routine to encode these features.
>>> Hope this helps!
>>> 
>>> Best,
>>> 
>>> Angus
>>> --------------------------------------
>>> Angus Macnab
>>> 
>>> On Fri, Oct 9, 2015 at 3:54 PM, Shady Hanna <shadimamdouh...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have data in which each item is marked as like, dislike, or not rated
>>>> by the user, and I am not sure how to represent this data to Mahout's
>>>> user-based/item-based recommender system, or which user similarity can
>>>> be used for such a dataset.
>>>> 
>>>> Would you please advise ?
>>>> 
>>>> Thank you,
>>>> Best Regards,
>>>> Shady
>>>> 
>>> 
>>> 
>> 
>

Re: Like/No Rating/Dislike Dataset Representation to Mahout
