Re: Use of latent informations associated to items with Mahout's SimilarityAnalysis.cooccurrences

Pat Ferrel Sun, 04 Jun 2017 11:07:22 -0700

No offense Marius but I wrote the slides and the equation so I do indeed know 
what they are saying. Whether a user writes a tag or you are detecting the user 
preference for a tag you wrote, they are user indicators of preference. The LLR 
filtering of these secondary indicators is what CCO is all about and leaves you 
with a model that can be compared to a user’s history and contains only 
indicators that correlate to some conversion behavior.


T in the "whole enchilada" is used to personalize content-based 
recommendations. Each row of T represents an item and its content as tokens. 
Tokens are stemmed, tokenized text terms, or can be entities in the item’s text 
(using some form of NLP), or tags, etc. TT’ then gives you, for each item, the 
items that are most similar in terms of whatever content you were using in T. 
Now you take the user’s history of content item preference (which articles did 
they read, for instance) and find the most similar items in TT’. These will be 
personalized content-based recommendations.
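
A minimal sketch of that idea in plain Python, for illustration only. The item 
ids, tokens and the history set are made up, and raw shared-token counts stand 
in for whatever similarity the Mahout CCO tools actually compute on TT’.

```python
from collections import defaultdict

# Hypothetical T: each row is an item, each "column" a content token.
T = {
    "article-1": {"mahout", "spark", "recommender"},
    "article-2": {"mahout", "recommender", "llr"},
    "article-3": {"cooking", "pasta"},
    "article-4": {"spark", "streaming"},
}

def item_item_overlap(t):
    """Off-diagonal of TT': number of shared tokens for each item pair."""
    tt = defaultdict(dict)
    for a in t:
        for b in t:
            if a != b:
                shared = len(t[a] & t[b])
                if shared:
                    tt[a][b] = shared
    return tt

def content_recs(t, history, n=3):
    """Score unseen items by summed similarity to items in the user's history."""
    tt = item_item_overlap(t)
    scores = defaultdict(int)
    for seen in history:
        for other, sim in tt.get(seen, {}).items():
            if other not in history:
                scores[other] += sim
    # Highest score first, ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

recs = content_recs(T, history={"article-1"})
print(recs)
```

A user who read only article-1 gets article-2 ranked above article-4 because it 
shares two tokens rather than one; article-3 shares none and is not returned.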

This is not implemented in the UR but is in the CCO tools in Mahout. The reason 
it is not implemented is that it still requires user history, and content-based 
recs are worse predictors than collaborative filtering with user history. In CF 
you treat the terms or tags as indicators of preference; you do not find items 
similar by content. 

The personalized content-based recs may serve for edge conditions where 
recommending items with no usage behavior is the most common case, like news 
articles, where you have new items all the time with no usage events. In this 
case extracting something better than “bag-of-words” for content is quite 
important, so highly detailed user tagging or NLP techniques can greatly 
increase the quality of results.



On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo <[email protected]> wrote:

IMHO, T represents tags in an Anonymous tag (or property) labeling task,
and what you propose is Personalized tag (or property) labeling,
as described in https://arxiv.org/pdf/1203.4487.pdf (Section 1.4.5, "Emerging 
new classification", p. 40).

2017-06-04 8:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
And what is the T in the slides for?

How can we implement it if it is not implemented yet?

2017-06-04 8:11 GMT+04:00 Pat Ferrel <[email protected]>:
By purchasing an item with a tag that you have given it, the user is displaying 
a preference for that tag.


On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo <[email protected]> wrote:

So the tag here is assumed to be a tag given by the user to an item?

I was thinking that it was some kind of tag we give to the item by some means 
(classification, LDA, etc.).

2017-06-03 21:14 GMT+04:00 Pat Ferrel <[email protected]>:
A = history of all purchases (in the e-com case)
B = history of all tag preferences

r = [A’A]h_a + [A’B]h_b

The part in the slides about content-based recs is not needed here because you 
have captured them as user preferences.
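
As a rough worked example of the formula above, here is a small pure-Python 
sketch. The matrices and history vectors are invented, h_a and h_b are taken 
to be the user's purchase and tag-preference histories, and raw cooccurrence 
counts are used where CCO would keep only LLR-filtered indicators. Note that 
[A’B] is |I| x |T|, so both terms come out as item-length vectors.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# A: rows are users u0..u2, columns are items i0..i2 (purchases).
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
# B: rows are the same users, columns are tags t0..t1 (tag preferences).
B = [
    [1, 0],
    [0, 1],
    [1, 1],
]

AtA = matmul(transpose(A), A)  # |I| x |I| item-item cooccurrence
AtB = matmul(transpose(A), B)  # |I| x |T| item-tag cross-occurrence

h_a = [1, 0, 0]  # hypothetical user history: purchased i0
h_b = [0, 1]     # hypothetical user history: preferred t1

# r = [A'A]h_a + [A'B]h_b, one score per item.
r = [x + y for x, y in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
print(r)
```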


On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo <[email protected]> wrote:

Please correct "side" to "size" in my previous e-mail.

2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
What will be the size of the matrix if we send an event like tag-pref? 
We will get a |U| x |T| matrix, I think (where T is the set of all tags).

So [AtA] will be a |T| x |T| matrix and we will do a dot product with the user 
history hT to get recommendations, right?

I was assuming that A should be of side |U| x |I| where I is the set of all 
items as it should be added to other terms of the whole enchilada formula 
afterwards.

Thank you for your guidance Pat.

2017-06-02 21:35 GMT+04:00 Pat Ferrel <[email protected]>:
Please refer to the documents. The “event” is the name of the type of event or 
indicator of preference; it implies the type of the targetEntityId. So a 
“tag-pref” event would be accompanied by a targetEntityId = tag-id. This is 
separate from attaching “tag” properties to items with the $set event for use 
with filter and boost rules. One looks at the data as a possible preference 
indicator and the other is used to restrict results. This is why we usually 
name events so they sound like a user preference of some type, whereas item 
property values are simply item attributes, intrinsic to the items and 
independent of an individual user.

The event can have any name that makes sense to you.
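
To make the distinction concrete, here is a hedged sketch of the two event 
shapes as Python dicts. The field names follow the usual PredictionIO event 
JSON; the ids, the "tag" targetEntityType, the property values and the 
is_indicator helper are made up for illustration.

```python
import json

# A user preference indicator: user "u-1" showed a preference for a tag.
# The event name and the "tag" target type are illustrative choices.
tag_pref_event = {
    "event": "tag-pref",
    "entityType": "user",
    "entityId": "u-1",
    "targetEntityType": "tag",
    "targetEntityId": "tag-scifi",
}

# An item property: attaches tags to the item itself, independent of any
# user, for use with filter and boost rules.
set_item_tags = {
    "event": "$set",
    "entityType": "item",
    "entityId": "item-42",
    "properties": {"tag": ["tag-scifi", "tag-classic"]},
}

def is_indicator(event):
    """Hypothetical helper: True for user-preference events, False for $set."""
    return not event["event"].startswith("$") and "targetEntityId" in event

print(json.dumps(tag_pref_event, indent=2))
print(json.dumps(set_item_tags, indent=2))
```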


On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo <[email protected]> wrote:

So, the event field should be the token and targetEntityId the item ID, right?

2017-06-02 20:07 GMT+04:00 Pat Ferrel <[email protected]>:
Yes, each is analyzed separately as a separate event. If you are using REST you 
can send up to 50 events in a single array. Some SDKs may support this too.
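
A small sketch of what that could look like on the client side: one event per 
tag, grouped into arrays of at most 50. The function names, ids and the 
"tag-pref" event name are assumptions for illustration, and the sketch only 
builds the arrays; it does not send anything.

```python
def tag_pref_events(user_id, tags, event_name="tag-pref"):
    """Build one preference event per tag for a single user."""
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "tag",  # illustrative type name
            "targetEntityId": tag,
        }
        for tag in tags
    ]

def batches(events, size=50):
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]

events = tag_pref_events("u-1", [f"tag-{i}" for i in range(120)])
payloads = list(batches(events))
print([len(p) for p in payloads])
```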


On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo <[email protected]> wrote:

So I have to send an event like category-preference for each tag associated 
with an item, right?

entityId: user-id
event: category-preference
targetEntityId: tag/token

2017-06-02 19:47 GMT+04:00 Pat Ferrel <[email protected]>:
When a user expresses a preference for a tag, word or term, as in search or even 
in content like descriptions, these can be considered secondary events. The 
most useful are tags and search terms in our experience. Content can be used, 
but each term/token needs to be sent as a separate preference. Search phrases 
can be used as they are, though again turning them into tokens may be better.

Please look through the docs here: http://actionml.com/docs/ur or the slide 
deck here: https://www.slideshare.net/pferrel/unified-recommender-39986309

The major innovation of CCO, the algorithm behind the UR, is the use of these 
cross-domain indicators. They are not guaranteed to predict conversions, but the 
CCO algo tests them and weights them low if they do not. So we tend to test for 
strength of prediction of the entire category of indicator and drop it if weak, 
or set a minLLR threshold and filter weak individual indicators out.
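
For readers unfamiliar with the test, here is a sketch of the standard 
log-likelihood ratio on a 2x2 contingency table, followed by a threshold 
filter. The counts, indicator names and the min_llr value of 5.0 are invented 
for illustration; this is not the Mahout code itself.

```python
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized entropy term used in the LLR computation."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: users with both the indicator and the conversion
    k12: users with the indicator only
    k21: users with the conversion only
    k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp small negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))

# Hypothetical indicator counts against one conversion item.
indicators = {
    "tag-scifi": (100, 1, 1, 100),  # strongly associated
    "tag-misc": (10, 10, 10, 10),   # independent of the conversion
}

min_llr = 5.0  # illustrative threshold
kept = {name for name, k in indicators.items() if llr(*k) >= min_llr}
print(kept)
```

An indicator that is statistically independent of the conversion scores close 
to zero and is filtered out, while a strongly associated one scores high and 
is kept.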

Technically these are not called latent; that has another meaning in Machine 
Learning having to do with Latent Factor Analysis.


On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo <[email protected]> wrote:

Hello everyone!

Do you have any idea how to use latent information associated with items, like 
tags or word vector embeddings, in Mahout's SimilarityAnalysis.cooccurrences?

Regards,

Marius

-- 
You received this message because you are subscribed to the Google Groups 
"actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected] 
<mailto:[email protected]>.
To post to this group, send email to [email protected] 
<mailto:[email protected]>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com
 
<https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout 
<https://groups.google.com/d/optout>.





