As you described here: https://groups.google.com/forum/#!topic/actionml-user/yPKzj1Ej7hM

2017-06-05 0:28 GMT+04:00 Marius Rabenarivo <[email protected]>:

> You previously said that the combo of w2v + LDA can be combined with the existing UR but would be a separate template add-on to create enriching events for the UR.
>
> Can you give some guidance about how it should be implemented?
>
> 2017-06-04 23:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
>
>> Thank you very much for all these clarifications.
>>
>> Yes, I have items with no conversions. I did read in the literature that content-based recs are less sensitive to the cold-start problem, so I headed to it.
>>
>> You suggested using Word2Vec in a previous post for items with little content attached to them.
>>
>> I already computed Word2Vec for my items using a simple sum and want to use them to do some smoothing in the sparse user-item matrix.
>>
>> I was thinking that a kind of tensor operation may be used with CF with the Word2Vec vectors attached to items.
>>
>> 2017-06-04 23:05 GMT+04:00 Pat Ferrel <[email protected]>:
>>
>>> TT’ does not solve cold start because you need user history for personalization. There are several other techniques that I’ve mentioned many times on the list that help with cold start, but TT’ is for a slightly different thing. Its use is when you have a user’s history of item preferences but the items are too old to recommend and you only want to recommend new ones with no history. If you think about news, it is close to being like this. Or patent applications, law opinions or judgments too. To be helpful there needs to be a lot of content for each item and you only want new things recommended.
>>>
>>> What cold-start do you need to “solve”: new anonymous users with no history, or items with no conversions? Search the PIO list and AML group for past posts on this.
>>>
>>> Tag use is implemented as both CF and content similarity (not TT’).
>>> If you ask for item-based recommendations and the item has no conversions, you will get popular items by default. If you boost items with the same tags as the item the user is looking at, you get popular items mostly with similar tags. If you disable the popularity part you get items with similar tags. This requires that you attach tags to the items with $set, and your query should contain the tags (or any other properties) of the example item. There are many ways of mixing this. You could also just get recs and mix in new inventory by some small random amount. You can use different placements for these so you aren’t ruining recs with too many randomized cold items.
>>>
>>> Anyway, the best way to do this depends on your GUI and data.
>>>
>>> On Jun 4, 2017, at 11:35 AM, Marius Rabenarivo <[email protected]> wrote:
>>>
>>> I didn't mean to tell you what it means, but I just wanted to make it clear for my part.
>>>
>>> As I understand, the T part is a personalization that we should make if we want to use content-based information when doing recommendation.
>>>
>>> For my use case, I want to use it to overcome the cold start problem.
>>>
>>> I was thinking that it was already implemented as you documented it in the slides, but I didn't find tag use in the code.
>>>
>>> Is it SimilarityAnalysis.rowSimilarity() in Mahout that implements TT'? (just to confirm)
>>>
>>> 2017-06-04 22:06 GMT+04:00 Pat Ferrel <[email protected]>:
>>>
>>>> No offense Marius, but I wrote the slides and the equation so I do indeed know what they are saying. Whether a user writes a tag or you are detecting the user preference for a tag you wrote, they are user indicators of preference. The LLR filtering of these secondary indicators is what CCO is all about and leaves you with a model that can be compared to a user’s history and contains only indicators that correlate to some conversion behavior.
>>>>
>>>> T in the "whole enchilada" is used to personalize content-based recommendations. Each row of T represents an item and its content as tokens. Tokens are stemmed, tokenized text terms, or can be entities in the item’s text (using some form of NLP), or tags, etc. TT’ then gives you items and the items that are most similar in terms of whatever content you were using in T. Now you take the user’s history of content item preference (which articles did they read, for instance) and find the most similar items in TT’. These will be personalized content-based recommendations.
>>>>
>>>> This is not implemented in the UR but is in the CCO tools in Mahout. The reason it is not implemented is that it still requires user history, and content-based recs are worse predictors than collaborative filtering with user history. In CF you treat the terms or tags as indicators of preference; you do not find items similar by content.
>>>>
>>>> The personalized content-based recs may serve for edge conditions where you are recommending items with no usage behavior as the most common case, like news articles where you have new items all the time with no usage events. In this case extracting something better than “bag-of-words” for content is quite important. So highly detailed user tagging or NLP techniques can greatly increase the quality of results.
>>>>
>>>> On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo <[email protected]> wrote:
>>>>
>>>> IMHO, T represents tags; it is an Anonymous tag (or property) labeling task, and what you propose is Personalized tag (or property) labeling as described in https://arxiv.org/pdf/1203.4487.pdf (Section 1.4.5 Emerging new classification), p. 40.
>>>>
>>>> 2017-06-04 8:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
>>>>
>>>>> And what is the T in the slides for?
>>>>>
>>>>> How can we implement it if it is not implemented yet?
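
A minimal sketch of the TT’ idea described above, assuming a tiny hypothetical catalogue where each item is just a set of tokens. It uses raw token-overlap counts, not the LLR-filtered similarity that the Mahout CCO tools compute, and the names item_tokens, history and recs are invented for illustration only.

```python
# Hypothetical catalogue: each item is described by a set of content tokens.
item_tokens = {
    "old1": {"court", "patent", "appeal"},
    "old2": {"football", "league"},
    "new1": {"patent", "appeal", "ruling"},
    "new2": {"league", "transfer"},
}

items = sorted(item_tokens)                      # row order of T
tokens = sorted(set().union(*item_tokens.values()))  # column order of T

# T: one row per item, one column per token (binary bag-of-tokens).
T = [[1 if tok in item_tokens[i] else 0 for tok in tokens] for i in items]

# TT': item-by-item matrix; entry (i, j) counts tokens shared by items i and j.
TTt = [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in T] for row_i in T]

# User history over items (which articles the user read), as a 0/1 vector h.
history = {"old1"}
h = [1 if i in history else 0 for i in items]

# Score every item against the history: scores = (TT') h.
scores = [sum(s * x for s, x in zip(row, h)) for row in TTt]

# Recommend only items the user has not seen and that share some content.
recs = sorted(
    ((score, item) for score, item in zip(scores, items)
     if item not in history and score > 0),
    reverse=True,
)
print(recs)
```

Here the user only has history on an old item, yet the new item sharing "patent" and "appeal" is the one that scores. This is the news-style case mentioned in the thread; it still needs user history, so it does not help anonymous users.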
>>>>>
>>>>> 2017-06-04 8:11 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>
>>>>>> By purchasing an item with a tag that you have given it, they are displaying a preference for that tag.
>>>>>>
>>>>>> On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>
>>>>>> So the tag here is assumed to be a tag given by the user to an item?
>>>>>>
>>>>>> I was thinking that it was some kind of tag we give to the item by some means (classification, LDA, etc.)
>>>>>>
>>>>>> 2017-06-03 21:14 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>
>>>>>>> A = history of all purchases (in the e-com case)
>>>>>>> B = history of all tag preferences
>>>>>>>
>>>>>>> r = [A’A]h_a + [A’B]h_b
>>>>>>>
>>>>>>> The part in the slides about content-based recs is not needed here because you have captured them as user preferences.
>>>>>>>
>>>>>>> On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>
>>>>>>> Please correct side to size in my previous e-mail
>>>>>>>
>>>>>>> 2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
>>>>>>>
>>>>>>>> What will be the size of the matrix if we send an event like tag-pref? We will get a |U| x |T| matrix, I think (where T is the set of all tags).
>>>>>>>>
>>>>>>>> So [AtA] will be a |T| x |T| matrix and we will do a dot product with the user history hT to get recommendations, right?
>>>>>>>>
>>>>>>>> I was assuming that A should be of side |U| x |I|, where I is the set of all items, as it should be added to the other terms of the whole enchilada formula afterwards.
>>>>>>>>
>>>>>>>> Thank you for your guidance Pat.
>>>>>>>>
>>>>>>>> 2017-06-02 21:35 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>
>>>>>>>>> Please refer to the documents. The “event” is the name of the type of event or indicator of preference; it implies the type of the targetEntityId.
>>>>>>>>> So a “tag-pref” event would be accompanied by a targetEntityId = tag-id. This is separate from attaching “tag” properties to items with the $set event for use with filter and boost rules. One looks at the data as a possible preference indicator and the other is used to restrict results. This is why we usually name events so they sound like a user preference of some type, whereas item property values are simply item attributes, intrinsic to the items and independent of an individual user.
>>>>>>>>>
>>>>>>>>> The event can have any name that makes sense to you.
>>>>>>>>>
>>>>>>>>> On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> So, the event field should be the token and targetEntityId the item ID, right?
>>>>>>>>>
>>>>>>>>> 2017-06-02 20:07 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>>
>>>>>>>>>> Yes, each is analyzed separately as a separate event. If you are using REST you can send up to 50 events in a single array. Some SDKs may support this too.
>>>>>>>>>>
>>>>>>>>>> On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> So I have to send an event like category-preference for each tag associated to an item, right?
>>>>>>>>>>
>>>>>>>>>> entityId: user-id
>>>>>>>>>> event: category-preference
>>>>>>>>>> targetEntityId: tag/token
>>>>>>>>>>
>>>>>>>>>> 2017-06-02 19:47 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> When a user expresses a preference for a tag, word or term, as in search or even in content like descriptions, these can be considered secondary events. The most useful are tags and search terms in our experience.
>>>>>>>>>>> Content can be used but each term/token needs to be sent as a separate preference, while search phrases can be used, though again turning them into tokens may be better.
>>>>>>>>>>>
>>>>>>>>>>> Please look through the docs here: http://actionml.com/docs/ur or the slide deck here: https://www.slideshare.net/pferrel/unified-recommender-39986309
>>>>>>>>>>>
>>>>>>>>>>> The major innovation of CCO, the algorithm behind the UR, is the use of these cross-domain indicators. They are not guaranteed to predict conversions, but the CCO algo tests them and weights them low if they do not. So we tend to test for strength of prediction of the entire category of indicator and drop it if weak, or set a minLLR threshold and filter weak individual indicators out.
>>>>>>>>>>>
>>>>>>>>>>> Technically these are not called latent; that has another meaning in Machine Learning having to do with Latent Factor Analysis.
>>>>>>>>>>>
>>>>>>>>>>> On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello everyone!
>>>>>>>>>>>
>>>>>>>>>>> Do you have an idea on how to use latent information associated to items, like tags or word vector embeddings, in Mahout's SimilarityAnalysis.cooccurrences?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Marius
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "actionml-user" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>>> To post to this group, send email to actionml-user@googlegroups.com.
>>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com.
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
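
To make the matrix sizes asked about in the thread concrete, here is a small sketch of r = [A’A]h_a + [A’B]h_b built from a handful of hypothetical events in the (entityId, event, targetEntityId) shape discussed above. It is only an illustration under stated assumptions: the user, item and tag ids are made up, raw cooccurrence counts stand in for the LLR-filtered model that CCO actually builds, and this is not how the UR executes queries.

```python
# Hypothetical events: (entityId, event, targetEntityId).
events = [
    ("u1", "purchase", "i1"),
    ("u1", "purchase", "i2"),
    ("u2", "purchase", "i2"),
    ("u2", "purchase", "i3"),
    ("u1", "tag-pref", "t1"),
    ("u2", "tag-pref", "t1"),
    ("u2", "tag-pref", "t2"),
]

users = ["u1", "u2"]
items = ["i1", "i2", "i3"]
tags = ["t1", "t2"]


def indicator_matrix(events, name, rows, cols):
    """Binary |rows| x |cols| matrix for one named event type."""
    seen = {(u, t) for u, e, t in events if e == name}
    return [[1 if (u, t) in seen else 0 for t in cols] for u in rows]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


A = indicator_matrix(events, "purchase", users, items)  # |U| x |I|
B = indicator_matrix(events, "tag-pref", users, tags)   # |U| x |T|

AtA = matmul(transpose(A), A)  # |I| x |I|: items cooccurring with items
AtB = matmul(transpose(A), B)  # |I| x |T|: items cooccurring with tags

# History of the user being scored: purchased i1, showed preference for t2.
h_a = [1, 0, 0]  # length |I|
h_b = [0, 1]     # length |T|

# Both terms are vectors of length |I|, so they can be added item by item.
r = [x + y for x, y in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
print(dict(zip(items, r)))
```

The point of the sketch is the shapes: the tag-pref matrix B is |U| x |T|, but it enters the formula as A’B, which is |I| x |T|, so the tag term still produces one score per item and can be summed with the purchase term.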
