You previously said that w2v + LDA can be combined with the existing UR, but as a separate template add-on that creates enriching events for the UR.
Can you give some guidance about how it should be implemented?

2017-06-04 23:14 GMT+04:00 Marius Rabenarivo <[email protected]>:

> Thank you very much for all these clarifications.
>
> Yes, I have items with no conversions.
> I read in the literature that content-based recs are less sensitive to the cold-start problem, so I headed that way.
>
> You suggested in a previous post using Word2Vec for items with little content attached to them.
>
> I have already computed Word2Vec vectors for my items using a simple sum, and I want to use them to do some smoothing of the sparse user-item matrix.
>
> I was thinking that some kind of tensor operation might be used with CF, with the Word2Vec vectors attached to items.
>
> 2017-06-04 23:05 GMT+04:00 Pat Ferrel <[email protected]>:
>
>> TT’ does not solve cold start because you need user history for personalization. There are several other techniques that I’ve mentioned many times on the list that help with cold start, but TT’ is for a slightly different thing. Its use is when you have a user’s history of item preferences but the items are too old to recommend and you only want to recommend new ones with no history. If you think about news, it is close to being like this. Or patent applications, law opinions or judgments too. To be helpful there needs to be a lot of content for each item and you only want new things recommended.
>>
>> What cold-start do you need to “solve”: new anonymous users with no history, or items with no conversions? Search the PIO list and AML group for past posts on this.
>>
>> Tag use is implemented as both CF and content similarity (not TT’). If you ask for item-based recommendations and the item has no conversions, you will get popular items by default. If you boost items with the same tags as the item the user is looking at, you get popular items mostly with similar tags.
>> If you disable the popularity part you get items with similar tags. This requires that you attach tags to the items with $set, and your query should contain the tags (or any other properties) of the example item. There are many ways of mixing this. You could also just get recs and mix in new inventory by some small random amount. You can use different placements for these so you aren’t ruining recs with too many randomized cold items.
>>
>> Anyway, the best way to do this depends on your GUI and data.
>>
>> On Jun 4, 2017, at 11:35 AM, Marius Rabenarivo <[email protected]> wrote:
>>
>> I didn't mean to tell you what it means; I just wanted to make it clear for my part.
>>
>> As I understand it, the T part is a personalization that we should make if we want to use content-based information when doing recommendation.
>>
>> For my use case, I want to use it to overcome the cold-start problem.
>>
>> I was thinking that it was already implemented as you documented it in the slides, but I didn't find tag use in the code.
>>
>> Is it SimilarityAnalysis.rowSimilarity() in Mahout that implements TT'? (just to confirm)
>>
>> 2017-06-04 22:06 GMT+04:00 Pat Ferrel <[email protected]>:
>>
>>> No offense Marius, but I wrote the slides and the equation so I do indeed know what they are saying. Whether a user writes a tag or you are detecting the user preference for a tag you wrote, they are user indicators of preference. The LLR filtering of these secondary indicators is what CCO is all about, and it leaves you with a model that can be compared to a user’s history and contains only indicators that correlate to some conversion behavior.
>>>
>>> T in the "whole enchilada" is used to personalize content-based recommendations. Each row of T represents an item and its content as tokens. Tokens are stemmed, tokenized text terms, or can be entities in the item’s text (using some form of NLP), or tags, etc.
>>> TT’ then gives you items and the items that are most similar in terms of whatever content you were using in T. Now you take the user’s history of content item preference (which articles did they read, for instance) and find the most similar items in TT’. These will be personalized content-based recommendations.
>>>
>>> This is not implemented in the UR but is in the CCO tools in Mahout. The reason it is not implemented is that it still requires user history, and content-based recs are worse predictors than collaborative filtering with user history. In CF you treat the terms or tags as indicators of preference; you do not find items similar by content.
>>>
>>> The personalized content-based recs may serve for edge conditions where recommending items with no usage behavior is the most common case, like news articles, where you have new items all the time with no usage events. In this case extracting something better than “bag-of-words” for content is quite important. So highly detailed user tagging or NLP techniques can greatly increase the quality of results.
>>>
>>> On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo <[email protected]> wrote:
>>>
>>> IMHO, T represents tags in an anonymous tag (or property) labeling task, and what you propose is personalized tag (or property) labeling as described in https://arxiv.org/pdf/1203.4487.pdf (Section 1.4.5, "Emerging new classification", p. 40).
>>>
>>> 2017-06-04 8:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
>>>
>>>> And what is the T in the slides for?
>>>>
>>>> How can we implement it if it is not implemented yet?
>>>>
>>>> 2017-06-04 8:11 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>
>>>>> By purchasing an item with a tag that you have given it, they are displaying a preference for that tag.
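As a rough illustration of the TT’ idea described in the message above, here is a minimal pure-Python sketch. It is not the Mahout or UR implementation: the item ids, the tokens, and the `item_similarity` and `recommend` helpers are all hypothetical, and similarity is just a raw count of shared tokens. Real tooling would typically down-weight or filter weak similarities (the thread mentions LLR filtering for CCO), which is omitted here.

```python
# Hypothetical item -> content tokens table (the rows of T).
T = {
    "article-1": {"election", "senate", "vote"},
    "article-2": {"election", "vote", "poll"},
    "article-3": {"football", "league", "goal"},
    "article-4": {"senate", "budget", "vote"},
}


def item_similarity(tokens_by_item):
    """Entry (i, j) of TT': the number of tokens that items i and j share."""
    sim = {}
    for i, tokens_i in tokens_by_item.items():
        for j, tokens_j in tokens_by_item.items():
            if i != j:
                sim[(i, j)] = len(tokens_i & tokens_j)
    return sim


def recommend(history, tokens_by_item, top_n=2):
    """Rank unseen items by their summed similarity to the user's history."""
    sim = item_similarity(tokens_by_item)
    scores = {}
    for candidate in tokens_by_item:
        if candidate in history:
            continue  # do not recommend what the user already read
        score = sum(sim[(seen, candidate)] for seen in history)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# A user who has only read article-1.
recs = recommend({"article-1"}, T)
```

Note that this still needs a user history, which is the point made in the thread: it personalizes content-based recs but does not help a user with no history at all.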
>>>>>
>>>>> On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>
>>>>> So the tag here is assumed to be a tag given by the user to an item?
>>>>>
>>>>> I was thinking that it was some kind of tag we give to the item by some means (classification, LDA, etc.)
>>>>>
>>>>> 2017-06-03 21:14 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>
>>>>>> A = history of all purchases (in the e-com case)
>>>>>> B = history of all tag preferences
>>>>>>
>>>>>> r = [A’A]h_a + [A’B]h_b
>>>>>>
>>>>>> The part in the slides about content-based recs is not needed here because you have captured them as user preferences.
>>>>>>
>>>>>> On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>
>>>>>> Please correct "side" to "size" in my previous e-mail.
>>>>>>
>>>>>> 2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <mariusrabenarivo@gmail.com>:
>>>>>>
>>>>>>> What will be the size of the matrix if we send an event like tag-pref?
>>>>>>>
>>>>>>> We will get a |U| x |T| matrix, I think (where T is the set of all tags).
>>>>>>>
>>>>>>> So [AtA] will be a |T| x |T| matrix and we will do a dot product with the user history hT to get recommendations, right?
>>>>>>>
>>>>>>> I was assuming that A should be of side |U| x |I|, where I is the set of all items, as it should be added to the other terms of the whole enchilada formula afterwards.
>>>>>>>
>>>>>>> Thank you for your guidance Pat.
>>>>>>>
>>>>>>> 2017-06-02 21:35 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>
>>>>>>>> Please refer to the documents. The “event” is the name of the type of event or indicator of preference; it implies the type of the targetEntityId. So a “tag-pref” event would be accompanied by a targetEntityId = tag-id. This is separate from attaching “tag” properties to items with the $set event for use with filter and boost rules.
>>>>>>>> One looks at the data as a possible preference indicator and the other is used to restrict results. This is why we usually name events so they sound like a user preference of some type, whereas item property values are simply item attributes, intrinsic to the items and independent of an individual user.
>>>>>>>>
>>>>>>>> The event can have any name that makes sense to you.
>>>>>>>>
>>>>>>>> On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>
>>>>>>>> So, the event field should be the token and targetEntityId the item ID, right?
>>>>>>>>
>>>>>>>> 2017-06-02 20:07 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>
>>>>>>>>> Yes, each is analyzed separately as a separate event. If you are using REST you can send up to 50 events in a single array. Some SDKs may support this too.
>>>>>>>>>
>>>>>>>>> On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> So I have to send an event like category-preference for each tag associated with an item, right?
>>>>>>>>>
>>>>>>>>> entityId: user-id
>>>>>>>>> event: category-preference
>>>>>>>>> targetEntityId: tag/token
>>>>>>>>>
>>>>>>>>> 2017-06-02 19:47 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>>
>>>>>>>>>> When a user expresses a preference for a tag, word or term, as in search or even in content like descriptions, these can be considered secondary events. The most useful are tags and search terms in our experience. Content can be used, but each term/token needs to be sent as a separate preference, while search phrases can be used, though again turning them into tokens may be better.
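The exchange above (one preference event per tag or token, with the tag as targetEntityId, and at most 50 events per array over REST) can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names follow the PredictionIO event JSON format as I understand it, the event name "tag-pref" comes from the thread, and the "tag" value for targetEntityType, the helper names, and the sample data are hypothetical. No request is actually sent.

```python
import json

MAX_BATCH = 50  # the thread mentions up to 50 events per array over REST


def tag_pref_events(user_id, tags, event_name="tag-pref"):
    """Build one event per tag: the user is the entity, the tag is the target."""
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "tag",  # assumed type name, pick your own
            "targetEntityId": tag,
        }
        for tag in tags
    ]


def batches(events, size=MAX_BATCH):
    """Split a list of events into arrays of at most `size` events each."""
    return [events[i:i + size] for i in range(0, len(events), size)]


# Hypothetical data: tags (or tokens) of items that user-1 converted on.
item_tags = ["tag-%d" % n for n in range(120)]
events = tag_pref_events("user-1", item_tags)

# Each payload is a JSON array that could be posted as one batch request.
payloads = [json.dumps(batch) for batch in batches(events)]
```

The same helper would work for search terms or content tokens by passing a different `event_name`, so that each kind of indicator is analyzed as its own event type.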
>>>>>>>>>>
>>>>>>>>>> Please look through the docs here: http://actionml.com/docs/ur or the slide deck here: https://www.slideshare.net/pferrel/unified-recommender-39986309
>>>>>>>>>>
>>>>>>>>>> The major innovation of CCO, the algorithm behind the UR, is the use of these cross-domain indicators. They are not guaranteed to predict conversions, but the CCO algo tests them and weights them low if they do not, so we tend to test for strength of prediction of the entire category of indicator and drop it if weak, or set a minLLR threshold and filter weak individual indicators out.
>>>>>>>>>>
>>>>>>>>>> Technically these are not called latent; that has another meaning in Machine Learning having to do with Latent Factor Analysis.
>>>>>>>>>>
>>>>>>>>>> On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello everyone!
>>>>>>>>>>
>>>>>>>>>> Do you have an idea on how to use latent information associated with items, like tags or word vector embeddings, in Mahout's SimilarityAnalysis.cooccurrences?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Marius
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google Groups "actionml-user" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>> To post to this group, send email to actionml-user@googlegroups.com.
>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com.
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
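To make the matrix sizes in the formula r = [A’A]h_a + [A’B]h_b quoted earlier in this thread concrete, here is a small pure-Python sketch with made-up data (3 users, 3 items, 2 tags). It uses raw cooccurrence counts only; the LLR weighting and filtering that CCO applies is left out, so the numbers illustrate shapes rather than real scores. With A of size |U| x |I| and B of size |U| x |T|, A’A is |I| x |I| and A’B is |I| x |T|, so both terms produce one score per item and can be added. The helper names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Rows are users. A: purchases of 3 items. B: preferences for 2 tags.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
B = [[1, 0],
     [0, 1],
     [1, 1]]

AtA = matmul(transpose(A), A)  # |I| x |I|: item-item purchase cooccurrence
AtB = matmul(transpose(A), B)  # |I| x |T|: item-tag cross-occurrence

h_a = [1, 0, 0]  # the current user's purchase history (bought item 0)
h_b = [0, 1]     # the current user's tag-preference history (prefers tag 1)

# One score per item; a real recommender would also drop already-bought items.
r = [p + q for p, q in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
```

With this toy data, r has length |I| = 3, which is why the tag-preference matrix can be |U| x |T| and still contribute to item scores.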
